Atomic Data Operation Protocol
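In EduStudio, we view a dataset in three stages: rawdata (the dataset as originally released), middata (an intermediate format shared across datasets), and cachedata (the processed data that models consume).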
We treat the whole data processing pipeline as a sequence of atomic operations, called the atomic operation sequence.
The first atomic operation, which inherits the protocol class BaseRaw2Mid, converts raw data into middle data.
The following atomic operations, which inherit the protocol class BaseMid2Cache, convert middle data into cache data.
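A minimal sketch of the atomic operation sequence in plain Python may make the protocol more concrete. The two base classes below are simplified, hypothetical stand-ins that only reuse the protocol names BaseRaw2Mid and BaseMid2Cache. They are not EduStudio's actual base classes, whose method signatures may differ. R2M_Toy, M2C_DropMissingLabel, run_atomic_sequence and the dict-of-records data layout are illustrative assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Record = Dict[str, Any]


class BaseRaw2Mid(ABC):
    """Hypothetical stand-in for the rawdata -> middata protocol class."""

    @abstractmethod
    def process(self, raw: List[Record]) -> Dict[str, Any]:
        """Convert dataset-specific raw records into a shared middle format."""


class BaseMid2Cache(ABC):
    """Hypothetical stand-in for the middata -> cachedata protocol class."""

    @abstractmethod
    def process(self, **kwargs: Any) -> Dict[str, Any]:
        """Receive the current data dict, transform it, and return it."""


def run_atomic_sequence(raw: List[Record], r2m: BaseRaw2Mid,
                        m2c_ops: List[BaseMid2Cache]) -> Dict[str, Any]:
    """Run one Raw2Mid operation, then each Mid2Cache operation in order."""
    data = r2m.process(raw)        # first atomic operation: rawdata -> middata
    for op in m2c_ops:             # following operations: middata -> cachedata
        data = op.process(**data)  # each step consumes the previous step's output
    return data


class R2M_Toy(BaseRaw2Mid):
    """Hypothetical R2M op: rename raw keys into a unified schema."""

    def process(self, raw: List[Record]) -> Dict[str, Any]:
        inter = [{"stu_id": r["user"], "exer_id": r["item"], "label": r["correct"]}
                 for r in raw]
        return {"df_inter": inter}


class M2C_DropMissingLabel(BaseMid2Cache):
    """Hypothetical M2C op: drop interactions whose label is missing."""

    def process(self, **kwargs: Any) -> Dict[str, Any]:
        kwargs["df_inter"] = [r for r in kwargs["df_inter"] if r["label"] is not None]
        return kwargs


raw = [{"user": "u1", "item": "e1", "correct": 1},
       {"user": "u2", "item": "e1", "correct": None}]
out = run_atomic_sequence(raw, R2M_Toy(), [M2C_DropMissingLabel()])
print(out)  # {'df_inter': [{'stu_id': 'u1', 'exer_id': 'e1', 'label': 1}]}
```

In this sketch, a single dict is passed through every step. That keeps the operations independent of one another, so a sequence can be reordered or extended by editing the list of Mid2Cache operations.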
Partial Atomic Operation Table
The tables below list some of the existing atomic operations. For the complete table, please see user_guide/Atomic Data Operation List.
Raw2Mid
For the conversion from rawdata to middata, we implement a specific atomic data operation prefixed with R2M for each dataset.
Name | Corresponding dataset |
---|---|
R2M_ASSIST_0910 | ASSISTment 2009-2010 |
R2M_FrcSub | FrcSub |
R2M_ASSIST_1213 | ASSISTment 2012-2013 |
R2M_Math1 | Math1 |
R2M_Math2 | Math2 |
R2M_AAAI_2023 | AAAI 2023 Global Knowledge Tracing Challenge |
R2M_Algebra_0506 | Algebra 2005-2006 |
R2M_ASSIST_1516 | ASSISTment 2015-2016 |
Mid2Cache
Common
Name | Description |
---|---|
M2C_Label2Int | Convert the label column into discrete values |
M2C_MergeDividedSplits | Merge the train/valid/test sets into one DataFrame |
M2C_ReMapId | Remap column IDs |
M2C_GenQMat | Generate the Q-matrix |
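Three of the common operations can be illustrated with plain Python. This is a sketch under stated assumptions. remap_ids, label_to_int and gen_q_matrix are hypothetical helper names, not EduStudio functions. Remapping assigns IDs in order of first appearance. The 0.5 binarization threshold is our choice and is not necessarily the one M2C_Label2Int uses.

```python
from typing import Dict, Hashable, List, Sequence


def remap_ids(values: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Map raw IDs to contiguous integers 0..n-1 in order of first appearance."""
    mapping: Dict[Hashable, int] = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return mapping


def label_to_int(labels: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Discretize labels: 1 if label >= threshold, else 0 (threshold is assumed)."""
    return [1 if y >= threshold else 0 for y in labels]


def gen_q_matrix(exer_to_kcs: Dict[int, List[int]], n_exer: int, n_kc: int) -> List[List[int]]:
    """Build Q with Q[e][k] = 1 if exercise e involves knowledge concept k."""
    q = [[0] * n_kc for _ in range(n_exer)]
    for e, kcs in exer_to_kcs.items():
        for k in kcs:
            q[e][k] = 1
    return q


exer_map = remap_ids(["e7", "e3", "e7", "e9"])  # {'e7': 0, 'e3': 1, 'e9': 2}
kc_map = remap_ids(["k2", "k5"])                # {'k2': 0, 'k5': 1}
raw_kcs = {"e7": ["k2"], "e3": ["k2", "k5"], "e9": ["k5"]}
exer_to_kcs = {exer_map[e]: [kc_map[k] for k in ks] for e, ks in raw_kcs.items()}
q_mat = gen_q_matrix(exer_to_kcs, len(exer_map), len(kc_map))
print(q_mat)                               # [[1, 0], [1, 1], [0, 1]]
print(label_to_int([0.0, 0.8, 0.5, 0.3]))  # [0, 1, 1, 0]
```

The Q-matrix is built only after remapping. This ordering lets the remapped IDs serve directly as row and column indices.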
CD (cognitive diagnosis)
Name | Description |
---|---|
M2C_RandomDataSplit4CD | Randomly split the dataset for CD |
M2C_FilterRecords4CD | Filter out students or exercises whose number of interaction records is below a threshold |
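The two CD operations can be sketched as follows, assuming interactions are (stu_id, exer_id, label) tuples. filter_records and random_split are hypothetical names. Two behaviors are design choices of this sketch, not confirmed behavior of M2C_FilterRecords4CD or M2C_RandomDataSplit4CD. First, the filter repeats until nothing changes, because dropping an exercise can push a student below the threshold. Second, the split is done at the record level with a fixed seed.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[int, int, int]  # (stu_id, exer_id, label)


def filter_records(records: List[Record], min_stu: int, min_exer: int) -> List[Record]:
    """Drop students/exercises with fewer than the threshold number of records.

    Repeats until stable, because removing an exercise can push a student
    below its threshold (and vice versa).
    """
    while True:
        stu_cnt = Counter(r[0] for r in records)
        exer_cnt = Counter(r[1] for r in records)
        kept = [r for r in records
                if stu_cnt[r[0]] >= min_stu and exer_cnt[r[1]] >= min_exer]
        if len(kept) == len(records):
            return kept
        records = kept


def random_split(records: List[Record], ratios: Sequence[float] = (0.8, 0.1, 0.1),
                 seed: int = 2023) -> Tuple[List[Record], List[Record], List[Record]]:
    """Shuffle with a fixed seed, then cut into train/valid/test by ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_valid = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


records = [(s, e, (s + e) % 2) for s in range(4) for e in range(3)] + [(9, 0, 1)]
filtered = filter_records(records, min_stu=2, min_exer=2)  # student 9 is dropped
train, valid, test = random_split(filtered, ratios=(0.5, 0.25, 0.25))
print(len(filtered), len(train), len(valid), len(test))  # 12 6 3 3
```

Each pass of the loop either returns or removes at least one record, so the filter always terminates.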
KT (knowledge tracing)
Name | Description |
---|---|
M2C_BuildSeqInterFeats | Build sequential interaction features and split the dataset |
M2C_KCAsExer | Treat knowledge concepts as exercises |
M2C_GenKCSeq | Generate knowledge concept sequences |
M2C_GenUnFoldKCSeq | Unfold knowledge concepts |
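For KT, interactions are usually arranged into per-student, time-ordered sequences. The sketch below rests on several assumptions. Interactions are (stu_id, exer_id, label, timestamp) tuples. Each student's history is cut into fixed-length windows padded with 0, and a mask marks the padding. "Unfold" is read as turning an interaction on an exercise with several knowledge concepts into one interaction per concept. build_seqs, unfold_kcs and the field names are hypothetical. EduStudio's M2C_BuildSeqInterFeats and M2C_GenUnFoldKCSeq may differ in detail.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Inter = Tuple[int, int, int, int]  # (stu_id, exer_id, label, timestamp)


def build_seqs(inters: List[Inter], window: int, pad: int = 0) -> List[Dict[str, Any]]:
    """Group by student, sort by time, cut into padded windows with a mask."""
    by_stu: Dict[int, List[Inter]] = defaultdict(list)
    for it in inters:
        by_stu[it[0]].append(it)
    seqs: List[Dict[str, Any]] = []
    for stu in sorted(by_stu):
        hist = sorted(by_stu[stu], key=lambda x: x[3])  # chronological (stable) order
        for start in range(0, len(hist), window):
            chunk = hist[start:start + window]
            n_pad = window - len(chunk)
            seqs.append({
                "stu_id": stu,
                "exer_seq": [x[1] for x in chunk] + [pad] * n_pad,
                "label_seq": [x[2] for x in chunk] + [pad] * n_pad,
                "mask_seq": [1] * len(chunk) + [0] * n_pad,  # 0 marks padding
            })
    return seqs


def unfold_kcs(inters: List[Inter], exer_to_kcs: Dict[int, List[int]]) -> List[Inter]:
    """Replace each exercise with its concepts: one interaction per concept."""
    out: List[Inter] = []
    for stu, exer, label, ts in inters:
        for kc in exer_to_kcs[exer]:
            out.append((stu, kc, label, ts))
    return out


inters = [(1, 10, 1, 3), (1, 11, 0, 1), (1, 12, 1, 2), (2, 10, 0, 5)]
seqs = build_seqs(inters, window=2)
# student 1 -> exer_seq [11, 12] and [10, 0]; student 2 -> [10, 0]
unfolded = unfold_kcs(inters, {10: [0], 11: [0, 1], 12: [1]})
# exercise 11 covers two concepts, so it yields two interactions
kc_seqs = build_seqs(unfolded, window=3)
```

The pad value 0 is also a valid remapped ID, so in this sketch only mask_seq tells real steps apart from padding. Feeding the unfolded records back into build_seqs yields concept-level sequences.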