Atomic Data Operation Protocol
In EduStudio, we view a dataset in three stages: rawdata, middata, and cachedata.
We treat the whole data processing pipeline as a series of atomic operations, which together form the atomic operation sequence.
The first atomic operation, which inherits from the protocol class BaseRaw2Mid, converts raw data into middle data.
The subsequent atomic operations, which inherit from the protocol class BaseMid2Cache, transform middle data into cache data.
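To make the protocol concrete, here is a minimal, self-contained sketch of how such a sequence could be chained. It does not import EduStudio: the classes `BaseRaw2Mid` and `BaseMid2Cache` below are hypothetical stand-ins that only mirror the idea of the protocol, and `R2M_Toy`, `M2C_ToyLabel2Int`, `run_atomic_sequence`, and the `df` key are illustrative names, not the library's API. The design assumption is that each Mid2Cache step receives the current data as keyword arguments and returns an updated dict, so steps can be appended without knowing about each other.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseRaw2Mid(ABC):
    """Hypothetical stand-in for the raw-data -> middle-data protocol class."""

    @abstractmethod
    def process(self, raw: str) -> Dict[str, Any]:
        """Convert raw data into middle data (here: a dict of named objects)."""


class BaseMid2Cache(ABC):
    """Hypothetical stand-in for the middle-data -> cache-data protocol class."""

    @abstractmethod
    def process(self, **kwargs: Any) -> Dict[str, Any]:
        """Receive the current data dict and return an updated copy."""


class R2M_Toy(BaseRaw2Mid):
    """Parses 'stu,exer,score' lines into a list of record dicts."""

    def process(self, raw: str) -> Dict[str, Any]:
        records = []
        for line in raw.strip().splitlines():
            stu, exer, score = line.split(",")
            records.append({"stu_id": stu, "exer_id": exer, "label": float(score)})
        return {"df": records}


class M2C_ToyLabel2Int(BaseMid2Cache):
    """Binarizes the label column with a threshold (illustrative only)."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    def process(self, **kwargs: Any) -> Dict[str, Any]:
        df = [dict(r, label=int(r["label"] >= self.threshold)) for r in kwargs["df"]]
        return dict(kwargs, df=df)  # pass every other key through unchanged


def run_atomic_sequence(raw: str, r2m: BaseRaw2Mid, m2c_ops: List[BaseMid2Cache]) -> Dict[str, Any]:
    """Apply the single R2M step once, then each M2C step in order."""
    data = r2m.process(raw)
    for op in m2c_ops:
        data = op.process(**data)
    return data


cache = run_atomic_sequence("s1,e1,1.0\ns1,e2,0.3", R2M_Toy(), [M2C_ToyLabel2Int()])
print([r["label"] for r in cache["df"]])  # [1, 0]
```

Because every step shares the same `process(**kwargs) -> dict` shape in this sketch, reordering or inserting steps only changes the list passed to `run_atomic_sequence`.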
Partial Atomic Operation Table
The tables below list some of the existing atomic operations. For the complete table, please see user_guide/Atomic Data Operation List.
Raw2Mid
For the conversion from rawdata to middata, we implement one atomic operation per dataset, named with the prefix R2M.
| name | Corresponding dataset |
|---|---|
| R2M_ASSIST_0910 | ASSISTment 2009-2010 |
| R2M_FrcSub | FrcSub |
| R2M_ASSIST_1213 | ASSISTment 2012-2013 |
| R2M_Math1 | Math1 |
| R2M_Math2 | Math2 |
| R2M_AAAI_2023 | AAAI 2023 Global Knowledge Tracing Challenge |
| R2M_Algebra_0506 | Algebra 2005-2006 |
| R2M_ASSIST_1516 | ASSISTment 2015-2016 |
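Each R2M operation is dataset-specific because every raw dataset ships with its own file layout and column names. The sketch below shows the core of such a conversion under stated assumptions: the ASSISTments-style input columns, the unified output names `stu_id`/`exer_id`/`label`, the helper `r2m_convert`, and the rule of dropping incomplete rows are all hypothetical choices for illustration, not what any listed R2M operation actually does.

```python
import csv
import io
from typing import Dict

# Hypothetical mapping from one raw dataset's column names to unified middata names.
ASSIST_LIKE_COLUMNS = {"user_id": "stu_id", "problem_id": "exer_id", "correct": "label"}


def r2m_convert(raw_csv: str, column_map: Dict[str, str]) -> str:
    """Select and rename columns; drop rows where any mapped field is empty."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(column_map.values()), lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if any(not row.get(old) for old in column_map):
            continue  # skip incomplete records
        writer.writerow({new: row[old] for old, new in column_map.items()})
    return out.getvalue()


raw = "order_id,user_id,problem_id,correct\n1,u1,p1,1\n2,u1,p2,0\n3,u2,p1,\n"
print(r2m_convert(raw, ASSIST_LIKE_COLUMNS))
```

Supporting another dataset in this sketch means supplying a different `column_map` (plus any dataset-specific cleaning), which is the kind of per-dataset logic that motivates one R2M operation per dataset.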
Mid2Cache
common
| name | description |
|---|---|
| M2C_Label2Int | Convert the label column into discrete values |
| M2C_MergeDividedSplits | Merge the train/valid/test sets into one DataFrame |
| M2C_ReMapId | Remap column IDs |
| M2C_GenQMat | Generate the Q-matrix |
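Two of these common steps are easy to illustrate in isolation. ID remapping turns arbitrary identifiers into contiguous integers usable as embedding indices, and the Q-matrix is the binary exercise-by-concept matrix whose entry (j, k) is 1 when exercise j involves knowledge concept k. The helpers `remap_ids` and `gen_q_matrix` below are a hypothetical sketch; assigning IDs in order of first appearance is an assumption (a sorted order would work equally well), and the real M2C_ReMapId and M2C_GenQMat may differ.

```python
from typing import Dict, List, Sequence, Tuple


def remap_ids(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map arbitrary IDs to contiguous ints 0..n-1 in order of first appearance."""
    mapping: Dict[str, int] = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return [mapping[v] for v in values], mapping


def gen_q_matrix(exer_kcs: Dict[int, List[int]], n_exer: int, n_kc: int) -> List[List[int]]:
    """Build an n_exer x n_kc 0/1 matrix from (remapped) exercise -> concept lists."""
    q = [[0] * n_kc for _ in range(n_exer)]
    for exer, kcs in exer_kcs.items():
        for kc in kcs:
            q[exer][kc] = 1
    return q


ids, mapping = remap_ids(["e7", "e3", "e7"])
print(ids, mapping)                              # [0, 1, 0] {'e7': 0, 'e3': 1}
print(gen_q_matrix({0: [0, 2], 1: [1]}, 2, 3))   # [[1, 0, 1], [0, 1, 0]]
```

Remapping before building the Q-matrix matters in this sketch because `gen_q_matrix` uses the IDs directly as row and column indices.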
CD
| name | description |
|---|---|
| M2C_RandomDataSplit4CD | Split the dataset randomly for cognitive diagnosis (CD) |
| M2C_FilterRecords4CD | Filter out students or exercises whose number of interaction records is below a threshold |
KT
| name | description |
|---|---|
| M2C_BuildSeqInterFeats | Build sequential features and split the dataset |
| M2C_KCAsExer | Treat knowledge concepts as exercises |
| M2C_GenKCSeq | Generate knowledge concept sequences |
| M2C_GenUnFoldKCSeq | Unfold knowledge concepts |
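For knowledge tracing (KT), interactions must become per-student, time-ordered sequences. The sketch below illustrates two of the ideas above with hypothetical helpers: `unfold_kc` reads "unfold" as expanding an interaction tagged with several knowledge concepts into one record per concept, and `build_sequences` groups records by student, sorts them by a `time` key, and cuts them into fixed-length windows padded with 0 plus a mask. Both interpretations, the record keys, the padding value, and the mask field are assumptions, not the behavior of M2C_GenUnFoldKCSeq or M2C_BuildSeqInterFeats.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def unfold_kc(records: List[Record]) -> List[Record]:
    """Emit one record per (interaction, knowledge concept) pair."""
    out = []
    for r in records:
        base = {k: v for k, v in r.items() if k != "kc_ids"}
        for kc in r["kc_ids"]:
            out.append({**base, "kc_id": kc})
    return out


def build_sequences(records: List[Record], max_len: int) -> Dict[Any, List[Dict[str, List[int]]]]:
    """Group by student, sort by time, and cut into fixed-length padded windows."""
    by_stu: Dict[Any, List[Record]] = defaultdict(list)
    for r in records:
        by_stu[r["stu_id"]].append(r)
    seqs = {}
    for stu, rs in by_stu.items():
        rs.sort(key=lambda rec: rec["time"])
        windows = []
        for start in range(0, len(rs), max_len):
            chunk = rs[start:start + max_len]
            pad = max_len - len(chunk)
            windows.append({
                "exer_seq": [rec["exer_id"] for rec in chunk] + [0] * pad,
                "label_seq": [rec["label"] for rec in chunk] + [0] * pad,
                "mask_seq": [1] * len(chunk) + [0] * pad,  # 1 = real step, 0 = padding
            })
        seqs[stu] = windows
    return seqs


recs = [
    {"stu_id": "s1", "exer_id": 30, "label": 1, "time": 3},
    {"stu_id": "s1", "exer_id": 10, "label": 0, "time": 1},
    {"stu_id": "s1", "exer_id": 20, "label": 1, "time": 2},
]
print(build_sequences(recs, max_len=2)["s1"])
print(unfold_kc([{"stu_id": "s1", "exer_id": 5, "kc_ids": [1, 2], "label": 1}]))
```

The mask keeps padding distinguishable from a real exercise whose remapped ID happens to be 0, so a model in this sketch can ignore padded positions when computing its loss.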