Customize Data Template
There are three common basic templates: BaseDataTPL
, GeneralDataTPL
and EduDataTPL
. The GeneralDataTPL
inherits from BaseDataTPL
, supporting general functions such as DataCache
、Data Atomic Operation
、Multiple Folds Training
and DataFiles Protocol
, focusing on common interaction features. The EduDataTPL
inherits from GeneralDataTPL
, focusing on features of students and exercises on the basis of GeneralDataTPL
.
BaseDataTPL
BaseDataTPL
is affiliated with Basic Architecture
and provide basic data protocols. All data templates should inherit the BaseDataTPL
.
Usage Scenario
If a user is intended to abandon existing data processing method defined in GeneralDataTPL
, the customized data template inheriting from BaseDataTPL
is reasonable. The protocol in BaseDataTPL
is minimal, providing users with a wide range of playable space.
Protocols
Description based protocols
BaseDataTPL inherits from
torch.utils.data.Dataset
.
Variable based protocol
name |
description |
---|---|
default_cfg |
default configuration of data template |
self.datatpl_cfg |
configuration of data template |
self.evaltpl_cfg |
configuration of evaluate template |
self.traintpl_cfg |
configuration of training template |
self.modeltpl_cfg |
configuration of model template |
self.frame_cfg |
configuration of framework |
self.logger |
logger object |
Function based protocol
name |
description |
---|---|
from_cfg |
the entry point of create instance. |
_check_params |
check rationality of configuration |
get_extra_data |
return a dict object and the framework will pass this to model instance. |
_copy |
copy method of current instance |
get_default_cfg |
return default_cfg of current class and ancestral classes |
Use Case
The best use case of BaseDataTPL
is the GenetalDataTPL
. Please see below.
GeneralDataTPL
The GeneralDataTPL
inherits from BaseDataTPL
and is affiliated with Inherited Architecture
. It support general functions such as DataCache
、Data Atomic Operation
、Multiple Folds Training
and DataFiles Protocol
, focusing on common interaction features.
Usage Scenario
If new data template focuses on interaction features only and exploits existing functions (such as DataCache
), the customized data template inheriting from BaseDataTPL
is appropriate.
Protocols
Description based protocols
Data Cache, see the corresponding chapter
Data Atomic Operation, see the corresponding chapter
Data Files Protocol, see the corresponding chapter
Multi Folds Training
The data template inherits from
torch.utils.data.Dataset
in pytorch. In the GeneralDataTPL, we set a status for current data template.Status of fold_id: specify current data template is served to which fold, the
self.dict_main
stores the current interaction dataStatus of train/val/test/manager: specify current data template is served to which stage. The manager status is the initial status, and other status is a copied object of manager status.
Variable based protocol
name |
description |
---|---|
self.common_str2df |
The dictionary object read from files will be passed into the sequence of atomic operations |
self.hasValidDataset |
Under the train/val/test setting, determine if a validation set exists |
self.df |
If dataset is not divided, self.df will store the dataframe object from interaction csv file |
self.df_train |
If dataset is divided, self.df will store the dataframe object from training interaction csv file |
self.df_valid |
If dataset is divided, self.df will store the dataframe object from validation interaction csv file |
self.df_test |
If dataset is divided, self.df will store the dataframe object from test interaction csv file |
self.status |
Store current status of current template including fold_id and train/val/test status |
self.df_train_folds |
Store the dataframe object of training data of each fold |
self.df_valid_folds |
Store the dataframe object of validation data of each fold |
self.df_test_folds |
Store the dataframe object of test data of each fold |
self.dict_train_folds |
Store the dictionary object of training data of each fold |
self.dict_valid_folds |
Store the dictionary object of validation data of each fold |
self.dict_test_folds |
Store the dictionary object of test data of each fold |
self.dict_main |
store the dictionary object of current status (i.e., train/val/test) |
Function based protocol
name |
description |
---|---|
load_data |
load data files into python object |
process_data |
process middata |
build_datasets |
copy current data template into multiple dataset objects |
build_dataloaders |
build data loaders from multiple dataset objects |
save_cache |
save cache process |
check_cache |
check if the imported cache matches the current settings |
load_cache |
load cache process |
collate_fn |
collate function when build data loaders |
__len__ |
defined in pytorch, get the length of samples |
__getitem__ |
defined in pytorch, get the specific sample |
df2dict |
convert self.df_train/val/test_folds into self.dict_train/val/test_folds |
set_info_for_fold |
set current data object when a fold_id is specified |
Use Cases
The best use case of GeneralDataTPL
is the EduDataTPL
. Please see below.
EduDataTPL
The EduDataTPL
inherits from GeneralDataTPL
, focusing on features of students and exercises on the basis of GeneralDataTPL
.
Usage Scenarios
On the basis of GeneralDataTPL, the EduDataTPL
additionally considers the features of students and exercises. Following Atomic Files Protocol, EduDataTPL
will read .stu.csv
and .exer.csv
file to load the data.
Protocols
Description based protocols
The features of students and exercises would be treated as the extra data, which means that the extra data would be pass to model as mentioned in
BaseDataTPL
protocol.
Variable based protocol
name |
description |
---|---|
self.df_stu |
Store the dataframe object of student features |
self.df_exer |
Store the dataframe object of exercise features |
self.hasStuFeats |
Determine if student features exists |
self.hasExerFeats |
Determine if exercise features exists |
self.hasQmat |
Determine if Q-matrix exists |
Use Cases
The use cases of EduDataTPL
please see other data templates inheriting from EduDataTPL
.