transformer_rankers.datasets.dataset.AbstractDataloader
class transformer_rankers.datasets.dataset.AbstractDataloader(train_df, val_df, test_df, tokenizer, negative_sampler_train, negative_sampler_val, task_type, train_batch_size, val_batch_size, max_seq_len, sample_data, cache_path)
Bases: object
Abstract class for the DataLoaders. The class expects the DataFrames to contain only relevant query-document combinations.
- Parameters
train_df – training pandas DataFrame with two columns: the first contains the ‘query’, the second the relevant ‘document’.
val_df – validation pandas DataFrame with two columns: the first contains the ‘query’, the second the relevant ‘document’.
test_df – test pandas DataFrame with two columns: the first contains the ‘query’, the second the relevant ‘document’.
tokenizer – transformer tokenizer.
negative_sampler_train – negative sampling object for the training set. Must provide a .sample() function.
negative_sampler_val – negative sampling object for the validation/test set. Must provide a .sample() function.
train_batch_size – int with the number of instances in a training batch.
val_batch_size – int with the number of instances in a validation/test batch.
max_seq_len – int with the maximum sequence length when processing inputs.
sample_data – int with the number of samples (num_samples) if the data was sampled, or -1 if it was not.
cache_path – str with the path where the dataset, already converted to torch tensors, is cached.
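The parameter list only says that the DataFrames hold relevant (query, document) pairs and that each negative sampler has a .sample() function. The sketch below illustrates that contract with the standard library only. Plain tuples stand in for DataFrame rows. ToyRandomNegativeSampler, its constructor arguments, and the sample(query_str, relevant_docs) signature and return value are all assumptions made for illustration, not the library's confirmed API.

```python
import random


class ToyRandomNegativeSampler:
    """Hypothetical sampler: draws random documents that are not relevant."""

    def __init__(self, candidates, num_candidates_samples, seed=42):
        self.candidates = list(candidates)
        self.num_candidates_samples = num_candidates_samples
        self.rng = random.Random(seed)

    def sample(self, query_str, relevant_docs):
        # Assumed signature and return type; the real samplers may differ.
        relevant = set(relevant_docs)
        pool = [doc for doc in self.candidates if doc not in relevant]
        k = min(self.num_candidates_samples, len(pool))
        return self.rng.sample(pool, k)


# Rows shaped like the documented DataFrames: (query, relevant document) only.
train_rows = [
    ("how to bake bread", "Mix flour, water, yeast and salt, then bake."),
    ("what is a transformer", "A neural architecture based on self-attention."),
    ("capital of france", "Paris is the capital of France."),
]

sampler = ToyRandomNegativeSampler(
    candidates=[doc for _, doc in train_rows], num_candidates_samples=2
)
query, relevant_doc = train_rows[0]
negatives = sampler.sample(query, [relevant_doc])
```

Because the input data contains only relevant pairs, any non-relevant candidates have to come from a sampler of this kind. Here the toy sampler draws them from the other rows' documents.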
__init__(train_df, val_df, test_df, tokenizer, negative_sampler_train, negative_sampler_val, task_type, train_batch_size, val_batch_size, max_seq_len, sample_data, cache_path)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(train_df, val_df, test_df, …) – Initialize self.
get_pytorch_dataloaders()