transformer_rankers.datasets.dataset.AbstractDataloader

class transformer_rankers.datasets.dataset.AbstractDataloader(train_df, val_df, test_df, tokenizer, negative_sampler_train, negative_sampler_val, task_type, train_batch_size, val_batch_size, max_seq_len, sample_data, cache_path)

Bases: object

Abstract class for the DataLoaders. The class expects the DataFrames to contain only relevant query-document pairs.
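As a minimal sketch of the expected input, assuming pandas is installed, the DataFrame below holds one relevant query-document pair per row. The column names and texts are illustrative assumptions; the parameter descriptions only fix the column order (query first, relevant document second).

```python
import pandas as pd

# Illustrative data only: each row pairs a query with a document relevant to it.
# Non-relevant pairs are not listed here.
train_df = pd.DataFrame(
    {
        "query": ["how to reset a router", "python list slicing"],
        "document": [
            "Hold the reset button for ten seconds.",
            "Use my_list[start:stop] to take a slice.",
        ],
    }
)

# Access by position: the first column is the query, the second the relevant document.
first_query = train_df.iloc[0, 0]
first_doc = train_df.iloc[0, 1]
```

Validation and test DataFrames would follow the same layout.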

Parameters

train_df – training pandas DataFrame whose first column contains the ‘query’ and whose second column contains a relevant ‘document’.
val_df – validation pandas DataFrame whose first column contains the ‘query’ and whose second column contains a relevant ‘document’.
test_df – test pandas DataFrame whose first column contains the ‘query’ and whose second column contains a relevant ‘document’.
tokenizer – transformer tokenizer.
negative_sampler_train – negative sampling object for the training set. Must provide a .sample() method.
negative_sampler_val – negative sampling object for the validation/test set. Must provide a .sample() method.
train_batch_size – int giving the number of instances in a training batch.
val_batch_size – int giving the number of instances in a validation/test batch.
max_seq_len – int giving the maximum sequence length when processing inputs.
sample_data – int indicating whether the data was sampled: the number of samples (num_samples) if so, or -1 if not.
cache_path – str with the path where the dataset, already converted to torch tensors, is cached.
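The parameter list only says that the negative samplers have a .sample() function; its arguments and return value are not documented here. The sketch below is therefore a hypothetical stand-in, not a class from transformer_rankers: the name RandomNegativeSampler, the constructor arguments, and the sample(query, relevant_docs) signature are all assumptions made for illustration.

```python
import random


class RandomNegativeSampler:
    """Hypothetical sampler for illustration; not part of transformer_rankers."""

    def __init__(self, candidates, num_candidates, seed=42):
        self.candidates = list(candidates)
        self.num_candidates = num_candidates
        self.rng = random.Random(seed)  # seeded for reproducible draws

    def sample(self, query, relevant_docs):
        # Assumed signature. The query is unused in this random variant;
        # a retrieval-based sampler would use it to rank candidates.
        pool = [doc for doc in self.candidates if doc not in relevant_docs]
        k = min(self.num_candidates, len(pool))
        return self.rng.sample(pool, k)


docs = ["doc a", "doc b", "doc c", "doc d"]
sampler = RandomNegativeSampler(docs, num_candidates=2)
negatives = sampler.sample("some query", relevant_docs=["doc a"])
```

Because the DataFrames hold only relevant pairs, a sampler of this kind would supply the non-relevant documents for each query.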

__init__(train_df, val_df, test_df, tokenizer, negative_sampler_train, negative_sampler_val, task_type, train_batch_size, val_batch_size, max_seq_len, sample_data, cache_path): Initialize self. See help(type(self)) for accurate signature.

Methods

`__init__`(train_df, val_df, test_df, …)	Initialize self.
`get_pytorch_dataloaders`()