transformer_rankers.datasets.dataset.AbstractDataloader

class transformer_rankers.datasets.dataset.AbstractDataloader(train_df, val_df, test_df, tokenizer, negative_sampler_train, negative_sampler_val, task_type, train_batch_size, val_batch_size, max_seq_len, sample_data, cache_path)[source]

Bases: object

Abstract class for the DataLoaders. The class expects the DataFrames to contain only relevant query-document combinations; negative examples are produced by the negative samplers.

Parameters
  • train_df – train pandas DataFrame with two columns: the first containing the ‘query’ and the second a relevant ‘document’ (see the layout sketch after this parameter list).

  • val_df – validation pandas DataFrame with two columns: the first containing the ‘query’ and the second a relevant ‘document’.

  • test_df – test pandas DataFrame with two columns: the first containing the ‘query’ and the second a relevant ‘document’.

  • tokenizer – transformer tokenizer.

  • negative_sampler_train – negative sampling class for the training set; it must expose a .sample() method.

  • negative_sampler_val – negative sampling class for the validation/test set; it must expose a .sample() method.

  • task_type – str with the type of the task, e.g. ‘classification’ or ‘generation’.

  • train_batch_size – int containing the number of instances in a batch for training.

  • val_batch_size – int containing the number of instances in a batch for validation/test.

  • max_seq_len – int containing the maximum sequence length used when tokenizing inputs.

  • sample_data – int with the number of instances to subsample the data to, or -1 to use all the data.

  • cache_path – str with the path where the dataset, already converted to torch tensors, is cached.
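
A minimal sketch of the expected DataFrame layout is shown below. The column names are illustrative assumptions; the class only relies on the first column holding the query and the second holding a relevant document.

    import pandas as pd

    # Each row is one relevant query-document pair; negative examples are
    # produced later by the negative samplers. Column names are illustrative.
    train_df = pd.DataFrame({
        "query": ["how to bake bread", "capital of france"],
        "relevant_doc": ["Mix flour, water and yeast ...", "Paris is the capital of France."],
    })
    val_df = train_df.copy()   # placeholder; use a real validation split
    test_df = train_df.copy()  # placeholder; use a real test split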

__init__(train_df, val_df, test_df, tokenizer, negative_sampler_train, negative_sampler_val, task_type, train_batch_size, val_batch_size, max_seq_len, sample_data, cache_path)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

  • __init__(train_df, val_df, test_df, …) – Initialize self.

  • get_pytorch_dataloaders()
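
For orientation, here is a minimal sketch of a concrete subclass and its use. It assumes that the base class stores the constructor arguments as attributes (self.train_df, self.tokenizer, self.max_seq_len, and so on) and that the negative samplers return a list of negative documents from .sample(query); these assumptions, the class name SimplePointwiseDataLoader, and the helper _build are illustrative and not part of the documented API.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    from transformer_rankers.datasets.dataset import AbstractDataloader

    class SimplePointwiseDataLoader(AbstractDataloader):
        """Hypothetical subclass that builds pointwise (query, doc) -> relevance-label batches."""

        def get_pytorch_dataloaders(self):
            train = self._build(self.train_df, self.negative_sampler_train, self.train_batch_size, shuffle=True)
            val = self._build(self.val_df, self.negative_sampler_val, self.val_batch_size, shuffle=False)
            test = self._build(self.test_df, self.negative_sampler_val, self.val_batch_size, shuffle=False)
            return train, val, test

        def _build(self, df, negative_sampler, batch_size, shuffle):
            queries, docs, labels = [], [], []
            for _, row in df.iterrows():
                query, relevant_doc = row.iloc[0], row.iloc[1]
                # One positive instance per relevant query-document pair.
                queries.append(query)
                docs.append(relevant_doc)
                labels.append(1)
                # Assumed .sample() behaviour: returns a list of negative documents.
                for negative_doc in negative_sampler.sample(query):
                    queries.append(query)
                    docs.append(negative_doc)
                    labels.append(0)
            encodings = self.tokenizer(queries, docs, truncation=True,
                                       padding="max_length", max_length=self.max_seq_len,
                                       return_tensors="pt")
            dataset = TensorDataset(encodings["input_ids"],
                                    encodings["attention_mask"],
                                    torch.tensor(labels))
            return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)

With such a subclass, the three DataLoaders would be obtained as follows (the tokenizer and hyperparameter values are examples, and ns_train/ns_val stand for any objects exposing the .sample() interface described above):

    from transformers import BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    loader = SimplePointwiseDataLoader(
        train_df=train_df, val_df=val_df, test_df=test_df,
        tokenizer=tokenizer,
        negative_sampler_train=ns_train, negative_sampler_val=ns_val,
        task_type="classification",
        train_batch_size=32, val_batch_size=32, max_seq_len=512,
        sample_data=-1, cache_path="./cache",
    )
    train_loader, val_loader, test_loader = loader.get_pytorch_dataloaders()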