transformer_rankers.datasets.dataset.WeaklySupervisedQueryDocumentDataset

class transformer_rankers.datasets.dataset.WeaklySupervisedQueryDocumentDataset(data, tokenizer, data_partition, negative_sampler, task_type, max_seq_len, sample_data, cache_path, cache_mode='memmap')

Bases: torch.utils.data.dataset.Dataset

Dataset for pointwise learning on <Query,Document> pairs with weak supervision. The weak supervision comes from the negative_sampler scores and is applied only to the negatively sampled documents.

For example, if negative_sampler = bm25, the scores of the negative samples will be the bm25 score of that negative sample instead of 0.
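The labeling scheme described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper `build_weak_labels`, the example texts, and the scores are all hypothetical, and the scores are assumed to be already scaled below 1.

```python
from typing import List, Tuple

Example = Tuple[str, str, float]  # (query, document, label)


def build_weak_labels(
    query: str,
    positive_doc: str,
    negatives: List[Tuple[str, float]],
) -> List[Example]:
    """Hypothetical helper: pair a query with its documents and labels.

    The relevant document keeps the hard label 1.0. Each sampled negative
    takes the score assigned by the negative sampler as its label,
    instead of the usual 0.
    """
    examples: List[Example] = [(query, positive_doc, 1.0)]
    for doc, sampler_score in negatives:
        examples.append((query, doc, float(sampler_score)))
    return examples


# Made-up negatives with made-up sampler scores (assumed to be < 1).
examples = build_weak_labels(
    "what is bm25",
    "BM25 is a bag-of-words ranking function.",
    [("Cats are mammals.", 0.12), ("TF-IDF weights terms by rarity.", 0.63)],
)
```

With plain negative sampling, both negatives would carry label 0. Here they carry graded labels, so a pointwise model can learn that the second negative is closer to the query than the first.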

No generative models, e.g. T5, are supported here.

IMPORTANT: Weak supervision coming from the negative sampler (NS) must never include the label 1. Weak supervision scores must lie between 0 and 1 and stay strictly below 1, because other modules, including evaluation, treat every value <1 as label 0.
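Raw retrieval scores such as BM25 are not bounded by 1, so they would need rescaling before being used as weak labels. The sketch below shows one possible way to do that, together with the thresholding rule stated above. Both function names are hypothetical, the scaling formula is an assumption rather than what the library does, and the input scores are assumed to be positive.

```python
from typing import List


def scale_below_one(scores: List[float]) -> List[float]:
    """Hypothetical helper: map positive raw scores to values strictly below 1.

    Dividing by (max + 1) keeps the ordering of the scores and guarantees
    that even the highest-scoring negative stays under 1.
    """
    if not scores:
        return []
    denominator = max(scores) + 1.0
    return [s / denominator for s in scores]


def to_binary_label(label: float) -> int:
    """Thresholding rule from the note above: anything below 1 counts as 0."""
    return 1 if label >= 1 else 0


raw_bm25_scores = [12.4, 7.1, 3.0]  # made-up values for illustration
weak_labels = scale_below_one(raw_bm25_scores)
binary_labels = [to_binary_label(w) for w in weak_labels]
```

If a weak label reached 1, `to_binary_label` would count that negative as relevant, which is why the scores have to stay strictly below 1.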

__init__(data, tokenizer, data_partition, negative_sampler, task_type, max_seq_len, sample_data, cache_path, cache_mode='memmap')

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(data, tokenizer, data_partition, …)

Initialize self.