transformer_rankers.datasets.dataset.WeaklySupervisedQueryDocumentDataset
class transformer_rankers.datasets.dataset.WeaklySupervisedQueryDocumentDataset(data, tokenizer, data_partition, negative_sampler, task_type, max_seq_len, sample_data, cache_path, cache_mode='memmap')
Bases: torch.utils.data.dataset.Dataset
Dataset for pointwise learning with <Query,Document> pairs with Weak Supervision. The weak supervision is determined by the negative_sampler scores and it is only applied to the negative sampled documents.
For example, if negative_sampler = bm25, the scores of the negative samples will be the bm25 score of that negative sample instead of 0.
No generative models, e.g. T5, are supported here.
IMPORTANT: Weak supervision coming from the negative sampler (NS) must never produce a label of 1; every weak supervision score has to stay strictly below 1. This is because other modules, including evaluation, treat everything that is <1 as label 0.
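The constraint above can be illustrated with a small sketch. It is not the library's implementation: the helper names (`rescale_below_one`, `build_pointwise_examples`, `binarize`) and the min-max rescaling with a `ceiling` of 0.99 are assumptions made for illustration only. It shows one way raw negative sampler scores (for example BM25 scores, which can exceed 1) could be mapped below 1 before being used as labels, and how a downstream module that treats everything <1 as label 0 would read them.

```python
from typing import List, Tuple


def rescale_below_one(scores: List[float], ceiling: float = 0.99) -> List[float]:
    """Min-max rescale raw sampler scores into [0, ceiling], with ceiling < 1.

    Hypothetical helper: the rescaling scheme is an assumption, not the
    library's documented behaviour.
    """
    if not 0.0 <= ceiling < 1.0:
        raise ValueError("ceiling must be in [0, 1)")
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are identical, so there is no ranking signal to keep.
        return [0.0 for _ in scores]
    return [ceiling * (s - lo) / (hi - lo) for s in scores]


def build_pointwise_examples(
    query: str,
    relevant_doc: str,
    negatives: List[Tuple[str, float]],
) -> List[Tuple[str, str, float]]:
    """Return (query, document, label) triples for pointwise learning.

    The relevant document keeps label 1.0; each sampled negative gets its
    rescaled sampler score instead of 0.
    """
    weak_labels = rescale_below_one([score for _, score in negatives])
    examples = [(query, relevant_doc, 1.0)]
    examples.extend(
        (query, doc, label) for (doc, _), label in zip(negatives, weak_labels)
    )
    return examples


def binarize(label: float) -> int:
    """Mimic a module that treats every label below 1 as label 0."""
    return 1 if label >= 1.0 else 0


examples = build_pointwise_examples(
    "what is bm25",
    "BM25 is a ranking function.",
    [("doc a", 12.0), ("doc b", 3.0), ("doc c", 7.5)],
)
binary_labels = [binarize(label) for _, _, label in examples]
```

With these inputs only the first triple, the relevant document, is binarized to 1. A negative whose weak label reached 1 would be counted as relevant during evaluation, which is what the restriction is there to prevent.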
__init__(data, tokenizer, data_partition, negative_sampler, task_type, max_seq_len, sample_data, cache_path, cache_mode='memmap')
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(data, tokenizer, data_partition, …): Initialize self.