transformer_rankers.negative_samplers.negative_sampling.SentenceBERTNegativeSampler
class transformer_rankers.negative_samplers.negative_sampling.SentenceBERTNegativeSampler(candidates, num_candidates_samples, embeddings_file, sample_data, pre_trained_model='bert-base-nli-stsb-mean-tokens', seed=42)[source]

Bases: object

Samples candidates from a list of candidates using dense embeddings from sentenceBERT.

Parameters
- candidates – list of str containing the candidates.
- num_candidates_samples – int containing the number of negative samples for each query.
- embeddings_file – str containing the path used to cache the embeddings.
- sample_data – int indicating the number of candidates to keep in the index (-1 to use all of them).
- pre_trained_model – str containing the pre-trained sentence embedding model, e.g. bert-base-nli-stsb-mean-tokens.
- seed – int used to seed the random re-sampling (default 42).
__init__(candidates, num_candidates_samples, embeddings_file, sample_data, pre_trained_model='bert-base-nli-stsb-mean-tokens', seed=42)[source]

Initialize self. See help(type(self)) for accurate signature.
Methods

- __init__(candidates, num_candidates_samples, …) – Initialize self.
- sample(query_str, relevant_docs) – Samples from a list of candidates using dot product sentenceBERT similarity.
sample(query_str, relevant_docs)[source]

Samples from a list of candidates using dot-product sentenceBERT similarity. If a sampled candidate matches a relevant document, it is removed and replaced by a random re-sample. The method uses a faiss index for efficiency.

Parameters
- query_str – str containing the query to be used for the dense similarity matching.
- relevant_docs – list containing the str of the relevant documents, to avoid sampling them as negative samples.
 
Returns

A tuple containing the sampled documents, their respective scores, an indicator of whether the negative sampler retrieved a relevant document, and, if so, at which position.
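The behaviour described above can be illustrated with a small self-contained sketch. The `embed` and `dot` helpers below are toy stand-ins invented for illustration (the real class encodes text with a sentenceBERT model and searches a faiss index); only the overall logic is mirrored: rank candidates by dot-product similarity, replace a retrieved relevant document with a random re-sample, and report whether (and at which position) the relevant document was retrieved.

```python
import random


def embed(text):
    # Toy deterministic "embedding": a character-frequency vector.
    # A real sampler would call a sentenceBERT model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample(query_str, relevant_docs, candidates, num_candidates_samples, seed=42):
    rng = random.Random(seed)
    q = embed(query_str)
    # Rank all candidates by dot-product similarity to the query
    # (the real sampler does this with a faiss index over dense vectors).
    ranked = sorted(candidates, key=lambda c: dot(q, embed(c)), reverse=True)
    top_k = ranked[:num_candidates_samples]
    sampled, scores = [], []
    relevant_retrieved, position = False, -1
    for rank, doc in enumerate(top_k):
        if doc in relevant_docs:
            # Record that the relevant document was retrieved (and its rank),
            # then re-sample a random non-relevant candidate in its place.
            relevant_retrieved, position = True, rank
            pool = [c for c in candidates
                    if c not in relevant_docs and c not in top_k and c not in sampled]
            doc = rng.choice(pool)
        sampled.append(doc)
        scores.append(dot(q, embed(doc)))
    return sampled, scores, relevant_retrieved, position
```

Usage follows the return contract documented above: the negatives come first, then their scores, then the relevant-document indicators.

```python
candidates = ["the cat sat", "dogs bark loudly", "cats purr softly", "stock prices rose"]
docs, scores, hit, pos = sample("cat", ["the cat sat"], candidates, 2)
# "the cat sat" ranks first but is relevant, so it is replaced: hit is True, pos is 0.
```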