transformer_rankers.models.losses.label_smoothing.LabelSmoothingCrossEntropy¶
class transformer_rankers.models.losses.label_smoothing.LabelSmoothingCrossEntropy(smoothing=0.1)[source]¶
- Bases: torch.nn.modules.module.Module
- Label Smoothing implementation from https://github.com/huanglianghua (https://github.com/pytorch/pytorch/issues/7455). Label smoothing is a regularization technique that encourages the model to be less confident in its predictions, from "Rethinking the Inception Architecture for Computer Vision" (https://arxiv.org/abs/1512.00567).
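- A minimal usage sketch (an illustration, not taken from the repository's own docs): it assumes inputs are raw class logits of shape (batch_size, num_classes) and target holds integer class ids, matching the forward(inputs, target) signature documented below; shapes and variable names are illustrative only.

>>> import torch
>>> from transformer_rankers.models.losses.label_smoothing import LabelSmoothingCrossEntropy
>>> logits = torch.randn(4, 2, requires_grad=True)  # 4 examples, 2 classes (e.g. relevant / not relevant)
>>> targets = torch.tensor([1, 0, 1, 1])
>>> criterion = LabelSmoothingCrossEntropy(smoothing=0.1)
>>> loss = criterion(logits, targets)  # invokes forward(inputs, target) via __call__
>>> loss.backward()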
__init__(smoothing=0.1)[source]¶
- Initializes internal Module state, shared by both nn.Module and ScriptModule. 
 - Methods
 - __init__([smoothing]): Initializes internal Module state, shared by both nn.Module and ScriptModule.
 - add_module(name, module): Adds a child module to the current module.
 - apply(fn): Applies fn recursively to every submodule (as returned by .children()) as well as self.
 - bfloat16(): Casts all floating point parameters and buffers to bfloat16 datatype.
 - buffers([recurse]): Returns an iterator over module buffers.
 - children(): Returns an iterator over immediate children modules.
 - cpu(): Moves all model parameters and buffers to the CPU.
 - cuda([device]): Moves all model parameters and buffers to the GPU.
 - double(): Casts all floating point parameters and buffers to double datatype.
 - eval(): Sets the module in evaluation mode.
 - extra_repr(): Set the extra representation of the module.
 - float(): Casts all floating point parameters and buffers to float datatype.
 - forward(inputs, target): Defines the computation performed at every call.
 - half(): Casts all floating point parameters and buffers to half datatype.
 - load_state_dict(state_dict[, strict]): Copies parameters and buffers from state_dict into this module and its descendants.
 - modules(): Returns an iterator over all modules in the network.
 - named_buffers([prefix, recurse]): Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
 - named_children(): Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
 - named_modules([memo, prefix]): Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
 - named_parameters([prefix, recurse]): Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
 - parameters([recurse]): Returns an iterator over module parameters.
 - register_backward_hook(hook): Registers a backward hook on the module.
 - register_buffer(name, tensor): Adds a persistent buffer to the module.
 - register_forward_hook(hook): Registers a forward hook on the module.
 - register_forward_pre_hook(hook): Registers a forward pre-hook on the module.
 - register_parameter(name, param): Adds a parameter to the module.
 - requires_grad_([requires_grad]): Change if autograd should record operations on parameters in this module.
 - share_memory()
 - state_dict([destination, prefix, keep_vars]): Returns a dictionary containing a whole state of the module.
 - to(*args, **kwargs): Moves and/or casts the parameters and buffers.
 - train([mode]): Sets the module in training mode.
 - type(dst_type): Casts all parameters and buffers to dst_type.
 - zero_grad(): Sets gradients of all model parameters to zero.
 - Attributes
 - dump_patches
add_module(name, module)[source]¶
- Adds a child module to the current module. - The module can be accessed as an attribute using the given name. - Parameters
- name (string) – name of the child module. The child module can be accessed from this module using the given name 
- module (Module) – child module to be added to the module. 
 
 
 - 
apply(fn)[source]¶
- Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).
- Parameters
- fn (Module -> None) – function to be applied to each submodule
- Returns
- self 
- Return type
- Module 
 - Example:

>>> @torch.no_grad()
>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
 - 
bfloat16()[source]¶
- Casts all floating point parameters and buffers to bfloat16 datatype.
- Returns
- self 
- Return type
- Module 
 
 - 
buffers(recurse=True)[source]¶
- Returns an iterator over module buffers. - Parameters
- recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. 
- Yields
- torch.Tensor – module buffer 
 - Example:

>>> for buf in model.buffers():
>>>     print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
 - 
children()[source]¶
- Returns an iterator over immediate children modules. - Yields
- Module – a child module 
 
 - 
cuda(device=None)[source]¶
- Moves all model parameters and buffers to the GPU. - This also makes associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on the GPU while being optimized (see the sketch below).
- Parameters
- device (int, optional) – if specified, all parameters will be copied to that device 
- Returns
- self 
- Return type
- Module 
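 - A short sketch of that ordering (the module and hyperparameters here are illustrative):

>>> import torch
>>> import torch.nn as nn
>>> model = nn.Linear(10, 2)
>>> if torch.cuda.is_available():
>>>     model = model.cuda()  # move parameters/buffers first, so the optimizer sees the GPU copies
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # then construct the optimizer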
 
 - 
double()[source]¶
- Casts all floating point parameters and buffers to double datatype.
- Returns
- self 
- Return type
- Module 
 
 - 
eval()[source]¶
- Sets the module in evaluation mode. - This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc. - This is equivalent to self.train(False) (see the sketch below).
- Returns
- self 
- Return type
- Module 
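 - A minimal sketch of switching to evaluation mode before scoring (the model here is illustrative):

>>> import torch
>>> import torch.nn as nn
>>> model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))
>>> model.eval()  # disables dropout; equivalent to model.train(False)
>>> with torch.no_grad():  # optionally also skip gradient tracking during inference
>>>     scores = model(torch.randn(4, 8))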
 
 - 
extra_repr()[source]¶
- Set the extra representation of the module - To print customized extra information, you should reimplement this method in your own modules. Both single-line and multi-line strings are acceptable. 
 - 
float()[source]¶
- Casts all floating point parameters and buffers to float datatype. - Returns
- self 
- Return type
- Module 
 
 - 
forward(inputs, target)[source]¶
- Defines the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
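 - For reference, the gist linked in the class description computes the label-smoothed loss roughly as follows (a sketch of that reference implementation, not necessarily identical to this module's forward):

>>> import torch
>>> import torch.nn.functional as F
>>> def label_smoothing_cross_entropy(inputs, target, smoothing=0.1):
>>>     log_prob = F.log_softmax(inputs, dim=-1)
>>>     n_classes = inputs.size(-1)
>>>     # each wrong class receives smoothing / (n_classes - 1); the true class receives 1 - smoothing
>>>     weight = inputs.new_ones(inputs.size()) * smoothing / (n_classes - 1)
>>>     weight.scatter_(-1, target.unsqueeze(-1), 1.0 - smoothing)
>>>     return (-weight * log_prob).sum(dim=-1).mean()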
 - 
half()[source]¶
- Casts all floating point parameters and buffers to half datatype.
- Returns
- self 
- Return type
- Module 
 
 - 
load_state_dict(state_dict, strict=True)[source]¶
- Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module's state_dict() function (see the sketch below).
- Parameters
- state_dict (dict) – a dict containing parameters and persistent buffers. 
- strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module's state_dict() function. Default: True
 
- Returns
- missing_keys is a list of str containing the missing keys 
- unexpected_keys is a list of str containing the unexpected keys 
 
- Return type
- NamedTuple with missing_keys and unexpected_keys fields
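 - A short loading sketch (the checkpoint path and the use of nn.Linear here are hypothetical):

>>> import torch
>>> import torch.nn as nn
>>> model = nn.Linear(10, 2)
>>> state = torch.load("checkpoint.pt", map_location="cpu")  # hypothetical checkpoint file
>>> result = model.load_state_dict(state, strict=False)
>>> print(result.missing_keys, result.unexpected_keys)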
 
 - 
modules()[source]¶
- Returns an iterator over all modules in the network. - Yields
- Module – a module in the network 
 - Note - Duplicate modules are returned only once. In the following example, l will be returned only once.
 - Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
        print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
 - 
named_buffers(prefix='', recurse=True)[source]¶
- Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. - Parameters
- prefix (str) – prefix to prepend to all buffer names. 
- recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. 
 
- Yields
- (string, torch.Tensor) – Tuple containing the name and buffer 
 - Example:

>>> for name, buf in self.named_buffers():
>>>     if name in ['running_var']:
>>>         print(buf.size())
 - 
named_children()[source]¶
- Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. - Yields
- (string, Module) – Tuple containing a name and child module 
 - Example:

>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)
 - 
named_modules(memo=None, prefix='')[source]¶
- Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. - Yields
- (string, Module) – Tuple of name and module 
 - Note - Duplicate modules are returned only once. In the following example, l will be returned only once.
 - Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
        print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
 - 
named_parameters(prefix='', recurse=True)[source]¶
- Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. - Parameters
- prefix (str) – prefix to prepend to all parameter names. 
- recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. 
 
- Yields
- (string, Parameter) – Tuple containing the name and parameter 
 - Example:

>>> for name, param in self.named_parameters():
>>>     if name in ['bias']:
>>>         print(param.size())
 - 
parameters(recurse=True)[source]¶
- Returns an iterator over module parameters. - This is typically passed to an optimizer. - Parameters
- recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. 
- Yields
- Parameter – module parameter 
 - Example:

>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
 - 
register_backward_hook(hook)[source]¶
- Registers a backward hook on the module. - The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature: - hook(module, grad_input, grad_output) -> Tensor or None - The grad_input and grad_output may be tuples if the module has multiple inputs or outputs. The hook should not modify its arguments, but it can optionally return a new gradient with respect to input that will be used in place of grad_input in subsequent computations.
- Returns
- a handle that can be used to remove the added hook by calling handle.remove()
- Return type
- torch.utils.hooks.RemovableHandle
 - Warning - The current implementation will not have the presented behavior for a complex Module that performs many operations. In some failure cases, grad_input and grad_output will only contain the gradients for a subset of the inputs and outputs. For such a Module, you should use torch.Tensor.register_hook() directly on a specific input or output to get the required gradients.
 - 
register_buffer(name, tensor)[source]¶
- Adds a persistent buffer to the module. - This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm's running_mean is not a parameter, but is part of the persistent state. - Buffers can be accessed as attributes using given names. - Parameters
- name (string) – name of the buffer. The buffer can be accessed from this module using the given name 
- tensor (Tensor) – buffer to be registered. 
 
 - Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))
 - 
register_forward_hook(hook)[source]¶
- Registers a forward hook on the module. - The hook will be called every time after forward() has computed an output. It should have the following signature: - hook(module, input, output) -> None or modified output - The hook can modify the output. It can modify the input in place, but this will have no effect on forward, since the hook is called after forward() has run (see the sketch below).
- Returns
- a handle that can be used to remove the added hook by calling handle.remove()
- Return type
- torch.utils.hooks.RemovableHandle
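 - A minimal sketch: a hook that records output shapes and is later removed via the returned handle (the layer and hook names here are illustrative):

>>> import torch
>>> import torch.nn as nn
>>> def shape_hook(module, inputs, output):
>>>     print(type(module).__name__, tuple(output.shape))  # returning None keeps the output unchanged
>>> layer = nn.Linear(4, 3)
>>> handle = layer.register_forward_hook(shape_hook)
>>> layer(torch.randn(2, 4))  # prints: Linear (2, 3)
>>> handle.remove()  # stop observing further calls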
 
 - 
register_forward_pre_hook(hook)[source]¶
- Registers a forward pre-hook on the module. - The hook will be called every time before forward() is invoked. It should have the following signature: - hook(module, input) -> None or modified input - The hook can modify the input. The user can either return a tuple or a single modified value from the hook. The value will be wrapped into a tuple if a single value is returned, unless that value is already a tuple (see the sketch below).
- Returns
- a handle that can be used to remove the added hook by calling handle.remove()
- Return type
- torch.utils.hooks.RemovableHandle
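 - Similarly, a minimal sketch of a pre-hook that rewrites the positional inputs before forward() runs (illustrative names):

>>> import torch
>>> import torch.nn as nn
>>> def clamp_inputs(module, inputs):
>>>     return (inputs[0].clamp(min=0.0),)  # inputs is the tuple of positional args; returning a tuple replaces them
>>> layer = nn.Linear(4, 3)
>>> handle = layer.register_forward_pre_hook(clamp_inputs)
>>> out = layer(torch.randn(2, 4))  # forward() sees the clamped tensor
>>> handle.remove()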
 
 - 
register_parameter(name, param)[source]¶
- Adds a parameter to the module. - The parameter can be accessed as an attribute using given name. - Parameters
- name (string) – name of the parameter. The parameter can be accessed from this module using the given name 
- param (Parameter) – parameter to be added to the module. 
 
 
 - 
requires_grad_(requires_grad=True)[source]¶
- Change if autograd should record operations on parameters in this module. - This method sets the parameters' requires_grad attributes in-place. - This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training); see the sketch below.
- Parameters
- requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
- Returns
- self 
- Return type
- Module 
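 - A minimal freezing sketch (the model here is illustrative):

>>> import torch.nn as nn
>>> model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))
>>> model[0].requires_grad_(False)  # freeze the first layer; only the second layer will be trained
>>> trainable = [p for p in model.parameters() if p.requires_grad]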
 
 - 
state_dict(destination=None, prefix='', keep_vars=False)[source]¶
- Returns a dictionary containing a whole state of the module. - Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. - Returns
- a dictionary containing a whole state of the module 
- Return type
- dict 
 - Example:

>>> module.state_dict().keys()
['bias', 'weight']
 - 
to(*args, **kwargs)[source]¶
- Moves and/or casts the parameters and buffers.
- This can be called as:

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
to(memory_format=torch.channels_last)

- Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
- See below for examples.
- Note - This method modifies the module in-place.
- Parameters
- device (torch.device) – the desired device of the parameters and buffers in this module
- dtype (torch.dtype) – the desired floating point type of the floating point parameters and buffers in this module
- tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
- memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)
 
- Returns
- self 
- Return type
- Module 
 - Example:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
 - 
train(mode=True)[source]¶
- Sets the module in training mode. - This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
- mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
- self 
- Return type
- Module 
 
 