fusions.mult package
Submodules
fusions.mult.mult module
Module contents
Implements the MultimodalTransformer Model. See https://github.com/yaohungt/Multimodal-Transformer for more.
- fusions.mult.LayerNorm(embedding_dim)
Generate LayerNorm Layer with given parameters.
- Parameters:
embedding_dim (int) – Embedding dimension
- Returns:
Initialized LayerNorm Module
- Return type:
nn.Module
- fusions.mult.Linear(in_features, out_features, bias=True)
Generate Linear Layer with given parameters and Xavier initialization.
- Parameters:
in_features (int) – Number of input features
out_features (int) – Number of output features
bias (bool, optional) – Whether to include a bias term or not. Defaults to True.
- Returns:
Initialized Linear Module.
- Return type:
nn.Module
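Example (a minimal sketch of the two helper constructors; the dimensions below are placeholder values, not defaults from this module):

    import torch
    from fusions.mult import LayerNorm, Linear

    proj = Linear(in_features=20, out_features=9)  # Xavier-initialized nn.Linear
    norm = LayerNorm(embedding_dim=9)              # nn.LayerNorm over the last dimension

    x = torch.randn(8, 50, 20)  # (batch_size, seq_len, in_features)
    h = norm(proj(x))           # -> (8, 50, 9)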
- class fusions.mult.MULTModel(n_modalities, n_features, hyp_params=<class 'fusions.mult.MULTModel.DefaultHyperParams'>)
Bases: Module
Implements the MultimodalTransformer Model.
See https://github.com/yaohungt/Multimodal-Transformer for more.
- class DefaultHyperParams
Bases: object
Set default hyperparameters for the model.
- all_steps = False
- attn_dropout = 0.1
- attn_dropout_modalities = [0.0] * 1000
- attn_mask = True
- embed_dim = 9
- embed_dropout = 0.25
- layers = 3
- num_heads = 3
- out_dropout = 0.0
- output_dim = 1
- relu_dropout = 0.1
- res_dropout = 0.1
- __init__(n_modalities, n_features, hyp_params=<class 'fusions.mult.MULTModel.DefaultHyperParams'>)
Construct a MulT model.
- forward(x)
Apply MULTModel Module to Layer Input.
- Parameters:
x – layer input. Has size n_modalities * [batch_size, seq_len, n_features]
- get_network(mod1, mod2, mem, layers=-1)
Create TransformerEncoder network from layer information.
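Example usage (a minimal sketch, assuming two modalities with placeholder feature dimensions 20 and 30; one common pattern is to override hyperparameters by subclassing DefaultHyperParams):

    import torch
    from fusions.mult import MULTModel

    class HyperParams(MULTModel.DefaultHyperParams):
        # Placeholder overrides; embed_dim must be divisible by num_heads.
        num_heads = 3
        embed_dim = 9
        output_dim = 1

    model = MULTModel(n_modalities=2, n_features=[20, 30], hyp_params=HyperParams)

    # Input: n_modalities * [batch_size, seq_len, n_features]
    x = [torch.randn(8, 50, 20), torch.randn(8, 50, 30)]
    out = model(x)  # expected shape (batch_size, output_dim) since all_steps=False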
- class fusions.mult.SinusoidalPositionalEmbedding(embedding_dim, padding_idx=0, left_pad=0)
Bases: Module
This module produces sinusoidal positional embeddings of any length.
Padding symbols are ignored, but it is necessary to specify whether padding is added on the left side (left_pad=True) or right side (left_pad=False).
- __init__(embedding_dim, padding_idx=0, left_pad=0)
Instantiate SinusoidalPositionalEmbedding Module.
- Parameters:
embedding_dim (int) – Embedding dimension
padding_idx (int, optional) – Padding index. Defaults to 0.
left_pad (int, optional) – Whether to pad from the left or not. Defaults to 0.
- forward(input)
Apply PositionalEncodings to Input.
Input is expected to be of size [bsz x seqlen].
- Parameters:
input (torch.Tensor) – Layer input
- Returns:
Layer output
- Return type:
torch.Tensor
- static get_embedding(num_embeddings, embedding_dim, padding_idx=None)
Build sinusoidal embeddings.
This matches the implementation in tensor2tensor, but differs slightly from the description in Section 3.5 of “Attention Is All You Need”.
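For illustration (a sketch; per the tensor2tensor convention, the sin and cos components occupy the first and second halves of each embedding rather than alternating):

    from fusions.mult import SinusoidalPositionalEmbedding

    emb = SinusoidalPositionalEmbedding.get_embedding(
        num_embeddings=10, embedding_dim=8)
    print(emb.shape)  # torch.Size([10, 8]); row p encodes position p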
- class fusions.mult.TransformerEncoder(embed_dim, num_heads, layers, attn_dropout=0.0, relu_dropout=0.0, res_dropout=0.0, embed_dropout=0.0, attn_mask=False)
Bases: Module
Transformer encoder consisting of args.encoder_layers layers. Each layer is a TransformerEncoderLayer.
- Parameters:
embed_dim (int) – embedding dimension
num_heads (int) – number of heads
layers (int) – number of layers
attn_dropout (float) – dropout applied on the attention weights
relu_dropout (float) – dropout applied on the first layer of the residual block
res_dropout (float) – dropout applied on the residual block
embed_dropout (float) – dropout applied on the input embeddings
attn_mask (bool) – whether to apply mask on the attention weights
- __init__(embed_dim, num_heads, layers, attn_dropout=0.0, relu_dropout=0.0, res_dropout=0.0, embed_dropout=0.0, attn_mask=False)
Initialize Transformer Encoder.
- Parameters:
embed_dim (int) – Embedding dimension
num_heads (int) – Number of heads
layers (int) – Number of layers
attn_dropout (float, optional) – Probability of dropout in attention mechanism. Defaults to 0.0.
relu_dropout (float, optional) – Probability of dropout after ReLU. Defaults to 0.0.
res_dropout (float, optional) – Probability of dropout in residual layer. Defaults to 0.0.
embed_dropout (float, optional) – Probability of dropout in embedding layer. Defaults to 0.0.
attn_mask (bool, optional) – Whether to apply a mask to the attention or not. Defaults to False.
- forward(x_in, x_in_k=None, x_in_v=None)
Apply Transformer Encoder to layer input.
- Parameters:
x_in (FloatTensor) – embedded input of shape (src_len, batch, embed_dim)
x_in_k (FloatTensor, optional) – embedded key input of shape (src_len, batch, embed_dim). Defaults to None.
x_in_v (FloatTensor, optional) – embedded value input of shape (src_len, batch, embed_dim). Defaults to None.
- Returns:
encoder_out (Tensor): the last encoder layer’s output of shape (src_len, batch, embed_dim)
encoder_padding_mask (ByteTensor): the positions of padding elements of shape (batch, src_len)
- Return type:
dict
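Example (a sketch with placeholder sizes): passing only x_in gives intramodal self-attention; supplying x_in_k and x_in_v from a second modality gives the crossmodal attention used by MULTModel:

    import torch
    from fusions.mult import TransformerEncoder

    enc = TransformerEncoder(embed_dim=9, num_heads=3, layers=2)

    a = torch.randn(50, 8, 9)  # (src_len, batch, embed_dim), modality A
    b = torch.randn(40, 8, 9)  # modality B, used as the key/value source

    self_out = enc(a)                       # queries, keys, values all from A
    cross_out = enc(a, x_in_k=b, x_in_v=b)  # queries from A; keys/values from B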
- class fusions.mult.TransformerEncoderLayer(embed_dim, num_heads=4, attn_dropout=0.1, relu_dropout=0.1, res_dropout=0.1, attn_mask=False)
Bases: Module
Implements encoder layer block.
In the original paper each operation (multi-head attention or FFN) is postprocessed with: dropout -> add residual -> layernorm. In the tensor2tensor code they suggest that learning is more robust when preprocessing each layer with layernorm and postprocessing with: dropout -> add residual. We default to the approach in the paper, but the tensor2tensor approach can be enabled by setting args.encoder_normalize_before to True.
- __init__(embed_dim, num_heads=4, attn_dropout=0.1, relu_dropout=0.1, res_dropout=0.1, attn_mask=False)
Instantiate TransformerEncoderLayer Module.
- Parameters:
embed_dim (int) – Embedding dimension
num_heads (int, optional) – Number of heads. Defaults to 4.
attn_dropout (float, optional) – Dropout for attention mechanism. Defaults to 0.1.
relu_dropout (float, optional) – Dropout after ReLU. Defaults to 0.1.
res_dropout (float, optional) – Dropout after residual layer. Defaults to 0.1.
attn_mask (bool, optional) – Whether to apply an attention mask or not. Defaults to False.
- forward(x, x_k=None, x_v=None)
Apply TransformerEncoderLayer to Layer Input.
- Parameters:
x (Tensor) – input to the layer of shape (seq_len, batch, embed_dim)
encoder_padding_mask (ByteTensor) – binary ByteTensor of shape (batch, src_len) where padding elements are indicated by 1.
x_k (Tensor, optional) – key input to the layer, same shape as x. Defaults to None.
x_v (Tensor, optional) – value input to the layer, same shape as x. Defaults to None.
- Returns:
encoded output of shape (batch, src_len, embed_dim)
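A single-layer sketch of the same call pattern (placeholder sizes):

    import torch
    from fusions.mult import TransformerEncoderLayer

    layer = TransformerEncoderLayer(embed_dim=9, num_heads=3)

    x = torch.randn(50, 8, 9)  # (seq_len, batch, embed_dim)
    y = torch.randn(40, 8, 9)  # sequence from another modality

    out_self = layer(x)                 # self-attention block
    out_cross = layer(x, x_k=y, x_v=y)  # keys/values taken from y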
- fusions.mult.buffered_future_mask(tensor, tensor2=None)
Generate buffered future mask.
- Parameters:
tensor (torch.Tensor) – Tensor to initialize mask from.
tensor2 (torch.Tensor, optional) – Tensor to initialize target mask from. Defaults to None.
- Returns:
Buffered future mask.
- Return type:
torch.Tensor
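For intuition, a sketch of the causal mask produced (positions above the diagonal are filled with -inf so position i cannot attend to later positions):

    import torch
    from fusions.mult import buffered_future_mask

    t = torch.zeros(4, 2, 9)  # (seq_len, batch, embed_dim)
    print(buffered_future_mask(t))
    # tensor([[0., -inf, -inf, -inf],
    #         [0., 0., -inf, -inf],
    #         [0., 0., 0., -inf],
    #         [0., 0., 0., 0.]])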
- fusions.mult.fill_with_neg_inf(t)
FP16-compatible function that fills a tensor with -inf.
- fusions.mult.make_positions(tensor, padding_idx, left_pad)
Replace non-padding symbols with their position numbers.
Position numbers begin at padding_idx+1. Padding symbols are ignored, but it is necessary to specify whether padding is added on the left side (left_pad=True) or right side (left_pad=False).
- Parameters:
tensor (torch.Tensor) – Tensor to generate position numbers for.
padding_idx (int) – Position numbers start at padding_idx + 1
left_pad (bool) – Whether to pad from the left or from the right.
- Returns:
Tensor in which non-padding symbols have been replaced by their position numbers
- Return type:
torch.Tensor
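A worked example of the numbering scheme (illustrative; assumes padding_idx=0 with pad symbols encoded as 0):

    import torch
    from fusions.mult import make_positions

    t = torch.tensor([[0, 0, 7, 7],
                      [7, 7, 7, 7]])
    pos = make_positions(t, padding_idx=0, left_pad=True)
    # Non-padding symbols are numbered from padding_idx + 1; with
    # left_pad=True the numbering is right-aligned:
    # tensor([[0, 0, 1, 2],
    #         [1, 2, 3, 4]])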