fusions.mult package

Subpackages

Submodules

fusions.mult.mult module

Module contents

Implements the MultimodalTransformer Model. See https://github.com/yaohungt/Multimodal-Transformer for more.

fusions.mult.LayerNorm(embedding_dim)

Generate LayerNorm Layer with given parameters.

Parameters:

embedding_dim (int) – Embedding dimension

Returns:

Initialized LayerNorm Module

Return type:

nn.Module

fusions.mult.Linear(in_features, out_features, bias=True)

Generate Linear Layer with given parameters and Xavier initialization.

Parameters:
  • in_features (int) – Number of input features

  • out_features (int) – Number of output features

  • bias (bool, optional) – Whether to include a bias term or not. Defaults to True.

Returns:

Initialized Linear Module.

Return type:

nn.Module
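
A minimal usage sketch of the two helpers above; the shapes are arbitrary illustrative choices, not values taken from this package.

    import torch

    from fusions.mult import LayerNorm, Linear

    fc = Linear(in_features=300, out_features=9)   # nn.Linear with Xavier-initialized weights
    norm = LayerNorm(embedding_dim=9)              # plain nn.LayerNorm over the last dimension

    y = norm(fc(torch.randn(8, 300)))              # -> shape [8, 9]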

class fusions.mult.MULTModel(n_modalities, n_features, hyp_params=<class 'fusions.mult.MULTModel.DefaultHyperParams'>)

Bases: Module

Implements the MultimodalTransformer Model.

See https://github.com/yaohungt/Multimodal-Transformer for more.

class DefaultHyperParams

Bases: object

Set default hyperparameters for the model.

all_steps = False
attn_dropout = 0.1
attn_dropout_modalities = [0.0, 0.0, 0.0, ...]  (a long list of zeros, abbreviated here)
attn_mask = True
embed_dim = 9
embed_dropout = 0.25
layers = 3
num_heads = 3
out_dropout = 0.0
output_dim = 1
relu_dropout = 0.1
res_dropout = 0.1
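
Since the constructor's default for hyp_params is the DefaultHyperParams class itself, one plausible way to override individual values is to subclass it. The modality count and per-modality feature sizes below are illustrative assumptions, not values from this package.

    from fusions.mult import MULTModel

    class MyHyperParams(MULTModel.DefaultHyperParams):
        # Override only what differs from the defaults listed above.
        layers = 4
        embed_dim = 12      # still divisible by num_heads (3)
        output_dim = 23

    # Assumption: n_features lists one input feature size per modality.
    model = MULTModel(n_modalities=3, n_features=[35, 74, 300], hyp_params=MyHyperParams)
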
__init__(n_modalities, n_features, hyp_params=<class 'fusions.mult.MULTModel.DefaultHyperParams'>)

Construct a MulT model.

forward(x)

Apply the MULTModel module to the layer input.

Parameters:

x – layer input: a list of n_modalities tensors, each of size [batch_size, seq_len, n_features]

get_network(mod1, mod2, mem, layers=-1)

Create TransformerEncoder network from layer information.
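
A hedged sketch of a forward pass with the default hyperparameters; the batch size, sequence length, and per-modality feature sizes are illustrative assumptions.

    import torch

    from fusions.mult import MULTModel

    n_modalities = 3
    n_features = [35, 74, 300]        # assumed: one input feature size per modality
    model = MULTModel(n_modalities, n_features)

    # One tensor per modality, each of size [batch_size, seq_len, n_features[i]].
    x = [torch.randn(8, 20, n_features[i]) for i in range(n_modalities)]
    out = model(x)                    # final dimension governed by hyp_params.output_dim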

class fusions.mult.SinusoidalPositionalEmbedding(embedding_dim, padding_idx=0, left_pad=0)

Bases: Module

This module produces sinusoidal positional embeddings of any length.

Padding symbols are ignored, but it is necessary to specify whether padding is added on the left side (left_pad=True) or right side (left_pad=False).

__init__(embedding_dim, padding_idx=0, left_pad=0)

Instantiate SinusoidalPositionalEmbedding Module.

Parameters:
  • embedding_dim (int) – Embedding dimension

  • padding_idx (int, optional) – Padding index. Defaults to 0.

  • left_pad (int, optional) – Whether padding is added on the left (1) or on the right (0). Defaults to 0.

forward(input)

Apply PositionalEncodings to Input.

Input is expected to be of size [bsz x seqlen].

Parameters:

input (torch.Tensor) – Layer input

Returns:

Layer output

Return type:

torch.Tensor
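
An illustrative call, assuming the input is a [bsz x seqlen] batch of token indices with padding_idx marking padding; the output shape noted in the comment is an assumption.

    import torch

    from fusions.mult import SinusoidalPositionalEmbedding

    pos_emb = SinusoidalPositionalEmbedding(embedding_dim=9, padding_idx=0)
    tokens = torch.randint(1, 100, (8, 20))   # bsz=8, seqlen=20, 0 reserved for padding
    encodings = pos_emb(tokens)               # assumed shape: [8, 20, 9]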

static get_embedding(num_embeddings, embedding_dim, padding_idx=None)

Build sinusoidal embeddings.

This matches the implementation in tensor2tensor, but differs slightly from the description in Section 3.5 of “Attention Is All You Need”.
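
A sketch of that construction in the tensor2tensor/fairseq style the docstring refers to (not necessarily identical to this module's exact implementation): half of the channels are sines and half cosines, with geometrically spaced frequencies.

    import math
    import torch

    def sinusoidal_embedding_sketch(num_embeddings, embedding_dim, padding_idx=None):
        half_dim = embedding_dim // 2
        freq = torch.exp(torch.arange(half_dim, dtype=torch.float)
                         * -(math.log(10000.0) / (half_dim - 1)))
        args = torch.arange(num_embeddings, dtype=torch.float).unsqueeze(1) * freq.unsqueeze(0)
        emb = torch.cat([torch.sin(args), torch.cos(args)], dim=1)
        if embedding_dim % 2 == 1:
            # Zero-pad the last channel when the dimension is odd.
            emb = torch.cat([emb, torch.zeros(num_embeddings, 1)], dim=1)
        if padding_idx is not None:
            emb[padding_idx, :] = 0  # the padding position carries no positional signal
        return emb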

class fusions.mult.TransformerEncoder(embed_dim, num_heads, layers, attn_dropout=0.0, relu_dropout=0.0, res_dropout=0.0, embed_dropout=0.0, attn_mask=False)

Bases: Module

Transformer encoder consisting of the number of layers given by the layers argument.

Each layer is a TransformerEncoderLayer.

Parameters:
  • embed_dim (int) – embedding dimension

  • num_heads (int) – number of heads

  • layers (int) – number of layers

  • attn_dropout (float) – dropout applied on the attention weights

  • relu_dropout (float) – dropout applied on the first layer of the residual block

  • res_dropout (float) – dropout applied on the residual block

  • attn_mask (bool) – whether to apply mask on the attention weights

__init__(embed_dim, num_heads, layers, attn_dropout=0.0, relu_dropout=0.0, res_dropout=0.0, embed_dropout=0.0, attn_mask=False)

Initialize Transformer Encoder.

Parameters:
  • embed_dim (int) – Embedding dimension

  • num_heads (int) – Number of heads

  • layers (int) – Number of layers

  • attn_dropout (float, optional) – Probability of dropout in attention mechanism. Defaults to 0.0.

  • relu_dropout (float, optional) – Probability of dropout after ReLU. Defaults to 0.0.

  • res_dropout (float, optional) – Probability of dropout in residual layer. Defaults to 0.0.

  • embed_dropout (float, optional) – Probability of dropout in embedding layer. Defaults to 0.0.

  • attn_mask (bool, optional) – Whether to apply a mask to the attention or not. Defaults to False.

forward(x_in, x_in_k=None, x_in_v=None)

Apply Transformer Encoder to layer input.

Parameters:
  • x_in (FloatTensor) – embedded input of shape (src_len, batch, embed_dim)

  • x_in_k (FloatTensor) – embedded input of shape (src_len, batch, embed_dim)

  • x_in_v (FloatTensor) – embedded input of shape (src_len, batch, embed_dim)

Returns:

  • encoder_out (Tensor): the last encoder layer’s output of shape (src_len, batch, embed_dim)

  • encoder_padding_mask (ByteTensor): the positions of padding elements of shape (batch, src_len)

Return type:

dict
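
An illustrative cross-modal use of the encoder, assuming inputs shaped (src_len, batch, embed_dim) as documented above; the concrete sizes are arbitrary.

    import torch

    from fusions.mult import TransformerEncoder

    embed_dim, num_heads, layers = 9, 3, 3
    encoder = TransformerEncoder(embed_dim, num_heads, layers,
                                 attn_dropout=0.1, relu_dropout=0.1,
                                 res_dropout=0.1, embed_dropout=0.25, attn_mask=True)

    x_a = torch.randn(50, 8, embed_dim)          # queries from modality A
    x_b = torch.randn(60, 8, embed_dim)          # keys/values from modality B
    out = encoder(x_a, x_in_k=x_b, x_in_v=x_b)   # A attends to B (cross-modal attention)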

class fusions.mult.TransformerEncoderLayer(embed_dim, num_heads=4, attn_dropout=0.1, relu_dropout=0.1, res_dropout=0.1, attn_mask=False)

Bases: Module

Implements encoder layer block.

In the original paper, each operation (multi-head attention or FFN) is postprocessed with: dropout -> add residual -> layernorm. In the tensor2tensor code, it is suggested that learning is more robust when each layer is preprocessed with layernorm and postprocessed with: dropout -> add residual. We default to the approach in the paper, but the tensor2tensor approach can be enabled by setting args.encoder_normalize_before to True.
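
For concreteness, the two orderings can be sketched as follows; this is generic illustrative code for a single sublayer, not this module's exact implementation.

    import torch
    import torch.nn as nn

    def post_norm_block(x, sublayer, layer_norm, dropout):
        # Paper-style postprocessing: sublayer -> dropout -> add residual -> layernorm.
        return layer_norm(x + dropout(sublayer(x)))

    def pre_norm_block(x, sublayer, layer_norm, dropout):
        # tensor2tensor-style: layernorm first, then dropout -> add residual.
        return x + dropout(sublayer(layer_norm(x)))

    # Tiny demo with a linear sublayer standing in for attention/FFN.
    dim = 9
    x = torch.randn(5, 2, dim)
    sub, norm, drop = nn.Linear(dim, dim), nn.LayerNorm(dim), nn.Dropout(0.1)
    y_post = post_norm_block(x, sub, norm, drop)
    y_pre = pre_norm_block(x, sub, norm, drop)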

__init__(embed_dim, num_heads=4, attn_dropout=0.1, relu_dropout=0.1, res_dropout=0.1, attn_mask=False)

Instantiate TransformerEncoderLayer Module.

Parameters:
  • embed_dim (int) – Embedding dimension

  • num_heads (int, optional) – Number of heads. Defaults to 4.

  • attn_dropout (float, optional) – Dropout for attention mechanism. Defaults to 0.1.

  • relu_dropout (float, optional) – Dropout after ReLU. Defaults to 0.1.

  • res_dropout (float, optional) – Dropout after residual layer. Defaults to 0.1.

  • attn_mask (bool, optional) – Whether to apply an attention mask or not. Defaults to False.

forward(x, x_k=None, x_v=None)

Apply TransformerEncoderLayer to Layer Input.

Parameters:
  • x (Tensor) – input to the layer of shape (seq_len, batch, embed_dim)

  • encoder_padding_mask (ByteTensor) – binary ByteTensor of shape (batch, src_len) where padding elements are indicated by 1.

  • x_k (Tensor, optional) – key input of the same shape as x. Defaults to None.

  • x_v (Tensor, optional) – value input of the same shape as x. Defaults to None.

Returns:

encoded output of shape (batch, src_len, embed_dim)

fusions.mult.buffered_future_mask(tensor, tensor2=None)

Generate buffered future mask.

Parameters:
  • tensor (torch.Tensor) – Tensor to initialize mask from.

  • tensor2 (torch.Tensor, optional) – Tensor to initialize target mask from. Defaults to None.

Returns:

Buffered future mask.

Return type:

torch.Tensor
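
A minimal sketch of the idea for the square (self-attention) case; per the signature above, the tensor2 argument covers the rectangular case as well.

    import torch

    def future_mask_sketch(seq_len):
        # Entries strictly above the diagonal are -inf, so position i cannot
        # attend to positions j > i when this mask is added to attention scores.
        mask = torch.full((seq_len, seq_len), float('-inf'))
        return torch.triu(mask, diagonal=1)

    print(future_mask_sketch(3))
    # tensor([[0., -inf, -inf],
    #         [0.,   0., -inf],
    #         [0.,   0.,   0.]])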

fusions.mult.fill_with_neg_inf(t)

FP16-compatible function that fills a tensor with -inf.
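
A plausible FP16-safe pattern for this (an assumption, not necessarily the exact implementation): do the fill in float32 and cast back to the original dtype.

    import torch

    def fill_with_neg_inf_sketch(t):
        # Assumed pattern: fill in float32, then cast back to t's dtype.
        return t.float().fill_(float('-inf')).type_as(t)

    mask = fill_with_neg_inf_sketch(torch.ones(2, 3, dtype=torch.half))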

fusions.mult.make_positions(tensor, padding_idx, left_pad)

Replace non-padding symbols with their position numbers.

Position numbers begin at padding_idx+1. Padding symbols are ignored, but it is necessary to specify whether padding is added on the left side (left_pad=True) or right side (left_pad=False).

Parameters:
  • tensor (torch.Tensor) – Tensor to generate padding on.

  • padding_idx (int) – Position numbers start at padding_idx + 1

  • left_pad (bool) – Whether to pad from the left or from the right.

Returns:

Padded output

Return type:

torch.Tensor
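
An illustrative call with right padding; the expected result in the comment is inferred from the description above (positions start at padding_idx + 1, padding entries are left at padding_idx), not a captured output.

    import torch

    from fusions.mult import make_positions

    tokens = torch.tensor([[7, 5, 0],
                           [3, 0, 0]])            # 0 is the padding symbol
    positions = make_positions(tokens, padding_idx=0, left_pad=False)
    # Expected, per the description: [[1, 2, 0],
    #                                 [1, 0, 0]]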