Quick Start

This guide gets you from a fresh install to a trained multimodal model in a few minutes. Make sure you have followed the Installation steps first.

A first experiment: early fusion on CMU-MOSI

This example trains a simple early-fusion model on the CMU-MOSI sentiment dataset (three modalities: text, audio, vision).

First, download the MOSI data:

pip install gdown
# Quote the URL so the shell does not interpret the ? as a glob character
gdown 'https://drive.google.com/u/0/uc?id=1szKIqO0t3Be_W91xvf6aYmsVVUa7wDHU'
mkdir -p data/affect && mv mosi_raw.pkl data/affect/
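
Before training, it can help to confirm that the download succeeded and that the file unpickles. The sketch below is a generic check, not part of the library: describe_pickle is a hypothetical helper, and it assumes nothing about the layout of mosi_raw.pkl beyond it being a pickle file. Only unpickle files you trust, because unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Return a short summary of a pickle file's top-level object.

    Hypothetical helper for illustration; not part of the library.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; did the download and mv succeed?")
    with path.open("rb") as f:
        obj = pickle.load(f)  # only unpickle files from a source you trust
    summary = {"type": type(obj).__name__, "size_bytes": path.stat().st_size}
    if isinstance(obj, dict):
        # List top-level keys so you can see how the file is organised
        summary["keys"] = sorted(str(k) for k in obj)
    return summary


if __name__ == "__main__":
    target = Path("data/affect/mosi_raw.pkl")
    if target.is_file():
        print(describe_pickle(target))
    else:
        print(f"{target} is missing")
```

Run it from the repository root after the mv step. A FileNotFoundError or an unpickling error at this point usually means the download was incomplete.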

Then train and test the model:

import torch
from datasets.affect.get_data import get_dataloader
from unimodals.common_models import GRU, MLP, Sequential, Identity
from fusions.common_fusions import ConcatEarly
from training_structures.Supervised_Learning import train, test
from utils.device import get_device

device = get_device()  # automatically selects CUDA, MPS (Apple Silicon), or CPU

# Load data (3 modalities: text, audio, vision)
traindata, validdata, testdata = get_dataloader(
    'data/affect/mosi_raw.pkl',
    data_type='mosi', max_pad=True, max_seq_len=50
)

# Define model components
encoders = [Identity().to(device) for _ in range(3)]
fusion = ConcatEarly().to(device)
head = Sequential(
    GRU(409, 512, dropout=True, has_padding=False, batch_first=True, last_only=True),
    MLP(512, 512, 1)
).to(device)

# Train
train(encoders, fusion, head, traindata, validdata,
      total_epochs=10, task="regression",
      optimtype=torch.optim.AdamW, lr=1e-3,
      save='results/models/mosi_ef_r0.pt', objective=torch.nn.L1Loss())

# Test
model = torch.load('results/models/mosi_ef_r0.pt', weights_only=False).to(device)
test(model, testdata, dataset='affect', is_packed=False,
     criterion=torch.nn.L1Loss(), task="posneg-classification", no_robust=True)
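
The training call uses task="regression" while the test call uses task="posneg-classification". One common way to score a sentiment regressor like this is to compare only the sign of each prediction against the sign of its label. The sketch below illustrates that idea in plain Python. posneg_accuracy is a hypothetical name, and skipping neutral (zero) labels is an assumption of this sketch, so check the library's test function for the exact rule it applies.

```python
def posneg_accuracy(labels, predictions, exclude_zero=True):
    """Fraction of examples whose predicted sign matches the label sign.

    Hypothetical helper: scores > 0 count as positive, everything else as
    negative. With exclude_zero=True, examples whose label is exactly 0
    (neutral) are left out of the count (an assumption of this sketch).
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = [
        (y, p) for y, p in zip(labels, predictions)
        if not (exclude_zero and y == 0)
    ]
    if not pairs:
        raise ValueError("no examples left to score")
    correct = sum((y > 0) == (p > 0) for y, p in pairs)
    return correct / len(pairs)


# Continuous sentiment scores (made-up values for illustration)
labels = [2.4, -1.0, 0.0, 0.6, -2.2]
predictions = [1.1, 0.3, -0.5, 0.2, -0.9]
print(posneg_accuracy(labels, predictions))  # 0.75
```

In this toy example the neutral label is dropped, and three of the remaining four predictions have the correct sign.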

Note

Trained checkpoints are saved to results/models/ and robustness plots to results/images/ by default, so your experiment artifacts stay in one place instead of scattering across the repository root.
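
If those directories do not exist yet on your machine, creating them before training avoids a failed save at the end of a run. This is a minimal sketch using the paths named in the note. Whether the library creates them for you is not stated here, and ensure_output_dirs is a hypothetical helper.

```python
from pathlib import Path


def ensure_output_dirs(root="."):
    """Create results/models and results/images under root if missing.

    Hypothetical helper; the directory names come from the note above.
    """
    created = []
    for sub in ("results/models", "results/images"):
        target = Path(root) / sub
        target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(target)
    return created
```

Calling ensure_output_dirs() once from the repository root before train(...) is enough, and repeated calls are harmless.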

Quickest experiments to get started

If you just want to confirm your install works and see the full data → train → evaluate pipeline run end to end, these are the fastest entry points. Each runs on CPU with real data and the examples' default 2-epoch settings. (The MOSI code block above, by contrast, trains for 10 epochs.)

| Experiment | Script | Data | CPU runtime | Model params |
|---|---|---|---|---|
| Stock prediction | examples/finance/stocks_late_fusion.py | Auto-downloads via yfinance | ~20 s | 7.4 K |
| AV-MNIST (late fusion) | examples/multimedia/avmnist_simple_late_fusion.py | 2,000 real training examples | ~26 s | 260.9 K |
| Gentle Push (unimodal) | examples/gentle_push/unimodal_image.py --quick | 10 real train / val / test trajectories | ~36 s | 3.9 M |

Smallest and fastest overall: stock prediction needs no manual download (data is fetched on first run via yfinance) and finishes in about 20 seconds on CPU, making it the best choice for a first smoke test or for quick iteration on model architecture. AV-MNIST is the simplest multimodal starting point: its example already subsets to 2,000 training samples and trains for 2 epochs.

The Gentle Push script without --quick trains on the full gentle_push_1000.hdf5 training file and is CPU-compatible, but it is not a quick smoke test on typical CPU-only machines.
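
To reproduce runtimes like those in the table on your own machine, you can time a script from Python. The sketch below uses only the standard library. time_command is a hypothetical helper, and wall-clock time measured this way includes interpreter start-up and any data download.

```python
import subprocess
import sys
import time


def time_command(args):
    """Run a command and return (exit_code, elapsed_seconds).

    Hypothetical helper for illustration.
    """
    start = time.perf_counter()
    completed = subprocess.run(args, check=False)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Script path taken from the table above; run from the repository root.
    code, seconds = time_command(
        [sys.executable, "examples/finance/stocks_late_fusion.py"]
    )
    print(f"exit code {code}, {seconds:.1f} s")
```

A non-zero exit code means the script failed, so its timing should be discarded.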

Smoke-test results

The numbers below were captured from quick real-data CPU runs of the pipeline on one local machine. They illustrate relative speed and model size, not benchmark accuracy.

| Metric | Stock | AV-MNIST | Gentle Push --quick |
|---|---|---|---|
| Total runtime | 19.8 s | 25.8 s | 35.8 s |
| Training time | 8.4 s | 12.1 s | 27.5 s |
| Inference time | 0.27 s | 6.15 s | 2.84 s |
| Model parameters | 7,393 | 260,922 | 3,879,898 |
| Smoke-test metric | MSE 1.2406 | Accuracy 0.5499 | MSE 0.3309 |

These numbers vary with random initialization, data-fetch latency, and CPU model. For real benchmark numbers, download the datasets via the Downloading Datasets guide and train for the full epoch counts.
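
The Model parameters row and the abbreviated counts in the earlier table are the same numbers at different precision. The sketch below shows one way to count parameters on a PyTorch-style model and abbreviate the total. count_parameters and format_count are hypothetical helpers. The first assumes the model exposes parameters() whose items have numel(), as torch.nn.Module does.

```python
def count_parameters(model):
    """Total number of elements across all parameter tensors."""
    return sum(p.numel() for p in model.parameters())


def format_count(n):
    """Abbreviate a count to one decimal place with a K or M suffix."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f} M"
    if n >= 1_000:
        return f"{n / 1_000:.1f} K"
    return str(n)


# Counts from the Model parameters row above
for n in (7_393, 260_922, 3_879_898):
    print(n, "->", format_count(n))
```

The loop prints 7.4 K, 260.9 K, and 3.9 M, matching the abbreviated column in the earlier table.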

Running other experiments

Each dataset has dedicated example scripts under examples/:

# Affective computing
python examples/affect/affect_late_fusion.py

# Healthcare (requires MIMIC access)
python examples/healthcare/mimic_low_rank_tensor.py

# Robotics
python examples/robotics/LRTF.py
python examples/gentle_push/LF.py

# Finance (specify input and target stocks)
python examples/finance/stocks_late_fusion.py --input-stocks 'AAPL MSFT AMZN INTC AMD MSI' --target-stock 'MSFT'

# HCI
python examples/hci/enrico_simple_late_fusion.py

# Multimedia
python examples/multimedia/avmnist_simple_late_fusion.py
python examples/multimedia/mmimdb_simple_late_fusion.py
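
The finance command passes several tickers as one quoted, space-separated string. As an illustration of how such flags can be handled, here is a minimal argparse sketch. It is not the script's actual parser: the flag names are taken from the command above, while the splitting logic and parse_stock_args are assumptions.

```python
import argparse


def parse_stock_args(argv):
    """Parse --input-stocks / --target-stock (hypothetical sketch)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-stocks", required=True,
                        help="space-separated tickers in one quoted string")
    parser.add_argument("--target-stock", required=True)
    args = parser.parse_args(argv)
    # argparse exposes --input-stocks as args.input_stocks
    return args.input_stocks.split(), args.target_stock


inputs, target = parse_stock_args([
    "--input-stocks", "AAPL MSFT AMZN INTC AMD MSI",
    "--target-stock", "MSFT",
])
print(inputs)  # ['AAPL', 'MSFT', 'AMZN', 'INTC', 'AMD', 'MSI']
print(target)  # MSFT
```

Quoting the ticker list on the command line matters: without the quotes, the shell would pass each ticker as a separate argument.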