Quick Start
This guide gets you from a fresh install to a trained multimodal model in a few minutes. Make sure you have followed the Installation steps first.
A first experiment: early fusion on CMU-MOSI
This example trains a simple early-fusion model on the CMU-MOSI sentiment dataset (three modalities: text, audio, vision).
First, download the MOSI data:
pip install gdown
gdown 'https://drive.google.com/u/0/uc?id=1szKIqO0t3Be_W91xvf6aYmsVVUa7wDHU'
mkdir -p data/affect && mv mosi_raw.pkl data/affect/
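Before moving on, it can help to confirm that the file landed where the training code expects it. The following is a minimal sketch: `check_dataset` is a hypothetical helper, not part of the library, and it only checks that the file exists and is non-empty.

```python
from pathlib import Path


def check_dataset(path="data/affect/mosi_raw.pkl"):
    """Return the file size in bytes, or raise if the file is missing or empty.

    Hypothetical helper for illustration; not part of the library.
    """
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"{file} not found; rerun the download step")
    size = file.stat().st_size
    if size == 0:
        raise ValueError(f"{file} is empty; the download may have failed")
    return size
```

Calling `check_dataset()` from the repository root returns the size in bytes when the download succeeded.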
Then train and test the model:
import torch
from datasets.affect.get_data import get_dataloader
from unimodals.common_models import GRU, MLP, Sequential, Identity
from fusions.common_fusions import ConcatEarly
from training_structures.Supervised_Learning import train, test
from utils.device import get_device
device = get_device() # automatically selects CUDA, MPS (Apple Silicon), or CPU
# Load data (3 modalities: text, audio, vision)
traindata, validdata, testdata = get_dataloader(
    'data/affect/mosi_raw.pkl',
    data_type='mosi', max_pad=True, max_seq_len=50
)
# Define model components
# Early fusion: each modality passes through unchanged and is concatenated
# before the sequence model sees it
encoders = [Identity().to(device) for _ in range(3)]
fusion = ConcatEarly().to(device)
head = Sequential(
    # 409 is the input width the GRU expects from the concatenated modalities
    GRU(409, 512, dropout=True, has_padding=False, batch_first=True, last_only=True),
    MLP(512, 512, 1)
).to(device)
# Train
train(encoders, fusion, head, traindata, validdata,
      total_epochs=10, task="regression",
      optimtype=torch.optim.AdamW, lr=1e-3,
      save='results/models/mosi_ef_r0.pt', objective=torch.nn.L1Loss())
# Test: the regressor is scored as positive/negative sentiment classification
model = torch.load('results/models/mosi_ef_r0.pt', weights_only=False).to(device)
test(model, testdata, dataset='affect', is_packed=False,
     criterion=torch.nn.L1Loss(), task="posneg-classification", no_robust=True)
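The model above is trained as a regressor but evaluated with `task="posneg-classification"`. One common way to score a sentiment regressor like this is to compare the sign of each prediction with the sign of its label. The sketch below illustrates that idea in plain Python. `posneg_accuracy` is a hypothetical function, and skipping neutral (zero) labels is an assumption, so the library's own metric may differ in detail.

```python
def posneg_accuracy(predictions, labels):
    """Fraction of non-neutral examples whose predicted sign matches the label sign.

    Hypothetical illustration; examples with a label of exactly 0 are skipped
    (an assumption, not confirmed library behavior).
    """
    pairs = [(p, y) for p, y in zip(predictions, labels) if y != 0]
    if not pairs:
        raise ValueError("no non-neutral labels to score")
    correct = sum((p > 0) == (y > 0) for p, y in pairs)
    return correct / len(pairs)


# Regression outputs and ground-truth sentiment scores (made-up values).
preds = [1.2, -0.4, 0.3, -2.0, 0.5]
truth = [2.0, -1.0, -0.5, -1.5, 0.0]
print(posneg_accuracy(preds, truth))  # prints 0.75
```

The last example is dropped because its label is neutral, and three of the remaining four predictions have the correct sign.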
Note
Trained checkpoints are saved to results/models/ and robustness plots to
results/images/ by default, so your experiment artifacts stay in one
place instead of scattering across the repository root.
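Whether the training and plotting code creates these directories on demand is not stated here, so creating them up front is a cheap safeguard. A minimal sketch using only the standard library follows; `ensure_output_dirs` is a hypothetical helper name.

```python
from pathlib import Path


def ensure_output_dirs(root="."):
    """Create results/models and results/images under root if they are missing."""
    created = []
    for sub in ("results/models", "results/images"):
        target = Path(root) / sub
        target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(target)
    return created
```

Calling `ensure_output_dirs()` from the repository root before `train(...)` is safe to repeat, because existing directories are left untouched.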
Quickest experiments to get started
The scripts below are the fastest ways to confirm that your install works and to watch the full data → train → evaluate pipeline run end to end. Each runs on CPU with real data and the default 2-epoch example settings. The MOSI example above is not one of them: it trains for 10 epochs.
| Experiment | Script | Data | CPU runtime | Model params |
|---|---|---|---|---|
| Stock prediction | | Auto-downloads via | ~20 s | 7.4 K |
| AV-MNIST (late fusion) | | 2,000 real training examples | ~26 s | 260.9 K |
| Gentle Push (unimodal) | | 10 real train / val / test trajectories | ~36 s | 3.9 M |
Smallest / fastest overall: Stock prediction needs no manual download
(data is fetched on first run via yfinance) and finishes in seconds,
making it the best choice for a first smoke test or for quickly iterating on
model architecture. AV-MNIST is the simplest multimodal starting point — its
example already subsets to 2,000 training samples and 2 epochs.
The Gentle Push script without --quick trains on the full
gentle_push_1000.hdf5 training file and is CPU-compatible, but it is not a
quick smoke test on typical CPU-only machines.
Smoke-test results
The numbers below were captured from quick real-data CPU runs of the pipeline on one local machine. They illustrate relative speed and model size, not benchmark accuracy.
| Metric | Stock | AV-MNIST | Gentle Push |
|---|---|---|---|
| Total runtime | 19.8 s | 25.8 s | 35.8 s |
| Training time | 8.4 s | 12.1 s | 27.5 s |
| Inference time | 0.27 s | 6.15 s | 2.84 s |
| Model parameters | 7,393 | 260,922 | 3,879,898 |
| Smoke-test metric | MSE 1.2406 | Accuracy 0.5499 | MSE 0.3309 |
Random initialization, data-fetch latency, and the CPU you run on can all shift these numbers. For real benchmark numbers, download the datasets via the Downloading Datasets guide and train for the full epoch counts.
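Timings like those in the table can be collected with a small wall-clock harness. The sketch below is one possible approach, not the script that produced the numbers above; `timed` and the placeholder workloads are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("total", timings):
    with timed("training", timings):
        sum(i * i for i in range(100_000))  # placeholder for a train(...) call
    with timed("inference", timings):
        sum(range(10_000))  # placeholder for a test(...) call

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

Nesting the context managers keeps the total and the per-phase timings in a single dictionary, which makes it easy to see how much of a run is setup and data loading rather than training.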
Running other experiments
Each dataset has dedicated example scripts under examples/:
# Affective computing
python examples/affect/affect_late_fusion.py
# Healthcare (requires MIMIC access)
python examples/healthcare/mimic_low_rank_tensor.py
# Robotics
python examples/robotics/LRTF.py
python examples/gentle_push/LF.py
# Finance (specify input and target stocks)
python examples/finance/stocks_late_fusion.py --input-stocks 'AAPL MSFT AMZN INTC AMD MSI' --target-stock 'MSFT'
# HCI
python examples/hci/enrico_simple_late_fusion.py
# Multimedia
python examples/multimedia/avmnist_simple_late_fusion.py
python examples/multimedia/mmimdb_simple_late_fusion.py
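To run several of these scripts in one go, for example as a smoke test after installation, a small driver can invoke each one and record its exit code. This is a sketch that assumes it is launched from the repository root; `run_examples` is a hypothetical helper, not something the repository provides.

```python
import subprocess
import sys


def run_examples(commands):
    """Run each command with the current Python interpreter, in order.

    Each entry is a list of arguments (script path plus any flags).
    Returns a dict mapping the joined command string to its exit code.
    """
    results = {}
    for args in commands:
        completed = subprocess.run([sys.executable, *args])
        results[" ".join(args)] = completed.returncode
    return results
```

For instance, `run_examples([["examples/multimedia/avmnist_simple_late_fusion.py"], ["examples/hci/enrico_simple_late_fusion.py"]])` runs two of the scripts listed above and returns their exit codes, where 0 indicates success. Each script still needs its dataset to be available.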