Quick Start
***********

This guide gets you from a fresh install to a trained multimodal model in a few minutes. Make sure you have followed the :doc:`installation` steps first.

A first experiment: early fusion on CMU-MOSI
============================================

This example trains a simple early-fusion model on the CMU-MOSI sentiment dataset (three modalities: text, audio, vision).

First, download the MOSI data:

.. code-block:: bash

    pip install gdown
    # Quote the URL so the shell does not treat "?" as a glob character
    gdown "https://drive.google.com/u/0/uc?id=1szKIqO0t3Be_W91xvf6aYmsVVUa7wDHU"
    mkdir -p data/affect && mv mosi_raw.pkl data/affect/

Then train and test the model:

.. code-block:: python

    import torch
    from datasets.affect.get_data import get_dataloader
    from unimodals.common_models import GRU, MLP, Sequential, Identity
    from fusions.common_fusions import ConcatEarly
    from training_structures.Supervised_Learning import train, test
    from utils.device import get_device

    device = get_device()  # automatically selects CUDA, MPS (Apple Silicon), or CPU

    # Load data (3 modalities: text, audio, vision)
    traindata, validdata, testdata = get_dataloader(
        'data/affect/mosi_raw.pkl', data_type='mosi',
        max_pad=True, max_seq_len=50
    )

    # Define model components
    encoders = [Identity().to(device) for _ in range(3)]
    fusion = ConcatEarly().to(device)
    head = Sequential(
        GRU(409, 512, dropout=True, has_padding=False,
            batch_first=True, last_only=True),
        MLP(512, 512, 1)
    ).to(device)

    # Train
    train(encoders, fusion, head, traindata, validdata,
          total_epochs=10, task="regression",
          optimtype=torch.optim.AdamW, lr=1e-3,
          save='results/models/mosi_ef_r0.pt',
          objective=torch.nn.L1Loss())

    # Test
    model = torch.load('results/models/mosi_ef_r0.pt', weights_only=False).to(device)
    test(model, testdata, dataset='affect', is_packed=False,
         criterion=torch.nn.L1Loss(), task="posneg-classification",
         no_robust=True)

.. note::

    Trained checkpoints are saved to ``results/models/`` and robustness plots to ``results/images/`` by default, so your experiment artifacts stay in one place instead of scattering across the repository root.

Quickest experiments to get started
====================================

If you just want to confirm your install works and see the full data → train → evaluate pipeline run end to end, these are the fastest entry points. All run on **CPU** with real data and the default 2-epoch example settings, except the MOSI code block above, which uses 10 epochs.

.. list-table::
    :header-rows: 1
    :widths: 22 30 24 14 10

    * - Experiment
      - Script
      - Data
      - CPU runtime
      - Model params
    * - Stock prediction
      - ``examples/finance/stocks_late_fusion.py``
      - Auto-downloads via ``yfinance``
      - ~20 s
      - 7.4 K
    * - AV-MNIST (late fusion)
      - ``examples/multimedia/avmnist_simple_late_fusion.py``
      - 2,000 real training examples
      - ~26 s
      - 260.9 K
    * - Gentle Push (unimodal)
      - ``examples/gentle_push/unimodal_image.py --quick``
      - 10 real train / val / test trajectories
      - ~36 s
      - 3.9 M

**Smallest / fastest overall:** Stock prediction needs no manual download (data is fetched on first run via ``yfinance``) and finishes in seconds, making it the best choice for a first smoke test or for quickly iterating on model architecture. AV-MNIST is the simplest *multimodal* starting point: its example already subsets to 2,000 training samples and 2 epochs. The Gentle Push script without ``--quick`` trains on the full ``gentle_push_1000.hdf5`` training file and is CPU-compatible, but it is not a quick smoke test on typical CPU-only machines.

Smoke-test results
==================

The numbers below were captured from quick real-data CPU runs of the pipeline on one local machine. They illustrate relative speed and model size, not benchmark accuracy.

.. list-table::
    :header-rows: 1
    :widths: 24 18 18 24

    * - Metric
      - Stock
      - AV-MNIST
      - Gentle Push ``--quick``
    * - Total runtime
      - 19.8 s
      - 25.8 s
      - 35.8 s
    * - Training time
      - 8.4 s
      - 12.1 s
      - 27.5 s
    * - Inference time
      - 0.27 s
      - 6.15 s
      - 2.84 s
    * - Model parameters
      - 7,393
      - 260,922
      - 3,879,898
    * - Smoke-test metric
      - MSE 1.2406
      - Accuracy 0.5499
      - MSE 0.3309

Random initialization, data-fetch latency, and CPU model can move the results. For real benchmark numbers, download the datasets via the :doc:`datadownload` guide and train for the full epoch counts.

Running other experiments
=========================

Each dataset has dedicated example scripts under ``examples/``:

.. code-block:: bash

    # Affective computing
    python examples/affect/affect_late_fusion.py

    # Healthcare (requires MIMIC access)
    python examples/healthcare/mimic_low_rank_tensor.py

    # Robotics
    python examples/robotics/LRTF.py
    python examples/gentle_push/LF.py

    # Finance (specify input and target stocks)
    python examples/finance/stocks_late_fusion.py --input-stocks 'AAPL MSFT AMZN INTC AMD MSI' --target-stock 'MSFT'

    # HCI
    python examples/hci/enrico_simple_late_fusion.py

    # Multimedia
    python examples/multimedia/avmnist_simple_late_fusion.py
    python examples/multimedia/mmimdb_simple_late_fusion.py
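
To collect total, training, and inference timings like those in the smoke-test table on your own machine, one option is to wrap each stage in a small wall-clock timer. The following is a minimal sketch using only the Python standard library. It is not the script that produced the table above: ``stage_timer`` is a hypothetical helper, and the ``time.sleep`` calls are placeholders for your own ``train(...)`` and ``test(...)`` calls.

.. code-block:: python

    import time
    from contextlib import contextmanager

    # Collected durations in seconds, keyed by stage name
    timings = {}

    @contextmanager
    def stage_timer(name, store=timings):
        """Hypothetical helper: record the wall-clock duration of a stage."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raises an exception
            store[name] = time.perf_counter() - start

    with stage_timer("total"):
        with stage_timer("train"):
            time.sleep(0.02)  # placeholder for train(...)
        with stage_timer("inference"):
            time.sleep(0.01)  # placeholder for test(...)

    for name, seconds in timings.items():
        print(f"{name:>10}: {seconds:.2f} s")

Because the timers nest, the ``total`` entry also covers any setup placed between the inner stages, such as data loading, so it can be noticeably larger than the sum of the training and inference times.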