Downloading Datasets
********************

We support a variety of datasets in MultiBench out of the box, but each of them requires its own download steps:

Affective Computing
###################

-----

MUStARD
=======

To grab the `MUStARD/sarcasm `_ dataset, download the dataset from `here `_.

To load it, you can then use the following snippet:

.. code-block:: python

    from datasets.affect.get_data import get_dataloader

    traindata, validdata, test_robust = get_dataloader('/path/to/raw/data/file', data_type='sarcasm')

-----

CMU-MOSI
========

To grab the `CMU-MOSI `_ dataset, download the dataset from `here `_.

To load it, you can then use the following snippet:

.. code-block:: python

    from datasets.affect.get_data import get_dataloader

    traindata, validdata, test_robust = get_dataloader('/path/to/raw/data/file', data_type='mosi')

-----

UR-Funny
========

To grab the `UR-Funny/humor `_ dataset, download the dataset from `here `_.

To load it, you can then use the following snippet:

.. code-block:: python

    from datasets.affect.get_data import get_dataloader

    traindata, validdata, test_robust = get_dataloader('/path/to/raw/data/file', data_type='humor')

-----

CMU-MOSEI
=========

To grab the `CMU-MOSEI `_ dataset, download the dataset from `here `_.

To load it, you can then use the following snippet:

.. code-block:: python

    from datasets.affect.get_data import get_dataloader

    traindata, validdata, test_robust = get_dataloader('/path/to/raw/data/file', data_type='mosei')

-----

Healthcare
##########

-----

MIMIC
=====

Access to the MIMIC dataset is restricted, so you will need to follow the instructions `here `_ to gain access. Once you have access, email yiweilyu@umich.edu with proof of those credentials to obtain the preprocessed ``im.pk`` file we use in our dataloaders.

To load it, you can then use the following snippet:

.. code-block:: python

    from datasets.mimic.get_data import get_dataloader

    traindata, validdata, test_robust = get_dataloader(tasknum, inputed_path='/path/to/raw/data/file')

-----

Robotics
########

-----

MuJoCo Push
===========

To grab the MuJoCo Push dataset, you simply need to run any of the example experiments, like the following:

.. code-block:: bash

    python examples/gentle_push/LF.py

This will download the dataset directly into the ``datasets/gentle_push/cache`` folder.

Because this relies on ``gdown`` to download the files, the process sometimes fails. Should that happen, you can download the files manually:

* Download `this file `_ and place it under ``datasets/gentle_push/cache/1qmBCfsAGu8eew-CQFmV1svodl9VJa6fX-gentle_push_10.hdf5``.
* Download `this file `_ and place it under ``datasets/gentle_push/cache/18dr1z0N__yFiP_DAKxy-Hs9Vy_AsaW6Q-gentle_push_300.hdf5``.
* Download `this file `_ and place it under ``datasets/gentle_push/cache/1JTgmq1KPRK9HYi8BgvljKg5MPqT_N4cR-gentle_push_1000.hdf5``.

Then, you can follow this code block as an example to get the dataloaders:

.. code-block:: python

    import argparse

    import fannypack

    from datasets.gentle_push.data_loader import PushTask

    Task = PushTask
    modalities = ['control']

    # Parse args
    parser = argparse.ArgumentParser()
    Task.add_dataset_arguments(parser)
    args = parser.parse_args()
    dataset_args = Task.get_dataset_args(args)

    # Point fannypack at the folder that holds the downloaded files
    fannypack.data.set_cache_path('datasets/gentle_push/cache')

    train_loader, val_loader, test_loader = Task.get_dataloader(
        16, modalities, batch_size=32, drop_last=True)

-----

Vision&Touch
============

To grab the `Vision&Touch `_ dataset, please run the ``download_data.sh`` file located under ``datasets/robotics/download_data.sh``.

By default, this dataset only includes the training and validation sets, which you can access through the following call:

.. code-block:: python

    from datasets.robotics.data_loader import get_data

    trainloader, valloader = get_data(device, config, "path/to/data/folder")

By default, the task is the **Contact** task, but passing ``output='ee_yaw_next'`` to ``get_data`` will give you the **End Effector** task instead.

-----

Finance
#######

-----

All of the dataloaders, when created, will automatically download the stock data for you.

As an example, this can be done through the following code block:

.. code-block:: python

    from datasets.stocks.get_data import get_dataloader

    # Here, the list of stocks is a list of strings of stock symbols in all CAPS.
    train_loader, val_loader, test_loader = get_dataloader(stocks, stocks, [args.target_stock])

For the purposes of the MultiBench paper, we used the following lists per dataset:

.. code-block:: text

    F&B (18): CAG CMG CPB DPZ DRI GIS HRL HSY K KHC LW MCD MDLZ MKC SBUX SJM TSN YUM

    Health (63): ABT ABBV ABMD A ALXN ALGN ABC AMGN ANTM BAX BDX BIO BIIB BSX BMY CAH CTLT CNC CERN CI COO CVS DHR DVA XRAY DXCM EW GILD HCA HSIC HOLX HUM IDXX ILMN INCY ISRG IQV JNJ LH LLY MCK MDT MRK MTD PKI PRGO PFE DGX REGN RMD STE SYK TFX TMO UNH UHS VAR VRTX VTRS WAT WST ZBH ZTS

    Tech (100): AAPL ACN ADBE ADI ADP ADSK AKAM AMAT AMD ANET ANSS APH ATVI AVGO BR CDNS CDW CHTR CMCSA CRM CSCO CTSH CTXS DIS DISCA DISCK DISH DXC EA ENPH FB FFIV FIS FISV FLIR FLT FOX FOXA FTNT GLW GOOG GOOGL GPN HPE HPQ IBM INTC INTU IPG IPGP IT JKHY JNPR KEYS KLAC LRCX LUMN LYV MA MCHP MPWR MSFT MSI MU MXIM NFLX NLOK NOW NTAP NVDA NWS NWSA NXPI OMC ORCL PAYC PAYX PYPL QCOM QRVO SNPS STX SWKS T TEL TER TMUS TRMB TTWO TWTR TXN TYL V VIAC VRSN VZ WDC WU XLNX ZBRA

HCI
###

-----

ENRiCO
======

To grab the ENRiCO dataset, please use the ``download_data.sh`` shell script under the ``datasets/enrico`` directory.

Then, to load the data, you can use something like the following code block:

.. code-block:: python

    from datasets.enrico.get_data import get_dataloader

    (train_loader, val_loader, test_loader), weights = get_dataloader("datasets/enrico/dataset")

Multimedia
##########

-----

AV-MNIST
========

To grab the AV-MNIST dataset, please download the ``avmnist.tar.gz`` file from `here `_ .

Once that is done, untar it to your preferred location and do the following to get the dataloaders:

.. code-block:: python

    from datasets.avmnist.get_data import get_dataloader

    train_loader, val_loader, test_loader = get_dataloader("path/to/dataset")

MM-IMDb
=======

To grab the MM-IMDb dataset, please download the ``multimodal_imdb.hdf5`` file from `here `_ .

If you plan on testing the model's robustness, you will **also** need to download the raw file from `here `_.

Once that is done, untar it to your preferred location and do something like the following to get the dataloaders:

.. code-block:: python

    from datasets.imdb.get_data import get_dataloader

    train_loader, val_loader, test_loader = get_dataloader(
        "path/to/processed_data", "path/to/raw/data/folder", vgg=True, batch_size=128)

Kinetics400
===========

To download either of the Kinetics datasets, run the appropriate script under ``special/kinetics_*.py``. Then pass the location of the downloaded data to the associated file to finish the setup.

Clotho
======

To download the Clotho dataset, clone `the repository `_ somewhere on your device and follow the given instructions to pre-process the data.

To get the dataloaders, you will also need to add the path to the above repo to the ``get_dataloaders`` function under ``datasets/clotho/get_data.py``.
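
Several of the datasets on this page are downloaded by hand, so it can help to confirm that the files are where a loader expects them before building a dataloader. The following is a minimal sketch and not part of MultiBench: ``missing_files`` and ``GENTLE_PUSH_FILES`` are hypothetical names introduced for illustration, and the example file names are the MuJoCo Push cache files listed in the Robotics section.

.. code-block:: python

    from pathlib import Path


    def missing_files(root, names):
        """Return the entries of ``names`` that are not regular files under ``root``."""
        root = Path(root)
        return [name for name in names if not (root / name).is_file()]


    # Cache file names taken from the MuJoCo Push instructions on this page.
    GENTLE_PUSH_FILES = [
        "1qmBCfsAGu8eew-CQFmV1svodl9VJa6fX-gentle_push_10.hdf5",
        "18dr1z0N__yFiP_DAKxy-Hs9Vy_AsaW6Q-gentle_push_300.hdf5",
        "1JTgmq1KPRK9HYi8BgvljKg5MPqT_N4cR-gentle_push_1000.hdf5",
    ]

    if __name__ == "__main__":
        # Assumes the script is run from the repository root.
        missing = missing_files("datasets/gentle_push/cache", GENTLE_PUSH_FILES)
        if missing:
            print("Missing files:", ", ".join(missing))
        else:
            print("All expected files are present.")

The same helper can be pointed at any other dataset folder by passing that folder and the file names you expect to find in it.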