datasets.clotho package
Submodules
datasets.clotho.clotho_data_loader module
Implements the creation of the dataloader for CLOTHO.
- datasets.clotho.clotho_data_loader.get_clotho_loader(data_dir: Path, split: str, input_field_name: str, output_field_name: str, load_into_memory: bool, batch_size: int, nb_t_steps_pad: AnyStr | Tuple[int, int], shuffle: bool | None = True, drop_last: bool | None = True, input_pad_at: str | None = 'start', output_pad_at: str | None = 'end', num_workers: int | None = 1) → DataLoader
Gets the clotho data loader.
- Parameters:
data_dir (pathlib.Path) – Directory with data.
split (str) – Split to use (e.g. ‘development’, ‘evaluation’).
input_field_name (str) – Field name of the clotho data to be used as input data to the method.
output_field_name (str) – Field name of the clotho data to be used as output data to the method.
load_into_memory (bool) – Load all data into memory?
batch_size (int) – Batch size to use.
nb_t_steps_pad (str|(int, int)) – Number of time steps to pad/truncate to. Can use ‘max’, ‘min’, or exact numbers of steps, e.g. (1024, 10).
shuffle (bool, optional) – Shuffle examples? Defaults to True.
drop_last (bool, optional) – Drop the last, incomplete batch when the number of examples is not a multiple of batch_size? Defaults to True.
input_pad_at (str, optional) – Pad input at the ‘start’ or at the ‘end’? Defaults to ‘start’.
output_pad_at (str, optional) – Pad output at the ‘start’ or at the ‘end’? Defaults to ‘end’.
num_workers (int, optional) – Number of workers. Defaults to 1.
- Returns:
Dataloader for Clotho data.
- Return type:
torch.utils.data.dataloader.DataLoader
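A DataLoader calls its collate function with the batch alone, so get_clotho_loader presumably binds nb_t_steps_pad, input_pad_at and output_pad_at before handing clotho_collate_fn to the loader. The stdlib-only sketch below illustrates that binding with functools.partial (whether the package uses partial or a lambda is an assumption) and shows what drop_last does to the batch count. toy_collate_fn, iterate_batches and expected_num_batches are hypothetical helpers, not part of datasets.clotho.

```python
import math
from functools import partial
from typing import Any, Callable, Iterator, List, Sequence


def toy_collate_fn(batch: List[Any], nb_t_steps, input_pad_at: str,
                   output_pad_at: str) -> dict:
    # Hypothetical stand-in for clotho_collate_fn: it only reports what it
    # received, so the argument-binding pattern can be checked.
    return {
        "size": len(batch),
        "nb_t_steps": nb_t_steps,
        "input_pad_at": input_pad_at,
        "output_pad_at": output_pad_at,
    }


def iterate_batches(examples: Sequence[Any], batch_size: int,
                    collate_fn: Callable[[List[Any]], Any],
                    drop_last: bool = True) -> Iterator[Any]:
    # Sequential batching without shuffling or worker processes.
    for start in range(0, len(examples), batch_size):
        chunk = list(examples[start:start + batch_size])
        if drop_last and len(chunk) < batch_size:
            break  # the final, incomplete batch is discarded
        yield collate_fn(chunk)


def expected_num_batches(n_examples: int, batch_size: int,
                         drop_last: bool) -> int:
    # Batches per epoch with and without drop_last.
    if drop_last:
        return n_examples // batch_size
    return math.ceil(n_examples / batch_size)


# The loader calls collate_fn(batch) with the batch only, so the padding
# options are bound in advance (functools.partial is one way to do it).
collate = partial(toy_collate_fn, nb_t_steps="max",
                  input_pad_at="start", output_pad_at="end")

examples = list(range(10))
batches = list(iterate_batches(examples, batch_size=4,
                               collate_fn=collate, drop_last=True))
print([b["size"] for b in batches])  # [4, 4]
```

With 10 examples and batch_size=4, drop_last=True yields two batches and silently discards the last two examples; with drop_last=False a third batch of size 2 would be produced.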
datasets.clotho.clotho_dataset module
Implements the CLOTHO dataset as a torch dataset.
- class datasets.clotho.clotho_dataset.ClothoDataset(data_dir: Path, split: AnyStr, input_field_name: AnyStr, output_field_name: AnyStr, load_into_memory: bool)
Bases: Dataset
Implements the CLOTHO dataset as a torch dataset.
- __init__(data_dir: Path, split: AnyStr, input_field_name: AnyStr, output_field_name: AnyStr, load_into_memory: bool) → None
Initialization of a Clotho dataset object.
- Parameters:
data_dir (pathlib.Path) – Directory with data.
split (str) – Split to use (e.g. ‘development’, ‘evaluation’).
input_field_name (str) – Field name of the clotho data to be used as input data to the method.
output_field_name (str) – Field name of the clotho data to be used as output data to the method.
load_into_memory (bool) – Load all data into memory?
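The effect of load_into_memory is easiest to see in a stripped-down map-style dataset that follows the __len__/__getitem__ protocol torch datasets use. The stdlib-only class below is a hypothetical sketch, not the package's ClothoDataset: it assumes one pickled dict per example stored under data_dir/split and keyed by field names such as ‘features’ and ‘words_ind’; the real on-disk format and loading code may differ.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, List, Tuple, Union


class ClothoLikeDataset:
    """Hypothetical map-style dataset; NOT the package's ClothoDataset.

    Assumes each example is a pickled dict saved as one file under
    data_dir / split, keyed by field names such as 'features'.
    """

    def __init__(self, data_dir: Path, split: str, input_field_name: str,
                 output_field_name: str, load_into_memory: bool) -> None:
        self.input_field_name = input_field_name
        self.output_field_name = output_field_name
        self.load_into_memory = load_into_memory
        files = sorted((data_dir / split).iterdir())
        # Eager mode reads every file now; lazy mode keeps only the paths.
        self.examples: List[Union[Path, dict]] = (
            [self._read(f) for f in files] if load_into_memory else files
        )

    @staticmethod
    def _read(path: Path) -> dict:
        with path.open("rb") as fh:
            return pickle.load(fh)

    def __len__(self) -> int:
        return len(self.examples)

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        example = self.examples[index]
        if not self.load_into_memory:
            example = self._read(example)  # lazy mode: read on each access
        return example[self.input_field_name], example[self.output_field_name]


# Usage: two fake examples in a temporary 'development' split.
with tempfile.TemporaryDirectory() as tmp:
    split_dir = Path(tmp) / "development"
    split_dir.mkdir()
    for i in range(2):
        with (split_dir / f"example_{i}.pkl").open("wb") as fh:
            pickle.dump({"features": [0.5 * i, 1.0], "words_ind": [i, 9]}, fh)

    lazy = ClothoLikeDataset(Path(tmp), "development", "features", "words_ind", False)
    eager = ClothoLikeDataset(Path(tmp), "development", "features", "words_ind", True)
    lazy_item = lazy[1]
    print(len(lazy), lazy_item, eager[0])
    # 2 ([0.5, 1.0], [1, 9]) ([0.0, 1.0], [0, 9])
```

Eager loading trades memory for I/O: every example is read once at construction and stays usable even after the files disappear, whereas lazy mode re-reads a file on every access.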
datasets.clotho.collate_fn module
Implements the collation function for CLOTHO data.
- datasets.clotho.collate_fn.clotho_collate_fn(batch: MutableSequence[ndarray], nb_t_steps: AnyStr | Tuple[int, int], input_pad_at: str, output_pad_at: str) → Tuple[Tensor, Tensor]
Pads or truncates the input and output data of a batch to a common number of time steps.
- Parameters:
batch (list[numpy.ndarray]) – Batch data.
nb_t_steps (str|(int, int)) – Number of time steps to pad/truncate to. Can use ‘max’, ‘min’, or exact numbers of steps, e.g. (1024, 10).
input_pad_at (str) – Pad input at the start or at the end?
output_pad_at (str) – Pad output at the start or at the end?
- Returns:
Padded input and output data.
- Return type:
torch.Tensor, torch.Tensor
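The pad/truncate rule described by nb_t_steps, input_pad_at and output_pad_at can be sketched in pure Python. This is an illustration under stated assumptions, not the package's code: each time step is a single number here (real inputs are feature frames held in NumPy arrays and returned as tensors), ‘max’/‘min’ are applied to inputs and outputs independently, truncation keeps the first steps, and both pad values default to 0, whereas the real function may pad the word-index outputs with a different token. collate_sketch and pad_or_truncate are hypothetical names.

```python
from typing import List, Sequence, Tuple, Union

Example = Tuple[Sequence[float], Sequence[int]]  # (input steps, output steps)


def pad_or_truncate(seq: Sequence, length: int, pad_value, pad_at: str) -> list:
    # Longer sequences keep their first `length` steps (an assumption).
    if len(seq) >= length:
        return list(seq[:length])
    padding = [pad_value] * (length - len(seq))
    return padding + list(seq) if pad_at == "start" else list(seq) + padding


def collate_sketch(batch: List[Example],
                   nb_t_steps: Union[str, Tuple[int, int]],
                   input_pad_at: str, output_pad_at: str,
                   input_pad: float = 0.0, output_pad: int = 0):
    if isinstance(nb_t_steps, str):
        if nb_t_steps not in ("max", "min"):
            raise ValueError(f"unknown nb_t_steps: {nb_t_steps!r}")
        # 'max' pads to the longest example, 'min' truncates to the shortest.
        pick = max if nb_t_steps == "max" else min
        in_len = pick(len(x) for x, _ in batch)
        out_len = pick(len(y) for _, y in batch)
    else:
        in_len, out_len = nb_t_steps  # exact (input, output) lengths
    inputs = [pad_or_truncate(x, in_len, input_pad, input_pad_at) for x, _ in batch]
    outputs = [pad_or_truncate(y, out_len, output_pad, output_pad_at) for _, y in batch]
    return inputs, outputs


batch = [([1.0, 2.0, 3.0], [5, 6, 7, 8]), ([4.0], [9, 8])]
print(collate_sketch(batch, "max", "start", "end"))
# ([[1.0, 2.0, 3.0], [0.0, 0.0, 4.0]], [[5, 6, 7, 8], [9, 8, 0, 0]])
print(collate_sketch(batch, (2, 3), "start", "end"))
# ([[1.0, 2.0], [0.0, 4.0]], [[5, 6, 7], [9, 8, 0]])
```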
datasets.clotho.get_data module
Implements dataloader creators for the CLOTHO dataset used in MultiBench.
- datasets.clotho.get_data.get_dataloaders(path_to_clotho, input_modal='features', output_modal='words_ind', num_workers=1, shuffle_train=True, batch_size=20)
Get dataloaders for CLOTHO dataset.
- Parameters:
path_to_clotho (str) – Path to the Clotho dataset.
input_modal (str, optional) – Input modality. Defaults to ‘features’.
output_modal (str, optional) – Output modality. Defaults to ‘words_ind’.
num_workers (int, optional) – Number of workers. Defaults to 1.
shuffle_train (bool, optional) – Whether to shuffle training data or not. Defaults to True.
batch_size (int, optional) – Batch size. Defaults to 20.
- Returns:
Tuple of (training dataloader, validation dataloader).
- Return type:
tuple
Module contents
Implements CLOTHO dataloaders.
- datasets.clotho.get_clotho_loader(data_dir: Path, split: str, input_field_name: str, output_field_name: str, load_into_memory: bool, batch_size: int, nb_t_steps_pad: AnyStr | Tuple[int, int], shuffle: bool | None = True, drop_last: bool | None = True, input_pad_at: str | None = 'start', output_pad_at: str | None = 'end', num_workers: int | None = 1) → DataLoader
Package-level export of datasets.clotho.clotho_data_loader.get_clotho_loader; the parameters, defaults, and return type are identical to those documented there.