datasets.clotho package

Submodules

datasets.clotho.clotho_data_loader module

Implements the creation of the dataloader for CLOTHO.

datasets.clotho.clotho_data_loader.get_clotho_loader(data_dir: Path, split: str, input_field_name: str, output_field_name: str, load_into_memory: bool, batch_size: int, nb_t_steps_pad: AnyStr | Tuple[int, int], shuffle: bool | None = True, drop_last: bool | None = True, input_pad_at: str | None = 'start', output_pad_at: str | None = 'end', num_workers: int | None = 1) DataLoader

Returns a data loader for the Clotho dataset.

Parameters:
  • data_dir (pathlib.Path) – Directory with data.

  • split (str) – Split to use (e.g. ‘development’ or ‘evaluation’).

  • input_field_name (str) – Field name of the clotho data to be used as input data to the method.

  • output_field_name (str) – Field name of the clotho data to be used as output data to the method.

  • load_into_memory (bool) – Load all data into memory?

  • batch_size (int) – Batch size to use.

  • nb_t_steps_pad (str|(int, int)) – Number of time steps to pad/truncate to. Can use ‘max’, ‘min’, or exact numbers, e.g. (1024, 10).

  • shuffle (bool, optional) – Shuffle examples? Defaults to True.

  • drop_last (bool, optional) – Drop the last examples if they do not fill a complete batch of batch_size? Defaults to True.

  • input_pad_at (str, optional) – Pad input at the start or at the end? Defaults to ‘start’.

  • output_pad_at (str, optional) – Pad output at the start or at the end? Defaults to ‘end’.

  • num_workers (int, optional) – Number of workers. Defaults to 1.

Returns:

Dataloader for Clotho data.

Return type:

torch.utils.data.dataloader.DataLoader
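A call might look like the minimal sketch below. The field names ‘features’ and ‘words_ind’ are borrowed from the defaults of get_dataloaders documented further down; the helper names clotho_loader_kwargs and make_loader, the batch size of 16, and the choice to disable shuffling for non-development splits are assumptions made for illustration, not part of the package.

```python
from pathlib import Path


def clotho_loader_kwargs(data_dir, split):
    """Collect keyword arguments that match the documented signature.

    Hypothetical helper: the concrete values are illustrative only.
    """
    is_training = split == "development"
    return dict(
        data_dir=Path(data_dir),
        split=split,
        input_field_name="features",    # default input modality of get_dataloaders
        output_field_name="words_ind",  # default output modality of get_dataloaders
        load_into_memory=False,
        batch_size=16,                  # illustrative value
        nb_t_steps_pad="max",
        shuffle=is_training,            # assumption: shuffle only the training split
        drop_last=is_training,
        input_pad_at="start",           # documented default
        output_pad_at="end",            # documented default
        num_workers=1,
    )


def make_loader(data_dir, split):
    # Imported lazily so the sketch can be read without the package installed.
    from datasets.clotho.clotho_data_loader import get_clotho_loader

    return get_clotho_loader(**clotho_loader_kwargs(data_dir, split))
```

Keeping the arguments in a dictionary makes it easy to build the development and evaluation loaders from the same settings while changing only the split-dependent flags.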

datasets.clotho.clotho_dataset module

Implements the CLOTHO dataset as a torch dataset.

class datasets.clotho.clotho_dataset.ClothoDataset(data_dir: Path, split: AnyStr, input_field_name: AnyStr, output_field_name: AnyStr, load_into_memory: bool)

Bases: Dataset

Implements the CLOTHO dataset as a torch dataset.

__init__(data_dir: Path, split: AnyStr, input_field_name: AnyStr, output_field_name: AnyStr, load_into_memory: bool) None

Initialization of a Clotho dataset object.

Parameters:
  • data_dir (pathlib.Path) – Directory with data.

  • split (str) – Split to use (e.g. ‘development’ or ‘evaluation’).

  • input_field_name (str) – Field name of the clotho data to be used as input data to the method.

  • output_field_name (str) – Field name of the clotho data to be used as output data to the method.

  • load_into_memory (bool) – Load all data into memory?

datasets.clotho.collate_fn module

Implements the collation function for CLOTHO data.

datasets.clotho.collate_fn.clotho_collate_fn(batch: MutableSequence[ndarray], nb_t_steps: AnyStr | Tuple[int, int], input_pad_at: str, output_pad_at: str) Tuple[Tensor, Tensor]

Pads or truncates the input and output sequences of a batch to a common number of time steps.

Parameters:
  • batch (list[numpy.ndarray]) – Batch data.

  • nb_t_steps (str|(int, int)) – Number of time steps to pad/truncate to. Can use ‘max’, ‘min’, or exact numbers, e.g. (1024, 10).

  • input_pad_at (str) – Pad input at the start or at the end?

  • output_pad_at (str) – Pad output at the start or at the end?

Returns:

Padded data.

Return type:

torch.Tensor, torch.Tensor
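The padding options can be illustrated with a pure-Python sketch that works on plain lists rather than arrays and tensors. It is not the package's implementation. It assumes that each batch item is an (input, output) pair, that the tuple form of nb_t_steps means (input length, output length), that truncation keeps the beginning of a sequence, and that the padding value is 0; the real function may differ on any of these points. All function names below are hypothetical.

```python
def resolve_length(lengths, nb_t_steps, index):
    """Pick the target length from 'max', 'min', or an explicit pair."""
    if isinstance(nb_t_steps, str):
        mode = nb_t_steps.lower()
        if mode not in ("max", "min"):
            raise ValueError(f"Unknown nb_t_steps mode: {nb_t_steps!r}")
        return max(lengths) if mode == "max" else min(lengths)
    return nb_t_steps[index]


def pad_or_truncate(seq, length, pad_at, pad_value=0):
    """Cut seq to at most `length` items, then pad it up to `length`."""
    seq = list(seq[:length])  # assumption: truncation keeps the beginning
    padding = [pad_value] * (length - len(seq))
    return padding + seq if pad_at == "start" else seq + padding


def collate_sketch(batch, nb_t_steps, input_pad_at, output_pad_at):
    """Return (inputs, outputs), each a list of equal-length lists."""
    in_len = resolve_length([len(x) for x, _ in batch], nb_t_steps, 0)
    out_len = resolve_length([len(y) for _, y in batch], nb_t_steps, 1)
    inputs = [pad_or_truncate(x, in_len, input_pad_at) for x, _ in batch]
    outputs = [pad_or_truncate(y, out_len, output_pad_at) for _, y in batch]
    return inputs, outputs


batch = [([1, 2, 3], [7]), ([4], [8, 9])]
print(collate_sketch(batch, "max", "start", "end"))
# ([[1, 2, 3], [0, 0, 4]], [[7, 0], [8, 9]])
```

With ‘min’ nothing is padded and every sequence is cut to the shortest one in the batch; with an explicit pair, both padding and truncation can occur in the same batch.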

datasets.clotho.get_data module

Implements dataloader creators for the CLOTHO dataset used in MultiBench.

datasets.clotho.get_data.get_dataloaders(path_to_clotho, input_modal='features', output_modal='words_ind', num_workers=1, shuffle_train=True, batch_size=20)

Get dataloaders for CLOTHO dataset.

Parameters:
  • path_to_clotho (str) – Path to the Clotho dataset.

  • input_modal (str, optional) – Input modality. Defaults to ‘features’.

  • output_modal (str, optional) – Output modality. Defaults to ‘words_ind’.

  • num_workers (int, optional) – Number of workers. Defaults to 1.

  • shuffle_train (bool, optional) – Whether to shuffle training data or not. Defaults to True.

  • batch_size (int, optional) – Batch size. Defaults to 20.

Returns:

Tuple of (training dataloader, validation dataloader)

Return type:

tuple
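A possible way to use the returned pair is sketched below. The dictionary restates the documented defaults; the helper first_batch_shapes is hypothetical, and the unpacking of a batch into two tensors assumes that the loaders yield the (input, output) pair produced by clotho_collate_fn.

```python
# Documented defaults of get_dataloaders, restated here for reference.
DOCUMENTED_DEFAULTS = {
    "input_modal": "features",
    "output_modal": "words_ind",
    "num_workers": 1,
    "shuffle_train": True,
    "batch_size": 20,
}


def first_batch_shapes(path_to_clotho, **overrides):
    """Build the loaders and report the shapes of the first training batch."""
    # Lazy import: requires the package that provides datasets.clotho.
    from datasets.clotho.get_data import get_dataloaders

    kwargs = {**DOCUMENTED_DEFAULTS, **overrides}
    train_loader, valid_loader = get_dataloaders(path_to_clotho, **kwargs)
    # Assumption: each batch is an (input, output) pair of tensors.
    inputs, targets = next(iter(train_loader))
    return tuple(inputs.shape), tuple(targets.shape)
```

Checking the shapes of a single batch before training is a quick way to confirm that the chosen modalities and batch size are what the model expects.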

Module contents

Implements CLOTHO dataloaders.

datasets.clotho.get_clotho_loader(data_dir: Path, split: str, input_field_name: str, output_field_name: str, load_into_memory: bool, batch_size: int, nb_t_steps_pad: AnyStr | Tuple[int, int], shuffle: bool | None = True, drop_last: bool | None = True, input_pad_at: str | None = 'start', output_pad_at: str | None = 'end', num_workers: int | None = 1) DataLoader

Returns a data loader for the Clotho dataset.

Parameters:
  • data_dir (pathlib.Path) – Directory with data.

  • split (str) – Split to use (e.g. ‘development’ or ‘evaluation’).

  • input_field_name (str) – Field name of the clotho data to be used as input data to the method.

  • output_field_name (str) – Field name of the clotho data to be used as output data to the method.

  • load_into_memory (bool) – Load all data into memory?

  • batch_size (int) – Batch size to use.

  • nb_t_steps_pad (str|(int, int)) – Number of time steps to pad/truncate to. Can use ‘max’, ‘min’, or exact numbers, e.g. (1024, 10).

  • shuffle (bool, optional) – Shuffle examples? Defaults to True.

  • drop_last (bool, optional) – Drop the last examples if they do not fill a complete batch of batch_size? Defaults to True.

  • input_pad_at (str, optional) – Pad input at the start or at the end? Defaults to ‘start’.

  • output_pad_at (str, optional) – Pad output at the start or at the end? Defaults to ‘end’.

  • num_workers (int, optional) – Number of workers. Defaults to 1.

Returns:

Dataloader for Clotho data.

Return type:

torch.utils.data.dataloader.DataLoader