datasets.imdb package

Submodules

datasets.imdb.get_data module

Implements dataloaders for the IMDB dataset.

class datasets.imdb.get_data.IMDBDataset(file: h5py.File, start_ind: int, end_ind: int, vggfeature: bool = False)

Bases: Dataset

Implements a torch Dataset class for the IMDB dataset.

__init__(file: h5py.File, start_ind: int, end_ind: int, vggfeature: bool = False) None

Initialize IMDBDataset object.

Parameters:
  • file (h5py.File) – h5py file of data

  • start_ind (int) – Starting index for dataset

  • end_ind (int) – Ending index for dataset

  • vggfeature (bool, optional) – Whether to return pre-processed vgg_features or not. Defaults to False.
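The start_ind/end_ind arguments carve a window out of the underlying data. The simplified stand-in below (a plain list instead of the real h5py-backed class, with hypothetical names) sketches that indexing behavior:

```python
# Simplified stand-in for IMDBDataset: illustrates how start_ind and
# end_ind bound the window of samples the dataset exposes. The real
# class reads fields from an h5py file; a plain list stands in here.
class WindowedDataset:
    def __init__(self, data, start_ind, end_ind):
        self.data = data
        self.start_ind = start_ind
        self.end_ind = end_ind

    def __len__(self):
        # Length is the size of the [start_ind, end_ind) window.
        return self.end_ind - self.start_ind

    def __getitem__(self, ind):
        # Indices are offset by start_ind into the underlying storage.
        return self.data[self.start_ind + ind]

ds = WindowedDataset(list(range(100)), start_ind=10, end_ind=20)
print(len(ds))   # 10
print(ds[0])     # 10
```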

class datasets.imdb.get_data.IMDBDataset_robust(dataset, start_ind: int, end_ind: int)

Bases: Dataset

Implements a torch Dataset class for the IMDB dataset that uses robustness measures as data augmentation.

__init__(dataset, start_ind: int, end_ind: int) None

Initialize IMDBDataset_robust object.

Parameters:
  • dataset – Dataset to sample from

  • start_ind (int) – Starting index for dataset

  • end_ind (int) – Ending index for dataset

datasets.imdb.get_data.get_dataloader(path: str, test_path: str, num_workers: int = 8, train_shuffle: bool = True, batch_size: int = 40, vgg: bool = False, skip_process=False, no_robust=False) Tuple[Dict]

Get dataloaders for IMDB dataset.

Parameters:
  • path (str) – Path to training datafile.

  • test_path (str) – Path to test datafile.

  • num_workers (int, optional) – Number of workers to load data in. Defaults to 8.

  • train_shuffle (bool, optional) – Whether to shuffle training data or not. Defaults to True.

  • batch_size (int, optional) – Batch size of data. Defaults to 40.

  • vgg (bool, optional) – Whether to return raw images or pre-processed vgg features. Defaults to False.

  • skip_process (bool, optional) – Whether to pre-process data or not. Defaults to False.

  • no_robust (bool, optional) – Whether to not use robustness measures as augmentation. Defaults to False.

Returns:

Tuple of training dataloader, validation dataloader, and test dataloader

Return type:

Tuple[Dict]
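A stdlib sketch of what batch_size and train_shuffle control: samples are optionally shuffled, then grouped into batches of batch_size. The function and seed below are hypothetical stand-ins; the real get_dataloader returns torch dataloaders over the IMDB h5py files.

```python
import random

def make_batches(samples, batch_size=40, shuffle=True, seed=0):
    # Optionally shuffle the sample order (what train_shuffle toggles),
    # then slice the order into consecutive batches of batch_size.
    order = list(samples)
    if shuffle:
        random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

batches = make_batches(range(100), batch_size=40, shuffle=False)
print(len(batches))       # 3 batches: 40 + 40 + 20
print(len(batches[-1]))   # 20
```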

datasets.imdb.vgg module

Implements VGG pre-processer for IMDB data.

class datasets.imdb.vgg.VGGClassifier(model_path='vgg.tar', synset_words='synset_words.txt')

Bases: object

Implements VGG classifier instance.

__init__(model_path='vgg.tar', synset_words='synset_words.txt')

Instantiate VGG classifier instance.

Parameters:
  • model_path (str, optional) – VGGNet weight file. Defaults to ‘vgg.tar’.

  • synset_words (str, optional) – Path to synset words. Defaults to ‘synset_words.txt’.

classify(image, top=1)

Classify an image with the 1000 concepts of the ImageNet dataset.

Parameters:
  • image – Numpy image or image path.

  • top (int, optional) – Number of top classes for this image. Defaults to 1.

Returns:

List of strings with the synsets predicted by the VGG model.
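The top-k selection inside classify can be sketched as follows. The helper, scores, and synset strings here are made up for illustration; the real method obtains its scores from a forward pass of the VGG network.

```python
def top_synsets(scores, synset_words, top=1):
    # Rank class indices by score, highest first, and map the
    # `top` best indices to their synset word strings.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [synset_words[i] for i in ranked[:top]]

words = ["n01440764 tench", "n01443537 goldfish", "n01484850 great white shark"]
print(top_synsets([0.1, 0.7, 0.2], words, top=2))
# ['n01443537 goldfish', 'n01484850 great white shark']
```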

get_features(image)

Return the activations of the last hidden layer for a given image.

Parameters:
  • image – Numpy image or image path.

Returns:

numpy vector with 4096 activations.

resize_and_crop_image(output_box=[224, 224], fit=True)

Downsample the image.

Sourced from https://github.com/BVLC/caffe/blob/master/tools/extra/resize_and_crop_images.py
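Assuming the usual resize-then-center-crop strategy from the linked Caffe script, the box arithmetic can be sketched with pure integers. The function below is a hypothetical stand-in; the real method operates on image objects, not bare dimensions.

```python
def resize_and_crop_box(width, height, output_box=(224, 224)):
    # Scale so the image covers the output box (shorter side matches),
    # then compute a centered crop of the output size.
    out_w, out_h = output_box
    scale = max(out_w / width, out_h / height)
    new_w, new_h = round(width * scale), round(height * scale)
    left = (new_w - out_w) // 2
    top = (new_h - out_h) // 2
    return (left, top, left + out_w, top + out_h)

print(resize_and_crop_box(640, 480))  # (37, 0, 261, 224)
```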

class datasets.imdb.vgg.VGGNet(*args: Any, **kwargs: Any)

Bases: FeedforwardSequence

Implements VGG pre-processor.

__init__(**kwargs)

Instantiate VGG pre-processor instance.

Module contents