datasets.imdb package
Submodules
datasets.imdb.get_data module
Implements dataloaders for the IMDB dataset.
- class datasets.imdb.get_data.IMDBDataset(file: h5py.File, start_ind: int, end_ind: int, vggfeature: bool = False)
Bases: Dataset
Implements a torch Dataset class for the IMDB dataset.
- __init__(file: h5py.File, start_ind: int, end_ind: int, vggfeature: bool = False) None
Initialize IMDBDataset object.
- Parameters:
file (h5py.File) – h5py file of data
start_ind (int) – Starting index for dataset
end_ind (int) – Ending index for dataset
vggfeature (bool, optional) – Whether to return pre-processed vgg_features or not. Defaults to False.
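The index-window behavior described above can be sketched with a plain dict standing in for the h5py file (the class name and the column keys below are illustrative assumptions, not the package's actual implementation):

```python
class WindowedDataset:
    """Toy stand-in for IMDBDataset: exposes rows start_ind..end_ind-1 of a table."""

    def __init__(self, file, start_ind, end_ind, vggfeature=False):
        self.file = file            # mapping of column name -> sequence of rows
        self.start_ind = start_ind
        self.end_ind = end_ind
        # assumed column names; the real h5py keys may differ
        self.key = "vgg_features" if vggfeature else "images"

    def __len__(self):
        return self.end_ind - self.start_ind

    def __getitem__(self, i):
        row = self.start_ind + i
        return self.file[self.key][row], self.file["labels"][row]


# Usage with an in-memory table in place of the h5py file:
table = {
    "images": list(range(100)),
    "vgg_features": list(range(100, 200)),
    "labels": [i % 2 for i in range(100)],
}
ds = WindowedDataset(table, start_ind=10, end_ind=20)
```

A real torch `Dataset` subclass works the same way: `__len__` and `__getitem__` over the `[start_ind, end_ind)` window, with `vggfeature` selecting which stored representation is returned.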
- class datasets.imdb.get_data.IMDBDataset_robust(dataset, start_ind: int, end_ind: int)
Bases: Dataset
Implements a torch Dataset class for the IMDB dataset that uses robustness measures as data augmentation.
- __init__(dataset, start_ind: int, end_ind: int) None
Initialize IMDBDataset_robust object.
- Parameters:
dataset – Data to wrap.
start_ind (int) – Starting index for dataset
end_ind (int) – Ending index for dataset
- datasets.imdb.get_data.get_dataloader(path: str, test_path: str, num_workers: int = 8, train_shuffle: bool = True, batch_size: int = 40, vgg: bool = False, skip_process=False, no_robust=False) Tuple[Dict]
Get dataloaders for the IMDB dataset.
- Parameters:
path (str) – Path to training datafile.
test_path (str) – Path to test datafile.
num_workers (int, optional) – Number of worker processes for data loading. Defaults to 8.
train_shuffle (bool, optional) – Whether to shuffle the training data. Defaults to True.
batch_size (int, optional) – Batch size. Defaults to 40.
vgg (bool, optional) – Whether to return pre-processed VGG features instead of raw images. Defaults to False.
skip_process (bool, optional) – Whether to skip pre-processing the data. Defaults to False.
no_robust (bool, optional) – Whether to skip robustness measures as augmentation. Defaults to False.
- Returns:
Tuple of training dataloader, validation dataloader, and test dataloader.
- Return type:
Tuple[Dict]
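The split-and-batch behavior of get_dataloader can be sketched in pure Python (a toy illustration; the real function returns torch DataLoader objects, and the validation fraction used here is an assumption):

```python
import random


def make_batches(rows, batch_size=40, shuffle=False, seed=0):
    """Group a list of rows into fixed-size batches (last batch may be short)."""
    rows = list(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def get_toy_dataloaders(train_rows, test_rows, batch_size=40,
                        train_shuffle=True, val_frac=0.1):
    """Split a validation window off the training data, then batch each split."""
    n_val = int(len(train_rows) * val_frac)
    val_rows, tr_rows = train_rows[:n_val], train_rows[n_val:]
    return (make_batches(tr_rows, batch_size, shuffle=train_shuffle),
            make_batches(val_rows, batch_size),
            make_batches(test_rows, batch_size))
```

For example, 100 training rows and 20 test rows with `batch_size=10` yield 9 training batches, 1 validation batch, and 2 test batches.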
datasets.imdb.vgg module
Implements VGG pre-processor for IMDB data.
- class datasets.imdb.vgg.VGGClassifier(model_path='vgg.tar', synset_words='synset_words.txt')
Bases: object
Implements a VGG classifier instance.
- __init__(model_path='vgg.tar', synset_words='synset_words.txt')
Instantiate VGG classifier instance.
- Parameters:
model_path (str, optional) – VGGNet weight file. Defaults to ‘vgg.tar’.
synset_words (str, optional) – Path to synset words. Defaults to ‘synset_words.txt’.
- classify(image, top=1)
Classify an image with the 1000 concepts of the ImageNet dataset.
- Parameters:
image – Numpy image or image path.
top (int, optional) – Number of top classes to return. Defaults to 1.
- Returns:
List of strings with synsets predicted by the VGG model.
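Picking the top classes from a score vector, as classify does, amounts to a simple argsort over the 1000 class scores; a minimal sketch (function name and inputs are illustrative, not the class's internals):

```python
def top_k_synsets(scores, synset_words, top=1):
    """Return the `top` synset strings with the highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [synset_words[i] for i in order[:top]]
```

In the real classifier, `scores` would be the network's output over the ImageNet classes and `synset_words` the strings loaded from `synset_words.txt`.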
- get_features(image)
Return the activations of the last hidden layer for a given image.
- Parameters:
image – Numpy image or image path.
- Returns:
Numpy vector with 4096 activations.
- resize_and_crop_image(output_box=[224, 224], fit=True)
Downsample the image.
Sourced from https://github.com/BVLC/caffe/blob/master/tools/extra/resize_and_crop_images.py
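One common way to implement resize-and-crop is to scale the image so it covers the output box and then center-crop the overflow; the scaling step can be sketched as follows (the fit semantics here are an assumption and may not match the Caffe script exactly):

```python
def resize_dims(width, height, box=(224, 224), cover=True):
    """Intermediate resize dimensions before cropping/padding to `box`.

    cover=True scales so the image fully covers the box (overflow is cropped);
    cover=False scales so the image fits entirely inside the box."""
    ratios = (box[0] / width, box[1] / height)
    scale = max(ratios) if cover else min(ratios)
    return round(width * scale), round(height * scale)
```

For a 448x336 input and a 224x224 box, covering scales by 2/3 to 299x224 (the width is then cropped), while fitting scales by 1/2 to 224x168.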