Intro

I remember volunteering at a hackathon and sitting in the award ceremony when I saw a group win in the "fun" category for creating a pet breed classifier. You give it an image and it'll tell you what breed it thinks it is and how confident it is. It was "fun" because you could override the threshold and let images that aren't cats or dogs be classified as a dog or cat breed. This blog post will show you how to train your own pet breed classifier, and that it's neither hard nor time-consuming to do so. You don't need a beefy computer either, since you can use Colab's GPUs.

Pet breed classifier meme

Training our own pet breed classifier

First, we'll download the Pet dataset and see what we're given:

from fastai.vision.all import *   # provides untar_data, URLs, DataBlock, cnn_learner, etc.
import re                         # used below to parse breed names out of filenames

path = untar_data(URLs.PETS)
Path.BASE_PATH = path
path.ls()
(#2) [Path('images'),Path('annotations')]
(path/'images').ls()
(#7393) [Path('images/english_setter_69.jpg'),Path('images/scottish_terrier_120.jpg'),Path('images/basset_hound_113.jpg'),Path('images/miniature_pinscher_87.jpg'),Path('images/pomeranian_1.jpg'),Path('images/Persian_68.jpg'),Path('images/japanese_chin_39.jpg'),Path('images/english_setter_107.jpg'),Path('images/Birman_128.jpg'),Path('images/staffordshire_bull_terrier_26.jpg')...]

In this dataset, there are two subfolders: images and annotations. images contains the pictures of the pets (with the breed encoded in each filename), while annotations contains the location of the pet in each image, in case you want to do localization.

The filenames are structured like so: the breed name with spaces turned into underscores, followed by an underscore and a number. The name is capitalized if the pet is a cat. We can extract the breed name with a regular expression:

fname = (path/'images').ls()[0]
fname, fname.name
(Path('images/english_setter_69.jpg'), 'english_setter_69.jpg')
# ()    = extract what's in the parentheses -> .+
# .+    = any character appearing one or more times
# _     = followed by an underscore
# \d+   = followed by any digit appearing one or more times
# \.jpg$ = with a literal .jpg extension at the end of the string
re.findall(r'(.+)_\d+\.jpg$', fname.name)
['english_setter']

This time, we'll use a DataBlock to create our DataLoaders:

pets = DataBlock(
    blocks     = (ImageBlock, CategoryBlock),
    get_items  = partial(get_image_files, folders = 'images'),
    splitter   = RandomSplitter(),
    get_y      = using_attr(RegexLabeller(r'(.+)_\d+\.jpg$'), 'name'),
    item_tfms  = Resize(460),
    batch_tfms = aug_transforms(size = 224, min_scale = 0.75))
dls = pets.dataloaders(path)

In our pets DataBlock, we give it the following parameters:

  • blocks = (ImageBlock, CategoryBlock): our independent variable is an image and our dependent variable is a category.
  • get_items = partial(get_image_files, folders = 'images'): we get our images recursively from the images folder. If you've used functional programming before, partial is like currying: we give a function some of its arguments and it returns a new function that accepts the rest, except partial lets us pick exactly which arguments we want to fix (see the short sketch right after this list).
  • splitter = RandomSplitter(): randomly splits our data into training and validation sets with a default 80:20 split. We can also pass a seed so the split is reproducible, which makes it easier to see how tuning our hyperparameters affects the final accuracy.
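Here's a rough sketch of what partial does (get_pet_images is just an illustrative name, not something used later in this post):

from functools import partial
from fastai.vision.all import get_image_files

# get_pet_images(path) now behaves exactly like get_image_files(path, folders = 'images')
get_pet_images = partial(get_image_files, folders = 'images')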

The final two parameters are part of "presizing":

  • item_tfms = Resize(460): grabs a crop spanning the full width or height of the image (whichever is smaller), at a random position on the training set, and resizes it to 460x460. This happens to every image individually.
  • batch_tfms = aug_transforms(size = 224, min_scale = 0.75): takes a random crop covering at least 75% of the image, resizes it to 224x224, and applies the standard augmentations. This happens to a whole batch at once (like the batch we get when we call dls.one_batch()).

We first resize images to a much larger size than the final training size so that the later augmentations don't destroy data: the larger intermediate image can be transformed without creating empty areas.

Presizing (image taken from fastbook)
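If you want an extra sanity check on the presizing pipeline, you can grab a single batch and look at its shape; with the default batch size of 64, each batch should be 64 images of 3x224x224:

xb, yb = dls.one_batch()
xb.shape, yb.shape
# (torch.Size([64, 3, 224, 224]), torch.Size([64]))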

We can check that our DataLoaders was created successfully by calling .show_batch():

dls.show_batch(nrows = 1, ncols = 4)

We can then do some Googling to make sure our images are labelled correctly.

Fastai also allows us to debug our DataBlock in case we make an error. It attempts to create a batch from the source:

pets.summary(path)

Setting-up type transforms pipelines
Collecting items from /root/.fastai/data/oxford-iiit-pet
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}

Building one sample
  Pipeline: PILBase.create
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/great_pyrenees_179.jpg
    applying PILBase.create gives
      PILImage mode=RGB size=500x334
  Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/great_pyrenees_179.jpg
    applying partial gives
      great_pyrenees
    applying Categorize -- {'vocab': None, 'sort': True, 'add_na': False} gives
      TensorCategory(21)

Final sample: (PILImage mode=RGB size=500x334, TensorCategory(21))


Collecting items from /root/.fastai/data/oxford-iiit-pet
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
Setting up after_item: Pipeline: Resize -- {'size': (460, 460), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} -> ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Flip -- {'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} -> RandomResizedCropGPU -- {'size': (224, 224), 'min_scale': 0.75, 'ratio': (1, 1), 'mode': 'bilinear', 'valid_scale': 1.0, 'max_scale': 1.0, 'p': 1.0} -> Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False}

Building one batch
Applying item_tfms to the first sample:
  Pipeline: Resize -- {'size': (460, 460), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} -> ToTensor
    starting from
      (PILImage mode=RGB size=500x334, TensorCategory(21))
    applying Resize -- {'size': (460, 460), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} gives
      (PILImage mode=RGB size=460x460, TensorCategory(21))
    applying ToTensor gives
      (TensorImage of size 3x460x460, TensorCategory(21))

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch

Applying batch_tfms to the batch built
  Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Flip -- {'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} -> RandomResizedCropGPU -- {'size': (224, 224), 'min_scale': 0.75, 'ratio': (1, 1), 'mode': 'bilinear', 'valid_scale': 1.0, 'max_scale': 1.0, 'p': 1.0} -> Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False}
    starting from
      (TensorImage of size 4x3x460x460, TensorCategory([21, 30, 15,  2], device='cuda:0'))
    applying IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} gives
      (TensorImage of size 4x3x460x460, TensorCategory([21, 30, 15,  2], device='cuda:0'))
    applying Flip -- {'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} gives
      (TensorImage of size 4x3x460x460, TensorCategory([21, 30, 15,  2], device='cuda:0'))
    applying RandomResizedCropGPU -- {'size': (224, 224), 'min_scale': 0.75, 'ratio': (1, 1), 'mode': 'bilinear', 'valid_scale': 1.0, 'max_scale': 1.0, 'p': 1.0} gives
      (TensorImage of size 4x3x224x224, TensorCategory([21, 30, 15,  2], device='cuda:0'))
    applying Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False} gives
      (TensorImage of size 4x3x224x224, TensorCategory([21, 30, 15,  2], device='cuda:0'))

Now, let's get to training our model. This time, we'll be fine-tuning a pretrained model. This process is called transfer learning: we take a model pretrained on a different dataset and retrain it on our data so that it performs well on our task. We replace the head (the last layer) of the model with a randomly initialized one, freeze the parameters of the earlier layers, and train for a few epochs so that only the new head gets updated. Then, we unfreeze the model and train all of the layers, typically giving the later layers a higher learning rate than the earlier ones.

The pretrained model we'll be using is resnet34, a 34-layer architecture trained on the ImageNet dataset:

learner = cnn_learner(dls, resnet34, metrics = accuracy)
lrs = learner.lr_find()
learner.fit_one_cycle(3, lr_max = lrs.valley)
epoch train_loss valid_loss accuracy time
0 1.542125 0.296727 0.900541 01:14
1 0.618474 0.227452 0.924222 01:13
2 0.401809 0.214500 0.932341 01:12
learner.unfreeze()
lrs = learner.lr_find()
learner.fit_one_cycle(6, lr_max = lrs.valley)
epoch train_loss valid_loss accuracy time
0 0.340459 0.213287 0.928281 01:16
1 0.341917 0.233392 0.921516 01:16
2 0.277254 0.187060 0.939107 01:16
3 0.191343 0.192029 0.938430 01:16
4 0.156336 0.178532 0.941813 01:16
5 0.123608 0.174198 0.939107 01:16

When we use a pretrained model, fastai automatically freezes the early layers. We then train the head (last layer) of the model for 3 epochs so that it can get a sense of our objective. Then, we unfreeze the model and train all the layers for 6 more epochs. After training for a total of 9 epochs, we now have a model that can predict pet breeds accurately about 94% of the time. We can use fastai's confusion matrix to see where our model is having problems:

interp = ClassificationInterpretation.from_learner(learner)
interp.plot_confusion_matrix(figsize = (12, 12), dpi = 60)
interp.most_confused(5)
[('staffordshire_bull_terrier', 'american_pit_bull_terrier', 6),
 ('Ragdoll', 'Birman', 5),
 ('chihuahua', 'miniature_pinscher', 5)]

Using the .most_confused feature, it seems like most of the errors come from pet breeds that are very similar to each other. We should be careful, however, that we aren't overfitting to our validation set by repeatedly tuning hyperparameters against it. We can already see that while our training loss always goes down, our validation loss fluctuates, sometimes dropping and sometimes rising.

And that's all there is to training a pet breed classifier. You could improve the accuracy by exploring deeper models like resnet50, which has 50 layers; training for more epochs (before unfreezing, after, or both); or using discriminative learning rates (giving the earlier layers lower learning rates by passing slice(lr1, lr2) as the lr_max keyword argument of fit_one_cycle).
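For example, here's a rough sketch of the last idea (the learning-rate values below are placeholders, not tuned ones):

# Discriminative learning rates: the earliest layers get the low end of the slice,
# the head gets the high end, and the layer groups in between are interpolated.
learner.unfreeze()
learner.fit_one_cycle(6, lr_max = slice(1e-6, 1e-4))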

Using our own pet breed classifier

First, let's save the model using .export():

learner.export()

Then, let's load the .pkl file:

learn = load_learner('export.pkl')
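Before building any UI, we can sanity-check the loaded model. learn.predict returns the decoded label, the index of that label in the vocab, and the probabilities for every class (the filename below is just one of the files from the listing earlier):

pred, pred_idx, probs = learn.predict(path/'images'/'Birman_128.jpg')
pred, probs[pred_idx]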

Then, let's create some basic UI with ipywidgets:

import ipywidgets as widgets
from ipywidgets import VBox, HBox
from IPython.display import display

def pretty(name: str) -> str:
    # 'german_shorthaired' -> 'german shorthaired'
    return name.replace('_', ' ').lower()

def classify(change):
    # callback for the Classify button
    if not btn_upload.data:
        lbl_pred.value = 'Please upload an image.'
        return
    img = PILImage.create(btn_upload.data[-1])
    pred, pred_idx, probs = learn.predict(img)
    out_pl.clear_output()
    with out_pl: 
        display(img.to_thumb(128, 128))
    lbl_pred.value = f'Looks like a {pretty(pred)} to me. I\'m {probs[pred_idx] * 100:.02f}% confident!'

btn_upload = widgets.FileUpload()
lbl_pred = widgets.Label()
out_pl = widgets.Output()
btn_run = widgets.Button(description = 'Classify')
btn_run.on_click(classify)

VBox([
      widgets.Label('Upload a pet!'),
      btn_upload,
      btn_run,
      out_pl,
      lbl_pred])

And there we have it! You can make it prettier and go win a hackathon.

However, a downside of deep learning is that a model can only predict the kinds of things it has been trained on. So, drawings of pets, night-time images of pets, and breeds that weren't included in the training set won't be labelled accurately.

We could address the last case by turning this into a multi-label classification problem. Then, if no known breed passes our confidence threshold, we can just say we don't recognize the breed.
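Here's a rough sketch of how that could look (my own assumptions, not code from the original notebook): with MultiCategoryBlock each image gets a list of labels, fastai switches to a binary cross-entropy loss, and a prediction where no breed passes the threshold means "no known breed".

def label_as_list(fname):
    # re.findall already returns a list, e.g. ['great_pyrenees'],
    # which is what MultiCategoryBlock expects
    return re.findall(r'(.+)_\d+\.jpg$', fname.name)

pets_multi = DataBlock(
    blocks     = (ImageBlock, MultiCategoryBlock),
    get_items  = partial(get_image_files, folders = 'images'),
    splitter   = RandomSplitter(),
    get_y      = label_as_list,
    item_tfms  = Resize(460),
    batch_tfms = aug_transforms(size = 224, min_scale = 0.75))
multi_learner = cnn_learner(pets_multi.dataloaders(path), resnet34,
                            metrics = partial(accuracy_multi, thresh = 0.95))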

Siamese pair

When I was watching the fastai lectures, I heard Jeremy talk about "siamese pairs", where you give the model two images and it tells you whether they're of the same breed. Now that we have a model, let's approximate it: instead of training a true siamese network, we'll simply classify each image separately and compare the predictions.

def pair(change):
    # classify both uploads separately and compare the predicted breeds
    if not up1.data or not up2.data:
        lbl.value = 'Please upload images.'
        return
    im1 = PILImage.create(up1.data[-1])
    im2 = PILImage.create(up2.data[-1])
    pred1, idx1, _ = learn.predict(im1)
    pred2, idx2, _ = learn.predict(im2)
    out1.clear_output()
    out2.clear_output()
    with out1:
        display(im1.to_thumb(128, 128))
    with out2:
        display(im2.to_thumb(128, 128))
    if idx1 == idx2:
        lbl.value = f'Wow, they\'re both {pretty(pred1)}(s)!'
    else:
        lbl.value = (f'The first one seems to be a(n) {pretty(pred1)} while the second '
                     f'one is a(n) {pretty(pred2)}. I\'m not an expert, but they '
                     'seem to be of different breeds, chief.')

up1 = widgets.FileUpload()
up2 = widgets.FileUpload()

lbl = widgets.Label()

out1 = widgets.Output()
out2 = widgets.Output()

run = widgets.Button(description = 'Classify')
run.on_click(pair)

VBox([
      widgets.Label("Siamese Pairs"),
      HBox([up1, up2]),
      run,
      HBox([out1, out2]),
      lbl
])

You can now test out these cells here!