A pet breed classifier
If a team did it at a hackathon, surely I can too... right?
Intro
I remember volunteering at a hackathon and sitting in the award ceremony when I saw a group win in the "fun" category for creating a pet breed classifier. You give it an image and it'll tell you what breed it thinks it is and how confident it is. It was "fun" because you could override the threshold and allow images that aren't cats or dogs to be classified as a dog or cat breed. This blog post will show you how to train your own pet breed classifier, and how it isn't that hard or time-consuming to do so. You don't need a beefy computer either, since you can use Colab's GPUs.
First, let's download the pet dataset that fastai provides and look at its structure:
from fastai.vision.all import *

path = untar_data(URLs.PETS)
Path.BASE_PATH = path
path.ls()
(path/'images').ls()
In this dataset, there are two subfolders: images and annotations. images contains the images of the pet breeds (and their labels) while annotations contains the location of the pet in each image, if you wanted to do localization.
The image filenames are structured like so: the name of the pet breed (with spaces turned into underscores), followed by an underscore and a number. The name is capitalized if the pet is a cat. We can get the name of the pet breed by using regular expressions:
fname = (path/'images').ls()[0]
fname, fname.name
# () = extract what's in the parentheses -> .+
# .+ = any character appearing one or more times
# _ = followed by an underscore
# \d+ = followed by any digit appearing one or more times
# .jpg$ = with a .jpg extension at the end of the string
re.findall(r'(.+)_\d+.jpg$', fname.name)
This time, we'll be using a DataBlock to create our DataLoaders:
pets = DataBlock(
blocks = (ImageBlock, CategoryBlock),
get_items = partial(get_image_files, folders = 'images'),
splitter = RandomSplitter(),
get_y = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
item_tfms = Resize(460),
batch_tfms = aug_transforms(size = 224, min_scale = 0.75))
dls = pets.dataloaders(path)
In our pets DataBlock, we give it the following parameters:
- blocks = (ImageBlock, CategoryBlock): our independent variable is an image and our dependent variable is a category.
- get_items = partial(get_image_files, folders = 'images'): we are getting our images recursively in the images folder. If you've used functional programming before, partial is like currying; we give a function some of its parameters and it returns another function that accepts the rest of its parameters, except partial allows us to specify which parameters we want to give.
- splitter = RandomSplitter(): randomly splits our data into training and validation sets with a default 80:20 split. We can also specify a seed if we want to test how tuning our hyperparameters affects the final accuracy (there's a small sketch of a seeded version after the presizing notes below).
The final two parameters are part of "presizing":
- item_tfms = Resize(460): picks a crop spanning the image's full width or height (whichever is smaller) at a random location and resizes it to 460x460. This happens for every image in the dataset, one item at a time.
- batch_tfms = aug_transforms(size = 224, min_scale = 0.75): takes a random portion of the image that covers at least 75% of it and resizes it to 224x224. This happens on a whole batch at a time (like the batch we get when we call dls.one_batch()).
We first resize an image to a much larger size than the one we'll actually train at so that we can avoid the data destruction done by data augmentation. The larger intermediate size allows us to transform the data without creating empty areas.
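As promised above, here's a minimal sketch of the same DataBlock with a seeded splitter, so the validation set stays fixed while you experiment. The seed value and the pets_seeded name are just for illustration:
# Same DataBlock as before, but with a fixed seed so the train/validation
# split is identical across runs; 42 is an arbitrary choice.
pets_seeded = DataBlock(
    blocks = (ImageBlock, CategoryBlock),
    get_items = partial(get_image_files, folders = 'images'),
    splitter = RandomSplitter(valid_pct = 0.2, seed = 42),
    get_y = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
    item_tfms = Resize(460),
    batch_tfms = aug_transforms(size = 224, min_scale = 0.75))
dls_seeded = pets_seeded.dataloaders(path)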
We can check if our DataLoaders was created successfully by using the .show_batch() method:
dls.show_batch(nrows = 1, ncols = 4)
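If you want to see what the augmentations themselves look like, show_batch can also repeat a single training image with different transforms applied each time:
# Show several augmented versions of the same image
dls.train.show_batch(max_n = 8, nrows = 2, unique = True)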
We can then do some Googling to make sure our images are labelled correctly.
Fastai also allows us to debug our DataBlock in case we make an error. It attempts to create a batch from the source:
pets.summary(path)
Now, let's get to training our model. This time, we'll be fine-tuning a pretrained model. This process is called transfer learning: we take a pretrained model and retrain it on our data so that it performs well on our task. We replace the head (the last layer) of the model with a randomly initialized one, freeze the parameters of the earlier layers, and train just the head for a few epochs. Then, we unfreeze the model and update the later layers with a higher learning rate than the earlier layers.
The pretrained model we will be using is resnet34, a 34-layer model trained on the ImageNet dataset:
learner = cnn_learner(dls, resnet34, metrics = accuracy)
# find a good learning rate for training the new, randomly initialized head
lrs = learner.lr_find()
learner.fit_one_cycle(3, lr_max = lrs.valley)

learner.unfreeze()
# the earlier layers are now trainable too, so find a fresh learning rate
lrs = learner.lr_find()
learner.fit_one_cycle(6, lr_max = lrs.valley)
When we use a pretrained model, fastai automatically freezes the early layers. We then train the head (last layer) of the model for 3 epochs so that it can get a sense of our objective. Then, we unfreeze the model and train all the layers for 6 more epochs. After training for a total of 9 epochs, we now have a model that can predict pet breeds accurately 94% of the time. We can use fastai's confusion matrix to see where our model is having problems:
interp = ClassificationInterpretation.from_learner(learner)
interp.plot_confusion_matrix(figsize = (12, 12), dpi = 60)
interp.most_confused(5)
Using the .most_confused method, it seems like most of the errors come from pet breeds that are very similar. We should be careful, however, that we aren't overfitting to our validation set by repeatedly changing hyperparameters. We can see that our training loss always goes down, but our validation loss fluctuates, sometimes going down and sometimes up.
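Another view I find handy (an extra step, not something we strictly need) is plotting the images the model was most wrong about, along with the prediction, the actual label, the loss, and the predicted probability:
# The images with the highest loss
interp.plot_top_losses(5, nrows = 1)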
And that's all there is to training a pet breed classifier. You could improve the accuracy by exploring deeper models like resnet50, which has 50 layers; training for more epochs (before unfreezing, after, or both); or using discriminative learning rates (giving lower learning rates to the earlier layers by passing slice(lr1, lr2) as the lr_max keyword argument of fit_one_cycle).
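As a rough sketch of what those tweaks could look like together (the learning-rate range is a common starting point rather than something tuned for this dataset, and learner_r50 is just a throwaway name):
# Deeper model plus discriminative learning rates after unfreezing:
# earlier layers get the small end of the slice, later layers the large end.
learner_r50 = cnn_learner(dls, resnet50, metrics = accuracy)
learner_r50.fit_one_cycle(3)
learner_r50.unfreeze()
learner_r50.fit_one_cycle(6, lr_max = slice(1e-6, 1e-4))
Once we're happy with the accuracy, we can export the trained model so we can load it elsewhere: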
learner.export()
Then, let's load the .pkl file:
learn = load_learner('export.pkl')
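As a quick sanity check that the loaded model behaves like the original (reusing the fname from earlier, though any image path works):
# Predict a single image: returns the decoded label, its index, and the probabilities
pred, pred_idx, probs = learn.predict(fname)
pred, probs[pred_idx]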
Create some basic UI:
# ipywidgets provides the upload button, output area, and labels used below
import ipywidgets as widgets
from ipywidgets import VBox, HBox
from IPython.display import display

def pretty(name: str) -> str:
    # 'Egyptian_Mau' -> 'egyptian mau'
    return name.replace('_', ' ').lower()
def classify(a):
if not btn_upload.data:
lbl_pred.value = 'Please upload an image.'
return
img = PILImage.create(btn_upload.data[-1])
pred, pred_idx, probs = learn.predict(img)
out_pl.clear_output()
with out_pl:
display(img.to_thumb(128, 128))
lbl_pred.value = f'Looks like a {pretty(pred)} to me. I\'m {probs[pred_idx] * 100:.02f}% confident!'
btn_upload = widgets.FileUpload()
lbl_pred = widgets.Label()
out_pl = widgets.Output()
btn_run = widgets.Button(description = 'Classify')
btn_run.on_click(classify)
VBox([
widgets.Label('Upload a pet!'),
btn_upload,
btn_run,
out_pl,
lbl_pred])
And there we have it! You can make it prettier and go win a hackathon.
However, a bit of a downside with deep learning is that it can only predict what it has been trained on. So, drawings of pets, night-time images of pets, and breeds that weren't included in the training set won't be accurately labelled.
We could handle the last case by turning this into a multi-label classification problem. Then, if the model isn't confident that the image shows any of the known breeds, it can just say it doesn't know the breed.
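I haven't built that version here, but the change would look roughly like the sketch below: wrap each label in a list so the block becomes multi-label, then judge predictions against a probability threshold instead of always picking the top breed. pets_multi, label_as_list, and the 0.95 threshold are all just illustrative:
# Hypothetical multi-label variant (not used in the rest of this post)
def label_as_list(fpath):
    # each image gets a one-element list of labels
    return [re.findall(r'(.+)_\d+.jpg$', fpath.name)[0]]

pets_multi = DataBlock(
    blocks = (ImageBlock, MultiCategoryBlock),
    get_items = partial(get_image_files, folders = 'images'),
    splitter = RandomSplitter(seed = 42),
    get_y = label_as_list,
    item_tfms = Resize(460),
    batch_tfms = aug_transforms(size = 224, min_scale = 0.75))
multi_learner = cnn_learner(pets_multi.dataloaders(path), resnet34,
                            metrics = partial(accuracy_multi, thresh = 0.95))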
As a bonus, we can reuse the same learner to compare two uploaded pets and guess whether they're the same breed:
def pair(a):
if not up1.data or not up2.data:
lbl.value = 'Please upload images.'
return
im1 = PILImage.create(up1.data[-1])
im2 = PILImage.create(up2.data[-1])
pred1, x, _ = learn.predict(im1)
pred2, y, _ = learn.predict(im2)
out1.clear_output()
out2.clear_output()
with out1:
display(im1.to_thumb(128, 128))
with out2:
display(im2.to_thumb(128, 128))
if x == y:
lbl.value = f'Wow, they\'re both {pretty(pred1)}(s)!'
else:
lbl.value = f'The first one seems to be {pretty(pred1)} while the second \
one is a(n) {pretty(pred2)}. I\'m not an expert, but they \
seem to be of different breeds, chief.'
up1 = widgets.FileUpload()
up2 = widgets.FileUpload()
lbl = widgets.Label()
out1 = widgets.Output()
out2 = widgets.Output()
run = widgets.Button(description = 'Classify')
run.on_click(pair)
VBox([
widgets.Label("Siamese Pairs"),
HBox([up1, up2]),
run,
HBox([out1, out2]),
lbl
])
You can now test out these cells here!