Fast.ai Lesson 3 of 7: Multi-label; Segmentation
Notes from Practical Deep Learning for Coders 2019 Lesson 3 (Part 1)
Other lessons: Lesson 1 / Lesson 2 / Lesson 4 / Lesson 5 / Lesson 6 / Lesson 7
Quick links: Fast.ai course page / Lecture / Jupyter Notebooks
This lesson is about multi-label prediction, and we’re starting off with the Planet Amazon dataset from Kaggle.
Install Kaggle in Jupyter with:
! {sys.executable} -m pip install kaggle --upgrade
Create credentials on Kaggle, download the data, and unzip it with 7zip.
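For reference, a rough sketch of the download and unzip steps, close to the lesson notebook (the competition name and archive file are those of the Planet competition; paths are assumed):
! kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p {path}
! 7za -bd -y -so x {path}/train-jpg.tar.7z | tar xf - -C {path}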
Now, we can take a look at the data. We use pandas to read the data.
Each picture can have multiple labels. The csv file containing the labels shows that each image name is associated with several tags separated by spaces:
The type of data object we use for modeling is the DataBunch class. Once we have a databunch, we can create a CNN with it and start training. Getting the data into a format that's good for training used to be the trickiest step in deep learning. Factory methods now let you specify what kind of data you want from a given source, but sometimes you want more flexibility. For that, we have the DataBlock API.
tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
Define some transformations, which will be passed in to the data object. get_transforms will by default flip each image randomly, but only horizontally; for satellite imagery we also want vertical flips (flip_vert=True). max_warp controls perspective warping: looking at an object from below or above changes its apparent shape, so simulating that while creating the training batches is useful augmentation for ordinary photos. But satellite images are always taken from directly above, so warping makes no sense in this use case and we turn max_warp off.
np.random.seed(42)
src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
       .split_by_rand_pct(0.2)
       .label_from_df(label_delim=' '))
You can specify the image files in a given folder, a given suffix, separator, and that you want a random validation set with 20% of the data, etc.
data = (src.transform(tfms, size=128)
        .databunch().normalize(imagenet_stats))
Finally, we set up the datasets from the source above, apply the tfms defined earlier, and create a databunch.
Relevant PyTorch classes
__getitem__ and __len__ mean that you can index into a Dataset object and get its length. Neither of them is implemented in PyTorch's abstract Dataset class; it just tells you what a dataset needs to do, and fast.ai has lots of Dataset subclasses that implement those functions. But a Dataset alone is not enough to train a model: we have to feed in a few images at a time so the GPU can work in parallel. That's a minibatch: a few items at a time for the model to train on in parallel. To create minibatches, we use another PyTorch class called a DataLoader (see the sketch after this list):
- Takes in a Dataset, grabs items at random, and creates a batch of the size you pass in. It wraps individual random items into a batch and pops them onto the GPU to send to the model
These still aren’t enough to train a model. Enter DataBunch, which binds the training and validation data together:
- Takes in a training set DataLoader and a validation set DataLoader; the result is what you pass to a Learner
- The src above creates the datasets: it specifies where the images come from and where the labels come from, and builds two datasets (training and validation), each returning two things: images and labels
- The data line creates the DataLoaders and the DataBunch in one go
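To make the Dataset/DataLoader relationship concrete, here is a minimal standalone PyTorch sketch (toy data; not fast.ai code):
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    # toy dataset: x is a number, y is its square
    def __init__(self, n): self.n = n
    def __len__(self): return self.n            # lets len(ds) work
    def __getitem__(self, i):                   # lets ds[i] work
        x = torch.tensor([float(i)])
        return x, x ** 2

ds = SquaresDataset(100)
dl = DataLoader(ds, batch_size=16, shuffle=True)  # random minibatches
xb, yb = next(iter(dl))
print(xb.shape, yb.shape)  # torch.Size([16, 1]) torch.Size([16, 1])
A DataBunch then wraps two such DataLoaders (training and validation) so a Learner can consume them.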
DataBlock API Examples
https://github.com/fastai/fastai/blob/master/docs_src/data_block.ipynb
This is saying that the data comes from an ImageList of files in some folder, and that they’re labeled according to the folder name. Split them into training and validation based on the folder they’re in. You can optionally add a test set. Transform the items with a set of transforms defined earlier, and convert the whole thing into a databunch. Each of the stages can also take parameters. The resulting datasets are actual Dataset objects, so you can index into them.
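As a concrete illustration of that pipeline, here is a sketch using the fastai v1 DataBlock API (path and tfms are assumed to be defined already):
data = (ImageList.from_folder(path)    # where the images come from
        .split_by_folder()             # train/valid from folder names
        .label_from_folder()           # label = parent folder name
        # .add_test_folder()           # optionally add a test set
        .transform(tfms, size=64)      # apply the transforms
        .databunch())                  # wrap into a DataBunch

data.train_ds[0]                       # index into the underlying Dataset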
We can also specify how we’re getting the labels by defining a get_y_fn and passing it to label_from_func(). There are a lot more examples in the notebook linked above. Remember you can use shift + tab in the Jupyter notebook to bring up documentation.
Multiclassification
I installed fast.ai from my local shell, SSH’d into my Compute Engine VM:
jupyter@my-fastai-instance:~/course-v3$ conda install -c fastai fastai
To create a Learner, we’ll set the base architecture to resnet50 again. The metrics are a little different this time: use accuracy_thresh instead of accuracy. In Lesson 1, we determined the prediction for a given class by picking the final activation that was biggest, but here there is no single label we’re looking for; there are several. If you run data.c you can see how many classes we want probabilities for, which is the same as the number of classes, so len(data.classes) will also give 17.
So instead, we pick a threshold: for any activation beyond it, we assume the model is saying the image does have that feature. After thresholding, each prediction becomes 0 or 1. Let’s say a reasonable threshold is 0.2: accuracy_thresh will select the activations above that threshold and compare them to the ground truth.
We use partial() to build acc_02: a new function that calls accuracy_thresh with the parameter thresh=0.2 baked in.
fbeta is the metric that was used by Kaggle in this competition. Note: metrics have nothing to do with how a model trains. Changing your metrics will not change your resulting model at all; they're just for logging during training. The metrics we want here are the accuracy (acc_02) and f_score, which is related to how Kaggle would judge the model. The F-score captures the trade-off between false positives and false negatives.
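Putting the metrics together, a short sketch of the learner setup along the lines of the lesson notebook (data and models.resnet50 assumed from above):
from functools import partial

acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)

learn = cnn_learner(data, models.resnet50, metrics=[acc_02, f_score])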
Find a good learning rate with learn.lr_find(): pick a value from the part of the plot with the steepest downward slope. Then run the learner.
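A sketch of that run, with a learning rate assumed to be read off the plot (values as in the lesson notebook):
learn.lr_find()
learn.recorder.plot()
lr = 0.01
learn.fit_one_cycle(5, slice(lr))
learn.save('stage-1-rn50')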
Accuracy is not bad, at around 95%
Next, unfreeze and fine-tune the model. Before you unfreeze, the learning-rate plot will usually show a steep downward slope, and you pick a point on that slope (not the actual bottom). When we call learn.recorder.plot() again after unfreezing, the shape is very different: look for the point right before the steep increase, and go back about 10x from there. So 1e-05: use this for the first half of the slice, and for the second half, use whatever learning rate you used for the frozen part divided by 5. This is basically Jeremy Howard’s rule of thumb (empirical).
learn.fit_one_cycle(5, slice(1e-5, lr/5))
The accuracy_thresh went up by a bit.
Remember that we built the databunch with an image size of 128x128, but the images in the Kaggle dataset are actually 256x256. We set it to 128 partly because we wanted to experiment quickly; it’s easier to experiment with smaller images. But now we have a model that’s pretty good at recognizing the contents of 128x128 images. So how do we apply it to 256x256? Transfer learning: start with the model that’s good at 128x128 and fine-tune it, instead of starting again.
Let’s keep the same learner but use a new databunch, which instead of 128x128 is 256x256:
# create a new databunch using 256 instead of 128
data = (src.transform(tfms, size=256)
        .databunch().normalize(imagenet_stats))

# use the same learner, but re-assign the data
learn.data = data
data.train_ds[0][0].shape
We freeze again, so that we go back to training just the last few layers, and run a new learn.lr_find()
Jupyter was giving me an error:
RuntimeError: CUDA out of memory. Tried to allocate 48.00 MiB (GPU 0; 7.43 GiB total capacity; 6.75 GiB already allocated; 12.94 MiB free; 50.78 MiB cached)
Solution was to restart the kernel. Saw some relevant posts on the forum.
If you run lr_find and plot again, the graph won't have the same sharp downward slope as before (because the model is already pretty good), but you can identify the point at which the loss starts to shoot up and use a rate safely below that as the new lr.
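A condensed sketch of this second stage at size 256, with learning rates that are assumptions read off the plots (the save name mirrors the lesson notebook):
learn.freeze()                             # train just the head again
learn.lr_find()
learn.recorder.plot()
lr = 1e-2 / 2                              # assumed value from the new plot
learn.fit_one_cycle(5, slice(lr))
learn.unfreeze()
learn.fit_one_cycle(5, slice(1e-5, lr / 5))
learn.save('stage-2-256-rn50')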
Image segmentation with CamVid
Prepare the data (untar it, see what’s in the folder). The labels are special in this case: they’re color-coded masks.
When looking at the files, you’ll notice a naming pattern in the image and label filenames; the pattern maps each image to its corresponding label file.
View the images in the set:
By inferring from the mapping structure, you can create a function to convert from image filenames to corresponding label filenames
get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
Next, use open_mask (since we want integers, not floats) to make sure it works. Normally we’d use open_image to look at an image file, but this is not a usual image file because it contains integers, so use open_mask to see the color-coded mask.
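A short sketch of inspecting a mask, as in the lesson notebook (img_f is assumed to be one of the image filenames):
mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5, 5), alpha=1)  # display the color-coded mask
mask.data                           # the underlying integer class codes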
We have a file called codes.txt and one called valid.txt
codes.txt is telling us which object corresponds to the number we see in the data:
4 is ‘Building’, and that orange chunk in the top left corner is in fact a building.
Datasets
Again, using the DataBlock API we can set the data source and create databunch objects:
We can pass in the classes parameter. In previous lessons, we had a string as the label, but in this case it's a number, so we pass the DataBlock API a list telling it what the numbers mean.
For transformations: sometimes we randomly flip the image. But if we randomly flip the independent variable (the x image) without also flipping the color-coded mask (the y image), they will no longer match. We need to tell fast.ai to transform the y as well (tfm_y=True): whatever you do to the x, you need to do to the y.
We use a smaller batch size: creating a classifier for every pixel takes a lot of GPU memory. Fast.ai will combine the two pieces for you and actually overlay the color-coded mask on the photo when you run show_batch()
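For reference, a sketch of the segmentation pipeline close to the lesson notebook (codes, size, and bs are assumed to be defined earlier; valid.txt lists the validation filenames):
src = (SegmentationItemList.from_folder(path_img)
       .split_by_fname_file('../valid.txt')
       .label_from_func(get_y_fn, classes=codes))

data = (src.transform(get_transforms(), size=size, tfm_y=True)  # transform y too
        .databunch(bs=bs)
        .normalize(imagenet_stats))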
Model
Accuracy in pixel segmentation: correctly classified pixels / total pixels
We’re using acc_camvid instead of accuracy. This is specific to this particular dataset: you could actually just use accuracy, but some pixels are labeled 'Void', and the CamVid paper says you should exclude the void pixels when reporting accuracy.
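A sketch of such a metric, essentially as in the lesson notebook (codes is the class list loaded from codes.txt):
name2id = {v: k for k, v in enumerate(codes)}
void_code = name2id['Void']

def acc_camvid(input, target):
    # per-pixel accuracy, ignoring pixels labeled 'Void'
    target = target.squeeze(1)
    mask = target != void_code
    return (input.argmax(dim=1)[mask] == target[mask]).float().mean()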
Similar to the steps in the Planet dataset, we’re going to initialize a learner and run it.
For segmentation, we’ll use a unet_learner instead of a cnn_learner. The U-Net is a CNN architecture for fast and precise segmentation of images. We pass it the normal stuff: databunch, architecture, metrics. The rest is the same: find the learning rate, save from time to time, unfreeze, and train a bit more.
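A one-line sketch of creating it, as in the lesson notebook (wd is weight decay, discussed in a later lesson):
learn = unet_learner(data, models.resnet34, metrics=acc_camvid, wd=1e-2)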
Tip if you’re running out of memory often: use mixed-precision training. Instead of using single-precision floating-point numbers, you can call to_fp16() to do most of the calculations in the model with half-precision floating point (16 bits instead of 32). You need very recent CUDA drivers for this to work because it's so new, but if you do have a very recent GPU it can also run about twice as fast, and results can sometimes even be better.
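Enabling it is just a chained call on the learner (a sketch, assuming the same learner as above):
learn = unet_learner(data, models.resnet34, metrics=acc_camvid).to_fp16()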
See how the learner is doing with show_results(), plot the losses with learn.recorder.plot_losses(), and the learning rate with learn.recorder.plot_lr()
Notice that the losses go up a bit before going down.
And why does the learning rate go up and then down? Because we use fit_one_cycle(), which makes the learning rate start low, rise, and then fall again.
Getting the right learning rate is important because it minimizes the loss faster. You want your learning rate to decrease so that as you get closer to the minimum loss, you take smaller and smaller steps.
This is an illustration of the ideal learning rate: start from top right, rate is still changing/adapting but when it gets close to the bottom (lowest point), it should kind of just stay there — taking smaller and smaller steps.
Summary of the iterative training process (a condensed code sketch follows this list):
- Prepare the data with src = ImageList, SegmentationItemList, or PointsItemList, and generate DataBunch objects with src.transform().databunch().normalize()
- Specify the architecture (e.g. models.resnet50)
- Specify the metrics, e.g. accuracy_thresh, accuracy, or f_score
- Choose a learner, e.g. cnn_learner or unet_learner. Remember, what a learner does is minimize a loss function by finding the lowest point of that function
- Find the learning rate by first looking for the ideal lr using learn.lr_find(), and plot it with learn.recorder.plot()
- Run the learner with learn.fit_one_cycle()
- Save this stage with learn.save()
- Inspect results with learn.show_results()
- Unfreeze the model with learn.load(<stage>) and learn.unfreeze()
- Update the learning rate with learn.lr_find()
- Run learn.fit_one_cycle() again, with the new lr using slice()
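A condensed sketch of that whole loop, with illustrative values (the learning rates are assumptions you would read off the plots):
learn = cnn_learner(data, models.resnet50, metrics=accuracy_thresh)
learn.lr_find()
learn.recorder.plot()
lr = 1e-2                                   # picked off the plot
learn.fit_one_cycle(5, slice(lr))
learn.save('stage-1')
learn.show_results()
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(5, slice(1e-5, lr / 5)) # new lr range
learn.save('stage-2')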
Regression with BIWI head pose dataset
This is a different type of dataset — pictures with a dot at the center of a person’s face. So we’ll train a model to recognize the center of the face.
There are a few pre-processing steps that are particular to this dataset. So there are some conversion routines that we don’t need to spend too much time on.
But it’s important to note that the labels will be ImagePoints (coordinates) instead of categories.
def get_ip(img, pts):
    return ImagePoints(FlowField(img.size, pts), scale=True)
These image points are coordinates indicating the center of the face. We’re creating a model that can pinpoint that center, which means it needs to spit out two numbers. This is not a classification model; it’s a regression model: any kind of model with a continuous output. The good news is that we can approach it the same way as before.
data = (PointsItemList.from_folder(path)
        .split_by_valid_func(lambda o: o.parent.name=='13')
        .label_from_func(get_ctr)
        .transform(get_transforms(), tfm_y=True, size=(120,160))
        .databunch()
        .normalize(imagenet_stats))
The validation set in this case is just one particular person in the dataset, hence the argument to split_by_valid_func() above.
We again create a CNN. While traditionally we use cross-entropy loss for classification, for regression we’ll use the Mean Square Error loss function.
learn = cnn_learner(data, models.resnet34)
learn.loss_func = MSELossFlat()
Find a learning rate with lr_find() and run the learner. We want the valid_loss to go lower and lower. Inspect the results with show_results()
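A sketch of that sequence (the learning rate is an assumed value from the plot):
learn.lr_find()
learn.recorder.plot()
lr = 2e-2
learn.fit_one_cycle(5, slice(lr))
learn.show_results()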
IMDB
Let’s do some NLP and classify some documents. Each row in the data has a review (text), a label (positive or negative), and a flag to indicate whether it should be part of the validation set or the training set. We can ignore this flag and create a DataBunch containing the data:
data_lm = TextDataBunch.from_csv(path, 'texts.csv')
Images could be fed almost directly into the model because images are just a big array of pixel values, which are floats between 0 and 1.
But a piece of text has words, and we need to convert those words to numbers. This is done with 2 steps:
- Tokenization
- Numericalization
The TextDataBunch class we used above does all of that for you.
Data can be saved and loaded:
data_lm.save()
data = load_data(path)
Tokenization
Each token is a word or other individual unit of text.
data = TextClasDataBunch.from_csv(path, 'texts.csv')
data.show_batch()
Words that are extremely rare/unusual will be replaced with the UNK token.
Numericalization
We now convert tokens into integers by building a list of all the words used. The ids-to-tokens mapping is stored in data.vocab, in a dictionary called itos (for int-to-string). Use data.train_ds to see the training dataset.
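A quick sketch of poking at those structures (attribute names as in fastai v1):
data.vocab.itos[:10]            # the first ten tokens in the vocabulary
data.train_ds[0][0]             # a tokenized, numericalized example
data.train_ds[0][0].data[:10]   # the underlying integer ids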
Again, we can use the DataBlock API with NLP, calling the tokenize and numericalize steps ourselves for more flexibility:
data = (TextList.from_csv(path, 'texts.csv', cols='text')
.split_from_df(col=2)
.label_from_df(cols=0)
.databunch())
Language models can use a lot of GPU, so we want to decrease the batch size. Set bs = 48
The reviews are in a training and test set following an imagenet structure. In addition to the train and test data, there's an unsup folder that contains unlabelled data.
We’re not going to train a model that classifies the reviews from scratch. Like in computer vision, we’ll use a model pretrained on a bigger dataset (a cleaned subset of wikipedia called wikitext-103). That model has been trained to guess what the next word is, its input being all the previous words. It has a recurrent structure and a hidden state that is updated each time it sees a new word. This hidden state thus contains information about the sentence up to that point.
We still need to fine-tune the pretrained model to our particular dataset. So we use the unlabelled data for this:
data_lm = (TextList.from_folder(path)
#Inputs: all the text files in path
.filter_by_folder(include=['train', 'test', 'unsup'])
#We may have other temp folders that contain text files so we only keep what's in train, test and unsup
.split_by_rand_pct(0.1)
#We randomly split and keep 10% (10,000 reviews) for validation
.label_for_lm()
#We want to do a language model so we label accordingly
.databunch(bs=bs))
data_lm.save('data_lm.pkl')
We use a special kind of TextDataBunch that ignores the labels, shuffles the texts at each epoch before concatenating them all together (only for training), and sends batches that read the text in order, with targets that are the next word in the sentence. We saved the data in data_lm.pkl, so we can load it and look at it:
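Loading it back, as in the lesson notebook:
data_lm = load_data(path, 'data_lm.pkl', bs=bs)
data_lm.show_batch()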
Use a language_model_learner and follow the same steps: find the learning rate, run the learner, and tune the lr.
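A closing sketch of that learner, close to the lesson notebook (drop_mult scales dropout; the learning rate and momentums are the notebook's values):
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(1, 1e-2, moms=(0.8, 0.7))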