Fast.ai Lesson 3 of 7: Multi-label; Segmentation
Notes from Practical Deep Learning for Coders 2019 Lesson 3 (Part 1)
Other lessons: Lesson 1 / Lesson 2 / Lesson 4 / Lesson 5 / Lesson 6 / Lesson 7
Quick links: Fast.ai course page / Lecture / Jupyter Notebooks
This lesson is about multi-label prediction, and we’re starting off with the Planet Amazon dataset from Kaggle.
Install Kaggle in Jupyter with:
! {sys.executable} -m pip install kaggle --upgrade
Create credentials on Kaggle, download the data, and unzip it with 7zip.
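For reference, a rough sketch of the download and unzip steps, close to the lesson notebook (the competition name and archive file are those of the Planet competition; paths are assumed):
! kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p {path}
! 7za -bd -y -so x {path}/train-jpg.tar.7z | tar xf - -C {path}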
Now, we can take a look at the data. We use pandas to read the data.
Each picture can have multiple labels. The csv file containing the labels shows that each image name is associated with several tags separated by spaces:
The type of data object we use for modeling is the DataBunch class. Once we have a databunch, we can create a CNN with it and start training. Getting the data into a format that's good for training used to be the trickiest step in deep learning. Factory methods now let you specify what kind of data you want from a given source, but sometimes you want more flexibility. For that, we have the DataBlock API.
tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
Define some transformations, which will be passed in to the data object. get_transforms will by default flip each image randomly, but only horizontally; for satellite imagery we also want vertical flips (flip_vert=True). max_warp controls perspective warping: looking at an object from below or above changes its apparent shape, so simulating that while creating the training batches is useful augmentation for ordinary photos. But satellite images are always taken from directly above, so warping makes no sense in this use case and we turn max_warp off.
np.random.seed(42)
src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
       .split_by_rand_pct(0.2)
       .label_from_df(label_delim=' '))
You can specify the image files in a given folder, a given suffix, separator, and that you want a random validation set with 20% of the data, etc.
data = (src.transform(tfms, size=128)
        .databunch().normalize(imagenet_stats))
Finally, we set up the datasets from the source above, apply the tfms defined earlier, and create a databunch.
Relevant PyTorch classes
__getitem__ and __len__ mean that you can index into a Dataset object and get its length. Neither of them is implemented in PyTorch's abstract Dataset class; it just tells you what a dataset needs to do, and fast.ai has lots of Dataset subclasses that implement those functions. But a Dataset alone is not enough to train a model: we have to feed in a few images at a time so the GPU can work in parallel. That's a minibatch: a few items at a time for the model to train on in parallel. To create minibatches, we use another PyTorch class called a DataLoader (see the sketch after this list):
- Takes in a Dataset, grabs items at random, and creates a batch of the size you pass in. It wraps individual random items into a batch and pops them onto the GPU to send to the model
These still aren’t enough to train a model. Enter DataBunch, which binds the training and validation data together:
- Takes in a training set DataLoader and a validation set DataLoader; the result is what you pass to a Learner
- The src above creates the datasets: it specifies where the images come from and where the labels come from, and builds two datasets (training and validation), each returning two things: images and labels
- The data line creates the DataLoaders and the DataBunch in one go
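To make the Dataset/DataLoader relationship concrete, here is a minimal standalone PyTorch sketch (toy data; not fast.ai code):
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    # toy dataset: x is a number, y is its square
    def __init__(self, n): self.n = n
    def __len__(self): return self.n            # lets len(ds) work
    def __getitem__(self, i):                   # lets ds[i] work
        x = torch.tensor([float(i)])
        return x, x ** 2

ds = SquaresDataset(100)
dl = DataLoader(ds, batch_size=16, shuffle=True)  # random minibatches
xb, yb = next(iter(dl))
print(xb.shape, yb.shape)  # torch.Size([16, 1]) torch.Size([16, 1])
A DataBunch then wraps two such DataLoaders (training and validation) so a Learner can consume them.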
DataBlock API Examples
https://github.com/fastai/fastai/blob/master/docs_src/data_block.ipynb
This is saying that the data comes from an ImageList of files in some folder, and that they’re labeled according to the folder name. Split them into training and validation based on the folder they’re in. You can optionally add a test set. Transform the items with a set of transforms defined earlier, and convert the whole thing into a databunch. Each of the stages can also take parameters. The resulting datasets are actual Dataset objects, so you can index into them.
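As a concrete illustration of that pipeline, here is a sketch using the fastai v1 DataBlock API (path and tfms are assumed to be defined already):
data = (ImageList.from_folder(path)    # where the images come from
        .split_by_folder()             # train/valid from folder names
        .label_from_folder()           # label = parent folder name
        # .add_test_folder()           # optionally add a test set
        .transform(tfms, size=64)      # apply the transforms
        .databunch())                  # wrap into a DataBunch

data.train_ds[0]                       # index into the underlying Dataset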
We can also specify how we’re getting the labels by defining a get_y_fn and passing it to label_from_func(). There are a lot more examples in the notebook linked above. Remember you can use shift + tab in the Jupyter notebook to bring up documentation.
Multiclassification
I installed fast.ai from my local shell, SSH’d into my Compute Engine VM:
jupyter@my-fastai-instance:~/course-v3$ conda install -c fastai fastai
To create a Learner, we’ll set the base architecture to resnet50 again. The metrics are a little different this time: use accuracy_thresh instead of accuracy. In Lesson 1, we determined the prediction for a given class by picking the final activation that was biggest, but here there is no single label we’re looking for; there are several. If you run data.c you can see how many classes we want probabilities for, which is the same as the number of classes, so len(data.classes) will also give 17.
So instead, we pick a threshold: for any activation beyond it, we assume the model is saying the image does have that feature. After thresholding, each prediction becomes 0 or 1. Let’s say a reasonable threshold is 0.2: accuracy_thresh will select the activations above that threshold and compare them to the ground truth.
We use partial() to build acc_02: a new function that calls accuracy_thresh with the parameter thresh=0.2 baked in.
fbeta is the metric that was used by Kaggle in this competition. Note: metrics have nothing to do with how a model trains. Changing your metrics will not change your resulting model at all; they're just for logging during training. The metrics we want here are the accuracy (acc_02) and f_score, which is related to how Kaggle would judge the model. The F-score captures the trade-off between false positives and false negatives.
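Putting the metrics together, a short sketch of the learner setup along the lines of the lesson notebook (data and models.resnet50 assumed from above):
from functools import partial

acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)

learn = cnn_learner(data, models.resnet50, metrics=[acc_02, f_score])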
Find a good learning rate with learn.lr_find(): pick a value from the part of the plot with the steepest downward slope. Then run the learner.
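A sketch of that run, with a learning rate assumed to be read off the plot (values as in the lesson notebook):
learn.lr_find()
learn.recorder.plot()
lr = 0.01
learn.fit_one_cycle(5, slice(lr))
learn.save('stage-1-rn50')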
Accuracy is not bad, at around 95%
Next, unfreeze and fine-tune the model. Before you unfreeze, the learning-rate plot will usually show a steep downward slope, and you pick a point on that slope (not the actual bottom). When we call learn.recorder.plot() again after unfreezing, the shape is very different: look for the point right before the steep increase, and go back about 10x from there. So 1e-05: use this for the first half of the slice, and for the second half, use whatever learning rate you used for the frozen part divided by 5. This is basically Jeremy Howard’s rule of thumb (empirical).
learn.fit_one_cycle(5, slice(1e-5, lr/5))
The accuracy_thresh went up by a bit.
Remember that we built the databunch with an image size of 128x128, but the images in the Kaggle dataset are actually 256x256. We set it to 128 partly because we wanted to experiment quickly; it’s easier to experiment with smaller images. But now we have a model that’s pretty good at recognizing the contents of 128x128 images. So how do we apply it to 256x256? Transfer learning: start with the model that’s good at 128x128 and fine-tune it, instead of starting again.
Let’s keep the same learner but use a new databunch, which instead of 128x128 is 256x256:
# create a new databunch using 256 instead of 128
data = (src.transform(tfms, size=256)
        .databunch().normalize(imagenet_stats))

# use the same learner, but re-assign the data
learn.data = data
data.train_ds[0][0].shape
We freeze again, so that we go back to training just the last few layers, and run a new learn.lr_find()
Jupyter was giving me an error:
RuntimeError: CUDA out of memory. Tried to allocate 48.00 MiB (GPU 0; 7.43 GiB total capacity; 6.75 GiB already allocated; 12.94 MiB free; 50.78 MiB cached)
Solution was to restart the kernel. Saw some relevant posts on the forum.
If you run lr_find and plot again, the graph won't have the same sharp downward slope as before (because the model is already pretty good), but you can identify the point at which the loss starts to shoot up and use a rate safely below that as the new lr.
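A condensed sketch of this second stage at size 256, with learning rates that are assumptions read off the plots (the save name mirrors the lesson notebook):
learn.freeze()                             # train just the head again
learn.lr_find()
learn.recorder.plot()
lr = 1e-2 / 2                              # assumed value from the new plot
learn.fit_one_cycle(5, slice(lr))
learn.unfreeze()
learn.fit_one_cycle(5, slice(1e-5, lr / 5))
learn.save('stage-2-256-rn50')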
Image segmentation with CamVid
Prepare the data (untar it, see what’s in the folder). The labels are special in this case: they’re color-coded masks.
When looking at the files, you’ll notice a naming pattern in the image and label filenames; the pattern maps each image to its corresponding label file.
View the images in the set:
By inferring from the mapping structure, you can create a function to convert from image filenames to corresponding label filenames
get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
Next, use open_mask (since we want integers, not floats) to make sure it works. Normally we’d use open_image to look at an image file, but this is not a usual image file because it contains integers, so use open_mask to see the color-coded mask.
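A short sketch of inspecting a mask, as in the lesson notebook (img_f is assumed to be one of the image filenames):
mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5, 5), alpha=1)  # display the color-coded mask
mask.data                           # the underlying integer class codes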
We have a file called codes.txt and one called valid.txt
codes.txt is telling us which object corresponds to the number we see in the data:
4 is ‘Building’, and that orange chunk in the top left corner is in fact a building.
Datasets
Again, using the DataBlock API we can set the data source and create databunch objects:
We can pass in the classes parameter. In previous lessons, we had a string as the label, but in this case it's a number, so we pass the DataBlock API a list telling it what the numbers mean.
For transformations: sometimes we randomly flip the image. But if we randomly flip the independent variable (the x image) without also flipping the color-coded mask (the y image), they will no longer match. We need to tell fast.ai to transform the y as well (tfm_y=True): whatever you do to the x, you need to do to the y.
We use a smaller batch size: creating a classifier for every pixel takes a lot of GPU memory. Fast.ai will combine the two pieces for you and actually overlay the color-coded mask on the photo when you run show_batch()
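For reference, a sketch of the segmentation pipeline close to the lesson notebook (codes, size, and bs are assumed to be defined earlier; valid.txt lists the validation filenames):
src = (SegmentationItemList.from_folder(path_img)
       .split_by_fname_file('../valid.txt')
       .label_from_func(get_y_fn, classes=codes))

data = (src.transform(get_transforms(), size=size, tfm_y=True)  # transform y too
        .databunch(bs=bs)
        .normalize(imagenet_stats))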
Model
Accuracy in pixel segmentation: correctly classified pixels / total pixels
We’re using acc_camvid instead of accuracy. This is specific to this particular dataset: you could actually just use accuracy, but some pixels are labeled 'Void', and the CamVid paper says you should exclude the void pixels when reporting accuracy.
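A sketch of such a metric, essentially as in the lesson notebook (codes is the class list loaded from codes.txt):
name2id = {v: k for k, v in enumerate(codes)}
void_code = name2id['Void']

def acc_camvid(input, target):
    # per-pixel accuracy, ignoring pixels labeled 'Void'
    target = target.squeeze(1)
    mask = target != void_code
    return (input.argmax(dim=1)[mask] == target[mask]).float().mean()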
Similar to the steps in the Planet dataset, we’re going to initialize a learner and run it.
For segmentation, we’ll use a unet_learner instead of a cnn_learner. The U-Net is a CNN architecture for fast and precise segmentation of images. We pass it the normal stuff: databunch, architecture, metrics. The rest is the same: find the learning rate, save from time to time, unfreeze, and train a bit more.
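A one-line sketch of creating it, as in the lesson notebook (wd is weight decay, discussed in a later lesson):
learn = unet_learner(data, models.resnet34, metrics=acc_camvid, wd=1e-2)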
Tip if you’re running out of memory often: use mixed-precision training. Instead of using single-precision floating-point numbers, you can call to_fp16() to do most of the calculations in the model with half-precision floating point (16 bits instead of 32). You need very recent CUDA drivers for this to work because it's so new, but if you do have a very recent GPU it can also run about twice as fast, and results can sometimes even be better.
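Enabling it is just a chained call on the learner (a sketch, assuming the same learner as above):
learn = unet_learner(data, models.resnet34, metrics=acc_camvid).to_fp16()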
See how the learner is doing with show_results(), plot the losses with learn.recorder.plot_losses(), and the learning rate with learn.recorder.plot_lr()
Notice that the losses go up a bit before going down.
And why does the learning rate go up and then down? Because we use fit_one_cycle(), which makes the learning rate start low, rise, and then fall again.
Getting the right learning rate is important because it minimizes the loss faster. You want your learning rate to decrease so that as you get closer to the minimum loss, you take smaller and smaller steps.
This is an illustration of the ideal learning rate: start from top right, rate is still changing/adapting but when it gets close to the bottom (lowest point), it should kind of just stay there — taking smaller and smaller steps.
Summary of the iterative training process (a condensed code sketch follows this list):
- Prepare the data with src = ImageList, SegmentationItemList, or PointsItemList, and generate DataBunch objects with src.transform().databunch().normalize()
- Specify the architecture (e.g. models.resnet50)
- Specify the metrics, e.g. accuracy_thresh, accuracy, or f_score
- Choose a learner, e.g. cnn_learner or unet_learner. Remember, what a learner does is minimize a loss function by finding the lowest point of that function
- Find the learning rate by first looking for the ideal lr using learn.lr_find(), and plot it with learn.recorder.plot()
- Run the learner with learn.fit_one_cycle()
- Save this stage with learn.save()
- Inspect results with learn.show_results()
- Unfreeze the model with learn.load(<stage>) and learn.unfreeze()
- Update the learning rate with learn.lr_find()
- Run learn.fit_one_cycle() again, with the new lr using slice()
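A condensed sketch of that whole loop, with illustrative values (the learning rates are assumptions you would read off the plots):
learn = cnn_learner(data, models.resnet50, metrics=accuracy_thresh)
learn.lr_find()
learn.recorder.plot()
lr = 1e-2                                   # picked off the plot
learn.fit_one_cycle(5, slice(lr))
learn.save('stage-1')
learn.show_results()
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(5, slice(1e-5, lr / 5)) # new lr range
learn.save('stage-2')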
Regression with BIWI head pose dataset
This is a different type of dataset — pictures with a dot at the center of a person’s face. So we’ll train a model to recognize the center of the face.
There are a few pre-processing steps that are particular to this dataset. So there are some conversion routines that we don’t need to spend too much time on.
But it’s important to note that the labels will be ImagePoints (coordinates) instead of categories.
def get_ip(img, pts):
    return ImagePoints(FlowField(img.size, pts), scale=True)
These image points are coordinates indicating the center of the face. We’re creating a model that can pinpoint that center, which means it needs to spit out two numbers. This is not a classification model; it’s a regression model: any kind of model with a continuous output. The good news is that we can approach it the same way as before.
data = (PointsItemList.from_folder(path)
        .split_by_valid_func(lambda o: o.parent.name=='13')
        .label_from_func(get_ctr)
        .transform(get_transforms(), tfm_y=True, size=(120,160))
        .databunch()
        .normalize(imagenet_stats))
The validation set in this case is just one particular person in the dataset, hence the argument to split_by_valid_func() above.
We again create a CNN. While traditionally we use cross-entropy loss for classification, for regression we’ll use the Mean Square Error loss function.
learn = cnn_learner(data, models.resnet34)
learn.loss_func = MSELossFlat()
Find a learning rate with lr_find() and run the learner. We want the valid_loss to go lower and lower. Inspect the results with show_results()
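A sketch of that sequence (the learning rate is an assumed value from the plot):
learn.lr_find()
learn.recorder.plot()
lr = 2e-2
learn.fit_one_cycle(5, slice(lr))
learn.show_results()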
IMDB
Let’s do some NLP and classify some documents. Each row in the data has a review (text), a label (positive or negative), and a flag to indicate whether it should be part of the validation set or the training set. We can ignore this flag and create a DataBunch containing the data:
data_lm = TextDataBunch.from_csv(path, 'texts.csv')
Images could be fed almost directly into the model because images are just a big array of pixel values, which are floats between 0 and 1.
But a piece of text has words, and we need to convert those words to numbers. This is done with 2 steps:
- Tokenization
- Numericalization
The TextDataBunch class we used above does all of that for you.
Data can be saved and loaded:
data_lm.save()
data = load_data(path)
Tokenization
Each token is a word or other individual unit of text.
data = TextClasDataBunch.from_csv(path, 'texts.csv')
data.show_batch()
Words that are extremely rare/unusual will be replaced with the UNK token.
Numericalization
We now convert tokens into integers by building a list of all the words used. The ids-to-tokens mapping is stored in data.vocab, in a dictionary called itos (for int-to-string). Use data.train_ds to see the training dataset.
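A quick sketch of poking at those structures (attribute names as in fastai v1):
data.vocab.itos[:10]            # the first ten tokens in the vocabulary
data.train_ds[0][0]             # a tokenized, numericalized example
data.train_ds[0][0].data[:10]   # the underlying integer ids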
Again, we can use the DataBlock API with NLP, calling the tokenize and numericalize steps ourselves for more flexibility:
data = (TextList.from_csv(path, 'texts.csv', cols='text')
.split_from_df(col=2)
.label_from_df(cols=0)
.databunch())
Language models can use a lot of GPU, so we want to decrease the batch size. Set bs = 48
The reviews are in a training and test set following an imagenet structure. In addition to the train and test data, there's an unsup folder that contains unlabelled data.
We’re not going to train a model that classifies the reviews from scratch. Like in computer vision, we’ll use a model pretrained on a bigger dataset (a cleaned subset of wikipedia called wikitext-103). That model has been trained to guess what the next word is, its input being all the previous words. It has a recurrent structure and a hidden state that is updated each time it sees a new word. This hidden state thus contains information about the sentence up to that point.
We still need to fine-tune the pretrained model to our particular dataset. So we use the unlabelled data for this:
data_lm = (TextList.from_folder(path)
#Inputs: all the text files in path
.filter_by_folder(include=['train', 'test', 'unsup'])
#We may have other temp folders that contain text files so we only keep what's in train, test and unsup
.split_by_rand_pct(0.1)
#We randomly split and keep 10% (10,000 reviews) for validation
.label_for_lm()
#We want to do a language model so we label accordingly
.databunch(bs=bs))
data_lm.save('data_lm.pkl')
We use a special kind of TextDataBunch that ignores the labels, shuffles the texts at each epoch before concatenating them all together (only for training), and sends batches that read the text in order, with targets that are the next word in the sentence. We saved the data in data_lm.pkl, so we can load it and look at it:
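Loading it back, as in the lesson notebook:
data_lm = load_data(path, 'data_lm.pkl', bs=bs)
data_lm.show_batch()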
Use a language_model_learner and follow the same steps: find the learning rate, run the learner, and tune the lr.
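A closing sketch of that learner, close to the lesson notebook (drop_mult scales dropout; the learning rate and momentums are the notebook's values):
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(1, 1e-2, moms=(0.8, 0.7))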