Object Detection Training For Beginners

There are many interesting things you can do with computer vision, including detecting custom objects. In this video, we look at how to train a computer vision algorithm (YOLOv8) to detect custom items.


"In this video, I’m going to show you how to train a computer vision algorithm, to detect a custom object.

Ok, so what object are we going to perform training on?

Well, surfers. Let me explain.

So, in a prior video, I showed you how to perform object detection using YOLOv8 and OpenCV.

Now, we were able to use the default models in YOLO to detect the surfers in a video. Well, actually we weren't detecting surfers per se, we were detecting people. Now you might be thinking, surfers are people, so why am I making this distinction between surfers and people? Well, I'll come back to this in a moment here.

So, how well did the default model work to detect surfers?

Well, it did ok, I mean it detected some of the surfers, but towards the end of the video, it missed most of the surfers that were off in the distance.

Ok, so, what can we do, to improve the accuracy of object detection?

Well, one thing we tried in the prior video was to use the largest model, which did improve the accuracy, but it still missed many of the surfers. And there's a tradeoff: the larger models are much slower, so our framerate wasn't great, a bit slower than we'd like.

Ok, so what can we do to improve both the speed and the accuracy of inference?

Well, what if we trained the machine learning model specifically on surfers? In theory, doing this could significantly improve the accuracy, as surfers do look a bit different visually than people in general, so training specifically on surfers could lead to better results. And we could potentially use a smaller model, which would be faster.

Alright, so how do you go about training the machine-learning model in YOLOv8?

Well, it's actually not that hard to do, but it is time-consuming and a bit tedious.

Now, at a high level, training consists of collecting a bunch of images that include the object or objects you wish to detect.

Next, you need to draw bounding boxes around the objects in the images, then basically, you start the training process and pass along these labeled images, then you’ll get an updated model file as an output to the training process.

Now, just so you know, there are tools to help you do the labeling, and I'm going to use one of my favorite labeling tools, which is provided by Roboflow.

So, I'm going to show you how to label with Roboflow, but then I'll explain what's happening underneath the hood when we use a tool like Roboflow.

Ok, to begin labeling, I'll sign in to Roboflow with my free account.

Next, I'll create a new project.

This project will be for object detection, and we're detecting surfers, and the project name will be surfer_vision, and I'll create this public project.

The next thing we need to do is add a bunch of images of surfers to perform training on.

Now in our case, I'm going to use a couple of videos as sources for images.

So, I'll just drag and drop a video file into the upload section of Roboflow.

Ok, so now we're being asked how to sample the video. In other words, this tool wants to know how many frames to use from this video.

I'll adjust the sample rate to 30 frames per second, which will give us 483 images to use for training purposes.

Next, I'll click choose frame rate, and as you can see, the tool is breaking the video up into individual frames, which I'm going to start labeling or annotating in a moment here.
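By the way, sampling a video at a given frame rate just means keeping every (source fps ÷ sample fps)-th frame. Here's a rough sketch of that selection logic in plain Python; the frame counts and rates below are made-up illustrations, not taken from these videos:

```python
def sampled_frame_indices(total_frames: int, source_fps: float, sample_fps: float) -> list[int]:
    """Indices of the frames kept when sampling a video at sample_fps
    from footage recorded at source_fps."""
    step = source_fps / sample_fps  # keep one frame every `step` source frames
    indices = []
    position = 0.0
    while round(position) < total_frames:
        indices.append(round(position))
        position += step
    return indices

# e.g. a 10-second clip shot at 60 fps, sampled at 30 fps, keeps 300 frames:
print(len(sampled_frame_indices(600, 60.0, 30.0)))  # 300
```

So when the sample rate matches the source frame rate, every frame becomes a training image, which is why these short clips yield hundreds of images each.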

Now I'm going to add one more video to our training dataset, then I'll set the sample rate to 30 frames per second, which gives us 517 images to work with, then I'll click "choose frame rate".

Ok, so in total we've got 912 images to train with.

Now, in a production scenario, you'll likely want to train on many more images than this, but this is good enough for our purposes.

Next, I'll click "save and continue" and now the images are being uploaded to Roboflow.

Now I'll assign all of these images to myself, by clicking my name and then I'll click the assign images button.

Next, I'll click the start annotating button, and then I'll start drawing bounding boxes around every surfer that's visible.

So this is the tedious part of training, which can be very time-consuming.

Now, after I've labeled all the surfers in one image, I'll advance to the next image and annotate this image, and I'll keep on going until all the images are labeled.

Ok, I wanted to show you how to label images, but I’ll be honest with you, I’m not looking forward to labeling all these images, as it’s a lot of work that I’d rather avoid if possible.

Alright, so are there any shortcuts we could take?

Well, I’m wondering if somebody has already created a dataset like this that we could use, in lieu of performing the labeling ourselves.

So, where do we look for a surfers training dataset?

Well, there are various places on the internet that host open datasets like this, and in fact, Roboflow hosts thousands of open-source image sets you can use, so let's see if we can find a surfer dataset here on Roboflow.

To do this, I'll back out of this screen, then I'll click on Universe, then I'll search for the term 'surfers'.

Ok, check this dataset out. It's a surfer dataset with over 8000 images, which looks promising.

Let's take a closer look, and yeah, this looks like a dataset we could use, cool.

I'll go ahead and click the "Download this dataset" button, then I'll verify that the export format is YOLOv8, then I'll click "continue".

Now we're given a few options for downloading the dataset.

We can download it into a Jupyter notebook with this snippet here, which we're going to use, so I'll go ahead and copy the snippet.

Or, alternatively, we could download the dataset from the terminal, or we could get a raw url to download as well.

Ok, so now that we've got a dataset we can use, the question is, how do we use it?

Well, I’ll show you how to perform training in a Jupyter Notebook, so I'm going to open a Google Colab notebook, which we'll use to train our new model.

Now, you don't have to use Google Colab, you could use your own computer, so long as you've got a usable GPU.

I like to use Google Colab because I can use GPUs that are faster than what's on my M1 Mac.

Ok, let me walk you through the cells in this notebook.

First, I'll run this cell, which executes the nvidia-smi command to verify we've got access to a GPU for training purposes. And it looks like the command wasn't found, which means there's no GPU available, so we need to change our runtime to include a GPU.

To do this, I'll click "runtime" then "change runtime type", then I'll set the hardware accelerator to "GPU" and I'll set the GPU type to A100.

Now, if you're following along with me, you might not be able to select A100, because you have to buy credits to use this GPU, but I believe you can choose T4 on a free account, so long as the resources are available.

Next I'll click save, and I'll run this nvidia-smi command again, and this time we get a valid response back, which again means we've got a GPU we can use for training.

Next I'll run this cell, which sets a variable to the current working directory.
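That cell is likely just capturing the working directory into a variable for later path-building; HOME is the variable name I'm assuming here:

```python
import os

# Remember the notebook's working directory so later cells can
# build absolute paths from it.
HOME = os.getcwd()
print(HOME)
```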

Now I'll install ultralytics, which is the package that contains YOLO, then I'll verify it's installed by calling checks.
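The install-and-check cells are likely just the standard ultralytics setup, something like this (in a Colab notebook these lines would be prefixed with `!`):

```shell
# Install the ultralytics package, which provides YOLOv8.
pip install ultralytics

# Sanity-check the install and the runtime environment.
python -c "import ultralytics; ultralytics.checks()"
```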

In the next cell, we're going to download our training dataset, so first I'm making a directory named datasets and cd'ing into that directory.

Now, just below this I'll paste in the snippet I copied earlier from here.
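Put together, the download cell looks roughly like this; the API key, workspace, project name, and version number below are placeholders, since the real values come from the snippet Roboflow generates for you:

```python
import os

# Create a datasets directory and move into it, so the export lands there.
os.makedirs("datasets", exist_ok=True)
os.chdir("datasets")

# Snippet copied from Roboflow (placeholder values shown here):
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("some-workspace").project("surfers")
dataset = project.version(1).download("yolov8")
```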

And now I'll run this cell.

So, what's happening now is the image dataset we created is getting downloaded into our notebook environment, where we can use it for training purposes.

Now, you don't really need to know exactly what's getting downloaded, but for those of you who are curious, we'll take a closer look.

So, I'll click on the files icon, then I'll open up the datasets directory that I created a moment ago, then I'll open up the surfer-1 directory, and here we see two folders: train and valid, and inside these folders you'll find two directories, images and labels.

So what's in these folders? Well, you could probably guess, it's the images which you can find in here, and the associated labels or bounding box data in here.
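For the curious: each image in images/ has a same-named .txt file in labels/, where every line is one box in YOLO's normalized format: class index, then center x, center y, width, and height, each as a fraction of the image size. Here's a small sketch of decoding one such line back into pixel coordinates; the label line and image size are invented for illustration:

```python
def yolo_label_to_pixels(line: str, img_w: int, img_h: int) -> tuple[int, int, int, int, int]:
    """Convert one YOLO-format label line into
    (class_id, x_min, y_min, x_max, y_max) in pixel coordinates."""
    cls, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    return int(cls), int(cx - w / 2), int(cy - h / 2), int(cx + w / 2), int(cy + h / 2)

# A surfer (class 0) centered in a 640x640 frame, box 10% of the image wide and tall:
print(yolo_label_to_pixels("0 0.5 0.5 0.1 0.1", 640, 640))  # (0, 288, 288, 352, 352)
```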

I'm not going to open these up, because they have more than eight thousand files in each of them, and it would take a bit of time to even see the images.

So, we've got images and corresponding labels, and we've got this data.yaml file.

This file is the config file for our dataset, and it includes the class names, of which there's only one in our case, and it specifies the paths to the training and validation image sets.
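For reference, a single-class data.yaml from a Roboflow YOLOv8 export typically looks something like this; the exact paths depend on your export:

```yaml
train: ../train/images
val: ../valid/images

nc: 1
names: ['surfers']
```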

So, basically, this dataset highlighted here conforms to a simple set of conventions that YOLO understands and can use for training purposes.

Now, we can either run a command line program to execute the training, or we could use the API, but in either case we get the same result, which is a newly trained model that we can use to detect surfers.

I'll go ahead and perform the training using the command line option. So I'll execute the yolo command, passing it the task detect for object detection, then the mode, which I'm setting to train. Then I'm specifying which base model to use, in our case the small model, then the location of our dataset configuration file. Finally, I'm saying we'll run the training for 100 epochs, setting the image size to 640 and setting plots to true.
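The full command looks roughly like this; the data= path is an assumption based on where the Roboflow export landed in my notebook, so adjust it to match yours:

```shell
# Train a YOLOv8-small model on the downloaded surfer dataset.
yolo task=detect mode=train model=yolov8s.pt \
     data=datasets/surfer-1/data.yaml \
     epochs=100 imgsz=640 plots=True
```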

Next, I'll go ahead and kick off the training run, and this is going to take a bit of time, so I'll fast-forward to the end.

Ok the training is done, and it created a model file in this path right here. So, I’ll go ahead and run this cell, which will download the generated model file for us.

Next, I'll copy the model file and paste it into the folder where our script is, then I'll replace this old model file name, with our new model file name, and I'll run our script again, so we can get a sense of how well our new model is working.
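If your detection script uses the ultralytics Python API, pointing it at the new weights is a one-line change; the file names below are placeholders for your own model and video:

```python
from ultralytics import YOLO

# Load the newly trained weights instead of the stock model.
model = YOLO("best.pt")  # previously something like YOLO("yolov8s.pt")

# Run detection frame by frame on the surf video.
for result in model("surfers.mp4", stream=True):
    annotated_frame = result.plot()  # frame with boxes drawn on it
```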

Ok, it looks like most of the surfers are getting recognized and our frame rate is pretty good as well, nice.

Hey, here at Mycelial, we're building development tools for machine learning on the edge. More specifically, we're building the data backbone for edge machine learning applications. If you're curious about what we're building, I'd encourage you to join our mailing list to learn more."