Training a Custom YOLO Model for Object Detection: A Tutorial


Step 1: Creating the Dataset


YOLO (You Only Look Once) is a neural-network-based algorithm for real-time object detection. It has gained popularity over the years because of its speed and accuracy, and it is widely used to detect traffic signals, people, animals, and more. In this tutorial you will retrain a YOLO network, specifically a YOLOv8 model, so that it also learns to identify the DJI RoboMaster and any other obstacles you want. You will use Roboflow's Python API for this, so please read through its documentation thoroughly.
  1. Create a Roboflow account and sign in.
  2. Create a new workspace, then create a new project (or simply follow the Roboflow tutorial).
  3. Upload the training images for annotation. Take many images of the DJI RoboMaster and of the other objects you want to detect, and also many images that do not contain the robot.
  4. Annotate the images yourself and/or assign them to your teammates. Annotate both the images with the robot and the images without it: images that contain no robot or obstacle should be marked as "empty" or "null".
  5. Split your data into training, validation, and test sets.
  6. Preprocess the dataset. Resizing is useful: keep the resolution and aspect ratio the same as your robot's camera input.
  7. Augment the data. This creates more training examples for your model. Feel free to use "horizontal flip", "saturation", "brightness", "exposure", "blur", and "cutout". Do not overdo the augmentation strength; for example, do not blur too much: a blur of up to 1.5-2.0 px is more than enough.
  8. Generate the final dataset version for your model (use the maximum version size, 3x).
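
Steps 4 and 7 both come down to the label files. Assuming you export in YOLOv8 format, each image typically gets a .txt file with one line per object, "class x_center y_center width height", all normalized to the 0-1 range, and a null image simply has an empty label file. The minimal sketch below parses such a file and shows why a horizontal flip keeps labels valid: only the x center changes. The helper names (Box, parse_yolo_labels, hflip, to_yolo_text) are illustrative, not part of Roboflow or Ultralytics.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """One YOLO-format box: class id plus normalized center/size in [0, 1]."""
    cls: int
    x: float  # box center, fraction of image width
    y: float  # box center, fraction of image height
    w: float  # box width, fraction of image width
    h: float  # box height, fraction of image height


def parse_yolo_labels(text: str) -> List[Box]:
    """Parse the contents of one YOLO .txt label file.

    An empty file yields an empty list, i.e. a null/background image
    with no robot or obstacle in it.
    """
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
        x, y, w, h = map(float, parts[1:])
        boxes.append(Box(int(parts[0]), x, y, w, h))
    return boxes


def hflip(boxes: List[Box]) -> List[Box]:
    """Mirror boxes for a horizontally flipped image.

    Only the x center moves (x -> 1 - x); y, width, and height are unchanged.
    """
    return [Box(b.cls, 1.0 - b.x, b.y, b.w, b.h) for b in boxes]


def to_yolo_text(boxes: List[Box]) -> str:
    """Serialize boxes back into YOLO label lines (6 decimal places)."""
    return "\n".join(
        f"{b.cls} {b.x:.6f} {b.y:.6f} {b.w:.6f} {b.h:.6f}" for b in boxes
    )
```

For example, with a hypothetical class 0 for the robot, the line "0 0.25 0.5 0.2 0.3" becomes "0 0.750000 0.500000 0.200000 0.300000" after hflip. This is why flips are a safe augmentation, whereas strong blur mainly degrades the image without changing the labels.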
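
Roboflow performs the split and the 3x expansion for you, but the arithmetic behind steps 5 and 8 is easy to sketch. This assumes a 70/20/10 train/valid/test ratio (a common choice, not a requirement) and assumes augmented copies are generated only for the training split; split_dataset and augmented_train_size are hypothetical helpers.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_dataset(
    items: Sequence[str],
    ratios: Tuple[float, float, float] = (0.7, 0.2, 0.1),
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Shuffle items and split them into train/valid/test lists.

    The test split takes whatever remains after rounding, so every item
    lands in exactly one split.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_train = round(len(shuffled) * ratios[0])
    n_valid = round(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }


def augmented_train_size(n_train: int, outputs_per_example: int = 3) -> int:
    """Training images after augmentation, assuming only the train split grows."""
    return n_train * outputs_per_example
```

With 100 images this gives 70 training, 20 validation, and 10 test images; under the stated assumption, a 3x version then holds 210 training images while the validation and test sets stay untouched, so your evaluation still runs on real, unaugmented images.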

Step 2: Train the Model


Method 1
  1. Now we are ready to train the model on Roboflow. Choose the YOLOv8 model type and start training with the "Fast" option. Training runs on Roboflow's servers and can take a few hours. Sit back and enjoy while the network trains; you will receive an email when it finishes.
  2. Follow the Roboflow API documentation and install the package with pip install roboflow
  3. Run inference with the trained model through the Roboflow Python package:
    from roboflow import Roboflow

    VERSION = 1  # your dataset/model version number

    rf = Roboflow(api_key="API_KEY")  # your Roboflow API key
    project = rf.workspace().project("MODEL_ENDPOINT")  # your project ID
    model = project.version(VERSION).model
    # infer on a local image (confidence and overlap thresholds are percentages)
    print(model.predict("your_image.jpg", confidence=40, overlap=30).json())
    # visualize your prediction
    # model.predict("your_image.jpg", confidence=40, overlap=30).save("prediction.jpg")
    # infer on an image hosted elsewhere
    # print(model.predict("URL_OF_YOUR_IMAGE", hosted=True, confidence=40, overlap=30).json())
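
The .json() call in step 3 returns a plain Python dict. Assuming the usual shape of Roboflow's hosted-inference response (a "predictions" list whose entries carry the box center "x", "y" and its "width", "height" in pixels, plus "class" and a 0-1 "confidence"), this sketch converts boxes to corner coordinates and keeps a single class. The function names are illustrative, and the class name "robomaster" is a placeholder for whatever label you used when annotating.

```python
from typing import Dict, List, Tuple


def to_corners(pred: Dict) -> Tuple[float, float, float, float]:
    """Convert a center-format box (x, y, width, height) to (x1, y1, x2, y2)."""
    half_w, half_h = pred["width"] / 2, pred["height"] / 2
    return (
        pred["x"] - half_w,
        pred["y"] - half_h,
        pred["x"] + half_w,
        pred["y"] + half_h,
    )


def filter_predictions(result: Dict, wanted_class: str,
                       min_conf: float = 0.4) -> List[Dict]:
    """Keep predictions of one class at or above a 0-1 confidence threshold.

    `result` is assumed to be the dict returned by .json(); a missing
    "predictions" key is treated as "nothing detected".
    """
    hits = []
    for pred in result.get("predictions", []):
        if pred["class"] == wanted_class and pred["confidence"] >= min_conf:
            hits.append({
                "class": pred["class"],
                "confidence": pred["confidence"],
                "box": to_corners(pred),  # pixel corners for drawing/control
            })
    return hits
```

For example, filter_predictions(model.predict("your_image.jpg", confidence=40, overlap=30).json(), "robomaster") would return only the robot detections, each with an (x1, y1, x2, y2) pixel box.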
Method 2
Train your model locally on your personal computer.
  1. We will train YOLOv8n (the nano model, about 3.2M parameters) for this.
  2. Download the training, validation, and test sets from Roboflow in YOLOv8 format.
  3. Install Ultralytics:
    pip install ultralytics
    This installs its dependencies, including PyTorch and torchvision.
  4. Install ClearML (used for experiment tracking):
    pip install clearml
  5. Set up your ClearML API credentials by running clearml-init and following ClearML's setup instructions. Once ClearML is installed and configured, Ultralytics logs training runs to it automatically.
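  6. Train the model:
    from ultralytics import YOLO

    # Load the pretrained YOLOv8n weights (downloaded automatically on first use).
    model = YOLO('yolov8n.pt')

    # Training.
    results = model.train(
        data='data.yaml',  # path to the data.yaml from your Roboflow download
        imgsz=1280,        # input image size in pixels
        epochs=50,
        batch=8,
        name='yolov8n_v8_50e')  # run name; weights are saved under runs/detect/yolov8n_v8_50e/weights/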
  7. Run inference on a video, or change source to an image, a folder of images, or a real-time feed (for example, source=0 for a webcam):
    yolo task=detect mode=predict model=runs/detect/yolov8n_v8_50e/weights/best.pt source=inference_data/video_1.mp4 show=True imgsz=1280 name=yolov8n_v8_50e_infer1280 hide_labels=True
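
Before training, it helps to confirm that the dataset download from step 2 is complete. This sketch assumes the usual Roboflow YOLOv8 export layout (train/, valid/, and test/ folders, each with images/ and labels/ subfolders and one .txt label per image with the same file stem); summarize_split and summarize_dataset are hypothetical helpers, not part of either library. Images with a missing or empty label file are counted as background (null) images, which is also how Ultralytics treats them.

```python
from pathlib import Path
from typing import Dict

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def summarize_split(root: Path, split: str) -> Dict[str, int]:
    """Count images, labeled images, and background images in one split.

    Assumes <root>/<split>/images and <root>/<split>/labels, with one
    .txt label per image sharing the image's file stem.
    """
    img_dir = root / split / "images"
    lbl_dir = root / split / "labels"
    counts = {"images": 0, "labeled": 0, "background": 0}
    for img in sorted(img_dir.iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue  # skip non-image files
        counts["images"] += 1
        label = lbl_dir / f"{img.stem}.txt"
        # A missing or empty label file means a null/background image.
        if label.exists() and label.read_text().strip():
            counts["labeled"] += 1
        else:
            counts["background"] += 1
    return counts


def summarize_dataset(root: Path) -> Dict[str, Dict[str, int]]:
    """Summarize whichever of the train/valid/test splits exist under root."""
    return {
        split: summarize_split(root, split)
        for split in ("train", "valid", "test")
        if (root / split / "images").is_dir()
    }
```

If you download through the Roboflow package, for example with dataset = project.version(1).download("yolov8"), you could call summarize_dataset(Path(dataset.location)). A background count of zero in the training split suggests the null images from Step 1 did not make it into the export.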
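
The yolo CLI command in step 7 also has a Python equivalent in the Ultralytics API, which is handier when detections must feed your robot's logic. This is a minimal sketch: it assumes the best.pt path produced by the training run above, and count_per_class and detect_stream are hypothetical helper names. Passing stream=True makes predict return a generator, so a long video or a live feed is processed frame by frame instead of being accumulated in memory.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def count_per_class(cls_ids: Iterable[float], confs: Iterable[float],
                    names: Dict[int, str], min_conf: float = 0.4) -> Dict[str, int]:
    """Count detections per class name, ignoring those below min_conf."""
    counts = Counter()
    for cls_id, conf in zip(cls_ids, confs):
        if conf >= min_conf:
            counts[names[int(cls_id)]] += 1
    return dict(counts)


def detect_stream(weights: str, source, imgsz: int = 1280,
                  min_conf: float = 0.4) -> Iterator[Dict[str, int]]:
    """Yield per-frame class counts from a video, image folder, or camera.

    ultralytics is imported lazily so count_per_class works without it.
    """
    from ultralytics import YOLO  # requires: pip install ultralytics

    model = YOLO(weights)
    # conf= drops low-confidence boxes inside Ultralytics; count_per_class
    # applies the same threshold so it also works on unfiltered lists.
    for result in model.predict(source=source, imgsz=imgsz, conf=min_conf,
                                stream=True, verbose=False):
        boxes = result.boxes
        yield count_per_class(boxes.cls.tolist(), boxes.conf.tolist(),
                              result.names, min_conf)
```

For example, for counts in detect_stream("runs/detect/yolov8n_v8_50e/weights/best.pt", 0): print(counts) would print per-frame counts from a webcam; use the same imgsz as in training so the model sees inputs at the scale it learned.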