How to add your classes to the Microsoft COCO dataset and train the YOLOX model

There are plenty of articles on the Internet on the topic of "how to train a custom dataset on YOLO".

What is hidden behind these words?

Nothing supernatural. You collect or find a dataset somewhere, label it, and create annotation files for the images. Then you take one of the pre-trained YOLO models, train it on your own dataset, and enjoy the result.

Of course, there are nuances from one YOLO to another (each family has its own set of model variants; YOLOv5 alone, for example, has about ten), but in general the procedure is the same.

True enough. However, a model trained only on your own dataset will detect only the classes that were put into it. For example, only smoke and fire.

At the same time, the dataset the base model was trained on (Microsoft COCO, in the case of YOLOX) still contains 80 classes, which you might also want to use to detect objects in an image.

Along the way, the opposite question also arises: how do you remove unnecessary classes from the Microsoft COCO dataset? Not everyone needs to detect buses and people, or plates and knives, in their images.

Therefore, the proposed plan of work is as follows:
1. Take an already labeled custom dataset. To make the task harder, let it be in a different format (for example, Pascal VOC, a format YOLOv5 can consume), and convert it to the COCO format (the format the YOLOX model requires).
2. Download Microsoft COCO and "clean" it of unnecessary classes.
3. "Glue" our dataset with Microsoft COCO.
4. Train the YOLOX model on the newly created dataset and check how it works.

Your own dataset.

There are several options:

– collect your own dataset;

– find a ready-made one.

For the first option, I would recommend the (almost) open-source Roboflow platform.

The author has nothing to do with its development, but would like to thank its creators, since it is extremely convenient to work with. You upload data (images with annotations, or just images), the system clearly shows where annotations are wrong or missing, splits the data into train/val/test sets, performs augmentation in all its forms, can train a model on the spot, and exports data in convenient COCO, Pascal VOC, and other formats.

There are also disadvantages: on the free tier the dataset size is limited to 10,000 images, and your dataset becomes "public" property that anyone can download from Roboflow.


*The author sketched a small parser and pulled a slice of 10,000 dataset names from Roboflow. Unfortunately, only a few dozen of them are of decent quality – download.

At this point the story could have ended: upload the dataset to Roboflow, optionally perform augmentation, convert to COCO format, download.
However, as practice has shown, you cannot simply take and export the dataset to COCO via Roboflow.
Despite the outward similarity, the exported annotations are not compatible with the format that YOLOX accepts. It is therefore better to export in Pascal VOC format and then convert manually:

We will use a previously created and labeled dataset whose classes are doors and stairs. These are the two classes we will add to the existing Microsoft COCO classes while, at the same time, throwing "unnecessary" classes out of COCO.

The dataset with doors and stairs in Pascal VOC format can be downloaded from here – dataset.

This is a format that YOLOv5 "understands".
In general terms, the Pascal VOC data format is the following:
– each image is accompanied by a separate xml file that describes it;
– there is also a single file (here with a .pbtxt extension) that simply lists the classes used. In our case it looks like this:

doors_and_stairs_map.pbtxt

item {
  id: 1
  name: 'door'
}

item {
  id: 2
  name: 'Stairs'
}

The data is split into train/val/test sets, with annotation folders for train and val. In total there are about 10,000 files (images), which is quite enough for the new classes not to be drowned out by the classes from the Microsoft COCO dataset, whose instances also number in the thousands.

If you take too small a dataset, class imbalance will occur and the model will simply fail to detect its objects.

Converting from Pascal VOC to COCO format.

YOLOX training requires the COCO data format. That means, as a rule, two json files with annotations (one for the folder with the train images, the second for the validation folder). In effect, everything contained in the per-image xml files used by YOLOv5 models must migrate into the two json files that YOLOX uses.
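In skeleton form, such a COCO annotation file holds three lists (the sample values here are taken from the examples later in this article):

{
 "images":      [{"file_name": "cam_image2.jpg", "height": 540, "width": 960, "id": 2}, ...],
 "annotations": [{"bbox": [311, 29, 174, 220], "category_id": 2, "image_id": 2, "id": 1, ...}, ...],
 "categories":  [{"supercategory": "none", "id": 1, "name": "door"}, ...]
}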

Stages of conversion.

*For the doors-and-stairs dataset, all the steps below have already been completed.

1. After unpacking the dataset with doors and stairs, create a simple labels.txt in its root listing the classes used in the dataset, in our case door and Stairs. It is important that the class names are spelled exactly as in the annotations and that each class starts on a new line.
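For this dataset labels.txt is just two lines:

door
Stairs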

2. Create the folders train2017, val2017, train_annotations, val_annotations, which will hold the images and the annotations separately from each other.

3. Create two txt files, train.txt and val.txt (for the train and val folders), listing the file names in a column, without extensions:

cam_image1
cam_image10
...

A little helper code for this –

code

import os

directory = 'doors/images/train'
file_out = 'train.txt'

# write the names of all jpg files, one per line, without extensions
with open(file_out, 'w') as f:
    for file in sorted(os.listdir(directory)):
        if file.lower().endswith('.jpg'):
            f.write(f'{os.path.splitext(file)[0]}\n')

4. Convert the train and val annotation folders to COCO format:

python voc2coco.py --ann_dir doors-and-stairs-dataset/train_annotations --ann_ids doors-and-stairs-dataset/train.txt  --labels doors-and-stairs-dataset/labels.txt --output doors-and-stairs-dataset/coco_instances_train.json --ext xml
python voc2coco.py --ann_dir doors-and-stairs-dataset/val_annotations --ann_ids doors-and-stairs-dataset/val.txt  --labels doors-and-stairs-dataset/labels.txt --output doors-and-stairs-dataset/coco_instances_val.json --ext xml

Converter – voc2coco.py

The result is two json files (coco_instances_train.json, coco_instances_val.json) which, together with the train and val folders (containing the images), will be needed for further training alongside the Microsoft COCO classes. The folders with xml files are no longer needed.
Download the ready-made json files – here.
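To make sure the conversion succeeded, the result can be inspected with a couple of lines of Python (a minimal check; the file name matches the converter output above):

import json

# the converted train annotations should list exactly the two custom classes
with open('coco_instances_train.json') as f:
    data = json.load(f)
print(len(data['images']), 'images,', len(data['annotations']), 'annotations')
print(data['categories'])  # expect door and Stairs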

2. Download Microsoft COCO and "clean" it of unnecessary classes.

The Microsoft COCO dataset is available at link. There you can also read about its structure and get acquainted with the API. We are interested in COCO 2017, the images together with their annotations.

You do not have to download the test split: those are just images for trying out the finished model, and any image from the Internet will do instead.

Of the annotation json files, we only need instances_train2017.json and instances_val2017.json.

Let's take a look at one of them to get an idea of the data structure:

read_jsons.py


import json

# print the category (class) names; uncomment the other loops to
# inspect the annotations and the image descriptions as well
with open('instances_val2017.json') as json_file:
    data = json.load(json_file)
    for i in data['categories']:
        print(i['name'])
    #for i in data['annotations']:
    #    print(i)
    #for i in data['images']:
    #    print(i)

In this simple way you can see the categories (classes), the annotations attached to them, and the image descriptions. This information will be needed later, when we merge the classes.
In the meantime, let's filter Microsoft COCO by throwing out the unnecessary classes.

Filtering Microsoft COCO

To filter the classes, execute the file filter.py with the following arguments:

python filter.py --input_json /annotations/instances_train2017.json --output_json /annotations/filtered_train2017.json --categories person dog cat

When doing so, list the classes that are to be kept. If a class name consists of two words, wine glass for example, it must be enclosed in quotation marks. The example above keeps only the person, dog, and cat classes. To avoid confusion, though, I will exclude fewer classes from COCO, keeping the following:

the code

python filter.py --input_json instances_train2017.json --output_json train2017_filtered.json --categories car orange banana "wine glass" sandwich bottle vase bicycle fork sofa umbrella toothbrush keyboard book mouse cat bed cup spoon microwave "cell phone" "tv monitor" carrot "teddy bear" "sports ball" knife scissors laptop oven remote sink backpack bench dog "dining table" chair handbag bowl toilet "hair drier" refrigerator "potted plant" clock person suitcase apple 

*The sofa and tv monitor classes will not be found (COCO names them couch and tv), but that's okay.

The same code must be run for instances_val2017.json:

the code

python filter.py --input_json instances_val2017.json --output_json val2017_filtered.json --categories car orange banana "wine glass" sandwich bottle vase bicycle fork sofa umbrella toothbrush keyboard book mouse cat bed cup spoon microwave "cell phone" "tv monitor" carrot "teddy bear" "sports ball" knife scissors laptop oven remote sink backpack bench dog "dining table" chair handbag bowl toilet "hair drier" refrigerator "potted plant" clock person suitcase apple

Done!
Run the previously used read_jsons.py on either of the newly created files: 44 of the 80 COCO classes should remain.
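filter.py itself can be downloaded above, but its core logic can be sketched in a few lines (a rough sketch, not the actual script: it assumes the filter keeps only the listed categories, renumbers their ids contiguously from 1, which matches the category listing we will see below, drops the remaining annotations, and removes images left without annotations):

import json

def filter_coco(input_json, output_json, keep_names):
    with open(input_json) as f:
        data = json.load(f)
    # keep only the requested categories and renumber their ids from 1
    kept = [c for c in data['categories'] if c['name'] in keep_names]
    id_map = {c['id']: new_id for new_id, c in enumerate(kept, start=1)}
    for c in kept:
        c['id'] = id_map[c['id']]
    data['categories'] = kept
    # keep only annotations of the kept categories, remapping their category_id
    data['annotations'] = [a for a in data['annotations'] if a['category_id'] in id_map]
    for a in data['annotations']:
        a['category_id'] = id_map[a['category_id']]
    # drop images that no longer have any annotations
    used = {a['image_id'] for a in data['annotations']}
    data['images'] = [im for im in data['images'] if im['id'] in used]
    with open(output_json, 'w') as f:
        json.dump(data, f)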

3. “Glue” our dataset with Microsoft COCO.

To understand how the merging happens, look into the annotation file using read_jsons.py:

with open('val2017_filtered.json') as json_file:
    data = json.load(json_file)
    for i in data['categories']:
        print(i) 

The code shows that the categories go in order:


{'supercategory': 'person', 'id': 1, 'name': 'person'}
{'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'}
{'supercategory': 'vehicle', 'id': 3, 'name': 'car'}
{'supercategory': 'outdoor', 'id': 4, 'name': 'bench'}

And here is how the categories look in the generated json for the custom dataset “Doors and Stairs”:


{'supercategory': 'none', 'id': 1, 'name': 'door'}
{'supercategory': 'none', 'id': 2, 'name': 'Stairs'}

So if we simply glued the two json files together, the category ids would collide (door and person would both have id 1), and there would be confusion with the annotations and the images themselves. The custom dataset's json must therefore first be converted by "shifting" the class ids.

Everything would be quite simple if not for one "but". Two "buts", actually.

The annotation ids and image ids in the json files also have to change. Here is a brief memo:


##"categories": 
#2 класса
##{'supercategory': 'none', 'id': 1, 'name': 'doors'} #заменить id 1 на id 45
##{'supercategory': 'none', 'id': 2, 'name': 'Stairs'} #заменить id 1 на id 46

##"annotations": 
##{'area': 38280, 'iscrowd': 0, 'bbox': [311, 29, 174, 220], 'category_id': 2, 'ignore': 0, \
#'segmentation': [], 'image_id': 2, 'id': 1}  #заменить category_id (см выше) и image_id (image_id*1000000)

#"images": 
#{'file_name': 'cam_image2.jpg', 'height': 540, 'width': 960, 'id': 2} #заменить id (image_id*1000000)

Fortunately, a script was written that makes all these changes itself; you only need to set the id offsets by hand. The following example will make it clearer.

Converting the custom dataset's json before merging with Microsoft COCO.

Convert the custom annotation files of the “Doors and Stairs” dataset:

convert_jsons.py


import json, os

# class id shift: 44 is the last class id currently present in the filtered COCO dataset
n = 44
# arbitrary large offsets so the new ids do not collide with COCO's own ids:
# x1 shifts image ids (in both "annotations" and "images"), x2 shifts annotation ids
x1, x2 = 1000000, 200000

in_file = 'doors_stairs_train.json'
out_file = 'doors_stairs_train_out.json'

in_file2 = 'doors_stairs_val.json'
out_file2 = 'doors_stairs_val_out.json'

def transform(in_file, out_file):
    with open(in_file) as json_file, open(out_file, 'w') as f_out:
        data = json.load(json_file)
        # shift the class ids in "categories"
        for i in data['categories']:
            i['id'] += n
        # shift the class ids and the image/annotation ids in "annotations"
        for i in data['annotations']:
            i['category_id'] += n
            i['image_id'] += x1
            i['id'] += x2
        # shift the image ids in "images" by the same offset as in "annotations"
        for i in data['images']:
            i['id'] += x1
        json.dump(data, f_out)

transform(in_file, out_file)
transform(in_file2, out_file2)
os.remove(in_file)
os.remove(in_file2)

Here you need to pay attention to the introductory part:

n = 44  # the last class id currently present in the filtered COCO dataset
x1, x2 = 1000000, 200000  # offsets for image ids and annotation ids

in_file = 'doors_stairs_train.json'
out_file = 'doors_stairs_train_out.json'
in_file2 = 'doors_stairs_val.json'
out_file2 = 'doors_stairs_val_out.json'

When adding new classes in the future, n must be changed and the offsets x1, x2 increased by about 10,000 (roughly the number of images in this dataset), so that the new ids do not overlap the ids of the images already added. If the new custom dataset contains more images, increase the offsets "with a margin".

Once the conversion is done, you can again check the converted json files. You will see that the categories, the annotations, and the image entries have changed:


{'supercategory': 'none', 'id': 45, 'name': 'door'}
{'supercategory': 'none', 'id': 46, 'name': 'Stairs'}
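and, correspondingly, in "annotations" and "images" (these values follow directly from the offsets above):

{'area': 38280, 'iscrowd': 0, 'bbox': [311, 29, 174, 220], 'category_id': 46, 'ignore': 0, 'segmentation': [], 'image_id': 1000002, 'id': 200001}
{'file_name': 'cam_image2.jpg', 'height': 540, 'width': 960, 'id': 1000002}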

Gluing with Microsoft COCO.

The most interesting part remains: gluing the custom dataset to the COCO dataset, thereby expanding the number of classes.

Here another script, which the author was lucky enough to write, comes to the rescue:

join_jsons.py


import json, os

# the COCO annotations filtered down to 44 classes, and the shifted custom annotations
json_one = '44_coco_train.json'
json_two = 'doors_stairs_train_out.json'
out_file = 'coco46_train.json'

json_one2 = '44_coco_val.json'
json_two2 = 'doors_stairs_val_out.json'
out_file2 = 'coco46_val.json'

def transform(json_one, json_two, out_file):
    with open(json_one) as f_one, open(json_two) as f_two:
        data1 = json.load(f_one)
        data2 = json.load(f_two)
        # merge the categories, annotations and images of the two datasets
        data1['categories'].extend(data2['categories'])
        data1['annotations'].extend(data2['annotations'])
        data1['images'].extend(data2['images'])

    with open(out_file, 'w') as f_out:
        json.dump(data1, f_out)

transform(json_one, json_two, out_file)
transform(json_one2, json_two2, out_file2)

os.remove(json_two)
os.remove(json_two2)

Here you need to be careful not to glue val with train:


json_one="44_coco_train.json"
json_two='doors_stairs_train_out.json'
out_file="coco46_train.json"

json_one2='44_coco_val.json'
json_two2='doors_stairs_val_out.json'
out_file2='coco46_val.json'

As a result, we get ready-made annotation files for further training of the YOLOX model, now on 46 classes, including our own.

Ready-made annotation files – download.
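Before training, it is worth running a quick sanity check on the merged files (a minimal sketch, using the file names produced by the script above):

import json

with open('coco46_train.json') as f:
    data = json.load(f)

print(len(data['categories']))  # expect 46
image_ids = {img['id'] for img in data['images']}
# every annotation must point to an existing image
assert all(a['image_id'] in image_ids for a in data['annotations'])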

And one more important point.
All the images of the custom dataset must be copied in with the images of the COCO dataset: the custom train images into COCO's train folder, the custom val images into COCO's val folder.
What about file names?
With a high degree of probability the file names will not clash with each other, and the images are bound to their annotations by id, as we have just seen.
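A minimal sketch of this copying step (the folder names are assumptions based on the layout described earlier):

import os, shutil

# copy the custom train images into COCO's train2017 and the val images into val2017
for src_dir, dst_dir in [('doors-and-stairs-dataset/train2017', 'train2017'),
                         ('doors-and-stairs-dataset/val2017', 'val2017')]:
    for name in os.listdir(src_dir):
        if name.lower().endswith('.jpg'):
            shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))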

To be continued.
