Bad advice on dataset preparation

Hi all! I’m Anton Kobak, CEO of Kobak Lab. Over the last three years, many of our projects have involved video analytics.

As often happens, we decided to compile an internal manual for new employees on organizing dataset collection, technical requirements, and so on. It ended up as a long read in our chat, which I decided to share.

For those short on time, here is a summary of the article. The second part will cover all the technical details missing here.

  1. We carefully choose the camera angles and what we shoot in the first place;

  2. We plan the collection/shooting process in detail;

  3. We check the equipment ten times;

  4. We bring only our own, proven PC;

  5. We save all data, including the raw data;

  6. We bring spare equipment;

  7. We agree on everything with the customer in writing and in advance;

  8. We plan network access in advance;

  9. We think about ourselves and our comfort!

Who is the article for?

  • You are taking your first steps in collecting datasets in the field, or

  • You already have experience but are curious to see what colleagues think.

Article format: a piece of bad advice, followed by a checklist of how to do it right. Let’s go!

Tip #1 Do not set strict criteria for including data in the dataset

None at all. Do not set requirements for the camera position, shooting angle, or movement of the object in the frame. 120 or 180 degree viewing angle? Doesn’t matter. Don’t chase variety. It will all get learned later.

How to do it right?

  1. Define standard mounting points for the system’s cameras and use recordings from similar angles when training the model.

Why? The neural network will not understand what is required of it if you train it on one kind of data and then show it something completely different.

  2. Less is more: skip angles where the object of observation gets lost among others. From such angles it is almost impossible to distinguish it from its surroundings, which will hurt model performance. Pay attention to angles and camera coverage.

  3. Include in the dataset video captured by the cameras that will be used in the final product. Images taken with another camera may not be suitable. You can use frames from different cameras, but the model will work best at the resolution at which it was trained.

  4. Do not use murky or blurry footage for training.

  5. Use shots with different lighting levels.

  6. A universal tip for avoiding overfitting is to provide a variety of frames so that the model learns to handle different situations. The more diverse the dataset, the better the model generalizes the essence of the detected object.

  7. Test the model on frames both with and without the objects of observation. An empty scene, with no people in the field of view, can be a problem for the model.
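The "no blurry footage" rule can be partly automated. Below is a minimal sketch of a sharpness filter, assuming each frame is already decoded into a 2D list of grayscale values (0-255) at least 3x3 pixels in size. The function names and the threshold of 100 are illustrative assumptions, not values from our projects; tune the threshold on your own footage.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over a 2D grayscale frame.

    Sharp frames have strong edges, so the Laplacian response varies a lot;
    blurry or flat frames give a variance close to zero.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    # Skip the one-pixel border, where not all four neighbours exist.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def is_sharp_enough(gray, threshold=100.0):
    """True if the frame passes the (assumed) sharpness threshold."""
    return laplacian_variance(gray) >= threshold
```

A filter like this only catches blur. Angles, lighting, and diversity still need a human eye.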

Tip #2 Plan tests in real conditions: everyone there is already waiting for you and ready to help in any way they can

Permission to shoot will be granted immediately, and the use of non-certified equipment (that is, your system) will be allowed. Do not arrange anything in advance; put it off until the last moment. You will be surprised how eager everyone is to embrace modern technology, especially government agencies.

How to do it right?

  1. Ask for help. Introduce yourselves as students or graduate students. Explain that the purpose of data collection is scientific. This method is suitable if you have no other way to organize data collection.

  2. Guarantee that the system poses no risk to visitors or to the facility’s management personally. Attach a list of equipment, licenses, and anything else you can show as part of a safety plan (more on that below).

  3. Prepare a list of locations and contacts where you could potentially shoot. Build a mailing list and track statuses so you do not lose track of who was asked, who refused, and who did not answer.

  4. Send letters to the institutions and companies where you plan to shoot IN ADVANCE! Ideally 2-3 months before filming.

  5. Feel free to send bulk mail. Few people want to work out what you wrote and reply: out of 30 companies, 1-2 answer.

  6. Bring everything with you. Say you are collecting data in a factory and expect to find tools there for installing your equipment. Screwdrivers, pliers, a drill? They will not be available, or nobody will lend them to you. Or they will, but for a fee. Just be prepared.
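The status tracking from the checklist above does not need a CRM. Here is a minimal sketch; the `Site` record, the status names, and the 14-day follow-up window are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):
    NOT_ASKED = "not asked"
    NO_ANSWER = "asked, no answer"
    REFUSED = "refused"
    AGREED = "agreed"


@dataclass
class Site:
    name: str
    email: str
    status: Status = Status.NOT_ASKED
    asked_on: Optional[date] = None


def to_follow_up(sites, today, wait_days=14):
    """Sites asked at least wait_days ago that still have not answered."""
    cutoff = today - timedelta(days=wait_days)
    return [
        site for site in sites
        if site.status is Status.NO_ANSWER
        and site.asked_on is not None
        and site.asked_on <= cutoff
    ]
```

With a 1-2 in 30 response rate, the follow-up list is the part that actually saves time.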

Tip #3 You don’t need any papers, that’s just bureaucracy

You finally reached an agreement over the phone with the manager at the facility and arrived. You were met, escorted in, shown all the power outlets, connected to the network, and given workwear. They assigned someone to help you. He briefs you on safety, introduces colleagues, helps you find your way around and stays in touch… and then your alarm clock rings: time to wake up.

How to do it right?

  1. Do not trust, do not be afraid, and do ask. Ask for everything: agreements in writing, permission to shoot and to install equipment. Ask for permission from the security service, and ask to sign a contract for the work. Your email arrangements with one person at a plant with 10,000 employees may not hold. Of course, it all depends on the scale of the facility where you plan to collect data or test. And, I will not hide it, our team has also had to work without any formal permission. But understand that this is the wrong and risky approach. Carve all agreements in stone; they will help you.

  2. Define your infrastructure requirements and get written confirmation that the site meets them: server capacity and access, power availability, Internet bandwidth, a facility employee who will help. This is a priority. Your power may be cut off or your speed on the local network throttled. The security service will not care who you talked to. If you do not have a list approved by management, you are done.

  3. Prepare a test program and methodology. It is hard to overestimate the importance of this document when planning the work and negotiating access to a site. If you are not delivering the project under GOST standards, the document can be reduced to a few basic tables. Here are some examples:

Table 1. Composition of equipment for testing (example)

No. | Component         | Model                     | Connection                   | Characteristics
1   | Underwater camera | Hikvision DS-2CD2955FWD-I | PoE: Cat 5e, RJ-45 connector | -60 to +60°C; IP54
2   | Power unit        |                           |                              | Output voltage 5 V; output power 20 W; output current 4 A

Table 2. Checklist and test procedure (example)

No. | Action | Result
1   | Verification of prototype compliance with the requirements for recognition, fixation, tracking and notification of hazard levels: determination and display on the user’s monitor of sports metrics from surface cameras, with metrics updated at least once per second | Metrics of people in the pool (swimming distance and average speed) are obtained and updated successfully; metrics are updated at least once per second
2   | API integration with CCTV system | ..

Tip #4 Do not test the equipment: it works

Set up the camera and calmly go about your business at the site; do not worry about checking the recording! Everything will be recorded.

How to do it right?

  1. Check that the cameras are working at all. Check again: the cameras are recording, the picture is coming, the stream is not freezing, etc.

  2. Make a test recording and make sure the stream is running. Enable recording on each camera. And then check again. And once more.

  3. Tune your camera settings. What if the quality is bad? What is not fixed at the start often cannot be fixed later.

  4. Check in advance that all equipment is connected to the PC from which the tests will be carried out.
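"The stream is not freezing" is easy to verify after a test recording if you log a timestamp for every received frame. A minimal sketch, assuming timestamps are in seconds and that a gap of more than one second counts as a stall; the function name and the default gap are illustrative choices.

```python
def find_stalls(timestamps, max_gap_s=1.0):
    """Return (previous, current) timestamp pairs where the stream stalled.

    A stall is any pair of consecutive frames whose timestamps are more
    than max_gap_s seconds apart.
    """
    stalls = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > max_gap_s:
            stalls.append((previous, current))
    return stalls
```

Run it per camera on the test recording: an empty list means no gaps longer than the limit, anything else tells you where to look.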

Tip #5 Set up the system on a PC you are seeing for the first time

Leave your laptop at home, it’s expensive and heavy. Borrow a laptop from a colleague. It may have slightly different software… But you will figure it out somehow.

How to do it right?

  1. Never use a computer that you are seeing for the first time. Even if you plan to work via the web, use a familiar PC. At the most inconvenient moment, you will find out that nothing works on someone else’s laptop: drivers are missing, services do not run, containers do not start. Fun, right?

  2. Run tests! Check what your PC can handle and complete a full test deployment.
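Part of that test deployment can be a tiny preflight script that you run before leaving the office. The sketch below only checks that the command-line tools you rely on are on the PATH; the tool names in the usage line are assumptions, so substitute your own stack.

```python
import shutil


def missing_tools(required, which=shutil.which):
    """Return the tools from `required` that cannot be found on the PATH.

    `which` is injectable so the check can be tested without touching
    the real system.
    """
    return [tool for tool in required if which(tool) is None]


if __name__ == "__main__":
    # Hypothetical tool list, for illustration only.
    missing = missing_tools(["docker", "ffmpeg", "git"])
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

This does not replace a full test deployment; it only catches the most embarrassing failures early.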

Tip #6 Don’t save the raw stream – save when you’re done

Why store raw data? And on two devices at that. Data cannot be lost, rest assured. The world is safe and predictable.

How to do it right?

  1. Store both raw and processed recordings. Why? Raw data can be used for additional training. The output processed at the collection site gives you photo and video material and shows how the system performs right now. It is also used when analyzing test problems afterwards, so you do not have to re-run the raw footage and waste time searching. Plus, you will always have material for comparing versions of the network. This adds reliability and leaves you a way back to the original data.

  2. Use two computers to store data while you work on site. One of them may simply fail to turn on, and it is quite possible to damage or even break a laptop.
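Copying to a second device helps only if the copy is actually intact. Here is a minimal sketch that copies a recording to several destinations and verifies each copy by SHA-256 checksum. The function names are illustrative, and the destination directories (for example, a second laptop’s mounted share or an external disk) are up to you.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large videos fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_verified(source, destinations):
    """Copy `source` into every destination directory and verify each copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for dest_dir in destinations:
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / source.name
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Delete the original from the camera or capture PC only after this returns without an error.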

Tip #7 Do not take extra consumables, do not drag tripods along: everything will be provided

Calculate the exact number of connections and the length of cable. Remember: one camera, one mount. Don’t bring extra; you won’t need it anyway. You will figure it out on the spot, you have an engineering mindset!

How to do it right?

  1. Use tripods and universal mounts. Bring extra clamps; they are never superfluous.

  2. Be prepared for detours: you will need at least twice as much twisted pair as you calculated from the distances on site.

  3. Take 4 times as many RJ-45 connectors as you think you need. You will thank me later!

  4. A cable tester is a must. Your task is to check each crimp immediately and not waste time later hunting for what does not work.

  5. Visit the test site in advance: determine the location, measure the distances, see what might interfere with the installation of the equipment.
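The multipliers from the checklist above turn into a one-function packing calculator. A minimal sketch, assuming you measured one cable run per camera and that each run needs two RJ-45 connectors; the function name and the sample distances in the comment are illustrative.

```python
import math


def consumables_plan(run_lengths_m, cable_factor=2.0,
                     connector_factor=4, connectors_per_run=2):
    """Estimate how much twisted pair and how many RJ-45 connectors to pack.

    run_lengths_m: measured cable runs in meters, one per camera.
    cable_factor: safety multiplier for cable (at least 2x, per the checklist).
    connector_factor: safety multiplier for connectors (4x, per the checklist).
    """
    cable_m = math.ceil(sum(run_lengths_m) * cable_factor)
    connectors = len(run_lengths_m) * connectors_per_run * connector_factor
    return {"cable_m": cable_m, "rj45_connectors": connectors}


# Example: three cameras at 30 m, 45.5 m and 12 m from the switch.
# consumables_plan([30, 45.5, 12])
```

The measured distances come from the advance site visit; without it, the multipliers are applied to a guess.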

Tip #8 Don’t Plan for the Internet: It Will Be There

The Internet is everywhere, that’s a known fact. You can download source packages for your frontend and rebuild Docker containers in the middle of nowhere. By the way, 5G reception is even better there. Update the platform, change the code; do not deny yourself anything.

How to do it right?

  1. Assess site conditions. Thick walls in a factory workshop, distance from cell towers, interference: all of this can hamper reception even for mobile networks, to say nothing of Wi-Fi.

  2. Find out in advance how Internet access works on site. Will you be connected to the local network? Is there a jammer at the facility where you are collecting data? Or is there Internet, but a slow one? It is better to know right away.

  3. Bring EVERYTHING with you: installers, local repositories. You must have a local copy of ALL code on your PC. Check that it works without a network. Think through a step-by-step deployment plan and the possible failure scenarios for equipment, software, and communications.

Tip #9 Think of a project as a fun adventure

It will be an easy ride! Later you will tell your friends how you mounted cameras over a moving conveyor belt, millimeters from the machinery. By the way, when you test in the smelter, take a selfie by the furnaces. Without protective gear, just like in Aliens.

How to do it right?

Think about yourself. Collecting data in harsh environments is not uncommon: a hot shop, a tower crane, an open-pit quarry, a ship or a mine. Or you may simply be shooting outdoors at -20°C in winter. Prepare yourself. Workwear, boots, helmets, hot tea in a thermos: the list depends on the conditions. Remember: nobody will take care of you except you.

  1. Think safety. Do not rely on the facility’s employees; they will readily leave you alone with dangerous equipment. No one will explain what to do, what to press, or why you should not put the camera where, a minute from now, there will be scalding steam or hot metal. How do I know? Don’t ask…

  2. Find out at the negotiation stage whether special protective gear or safety training is required. From your side, you will need a list of the activities you plan to capture on camera. From the facility, you will need a list of premises you must have access to, and its agreement to escort you and conduct safety briefings.

  3. Check equipment readiness. Check the datasheet for temperature and humidity ranges. Clarify the requirements for sealing, installation, and connections. The bad news: there will be surprises while working on site. For example, our team found that a power source rated at 600 W shut down under the peak load from the computer’s power supply. Yes, now we know that the power supply has capacitors that draw 600 W at startup. There will always be nuances that you can only learn by stepping on the rake yourself.
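A rake like the 600 W one can sometimes be spotted on paper. Below is a minimal sketch of a power budget check. It makes two loud assumptions: that you know (or can measure) the startup peak of every load, and that you want 20% headroom on the source rating. The sample figures in the comment are illustrative, not measurements from our incident.

```python
def power_headroom(source_rating_w, loads, margin=0.2):
    """Compare worst-case peak draw against a derated source rating.

    loads: list of dicts with 'steady_w' and 'peak_w' (startup peak) in watts.
    margin: fraction of the rating kept in reserve (assumed 20%).
    Worst case assumed here: all loads hit their peak at the same moment.
    """
    steady_w = sum(load["steady_w"] for load in loads)
    peak_w = sum(load["peak_w"] for load in loads)
    usable_w = source_rating_w * (1 - margin)
    return {
        "steady_w": steady_w,
        "peak_w": peak_w,
        "usable_w": usable_w,
        "ok": peak_w <= usable_w,
    }


# Example: a PC that idles at 250 W but peaks at 600 W on startup
# fails this check on a 600 W source.
# power_headroom(600, [{"steady_w": 250, "peak_w": 600}])
```

If the check fails, either pick a source with a higher rating or start the loads one at a time.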

Have a different opinion? Leave a comment! In the second part I will go deeper into the technical details. Coming soon!
