Predicting the trajectory of a flying object

In this article, we will discuss how to solve the problem of predicting the coordinates of a flying object. Imagine that you are building an anti-mosquito air defense system: given the mosquito's coordinates in several video frames, you need to say where it will be in the next frame.
Or, say, you are writing an AI for a browser game and need to predict where the player will be in a second so you can shoot with a realistic lead.

You can build a complex model that accounts for wind, inertia and the full physics of the object, or you can simply feed the data into a neural network and get a perfectly acceptable result. As it turns out, this works equally well for tracking mosquitoes, drones, birds, airplanes and other actively maneuvering objects. So this article is about modeling flight with neural networks, for the lazy.

In short:

We assume that over a fairly short segment the flight path fits a Bézier curve: from several points we infer the characteristics of the curve and use it for prediction.

  1. One process collects a dataset of coordinates and trains a neural network.
  2. A second process takes the previous coordinates, makes the actual prediction and outputs the future coordinates.
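To make the Bézier assumption concrete: a Bézier curve of degree n is a polynomial of degree n in its parameter, so if equal time steps correspond to equal steps of the curve parameter, five equally spaced samples pin down a curve of degree at most 4. For such a curve the fifth finite difference is zero, which gives a closed-form extrapolation to T+1: x(T+1) = 5x(T-0) - 10x(T-1) + 10x(T-2) - 5x(T-3) + x(T-4). The sketch below is a classical non-neural baseline, not the method used in this article, and `extrapolateNext` / `extrapolatePoint` are hypothetical names. It amplifies detection noise heavily, which is one reason to let a network learn the mapping instead.

```javascript
// Predict the next coordinate from the five most recent samples, ordered
// [T-0, T-1, T-2, T-3, T-4] (the same order as the CSV columns).
// Exact for any polynomial (hence any Bézier curve) of degree <= 4
// sampled at equal parameter steps, because its fifth finite difference is zero.
function extrapolateNext(samples) {
  if (samples.length !== 5) {
    throw new Error("expected exactly 5 samples: T-0..T-4");
  }
  const [t0, t1, t2, t3, t4] = samples;
  // Binomial weights of the fifth finite difference, solved for T+1.
  return 5 * t0 - 10 * t1 + 10 * t2 - 5 * t3 + t4;
}

// Apply the same formula independently to x and y.
function extrapolatePoint(points) {
  return {
    x: extrapolateNext(points.map((p) => p.x)),
    y: extrapolateNext(points.map((p) => p.y)),
  };
}

// Example: a ballistic (quadratic) path x = 2k, y = k^2 sampled at k = 0, -1, ..., -4.
const history = [0, -1, -2, -3, -4].map((k) => ({ x: 2 * k, y: k * k }));
console.log(extrapolatePoint(history)); // { x: 2, y: 1 }
```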

For simplicity, we assume that we have 5 coordinates from the past, T-4, T-3, T-2, T-1 and T-0, equally spaced in time. Strictly speaking, this never happens in real life: video frames arrive with varying delays, the object is not recognized in every frame, and so on, so a proper solution must process the timestamps along with the coordinates. We leave this outside the scope of the article and work only with 2D coordinates.
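One common way to obtain the equal spacing assumed here from real detections, which the article deliberately leaves out, is to resample the track onto a uniform time grid by linear interpolation between neighbouring detections. This is a minimal sketch under stated assumptions: detections are `{ t, x, y }` objects sorted by time in milliseconds, and `resampleUniform` and its `stepMs` parameter are hypothetical names, not part of the article's code. The result is ordered oldest first, so it has to be reversed to match the T-0-first column order of the dataset.

```javascript
// Resample a track of detections { t, x, y } (t in ms, sorted ascending)
// onto `count` equally spaced instants ending at the latest detection.
// Positions between detections are linearly interpolated; if the grid would
// start before the first detection (not enough history), null is returned.
function resampleUniform(track, stepMs, count) {
  if (track.length < 2) return null;
  const tEnd = track[track.length - 1].t;
  const tStart = tEnd - stepMs * (count - 1);
  if (tStart < track[0].t) return null; // not enough history yet

  const out = [];
  let j = 0; // index of the segment [track[j], track[j + 1]] being scanned
  for (let i = 0; i < count; i++) {
    const t = tStart + i * stepMs;
    // Advance to the segment that contains t.
    while (j < track.length - 2 && track[j + 1].t < t) j++;
    const a = track[j];
    const b = track[j + 1];
    const span = b.t - a.t;
    const w = span > 0 ? (t - a.t) / span : 0; // interpolation weight in [0, 1]
    out.push({ t, x: a.x + w * (b.x - a.x), y: a.y + w * (b.y - a.y) });
  }
  return out; // oldest first: T-(count-1), ..., T-0
}
```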

So, we assume that we have the pixel coordinates of the object at each time step T.

The CSV dataset looks like this:

filename, x_t+1, y_t+1, x_t-0, y_t-0, x_t-1, y_t-1, x_t-2, y_t-2, x_t-3, y_t-3, x_t-4, y_t-4 

kxs2ut1j, -0.003441, -0.006477,  0.000000,  0.000000, -0.076425,  0.012893, -0.232717,  0.032203, -0.468875,  0.057928, -0.784900,  0.090070

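These coordinates are relative to the point T-0: T-0 is always placed exactly in the middle of the frame, at (0, 0), while (-0.5, -0.5) is the top-left corner of the frame and (+0.5, +0.5) is the bottom-right corner. In other words, x is measured in frame widths and y in frame heights, with y growing downward as in image coordinates; points that lie outside this frame, like T-4 in the sample row, simply get values beyond ±0.5.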
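The article does not show how its dataset was produced. A minimal sketch of turning raw pixel detections into a row of this format might look like this, assuming six consecutive, equally spaced detections ordered T+1, T-0, T-1, T-2, T-3, T-4 (matching the column order) and a known frame size. The function name `toCsvRow` and the 640x480 frame in the example are hypothetical.

```javascript
// Build one dataset line in the format
// filename, x_t+1, y_t+1, x_t-0, y_t-0, ..., x_t-4, y_t-4
// from pixel detections. points = [T+1, T-0, T-1, T-2, T-3, T-4], each { x, y } in pixels.
function toCsvRow(name, points, frameWidth, frameHeight) {
  if (points.length !== 6) throw new Error("expected 6 points: T+1 and T-0..T-4");
  const origin = points[1]; // T-0 becomes (0, 0), the centre of the frame
  const fields = [name];
  for (const p of points) {
    // Offsets are measured in frame sizes: half a frame is 0.5.
    fields.push(((p.x - origin.x) / frameWidth).toFixed(6));
    fields.push(((p.y - origin.y) / frameHeight).toFixed(6));
  }
  return fields.join(", ");
}

// Hypothetical 640x480 video: the object moves left to right and slightly down.
const row = toCsvRow("sample01", [
  { x: 352, y: 244 }, // T+1
  { x: 320, y: 240 }, // T-0
  { x: 288, y: 236 }, // T-1
  { x: 256, y: 232 }, // T-2
  { x: 224, y: 228 }, // T-3
  { x: 192, y: 224 }, // T-4
], 640, 480);
console.log(row);
```

Dividing x by the width and y by the height keeps both axes in the same "fraction of the frame" units regardless of the video resolution.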

We use JavaScript and the brain.js library.

Let’s declare the dependencies:

const brain = require("brain.js");
const fs = require("fs");

Let’s initialize a primitive fully connected neural network with a single hidden layer of three neurons and, if pre-trained weights exist, load them:

const config = {
  binaryThresh: 0.5,
  hiddenLayers: [3], // array of hidden layer sizes: here, one hidden layer of 3 neurons
  activation: "sigmoid", // supported activation types: ["sigmoid", "relu", "leaky-relu", "tanh"]
  learningRate: 0.000003, // scales with delta to affect the training rate --> number between 0 and 1
  momentum: 0.01,
  log: true,
  logPeriod: 1, // log the training error on every iteration
  iterations: 1000, // maximum number of passes over the training data
  errorThresh: 0.0000005, // stop early once the training error falls below this value
};

const net = new brain.NeuralNetwork(config);

//if there are weights saved from the previous run, fetch them
if (fs.existsSync("brain")) { 
  let json = JSON.parse(fs.readFileSync("brain"));
  net.fromJSON(json);
  console.log("Reused the weights from the brain trained last time");
}

Let’s read the dataset from the CSV file:

const data = fs.readFileSync("./dataset/data.csv", "UTF-8");

// split the contents into lines (handles both \n and \r\n line endings)
let lines = data.split(/\r?\n/);

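Let’s build the dataset as a list of records, each holding two arrays: coordinates T-4 … T-0 go to the input, and coordinates T+1 go to the output. Because a sigmoid network outputs values between 0 and 1, every coordinate is first mapped from [-1, 1] to [0, 1] with (v + 1) / 2: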

let dataset = [];

lines.forEach((line) => {
  // skip the header row and empty lines (e.g. after the final newline)
  if (line.trim() === "" || line.startsWith("filename")) return;

  let raw = line.split(",");
  let rec = {
    input: [],
    output: []
  };

  // column 0 is the filename; map every coordinate from [-1, 1] to [0, 1]
  for (let i = 1; i < raw.length; i++)
    raw[i] = (parseFloat(raw[i]) + 1) / 2;

  rec.input.push(raw[3]); //present coordinates T-0
  rec.input.push(raw[4]);
  rec.input.push(raw[5]); //past coordinates T-1
  rec.input.push(raw[6]);
  rec.input.push(raw[7]); //past coordinates T-2
  rec.input.push(raw[8]);
  rec.input.push(raw[9]); //past coordinates T-3
  rec.input.push(raw[10]);
  rec.input.push(raw[11]); //past coordinates T-4
  rec.input.push(raw[12]);

  rec.output.push(raw[1]); //future coordinates T+1
  rec.output.push(raw[2]);

  dataset.push(rec);
});
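If you have no recorded trajectories at hand, records in the same `{ input, output }` format can be generated synthetically. This sketch is our own assumption, not the author's synthetic dataset: `ballisticRecords` is a hypothetical helper, and the velocity and gravity ranges are arbitrary values chosen so that every offset stays well inside the [-1, 1] range that the (v + 1) / 2 scaling expects.

```javascript
// Generate `count` synthetic training records for ballistic motion:
// x(k) = vx * k, y(k) = vy * k + g * k^2 / 2, in frame units, with y pointing
// down as in image coordinates. k = 0 is T-0, negative k are past frames,
// k = 1 is T+1. Because position(0) = (0, 0), offsets are already relative to T-0.
function ballisticRecords(count, random = Math.random) {
  const encode = (v) => (v + 1) / 2; // same [-1, 1] -> [0, 1] mapping as the loader
  const records = [];
  for (let n = 0; n < count; n++) {
    // Per-frame velocity and gravity; over 4 frames |offset| stays below 0.6.
    const vx = (random() - 0.5) * 0.2;
    const vy = (random() - 0.5) * 0.2;
    const g = random() * 0.02;
    const pos = (k) => ({ x: vx * k, y: vy * k + (g * k * k) / 2 });

    const input = [];
    for (let k = 0; k >= -4; k--) {
      const p = pos(k);
      input.push(encode(p.x), encode(p.y)); // T-0, T-1, ..., T-4
    }
    const next = pos(1); // T+1
    records.push({ input, output: [encode(next.x), encode(next.y)] });
  }
  return records;
}
```

For example, `net.train(ballisticRecords(100))` would train on a hundred synthetic samples instead of the CSV file.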

We start training. On my laptop, a thousand iterations over a hundred data points take only 38 ms without a GPU.
For objects flying along a ballistic trajectory, data from a hundred frames is enough; actively maneuvering objects need somewhat more.

console.time("training");
net.train(dataset);
console.timeEnd("training");
fs.writeFileSync("brain", JSON.stringify(net.toJSON())); //save the weights
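The article does not show the second process from the list at the beginning, the one that makes predictions. A minimal sketch of its core logic follows, assuming a network trained as above (inputs and outputs scaled with (v + 1) / 2), detections in pixels and a known frame size. `predictNextPixel` is a hypothetical helper; the network is passed in as any object with a `run(input)` method, which is how a brain.js `NeuralNetwork` is queried, so the logic can be tried without brain.js installed.

```javascript
// Predict the pixel position at T+1 from the last five detections.
// history: [T-0, T-1, T-2, T-3, T-4] as { x, y } in pixels (newest first).
// net: anything with run(inputArray) -> outputArray, e.g. a trained brain.js NeuralNetwork.
function predictNextPixel(net, history, frameWidth, frameHeight) {
  if (history.length !== 5) throw new Error("expected 5 past points, newest first");
  const encode = (v) => (v + 1) / 2; // same scaling as the training set
  const decode = (v) => 2 * v - 1; // inverse scaling for the network's output
  const origin = history[0]; // T-0 is the origin of the relative coordinates

  // Build the input exactly like the training records: relative, frame-normalized, scaled.
  const input = [];
  for (const p of history) {
    input.push(encode((p.x - origin.x) / frameWidth));
    input.push(encode((p.y - origin.y) / frameHeight));
  }

  const [ox, oy] = net.run(input);
  // Convert the relative prediction back to absolute pixels.
  return {
    x: origin.x + decode(ox) * frameWidth,
    y: origin.y + decode(oy) * frameHeight,
  };
}
```

In the prediction process, the weights saved above could be restored with `net.fromJSON(JSON.parse(fs.readFileSync("brain", "utf8")))` on a `new brain.NeuralNetwork()` before calling `predictNextPixel(net, lastFive, width, height)`.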

Here are the results:

[Three example predictions; the images are available in the original post at https://habr.com/ru/post/598711/]

Here, the past points are shown in green, the real coordinate in the future is shown in yellow, and the predicted coordinate is shown in red.

In other words, in a mere 38 milliseconds and with data from a hundred frames, we taught the neural network to predict the coordinate with accuracy that is quite sufficient for these examples. If you can afford another 38 ms, you can run another training cycle, and the prediction will become even more accurate.
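The article judges accuracy visually. One way to put a number on it, sketched here as an assumption rather than the author's procedure, is to hold out some records from training and measure the mean distance between predicted and actual T+1 positions after undoing the (v + 1) / 2 scaling. `meanPredictionError` and `splitHoldout` are hypothetical helpers; as before, `net` only needs a `run(input)` method.

```javascript
// Mean Euclidean distance between predicted and actual T+1 positions,
// in the dataset's relative frame units (0.01 = 1% of the frame size),
// for records in the { input, output } format used for training.
function meanPredictionError(net, records) {
  if (records.length === 0) throw new Error("no records to evaluate");
  const decode = (v) => 2 * v - 1; // undo the (v + 1) / 2 scaling
  let total = 0;
  for (const { input, output } of records) {
    const [px, py] = net.run(input);
    const dx = decode(px) - decode(output[0]);
    const dy = decode(py) - decode(output[1]);
    total += Math.hypot(dx, dy);
  }
  return total / records.length;
}

// Keep the last `fraction` of the records out of training for evaluation.
function splitHoldout(records, fraction = 0.2) {
  const cut = Math.floor(records.length * (1 - fraction));
  return { train: records.slice(0, cut), test: records.slice(cut) };
}
```

Training on `train` and calling `meanPredictionError(net, test)` before and after an extra training cycle would show whether the additional 38 ms actually pays off.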

The purpose of the article was to show that a primitive neural network can easily replace heavy physics modeling and give decent accuracy in predicting the future coordinates of a flying object.

All the code and the synthetic dataset used for illustration are, of course, on GitHub.
