How to get two diplomas for one pet project

My name is Vlad, and I work as a full-stack developer in the Logistics department of KORUS Consulting. In parallel, I am in the final year of a master’s program at the St. Petersburg State University of Aerospace Instrumentation, in the Department of Computer Technology and Software Engineering.

During my undergraduate studies I majored in applied computer science, but I did not spend enough time on programming and software development: the emphasis was on mathematical statistics and various kinds of analysis. The master's program I am in now gives me plenty of room to grow as a developer.

I’ll tell you about my pet project (and thesis): “An intelligent system for determining the parameters of objects at a sporting event using a tracking library.”

Project idea

Everyone knows the FIFA series of football simulators, right? I used to play it a lot. Some will say that this is a waste of time, but I disagree: this game inspired the pet project that became my bachelor's thesis.

While playing FIFA, the user sees a mini-map with the positions of the players and the ball on the field. This interface element is so useful that it is hard to imagine full-fledged gameplay without it. It occurred to me that it would be nice to bring this map into the real world using a video recording of a match and a neural network.

I started developing the project in 2021, and now I’m working on improving it as part of my master’s thesis.

FIFA game interface example

Even though far fewer people were interested in neural networks in 2021, and AI was used mostly by enthusiasts, I wanted to get involved and try them in my project. By the way, I had never touched neural networks during my studies and started implementing the project with absolutely zero knowledge of the topic.

In my master's thesis, in addition to drawing the map, I thought it would be nice to somehow analyze the match being played, or at least provide some statistics. This is how the idea came about to implement separate application modules that would collect statistical parameters from the source video: percentage of ball possession, a pass map, and a ball touch map.

Technologies and tools

I chose Python 3.9 as the programming language: at that stage I cared more about how easily and quickly I could write code, especially around neural networks that were still unfamiliar to me, than about optimizing it. Optimization was something I planned to tackle at the final stage of my undergraduate project.

OpenCV (the cv2 module in Python) was chosen as the toolkit for working with video. It is a comprehensive yet simple library that lets you read and save video, transform perspective, and draw simple geometric shapes directly on top of frames. For this range of needs it has practically no competitors.
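To give a sense of how compact the basic pipeline is, here is a minimal sketch (an illustration of the OpenCV calls mentioned above, not the project code; file names are placeholders): read a video frame by frame, draw a shape and a label on each frame, and write the result to disk.

import cv2

cap = cv2.VideoCapture('match.mp4')              # open the source video
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter('annotated.mp4',
                      cv2.VideoWriter_fourcc(*'mp4v'),
                      fps, (width, height))

while True:
    ok, frame = cap.read()                       # grab the next frame
    if not ok:
        break
    # draw a circle and a text label directly on top of the frame
    cv2.circle(frame, (100, 100), 10, (0, 0, 255), thickness=2)
    cv2.putText(frame, 'player', (90, 85),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
    out.write(frame)

cap.release()
out.release()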

As for the neural network, I had to pick a ready-made model, since building a reasonably competent network from scratch with zero knowledge is next to impossible. Moreover, the task called specifically for a convolutional neural network, which reduces each video frame to a compact numerical feature representation that can then be classified. I settled on the YOLOv3 architecture.

Based on tests using the COCO mAP-50 metric, the YOLOv3 architecture was the fastest at detecting objects. A quick glossary: AP (average precision) is precision averaged over recall values from 0 to 1, where recall measures how well the neural network finds positive samples. IoU (intersection over union) measures the overlap between the predicted area of an object and its actual area: the higher the IoU, the more accurately the object is localized. COCO mAP (mean average precision) is the AP averaged over the set of COCO classes. mAP-50 means that a prediction counts as correct when its IoU with the ground truth is at least 0.5.
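For intuition, IoU for two axis-aligned boxes takes only a few lines to compute. A minimal sketch, with boxes given as (x1, y1, x2, y2) tuples:

def iou(box_a, box_b):
    # coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# at mAP-50, a prediction with IoU >= 0.5 against the ground truth
# counts as a true positive
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))       # ~0.143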

When I started my master's work, I decided to switch to the newer YOLOv8 model. The figure below shows the comparisons that influenced the choice in favor of YOLOv3 back in 2021.

Graph of mAP-50 versus time for object detection for various convolutional network architectures

To implement the statistics modules, I first decided to integrate the DeepSORT object tracking library, but after a couple of tests I was not satisfied with its performance, so I switched to another tracker, ByteTrack. It turned out to be faster and easier to use. For those interested, here is a link to the paper: https://arxiv.org/pdf/2110.06864.pdf

The result was the following system operation diagram:

System diagram

More about development

First of all, I had to find a recording of a football match shot with a static camera. A moving camera with zoom was not suitable, because the image had to be tied to the coordinates of the field map.

Next, I loaded the video with Python and OpenCV and connected the YOLOv8 neural network. This helped me understand how the network operates and how to connect and configure it; I also experimented with different network weights. To determine which team a player belongs to, I applied a color mask to the detected object using the OpenCV inRange() method.
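As an illustration of these two steps, here is a rough sketch of combining YOLOv8 detection (via the ultralytics package) with an inRange() color mask. The HSV ranges and file names here are made-up placeholders, not the actual values from my project:

import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('yolov8x.pt')                       # pretrained detection weights

def team_of(frame, box, team1_range, team2_range):
    # guess the team by counting shirt-colored pixels inside the box
    x1, y1, x2, y2 = map(int, box)
    crop = cv2.cvtColor(frame[y1:y2, x1:x2], cv2.COLOR_BGR2HSV)
    mask1 = cv2.inRange(crop, *team1_range)
    mask2 = cv2.inRange(crop, *team2_range)
    return 1 if cv2.countNonZero(mask1) >= cv2.countNonZero(mask2) else 2

frame = cv2.imread('frame.png')
# hypothetical HSV ranges for red and blue shirts
team1_range = (np.array([0, 120, 70]), np.array([10, 255, 255]))
team2_range = (np.array([100, 120, 70]), np.array([130, 255, 255]))

result = model(frame)[0]                         # detect objects in one frame
for box in result.boxes.xyxy.tolist():
    print(team_of(frame, box, team1_range, team2_range))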

After that, I moved on to transferring objects from the video onto the map using perspective transformation, which OpenCV provides via the getPerspectiveTransform method. To map the field edges and objects, I used the PixelMapper class.
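The idea behind this mapping is a standard homography: take four reference points in the video frame and their known positions on the 2D pitch map, compute the transform once, and then project any pixel into map coordinates. A minimal sketch with hypothetical corner coordinates for a 105 x 68 m pitch:

import cv2
import numpy as np

# four reference points in the video frame (pixels)...
src = np.float32([[120, 80], [1180, 90], [1250, 700], [60, 690]])
# ...and the positions of the same points on the pitch map (meters)
dst = np.float32([[0, 0], [105, 0], [105, 68], [0, 68]])

matrix = cv2.getPerspectiveTransform(src, dst)   # 3x3 homography

def pixel_to_map(point):
    # project one frame pixel into pitch-map coordinates
    pts = np.float32([[point]])                  # shape (1, 1, 2)
    return cv2.perspectiveTransform(pts, matrix)[0][0]

print(pixel_to_map((640, 400)))                  # somewhere mid-pitch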

Drawing onto the map was also done with OpenCV.

Original image

Image after the perspective transformation

To let the application work with CUDA, I used CMake to rebuild the OpenCV library with access to the video card.
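After the rebuild it is worth checking that the CUDA modules are actually available. A quick sanity check, assuming a CUDA-enabled build and a network run through OpenCV's dnn module (the Darknet file names here are the standard YOLOv3 ones, used only for illustration):

import cv2

# non-zero only if OpenCV was rebuilt with CUDA support
print(cv2.cuda.getCudaEnabledDeviceCount())

# for dnn-based inference the backend and target are switched explicitly
net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)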

The result was a minimum viable product (MVP), which became my bachelor's thesis.

I started improving the project by integrating DeepSORT, a tracking library that follows the movement of the players.

With tracking, each player's identification number is retained between frames. But after the integration the ball was no longer detected as an object, so I decided to train my own model with two classes: ball and player. For this I found a dataset on the Roboflow platform for annotation and further training.

In the process it turned out that DeepSORT did not work as efficiently as expected, so I “moved” to ByteTrack. After that I could start implementing markers, modifying the rendering on the video image, and collecting statistics: percentage of ball possession, a touch map, and a pass map.
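For those who want to try the same bundle, here is a rough sketch of gluing YOLOv8 and ByteTrack together through the supervision package, one of several available ByteTrack wrappers. My project wires this up differently, so treat it only as a starting point:

import supervision as sv
from ultralytics import YOLO

model = YOLO('yolov8x.pt')
tracker = sv.ByteTrack()                         # ByteTrack wrapper

for result in model('match.mp4', stream=True):   # frame-by-frame inference
    detections = sv.Detections.from_ultralytics(result)
    detections = tracker.update_with_detections(detections)
    # tracker_id is now a stable per-player identifier across frames
    print(detections.tracker_id)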

In the upper left corner you can see the percentage of ball possession for each team, color-coded to match the team. Listing 1 shows the code of the ball possession statistics collection class.

Listing 1 – Code for the ball possession statistics collection class

from dataclasses import dataclass
from typing import List

import numpy as np

# Color, Detection, draw_text, pr, TEAM1, TEAM2 and the POSSESSION_POINT_*
# constants come from the project's own modules

@dataclass
class PossessionService:
    color_main: Color
    color_reserved: Color
    frames_total: int = 0
    frames_main: int = 0
    frames_reserve: int = 0
    possession_main: float = 0
    possession_reserve: float = 0

    def __calculate_frames(self, detections: List[Detection]):
        # count the frames in which each team was in possession
        for detection in detections:
            if detection.team == TEAM1:
                self.frames_main += 1
                self.frames_total += 1
            if detection.team == TEAM2:
                self.frames_reserve += 1
                self.frames_total += 1

    def __calculate_possession(self, detections: List[Detection]):
        if len(detections) == 0:
            return
        self.__calculate_frames(detections)
        if self.frames_total == 0:
            return
        self.possession_main = round(self.frames_main / self.frames_total * 100, 1)
        self.possession_reserve = round(self.frames_reserve / self.frames_total * 100, 1)

    def annotate(self, image: np.ndarray, detections: List[Detection]) -> np.ndarray:
        # update the counters, then draw both teams' percentages on a copy of the frame
        self.__calculate_possession(detections)
        annotated_image = image.copy()
        annotated_image = draw_text(image=annotated_image,
                                    text=f'Team #1: {self.possession_main} %',
                                    anchor=pr.Point(x=POSSESSION_POINT_MAIN[0], y=POSSESSION_POINT_MAIN[1]),
                                    color=self.color_main,
                                    font_scale=0.7)
        annotated_image = draw_text(image=annotated_image,
                                    text=f'Team #2: {self.possession_reserve} %',
                                    anchor=pr.Point(x=POSSESSION_POINT_RESERVE[0], y=POSSESSION_POINT_RESERVE[1]),
                                    color=self.color_reserved,
                                    font_scale=0.7)
        return annotated_image
This is the pass map. Listing 2 shows the class for collecting statistics of completed passes.

Listing 2 – Class for collecting statistics of completed passes

import copy
import math
from dataclasses import dataclass, field
from typing import List, Optional

# PixelMapper, Detection, Color, Consts, DrawUtil and pr come from the
# project's own modules

@dataclass
class PreviousPlayerData:
    id: int
    team: int
    x: int
    y: int

@dataclass
class Pass:
    src_x: int
    src_y: int
    dest_x: int
    dest_y: int
    src_id: int
    dest_id: int
    color: Color

@dataclass
class PassCollector:
    pm: PixelMapper
    colors: List[Color]
    passes: List[Pass] = field(default_factory=list)
    previous_player: Optional[PreviousPlayerData] = None  # last confirmed ball holder

    def __get_color_by_team(self, team) -> Color:
        if team == Consts.TEAM1:
            return self.colors[0]
        if team == Consts.TEAM2:
            return self.colors[1]
        return self.colors[0]

    def __new_previous_player_data(self, detection: Detection) -> PreviousPlayerData:
        lonlat = tuple(self.pm.pixel_to_lonlat((int(detection.rect.x), int(detection.rect.y)))[0])
        return PreviousPlayerData(
            id=detection.tracker_id,
            team=detection.team,
            x=int(lonlat[0]),
            y=int(lonlat[1]))

    def append(self, player_in_possession_detection: List[Detection]):
        # a pass is registered only when possession moves between two
        # different players of the same team; otherwise we just remember
        # the current holder
        if not player_in_possession_detection:
            return
        if self.previous_player is None:
            self.previous_player = self.__new_previous_player_data(player_in_possession_detection[-1])
            return
        if self.previous_player.id == player_in_possession_detection[-1].tracker_id:
            self.previous_player = self.__new_previous_player_data(player_in_possession_detection[-1])
            return
        if self.previous_player.team != player_in_possession_detection[-1].team:
            self.previous_player = self.__new_previous_player_data(player_in_possession_detection[-1])
            return
        lonlat = tuple(self.pm.pixel_to_lonlat((int(player_in_possession_detection[-1].rect.x),
                                                int(player_in_possession_detection[-1].rect.y)))[0])
        self.passes.append(Pass(
            src_x=self.previous_player.x,
            src_y=self.previous_player.y,
            dest_x=int(lonlat[0]),
            dest_y=int(lonlat[1]),
            src_id=self.previous_player.id,
            dest_id=player_in_possession_detection[-1].tracker_id,
            color=self.__get_color_by_team(self.previous_player.team)
        ))
        self.previous_player = self.__new_previous_player_data(player_in_possession_detection[-1])
        return

    def __draw_adapter(self, passes_pitch, pass_item: Pass):
        passes_pitch = DrawUtil.draw_circle(
            image=passes_pitch,
            lonlat=(math.trunc(pass_item.src_x), math.trunc(pass_item.src_y)),
            color=pass_item.color,
            radius=3)
        passes_pitch = DrawUtil.draw_text(
            image=passes_pitch,
            anchor=pr.Point(x=pass_item.src_x, y=pass_item.src_y),
            text=str(pass_item.src_id),
            color=Color(255, 255, 255),
            thickness=1)
        passes_pitch = DrawUtil.draw_circle(
            image=passes_pitch,
            lonlat=(math.trunc(pass_item.dest_x), math.trunc(pass_item.dest_y)),
            color=pass_item.color,
            radius=3)
        passes_pitch = DrawUtil.draw_text(
            image=passes_pitch,
            anchor=pr.Point(x=pass_item.dest_x, y=pass_item.dest_y),
            text=str(pass_item.dest_id),
            color=Color(255, 255, 255),
            thickness=1)
        passes_pitch = DrawUtil.draw_line(
            image=passes_pitch,
            src_x=pass_item.src_x,
            src_y=pass_item.src_y,
            dest_x=pass_item.dest_x,
            dest_y=pass_item.dest_y,
            color=pass_item.color
        )
        return passes_pitch

    def get_image(self, pitch):
        passes_pitch = copy.deepcopy(pitch)
        for pass_item in self.passes:
            passes_pitch = self.__draw_adapter(passes_pitch, pass_item)
        return passes_pitch

This is the ball touch map. Listing 3 shows the class for collecting ball touch statistics.

Listing 3 – Ball touch statistics collection class

import copy
import math
from dataclasses import dataclass, field
from typing import List, Optional

# PixelMapper, Detection, Color, Consts, DrawUtil and pr come from the
# project's own modules

@dataclass
class Touch:
    x: int
    y: int
    id: int
    color: Color


@dataclass
class TouchCollector:
    pm: PixelMapper
    colors: List[Color]
    touches: List[Touch] = field(default_factory=list)

    def __get_color_by_team(self, team) -> Color:
        if team == Consts.TEAM1:
            return self.colors[0]
        if team == Consts.TEAM2:
            return self.colors[1]
        return self.colors[0]

    def __get_touch(self, detections: List[Detection]) -> Touch:
        # only the most recent detection matters for a touch
        detection = detections[-1]
        lonlat = tuple(self.pm.pixel_to_lonlat((int(detection.rect.x), int(detection.rect.y)))[0])
        return Touch(x=lonlat[0],
                     y=lonlat[1],
                     id=detection.tracker_id,
                     color=self.__get_color_by_team(detection.team))

    def append(self, player_in_possession_detection: List[Detection], ball_detections: Optional[List[Detection]] = None):
        # ball_detections is accepted for interface symmetry but is not used yet
        if not player_in_possession_detection:
            return
        self.touches.append(self.__get_touch(player_in_possession_detection))

    def __draw_adapter(self, touches_pitch, touch: Touch):
        touches_pitch = DrawUtil.draw_circle(
            image=touches_pitch,
            lonlat=(math.trunc(touch.x), math.trunc(touch.y)),
            color=touch.color)
        touches_pitch = DrawUtil.draw_text(
            image=touches_pitch,
            anchor=pr.Point(x=touch.x, y=touch.y),
            text=str(touch.id),
            color=Color(255, 255, 255),
            thickness=1)
        return touches_pitch

    def get_image(self, pitch):
        touches_pitch = copy.deepcopy(pitch)
        for touch in self.touches:
            touches_pitch = self.__draw_adapter(touches_pitch, touch)
        return touches_pitch

“I clicked something and everything disappeared,” or the difficulties I ran into

The first difficulty was that in the MVP the final video played back at 13-15 fps, which was annoying during testing and demonstrations. After enabling CUDA and rebuilding the OpenCV library, the situation improved. After the switch to YOLOv8, it was enough to simply connect the torch library and enable CUDA in it, which then applies to all of the program code.
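With YOLOv8 this comes down to a couple of lines, roughly (assuming a CUDA-capable GPU; the video path is a placeholder):

import torch
from ultralytics import YOLO

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = YOLO('yolov8x.pt')
results = model('match.mp4', device=device)      # inference runs on the GPU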

The second difficulty was that after connecting DeepSORT, the ball, the most important object on the field for collecting statistics, was no longer detected. So I decided to train my own model. A lot of time went into finding a dataset and annotating it, and training the YOLOv8x model for 50 epochs took about 10 hours. I would not call the result excellent: the ball still sometimes disappears, and players are not always classified accurately, but for now it is enough.
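Once the dataset is exported from Roboflow in YOLO format, training itself is a single call. A sketch of roughly what mine looked like (the paths are placeholders):

from ultralytics import YOLO

model = YOLO('yolov8x.pt')                       # start from pretrained weights
model.train(
    data='dataset/data.yaml',                    # two classes: ball, player
    epochs=50,
    imgsz=640,
)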

Now the entire software part of the system is implemented, and I am writing my final qualifying thesis, which will serve as the documentation for this system.
I see potential in expanding the set of statistics, so I plan to add a map of interceptions and shots on goal.
But I’m not ready to release the system to users yet, because, in my opinion, it’s still too raw. 🙂

While working on this pet project, I not only felt all the pain of working with neural network tooling, but also came to understand how it works, learned its strengths and weaknesses, and learned to work with the metrics that determine the quality of a neural network. Once again I was convinced that the best way to learn something new is to practice with that very thing, and that confidence and a healthy dose of curiosity greatly push the boundaries of what is possible. I don’t think that this particular pet project, in its current form, can be called a professional tool that everyone involved in sports analytics should use, but I do think that developments like this are moving the world toward a more automated and technologically advanced future.
Studying takes up most of my free time, so I don’t have much left to think about other pet projects, but I do have ideas for other developments, related to neural networks and beyond, that I plan to implement in the near future. For now I won’t share anything specific, to preserve the originality of my ideas 🙂
