Entering the world of game development

A little about us

I am a machine learning (ML) student and an active participant in Kaggle competitions. Kaggle lets me develop my skills and explore new approaches, but it takes a lot of time and does not generate income. So my friends and I decided to create something of our own, which may grow into a startup one day (as they say, “every hobby should end in a startup”). Our team consists of two machine learning specialists and one front-end developer: the ML specialists work on the backend, while the front-end developer handles his direct responsibilities.

With that in mind, we decided to try building mobile games that use machine learning techniques, even if ML is only one element of the game.

Our first project

Encouraged by podcasts and articles, we decided not to attempt a “dream game” right away. Instead, we started with a simpler project in order to learn the intricacies of the process, taking our work schedules into account – working on it full-time was not an option.

On one podcast, I learned that an effective starting strategy is to build an MVP (minimum viable product) for Android and, if the project takes root, expand to iOS, laying the groundwork for multi-platform support from the very beginning.

For the first project, we chose an application in which the user has to guess a movie from a single frame. During development, we decided to add ratings and interesting facts about the films. Although many ideas came up, at a certain point we stopped adding features so as not to overcomplicate the first project.

Although my friend and I were well versed in machine learning and backend work, developing a mobile frontend was challenging. After several unsuccessful attempts to do it ourselves, we hired a friend for a nominal fee – he wanted to gain experience in mobile development. He did most of the work but eventually left the project; we parted on amicable terms.

As a result, another developer – motivated by the idea, just like us – joined the team, and there were three of us again. He completed the front-end development before the application's first release.

Here is our first project

How we found the necessary footage from the film

Processing began by splitting the film into scenes using the OpenCV library. Scene changes were detected by analyzing statistical changes in frame brightness: if the mean absolute difference in luminance between consecutive frames exceeded a given threshold, it was treated as the start of a new scene.

Initially, we considered using the YOLO neural network to identify objects in frames. However, despite YOLO's high accuracy in classifying and detecting objects, we ran into the problem of important frames being filtered out too aggressively. For example, in the famous scene from Forrest Gump with the flying feather, frames important to the plot contained no recognizable objects, so they were excluded. As a result, we abandoned YOLO in favor of more traditional shot-selection methods that were better at capturing visually significant moments.

In addition, an image-hashing technique based on the ImageHash library was used to remove duplicate frames. This eliminated repetitions and reduced the size of the dataset.
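We used the ImageHash library in the actual pipeline; as a self-contained sketch of the same idea, here is a minimal average hash computed directly on grayscale NumPy frames. The function names and the 5-bit distance threshold are our own illustrative choices, not values from the original pipeline.

```python
import numpy as np

def average_hash(image, hash_size=8):
    """Minimal average hash: block-average the grayscale frame down to
    hash_size x hash_size, then threshold each block against the mean."""
    h, w = image.shape
    bh, bw = h // hash_size, w // hash_size
    # Crop so the image divides evenly into hash_size x hash_size blocks.
    cropped = image[:bh * hash_size, :bw * hash_size]
    blocks = cropped.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (blocks > blocks.mean()).flatten()

def is_duplicate(hash_a, hash_b, max_distance=5):
    """Frames whose hashes differ in at most max_distance bits
    are treated as duplicates."""
    return int(np.count_nonzero(hash_a != hash_b)) <= max_distance
```

Because the hash reflects only coarse brightness structure, small compression artifacts or noise leave it almost unchanged, while a genuinely different frame flips roughly half the bits.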

Below is a code snippet demonstrating the basic steps of video processing:

import os
import random

import cv2
import numpy as np


def scene_change_detection(movie_path, output_folder, threshold=30.0, frames_per_scene=3):
    """
    Detects scene changes and extracts n random frames from each scene.
    Uses OpenCV for frame processing and scene change detection.

    :param movie_path: Path to the movie file.
    :param output_folder: Folder to save the extracted frames.
    :param threshold: Threshold on the mean absolute difference between
                      consecutive frames for detecting scene changes.
    :param frames_per_scene: Number of random frames to extract from each scene.
    """
    os.makedirs(output_folder, exist_ok=True)

    cap = cv2.VideoCapture(movie_path)
    if not cap.isOpened():
        print("Error: Could not open video.")
        return

    ret, prev_frame = cap.read()
    if not ret:
        print("Error: Could not read the first frame.")
        cap.release()
        return

    prev_frame_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    frame_buffer = []
    scene_count = 0

    def save_scene(buffer):
        nonlocal scene_count
        scene_count += 1
        selected_frames = random.sample(buffer, frames_per_scene)
        for idx, frame in enumerate(selected_frames):
            frame_name = f"scene_{scene_count}_frame_{idx}.jpg"
            cv2.imwrite(os.path.join(output_folder, frame_name), frame)

    while True:
        ret, curr_frame = cap.read()
        if not ret:
            break

        frame_buffer.append(curr_frame)
        curr_frame_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
        frame_diff = cv2.absdiff(curr_frame_gray, prev_frame_gray)
        diff_score = np.mean(frame_diff)

        if diff_score > threshold:
            # Scene boundary: sample frames from the scene that just ended.
            if len(frame_buffer) > frames_per_scene:
                save_scene(frame_buffer)
            frame_buffer.clear()

        prev_frame_gray = curr_frame_gray

    # Don't drop the last scene when the video ends without a final cut.
    if len(frame_buffer) > frames_per_scene:
        save_scene(frame_buffer)

    cap.release()
    print(f"Scene detection complete. {scene_count} scenes detected.")

Using the prepared dataset, we trained a neural network based on the EfficientNetV2-M architecture, known for its efficiency in image classification tasks. The network helped us distinguish “good” frames from “bad” ones, relying on the model's ability to identify important visual features even when the frame contains no obvious objects such as people. For example, the Forrest Gump shot of the floating feather can be important even though it contains no characters.

To further filter out overly similar frames in the final application, we also extracted frame embeddings from the model. These embeddings are vectors that characterize the content of a frame at a deep level, allowing us to accurately compare images and screen out near-duplicates.
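The article does not specify how the embeddings were compared, so here is one common approach as a sketch: greedily keep a frame only if its cosine similarity to every already-kept frame stays below a threshold. The function name and the 0.95 threshold are assumptions for illustration.

```python
import numpy as np

def filter_similar_frames(embeddings, similarity_threshold=0.95):
    """Greedily keep frames whose embedding is not too close (by cosine
    similarity) to any already-kept frame. Returns indices of kept frames.

    embeddings: array of shape (n_frames, embedding_dim).
    """
    # Normalize so that a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []
    for i, vec in enumerate(normed):
        if all(np.dot(vec, normed[j]) < similarity_threshold for j in kept):
            kept.append(i)
    return kept
```

Greedy filtering is O(n * kept) and order-dependent, but for a few thousand frames per film that is cheap and the order (chronological) is a natural tie-breaker.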

This approach not only improved the quality of frame selection for the dataset, but also provided deeper and more meaningful data preparation for our final product.

Interesting Facts

Adding interesting facts about the movies was handled efficiently using GPT-3.5. We chose it because it costs less to use than newer versions while maintaining a sufficient level of accuracy and informativeness in its answers.

We prepared carefully formulated queries (prompts) aimed at extracting data and facts specific to each film. These prompts were refined over several iterations to improve the quality and relevance of the information.

A key aspect of our work was prompt optimization, which required a detailed understanding of ChatGPT's capabilities and limitations. For example, we started with basic queries like “Tell me an interesting fact about the movie [movie title]” and gradually added refinements and context to improve the results. Testing different prompt formulations allowed us to get the most out of the model and generate accurate, useful information.
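The exact prompts we iterated toward are not reproduced here, but the refinement pattern can be sketched as a template builder that layers constraints onto the basic query. Every constraint and parameter name below is a hypothetical illustration, not our production prompt; the resulting string would then be sent to the GPT-3.5 chat completion API.

```python
def build_fact_prompt(title, year=None, facts_to_avoid=()):
    """Build a progressively refined prompt: start from the basic request
    and layer on context and constraints (illustrative example)."""
    prompt = f'Tell me an interesting fact about the movie "{title}"'
    if year is not None:
        prompt += f" ({year})"  # Disambiguate remakes with the release year.
    prompt += (
        ". The fact must concern the production, cast, or reception, "
        "be verifiable, and fit in two sentences."
    )
    if facts_to_avoid:
        # Feed previously generated facts back in to avoid repetition.
        prompt += " Do not repeat these facts: " + "; ".join(facts_to_avoid) + "."
    return prompt
```

Keeping the builder as a pure function made it easy to A/B test prompt variants against the same list of films.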

Once the prompts were set up, the fact generation process ran smoothly and each movie was processed quickly and without significant errors, providing a reliable and valuable addition to our content.

Application design

Since our team had no professional designers, we decided to use neural networks to create the application's design. We chose MidJourney and DALL-E for this task, which allowed us to automate the process and get high-quality results without involving design specialists.

Using MidJourney and DALL-E let us experiment with different styles and design elements, adapting them to the unique requirements of our application. Below is an example of such generations, demonstrating the effectiveness of this approach.

Working with a publication

My first experience publishing an application on Google Play turned out to be quite difficult. I seem to have run into every possible problem, from not understanding the process to constantly fixing bugs. Despite watching many videos and reading many articles, I still haven't found a simple, clear guide that explains how to do everything right the first time.

P.S.

I am also interested in collaboration: if you have ideas or experience in game development, I suggest joining forces to create something truly impressive. Let's build something unique together!
