SQL for Junior Data Engineers: examples of business problems

Entering the Data Engineer profession requires not only proficiency in tools for building data, but also confident knowledge of SQL to solve problems of varying complexity. Although many SQL queries may appear to be “analytical,” in practice, Data Engineers are often responsible for writing and optimizing them. After all, analysts and product specialists need fast, accurate access to data to analyze it, which means DEs must provide access to the data they need and help create queries to process large volumes of information.

SQL as a Data Engineer and interaction with business teams

The role of a Data Engineer goes beyond technical support, as these specialists often work at the intersection of data and business. They collaborate with analysts, marketers and other business teams to provide accurate, aggregated data for decision-making. Data Engineers not only create pipelines for data transfer, but also optimize SQL queries to make data access as fast and efficient as possible. This directly impacts how quickly and accurately a business can respond to change.

Data Engineers also help build complex analytical queries that are used in a variety of business cases, ranging from marketing campaigns to logistics optimization. The following sections present tasks that are regularly encountered in Data Engineer practice and correspond to the Junior level.


Database structure and table descriptions

Table clients
Description: Stores data about the company's clients.
Columns:

  • client_id – client ID

  • first_name — client name

  • last_name – client's last name

Table drives
Description: Stores trip data.
Columns:

  • drive_id — trip identifier

  • client_id – client ID

  • start_location — starting point of the trip

  • end_location — the end point of the trip

  • start_time — date and time of the start of the trip

  • end_time — date and time of the end of the trip

  • fare — cost of the trip

Examples of business problems

Example tasks discussed below cover customer activity analysis, cost segmentation, building profiles for marketing strategies, and other cases. These SQL queries are an integral part of the DE operation, allowing you to optimize data processing and ensure that business logic is maintained at every stage.

Case 1: Analysis of customer activity for marketing strategy

Situation: The marketing team wants to know which customers have been active in the last month in order to create targeted offers and promotions. Active customers are those who have traveled in the last 30 days.

Task: Get a list of unique first and last names of active customers in the last month to send them personalized offers.

Solution: Using a subquery to filter client_idcorresponding to recent trips, with a dynamic filter to count the last 30 days from the current date.

SELECT DISTINCT 
    last_name, 
    first_name
FROM clients
WHERE client_id IN (
    SELECT DISTINCT client_id
    FROM drives
    WHERE start_time >= CURRENT_DATE - INTERVAL '30 days'
);

This request allows you to quickly identify those who actively use the service and focus marketing efforts on the relevant audience.


Case 2: Segmentation of trips by cost to optimize offers

Situation: The company wants to segment customers depending on their spending in order to offer more favorable conditions to customers with a high average bill. Customers who make expensive trips may be interested in premium service packages.

Task: Classify trips into categories “expensive” and “cheap” depending on the cost, and also display the starting location of the trip for further analysis.

Solution: Usage JOIN And CASEto automatically categorize trips by cost.

SELECT 
       c.first_name, 
       c.last_name, 
       d.start_location, 
       d.end_location,
       d.fare,
       CASE
           WHEN d.fare > 3000 THEN 'Дорогая'
           ELSE 'Дешевая'
       END AS fare_category
FROM clients AS c
JOIN drives AS d ON c.client_id = d.client_id;

Case 3: Identifying VIP clients for improved service

Situation: The company plans to introduce a VIP program for customers who regularly spend large sums on travel. The program will improve the experience for these customers and increase their loyalty.

Task: Find all customers whose total travel expenses exceed a certain amount, for example, 5000.

Solution: Using grouping and HAVINGto filter customers based on their total spend.

SELECT 
    c.first_name, 
    c.last_name, 
    SUM(d.fare) AS total_fare
FROM clients AS c
JOIN drives AS d ON c.client_id = d.client_id
GROUP BY c.first_name, c.last_name
HAVING SUM(d.fare) > 5000;

Case 4: Logistics – trip sequence analysis

Situation: The company's logistics team wants to understand in what order customers most often use the service in order to optimize routes and offers.

Task: Determine the order of trips for each client, sorting them by time.

Solution: Using the window function ROW_NUMBER() to number each customer's trips in order.

SELECT 
    c.first_name, 
    c.last_name, 
    d.start_location, 
    d.end_location, 
    d.start_time,
    ROW_NUMBER() OVER(PARTITION BY c.client_id ORDER BY d.start_time) AS trip_number
FROM clients AS c
JOIN drives AS d ON c.client_id = d.client_id;

Case 5: Highlighting high-value trips for pricing analysis

Situation: The pricing team wants to see trips that cost more than average to understand which routes and conditions lead to higher costs.

Task: Find trips that cost above average.

Solution: Using a subquery with AVGto select trips above average cost.

SELECT 
    d.drive_id, 
    client_id, 
    fare
FROM drives d
WHERE fare > (SELECT AVG(fare) FROM drives);

Case 6: Finding customers with a lot of trips for loyalty

Situation: Marketing is planning a loyalty program for active customers. It is necessary to identify clients who frequently use the services.

Task: Find customers who have made more trips than the average number of trips among all customers.

Solution: Using a subquery and HAVING to calculate the average number of trips and filter out customers with a higher than average number of trips.

SELECT 
    c.first_name, 
    c.last_name, 
    COUNT(d.drive_id) AS num_of_trips
FROM clients AS c
JOIN drives AS d ON c.client_id = d.client_id
GROUP BY c.first_name, c.last_name
HAVING COUNT(d.drive_id) > (
    SELECT AVG(num_trips) 
    FROM (SELECT client_id, COUNT(drive_id) AS num_trips FROM drives GROUP BY client_id) AS avg_trips
);

Case 7: Building a customer profile for personalized offers

Situation: The marketing department wants to build customer profiles to personalize offers.

Task: Get the maximum cost and total amount of all trips for each customer.

Solution: Using window functions MAX And SUM to aggregate data for each client.

SELECT 
       c.first_name, 
       c.last_name,
       MAX(d.fare) OVER(PARTITION BY c.client_id) AS max_fare,
       SUM(d.fare) OVER(PARTITION BY c.client_id) AS total_fare
FROM clients AS c
JOIN drives AS d ON c.client_id = d.client_id;

Case 8: Segmentation of clients by region and last name for targeted marketing

Situation: The marketing department is launching a targeted campaign in a specific region and wants to segment customers.

Task: Find clients from Moscow whose last names begin with the letter “A”.

Solution: Usage JOIN with several conditions to filter by location and last name.

SELECT 
      c.first_name, 
      c.last_name, 
      d.start_location, 
      d.end_location, 
      d.fare
FROM clients AS c
JOIN drives AS d ON c.client_id = d.client_id
WHERE 1 = 1
      AND d.start_location = 'Москва' 
      AND c.last_name LIKE 'А%';

Tips for optimizing queries

SQL queries used by Data Engineers often require optimization to handle large volumes of data. To speed up, for example, in case 1, you can create an index on the column start_time tables drives:

CREATE INDEX idx_drives_start_time ON drives (start_time);

Another trick is to replace subqueries with JOIN in cases where it improves performance.

Conclusion: SQL as a Foundation for Data Engineer Growth

This level of SQL is the required minimum for Junior Data Engineers, which is tested during interviews and is the basis for work. But in order to develop in the profession, you need to improve in query optimization, writing complex analytical queries and developing ETL processes.

Data Engineers, with deep knowledge of SQL, solve business problems and actively influence processes within the company, making it more flexible and data-driven.


If you are interested in the topic of data engineering, I invite you to look into my blog on Telegramwhere I share working cases, examples of tasks and experience in this area. The content will be especially useful for beginners, but I'm always happy to chat with more experienced engineers. I hope that the materials on the channel will be useful to you.

Also in my repository You can find additional materials and examples that I publish for study and practice.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *