SQL questions that are often asked in interviews. Part 1

SQL (Structured Query Language) is a programming language used to manage relational databases. In this article, I have collected SQL questions that you may encounter in an interview. They are often asked to test general knowledge and skills.

1. What is a primary key in SQL and why is it important?

In SQL, a primary key is a field or combination of fields that uniquely identifies a specific row in a table. The primary key is important because it ensures that there are no duplicate rows in the table and also allows you to query and index the table efficiently.

For example, you have a table of employees and you want each employee to have a unique ID. You can create a primary key field called “employee_id”. This field will be an integer that increases by one for each new employee added to the table. When you create the primary key, you will also set a constraint that ensures that the “employee_id” field does not contain null values ​​or duplicates.

Having a primary key also allows the table to be indexed efficiently, which improves query performance. When you query a table using a primary key, the database engine can quickly find the row you need without scanning the entire table.

In general, a primary key is a critical component of a well-designed database design because it ensures data integrity and allows for efficient querying and indexing.

2. What is a foreign key and how is it used to establish relationships between tables?

A foreign key is a column or set of columns that references the primary key of another table. It is used to establish a relationship between two tables.

Adding a foreign key to a table creates a relationship between the data in that table and the data in another table. This relationship ensures that the data in the two tables always matches each other.

Let’s say we have two tables: one for orders, the other for customers. We can create a foreign key in the orders table that will reference the customer ID in the customers table. This will establish a relationship between the two tables based on the client ID.

To illustrate this, consider the example below.

We have two tables: one for employees, the other for departments. The employees table has a foreign key column that references the department ID column in the departments table:

CREATE TABLE departments( 
    dept_id INT PRIMARY KEY, 
    dept_name VARCHAR(50) NOT NULL 
);

CREATE TABLE employees (
     emp_id INT PRIMARY KEY,
     emp_name VARCHAR(50) NOT NULL,
     dept_id INT, 
    FOREIGN KEY (dept_id) REFERENCES departments(dept_id));

In this example, the “departments” table contains information about the various departments of the company, and the “employees” table contains information about the company’s employees. A foreign key in the “employees” table ensures that each employee is associated with a department that exists in the “departments” table.

By ensuring referential integrity between two tables, a foreign key prevents data inconsistency and ensures data accuracy. For example, if we try to delete a department from the “departments” table that has employees associated with it in the “employees” table, the database will not allow us to do this because it would violate the referential integrity established by the foreign key.

In general, a foreign key is a column or set of columns in a table that establishes a relationship from one table to another based on the second table’s unique primary key. It guarantees data consistency and accuracy by ensuring referential integrity between two tables.

3. What is the difference between a database and a schema?

In SQL, a database is a collection of related data that is stored in an organized, structured way. It typically contains one or more tables, as well as other objects such as views, stored procedures, and indexes. A schema is a container for database objects, including tables, views, and stored procedures.

A database can have multiple schemas, with each schema containing a subset of database objects. A schema allows you to logically group related objects and separate them from other objects in the same database. This can help with organization, security, and access control.

For example, imagine a database for a retail store. It may have multiple layouts for different departments such as sales, inventory, and human resources. Each schema will contain tables and other objects related to that department. This will make it easier to manage the database and ensure that only the relevant data is available for each department.

In general, a database is a repository for all data and objects, and a schema is a container for a subset of these objects, providing organization and separation of concerns.

4. What is a GROUP BY clause and how is it used?

A GROUP BY clause is an SQL statement used to group rows with the same values ​​in one or more columns into summary rows, for example, “find the total sales for each product.”

The GROUP BY statement is used in conjunction with the SELECT statement and requires that the SELECT statement use at least one aggregate function, such as SUM, COUNT, AVG, MAX, or MIN. The GROUP BY clause is usually followed by the name(s) of the column(s) by which you want to group the data.

For example, if you have a table “Sales” with columns “Product”, “Date” and “Sales Amount”, and you want to find total sales amount for each product, you can use the following SQL query:

SELECT Product, SUM([Sales Amount]) as TotalSales
FROM Sales
GROUP BY Product;

The result will be a table with two columns: “Product” and “TotalSales”, where each row represents a unique product and its total sales.

The GROUP BY clause can also be used with multiple columns, allowing you to group data by each unique combination of columns. For example:

SELECT Product, Date, SUM([Sales Amount]) as TotalSales
FROM Sales
GROUP BY Product, Date;

The result will be a table with three columns: “Product”, “Date” and “TotalSales”, where each row represents a unique combination of product and date, as well as the total number of sales.

In general, the GROUP BY clause is used in SQL to group rows with the same values ​​in one or more columns into summary rows, using aggregate functions to perform calculations on the grouped data.

5. What is self-join and when is it used?

Self-join in SQL is a type of join operation in which a table is joined to itself. This is useful when you have a table with related data in different rows that you want to join based on a common field.

For example, consider the table “employees” with columns for employee ID and name, and manager ID. The Manager ID column contains the employee’s manager ID. To get a list of all employees with their manager’s name, you can use self-join.

Here is an example request:

SELECT e.name AS employee_name, m.name AS manager_name
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id;

In this query, we join the “employees” table to itself using the “manager_id” column to match each employee with their manager. The resulting table will have columns with the employee’s name and the name of his manager.

Self-join can also be used to find relationships between data in the same table. For example, if you have a product table with columns for product ID, title, and parent product ID, a join within the table can be used to find all the “child” products of a given “parent” product.

6. What is the difference between an inner and outer join?

Joins are used to combine data from multiple tables based on a common column or their relationship. The two most common types of joins are inner joins and outer joins.

An inner join returns only matching rows from both tables based on the join condition. For example, if we have two tables A and B and we perform an inner join using a common column C, only rows where C matches in both tables will be returned.

Here is an example query for an inner join:

SELECT A.column1, B.column2
FROM A
INNER JOIN B
ON A.C = B.C;

On the other hand, an outer join returns all rows from one table and matching rows from another table. If there are no matching rows in the second table, the result will contain NULL values ​​for all columns of that table. Outer joins are also divided into left outer join, right outer join and full outer join.

Here is an example query for a left outer join:

SELECT A.column1, B.column2
FROM A
LEFT OUTER JOIN B
ON A.C = B.C;

This query will return all rows from table A and matching rows from table B based on column C. If there are no matching rows in table B, the result will contain NULL values ​​for column2.

In general, the main difference between an inner and outer join is that the inner join returns only the matching rows, while the outer join returns all the rows from one table and the matching rows from the other table.

7. What is the difference between a correlated and an uncorrelated subquery?

In SQL, a subquery is a query that is nested within another query, and it can be either correlated or uncorrelated. The main difference between them is how they refer to the external request.

An uncorrelated subquery can be evaluated independently of the outer query. It does not reference any column of the outer query and can be executed on its own. Consider the following example, which uses an uncorrelated subquery to retrieve the average employee salary (AVG(salary)):

SELECT first_name, last_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Subquery (SELECT AVG(salary) FROM employees) is an uncorrelated subquery because it can be executed independently of the outer query. It retrieves the average salary of all employees, and the result is used in the WHERE clause of the outer query to filter out employees whose salary is greater than the average.

On the other hand, a correlated subquery references one or more columns from the outer query using values ​​from it. Let’s look at an example that uses a correlated subquery to get the total sales for each employee:

SELECT first_name, last_name, 
  (SELECT SUM(amount) FROM sales WHERE sales.employee_id = employees.employee_id) AS total_sales
FROM employees;

Subquery (SELECT SUM(amount) FROM sales WHERE sales.employee_id = employees.employee_id) is correlated because it refers to a column employee_id from an external request. It is evaluated for each employee in the external query, and the result is used to calculate the total sales for each employee.

In general, the main difference between correlated and uncorrelated subqueries in SQL is how they reference the outer query. An uncorrelated subquery can be evaluated independently of the outer query, while a correlated one is evaluated for each row of the outer query using its values.

8. What is a common table expression (CTE) and how is it used?

A common table expression (CTE) is the result of a table expression that is temporarily stored in memory and can be accessed again. It allows the user to define a subquery that can be referenced multiple times within a larger query.

CTEs are defined using the WITH keyword followed by the name of the CTE and a subquery that defines it. The CTE syntax is as follows:

WITH cte_name AS (
    SELECT column1, column2, ...
    FROM table_name
    WHERE condition
)
SELECT *
FROM cte_name

Once defined, the CTE can be used in subsequent queries as if it were a table. This can be useful in situations where a subquery needs to be referenced multiple times in a larger query, as it simplifies the syntax and improves readability.

CTEs can also be recursive, allowing for more efficient querying of hierarchical data. A recursive CTE includes an anchor part and a recursive part and can be used to traverse the hierarchy until a certain condition is met.

Overall, CTE is a powerful SQL feature that can simplify complex queries and improve performance.

9. What is the difference between DELETE and TRUNCATE statements?

DELETE and TRUNCATE statements are used to delete data from a table. However, they differ in their functionality and impact on the table.

The DELETE statement is used to remove specific rows from a table based on a condition specified in the WHERE clause. It can also be used to remove all rows from a table without specifying a condition. The DELETE statement deletes rows one at a time, which can be a slow process for large tables.

The TRUNCATE statement is used to remove all rows from a table at one time. This is a faster method of deleting data compared to DELETE. However, TRUNCATE does not allow the use of a WHERE clause and cannot selectively remove specific rows.

Another difference between DELETE and TRUNCATE is that DELETE can be rolled back using the transaction log, while TRUNCATE cannot. Once the TRUNCATE statement is executed, the data is permanently removed from the table.

In general, if you want to selectively delete specific rows from a table or roll back changes, use the DELETE statement. If you want to remove all rows from a table and free up disk space used by the table, you should use the TRUNCATE statement.

10. What is a temporary table and how is it used?

A temporary table is a type of table that is created and exists only for the duration of a session or transaction. It is not permanently stored in the database and is deleted automatically.

Temporary tables can be used to store intermediate results or to break complex queries into simpler steps. They are especially useful when a query requires multiple steps or complex calculations, as they help improve query performance and simplify query syntax.

Temporary tables can be created using the CREATE TEMPORARY TABLE statement. They can be created in memory or on disk, depending on the database system and configuration.

Temporary tables can be used like regular tables in SQL queries and populated with data using INSERT statements. They can also be joined with other tables or used in subqueries.

One common use case for temporary tables is to store and process intermediate results in complex queries, especially those that involve joins or aggregations. For example, a temporary table can be used to store the results of a join operation, which can then be used for further manipulation or joins to other tables in subsequent stages of the query.

11. What is the difference between HAVING and WHERE clauses?

SQL uses both HAVING and WHERE clauses to filter data in a query. However, there are some differences between them.

The WHERE clause is used to filter data before grouping or aggregating it and is used in SELECT, UPDATE, and DELETE statements. It filters data based on conditions that apply to individual rows.

For example, if you want to retrieve data about all employees whose salary is greater than $50,000, you can use a WHERE clause in the SELECT statement:

SELECT *
FROM employees
WHERE salary > 50000;

The HAVING clause is used to filter data after it has been grouped or aggregated and is used only with the SELECT statement. It filters data based on conditions that apply to groups of rows.

Let’s say you want to get the average salary of employees in each department and show only those departments where the average salary is greater than $50,000. You can use the HAVING clause in a SELECT statement:

SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;

In this example, the GROUP BY clause groups data by department, and the AVG function calculates the average salary for each department. The HAVING clause filters results to show only departments with average salaries greater than $50,000.

In general, the WHERE clause is used to filter individual rows before grouping or aggregation, and the HAVING clause is used to filter groups of rows after grouping or aggregation.

12. What is a window function and how is it used?

A window function is a type of function in SQL that performs calculations on a set of rows in a specific “window” or range. It is used to solve complex analytical problems that cannot be easily solved using simple aggregate functions.

Window functions can be used to perform calculations such as moving averages, subtotals, rankings, row numbering, and percentage of total. They operate on a subset of rows defined by the OVER() clause, which specifies the window or range for the function.

Consider a sales data table containing date, region, product, and sales amount columns. Here’s an example of a window function that calculates the moving average of sales for each product in each region over a three-month period:

SELECT 
    date, 
    region, 
    product, 
    sales_amount,
    AVG(sales_amount) OVER (
        PARTITION BY region, product 
        ORDER BY date 
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as rolling_avg_sales
FROM 
    sales_data

In this request the function AVG() used as a window function to calculate a moving average of sales. Offer PARTITION BY divides the data into sections or groups by region and product, and the offer ORDER BY sorts data by date within each section. Offer ROWS BETWEEN specifies the range of lines that should be included in the window frame, in this case the current line and the previous two.

Window functions can greatly simplify complex analytical queries and enable more efficient processing of large data sets.

13. What is the difference between a transaction and a batch?

In SQL, a transaction is a single logical unit of work that includes one or more SQL statements. The transaction is executed atomically. This means that all statements within a transaction must either be committed or rolled back as a whole.

Transactions ensure consistency and integrity of data in the database, allowing multiple operations to be grouped and executed as a single unit. If any of the statements in a transaction fail, the entire transaction is aborted and the database is returned to its previous state.

On the other hand, a batch is a collection of SQL statements that are submitted to the database to be executed as a group. Unlike transactions, batches do not provide the same level of atomicity or consistency guarantee. Each statement in a batch is executed separately, and any errors or exceptions are handled independently. Packages are typically used for tasks such as loading data into a database, running reports, or performing routine maintenance tasks.

In general, a transaction is used to combine multiple SQL queries into a single logical unit of work that must be executed atomically, while a batch is used to send a set of SQL queries to the database to be executed as a group without the same level of transactional guarantees.

14. What is the difference between a scalar and a table function?

In SQL, a function is a set of instructions that can be used to perform a specific task. There are two types of functions: scalar and table.

A scalar function returns a single value and is used in a query to convert input values ​​to output values. For example, a scalar function can be used to perform mathematical operations, such as finding the square root of a number, or to manipulate strings, such as converting them to upper or lower case.

The table function, on the other hand, returns a table as its result set. This means that a table-valued function can be used in a query just like a table, allowing you to join, filter, and aggregate the data it returns. Table functions are useful for complex data manipulations, when the result set is not known in advance, or when you want to reuse a query as a table.

One of the key differences between scalar and table functions is that scalar functions can be called within a query, that is, used as part of a SELECT, WHERE, or ORDER BY clause. Table functions must be called as part of a FROM clause because they return a table.

Another difference is that scalar functions return one value for each row, while table functions can return multiple rows. Scalar functions are usually simpler and faster than table functions, but they are less flexible and cannot be used in many situations.

In general, scalar functions return a single value and are used to transform input values, while tabular functions return a table and are used to manipulate and aggregate data.

15. What is normalization and why is it important?

Normalization is the process of organizing data in a database in such a way as to reduce redundancy and ensure integrity. It involves breaking the database into smaller, more manageable tables and establishing relationships between them.

Normalization is important for several reasons. First, it helps eliminate data redundancy that can lead to inconsistencies and errors. By organizing data into separate tables and linking them together, we can ensure that each piece of information is stored only once, making updating and maintenance easier.

Second, normalization helps maintain data consistency and accuracy. When data is spread across multiple tables, we can set rules and restrictions that will ensure the data is entered correctly and meets certain standards.

Finally, normalization makes data querying and analysis easier. By breaking the database into smaller, more specific tables, we can more efficiently retrieve and manipulate data.

Normalization is usually done through a series of steps known as normal forms. The most commonly used normal forms are first normal form (1NF), second normal form (2NF), and third normal form (3NF). Each normal form builds on the previous one, adding more rules and restrictions with each step to ensure the integrity and consistency of the data.

Here is the link to the second part of this series of SQL interview questions

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *