CodeWiz

Top Database and SQL Interview Questions

What is a primary key?

A primary key is a constraint that uniquely identifies each record in a table. Primary keys must contain unique values and cannot contain NULL values. A table can have only one primary key, which may consist of single or multiple fields.

Example:


CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    UserName VARCHAR(50) NOT NULL,
    Email VARCHAR(100) UNIQUE
);

In this table, UserID is the primary key.

What is a foreign key?

A foreign key is a key used to link two tables together. It's a field (or collection of fields) in one table that refers to the PRIMARY KEY in another table. The table with the foreign key is called the child table, and the table with the primary key is called the referenced or parent table.

Example: Let's create an Orders table with a foreign key that references the Users table.


CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    OrderDate DATE,
    UserID INT,
    FOREIGN KEY (UserID) REFERENCES Users(UserID)
);

Here, UserID in the Orders table is a foreign key that points to the UserID primary key in the Users table, establishing a link between the two tables.

What is the difference between a `PRIMARY KEY` and a `UNIQUE KEY`?

Feature	`PRIMARY KEY`	`UNIQUE KEY`
Null Values	Does not allow null values	Allows one null value
Number per Table	Only one per table	Can have multiple per table
Clustered Index	Creates a clustered index by default	Creates a non-clustered index by default

What is a database?

A database is a collection of structured data, stored and accessed electronically. It allows for efficient data management, retrieval, and updating.

What is a DBMS?

A Database Management System (DBMS) is software that interacts with end-users, applications, and the database itself to capture and analyze the data. A DBMS allows users to create, read, update, and delete data in a database.

What is a table?

A table is a data structure that organizes information into rows and columns. Each column represents a specific attribute, and each row represents a record.

Example:


CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    UserName VARCHAR(50) NOT NULL,
    Email VARCHAR(100) UNIQUE
);

What is a view?

A view is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database. Views can be used to simplify complex queries, provide a layer of security by restricting access to data, and present data in a different format.

Example:


CREATE VIEW UserOrders AS
SELECT u.UserName, o.OrderDate
FROM Users u
JOIN Orders o ON u.UserID = o.UserID;

Now, you can query this view like a regular table:


SELECT * FROM UserOrders;

What is a schema?

A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the data is organized and how the relations among them are associated.

What is the difference between a table and a view?

Feature	Table	View
Data Storage	Stores data physically in the database	Does not store data physically
Nature	A real, physical entity	A virtual table based on a query
Modification	Can be modified (INSERT, UPDATE, DELETE)	Limited modification capabilities

What is the difference between SQL and MySQL?

SQL (Structured Query Language): A standard language for accessing and manipulating databases.
MySQL: A popular open-source Relational Database Management System (RDBMS) that uses SQL.

What are constraints?

Constraints are rules that are enforced on data columns on a table. These are used to limit the type of data that can go into a table. This ensures the accuracy and reliability of the data in the database.

Some common constraints are:

NOT NULL: Ensures that a column cannot have a NULL value.
UNIQUE: Ensures that all values in a column are different.
PRIMARY KEY: A combination of a NOT NULL and UNIQUE. Uniquely identifies each row in a table.
FOREIGN KEY: Uniquely identifies a row/record in another table.
CHECK: Ensures that all values in a column satisfy a specific condition.
DEFAULT: Sets a default value for a column when no value is specified.

Example:


CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL,
    SKU VARCHAR(50) UNIQUE,
    Price DECIMAL(10, 2) CHECK (Price > 0),
    Status VARCHAR(20) DEFAULT 'In Stock'
);

This table includes:

A PRIMARY KEY on ProductID.
A NOT NULL constraint on ProductName.
A UNIQUE constraint on SKU.
A CHECK constraint to ensure Price is positive.
A DEFAULT value for the Status column.

What is data integrity?

Data integrity is the maintenance of, and the assurance of the accuracy and consistency of, data over its entire life-cycle. It is a critical aspect to the design, implementation and usage of any system which stores, processes, or retrieves data.

What are the different types of SQL commands?

SQL commands are divided into four main categories:

DDL (Data Definition Language): Used to define and manage the database schema.
- CREATE: Creates database objects like tables, views, or indexes.
```
CREATE TABLE Employees (ID INT, Name VARCHAR(100));
```
- ALTER: Modifies the structure of an existing database object.
```
ALTER TABLE Employees ADD Email VARCHAR(100);
```
- DROP: Deletes an entire database object.
```
DROP TABLE Employees;
```
- TRUNCATE: Removes all records from a table, but not the table itself. This is faster than DELETE and cannot be rolled back.
```
TRUNCATE TABLE Employees;
```

DML (Data Manipulation Language): Used for managing the data within schema objects.

SELECT: Retrieves data from a table.


SELECT Name, Email FROM Employees WHERE ID = 1;

INSERT: Adds new rows of data into a table.


INSERT INTO Employees (ID, Name, Email) VALUES (1, 'John Doe', 'john.doe@example.com');

UPDATE: Modifies existing records in a table.


UPDATE Employees SET Email = 'new.email@example.com' WHERE ID = 1;

DELETE: Removes existing records from a table.
```
DELETE FROM Employees WHERE ID = 1;
```

DCL (Data Control Language): Used to manage access rights to the database.
- GRANT: Gives a specific user permissions to perform certain tasks.
```
GRANT SELECT, UPDATE ON Employees TO 'some_user'@'localhost';
```
- REVOKE: Removes permissions from a user.
```
REVOKE UPDATE ON Employees FROM 'some_user'@'localhost';
```
TCL (Transaction Control Language): Used to manage transactions in the database.
- COMMIT: Saves all the changes made in the current transaction.
```
COMMIT;
```
- ROLLBACK: Undoes all the changes made in the current transaction.
```
ROLLBACK;
```
- SAVEPOINT: Sets a point within a transaction to which you can later roll back.
```
SAVEPOINT my_savepoint;
ROLLBACK TO my_savepoint;
```

What is the difference between `DELETE`, `TRUNCATE`, and `DROP` commands?

Command	`DELETE`	`TRUNCATE`	`DROP`
Type	DML	DDL	DDL
Scope	Removes one or more rows from a table	Removes all rows from a table	Removes the entire table
`WHERE` clause	Can be used	Cannot be used	Cannot be used
Rollback	Can be rolled back	Cannot be rolled back	Cannot be rolled back
Speed	Slower	Faster	Fastest

What is the difference between WHERE and HAVING clauses?

WHERE clause: Used to filter records before any groupings are made. It operates on a row-by-row basis.


SELECT * FROM Employees WHERE DepartmentID = 101;

HAVING clause: Used to filter groups after the GROUP BY clause has been applied. It operates on aggregated data.

Suppose you want to find departments with more than 2 employees.


SELECT DepartmentName, COUNT(*) AS EmployeeCount
FROM Employees
GROUP BY DepartmentName
HAVING COUNT(*) > 2;

What is the `GROUP BY` clause and how is it used?

The GROUP BY clause is used with aggregate functions (like COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or more columns.


SELECT DepartmentName, COUNT(*) AS EmployeeCount
FROM Employees
GROUP BY DepartmentName;

What is the `ORDER BY` clause used for?

The ORDER BY clause is used to sort the result-set in ascending (ASC) or descending (DESC) order. By default, it sorts the records in ascending order.


SELECT * FROM Employees ORDER BY Name ASC;

What is the `MERGE` statement?

The MERGE statement (also known as UPSERT) is used to perform INSERT, UPDATE, or DELETE operations on a target table based on the results of a join with a source table.

For example, if you want to insert a new employee into the Employees table, but if the employee already exists, you want to update the employee's information, you can use the MERGE statement.


MERGE INTO Employees e
USING (SELECT 1 AS ID, 'John Doe' AS Name, 101 AS DepartmentID FROM DUAL) s
ON (e.ID = s.ID)
WHEN MATCHED THEN UPDATE SET e.Name = s.Name, e.DepartmentID = s.DepartmentID
WHEN NOT MATCHED THEN INSERT (ID, Name, DepartmentID) VALUES (s.ID, s.Name, s.DepartmentID);

What is the purpose of the `SAVEPOINT` command in TCL?

A SAVEPOINT is a point in a transaction that you can roll back to without rolling back the entire transaction. This is useful for implementing complex error handling logic.

What is the difference between `GRANT` and `REVOKE`?

GRANT: Gives a user specific permissions on database objects.
REVOKE: Removes user permissions on database objects.

What is the `ALTER` command used for?

The ALTER TABLE statement is used to add, delete, or modify columns in an existing table. It can also be used to add and drop various constraints on an existing table.


ALTER TABLE Employees ADD Email VARCHAR(100);

What is the difference between `UNION` and `INTERSECT`?

UNION: Combines the result sets of two or more SELECT statements and removes duplicate rows.
INTERSECT: Returns the common rows from two or more SELECT statements.


SELECT * FROM Employees WHERE DepartmentID = 101
UNION
SELECT * FROM Employees WHERE DepartmentID = 102;


SELECT * FROM Employees WHERE DepartmentID = 101
INTERSECT
SELECT * FROM Employees WHERE DepartmentID = 102;

What is the `LIKE` operator used for?

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

%: Represents zero, one, or multiple characters.
_: Represents a single character.


SELECT * FROM Employees WHERE Name LIKE 'A%';

What is the `CASE` statement?

The CASE statement goes through conditions and returns a value when the first condition is met (like an if-then-else statement). Once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the ELSE clause.


SELECT EmployeeName,
CASE
WHEN DepartmentID = 101 THEN 'HR'
WHEN DepartmentID = 102 THEN 'Engineering'
ELSE 'Marketing'
END AS DepartmentName
FROM Employees;

What are SQL joins? Explain the different types of joins.

SQL JOIN clauses are used to combine rows from two or more tables based on a related column between them. It's the primary way to query data that has been normalized across multiple tables.

Let's use two sample tables to illustrate the different types of joins: Employees and Departments.

Employees Table:

| EmployeeID | EmployeeName | DepartmentID |
|------------|--------------|--------------|
| 1          | Alice        | 101          |
| 2          | Bob          | 102          |
| 3          | Charlie      | 101          |
| 4          | David        | NULL         |

Departments Table:

| DepartmentID | DepartmentName |
|--------------|----------------|
| 101          | HR             |
| 102          | Engineering    |
| 103          | Marketing      |

`INNER JOIN`

Returns records that have matching values in both tables. This is the most common type of join.

Query:


SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
INNER JOIN Departments d ON e.DepartmentID = d.DepartmentID;

Result:

| EmployeeName | DepartmentName |
|--------------|----------------|
| Alice        | HR             |
| Bob          | Engineering    |
| Charlie      | HR             |

Notice that David (no department) and the Marketing department (no employees) are not included.

`LEFT JOIN` (or `LEFT OUTER JOIN`)

Returns all records from the left table (Employees), and the matched records from the right table (Departments). The result is NULL from the right side if there is no match.

Query:


SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
LEFT JOIN Departments d ON e.DepartmentID = d.DepartmentID;

Result:

| EmployeeName | DepartmentName |
|--------------|----------------|
| Alice        | HR             |
| Bob          | Engineering    |
| Charlie      | HR             |
| David        | NULL           |

Notice that David is included, as he is in the left table.

`RIGHT JOIN` (or `RIGHT OUTER JOIN`)

Returns all records from the right table (Departments), and the matched records from the left table (Employees). The result is NULL from the left side if there is no match.

Query:


SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
RIGHT JOIN Departments d ON e.DepartmentID = d.DepartmentID;

Result:

| EmployeeName | DepartmentName |
|--------------|----------------|
| Alice        | HR             |
| Bob          | Engineering    |
| Charlie      | HR             |
| NULL         | Marketing      |

Notice that the Marketing department is included, as it is in the right table.

`FULL OUTER JOIN`

Returns all records when there is a match in either the left or the right table. It's a combination of LEFT JOIN and RIGHT JOIN.

Query:


SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
FULL OUTER JOIN Departments d ON e.DepartmentID = d.DepartmentID;

Result:

| EmployeeName | DepartmentName |
|--------------|----------------|
| Alice        | HR             |
| Bob          | Engineering    |
| Charlie      | HR             |
| David        | NULL           |
| NULL         | Marketing      |

Notice that both David and the Marketing department are included.

`CROSS JOIN`

Returns the Cartesian product of the two tables, meaning it returns all possible combinations of rows from both tables.

Query:


SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
CROSS JOIN Departments d;

Result: (A few example rows)

| EmployeeName | DepartmentName |
|--------------|----------------|
| Alice        | HR             |
| Alice        | Engineering    |
| Alice        | Marketing      |
| Bob          | HR             |
| Bob          | Engineering    |
| ...          | ...            |

This join type is used less frequently and does not require an ON clause.

What is the difference between `UNION` and `UNION ALL`?

UNION: Combines the result sets of two or more SELECT statements and removes duplicate rows.
UNION ALL: Combines the result sets of two or more SELECT statements and includes all rows, including duplicates.

UNION ALL is faster than UNION because it doesn't have to check for duplicates.


SELECT * FROM Employees WHERE DepartmentID = 101
UNION
SELECT * FROM Employees WHERE DepartmentID = 102;


SELECT * FROM Employees WHERE DepartmentID = 101
UNION ALL
SELECT * FROM Employees WHERE DepartmentID = 102;

What is a self-join?

A self-join is a regular join, but the table is joined with itself. This is useful for querying hierarchical data or comparing rows within the same table.


SELECT e1.EmployeeName, e2.EmployeeName
FROM Employees e1
JOIN Employees e2 ON e1.ManagerID = e2.EmployeeID;

What is a natural join?

A natural join is a type of join that creates an implicit join clause for you based on the common columns in the two tables being joined. Common columns are columns that have the same name in both tables.

What is the difference between `INNER JOIN` and `LEFT JOIN`?

INNER JOIN: Returns only the rows that have matching values in both tables.
LEFT JOIN: Returns all rows from the left table, and the matched rows from the right table. If there is no match, the result is NULL on the right side.

What is a `CROSS JOIN` and when would you use it?

A CROSS JOIN returns the Cartesian product of two tables. This means it returns all possible combinations of rows from both tables. It is used when you want to create a combination of every row from two tables.

What is the `INTERSECT` operator?

The INTERSECT operator is used to return the results of two or more SELECT statements. However, it only returns the rows that are common to all SELECT statements.

What is the `EXCEPT` operator?

The EXCEPT operator is used to return the results of the first SELECT statement that are not present in the result of the second SELECT statement.

Can you join more than two tables?

Yes, you can join multiple tables in a single query. You can do this by chaining JOIN clauses. For example, you can join TableA with TableB, and then join the result with TableC.


SELECT * FROM TableA
JOIN TableB ON TableA.ID = TableB.ID
JOIN TableC ON TableB.ID = TableC.ID;

What is the difference between a join and a subquery?

Join: Combines rows from two or more tables based on a related column.
Subquery: A query nested inside another query. It can be used to return data that will be used in the main query as a condition to further restrict the data to be retrieved.


SELECT * FROM TableA
WHERE ID IN (SELECT ID FROM TableB);

What are some best practices for writing joins?

Use INNER JOIN whenever possible, as it is the most efficient type of join.
Use aliases for table names to improve readability.
Qualify column names with table names to avoid ambiguity.
Use the ON clause to specify the join condition.

What is a `FULL OUTER JOIN`?

A FULL OUTER JOIN returns all rows from both tables, with NULL values in columns where there is no match. It is a combination of LEFT JOIN and RIGHT JOIN.

What is database normalization?

Database normalization is the process of organizing the columns and tables of a relational database to minimize data redundancy. It involves dividing a database into two or more tables and defining relationships between the tables. The goal is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database using defined relationships.

Normal Forms:

First Normal Form (1NF): The table is in 1NF if it has no repeating groups. Each cell should contain a single value, and each record needs to be unique.
Second Normal Form (2NF): The table is in 2NF if it is in 1NF and all non-key attributes are fully functional dependent on the primary key.
Third Normal Form (3NF): The table is in 3NF if it is in 2NF and there is no transitive dependency.

What is denormalization?

Denormalization is the process of intentionally introducing redundancy into a table to improve query performance. This is often done in data warehousing for reporting purposes.

Explain 1NF, 2NF, and 3NF with examples.

Let's walk through the normalization process with a practical example. Imagine we have a single table to store order information for a small shop.

Unnormalized Table (OrderData) This table contains repeating groups and redundant data.

OrderID	CustomerDetails	ProductDetails
1	101, John Doe	(P01, Apple, 10), (P02, Banana, 5)
2	102, Jane Smith	(P03, Cherry, 20)

This design has several issues:

The CustomerDetails and ProductDetails columns contain multiple values.
The product information is stored in a complex, non-atomic format.
CustomerName is repeated for every order the customer makes.

First Normal Form (1NF): Eliminate Repeating Groups

Rule: Each cell must hold a single, atomic value, and each record must be unique.

To achieve 1NF, we break the repeating groups into separate rows. This gives us a single record for each product in an order.

Orders_1NF Table (Primary Key: (OrderID, ProductID))

OrderID	CustomerID	CustomerName	ProductID	ProductName	Quantity
1	101	John Doe	P01	Apple	10
1	101	John Doe	P02	Banana	5
2	102	Jane Smith	P03	Cherry	20

The table is now in 1NF, but it still has issues:

Data Redundancy: CustomerName is repeated for each product in an order.
Partial Dependencies: CustomerName depends only on CustomerID (which depends on OrderID), not the full primary key (OrderID, ProductID). ProductName depends only on ProductID.

Second Normal Form (2NF): Remove Partial Dependencies

Rule: The table must be in 1NF, and every non-key attribute must be fully dependent on the entire primary key. This rule applies to tables with composite primary keys.

We split the Orders_1NF table to remove these partial dependencies.

OrderItems Table: Contains information specific to each product in an order. (Primary Key: (OrderID, ProductID))

OrderID ProductID Quantity
1 P01 10
1 P02 5
2 P03 20
Orders Table: Contains information about the order itself. (Primary Key: OrderID)

OrderID CustomerID CustomerName
1 101 John Doe
2 102 Jane Smith
Products Table: Contains product information. (Primary Key: ProductID)

ProductID ProductName
P01 Apple
P02 Banana
P03 Cherry

OrderID	ProductID	Quantity
1	P01	10
1	P02	5
2	P03	20

OrderID	CustomerID	CustomerName
1	101	John Doe
2	102	Jane Smith

ProductID	ProductName
P01	Apple
P02	Banana
P03	Cherry

Now we are in 2NF, but there's still a problem in the Orders table:

Transitive Dependency: CustomerName depends on CustomerID, which is not the primary key. The dependency is OrderID -> CustomerID -> CustomerName.

Third Normal Form (3NF): Remove Transitive Dependencies

Rule: The table must be in 2NF, and there should be no transitive dependencies (where a non-key attribute depends on another non-key attribute).

We split the Orders table to remove the transitive dependency.

Customers Table: (Primary Key: CustomerID)

CustomerID CustomerName
101 John Doe
102 Jane Smith
Orders_Final Table: (Primary Key: OrderID, Foreign Key: CustomerID)

OrderID CustomerID
1 101
2 102

CustomerID	CustomerName
101	John Doe
102	Jane Smith

OrderID	CustomerID
1	101
2	102

Final 3NF Schema

After normalization, we have four separate, well-structured tables.

Customers


CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(100)
);

Products


CREATE TABLE Products (
    ProductID VARCHAR(10) PRIMARY KEY,
    ProductName VARCHAR(100)
);

Orders


CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

OrderItems


CREATE TABLE OrderItems (
    OrderID INT,
    ProductID VARCHAR(10),
    Quantity INT,
    PRIMARY KEY (OrderID, ProductID),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);

This design minimizes redundancy and prevents data anomalies.

What is BCNF?

Boyce-Codd Normal Form (BCNF) is a higher version of 3NF. A table is in BCNF if for every non-trivial functional dependency X -> Y, X is a superkey.

What are the different types of relationships in a database?

There are three types of relationships in a database:

One-to-One: Each record in Table A is related to one and only one record in Table B.
One-to-Many: Each record in Table A can be related to one or more records in Table B.
Many-to-Many: Each record in Table A can be related to one or more records in Table B, and each record in Table B can be related to one or more records in Table A.

What is a data model?

A data model is a conceptual representation of data objects, the associations between different data objects, and the rules. Data models are used to define how data is stored, connected, accessed, and processed within a database management system.

What is the difference between a logical and a physical data model?

Logical Data Model: A high-level design that defines the data elements and their relationships. It is independent of any specific database management system.
Physical Data Model: A low-level design that specifies how the data will be stored in a particular database system. It includes details like table names, column names, data types, and constraints.

What is an ER diagram?

An Entity-Relationship (ER) diagram is a flowchart that illustrates how "entities" such as people, objects, or concepts relate to each other within a system. ER diagrams are most often used to design or debug relational databases.

What is the purpose of normalization?

The main purpose of normalization is to:

Minimize data redundancy.
Avoid data modification issues (insertion, deletion, and update anomalies).
Simplify queries.
Improve database performance.

What is a composite key?

A composite key is a primary key that consists of two or more columns. Each column in the composite key may not be unique by itself, but the combination of the columns is unique.

What is an attribute?

An attribute is a property or characteristic of an entity. In a relational database, attributes are represented by columns in a table.

What is a relationship?

A relationship is an association between two or more entities. In a relational database, relationships are established by using foreign keys.

What is an index? What are the different types of indexes?

An index is a special lookup table that the database search engine can use to speed up data retrieval. Simply put, an index is a pointer to data in a table. An index in a database is very similar to an index in the back of a book.

Creating an index on a column creates another data structure which holds the column value, and a pointer to the record it relates to. This index structure is then sorted, allowing for fast searches.

Types of Indexes:

Clustered Index: Determines the physical order of data in a table. When you create a PRIMARY KEY, a clustered index is often created by default on that column. A table can have only one clustered index.
Non-Clustered Index: Has a structure separate from the data rows, like an index in a book. The index contains pointers to the data rows. A table can have multiple non-clustered indexes.
```
-- Creates a simple non-clustered index on the LastName column
CREATE INDEX IX_Employees_LastName ON Employees(LastName);
```
Unique Index: Ensures that the indexed column does not contain duplicate values. Both clustered and non-clustered indexes can be unique.
```
-- Ensures that no two employees have the same email
CREATE UNIQUE INDEX UQ_Employees_Email ON Employees(Email);
```

Composite Index: An index on two or more columns of a table. The order of columns in the index matters.


-- Creates an index on both LastName and FirstName
CREATE INDEX IX_Employees_LastName_FirstName ON Employees(LastName, FirstName);

Covering Index: An index that includes all the columns required to satisfy a query, so the database doesn't have to look up the actual table data. This is a performance optimization technique. (See example below).

What is the difference between a clustered and a non-clustered index?

Clustered Index: The leaf nodes of a clustered index contain the data pages of the table. This is why there can be only one clustered index per table.
Non-Clustered Index: The leaf nodes of a non-clustered index do not contain the data pages. Instead, they contain pointers to the data rows.

What is query optimization?

Query optimization is the process of choosing the most efficient way to execute a SQL statement. The database's query optimizer generates several execution plans for a query and selects the one with the lowest estimated cost.

What is an execution plan?

An execution plan is a sequence of steps that the database's query optimizer generates to access data in the database. It shows how the database will execute a query, including the order in which tables are accessed, the join methods used, and the access methods (e.g., index scan, index seek).

What is the difference between an index scan and an index seek?

Index Scan (or Table Scan): If you run a query like SELECT * FROM Employees WHERE LastName = 'Smith'; and there is no index on the LastName column, the database has no choice but to check every single row in the table to see if the last name is 'Smith'. This is called a full table scan. It's the slowest way to find data, like reading a book from cover to cover to find a single word. If there is an index, but it's not selective enough, the optimizer might still choose to scan the entire index, which is an index scan.
Index Seek: Now, let's say you create an index on the LastName column.
```
CREATE INDEX IX_Employees_LastName ON Employees(LastName);
```
When you run the same query again, the database's query optimizer will likely use the index. It will perform an index seek, which uses the sorted B-tree structure of the index to navigate directly to the rows where LastName is 'Smith'. This is incredibly fast and efficient, like using the index at the back of a book.

Operation	Analogy	Performance
Table Scan	Reading a book from the first page to the last	Very Slow
Index Seek	Using the index at the back of a book	Very Fast

What is a covering index?

A covering index is a special type of index that contains all the columns needed to satisfy a query's SELECT, JOIN, and WHERE clauses. Because the index itself contains all the necessary data, the database doesn't need to perform an extra lookup into the actual table data.

Example: Consider this query:


SELECT EmployeeID, LastName, FirstName FROM Employees WHERE LastName = 'Jones';

Standard Index: If you only have an index on LastName, the database will perform an index seek to find all the 'Jones' records. However, it then needs to perform an additional step (a "key lookup" or "bookmark lookup") to go back to the table to retrieve the EmployeeID and FirstName for each of those rows.
Covering Index: To optimize this, you can create a covering index that includes all the required columns.
```
-- The INCLUDE clause adds non-key columns to the index
CREATE INDEX IX_Covering_LastName
ON Employees(LastName)
INCLUDE (EmployeeID, FirstName);
```
Now, when you run the query, the database can get everything it needs (EmployeeID, LastName, FirstName) directly from the index pages. It never has to touch the main table data, which can lead to a significant performance boost.

What is the impact of indexes on `INSERT`, `UPDATE`, and `DELETE` statements on indexes?

Indexes can slow down INSERT, UPDATE, and DELETE operations because the database has to update the indexes as well as the table data. Therefore, it's important to have a balance between the number of indexes on a table and the performance of data modification operations.

What are statistics in the context of databases?

Statistics are objects that contain statistical information about the distribution of values in one or more columns of a table or indexed view. The query optimizer uses these statistics to estimate the cardinality, or number of rows, in the query result.

What is parameter sniffing?

Parameter sniffing is a feature of SQL Server where the query optimizer "sniffs" the initial values of parameters in a stored procedure and creates an execution plan based on those values. This can sometimes lead to performance problems if the initial parameter values are not representative of the typical values.

What are some common ways to optimize a query?

Create indexes on columns used in WHERE clauses and join conditions.
Avoid using SELECT *. Instead, specify the columns you need.
Use JOINs instead of subqueries where possible.
Use EXISTS instead of IN for subqueries.
Avoid using functions in WHERE clauses.

What is a filtered index?

A filtered index is a type of non-clustered index that is created on a subset of rows in a table. This can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes.

What is a heap?

A heap is a table without a clustered index. Data rows are not stored in any particular order.

What are ACID properties in a transaction?

ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. These are the four properties of a transaction that ensure data integrity.

Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all.
Consistency: A transaction must transform the database from one consistent state to another.
Isolation: A transaction in progress and not yet committed must remain isolated from any other transaction.
Durability: The changes of a successfully committed transaction must persist in the database, even in the case of a system failure.

What are the different transaction isolation levels?

Transaction isolation levels define the degree to which one transaction must be isolated from the data modifications made by other transactions. The SQL standard defines four levels:

Read Uncommitted: The lowest level. A transaction can read data that is not yet committed by other transactions ("dirty reads").
Read Committed: Prevents dirty reads. A transaction can only read data that has been committed. This is the default level for many databases (e.g., PostgreSQL, SQL Server).
Repeatable Read: Prevents dirty reads and non-repeatable reads. It ensures that if a transaction reads a row once, it will read the same data if it reads that row again.
Serializable: The highest level. Prevents dirty reads, non-repeatable reads, and phantom reads. It does this by ensuring that transactions execute as if they were running one after another (serially), not concurrently. This level has the highest performance overhead.

Here's a summary of which isolation level prevents which concurrency problem:

Isolation Level	Dirty Read	Non-Repeatable Read	Phantom Read
Read Uncommitted	Possible	Possible	Possible
Read Committed	No	Possible	Possible
Repeatable Read	No	No	Possible
Serializable	No	No	No

Setting the isolation level (example for SQL Server):


SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
-- Your SQL statements here
COMMIT;

What is a transaction?

A transaction is a single logical unit of work that consists of one or more SQL statements. It's treated as an "all or nothing" operation. If any part of the transaction fails, the entire transaction is rolled back, and the database is left in the state it was in before the transaction started.

Think of transferring money from a savings account to a checking account. This requires two operations:

Debit the savings account.
Credit the checking account.

If the debit succeeds but the credit fails, you've lost money. A transaction wraps these two operations into a single atomic unit, ensuring that either both succeed or neither does.

Example:


-- Let's assume we have an Accounts table with AccountID and Balance
BEGIN TRANSACTION;

-- 1. Debit $100 from savings (AccountID 1)
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;

-- 2. Credit $100 to checking (AccountID 2)
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

-- Check for any errors. If everything is okay, commit the transaction.
-- If there was an error, we would issue a ROLLBACK.
COMMIT;

-- To undo the changes before committing:
-- ROLLBACK;

What is a deadlock and how can it be prevented?

A deadlock is a situation where two or more transactions are waiting for each other to release locks, causing a stalemate. Deadlocks can be prevented by:

Ensuring that all transactions access tables in the same order.
Using a lower isolation level.
Keeping transactions as short as possible.

What is locking?

Locking is a mechanism used by database management systems to protect the integrity of data. When a transaction accesses a piece of data, it can acquire a lock on that data to prevent other transactions from modifying it.

What are the different types of locks?

Shared Lock (S): Allows multiple transactions to read the same data.
Exclusive Lock (X): Prevents other transactions from reading or writing to the locked data.
Update Lock (U): A combination of a shared and an exclusive lock. It is used to prevent deadlocks.

What is a dirty read?

A dirty read occurs when a transaction reads data that has been modified by another transaction but not yet committed. This is problematic because the first transaction might roll back its changes, meaning the data that was read is now invalid.

Scenario:

Transaction A starts and updates a product's price from $50 to $75.
Transaction B reads the product's price as $75.
Transaction A rolls back its change.
Transaction B is now operating with "dirty" data. The price is actually still $50.

This is prevented by the Read Committed isolation level and higher.

What is a non-repeatable read?

A non-repeatable read occurs when a transaction reads the same row twice but gets different data each time because another transaction committed an update in between the two reads.

Scenario:

Transaction A reads a product's stock level as 10.
Transaction B updates the stock level for that product to 5 and commits.
Transaction A reads the stock level again and now sees 5.
The value that Transaction A read the first time is not "repeatable", which can cause consistency issues within the transaction.

This is prevented by the Repeatable Read isolation level and higher.

What is a phantom read?

A phantom read occurs when a transaction runs a query twice and the second result includes new rows that weren't there the first time. This happens when another transaction inserts new rows that match the query's WHERE clause and commits.

Scenario:

Transaction A runs a query: SELECT COUNT(*) FROM Employees WHERE Department = 'Sales'; It gets a result of 15.
Transaction B inserts a new employee into the Sales department and commits.
Transaction A runs the same COUNT(*) query again. It now gets a result of 16.
The new row that appeared is called a "phantom" row.

This is the most difficult concurrency issue to prevent and is handled by the Serializable isolation level.

What is the difference between optimistic and pessimistic locking?

Optimistic Locking: Assumes that multiple transactions can complete without affecting each other. It does not lock the data when it is read. Instead, it checks for conflicts when the transaction is committed.
Pessimistic Locking: Assumes that conflicts will occur. It locks the data as soon as it is read, preventing other transactions from modifying it.

What is a `SAVEPOINT` used for?

A SAVEPOINT is a mechanism that allows you to roll back a transaction to a specific point without rolling back the entire transaction. This is useful for implementing complex logic where you might need to undo parts of the transaction based on certain conditions.

Example: Imagine you are inserting records for a new order. You insert the main order record, and then several order items. If one of the items is invalid, you only want to undo the insertion of that specific item, not the whole order.


BEGIN TRANSACTION;

-- Insert the main order record
INSERT INTO Orders (OrderID, CustomerID) VALUES (101, 77);

-- Create a savepoint before adding items
SAVEPOINT BeforeItems;

-- Try to insert the first item
INSERT INTO OrderItems (OrderID, ProductID, Quantity) VALUES (101, 'PROD-A', 10);

-- Let's say the second item is invalid (e.g., 'PROD-X' does not exist)
-- An error would occur here. Let's simulate catching it.

-- If an error is detected, roll back to the savepoint
ROLLBACK TO BeforeItems;

-- You can now continue the transaction, perhaps trying a different item
-- or just committing the main order record.

-- Commit the valid parts of the transaction
COMMIT;

What is a two-phase commit?

A two-phase commit is a protocol used to ensure that all participating databases in a distributed transaction either all commit or all roll back. It consists of two phases:

Phase 1 (Prepare): The transaction coordinator asks all participating databases to prepare to commit.
Phase 2 (Commit): If all databases respond that they are prepared, the coordinator tells them to commit. Otherwise, it tells them to roll back.

What is the difference between an implicit and an explicit transaction?

Implicit Transaction: A new transaction is automatically started after the previous transaction is committed or rolled back.
Explicit Transaction: A transaction is started explicitly with a BEGIN TRANSACTION statement and ended with a COMMIT or ROLLBACK statement.

What is a subquery?

A subquery, also known as an inner query or nested query, is a query within another SQL query and embedded inside the WHERE clause.

What is the difference between a correlated and a non-correlated subquery?

Non-Correlated Subquery: The inner query is independent of the outer query and can be executed on its own.
Correlated Subquery: The inner query depends on the outer query for its values. It is executed once for each row processed by the outer query.

What is a Common Table Expression (CTE)?

A Common Table Expression (CTE) is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs are defined using the WITH clause and are used to break down complex queries into simpler, more readable logical blocks. A CTE is only valid in the query that it is defined for.

Example: Let's find all employees who earn more than the average salary of their respective departments.


-- Sample Data:
-- Employees (ID, Name, Department, Salary)
-- Departments (ID, Name)

WITH DepartmentAvgSalary AS (
    -- First, calculate the average salary for each department
    SELECT Department, AVG(Salary) as AvgSalary
    FROM Employees
    GROUP BY Department
)
-- Then, join the CTE back to the Employees table
SELECT e.Name, e.Salary, d.AvgSalary
FROM Employees e
JOIN DepartmentAvgSalary d ON e.Department = d.Department
WHERE e.Salary > d.AvgSalary;

This makes the logic much clearer than nesting subqueries.

What are window functions?

Window functions perform a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row.

What is a recursive CTE?

A recursive CTE is a common table expression (CTE) that references itself. It is used to query hierarchical data, such as organizational charts, bill of materials, or file system trees.

A recursive CTE has three parts:

Anchor Member: The initial query that seeds the recursion. It's the base case.
UNION ALL: The operator that combines the anchor with the recursive member.
Recursive Member: The query that references the CTE itself. This part is executed repeatedly until it returns no more rows.

Example: Let's find the entire reporting chain for an employee named 'Alice'. Assume an Employees table: (EmployeeID, Name, ManagerID)


WITH RECURSIVE EmployeeHierarchy AS (
    -- 1. Anchor Member: Select the starting employee (Alice)
    SELECT EmployeeID, Name, ManagerID, 0 as Level
    FROM Employees
    WHERE Name = 'Alice'

    UNION ALL

    -- 2. Recursive Member: Join Employees to the CTE
    SELECT e.EmployeeID, e.Name, e.ManagerID, eh.Level + 1
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT * FROM EmployeeHierarchy;

Note: Some SQL dialects use WITH RECURSIVE, while others (like SQL Server) do not require the RECURSIVE keyword.

What are the `RANK()`, `DENSE_RANK()`, and `ROW_NUMBER()` window functions?

These are all window functions used for ranking rows, but they handle ties differently. Let's see them in action.

Scenario: Rank employees by salary within each department. Assume an Employees table with Name, Department, Salary.

Query:


SELECT
    Name,
    Department,
    Salary,
    RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) as RankNum,
    DENSE_RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) as DenseRankNum,
    ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) as RowNum
FROM Employees;

Result: Let's look at the 'Sales' department from the results.

| Name    | Department | Salary | RankNum | DenseRankNum | RowNum |
|---------|------------|--------|---------|--------------|--------|
| Alice   | Sales      | 90000  | 1       | 1            | 1      |
| Bob     | Sales      | 85000  | 2       | 2            | 2      |
| Charlie | Sales      | 85000  | 2       | 2            | 3      |
| David   | Sales      | 70000  | 4       | 3            | 4      |

Explanation:

ROW_NUMBER(): Assigns a unique, sequential number to each row. It doesn't care about ties. (1, 2, 3, 4)
RANK(): Assigns the same rank to tied rows. It then skips the next rank(s). Notice how it goes from rank 2 to rank 4 because two employees are tied for 2nd place. (1, 2, 2, 4)
DENSE_RANK(): Also assigns the same rank to tied rows, but it does not skip any ranks. The next rank after the tie is 3. (1, 2, 2, 3)

What are the `LEAD()` and `LAG()` window functions?

LEAD() and LAG() are window functions that give you access to a row at a specific physical offset which comes before or after the current row.

LAG(): "Looks back" at the previous row.
LEAD(): "Looks forward" to the next row.

Scenario: For each sale, show the sale amount from the previous and next sale for that same customer. Assume a Sales table with Customer, SaleDate, SaleAmount.

Query:


SELECT
    Customer,
    SaleDate,
    SaleAmount,
    LAG(SaleAmount, 1, 0) OVER (PARTITION BY Customer ORDER BY SaleDate) as PreviousSale,
    LEAD(SaleAmount, 1, 0) OVER (PARTITION BY Customer ORDER BY SaleDate) as NextSale
FROM Sales;

The PARTITION BY clause treats each customer's sales history independently. The 1 is the offset (how many rows to look back/forward), and the 0 is the default value if no previous/next row exists.

Result:

| Customer | SaleDate   | SaleAmount | PreviousSale | NextSale |
|----------|------------|------------|--------------|----------|
| Alice    | 2023-01-15 | 100        | 0            | 150      |
| Alice    | 2023-02-10 | 150        | 100          | 120      |
| Alice    | 2023-03-05 | 120        | 150          | 0        |
| Bob      | 2023-01-20 | 200        | 0            | 250      |
| Bob      | 2023-02-25 | 250        | 200          | 0        |

This is very useful for calculating period-over-period changes, like month-over-month growth.

What is a pivot?

Pivoting is an operation that transforms data from a row-level representation to a columnar representation. It "rotates" rows into columns. For example, you might want to turn a list of monthly sales into a single row with a column for each month.

While some databases have a PIVOT keyword (e.g., SQL Server), a more standard and portable way to do this is with conditional aggregation using CASE statements.

Scenario: Transform a list of quarterly product sales into a summary table. Assume a QuarterlySales table: (Product, Quarter, Sales)

Original Data:

| Product  | Quarter | Sales |
|----------|---------|-------|
| Apple    | Q1      | 1000  |
| Apple    | Q2      | 1200  |
| Banana   | Q1      | 800   |
| Banana   | Q2      | 900   |

Query using Conditional Aggregation:


SELECT
    Product,
    SUM(CASE WHEN Quarter = 'Q1' THEN Sales ELSE 0 END) AS Q1_Sales,
    SUM(CASE WHEN Quarter = 'Q2' THEN Sales ELSE 0 END) AS Q2_Sales,
    SUM(CASE WHEN Quarter = 'Q3' THEN Sales ELSE 0 END) AS Q3_Sales,
    SUM(CASE WHEN Quarter = 'Q4' THEN Sales ELSE 0 END) AS Q4_Sales
FROM QuarterlySales
GROUP BY Product;

Pivoted Result:

| Product  | Q1_Sales | Q2_Sales | Q3_Sales | Q4_Sales |
|----------|----------|----------|----------|----------|
| Apple    | 1000     | 1200     | 0        | 0        |
| Banana   | 800      | 900      | 0        | 0        |

What is an unpivot?

An unpivot is the reverse operation of a pivot. It transforms data from a columnar representation back to a row-level representation.

Scenario: Let's unpivot the result from the previous example back to its original format.

Query using UNION ALL: This is one of the most common ways to unpivot data in a standard way.


SELECT Product, 'Q1' as Quarter, Q1_Sales as Sales FROM PivotedSales
UNION ALL
SELECT Product, 'Q2' as Quarter, Q2_Sales as Sales FROM PivotedSales
UNION ALL
SELECT Product, 'Q3' as Quarter, Q3_Sales as Sales FROM PivotedSales
UNION ALL
SELECT Product, 'Q4' as Quarter, Q4_Sales as Sales FROM PivotedSales
WHERE Sales > 0; -- Optional: to remove zero-value rows

This would return the data to its original, normalized, row-based format.

What are user-defined functions (UDFs)?

A user-defined function (UDF) is a routine that you can define and then call from within SQL queries, just like a built-in function. UDFs are typically used for encapsulating custom calculations or formatting logic.

Example (T-SQL): This function takes a price and a discount percentage and returns the final discounted price.


CREATE FUNCTION dbo.GetDiscountedPrice(@price DECIMAL(10, 2), @discount DECIMAL(3, 2))
RETURNS DECIMAL(10, 2)
AS
BEGIN
    RETURN @price * (1 - @discount);
END;

Usage:


SELECT ProductName, Price, dbo.GetDiscountedPrice(Price, 0.10) AS DiscountedPrice
FROM Products;

What are stored procedures?

A stored procedure is a set of SQL statements that are stored in the database as an object. It can be called by applications or users to perform a specific task. They are used to encapsulate and centralize business logic, improve performance, and enhance security.

Example (T-SQL): This stored procedure retrieves all employees from a specific department.


CREATE PROCEDURE dbo.GetEmployeesByDepartment @DepartmentName VARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT EmployeeID, Name, Salary
    FROM Employees
    WHERE Department = @DepartmentName;
END;

Usage:


EXEC dbo.GetEmployeesByDepartment @DepartmentName = 'Sales';

What is the difference between a UDF and a stored procedure?

Feature	Stored Procedure	User-Defined Function (UDF)
Invocation	Can be called independently (`EXEC`)	Must be used in an SQL statement (`SELECT`)
Return Value	Can return multiple result sets, or none	Must return a single value (scalar or table)
Transaction	Can contain its own transaction (`BEGIN`/`COMMIT`)	Cannot contain its own transaction
Modification	Can modify the database state (`INSERT`, `UPDATE`)	Generally cannot modify the database state
Primary Use	Executing business logic	Reusing calculations or formatting

What is SQL injection and how can it be prevented?

SQL injection is a web security vulnerability that allows an attacker to interfere with the queries that an application makes to its database. It generally allows an attacker to view data that they are not normally able to retrieve.

The Vulnerability: It happens when application code constructs SQL queries by concatenating user input directly into the query string.

Vulnerable Code Example (in Python):


# User input is directly inserted into the query string
user_id = "105 OR 1=1"
query = "SELECT * FROM Users WHERE UserID = " + user_id

# The resulting SQL is:
# SELECT * FROM Users WHERE UserID = 105 OR 1=1
# This will return ALL users, not just one!

Prevention: Use Parameterized Queries (Prepared Statements) This is the most effective way to prevent SQL injection. The database treats the user input as data, not as executable code.

Secure Code Example (in Python with psycopg2):


# The '?' or '%s' is a placeholder. The value is sent separately.
user_id = "105 OR 1=1"
query = "SELECT * FROM Users WHERE UserID = %s"

# The database driver safely handles the parameter.
cursor.execute(query, (user_id,))

# The database sees the query and the data separately.
# It will correctly (and safely) look for a user with the literal ID "105 OR 1=1" and find nothing.

Other prevention methods include using stored procedures and validating/sanitizing user input, but parameterized queries are the primary defense.

What is the `NTILE()` window function?

The NTILE(n) function divides the rows in a partition into n groups, as equal in size as possible, and assigns an integer from 1 to n to each group.

What are the different types of NoSQL databases?

Document Databases: Store data in documents, similar to JSON objects. (e.g., MongoDB)
Key-Value Stores: Store data as a collection of key-value pairs. (e.g., Redis)
Column-Family Stores: Store data in columns rather than rows. (e.g., Cassandra)
Graph Databases: Store data in nodes and edges, representing relationships between entities. (e.g., Neo4j)

What is the CAP theorem?

The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:

Consistency: Every read receives the most recent write or an error.
Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

What are the advantages of NoSQL databases over SQL databases?

High scalability: NoSQL databases can be scaled horizontally, which is cheaper and more flexible than vertical scaling.
Flexible data models: NoSQL databases do not require a predefined schema, which makes it easier to store unstructured and semi-structured data.
High availability: NoSQL databases are typically designed to be highly available and fault-tolerant.
Big data: NoSQL databases are well-suited for handling large volumes of data.

What are the disadvantages of NoSQL databases?

Less mature: NoSQL databases are not as mature as SQL databases, and there are fewer tools and resources available.
Lack of standardization: There is no standard query language for NoSQL databases, which can make it difficult to switch between different databases.
Eventual consistency: Many NoSQL databases provide eventual consistency, which may not be suitable for all applications.
ACID transactions: Not all NoSQL databases support ACID transactions.

What is BASE consistency?

BASE (Basically Available, Soft state, Eventual consistency) is a consistency model used by many NoSQL databases.

Basically Available: The system is guaranteed to be available.
Soft state: The state of the system may change over time, even without input.
Eventual consistency: The system will eventually become consistent once it stops receiving input.

What is eventual consistency?

Eventual consistency is a consistency model used in distributed systems that guarantees that if no new updates are made to a given data item, all accesses to that item will eventually return the last updated value.

What is a document store?

A document store is a type of NoSQL database that stores data in documents, similar to JSON objects. Each document can have a different structure. (e.g., MongoDB)

What is a key-value store?

A key-value store is a type of NoSQL database that stores data as a collection of key-value pairs. It is the simplest type of NoSQL database. (e.g., Redis)

What is a column-family store?

A column-family store is a type of NoSQL database that stores data in columns rather than rows. This is efficient for queries that only need to access a subset of columns. (e.g., Cassandra)

What is a graph database?

A graph database is a type of NoSQL database that stores data in nodes and edges, representing relationships between entities. It is well-suited for applications that involve complex relationships, such as social networks and recommendation engines. (e.g., Neo4j)

When should you use a NoSQL database?

When you need to store large volumes of unstructured or semi-structured data.
When you need high scalability and availability.
When you need a flexible schema.
When you are building a real-time application.

When should you use a SQL database?

When you need to store structured data with a predefined schema.
When you need ACID transactions.
When you need to perform complex queries.
When you need a mature and well-supported database.

100

What are triggers?

Triggers are special stored procedures that automatically execute (or "fire") in response to specific events in a database. They cannot be directly called or executed - they only run when their triggering event occurs.

Types of Triggers:

DML Triggers: Fire in response to INSERT, UPDATE, or DELETE statements
DDL Triggers: Fire in response to CREATE, ALTER, or DROP statements
Logon Triggers: Fire in response to user login events

101

What is the difference between `BEFORE` and `AFTER` triggers?

BEFORE Triggers: Execute before the triggering event. Can modify the data before it's committed
AFTER Triggers: Execute after the triggering event. Often used for logging and auditing

102

What are cursors?

Cursors are database objects that allow you to process rows individually in a result set. They provide a way to iterate through query results row by row.


DECLARE cursor_name CURSOR FOR
SELECT column1, column2 FROM table_name;
OPEN cursor_name;
FETCH NEXT FROM cursor_name INTO @variable1, @variable2;
CLOSE cursor_name;
DEALLOCATE cursor_name;

103

What is the difference between OLTP and OLAP?

Feature	OLTP (Online Transaction Processing)	OLAP (Online Analytical Processing)
Purpose	Handle day-to-day transactions	Support analytical queries and reporting
Data Volume	Large number of short transactions	Fewer, complex queries on large datasets
Schema	Normalized (3NF)	Denormalized (star/snowflake schema)
Response Time	Milliseconds	Seconds to minutes
Example	Banking transactions, order processing	Data warehousing, business intelligence

104

What is database partitioning?

Partitioning is the process of dividing a large table into smaller, more manageable pieces while maintaining the logical structure of the table.

Types of Partitioning:

Horizontal Partitioning: Divides rows based on specific criteria
Vertical Partitioning: Divides columns into separate tables
Range Partitioning: Based on value ranges
Hash Partitioning: Based on hash function results
List Partitioning: Based on specific values

105

What is database replication?

Replication is the process of copying and maintaining database objects in multiple databases. It ensures data availability and improves performance.

Types of Replication:

Master-Slave: One master database with read-only replicas
Master-Master: Multiple writable database instances
Snapshot Replication: Periodic full data copies
Transactional Replication: Real-time transaction copying

106

What is connection pooling?

Connection pooling is a technique used to maintain a cache of database connections that can be reused across multiple requests. This improves performance by avoiding the overhead of establishing new connections.

107

What are the different types of backups?

Full Backup: Complete copy of the entire database
Differential Backup: Changes since the last full backup
Incremental Backup: Changes since the last backup (full or incremental)
Transaction Log Backup: Captures transaction log records

108

What is a data warehouse?

A data warehouse is a centralized repository that stores integrated data from multiple sources for analytical and reporting purposes. It's optimized for query and analysis rather than transaction processing.

109

What are temporary tables?

Temporary tables are tables that exist temporarily during a database session. They are automatically dropped when the session ends.


-- Local temporary table (SQL Server)
CREATE TABLE #TempTable (
    ID INT,
    Name VARCHAR(50)
);

-- Global temporary table (SQL Server)
CREATE TABLE ##GlobalTempTable (
    ID INT,
    Name VARCHAR(50)
);

110

What is the difference between temporary tables and table variables?

Feature	Temporary Tables	Table Variables
Scope	Session-wide	Batch/procedure scope
Indexes	Can create indexes	Limited indexing
Statistics	Auto-generated	Not generated
Memory	Can spill to disk	Memory-optimized
Syntax	`#TableName`	`@TableName`

111

What are the common SQL data types?

Numeric: INT, BIGINT, DECIMAL, FLOAT, NUMERIC
Character: CHAR, VARCHAR, NCHAR, NVARCHAR, TEXT
Date/Time: DATE, TIME, DATETIME, TIMESTAMP
Binary: BINARY, VARBINARY, BLOB
Boolean: BIT, BOOLEAN
JSON: JSON (in newer database systems)

112

What are the main database security threats?

SQL Injection: Malicious SQL code injection through user input
Privilege Escalation: Unauthorized access to higher-level permissions
Data Breaches: Unauthorized access to sensitive data
Insider Threats: Malicious activities by authorized users
Weak Authentication: Poor password policies and authentication mechanisms

113

What is role-based access control (RBAC)?

RBAC is a security model where permissions are assigned to roles, and users are assigned to roles. This simplifies permission management and follows the principle of least privilege.


-- Create role
CREATE ROLE sales_team;

-- Grant permissions to role
GRANT SELECT, INSERT ON customers TO sales_team;

-- Assign user to role
GRANT sales_team TO john_doe;

114

What is database encryption?

Database encryption protects data by converting it into a coded format that can only be read with the appropriate decryption key.

Types:

Transparent Data Encryption (TDE): Encrypts data at rest
Column-Level Encryption: Encrypts specific sensitive columns
Transport Encryption: Encrypts data in transit (SSL/TLS)

115

What is database auditing?

Database auditing is the process of monitoring and recording database activities to ensure compliance and detect security breaches.

Audit Events:

Login attempts
Data access and modifications
Schema changes
Permission changes
Failed operations

116

What are database views used for in security?

Views can be used to:

Restrict access to specific columns or rows
Hide complex table structures
Implement row-level security
Provide abstraction layers for sensitive data

117

How would you find the Nth highest salary from a table?

This is a classic problem that can be solved efficiently using a CTE with a window function like DENSE_RANK(). DENSE_RANK() is a good choice because it handles ties correctly without skipping ranks.


-- The 'N' should be replaced with the desired rank, e.g., 3 for the 3rd highest.
DECLARE @N INT = 3;

WITH SalaryRank AS (
    SELECT
        Salary,
        DENSE_RANK() OVER (ORDER BY Salary DESC) as SalaryRank
    FROM Employees
)
SELECT Salary
FROM SalaryRank
WHERE SalaryRank = @N;

Note: The syntax for declaring a variable (DECLARE @N) is for T-SQL (SQL Server). In PostgreSQL or MySQL, you would typically substitute the value directly into the query or use a session variable.

118

How would you find duplicate emails in a customer table?

The most straightforward way is to use GROUP BY on the email column and then use a HAVING clause to filter for groups with a count greater than one.


SELECT Email, COUNT(Email) as NumOccurrences
FROM Customers
GROUP BY Email
HAVING COUNT(Email) > 1;

119

Given two tables, `Customers` and `Orders`, write a query to find all customers who have never placed an order.

The best approach is to use a LEFT JOIN from Customers to Orders and then filter for rows where the Orders side is NULL. This indicates the customer had no matching orders.


SELECT c.CustomerName
FROM Customers c
LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE o.OrderID IS NULL;

An alternative using NOT EXISTS can also be efficient:


SELECT c.CustomerName
FROM Customers c
WHERE NOT EXISTS (
    SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID
);

120

Write a query to find the second highest salary in the `Employee` table.

While you can solve this by modifying the Nth salary query, a common method is to use OFFSET and FETCH.

Standard SQL (PostgreSQL, Oracle, SQL Server 2012+):


SELECT Salary
FROM Employees
ORDER BY Salary DESC
OFFSET 1 ROW
FETCH NEXT 1 ROW ONLY;

MySQL / PostgreSQL LIMIT clause:


SELECT Salary
FROM Employees
ORDER BY Salary DESC
LIMIT 1 OFFSET 1;

Using a Subquery (works on most systems):


SELECT MAX(Salary)
FROM Employees
WHERE Salary < (SELECT MAX(Salary) FROM Employees);

121

How do you find all employees who have the same salary?

A self-join is the classic approach, where you join the table to itself on the Salary column, making sure you don't match an employee with themselves.


SELECT DISTINCT e1.EmployeeName, e1.Salary
FROM Employees e1
JOIN Employees e2 ON e1.Salary = e2.Salary AND e1.EmployeeID <> e2.EmployeeID;

A more modern approach uses a window function, which can be more efficient as it avoids a join.


WITH SalaryCounts AS (
    SELECT
        EmployeeName,
        Salary,
        COUNT(*) OVER (PARTITION BY Salary) as Cnt
    FROM Employees
)
SELECT EmployeeName, Salary
FROM SalaryCounts
WHERE Cnt > 1;

122

How do you find the department with the highest number of employees?

You can use GROUP BY with COUNT(), ORDER BY, and then select the top result. The syntax for "top result" varies.

PostgreSQL / MySQL:


SELECT Department, COUNT(*) as NumberOfEmployees
FROM Employees
GROUP BY Department
ORDER BY NumberOfEmployees DESC
LIMIT 1;

SQL Server:


SELECT TOP 1 Department, COUNT(*) as NumberOfEmployees
FROM Employees
GROUP BY Department
ORDER BY NumberOfEmployees DESC;

123

Write a query to get the top 5 employees who have the highest salary?

This is another "top-N" query. Again, the syntax varies by database.

PostgreSQL / MySQL LIMIT:


SELECT EmployeeName, Salary
FROM Employees
ORDER BY Salary DESC
LIMIT 5;

SQL Server TOP:


SELECT TOP 5 EmployeeName, Salary
FROM Employees
ORDER BY Salary DESC;

Standard SQL FETCH:


SELECT EmployeeName, Salary
FROM Employees
ORDER BY Salary DESC
FETCH FIRST 5 ROWS ONLY;

124

You have a table of employees and their managers. How would you find all employees who do not have a manager?

This usually means finding rows where the ManagerID column is NULL. This often corresponds to the highest-level person in the hierarchy (e.g., the CEO).


SELECT EmployeeName
FROM Employees
WHERE ManagerID IS NULL;

125

How would you select all records from a table that have a duplicate value in a specific column?

You can use a window function like COUNT() to count occurrences of a value within its partition, and then select rows where that count is greater than 1.


WITH DuplicateCheck AS (
    SELECT
        ColumnName,
        -- The window function counts how many rows share the same ColumnName value
        COUNT(*) OVER (PARTITION BY ColumnName) as DuplicateCount
    FROM TableName
)
SELECT *
FROM DuplicateCheck
WHERE DuplicateCount > 1;

126

You have a table with a `StartDate` and `EndDate`. How would you find all records that fall within a given date range?

The BETWEEN operator is perfect for this and is inclusive of the start and end dates.


SELECT *
FROM TableName
WHERE EventDate BETWEEN '2023-01-01' AND '2023-01-31';

This is equivalent to, but more readable than:


SELECT *
FROM TableName
WHERE EventDate >= '2023-01-01' AND EventDate <= '2023-01-31';

127

How would you write a query to find employees who earn more than their manager?

This is a classic use case for a self-join. You join the Employees table to itself, aliasing one as the "employee" and the other as the "manager".


SELECT
    e.EmployeeName,
    e.Salary,
    m.EmployeeName as ManagerName,
    m.Salary as ManagerSalary
FROM Employees e
-- Join the table to itself
JOIN Employees m ON e.ManagerID = m.EmployeeID
WHERE e.Salary > m.Salary;

128

How would you find the running total of sales for each month?

A window function with SUM() is the ideal tool for this. It allows you to sum up values over a specified window of rows.


SELECT 
    Month,
    Sales,
    -- Sum the sales from the beginning of the partition up to the current row
    SUM(Sales) OVER (ORDER BY Month ROWS UNBOUNDED PRECEDING) as RunningTotal
FROM MonthlySales
ORDER BY Month;

The ROWS UNBOUNDED PRECEDING part defines the window frame: from the very first row to the current row.

129

How would you pivot data to show products as columns and months as rows?

The most portable way is to use conditional aggregation with the CASE statement. While some databases have a proprietary PIVOT keyword, this method works on almost any SQL database.


-- Using conditional aggregation
SELECT 
    Month,
    SUM(CASE WHEN Product = 'ProductA' THEN Sales ELSE 0 END) as ProductA,
    SUM(CASE WHEN Product = 'ProductB' THEN Sales ELSE 0 END) as ProductB,
    SUM(CASE WHEN Product = 'ProductC' THEN Sales ELSE 0 END) as ProductC
FROM Sales
GROUP BY Month;

130

What is the difference between SQL and NoSQL databases?

Feature	SQL (Relational)	NoSQL (Non-relational)
Data Model	Structured (tables with rows and columns)	Unstructured (documents, key-value, graph)
Schema	Predefined schema (static)	Dynamic schema
Scalability	Vertical scaling (scale-up)	Horizontal scaling (scale-out)
Consistency	ACID (Atomicity, Consistency, Isolation, Durability)	BASE (Basically Available, Soft state, Eventual consistency)
Examples	MySQL, PostgreSQL, Oracle	MongoDB, Cassandra, Redis

Popular Tags

Top Database and SQL Interview Questions

What is a primary key?

What is a foreign key?

What is the difference between a `PRIMARY KEY` and a `UNIQUE KEY`?

What is a database?

What is a DBMS?

What is a table?

What is a view?

What is a schema?

What is the difference between a table and a view?

What is the difference between SQL and MySQL?

What are constraints?

What is data integrity?

What are the different types of SQL commands?

What is the difference between `DELETE`, `TRUNCATE`, and `DROP` commands?

What is the difference between WHERE and HAVING clauses?

What is the `GROUP BY` clause and how is it used?

What is the `ORDER BY` clause used for?

What is the `MERGE` statement?

What is the purpose of the `SAVEPOINT` command in TCL?

What is the difference between `GRANT` and `REVOKE`?

What is the `ALTER` command used for?

What is the difference between `UNION` and `INTERSECT`?

What is the `LIKE` operator used for?

What is the `CASE` statement?

What are SQL joins? Explain the different types of joins.

INNER JOIN

LEFT JOIN (or LEFT OUTER JOIN)

RIGHT JOIN (or RIGHT OUTER JOIN)

FULL OUTER JOIN

CROSS JOIN

What is the difference between `UNION` and `UNION ALL`?

What is a self-join?

What is a natural join?

What is the difference between `INNER JOIN` and `LEFT JOIN`?

What is a `CROSS JOIN` and when would you use it?

What is the `INTERSECT` operator?

What is the `EXCEPT` operator?

Can you join more than two tables?

What is the difference between a join and a subquery?

What are some best practices for writing joins?

What is a `FULL OUTER JOIN`?

What is database normalization?

What is denormalization?

Explain 1NF, 2NF, and 3NF with examples.

First Normal Form (1NF): Eliminate Repeating Groups

Second Normal Form (2NF): Remove Partial Dependencies

Third Normal Form (3NF): Remove Transitive Dependencies

Final 3NF Schema

What is BCNF?

What are the different types of relationships in a database?

What is a data model?

What is the difference between a logical and a physical data model?

What is an ER diagram?

What is the purpose of normalization?

What is a composite key?

What is an attribute?

What is a relationship?

What is an index? What are the different types of indexes?

What is the difference between a clustered and a non-clustered index?

What is query optimization?

What is an execution plan?

What is the difference between an index scan and an index seek?

What is a covering index?

What is the impact of indexes on `INSERT`, `UPDATE`, and `DELETE` statements on indexes?

What are statistics in the context of databases?

What is parameter sniffing?

What are some common ways to optimize a query?

What is a filtered index?

What is a heap?

What are ACID properties in a transaction?

What are the different transaction isolation levels?

What is a transaction?

What is a deadlock and how can it be prevented?

What is locking?

What are the different types of locks?

What is a dirty read?

What is a non-repeatable read?

What is a phantom read?

What is the difference between optimistic and pessimistic locking?

`INNER JOIN`

`LEFT JOIN` (or `LEFT OUTER JOIN`)

`RIGHT JOIN` (or `RIGHT OUTER JOIN`)

`FULL OUTER JOIN`

`CROSS JOIN`