---
title: DBMS Interview Questions and Answers (2026): Normalization, ACID & Indexing
description: DBMS interview questions and answers for 2026 — normalization, keys, ACID, transactions, isolation levels, deadlocks, indexing and joins — with worked examples and what each one is testing.
url: https://usegreenroom.app/blog/dbms-interview-questions
last_updated: 2026-06-21
---

← Back to blog

Technical

# DBMS interview questions and answers

June 21, 2026 · 33 min read

![DBMS interview questions and answers — cover from Greenroom, the AI mock interviewer](/assets/blog/dbms-interview-questions-hero.webp)

It's 11:40 AM at a TCS mass walk-in drive in a Pune business park, and you are candidate number 214 of roughly 600, which means you have been awake since 5 AM, you have eaten exactly one vending-machine samosa, and you have now explained your "tell me about yourself" four separate times to four separate panels that all asked the same question in the same order. Round three is the technical panel. The interviewer, who has clearly done this 213 times already today, slides a sheet of paper across the table with a single denormalized table on it — student ID, course, course instructor, instructor's office number, all jammed into one row repeated for every course a student takes — and says: "Normalize this to 3NF. Walk me through it."

You know the acronym. You have flash cards with "1NF, 2NF, 3NF, BCNF" written on them in three different colors. What you do not have, it turns out, is the actual difference between a **partial dependency** and a **transitive dependency**, and the interviewer can tell within four seconds, because you say "so basically we remove the redundant data" twice in slightly different words and then trail off staring at the instructor's office number like it owes you money. Three minutes later you are explaining **ACID** properties by reciting the acronym — "atomicity, consistency, isolation, durability" — in the confident, singsong rhythm of someone who memorized it on the auto rickshaw ride over, and the interviewer, not impressed, asks the follow-up that ends most candidates on this exact question: "Okay — what specifically does isolation prevent, that wouldn't be prevented by atomicity alone?" Silence. Somewhere in the waiting room outside, candidate 215 is rehearsing the same flash cards.

This is, almost beat for beat, the **DBMS interview questions** experience for a huge share of Greenroom's actual audience — campus and early-career candidates at TCS, Infosys, Wipro, Accenture, Capgemini, Cognizant, and the product-company screens that borrow the same fundamentals round. DBMS is one of the most heavily tested, most *finite* subjects in the entire campus interview circuit — there are only so many normal forms, only so many isolation levels, only so many join types — and yet candidates consistently underperform on it relative to how "knowable" it actually is, because they prepare by reading definitions instead of practicing explanations. This guide goes through the **DBMS interview questions and answers** that actually come up — keys, referential integrity, normalization with worked examples through every normal form, ACID with concrete failure modes, transactions, isolation levels and the exact anomalies each one prevents, deadlocks, indexing, join types, and DBMS vs RDBMS vs NoSQL — the way an interviewer actually wants to hear them: with an example, out loud, surviving a follow-up.

## Keys and relationships

Every DBMS round starts here, because keys are the cheapest possible question to ask and the fastest way to find out if a candidate has only memorized vocabulary. Let's set up one running example table we'll reuse throughout this guide — a simple `Enrollments` schema for a college:

```
Enrollments
------------------------------------------------------------------
student_id | student_email      | course_id | course_name | grade
------------------------------------------------------------------
101        | anu@college.edu    | CS101     | DBMS        | A
101        | anu@college.edu    | CS102     | OS          | B
102        | rahul@college.edu  | CS101     | DBMS        | A
103        | priya@college.edu  | CS103     | DSA         | A
```

### What's the difference between a primary key, candidate key, super key, and foreign key?

A **super key** is any column (or combination of columns) that uniquely identifies a row — it can contain extra, unnecessary columns. `{student_id, course_id, student_email}` is a super key for `Enrollments` because it does uniquely identify a row, even though `student_email` adds nothing useful.

A **candidate key** is a *minimal* super key — drop any column and it stops being unique. In our table, `{student_id, course_id}` is a candidate key (a student can enroll in many courses, so neither column alone is unique, but the pair is). If `student_email` were guaranteed unique per student and never reused, `{student_email, course_id}` would be a second, independent candidate key.

The **primary key** is the one candidate key the table designer actually picks to be the official unique identifier, used for foreign-key references and typically the clustered index. A table can have several candidate keys but only one primary key — the rest become "unique keys" you can still enforce constraints on without making them the primary identifier.

A **foreign key** is a column (or set of columns) in one table that references the primary key of another table, to model a relationship. If we split this into `Students(student_id, student_email)` and `Courses(course_id, course_name)` and `Enrollments(student_id, course_id, grade)`, then `Enrollments.student_id` is a foreign key referencing `Students.student_id`, and `Enrollments.course_id` is a foreign key referencing `Courses.course_id`.

### What's a composite key?

A **composite key** (or composite primary key) is a primary key made of two or more columns together, where no single column is unique on its own. In `Enrollments`, `{student_id, course_id}` is a composite primary key — a given student can appear many times (once per course) and a given course can appear many times (once per enrolled student), but the *pair* appears exactly once. Composite keys show up constantly in many-to-many join/bridge tables — order line items (`order_id` + `product_id`), exam seat allocations (`student_id` + `exam_session_id`), and so on.

### What's the difference between a unique key and a primary key?

Both enforce uniqueness across rows, but a primary key additionally disallows `NULL` and a table can have only one. A unique key allows exactly one `NULL` (in most RDBMS implementations) and a table can have several unique keys alongside its single primary key. Practically: `student_email` might be a unique key (no two students share an email, but email might theoretically be unknown/null for a brand-new record awaiting verification), while `student_id` is the primary key.

### What's referential integrity, and what breaks without it?

**Referential integrity** is the rule that a foreign-key value must either be `NULL` or match an existing primary-key value in the referenced table — you can't have an `Enrollments` row pointing at a `course_id` that doesn't exist in `Courses`. Without it enforced, two concrete things go wrong in production: first, **orphaned rows** — delete `CS101` from `Courses` and every `Enrollments` row referencing it now points at nothing, so a report joining the two tables silently drops those students or crashes depending on join type; second, **garbage inserts** — someone (or a buggy migration script) inserts `Enrollments(student_id=999, course_id='ZZZZ')` where neither `999` nor `'ZZZZ'` exist anywhere, and your analytics dashboard now reports enrollment counts that don't reconcile with the actual student or course tables. `FOREIGN KEY` constraints with `ON DELETE CASCADE` / `ON DELETE RESTRICT` exist specifically so the database — not application code that someone might forget to write correctly — enforces this. The official PostgreSQL documentation on foreign keys covers exactly this set of `ON DELETE`/`ON UPDATE` behaviors and is worth reading once end to end, because nearly every "why did deleting this row break that report" production bug traces back to one of these settings being missing or wrong.

![DBMS interview topics shown on a structured technical interview screen — normalization, keys, ACID, indexing](/assets/blog/pool-structured-screen.webp)

A normalization or ACID question on a whiteboard looks identical whether you actually understand it or just memorized the acronym — the interviewer only finds out which, once they ask the follow-up.

## Normalization — 1NF through BCNF, with worked examples

**Normalization** is the process of structuring tables to eliminate redundant data and the **update, insert, and delete anomalies** that redundancy causes. The "why" matters more than the definition: if the same fact (a course instructor's office number, say) is stored in five rows because five students take that course, then updating the office number means you must update it in all five places — miss one, and your database now contains *two different, contradictory answers* to the same factual question. That's an anomaly, and it's exactly what normalization exists to prevent. This is also the canonical example used in Elmasri and Navathe's *Fundamentals of Database Systems*, one of the standard DBMS textbooks assigned in most Indian engineering curricula — if any of this looks familiar from a semester exam, that's not a coincidence.

### 1NF — First Normal Form

A table is in 1NF if every column holds a single, atomic value — no repeating groups, no comma-separated lists crammed into one cell.

**Before (violates 1NF):**

```
StudentCourses
------------------------------------
student_id | name  | courses
------------------------------------
101        | Anu   | CS101, CS102
102        | Rahul | CS101
```

The `courses` column holds multiple values. This breaks the moment you try to query "find every student in CS101" with a plain `WHERE` clause, or count enrollments accurately — you'd need string parsing instead of relational logic, which defeats the point of a relational database.

**After (1NF):**

```
StudentCourses
------------------------
student_id | name  | course
------------------------
101        | Anu   | CS101
101        | Anu   | CS102
102        | Rahul | CS101
```

Now each row has one atomic course value. The cost: `name` is now repeated, which is the redundancy 2NF and 3NF go on to fix.

### 2NF — Second Normal Form

A table is in 2NF if it's in 1NF *and* has no **partial dependency** — meaning every non-key column depends on the *entire* composite primary key, not just part of it. A partial dependency only exists when your primary key is composite; it's not a concept that applies to a single-column key.

**Before (violates 2NF):**

```
Enrollments
--------------------------------------------------------
student_id | course_id | student_name | course_name | grade
--------------------------------------------------------
101        | CS101     | Anu          | DBMS        | A
101        | CS102     | Anu          | OS          | B
102        | CS101     | Rahul        | DBMS        | A
```

Primary key is `{student_id, course_id}`. But `student_name` only depends on `student_id` (not on `course_id` too), and `course_name` only depends on `course_id` — both are **partial dependencies** on just one part of the composite key, not the whole key. `grade`, by contrast, genuinely depends on the *combination* — a student's grade is specific to that student in that course. The anomaly: rename "Anu" and you must update it in every row she has an enrollment in, or her name diverges across rows.

**After (2NF) — split into three tables:**

```
Students                    Courses                  Enrollments
------------------------    ----------------------   --------------------------
student_id | student_name   course_id | course_name   student_id | course_id | grade
------------------------    ----------------------   --------------------------
101        | Anu            CS101     | DBMS          101        | CS101     | A
102        | Rahul           CS102     | OS            101        | CS102     | B
                                                       102        | CS101     | A
```

`student_name` now lives once per student, `course_name` once per course, and `Enrollments` holds only the facts that genuinely depend on the *pair*.

### 3NF — Third Normal Form

A table is in 3NF if it's in 2NF *and* has no **transitive dependency** — a non-key column depending on another non-key column, rather than depending directly on the primary key.

**Before (violates 3NF):**

```
Courses
---------------------------------------------------------
course_id | course_name | instructor   | instructor_office
---------------------------------------------------------
CS101     | DBMS        | Dr. Mehta    | Room 304
CS102     | OS          | Dr. Mehta    | Room 304
CS103     | DSA         | Dr. Iyer     | Room 212
```

`course_id` (the key) determines `instructor` — fine, that's a direct dependency. But `instructor_office` depends on `instructor`, not on `course_id` directly — it's a **transitive dependency**: `course_id → instructor → instructor_office`. The anomaly: if Dr. Mehta moves offices, you must update *every course row he teaches*, and if you miss one, the database now claims he has two different offices simultaneously — which is exactly the kind of contradiction normalization exists to make structurally impossible.

**After (3NF) — split out the instructor's attributes:**

```
Courses                              Instructors
-----------------------------        ---------------------------
course_id | course_name | instructor  instructor   | instructor_office
-----------------------------        ---------------------------
CS101     | DBMS        | Dr. Mehta   Dr. Mehta     | Room 304
CS102     | OS          | Dr. Mehta   Dr. Iyer      | Room 212
CS103     | DSA         | Dr. Iyer
```

Now `instructor_office` lives in exactly one place, keyed directly by `instructor`.

**The precise distinction interviewers actually want:** a **partial dependency** is a non-key attribute depending on *part of* a composite key (only possible with composite keys); a **transitive dependency** is a non-key attribute depending on *another non-key attribute* rather than the key itself (possible with any key shape). Confusing the two — which is exactly what happened to our candidate at the TCS table in the opening scene — is the single most common DBMS interview stumble, because the two terms sound similar and most flash-card prep doesn't force you to say the distinction out loud in your own words.

### BCNF — Boyce-Codd Normal Form

BCNF is a slightly stricter version of 3NF: for *every* functional dependency `X → Y` in the table, `X` must be a candidate key (super key, really — minimal isn't strictly required for the BCNF test, but it's commonly taught with candidate keys). 3NF allows a narrow edge case BCNF doesn't: a non-key attribute can determine part of a composite key, as long as the dependency target attribute is itself "prime" (part of some candidate key). The classic textbook example is a table tracking which student is taught which subject by which instructor, where each instructor teaches only one subject — `{student, subject}` is the candidate key, but `instructor → subject` is also a valid dependency where `instructor` isn't a key at all. That table is in 3NF but not BCNF, and decomposing it further (splitting `instructor → subject` into its own table) fixes it, at the cost of occasionally needing an extra join to reconstruct the original relationship. Most real interview rounds at the campus level stop at 3NF and only push to BCNF for candidates who are visibly comfortable — if you can correctly explain why most practical schemas are "3NF and good enough," you're already ahead of most candidates who try to bluff a BCNF example they don't actually understand.

### When would you deliberately denormalize?

**Denormalization** is intentionally reintroducing redundancy that normalization removed, traded off against fewer joins and faster reads. The textbook case: a reporting/analytics table or data warehouse fact table that's read thousands of times for every one time it's written — pre-joining `course_name` and `instructor_name` directly into a `daily_enrollment_report` table means a dashboard query hits one table instead of joining four, at the cost of needing to keep that redundant copy in sync (usually via a scheduled batch job or a trigger) whenever the source data changes. The honest framing for an interview: normalize for transactional correctness where writes matter and contradictions are unacceptable (the actual `Enrollments` system of record), denormalize for read-heavy, write-rare, "approximately correct as of the last refresh is fine" reporting layers — and say *why* out loud, because "it's faster" alone is the kind of answer that invites the obvious follow-up, "faster at what cost?"

<div class="verdict"><strong>The core truth on normalization:</strong> every normal form exists to kill one specific kind of anomaly — repeating groups (1NF), partial dependencies (2NF), transitive dependencies (3NF). If you can name the anomaly each form prevents and show one table before/after, you've answered the question better than 90% of candidates who only recite "1NF, 2NF, 3NF, BCNF" in order.</div>

## Transactions and ACID — with real failure modes

A **transaction** is a sequence of one or more database operations executed as a single logical unit — either every operation in it takes effect, or none do. The canonical example, and the one almost every interviewer reaches for, is a bank transfer: debit ₹5,000 from Account A, credit ₹5,000 to Account B. Two separate `UPDATE` statements, one logical operation.

```sql
BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 5000 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 5000 WHERE account_id = 'B';

COMMIT;
```

`COMMIT` makes both updates permanent. If anything goes wrong before `COMMIT` — a constraint violation, a crash, an explicit `ROLLBACK` — the database undoes whatever partial work happened, as if the transaction never ran:

```sql
BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 5000 WHERE account_id = 'A';
-- network blip / app crash / explicit cancel here
ROLLBACK;
-- Account A's balance is restored to its pre-transaction value, no partial debit survives
```

### ACID, explained with a failure mode for each letter

**Atomicity** — all operations in a transaction succeed, or none do; there's no "halfway committed" state. Without it: the debit from Account A succeeds, the credit to Account B fails (say, the server crashes between the two statements), and now ₹5,000 has simply vanished from the system — debited from A, never credited to B. Atomicity is what guarantees that can't happen; either both statements land, or the whole transaction rolls back and A keeps its money.

**Consistency** — a transaction can only take the database from one valid state to another valid state, respecting all constraints (foreign keys, check constraints, application-level invariants like "balance can't go negative"). Without it: a transaction could leave a `CHECK (balance >= 0)` constraint violated, or leave a foreign key pointing at a row that got deleted mid-transaction — consistency is the DBMS refusing to commit a transaction that would leave the database in a state that breaks its own rules.

**Isolation** — concurrent transactions don't see each other's uncommitted, in-progress changes; each transaction behaves as if it ran alone, even when others are running simultaneously. This is the specific property the candidate in our opening story couldn't articulate under follow-up. Without isolation: imagine Transaction 1 debits Account A but hasn't committed yet, and Transaction 2 reads Account A's balance *during that gap* and sees the debited (but not-yet-committed) value. If Transaction 1 then rolls back, Transaction 2 made a decision based on a balance that, as far as the database is concerned, never actually existed. Atomicity alone only guarantees Transaction 1 itself is all-or-nothing — it says nothing about what *other* transactions are allowed to see while Transaction 1 is mid-flight. That's precisely why isolation is a separate property with its own separate set of levels, covered next.

**Durability** — once a transaction commits, the change survives a crash, even if the power goes out one millisecond later. This is implemented via write-ahead logging (the DBMS writes the change to a durable log on disk *before* acknowledging the commit, so it can replay the log on recovery even if the in-memory state was lost). Without it: a customer sees "Payment Successful," the server crashes ten seconds later before the change is actually flushed to durable storage, and on restart, the database has silently forgotten the payment ever happened — exactly the kind of bug that turns into a support escalation and, eventually, a very uncomfortable postmortem.

### Isolation levels and the exact anomalies each one prevents

This is the question that separates "memorized ACID" from "actually understands isolation," and it's asked constantly, because the answer requires you to hold three distinct anomalies in your head and map each isolation level to which ones it allows versus blocks.

The three anomalies, defined precisely:

- **Dirty read** — reading another transaction's *uncommitted* changes, which might get rolled back, leaving you with data that never officially existed.
- **Non-repeatable read** — reading the same row twice within one transaction and getting two different values, because another transaction committed an update to that row in between your two reads.
- **Phantom read** — re-running the same *range* query twice within one transaction and getting a different set of rows, because another transaction inserted or deleted a row matching that range in between.

| Isolation level | Dirty read | Non-repeatable read | Phantom read |
|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible |
| Read Committed | Prevented | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Possible (varies by engine) |
| Serializable | Prevented | Prevented | Prevented |

**Read Uncommitted** is the loosest level — transactions can see each other's uncommitted writes, which is fast but allows dirty reads; it's rarely used in practice for anything where correctness matters. **Read Committed** (the default in PostgreSQL and SQL Server) only ever lets you see committed data, eliminating dirty reads, but a row you read can still change underneath you if you re-read it later in the same transaction. **Repeatable Read** (MySQL/InnoDB's default) guarantees that if you read a row once, re-reading it later in the same transaction returns the same value — it locks or snapshots the rows you've touched. **Serializable** is the strictest: transactions behave as if they ran one at a time in some serial order, eliminating all three anomalies, at the cost of the most blocking/retries and the worst throughput under contention. The honest, practical answer interviewers want at the end: most real systems default to Read Committed or Repeatable Read because Serializable's throughput cost usually isn't worth it outside of genuinely correctness-critical paths like financial ledgers — pick the loosest level that your specific operation can tolerate, not reflexively the strictest one "to be safe."

It's worth being able to walk through the non-repeatable read example concretely, because "explain it with an example" is the near-guaranteed follow-up. Transaction 1 begins and reads Account A's balance: ₹10,000. While Transaction 1 is still open, Transaction 2 commits a deposit of ₹2,000 into Account A. Transaction 1 then reads Account A's balance *again*, before committing, and now sees ₹12,000 — a different value than its first read, within the same transaction, because it never re-read its own earlier read as a fixed snapshot. Under Read Committed, this is allowed (you only avoid seeing *uncommitted* changes, not changes committed by someone else mid-transaction). Under Repeatable Read or Serializable, Transaction 1's second read would still return ₹10,000, because the isolation level guarantees the row's value is stable for the duration of that transaction.

### What's a deadlock, and how does a DBMS handle it?

A **deadlock** happens when two (or more) transactions each hold a lock the other one needs, and each is waiting for the other to release it — neither can proceed, forever, unless something intervenes. Concretely: Transaction 1 locks Account A then tries to lock Account B; Transaction 2, running concurrently, has already locked Account B and now tries to lock Account A. Each is waiting on the other. Without intervention, both wait indefinitely.

**Detection** — most DBMS engines maintain a **wait-for graph**: a node per transaction, with a directed edge from Transaction X to Transaction Y if X is waiting on a lock Y holds. A cycle in this graph means a deadlock exists. The database periodically checks for cycles (or runs detection on every new lock wait, depending on the engine).

**Resolution** — once a cycle is found, the DBMS picks a **victim** (commonly the transaction that's done the least work, or holds the fewest locks, depending on the engine's heuristic) and forcibly rolls it back, releasing its locks so the other transaction(s) can proceed. The rolled-back transaction's application code typically gets a "deadlock detected, please retry" error and is expected to retry the whole transaction.

**Prevention** — the most common practical prevention strategy is *consistent lock ordering*: if every transaction that touches both Account A and Account B always locks them in the same order (say, always lower account ID first), the cyclic wait-for pattern in the example above simply can't form, because both transactions would queue for A first rather than approaching from opposite ends. Other strategies include lock timeouts (give up and retry after N seconds rather than waiting forever) and keeping transactions short so locks are held for less time.

<div class="verdict"><strong>The core truth on ACID and isolation:</strong> "atomicity, consistency, isolation, durability" is the easy half of the answer. The actual signal is whether you can name a concrete failure mode for each — a vanished ₹5,000, a decision made on data that got rolled back, a payment the server forgot after a crash — without needing the interviewer to ask "can you give an example" as a separate follow-up question.</div>

## Indexing and joins

### What's an index, and why does it speed up reads but cost writes?

An **index** is an auxiliary data structure — typically a B-tree or B+-tree, sometimes a hash index for pure equality lookups — that lets the database find rows matching a condition without scanning every row in the table. Without an index on `student_id`, finding one student's enrollments means scanning the entire `Enrollments` table row by row (a full table scan, O(n)). With a B-tree index on `student_id`, the database can navigate down the tree in roughly O(log n) comparisons to find the matching rows directly.

The cost is on the write side: every `INSERT`, `UPDATE`, or `DELETE` that touches an indexed column must also update the index structure itself, not just the underlying table row. A table with five indexes pays that index-maintenance cost five times on every write — which is exactly why "just add an index" isn't free advice, and why interviewers like asking "would you index every column" (the answer is no — index columns that are actually filtered/joined/sorted on frequently, and be deliberate about it on write-heavy tables).

### Clustered vs non-clustered index

A **clustered index** determines the *physical* order rows are stored on disk — there can be only one per table, because a table's rows can only be physically sorted one way. It's usually built on the primary key by default in many engines (InnoDB for MySQL, for instance, clusters on the primary key automatically). Looking up a row by the clustered key is fast because the index *is* the data, in sorted order — no extra lookup step.

A **non-clustered index** is a separate structure that stores the indexed column's values alongside a pointer (or, in InnoDB's case, the primary key value) back to the actual row — the table's physical storage order is untouched. A table can have many non-clustered indexes. Looking something up via a non-clustered index typically means one traversal of the index structure, then a second lookup ("bookmark lookup") back to the actual row — slightly more work than a clustered lookup, but still vastly cheaper than a full table scan.

Concretely, on our `Students(student_id, student_email, name)` table: `student_id` as the clustered/primary key means rows are physically stored in `student_id` order. A non-clustered index on `student_email` lets `WHERE student_email = 'anu@college.edu'` resolve quickly without scanning every row, even though the table itself isn't physically sorted by email.

### Types of joins, with example rows

Given:

```
Students                    Enrollments
-----------------------     --------------------------
student_id | name           student_id | course_id
-----------------------     --------------------------
101        | Anu            101        | CS101
102        | Rahul           103        | CS102
103        | Priya
```

(Note: `Rahul`, id 102, has no enrollment row; `course_id CS102`'s `student_id 103` matches `Priya`, but there's also an enrollment for a `student_id 999` not present in `Students` — useful for illustrating outer joins.)

- **INNER JOIN** returns only rows where both sides match — `Anu`/`CS101` and `Priya`/`CS102`. Rahul (no enrollment) and any orphaned enrollment row are excluded entirely.
- **LEFT JOIN** returns every row from the left table (`Students`), with `NULL`s filling in where there's no match on the right — so Rahul appears with `course_id = NULL`, since he's not enrolled in anything.
- **RIGHT JOIN** is the mirror image — every row from the right table (`Enrollments`), with `NULL`s for unmatched left-side columns — so an orphaned enrollment for `student_id 999` would appear with `name = NULL`.
- **FULL (OUTER) JOIN** returns everything from both sides — matched rows, Rahul with nulls on the right, and the orphaned enrollment with nulls on the left, all in one result set. It's the union of what LEFT and RIGHT would each return.

A common follow-up once you've named all four: "what's a self join, and when would you use one?" A **self join** is a table joined to itself, treated as if it were two separate tables via aliases — useful when rows in a table reference other rows in the *same* table. The classic example is an `Employees(employee_id, name, manager_id)` table, where `manager_id` points back to another `employee_id` in the same table: `SELECT e.name, m.name AS manager_name FROM Employees e JOIN Employees m ON e.manager_id = m.employee_id` resolves every employee's manager's name without a separate `Managers` table. It's worth having this one ready, because interviewers like it specifically because it tests whether you actually understand what a join does structurally, rather than having memorized "join = combine two tables" as a fixed phrase.

If joins specifically are where you're shakiest, our [SQL interview questions guide](/blog/sql-interview-questions) goes deeper on join syntax, window functions, and query-writing practice beyond the conceptual level covered here.

## DBMS vs RDBMS, and SQL vs NoSQL

A **DBMS** (Database Management System) is the broad category — any software that manages storage, retrieval, and updates of data, including ones that don't use tables or relationships at all (a flat-file system, a hierarchical or network-model legacy system). An **RDBMS** (Relational DBMS) is a specific kind of DBMS that organizes data into tables (relations) with rows and columns, enforces a schema, and supports relationships via keys — MySQL, PostgreSQL, Oracle, SQL Server are all RDBMSs. Every RDBMS is a DBMS; not every DBMS is an RDBMS. This is a near-guaranteed quick-fire question early in a DBMS round, and a surprising number of candidates answer it backwards under pressure, which is itself a useful thing to rehearse saying correctly out loud once before you're in the chair.

**SQL** (relational) databases enforce a fixed schema upfront, support ACID transactions natively, and use SQL as the query language — they're the right default when your data has clear structure and relationships, and correctness/consistency genuinely matters (financial records, inventory, anything where "is this number exactly right" is non-negotiable). **NoSQL** databases (MongoDB, Cassandra, DynamoDB, Redis, and others) trade some of that structure and ACID strictness for horizontal scalability and flexible/schema-less data models — they fit naturally when your data doesn't map cleanly to rows and columns (deeply nested documents, key-value caches, time-series or graph data), or when you need to scale writes across many machines more easily than a single relational instance typically allows. The honest answer interviewers want isn't "SQL good, NoSQL bad" or vice versa — it's a real decision framework: structured, relationship-heavy, correctness-critical data leans SQL; flexible-schema, massive-scale, eventually-consistent-is-acceptable data leans NoSQL, and plenty of real production systems use both side by side for different parts of the same product.

## How candidates actually prepare for DBMS rounds — and where each method falls short

Almost everyone preparing for a DBMS-heavy interview reaches for the same handful of resources. Each one teaches you *something real* — the problem is what each one fails to rehearse.

**GeeksforGeeks-style DBMS notes and PDF dumps.** These are genuinely useful as a syllabus — they reliably cover the right *topics* (normalization, ACID, indexing, joins), and several of the questions in this guide will look familiar if you've browsed one. The gap: a GfG page gives you a clean, pre-written definition and a tidy example, which trains recognition, not production. The first time an interviewer changes the table in front of you — swaps the columns, asks you to normalize a *different* schema than the textbook one you memorized — candidates who only ever read the dump (rather than worked through their own example) visibly stall, because they were never asked to *generate* an example from scratch, only to recognize one that was handed to them.

**A senior's WhatsApp-forwarded "Important DBMS Questions" PDF.** Every Indian engineering college has a version of this file, usually titled something like `DBMS_Imp_Qs_Final_FINAL.pdf`, forwarded from batch to batch with the same twenty questions and the same answers, occasionally with a typo from three placement cycles ago still intact. It's not useless — it's a real signal of what *has* been asked before, and pattern frequency is genuinely informative. But it's a closed list, optimized for recall of an exact answer, not for handling the live follow-up that changes the scenario — and interviewers know these PDFs exist and circulate, so the better ones deliberately ask a slightly different version of the "classic" question specifically to filter out candidates who memorized the PDF's answer verbatim without understanding why it's the answer.

**Generic ChatGPT Q&A prompting.** Typing "ask me DBMS interview questions" into a chat window and typing answers back is better than nothing — it's genuinely fine for drilling factual recall (normal form definitions, ACID letters, isolation level names). But it's a silent, text-based, self-paced exercise with no real interruption, no spoken delivery pressure, and the follow-ups only go as deep as the prompt you happened to write — which means it's easy to accidentally make your own practice easier than the real thing, without realizing it, because you control both sides of the conversation.

**YouTube placement-prep channels.** These are excellent for building the initial mental model — a well-made video walking through 3NF with diagrams genuinely helps the concept click for a lot of people, more than a wall of text does. The limitation is structural, not a knock on the content: watching someone else explain 3NF clearly is not the same skill as being able to explain it yourself, live, when someone you've never met says "okay, now do it with this table instead" thirty seconds after you finish your first answer.

The honest throughline: every one of these methods trains *recognition* — can you tell a correct answer when you see one — and none of them trains *production under follow-up*, which is the actual skill a TCS or Infosys technical panel is grading. That's the specific gap [Greenroom](/)'s spoken mock interview format is built for: you explain 3NF, or ACID, or why an index speeds reads, out loud, with your own example — not the GfG example, not the WhatsApp PDF's example — and the AI interviewer asks a real follow-up that changes the scenario the way an actual interviewer would (a different table to normalize, a different transaction to walk through), instead of letting you silently re-read a memorized definition off a flash card. It doesn't replace actually learning the material — you still need to understand normalization and ACID in the first place — but it's the one preparation method on this list that rehearses saying it out loud before the real panel does.

## Practise explaining, not just memorizing

DBMS rounds are explanation-heavy by design — "explain 3NF with an example," "why use an index," "what does isolation actually prevent" — and explanation is a verbal skill, not a recognition skill. You can read every answer in this guide twice, nod along, and still freeze the first time an interviewer changes the table on you mid-explanation, the way candidate 214 did at that TCS drive in the opening scene. [Greenroom](/) runs spoken **DBMS interview questions and answers** practice sessions that ask realistic follow-ups and give feedback on how clearly you explained the concept — not just whether your final answer matched a definition. Pair it with our [SQL interview questions guide](/blog/sql-interview-questions) and [PL/SQL interview questions guide](/blog/pl-sql-interview-questions) for the query-writing side, our [operating system interview questions guide](/blog/operating-system-interview-questions) since OS and DBMS get tested back-to-back in the same technical round at most service companies, our [data structures interview questions guide](/blog/data-structures-interview-questions) for the DS&A half of the same drive, and our [database administrator interview questions guide](/blog/database-administrator-interview-questions) if you're interviewing specifically for a DBA-track role rather than a general SDE one.

## Frequently asked questions

### What are the most common DBMS interview questions?

The most common **DBMS interview questions** cover keys (primary, candidate, super, foreign, composite) and referential integrity; normalization (1NF through BCNF with worked examples, partial vs transitive dependencies, when to denormalize); transactions and ACID properties; isolation levels and the specific read anomalies each one prevents (dirty read, non-repeatable read, phantom read); deadlocks and how the database detects and resolves them; indexing (clustered vs non-clustered); join types (inner, left, right, full); and the distinctions between DBMS vs RDBMS and SQL vs NoSQL.

### What is normalization and why is it important?

Normalization is the process of structuring database tables to eliminate redundant data and the update, insert, and delete anomalies that redundancy causes. It progresses through normal forms — 1NF removes repeating groups so every column holds one atomic value, 2NF removes partial dependencies on part of a composite key, 3NF removes transitive dependencies where one non-key column depends on another non-key column instead of the key itself, and BCNF tightens 3NF for a narrow set of edge cases. It keeps data consistent across the database, though it's sometimes deliberately relaxed (denormalized) for read-heavy reporting tables where query speed matters more than eliminating every last redundancy.

### What's the difference between a partial dependency and a transitive dependency?

A partial dependency is when a non-key attribute depends on only *part of* a composite primary key rather than the whole key — it can only happen when the key has more than one column. A transitive dependency is when a non-key attribute depends on *another non-key attribute* rather than depending directly on the primary key — it can happen regardless of whether the key is single-column or composite. 2NF eliminates partial dependencies; 3NF eliminates transitive dependencies. Confusing the two is one of the most common stumbles in DBMS interviews, because the terms sound similar but describe structurally different problems.

### What does ACID stand for in DBMS, and what does isolation specifically prevent?

ACID describes the four guarantees of a reliable transaction: Atomicity (every operation in a transaction succeeds, or none do — no halfway-committed state), Consistency (a transaction can only move the database from one valid state to another, respecting all constraints), Isolation (concurrent transactions don't see each other's uncommitted, in-progress changes, so each behaves as if it ran alone), and Durability (once committed, a change survives a crash, via mechanisms like write-ahead logging). Isolation specifically prevents one transaction from reading another transaction's uncommitted data (a dirty read) — atomicity alone only guarantees a single transaction is all-or-nothing, it says nothing about what other concurrent transactions are allowed to see while that transaction is still in progress, which is why isolation exists as its own separate property with its own separate isolation levels.

### What's a deadlock and how does a DBMS resolve it?

A deadlock happens when two or more transactions each hold a lock another one needs, so every transaction in the cycle is waiting on another one indefinitely. Databases detect deadlocks using a wait-for graph — a node per transaction with edges showing who's waiting on whom — and a cycle in that graph signals a deadlock. The database resolves it by picking a victim transaction (often the one that's done the least work) and forcibly rolling it back, releasing its locks so the remaining transaction(s) can proceed; the rolled-back transaction typically retries. The most common prevention strategy is enforcing a consistent lock-acquisition order across all transactions so a cyclic wait pattern can't form in the first place.

### How should I prepare for DBMS interview questions and answers?

Work through one concrete example per concept in your own words rather than memorizing a definition — a real table you normalize step by step through 1NF, 2NF, and 3NF; a real transfer scenario for ACID; a real pair of queries for isolation-level anomalies. Then practise saying those explanations out loud, ideally with a voice-based mock interview that asks realistic follow-up questions, since DBMS rounds at TCS, Infosys, Wipro, Accenture, and most product companies are explanation-heavy and verbal, not multiple-choice — the interviewer is listening for whether you understand *why*, not whether you can recite an acronym in order.

DBMS rounds reward clear explanations with real examples, spoken out loud, under a follow-up that changes the scenario. Greenroom runs spoken technical interviews that follow up on your answers and give feedback. Free to start.