---
title: Database Administrator Interview Questions and Answers (2026)
description: Database administrator interview questions that actually get asked in 2026 — backups, point-in-time recovery, RPO/RTO, slow query tuning, indexing, replication, sharding, HA and security, with worked examples.
url: https://usegreenroom.app/blog/database-administrator-interview-questions
last_updated: 2026-06-21
---

← Back to blog

Roles

# Database administrator interview questions

June 21, 2026 · 35 min read

![Database administrator interview questions and answers — cover from Greenroom, the AI mock interviewer](/assets/blog/database-administrator-interview-questions-hero.webp)

"Walk me through restoring this database to exactly 2:47pm yesterday." The interviewer says it the way someone orders a coffee — completely unbothered — and you feel your stomach do something it has never done before in a job interview. Not 2:45. Not "sometime yesterday afternoon." 2:47pm. There's a comma in the scenario about a bad deployment at 2:50pm that corrupted three tables, and the entire weight of a fictional company's fictional Tuesday is suddenly resting on whether you know the difference between a full backup and a transaction log.

You say "I'd just restore from backup," with the calm confidence of someone who has never actually been paged at 3am. The interviewer doesn't even blink. "Which backup?" they ask. "The full backup from Sunday is six days old. What do you do with the other six days of data?" And now you're doing math in your head about backup chains while trying to look like a person who has slept recently.

This is, almost word for word, the scene that plays out across thousands of **database administrator interviews** every year, because DBA interviews aren't really testing whether you've memorized backup terminology — they're testing whether you can stay calm and reason clearly when someone's actual data is on the line and the clock in the scenario is ticking. The job itself is a 3am pager going off because replication lag just spiked past the alert threshold, and the interview is designed to simulate exactly that pressure, minus the coffee and the open Slack channel full of panicking product managers.

This guide covers the **database administrator interview questions** that actually come up in 2026 — backup types and recovery sequencing, point-in-time recovery, RPO and RTO, how to actually test a backup instead of just hoping it works, diagnosing a slow query with execution plans, indexing strategy (including when an index actively hurts you), deadlocks, replication and lag, sharding versus partitioning, high availability and failover, and the security questions every DBA round eventually reaches — access control, encryption, and auditing. Each section includes a worked example, not just a definition, because "what is a backup" and "walk me through restoring to 2:47pm" are different questions, and only one of them is what you'll actually be asked.

## Backup types: full, incremental, differential

Every DBA interview opens with some version of "explain the backup types," and it's a fair opening question because if you're shaky here, everything downstream — recovery, RPO/RTO, the whole "protect the data" mandate of the job — falls apart. **Database administrator interview questions** about backups are rarely asking for a dictionary definition; they're checking whether you understand the trade-off each type makes between backup speed, storage cost, and restore complexity.

A **full backup** copies the entire database — every table, every row, every index — to a backup file or storage target. It's the simplest to restore from (you just... restore it) and the most expensive to take repeatedly: a 500GB production database doesn't shrink because you backed it up yesterday, so a nightly full backup of a large database can take hours and consume serious I/O and storage.

An **incremental backup** copies only the data that changed since the *last backup of any type* — full or incremental. This makes incrementals fast and small, but it creates a dependency chain: to restore, you need the last full backup plus every incremental since, applied in order. Lose or corrupt one incremental in the middle of the chain, and everything after it becomes unrestorable. This is the detail interviewers specifically probe, because it's the detail that bites real DBAs in production.

A **differential backup** copies everything that changed since the *last full backup* — not since the last differential. This means differentials grow larger over time (Monday's differential is small, Saturday's differential before next week's full backup is large, because it's capturing six days of changes), but restoring only ever needs two pieces: the last full backup, plus the most recent differential. No chain to walk, no risk of one broken link taking down the whole restore.

**Worked example — a realistic weekly schedule:**

```
Sunday    02:00  FULL backup        (baseline — all data)
Monday    02:00  DIFFERENTIAL       (changes since Sunday's full)
Tuesday   02:00  DIFFERENTIAL       (changes since Sunday's full, growing)
Wednesday 02:00  DIFFERENTIAL       (changes since Sunday's full, growing more)
...
Saturday  02:00  DIFFERENTIAL       (largest of the week)
```

If the database fails on Thursday at noon, the restore sequence is: Sunday's full backup, then Wednesday's differential (the most recent one before the failure) — two files, two steps. Compare that to an incremental-only schedule, where the same Thursday-noon restore would require Sunday's full plus Monday's, Tuesday's, *and* Wednesday's incrementals, applied strictly in order. Differentials trade some extra storage growth across the week for a faster, simpler, less fragile restore path — which is exactly the kind of trade-off an interviewer wants you to state out loud, unprompted, rather than just naming the three types and stopping.

Most production setups in 2026 actually combine all three with transaction log backups layered on top (more on that next), and the official documentation for whichever engine you're using — PostgreSQL's documentation on continuous archiving and point-in-time recovery, or Microsoft's SQL Server backup documentation — is worth reading end to end at least once, because every engine has its own specific terminology for the same underlying chain logic.

## Point-in-time recovery — how "restore to 2:47pm" actually works

This is the question from the cold open, and it's a favorite precisely because "I'd restore from backup" is an incomplete answer that sounds complete. **Point-in-time recovery (PITR)** lets you restore a database to any specific moment — not just to the timestamp of your last backup — and the mechanism that makes it possible is the **transaction log** (called the write-ahead log or WAL in PostgreSQL, the redo log in MySQL/InnoDB, and the transaction log in SQL Server).

Every change to the database — every insert, update, delete, schema change — gets written to the transaction log *before* it's applied to the actual data files, which is also what guarantees durability and crash recovery even outside a restore scenario. Because the log is a sequential, timestamped record of every change, you can replay it forward from a known starting point (a full backup) up to any arbitrary instant, then stop.

**Worked restore scenario, matching the cold open:** a bad deployment at 2:50pm Tuesday corrupted three tables. You have Sunday's full backup, Monday's differential, and continuous transaction log backups taken every 15 minutes since Sunday. The restore sequence is:

1. Restore Sunday's full backup (baseline).
2. Apply Monday's differential (gets you to Monday 02:00, fast-forwarded).
3. Replay transaction logs from Monday 02:00 forward, in order, stopping the replay *exactly* at Tuesday 2:47pm — three minutes before the bad deployment's writes hit the log.
4. Bring the database online in a recovered state, verify row counts and a few known records against expectations, then redirect application traffic.

The "stop the replay at a specific timestamp" step is the actual mechanic behind "restore to 2:47pm" — it's not a special button, it's transaction log replay with an early exit condition. Most engines support this natively (`RECOVERY_TARGET_TIME` in PostgreSQL, `STOPAT` in SQL Server's `RESTORE LOG` command, binlog replay up to a position or timestamp in MySQL). The interview signal here isn't whether you remember the exact flag name — it's whether you can explain *why* this is possible at all, which is the transaction log, not backup magic.

## RPO and RTO — the two numbers that actually define your backup strategy

**RPO (Recovery Point Objective)** and **RTO (Recovery Time Objective)** are the two numbers that should drive every backup decision you make, and a DBA who can't define both crisply is a DBA who's been backing things up without ever asking why.

**RPO** answers: "how much data can we afford to lose?" — measured in time. An RPO of 15 minutes means the business accepts losing at most 15 minutes of transactions in a disaster; an RPO of 24 hours means losing up to a full day is tolerable. RPO is what determines how *frequently* you back up — a 24-hour RPO is satisfiable with nightly full backups alone, but a 15-minute RPO requires continuous transaction log shipping or near-real-time replication, because nothing else closes the gap.

**RTO** answers: "how long can we be down?" — also measured in time, but it's about restore speed and process, not data loss. An RTO of 1 hour means the restore-and-recover process, end to end, must complete within an hour of the incident being declared; an RTO of 5 minutes basically rules out a manual restore-from-backup process entirely and points you toward an already-running standby/replica you can fail over to instead.

**Why this distinction matters in practice:** a payments database might need an RPO of under 1 minute (you genuinely cannot lose a transaction record) and an RTO of under 5 minutes (checkout being down for an hour is a revenue and trust disaster) — which together rule out backup-based recovery as the *primary* strategy and point straight at synchronous replication with automated failover. A internal analytics warehouse, by contrast, might tolerate an RPO of 24 hours and an RTO of several hours, because losing a day of half-processed analytics events and being down for an afternoon is annoying, not catastrophic — nightly full backups are genuinely fine here, and spending engineering effort on synchronous replication for this system would be solving a problem nobody has.

The interview question to expect: "given this system, what RPO/RTO would you target, and what backup/replication strategy gets you there?" — and the strongest answers state the numbers first, then derive the strategy from them, rather than describing a generic backup setup and hoping it happens to match unstated requirements.

## Testing backups — the point nobody skips on a slide but everybody skips in production

Here's the line that should be tattooed on every DBA's monitor: an **untested backup is not a backup** — it's a file you *hope* is a backup, and hope is not a recovery strategy. This is one of the most common **DBA interview questions and answers** pairs because it separates candidates who've actually run an on-call rotation from candidates who've only read about one.

Backups fail silently more often than people expect: a backup job can complete "successfully" while the resulting file is truncated, corrupted, encrypted with a key nobody can find anymore, or simply backing up the wrong database after a config drift nobody noticed for three weeks. None of these show up as a red alert in your monitoring unless you specifically check for them.

**How a DBA actually verifies backups work, in order of rigor:**

- **Checksum verification** — most backup tools (and `pg_basebackup`, `mysqldump --routines`, native SQL Server `BACKUP` with `CHECKSUM`) can validate that the backup file itself isn't corrupted at the bit level. This catches storage-layer corruption but tells you nothing about whether the backup is *logically* complete or restorable.
- **Restore drills** — actually restoring the backup to a separate environment on a schedule (weekly, monthly, depending on how critical the system is) and running validation queries against it: row counts matching expectations, key tables present, a handful of known records spot-checked. This is the only test that actually proves you can recover, because it exercises the real restore path under real conditions, not a simulation of it.
- **Recovery time measurement** — timing the restore drill end to end and comparing it against your stated RTO. A backup that restores correctly but takes 14 hours when your RTO is 1 hour is a backup that technically "works" and operationally fails the actual requirement.
- **Documented runbooks** — a written, current, step-by-step restore procedure that someone other than you could follow at 3am. The worst time to discover your restore process lives only in your head is during the incident it's supposed to solve.

AWS's own documentation on RDS backup and restore, and similarly Azure SQL's operational guidance, both explicitly recommend scheduled restore testing as a baseline practice — not an optional nice-to-have — precisely because so many real outages trace back to a backup that "should have worked." The interview question "how do you know your backups actually work" is fishing for whether you've internalized this, or whether your answer stops at "the backup job shows green."

There's a second layer to this question interviewers sometimes add as a follow-up: "how often should you run a restore drill, and on what?" The honest answer scales with how critical the system is and how often its schema or data shape changes — a payments database might warrant a monthly full restore drill plus automated checksum verification on every single backup, while a low-traffic internal tool might reasonably get a quarterly drill. What you're avoiding either way is the scenario where the *first* time anyone actually restores from a backup is during a real incident, because that's also the worst possible time to discover the restore script references a server hostname that was decommissioned eight months ago, or that the backup user's credentials expired and nobody noticed because the job kept "succeeding" against an empty target.

## Diagnosing a slow query — execution plans, EXPLAIN, and reading the evidence

Performance tuning is the other half of the DBA interview, and the gateway question is almost always some version of: "a query that used to run in 50ms now takes 8 seconds — walk me through how you'd find out why." The expected process is to reason from evidence, not guess.

The first move is always the **execution plan** — the database engine's own explanation of how it intends to (or did) retrieve the data: which indexes it used, in what order it joined tables, whether it scanned an entire table or used an index to jump straight to relevant rows. `EXPLAIN` shows the *planned* execution without running the query; `EXPLAIN ANALYZE` actually runs it and reports real timings and row counts alongside the plan, which is what you want when the plan itself looks reasonable but the query is still slow — the gap between estimated and actual rows is often where the real problem hides.

**Worked example.** Say this query is the slow one:

```sql
SELECT order_id, customer_name, total
FROM orders
WHERE customer_email = 'priya@example.com'
  AND status = 'shipped';
```

Running `EXPLAIN ANALYZE` against it on an unoptimized `orders` table might return something like:

```
Seq Scan on orders  (cost=0.00..48213.00 rows=3 width=64)
                     (actual time=0.02..812.55 rows=3 loops=1)
  Filter: ((customer_email = 'priya@example.com'::text) AND (status = 'shipped'::text))
  Rows Removed by Filter: 1999997
Planning Time: 0.110 ms
Execution Time: 812.61 ms
```

`Seq Scan` is the tell — a **sequential (table) scan**, meaning the engine read all two million rows and threw away 1,999,997 of them to find the 3 that matched. That's the table-scan-versus-index-scan distinction interviewers are specifically listening for you to name: a table scan reads every row linearly; an **index scan** uses a data structure built ahead of time to jump almost directly to matching rows, the same way a book index gets you to a topic without reading every page.

After adding an index — `CREATE INDEX idx_orders_email_status ON orders (customer_email, status);` — the same query's plan changes shape entirely:

```
Index Scan using idx_orders_email_status on orders
                     (cost=0.42..8.44 rows=3 width=64)
                     (actual time=0.03..0.04 rows=3 loops=1)
  Index Cond: ((customer_email = 'priya@example.com'::text) AND (status = 'shipped'::text))
Planning Time: 0.095 ms
Execution Time: 0.06 ms
```

812ms down to 0.06ms — four orders of magnitude — and the actual lesson here, the part interviewers want stated explicitly, is the *process*: read the plan, find the scan type and the filter that's discarding nearly all the rows, add an index that matches the filter columns, re-run the plan, confirm the scan type changed and the timing improved. Guessing "maybe add an index somewhere" without first reading the plan is the wrong process even when it accidentally produces the right answer.

## Indexing strategy — when an index helps, and when it actively hurts

**Indexing strategy** is where DBA interviews separate "knows what an index is" from "has actually managed index sprawl on a real production table," because the honest answer to "should I add an index" is "it depends," and interviewers want to hear the *depends*.

Indexes help reads dramatically — that's the whole point, as shown above — but they're not free. Every index has to be updated on every `INSERT`, `UPDATE`, or `DELETE` to the columns it covers, which means a table with ten indexes pays a write-amplification cost ten times over on every write. **Over-indexing** is a real, common production problem: a table accumulates indexes added reactively over years, each one solving a single past slow-query incident, until writes slow to a crawl and nobody remembers which indexes are actually still used by any live query. A good DBA periodically audits index usage (most engines expose this — `pg_stat_user_indexes` in PostgreSQL, for instance) and drops indexes with zero or near-zero scans.

The rule of thumb worth stating in an interview: index columns that appear in `WHERE` clauses, `JOIN` conditions, and `ORDER BY` clauses on large, frequently-queried tables — and be deliberately cautious about indexing small tables (a full scan of 500 rows is already fast; an index adds write overhead for no real read benefit) or tables with heavy write traffic and rare reads (a logging/audit table you write to constantly and query once a quarter is a bad index candidate).

**Clustered vs non-clustered indexes** is the natural follow-up. A **clustered index** determines the actual physical order data is stored on disk — there can only be one per table, because data can only be physically sorted one way at a time. A **non-clustered index** is a separate structure that stores the indexed column(s) plus a pointer back to the actual row, and a table can have many of these. A concrete example: an `orders` table clustered on `order_id` (so rows are physically stored in `order_id` order, making range scans like "all orders between ID 1000 and 2000" very fast) can additionally have a non-clustered index on `customer_email` for fast lookups by customer — that lookup goes through the non-clustered index, finds the pointer, then fetches the actual row from wherever it physically sits in the `order_id`-ordered data. Choosing the clustered index column wisely matters more than people expect, because changing it later means physically rewriting the entire table.

A sharper version of this question some interviewers ask: "you have a non-clustered index on `customer_email`, but a query filtering by `customer_email` is still slow — why?" The likely answer is that the index lookup found the right rows fast, but then had to do a separate trip back to the actual table data for every matching row to fetch the other columns the query asked for (`SELECT order_id, customer_name, total`) — a **lookup** (sometimes called a "bookmark lookup") that can dominate the cost when many rows match. The fix many engines support is a **covering index** — including the extra columns the query needs directly in the index itself, so the engine never has to leave the index structure at all. This is the kind of follow-up that rewards someone who's actually read execution plans under load, not someone who memorized "clustered means physically sorted."

## Deadlocks — what causes them and how a DBA actually prevents recurrence

A **deadlock** happens when two (or more) transactions each hold a lock the other one needs, and neither can proceed — Transaction A holds a lock on Row 1 and is waiting for a lock on Row 2; Transaction B holds the lock on Row 2 and is waiting for the one on Row 1. Neither will ever release what the other needs, so without intervention both would wait forever.

Database engines don't actually let this happen forever — they run **deadlock detection** (typically by building a wait-for graph and checking for cycles) and, once a deadlock is confirmed, the engine picks one transaction as the "victim," forcibly rolls it back, and lets the other proceed. The victim's application code gets an error back and is expected to retry.

The interview question is rarely "what is a deadlock" alone — it's almost always followed by "how do you stop this from happening repeatedly in production," and the real answer has a few concrete levers: **access tables and rows in a consistent order** across all transactions (if every code path that touches both Orders and Inventory always locks Orders first, then Inventory, the circular wait pattern that causes deadlocks structurally can't form); **keep transactions short** so locks are held for the minimum time needed, reducing the window where two transactions can collide; **use the lowest isolation level that's actually correct for the use case**, since stricter isolation levels hold broader and longer locks; and **add retry logic with backoff** in the application layer for the cases that do still occur, since some deadlocks under high concurrency are a cost of doing business, not a bug to eliminate to zero. A DBA who's actually debugged a recurring deadlock will usually also mention pulling the engine's deadlock graph/log output (SQL Server's deadlock graph XML, Postgres's `log_lock_waits`) to identify the *exact* two queries and lock order involved, rather than guessing from the application code alone.

A good interviewer will push one level further: "the retry logic is in place and the error rate dropped, but you're still seeing occasional deadlocks once a week under peak load — is that a problem?" The honest, senior-sounding answer is: not necessarily. Some baseline rate of deadlocks under genuinely concurrent, overlapping access patterns is normal and expected in any system with meaningful write concurrency — the goal isn't a deadlock count of zero, it's a deadlock rate low enough that retries absorb it invisibly to the user, with monitoring in place to catch if that rate suddenly spikes (which usually signals a new code path was deployed that broke the consistent lock-ordering convention everyone else follows).

## Replication — primary/replica, sync vs async, and lag

**Database replication** keeps a copy of your data on one or more additional servers, continuously, and it's foundational to both high availability and read scaling. The standard topology is **primary/replica** (sometimes still called master/slave in older docs): all writes go to the primary, and the primary streams its changes — usually via the same transaction log mechanism that powers point-in-time recovery — to one or more replicas, which apply those changes and stay in sync.

**Synchronous replication** means the primary waits for the replica (or replicas) to confirm they've received and applied a write before telling the application the write succeeded. This guarantees zero data loss on failover — the replica is always exactly caught up — but it adds latency to every single write, since you're now waiting on a network round-trip to another machine for every commit, and if the replica goes unreachable, you have to choose between blocking writes entirely or degrading your guarantee.

**Asynchronous replication** means the primary commits the write and responds to the application immediately, then ships the change to replicas without waiting. Writes stay fast regardless of replica health, but it introduces **replication lag** — a window of time where the replica is behind the primary — and if the primary fails during that window, whatever hadn't yet replicated is gone.

**A concrete replication lag scenario:** an e-commerce primary is replicating asynchronously to a read replica that serves the "order history" page (a common pattern — route reads that can tolerate slight staleness to a replica, keep the primary free for writes). A customer places an order, the primary commits it instantly, the confirmation page shows success — but the order history page, reading from the replica, is 800ms behind under load and doesn't show the new order yet. The customer refreshes, panics, opens a support ticket. This isn't a bug; it's the explicit, known trade-off of asynchronous replication, and the DBA-relevant decision is *where* lag is tolerable (order history, definitely) versus where it isn't (the payment confirmation itself must read from the primary, always). Monitoring replication lag as a first-class metric, with alerting on a threshold tied to your actual RPO, is the operational discipline that keeps this trade-off from becoming an incident.

It's also worth knowing what actually *causes* lag to spike, because "replication lag is high, what do you check" is a common follow-up. The usual suspects: a long-running transaction on the primary that the replica can't apply changes past until it commits; a sudden burst of write volume that outpaces the replica's apply rate even though network bandwidth is fine; a replica under its own read load contention, where queries running against it are competing for the same CPU/I/O the replication apply process needs; or, in the worst case, a network partition between primary and replica that's degrading throughput without fully breaking the connection, which is sneaky precisely because it doesn't trip an obvious "replica unreachable" alert. A DBA who's actually chased this down checks the apply lag metric, the primary's recent write volume, and the replica's own resource utilization, roughly in that order, rather than immediately assuming "the network is slow."

## Sharding and partitioning — different tools for scale

These two get conflated constantly, and a DBA interview will often ask you to draw the line explicitly. **Partitioning** splits a single table's data across multiple physical segments *within the same database instance* — by range (orders partitioned by month), by list (customers partitioned by region), or by hash — while the table still presents as one logical table to queries, and a single database server still owns all the data. It mainly helps with manageability (you can drop an entire old month's partition instantly instead of running a slow `DELETE`) and query performance (a query filtered by date range only has to scan the relevant partitions, not the whole table — "partition pruning").

**Sharding** splits data across *multiple separate database instances/servers*, each owning a distinct subset of the data (often by a shard key — customer ID, tenant ID, geographic region), with no single server holding the full dataset. Sharding solves a different problem than partitioning: it's what you reach for when a *single machine's storage or throughput ceiling* is the actual constraint, not just query/manageability convenience within one machine's data.

The practical guidance worth stating in an interview: **reach for partitioning first** — it's far less operationally complex, doesn't require application-layer routing logic, and solves the manageability/pruning problem for the vast majority of "this table is huge" situations. **Reach for sharding only when partitioning genuinely can't solve the problem anymore** — when write throughput or total data volume has outgrown what any single machine can handle regardless of how it's internally organized — because sharding brings real new complexity: cross-shard joins become expensive or impossible, transactions spanning shards need special handling, and rebalancing shards as data grows unevenly is a genuinely hard operational problem. Plenty of companies partition their way to a comfortable scale and never need to shard at all; treating sharding as the default "scaling" answer in an interview, without acknowledging this ordering, reads as someone who's read about sharding rather than operated a large table.

![DBA technical interview screen with a structured live database scenario](/assets/blog/pool-structured-screen.webp)
<figcaption>A DBA technical screen rarely stays abstract for long — expect the scenario to change mid-answer, the same way a real outage does.</figcaption>

## High availability and failover — and the split-brain risk nobody wants to find out about live

**High availability (HA)** means the system keeps serving traffic through component failures — specifically, in the DBA context, surviving the loss of the primary database server without an extended outage. The mechanism is **failover**: detecting that the primary is down or unreachable, promoting a replica to become the new primary, and redirecting traffic to it — ideally automatically, ideally fast enough that the outage barely registers.

Failover sounds simple in a sentence and is genuinely hard in practice, and the specific hard part interviewers like to probe is **split-brain**: a scenario where a network partition makes the old primary *think* it's still primary (it's still up, still accepting writes from clients that can still reach it) while the failover system has *also* promoted a replica to primary, believing the original primary is dead. Now you have two servers both accepting writes as if they're the single source of truth, and reconciling those two divergent histories afterward can mean real, unrecoverable data loss or duplication.

The standard defenses worth naming: **quorum-based failover** (a majority of nodes in a cluster must agree the primary is actually down before promoting a replica — this is what etcd/Raft-based systems and tools like Patroni for PostgreSQL provide, specifically to prevent a single confused node from triggering a bad promotion), **fencing** (actively cutting off the old primary's ability to accept writes or reach storage once a new primary is promoted, rather than just hoping it notices and steps down gracefully), and **a clear, tested runbook for manual failover** as a fallback when automation itself is in question. The interview-grade answer to "how would you design failover for this system" names the split-brain risk unprompted and explains which defense you'd use — not just "we'd have a standby replica," which is the part everyone already assumes.

## Database security — access control, encryption, auditing

Every DBA interview eventually reaches security, because data being safe isn't just about backups — it's about who can touch it at all.

**Access control** should follow the **principle of least privilege**: every user, service account, and application connection gets only the permissions it actually needs, never broad admin access "just in case." Practically, this means **role-based access control** — defining roles like `read_only_analyst` or `app_write_service` with a specific, minimal permission set, then assigning users and services to roles rather than granting permissions individually per account, which becomes unauditable at scale. A DBA should be able to answer "who can write to the payments table" with a short list of roles, not a long list of individual accounts granted ad hoc over the years.

**Encryption** has two distinct contexts that get tested separately: **encryption at rest** protects data sitting on disk — if someone steals a physical drive or an unencrypted backup file, the data is unreadable without the key — and is usually handled at the storage/filesystem or database-engine level (transparent data encryption, full-disk encryption). **Encryption in transit** protects data moving across the network — between application and database, or between primary and replica — typically via TLS, preventing anyone with network access from reading queries or results as they pass by. Both matter and solve different threats; a system encrypted at rest but sending plaintext queries over an internal network is still exposed to anyone who can sniff that traffic.

**Auditing** means maintaining a record of who accessed or changed what, and when — essential for compliance (many regulated industries require it outright) and for incident forensics, since "we don't know who changed that row" is a much worse answer during an investigation than having a clean audit log to point to. Most engines support native audit logging (PostgreSQL's `pgaudit` extension, SQL Server's built-in Audit feature) that can log at varying granularity — every query, or just schema changes and privilege grants — and the DBA-relevant trade-off to mention is that fine-grained auditing has a real performance and storage cost, so the right granularity depends on what you're actually trying to detect or prove.

A realistic follow-up in this section: "a row in the payments table changed and nobody on the team admits to it — walk me through how you'd find out what happened." The answer that actually demonstrates DBA judgment starts with the audit log (who, what query, what timestamp), cross-references it against the transaction log if the audit log's granularity is too coarse to show the exact statement, and checks whether the change came from a human session or a service account — because "an automated job did it" and "someone ran an ad hoc UPDATE through a shared admin login" are very different incidents with very different fixes. This is also usually the moment a DBA brings up why shared admin logins are a bad practice in the first place: if every engineer connects as the same generic `admin` user, your audit log can tell you *that* a change happened but not *who* made it, which defeats half the point of auditing before the investigation even starts.

<div class="verdict"><strong>The core truth:</strong> DBA interviews reward people who reason from evidence under pressure — reading an execution plan instead of guessing at indexes, naming the actual backup chain instead of saying "restore from backup," naming split-brain risk before being asked. The job is "don't lose the data, keep it fast, keep it available," and the interview is just that job compressed into 45 minutes with a hypothetical 3am incident attached.</div>

## How candidates actually prepare for DBA interviews — and where each method falls short

Most candidates preparing for a **database administrator interview** lean on some combination of four methods, and each one trains a real but partial slice of what the actual interview tests.

**Generic "DBA interview questions" PDF dumps and GeeksforGeeks-style lists.** These are genuinely useful for knowing *what topics* show up — backups, indexing, replication, the usual suspects — and several of the questions in this guide will look familiar if you've browsed one. The gap is that a question list gives you the prompt and often a clean model answer, but real DBA interviews change the scenario mid-question specifically to see if you understand the mechanism or just memorized a sentence. "Explain point-in-time recovery" from a dump becomes, live, "okay, now walk me through the actual restore steps to reach 2:47pm given this backup schedule" — and a memorized definition with no mechanism underneath it stalls exactly there.

**A friend's WhatsApp notes or a senior colleague's "questions I got asked" list.** Calibration value is real — knowing that someone's onsite included a live EXPLAIN-plan exercise, or that a particular company leans hard on replication questions, helps you mentally prepare for the *shape* of the round. The honest limitation is that it's one data point from one interviewer on one team in one cycle, and treating it as a syllabus rather than orientation means you might walk in deep on sharding and shallow on the deadlock question that actually gets asked.

**Generic ChatGPT mock-interview prompting.** Typing "ask me DBA interview questions" into a chat window and typing answers back is better than nothing, and it drills factual recall well — backup type definitions, RPO/RTO terminology, the vocabulary of replication. What it doesn't replicate is the specific discomfort of being asked to *narrate a recovery plan out loud, live*, to someone who can interrupt you mid-sentence and change the scenario ("actually, the incremental backup from Wednesday is also corrupted — now what"). It's a text-based, turn-taking exercise with no real-time pressure, and the follow-ups are only as sharp as the prompt you wrote — which makes it easy to accidentally prompt yourself into an easier interview than the one you'll actually get.

**LeetCode-style SQL query drills.** Genuinely useful for the SQL-writing parts of a DBA round (window functions, joins, query rewriting) but it trains a narrower skill than the job — writing a correct query silently in an editor is a different exercise from explaining, out loud, why you'd choose a differential backup over an incremental for a specific RPO target, or defending an indexing decision when the interviewer pushes back with "but doesn't that slow down writes on that table?"

The honest throughline: DBA interviews specifically grade **reasoning out loud under a changed scenario** — the interviewer telling you "now the lead DBA is on vacation, the runbook is six months stale, and the lead's deputy just told you the standby replica hasn't been tested in three months — what now?" is closer to the real job than any flashcard. That's the specific gap [Greenroom](/)'s spoken mock-interview format is built to close: you talk through a backup/recovery plan or a slow-query diagnosis out loud, the AI interviewer changes the scenario mid-answer the way a real DBA round does (a corrupted incremental, a replica that's further behind than you assumed), and you get feedback on the clarity of your reasoning, not just whether your final answer matched a script. It's not a replacement for actually knowing the material — you still need to understand transaction logs and execution plans cold — but it's the one prep method that rehearses the verbal, interrupted, "the scenario just changed" format the real interview actually is.

## Practise explaining the recovery plan, not just knowing it

You can recite every backup type and replication mode in this guide from memory and still freeze the moment an interviewer asks you to narrate the restore sequence out loud while changing a detail halfway through — that's the actual interview, not a formality wrapped around it. [Greenroom](/) runs spoken technical mock interviews for DBA and backend-adjacent roles, asks realistic follow-ups that change the scenario the way a real on-call incident would, and gives feedback on how clearly you reasoned through the decision, not just whether the final answer was technically correct. Pair it with our [DBMS interview questions guide](/blog/dbms-interview-questions) for the underlying database fundamentals, our [SQL interview questions guide](/blog/sql-interview-questions) for the query-writing half of most DBA loops, our [system design interview guide](/blog/system-design-interviews-what-they-test) for how database choices show up in larger architecture questions, and [coding interview communication tips](/blog/coding-interview-communication-tips) for general advice on narrating technical reasoning out loud under pressure.

## Frequently asked questions

### What questions are asked in a database administrator interview?

Database administrator interview questions cover backups and recovery (full, incremental and differential backups, point-in-time recovery, RPO and RTO, how to test backups), performance tuning (diagnosing slow queries with execution plans, indexing strategy, clustered versus non-clustered indexes, deadlocks), availability and scale (replication, sharding, partitioning, high availability and failover), and security (access control, encryption at rest and in transit, auditing) — almost always wrapped in a scenario the interviewer changes partway through to see if you understand the underlying mechanism rather than a memorized answer.

### What is the difference between full, incremental and differential backups?

A full backup copies the entire database. An incremental backup copies only the data changed since the last backup of any type, making it fast and small but requiring the entire chain of backups since the last full to restore correctly. A differential backup copies all data changed since the last full backup specifically, growing larger over the week but needing only the full backup plus the latest differential to restore — no chain to walk, less risk of one broken link breaking the whole restore.

### How does point-in-time recovery actually work?

Point-in-time recovery works because every database change is written to a transaction log (the write-ahead log in PostgreSQL, the redo log in MySQL, the transaction log in SQL Server) before it's applied to the data files. To restore to a specific moment, you restore the most recent backup before that moment, then replay the transaction log forward from there, stopping the replay at the exact target timestamp — which is why "restore to 2:47pm yesterday" is a real, specific operation and not just restoring the nearest backup.

### How do you fix a slow query as a database administrator?

Start by reading the query's execution plan with EXPLAIN or EXPLAIN ANALYZE to see whether it's doing a sequential table scan instead of using an index, and check whether the estimated row counts match the actual ones. Add or adjust an index that matches the query's filter and join columns, re-run the plan to confirm the scan type changed from a sequential scan to an index scan, and verify the actual execution time improved. The goal is reasoning from the plan's evidence, not guessing at which index might help.

### What's the difference between sharding and partitioning?

Partitioning splits a single table's data into segments within the same database instance — it helps query performance through partition pruning and makes maintenance like dropping old data fast, but one server still holds all the data. Sharding splits data across multiple separate database servers, each owning a distinct subset, and is what you reach for when a single machine's storage or throughput is the real constraint. Partitioning is simpler and should usually be tried first; sharding adds real complexity (cross-shard joins, distributed transactions, rebalancing) and is worth it only once partitioning genuinely can't scale further.

### How should I prepare for a DBA interview?

Make sure you can explain, with a worked example, the backup chain and restore sequence for your chosen schedule, the difference between RPO and RTO and how each shapes strategy, how to read an execution plan to diagnose a slow query, when an index helps versus hurts, the mechanics of replication lag and split-brain failover risk, and the access-control/encryption/auditing basics of database security. Then practise saying all of that out loud under a changed scenario — a corrupted backup, a replica further behind than expected — since DBA interviews are graded on live reasoning, not on a written answer with no time pressure.

DBA interviews test whether you can reason through a recovery plan out loud, under a scenario that changes mid-answer. Greenroom runs spoken technical mock interviews that follow up the way a real on-call incident would. Free to start.