---
title: Backend Developer Interview Questions (2026): APIs, Databases & System Design
description: Backend developer interview questions for 2026 — REST API design, databases and transactions, caching, auth, scaling, and a worked system design example, with real answers.
url: https://usegreenroom.app/blog/backend-developer-interview-questions
last_updated: 2026-06-20
---

# Backend developer interview questions

June 20, 2026 · 28 min read

![Backend developer interview questions — cover from Greenroom, the AI mock interviewer](/assets/blog/backend-developer-interview-questions-hero.webp)

You're forty minutes into a backend round at a Series B startup, and it's going fine — better than fine, you nailed the REST design question — right up until the interviewer leans back and says, "okay, now your service gets 50x traffic in the next ten minutes, no warning. Walk me through what breaks first." Your mouth says "well, the database connection pool—" while your brain is frantically trying to remember if you've ever actually seen a connection pool exhaust in real life or if you just read about it once in a Hacker News comment at 1am. You did read about it at 1am. You are now explaining a thing you read about at 1am to a person whose job depends on knowing you're not just explaining a thing you read about at 1am.

This is the entire emotional arc of a backend developer interview in one scene, and it's also exactly why these rounds are structured the way they are. Backend interviews go deeper than "can you write an endpoint." They test whether you understand the layers *underneath* the API — how data is stored and queried, what happens under load, how systems fail, and the trade-offs behind every design choice — and they test it through follow-ups, specifically because follow-ups are the only reliable way to tell "read about it once" from "has actually debugged it." This guide covers the **backend developer interview questions** that actually get asked in 2026 — REST API design, databases and transactions, caching, auth, scaling, and a worked system design example — with a real answer for each one, not just a topic list.

## REST API design

### What makes an API actually RESTful, beyond "it uses HTTP"?

REST means treating data as **resources** addressed by URLs, manipulated through a small, consistent set of HTTP verbs, with each request carrying everything the server needs to handle it — no server-side session state between calls. In practice, interviewers are checking three things: do you name resources as nouns (`/users/42/orders`, not `/getUserOrders?id=42`), do you use the right verb for the right operation, and do you understand that statelessness is what lets you scale horizontally — any server in the pool can handle any request, because none of them are holding session state in memory. A surprising number of "RESTful" APIs in production aren't, and interviewers use this question to see if you'd notice. See our <a href="/blog/rest-api-interview-questions">full REST API guide</a> for more.

### How should you name and structure resource URLs?

Use plural nouns for collections (`/users`, not `/user`), nest resources to express ownership (`/users/42/orders/7`), and keep verbs out of the URL — the HTTP method already is the verb, so `/users/42/delete` is a smell; `DELETE /users/42` is correct. Filtering, sorting, and pagination belong in query parameters (`/orders?status=paid&sort=-created_at&page=2`), not new endpoints. The interview signal here is whether you've actually designed an API that other teams consumed, because these conventions only become obvious once someone else's frontend breaks from an inconsistent one.

### Explain HTTP status codes — which ones do you actually need to get right?

`200 OK` for a successful read, `201 Created` for a successful resource creation (often with a `Location` header pointing at the new resource), `204 No Content` for a successful action with no body (like a `DELETE`). On the client-error side, `400` means the request itself is malformed, `401` means "we don't know who you are" (missing/invalid auth), `403` means "we know who you are and you're not allowed," and `404` means the resource doesn't exist. `409 Conflict` covers things like a duplicate unique key or a version mismatch on update. The most common mistake candidates make: returning `200` with an `{ "error": ... }` body, which breaks every HTTP-aware client (caches, monitoring, retries) that relies on the status code to know what happened.

### What does idempotency mean, and which HTTP methods are idempotent?

An idempotent operation produces the same end state no matter how many times you repeat it with the same input. `GET`, `PUT`, and `DELETE` are idempotent by the HTTP spec — calling `DELETE /orders/7` five times in a row leaves the same end state as calling it once (the order is gone, errors on retries are fine). `POST` is *not* idempotent — calling it twice typically creates two resources, which is exactly the bug that happens when a client retries a flaky network request and double-charges a customer or double-submits a form. The fix in real systems is an **idempotency key**: the client generates a unique token per logical operation, sends it in a header, and the server stores it long enough to recognize a retry and return the original result instead of redoing the work. Payment APIs (Stripe, Razorpay) are built around this pattern specifically because retries are guaranteed to happen at scale.
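
The idempotency-key pattern above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory store and a hypothetical `PaymentService.charge` method; it is not Stripe's or Razorpay's actual API. A real service would persist keys with an expiry and guard the check-then-write step with a lock or a unique constraint.

```python
import uuid


class PaymentService:
    """Toy service (hypothetical): remembers results by idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> response already returned
        self.charges = []   # side effects that were actually performed

    def charge(self, idempotency_key, customer_id, amount_cents):
        # A retry carrying a key we've already seen gets the original result
        # back instead of triggering a second charge.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        charge = {
            "charge_id": str(uuid.uuid4()),
            "customer_id": customer_id,
            "amount_cents": amount_cents,
        }
        self.charges.append(charge)
        self._results[idempotency_key] = charge
        return charge


service = PaymentService()
key = str(uuid.uuid4())  # generated once per logical operation, by the client

first = service.charge(key, customer_id=42, amount_cents=1999)
retry = service.charge(key, customer_id=42, amount_cents=1999)  # simulated network retry
```

Here `retry` is the same response as `first`, and only one entry lands in `service.charges`, even though the "client" sent the request twice.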

### How do you version an API, and why does it matter?

You version because breaking changes are inevitable once other teams or external clients depend on your API, and you can't force everyone to update in lockstep. The three common approaches: URL versioning (`/v1/users`, `/v2/users`; simple and visible, but it clutters the URL), header versioning (`Accept: application/vnd.myapi.v2+json`; cleaner URLs, but harder to test by just pasting a link in a browser), and query-param versioning (rare, mostly seen in legacy systems). Most teams default to URL versioning for its simplicity and debuggability. The deeper point interviewers want: a version bump is a contract change, so you should be able to explain what counts as "breaking" and needs a new version (removing a field, changing a field's type) versus what is "safe" and can ship without bumping the version at all (adding an optional field).

### Pagination — offset vs cursor-based, and when does each break?

**Offset-based** pagination (`?page=3&limit=20`, or `?offset=40&limit=20`) is simple to implement and lets users jump to an arbitrary page, but it breaks under concurrent writes: if a row is inserted or deleted while a user is paging through results, items shift, causing duplicates or skipped rows on the next page. On large tables, `OFFSET` also gets progressively slower, because the database still has to scan and discard all the skipped rows. **Cursor-based** pagination (`?after=eyJpZCI6NDJ9`, an opaque token usually encoding the last seen ID or timestamp) stays stable under concurrent writes and stays fast at any depth, because the query becomes `WHERE id > :cursor ORDER BY id LIMIT 20` with an index doing the work. The explicit `ORDER BY` matters: without it the database is free to return rows in any order, and the cursor stops meaning anything. The trade-off is that you lose the ability to jump to "page 7" directly. Interviewers ask this because it reveals whether you've actually paginated a table with real concurrent traffic, not just a static demo dataset.
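
A small sketch of cursor-based pagination, assuming an in-memory SQLite table named `orders` with positive integer IDs and a hypothetical `fetch_page` helper. The cursor is just base64url-encoded JSON holding the last seen ID, which is what the example token `eyJpZCI6NDJ9` decodes to (`{"id":42}`).

```python
import base64
import json
import sqlite3


def encode_cursor(last_id):
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"]


def fetch_page(conn, after=None, limit=20):
    # No cursor means "start from the beginning" (IDs are assumed positive).
    last_id = decode_cursor(after) if after else 0
    rows = conn.execute(
        "SELECT id, status FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    ).fetchall()
    # Hand back a cursor only if this page was full; a short page is the last one.
    next_cursor = encode_cursor(rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(i, "paid") for i in range(1, 46)],  # 45 rows
)

page1, cursor1 = fetch_page(conn)                 # ids 1..20
page2, cursor2 = fetch_page(conn, after=cursor1)  # ids 21..40
page3, cursor3 = fetch_page(conn, after=cursor2)  # ids 41..45, cursor3 is None
```

Because each page is defined by "rows after the last ID I saw" rather than "skip N rows", an insert or delete earlier in the table doesn't shift what the next page returns.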

### REST vs GraphQL vs gRPC — when would you actually choose each?

**REST** is the default for public APIs and most CRUD-shaped services — simple, cacheable by URL, universally understood. **GraphQL** earns its complexity when clients (especially mobile, with bandwidth constraints) need to fetch deeply nested, variably-shaped data in one round trip instead of chaining multiple REST calls or over-fetching fixed payloads — the cost is a harder caching story and the N+1 query problem showing up at the resolver layer if you're not careful. **gRPC** fits internal service-to-service communication where you control both ends, want strongly-typed contracts (via protobuf), and need lower latency/smaller payloads than JSON-over-HTTP — it's a poor fit for public-facing APIs because browsers can't call it directly without a proxy layer. The answer interviewers want is a decision framework tied to who's calling the API and what they need, not a claim that one is universally "better."

## Databases and transactions

### Explain ACID — what does each property actually guarantee?

**Atomicity**: a transaction's operations either all succeed or all roll back together — there's no partial state where some writes happened and others didn't. **Consistency**: a transaction can only move the database from one valid state to another, respecting constraints (foreign keys, unique indexes, check constraints) — it doesn't mean "the data is correct," it means "the rules you defined are enforced." **Isolation**: concurrent transactions don't see each other's uncommitted intermediate state — how strictly depends on the isolation level (below). **Durability**: once a transaction commits, it survives a crash — typically via a write-ahead log flushed to disk before the commit is acknowledged. Interviewers ask this because "I know what ACID stands for" and "I can explain why a money-transfer operation needs atomicity" are very different levels of understanding, and only the second one matters in practice.
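
To make atomicity and consistency concrete, here is a minimal money-transfer sketch using SQLite from the Python standard library. The `accounts` table, the `CHECK (balance >= 0)` rule, and the `transfer` helper are all assumptions made for illustration. The credit is applied first on purpose, so the failure happens after one write has already gone through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint when src can't cover the amount.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()


overdraft_ok = transfer(conn, 1, 2, 500)  # False: the debit breaks the CHECK rule
after_overdraft = balances(conn)          # [(1, 100), (2, 50)]: the credit was rolled back
normal_ok = transfer(conn, 1, 2, 30)      # True
after_normal = balances(conn)             # [(1, 70), (2, 80)]
```

The constraint is the "consistency" half (the rule you defined gets enforced), and the rollback of the already-applied credit is the "atomicity" half.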

### What are isolation levels, and what's a phantom read vs a dirty read?

Isolation levels trade correctness for concurrency performance, weakest to strongest: **Read Uncommitted** allows **dirty reads** — you can see another transaction's uncommitted changes, which it might later roll back, so you've now acted on data that never actually existed. **Read Committed** (Postgres's default) fixes dirty reads but allows **non-repeatable reads** — re-running the same query twice in one transaction can return different rows if another transaction committed in between. **Repeatable Read** fixes that but can still allow **phantom reads** — a range query can return a different *set* of rows on a second run if rows were inserted that match the filter. **Serializable** is the strongest — transactions behave as if run one at a time — at the cost of more lock contention and aborted transactions under high concurrency. The practical interview answer: most apps run fine on Read Committed; reach for Serializable only for operations where the cost of a subtle concurrency bug (double-booking a seat, double-spending a balance) outweighs the throughput cost.

### What's the N+1 query problem, and how do you fix it?

It happens when you fetch a list of N records, then loop over them issuing a separate query for each one's related data, so a job that needs one or two queries ends up running N+1. A classic case: fetching 50 blog posts, then looping to fetch each post's author with a separate `SELECT * FROM users WHERE id = ?`, which is 51 round trips to the database for data you could've gotten in one or two. The fix is to **eager-load** the relation up front, either with a `JOIN` or a single batched `WHERE id IN (...)` query for all the author IDs at once (most ORMs expose this as `.include()`, `.with()`, or `select_related()`). This bug is invisible in local development with 10 rows of seed data and devastating in production with 10,000, which is exactly why interviewers ask it: it tests whether you think about query *count*, not just query correctness.
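
The difference in query count is easy to see with a counter. This sketch uses plain SQL against in-memory SQLite rather than any particular ORM; the `CountingDB` wrapper and both loader functions are hypothetical names introduced for the example.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts how many queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 6)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, f"post{i}", (i % 5) + 1) for i in range(1, 51)])


def authors_n_plus_one(db):
    posts = db.run("SELECT id, author_id FROM posts")
    # One extra query per post: this loop is the N in N+1.
    return {
        post_id: db.run("SELECT name FROM users WHERE id = ?", (author_id,))[0][0]
        for post_id, author_id in posts
    }


def authors_batched(db):
    posts = db.run("SELECT id, author_id FROM posts")
    author_ids = sorted({author_id for _, author_id in posts})
    placeholders = ",".join("?" * len(author_ids))
    # One query for every author we need, however many posts there are.
    names = dict(db.run(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", author_ids
    ))
    return {post_id: names[author_id] for post_id, author_id in posts}


slow_db = CountingDB(conn)
slow_result = authors_n_plus_one(slow_db)  # slow_db.queries == 51

fast_db = CountingDB(conn)
fast_result = authors_batched(fast_db)     # fast_db.queries == 2
```

Both functions return the same mapping; only the number of round trips differs.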

### What does an index actually do, and what does it cost you?

An index is a separate, sorted data structure (almost always a B-tree) that lets the database find rows matching a condition without scanning the whole table — turning an O(n) table scan into an O(log n) lookup for queries on the indexed column. The cost is on writes: every `INSERT`, `UPDATE`, or `DELETE` has to update every index on that table too, so a table with eight indexes is meaningfully slower to write to than one with two. This is why you index columns you frequently filter, join, or sort on (foreign keys, things in `WHERE` clauses) and avoid indexing columns you rarely query by or that change constantly with low query value. The follow-up interviewers like: "how would you find out if a query is actually using your index?" — `EXPLAIN ANALYZE` in Postgres/MySQL, and the answer should mention looking for a sequential scan where you expected an index scan.
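
As a stand-in for `EXPLAIN ANALYZE`, the sketch below uses SQLite's `EXPLAIN QUERY PLAN`, which ships with Python's standard library, to show the same before-and-after check. The `users` table and `idx_users_email` index are made up for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"u{i}@example.com", f"user{i}") for i in range(1000)],
)


def plan(sql, params=()):
    """Return the human-readable plan; the detail text is the last column."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


query = "SELECT * FROM users WHERE email = ?"

before = plan(query, ("u500@example.com",))  # a full-table SCAN
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query, ("u500@example.com",))   # a SEARCH using idx_users_email
```

The habit carries over directly to Postgres or MySQL: run the plan, and if you see a sequential scan where you expected an index scan, the index is missing, unusable for that predicate, or being skipped by the planner.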

### When do you denormalize, and what's the trade-off?

Normalization (splitting data into related tables to eliminate redundancy) is the right default — it keeps data consistent because each fact lives in exactly one place. You denormalize — duplicating some data across tables, or storing a precomputed aggregate — when read performance matters more than write simplicity and the duplicated data doesn't change often: a classic example is storing a `comment_count` column directly on a `posts` row instead of running `COUNT(*)` on the comments table every single page load. The trade-off is real: now you have two places that can disagree, and you need a reliable mechanism (a trigger, an application-level increment, or a periodic reconciliation job) to keep them in sync. Interviewers want to hear that you treat denormalization as a deliberate, measured trade-off — not a default, and not something to avoid on principle either.

### How do you scale a database past a single instance — replication vs sharding?

**Replication** keeps full copies of the same data on multiple machines — a primary handles writes, and one or more replicas serve reads, which scales read throughput and gives you failover if the primary dies (at the cost of replication lag: replicas are slightly behind, so a read right after a write might miss it — "read-your-own-writes" bugs come from here). **Sharding** splits the data itself across multiple machines by some key (user ID range, geographic region, a hash) so each shard holds only part of the dataset — this scales both reads *and* writes, but it's a much bigger architectural commitment: cross-shard queries and transactions become hard or impossible, and resharding later (because your shard key choice turned out wrong) is one of the most painful migrations in backend engineering. Replication is almost always the first lever; sharding is what you reach for once a single primary can't hold the write volume or dataset size at all.
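
One reason resharding hurts can be shown with a few lines of arithmetic. The sketch below assumes the simplest possible scheme, hash-then-modulo routing via a hypothetical `shard_for` function, and counts how many keys change shards when a fifth shard is added to four. Production systems often use consistent hashing or a lookup directory to soften this.

```python
import hashlib


def shard_for(user_id, num_shards):
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, so it can't be used for routing.
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


user_ids = range(10_000)

# How many users land on a different shard when we go from 4 shards to 5?
moved = sum(1 for uid in user_ids if shard_for(uid, 4) != shard_for(uid, 5))
moved_fraction = moved / len(user_ids)
```

With plain modulo routing, a key stays put only when its hash gives the same remainder for 4 and 5, which is about one key in five, so `moved_fraction` comes out near 0.8. In other words, adding one shard means physically moving most of the dataset.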

![Backend interview topics — APIs, databases, caching, concurrency, design](/assets/blog/pool-structured-screen.webp)

Backend rounds test the layers under the API — data, caching, concurrency, trade-offs.

## Caching

### Cache-aside vs write-through — what's the actual difference?

**Cache-aside** (lazy loading): the application checks the cache first; on a miss, it reads from the database and writes the result into the cache for next time. This is the simplest and most common pattern — the cache only ever holds data that's actually been requested, but the *first* request for any key is always a slow "cache miss" that hits the database. **Write-through**: every write goes through the cache, which immediately writes to the database too, so the cache is always warm and consistent with the database for anything that's been written — at the cost of every write now paying the latency of two systems instead of one, and the cache holding data nobody may have asked for yet. Most web apps default to cache-aside for reads and handle writes with explicit invalidation, reaching for write-through only when read-after-write consistency matters a lot for a specific hot path.
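
A minimal cache-aside sketch with explicit invalidation on write. Plain dicts stand in for both the database and the shared cache, and `CacheAsideRepo` is a hypothetical name; the point is only the order of operations on the read and write paths.

```python
class CacheAsideRepo:
    def __init__(self, db):
        self.db = db        # stand-in for the real database
        self.cache = {}     # stand-in for Redis or another cache
        self.db_reads = 0   # lets us see how often we fall through

    def get(self, key):
        # 1. Check the cache first.
        if key in self.cache:
            return self.cache[key]
        # 2. On a miss, read from the database...
        self.db_reads += 1
        value = self.db.get(key)
        # 3. ...and populate the cache for next time.
        if value is not None:
            self.cache[key] = value
        return value

    def update(self, key, value):
        # Write to the database, then drop the cached copy so the next
        # read repopulates it with fresh data.
        self.db[key] = value
        self.cache.pop(key, None)


repo = CacheAsideRepo({"user:1": "Ada"})
first_read = repo.get("user:1")    # miss: hits the database
second_read = repo.get("user:1")   # hit: served from the cache
reads_before_update = repo.db_reads

repo.update("user:1", "Grace")
third_read = repo.get("user:1")    # miss again, because update() invalidated it
```

Deleting the key on write, rather than overwriting it, keeps the write path simple and avoids caching a value that a concurrent writer is about to replace.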

### How do you handle a "thundering herd" when a popular cache key expires?

A thundering herd happens when a heavily-requested cache key expires and many concurrent requests all miss the cache at the same moment, all rush to the database to recompute the same value, and the database gets hit with a spike of identical, redundant queries it wasn't sized for — a real production incident pattern, not a theoretical one, especially on a popular product page or a viral piece of content. The standard fixes: a **lock or "single-flight" pattern**, where the first request to miss acquires a lock and recomputes while subsequent requests wait briefly for that result instead of all hitting the database independently; **probabilistic early expiration**, where you recompute a key slightly before it actually expires, staggered randomly across requests, so you never hit a hard cliff where every client misses at once; or simply **never letting popular keys expire at all**, refreshing them on a schedule in the background instead of relying on TTL-driven expiry. Interviewers ask this specifically because it's a bug that doesn't show up in a demo with ten users and absolutely will show up in production with ten thousand.
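
The single-flight idea can be sketched with a lock and one event per in-flight key. This is an in-process illustration only; the `SingleFlight` class and `recompute_product_page` function are assumptions for the example, and coordinating across multiple app servers would need a distributed lock instead.

```python
import threading
import time


class SingleFlight:
    """Only the first caller for a key runs compute(); the rest wait for it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> shared state for the computation in progress

    def do(self, key, compute):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "value": None, "error": None}
                self._inflight[key] = entry

        if leader:
            try:
                entry["value"] = compute()
            except Exception as exc:
                entry["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                entry["done"].set()  # wake everyone who was waiting
        else:
            entry["done"].wait()

        if entry["error"] is not None:
            raise entry["error"]
        return entry["value"]


db_hits = []


def recompute_product_page():
    db_hits.append(1)  # stand-in for the expensive database query
    time.sleep(0.3)
    return {"id": 42, "name": "widget"}


flight = SingleFlight()
results = []


def worker():
    results.append(flight.do("product:42", recompute_product_page))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight simulated requests get the same value back, but because they arrive while the first recompute is still running, the expensive function should run only once instead of eight times.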

### What's the actual hard part of caching — invalidation?

Deciding what to cache is easy; knowing when a cached value has gone stale and needs to be thrown out is the hard part — this is the literal meaning behind the old line "there are only two hard problems in computer science: cache invalidation and naming things." The three common strategies: **TTL-based** (the value just expires after N seconds — simple, but you're trading some staleness for simplicity), **explicit invalidation** (the write path actively deletes or updates the cache key the moment the underlying data changes — more correct, but you must remember to do it on *every* code path that writes that data, which is exactly where bugs creep in), and **versioned/keyed invalidation** (bake a version or timestamp into the cache key itself, so old keys simply become unreachable and naturally expire from the cache via its eviction policy rather than needing an explicit delete). Interviewers ask this because it's the single most common source of "the UI shows stale data" bugs in production systems.

### When do you reach for Redis specifically, versus an application-level cache?

An in-process application cache (a dict, an LRU cache library) is fastest — no network hop — but it's per-instance: with multiple app servers, each one has its own copy, so invalidating a key means invalidating it everywhere, and a cold-started instance starts with an empty cache. **Redis** (or another shared cache) solves that by being a single source of truth all your app instances share — invalidate once, every instance sees it — at the cost of a network round trip per cache access. Redis also gives you data structures beyond simple key-value (sorted sets for leaderboards, lists for queues, pub/sub for invalidation broadcasts, atomic increments for rate limiters) that a plain in-process cache doesn't. The practical pattern in larger systems is actually both layers: a small in-process cache for extremely hot, rarely-changing data, backed by Redis as the shared layer behind it.

### CDN caching vs application caching — different problems?

A **CDN** caches static or semi-static responses (images, JS/CSS bundles, sometimes whole HTML pages or API responses) at edge locations physically close to the user, cutting latency and offloading traffic from your origin servers entirely — it's the right tool for anything that's the same for every user or cacheable by URL with standard `Cache-Control` headers. **Application/Redis caching** is for data that's dynamic, per-user, or computed from a database query — things a CDN has no way to know how to invalidate correctly. The two work together: a CDN in front absorbs traffic for anonymous, cacheable requests, while your application cache speeds up the personalized, authenticated requests that actually reach your servers.

## Auth

### Session-based vs token-based auth — what's actually different?

**Session-based**: after login, the server creates a session record (in memory, a database, or Redis) and gives the client an opaque session ID in a cookie; every request, the server looks up that ID to find out who's calling. This is simple and lets the server instantly revoke a session by deleting the record — but it means the server has to maintain state, which complicates horizontal scaling (every server needs access to the same session store) and adds a database/cache lookup to every request. **Token-based** (typically JWT): the server issues a signed token containing the user's identity and claims; the client sends it on every request, and the server verifies the signature without needing to look anything up — fully stateless, easy to scale horizontally. The cost: you can't easily revoke a single token before it expires (the server isn't tracking which tokens are "still valid," only whether the signature checks out), so token-based auth usually pairs short-lived access tokens with a revocable refresh token to bound the blast radius of a leaked token.

### What's actually inside a JWT, and what are the common pitfalls?

A JWT is three base64url-encoded, dot-separated parts: a **header** (algorithm and token type), a **payload** (claims: user ID, roles, expiry, anything else you put in it), and a **signature** (proves the token wasn't tampered with, but does *not* encrypt the payload; anyone can base64url-decode and read a JWT's contents, they just can't forge a valid one without the signing key). The pitfalls interviewers probe for: storing sensitive data in the payload (it's readable, not secret), accepting the `alg: none` header or letting the client dictate the algorithm (a real historical vulnerability class, so always pin the expected algorithm server-side), not checking expiry, and, the most common real-world mistake, storing JWTs in `localStorage` where they're exposed to XSS, instead of an HttpOnly cookie. Knowing "JWTs are stateless and that's the whole point, but stateless means hard to revoke" is the one-sentence summary interviewers want to hear you arrive at.

### Explain the OAuth2 flow at a level you could defend under follow-ups.

OAuth2 lets a user grant a third-party app limited access to their data on another service, without ever handing that app their password. The flow most web apps use (Authorization Code flow): the app redirects the user to the provider (Google, GitHub) to log in and approve the requested scopes; the provider redirects back with a short-lived **authorization code**; the app's backend exchanges that code, plus its own client secret, for an **access token** (and usually a refresh token) by calling the provider's token endpoint directly, server-to-server. The code does pass through the browser in the redirect URL, but it is single-use, expires quickly, and is useless without the client secret, which never reaches the browser at all. The access token is what your app then uses to call the provider's API on the user's behalf. The interview follow-up to be ready for: "why is the code exchanged server-side instead of just returning the token directly to the browser?" Because a token in a redirect URL can leak via browser history, referrer headers, or logs, while the code is single-use and short-lived, and the secret needed to redeem it never leaves your server.

## Scaling

### Horizontal vs vertical scaling — what actually breaks each?

**Vertical scaling** means a bigger machine — more CPU, RAM, faster disk. It's simple (no architecture changes) but has a hard ceiling (there's a biggest machine you can buy) and a single point of failure (one machine, one outage). **Horizontal scaling** means more machines sharing the load — no practical ceiling, and a machine dying doesn't take the whole system down — but it requires your application to actually support running multiple instances: no in-memory session state tied to one server, a shared cache/database instead of local state, and a load balancer in front to distribute traffic. Most real systems vertically scale the database (because horizontal database scaling is hard — see sharding above) while horizontally scaling the stateless application tier, which is exactly why "statelessness" keeps coming up across REST, auth, and scaling questions — it's the property that makes horizontal scaling possible at all.

### What does a load balancer actually decide, and what can go wrong?

A load balancer sits in front of multiple application instances and distributes incoming requests across them by round robin, least-connections, or a hash of some request attribute. The failure mode interviewers want you to know: if your app keeps any per-server state (an in-memory cache, an in-memory session, a WebSocket connection tied to one specific instance), and the load balancer doesn't route a given user consistently to the same server ("sticky sessions"), then any request that lands on a different instance can't see that state, so the user appears logged out or loses data intermittently. Health checks matter too: a load balancer needs to detect and stop routing to an unhealthy instance quickly, or users keep hitting a server that's failing every request.
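
The difference between round-robin and hash-based ("sticky") routing fits in a short sketch. Both classes below are toy stand-ins with hypothetical names, not how any particular load balancer is implemented.

```python
import hashlib
import itertools


class RoundRobin:
    """Hands each request to the next server in turn, ignoring who is asking."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self, request_key):
        return next(self._cycle)


class HashSticky:
    """Routes by a hash of a request attribute, so one user maps to one server."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, request_key):
        digest = hashlib.sha256(request_key.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]


servers = ["app-1", "app-2", "app-3"]

rr = RoundRobin(servers)
rr_route = [rr.pick("user-42") for _ in range(4)]
# ['app-1', 'app-2', 'app-3', 'app-1']: the same user bounces between instances

sticky = HashSticky(servers)
sticky_route = {sticky.pick("user-42") for _ in range(4)}
# a single server: every request from user-42 lands in the same place
```

With round robin, anything `user-42` stored in `app-1`'s memory is missing on the next two requests. Sticky routing hides that problem, but the state is still lost if the instance dies, which is why moving it to a shared store is usually the sturdier fix.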

### When do you introduce a message queue, and what does it actually solve?

A queue (RabbitMQ, SQS, Kafka) decouples producing work from doing it: instead of a request handler doing slow work synchronously (sending an email, processing an image, generating a report) and making the user wait for all of it, the handler just enqueues a message and returns immediately, while a separate worker process consumes the queue and does the actual work. This solves three things at once: it makes the user-facing request fast, it smooths out traffic spikes (the queue absorbs a burst that would otherwise overload the database or a downstream API), and it adds resilience — if a worker crashes mid-task, the message can be retried instead of the work being silently lost. The trade-off interviewers want named: you've now introduced eventual consistency (the work isn't done yet when the request returns) and a new failure mode to design for — duplicate delivery — which is why most queue consumers are written to be idempotent.

### Read replicas vs sharding, revisited — how do you actually decide which one you need?

Reach for **read replicas** first, and almost always — they're a smaller architectural change, and most real systems are read-heavy, so replicating reads buys you a lot of headroom cheaply. Reach for **sharding** only when writes themselves are the bottleneck, or the dataset itself is too large for any single machine to hold — because sharding means giving up easy cross-entity transactions and joins, and picking a shard key you're stuck with unless you're willing to do a painful re-shard later. The interview signal: candidates who jump straight to "I'd shard it" for a problem that read replicas would have solved are over-engineering; the strongest answer walks through *why* replication isn't enough before reaching for sharding.

## Observability and reliability

### What's the difference between logging, metrics, and tracing — and why do you need all three?

**Logging** records discrete events with detail — "user 42 failed login, invalid password, at 14:32:01" — invaluable for understanding exactly what happened in one specific case, but expensive to query at scale and easy to drown in noise if you log everything at the same verbosity. **Metrics** are aggregated numbers over time — request rate, error rate, p99 latency — cheap to store and graph, great for dashboards and alerting, but they tell you *that* something's wrong, not *why*. **Tracing** follows a single request across every service it touches, showing exactly where time was spent and where it failed in a distributed system — essential once a request spans more than one service, because logs and metrics alone can't tell you which of five downstream calls in a request actually caused the slow response. The interview-grade answer ties them together: metrics tell you something's wrong, tracing tells you where, logs tell you the exact detail of why, once you've narrowed down to the specific request or service.

### How do you debug a backend service that's "slow" with no other symptoms?

Resist the urge to guess and start changing code — the actual first step is measuring, not fixing. Check whether it's CPU-bound, I/O-bound, or waiting on a downstream dependency: a profiler or APM tool (Datadog, New Relic, or open-source equivalents) will usually show this in minutes rather than hours of guessing. A classic real pattern: the app server's CPU and memory both look fine, but request latency is high — that's almost always a downstream dependency (a slow query missing an index, a third-party API having a bad day, a lock contention issue) rather than the app code itself. The interview signal: candidates who jump straight to "I'd add more servers" without first identifying *what's* actually slow are treating a measurement problem as a scaling problem, and those are usually unrelated fixes.

### What does graceful degradation mean, and where do you apply it?

Graceful degradation means a system keeps functioning, in a reduced but still useful way, when a dependency fails — instead of the whole request failing because one non-critical piece broke. A concrete example: an e-commerce product page that shows a "recommended products" section powered by a separate recommendation service — if that service times out or errors, the page should still render the product details and just omit recommendations, rather than the entire page throwing a 500. Implementing this usually means wrapping non-critical calls in a timeout and a fallback (cached or default data, or simply omitting the section) rather than letting one slow dependency take the whole response down with it — a pattern sometimes formalized with a circuit breaker that stops calling a failing dependency for a cooldown period instead of repeatedly waiting on a timeout that's going to fail anyway.
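
A compact circuit-breaker sketch for the product-page example. It assumes the wrapped call enforces its own timeout and raises on failure; `CircuitBreaker`, `render_product_page`, and the injected `clock` are hypothetical names chosen for the illustration.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (calls allowed)

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_seconds:
                return fallback()  # open: don't even try the dependency
            self.opened_at = None  # cooldown over: let one trial call through

        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()

        self.failures = 0  # a success closes the circuit fully
        return result


def render_product_page(product, fetch_recommendations, breaker):
    # Recommendations are non-critical: on failure, render without them.
    recommendations = breaker.call(fetch_recommendations, fallback=lambda: [])
    return {"product": product, "recommendations": recommendations}


def broken_recommendations():
    raise TimeoutError("recommendation service timed out")


breaker = CircuitBreaker(failure_threshold=2, cooldown_seconds=10.0)
page = render_product_page({"id": 7, "name": "kettle"}, broken_recommendations, breaker)
# page == {"product": {"id": 7, "name": "kettle"}, "recommendations": []}
```

The page still renders with the product details; once failures reach the threshold, the breaker stops calling the dependency for the cooldown period instead of waiting on a timeout for every request.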

## A worked system design example: rate limiter

Mid and senior backend rounds almost always include a design question — a URL shortener, a rate limiter, a notification service, an order system. Drive it the same way every time: clarify requirements, then work through data model, algorithm/logic, storage, and scaling, narrating trade-offs out loud rather than jumping straight to an answer. Here's the skeleton applied to "design a rate limiter":

1. **Clarify scope.** Per-user or per-IP? What's the limit (100 requests/minute)? Is slight inaccuracy under extreme load acceptable, or does it need to be exact?
2. **Pick an algorithm.** Fixed window (simple counter per time window, but allows bursting at window edges — 100 requests at 0:59 and another 100 at 1:00 is 200 in two seconds), sliding window (more accurate, more bookkeeping), or token bucket (a bucket refills at a steady rate and each request consumes a token — naturally smooths bursts and is the most common real-world choice). Name the trade-off, don't just pick one silently.
3. **Storage.** Redis is the standard choice: an atomic `INCR` on a per-window key, plus an `EXPIRE` to set that key's TTL, implements a basic fixed-window limiter in two commands, and Redis's single-threaded execution model avoids the race condition you'd hit doing read-then-write in application code across multiple servers.
4. **Where it runs.** At the API gateway/edge (rejects abusive traffic before it reaches your application at all — cheaper) versus inside the application (more context-aware, e.g. different limits per endpoint or per subscription tier). Real systems often do both — a coarse edge limit, a finer-grained application limit.
5. **What happens when the limit is hit.** Return `429 Too Many Requests` with a `Retry-After` header so well-behaved clients know when to retry, rather than a generic error they'll immediately retry into another rejection.
6. **Scaling the limiter itself.** If Redis is shared across many app instances, it becomes a single point of contention at extreme scale — mention that as a known limitation rather than pretending the simple design has no ceiling.
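
The token-bucket option from step 2 and the `429` behavior from step 5 can be sketched together. This is a single-process, in-memory illustration with hypothetical names (`TokenBucket`, `handle_request`); a shared limiter would keep the bucket state in Redis, and the injected `clock` exists only to make the example testable.

```python
import math
import time


class TokenBucket:
    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.updated_at = clock()

    def allow(self):
        """Return (allowed, seconds_until_next_token)."""
        now = self.clock()
        elapsed = now - self.updated_at
        # Refill lazily based on elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now

        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        return False, (1 - self.tokens) / self.refill_per_second


def handle_request(bucket):
    allowed, retry_after = bucket.allow()
    if allowed:
        return 200, {}
    # Tell well-behaved clients when it is worth retrying.
    return 429, {"Retry-After": str(math.ceil(retry_after))}


fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_second=1.0, clock=lambda: fake_now[0])

burst = [handle_request(bucket) for _ in range(3)]
# [(200, {}), (200, {}), (429, {'Retry-After': '1'})]

fake_now[0] = 1.0  # one second later, one token has refilled
after_wait = handle_request(bucket)
# (200, {})
```

The bucket size sets how large a burst you tolerate, and the refill rate sets the sustained limit, which is the trade-off worth naming out loud in the interview.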

The same skeleton — requirements, core algorithm/data model, storage choice, where logic lives, failure behavior, scaling limits — applies just as well to a URL shortener (ID generation strategy, redirect lookup, custom-alias collisions) or a notification service (fan-out strategy, retry/dedup, delivery guarantees). Our <a href="/blog/system-design-interview-guide-india">system design guide</a> covers the full framework in more depth.

<div class="verdict"><strong>The core truth:</strong> Backend interviews reward engineers who reason about <em>trade-offs out loud</em> — "I'd add a cache here, but that introduces staleness, so I'd invalidate on write for this specific path." There's rarely one right answer; there's the answer you can justify under follow-up questions.</div>

## How people actually prepare, and why most of it stops short

If you Google "backend developer interview questions" right now, you'll land on a handful of GeeksforGeeks-style pages, a LeetCode discuss thread, and maybe a Medium post titled "50 Backend Questions That Got Me Hired at FAANG" written by someone who interviewed once, three years ago. None of these are *wrong*, exactly — the questions repeat for a reason, they're the questions that actually come up — but reading "what's the N+1 query problem" and being able to recognize the correct answer when it's printed in front of you is a completely different skill from generating that explanation from scratch, out loud, while someone who runs a production database for a living asks "okay, but how would you actually detect that in our system?"

**LeetCode** is genuinely useful for the coding-round half of a backend interview, but it trains a narrower skill than the design-and-trade-off half this guide is mostly about — solving an isolated algorithm problem with a known input/output doesn't rehearse explaining why you picked Read Committed over Serializable for a specific feature, or defending a caching strategy against "what happens when two requests race." **A friend's WhatsApp-forwarded PDF** of "questions Razorpay asked me" is a reasonable signal of *that specific interviewer's* style, but treating it as gospel for your own interview (different company, different interviewer, different follow-ups) is how people walk in over-prepared for the wrong thing. **Asking ChatGPT** to explain ACID or write a rate limiter gets you a clean, readable answer in five seconds — and that's exactly the problem: reading a clean answer and producing one live, under time pressure, while fielding "what if traffic 10x's" is not the same skill, and the second one is the only one a real interview actually grades.

None of this means skip these resources — read the question lists once to know what's coming, use LeetCode for the algorithmic warm-up, ask ChatGPT to explain a concept you're shaky on. It means recognize what they don't do: they don't make you say it out loud, defend a choice you didn't fully think through, or recover gracefully when an interviewer disagrees with your first answer. That's the actual skill gap, and it's the one Greenroom's spoken mock interviews are built to close — not by giving you better answers to memorize, but by putting you in the uncomfortable, useful position of explaining your reasoning live and getting a follow-up you didn't expect, the same way a real backend round will.

## How to prepare

Backend rounds are conversational — the interviewer keeps asking "why" and "what if traffic 10x's?" You can't prepare for that by reading; you prepare by *defending your reasoning out loud* against follow-ups, the same way you'd defend a design in a real engineering discussion. Greenroom runs spoken technical interviews that probe your design and trade-off thinking and give feedback on how clearly you reason, not just whether your final answer matched a rubric. Pair it with our <a href="/blog/sql-interview-questions">SQL</a> and <a href="/blog/system-design-interviews-what-they-test">system design</a> guides.

## Frequently asked questions

### What topics do backend developer interviews cover in 2026?

Backend interviews cover REST API design (resource naming, status codes, idempotency, versioning, pagination), databases and transactions (ACID, isolation levels, indexing, the N+1 query problem, when to denormalize), caching (cache-aside vs write-through, invalidation, Redis, CDN vs application cache), auth (session vs token-based auth, JWT structure, OAuth2), scaling (horizontal vs vertical, load balancing, message queues, read replicas and sharding), and a system design round for mid and senior roles.

### What database questions are asked in backend interviews?

Common database questions include the ACID properties and what each one guarantees, isolation levels and the difference between dirty reads and phantom reads, the N+1 query problem and how to fix it with eager loading, what indexing does and what it costs on writes, when to denormalize data for read performance, and how to scale a database through read replicas versus sharding. Interviewers want you to reason about when each choice is appropriate, not just recite definitions.
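
If you want to see the N+1 problem rather than just name it, a small sketch like the one below helps. It assumes a made-up `authors`/`posts` schema in an in-memory SQLite database, and `CountingDB` is a hypothetical wrapper that counts queries. ORM eager loading typically ends up doing something similar to the second function: a JOIN or a single batched lookup instead of one query per parent row.

```python
import sqlite3


class CountingDB:
    """Wraps an in-memory SQLite connection and counts every query it runs."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.count = 0
        self.conn.executescript(
            """
            CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
            INSERT INTO authors VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Mei');
            INSERT INTO posts VALUES
                (1, 1, 'Indexes'), (2, 1, 'Locks'), (3, 2, 'Caches'), (4, 3, 'Queues');
            """
        )

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def titles_n_plus_one(db):
    """One query for the authors, then one extra query per author."""
    result = {}
    for author_id, name in db.query("SELECT id, name FROM authors ORDER BY id"):
        rows = db.query(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_joined(db):
    """A single JOIN fetches authors and their posts in one round trip."""
    result = {}
    rows = db.query(
        "SELECT a.name, p.title FROM authors a "
        "LEFT JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    )
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors with no posts still get an empty list
            result[name].append(title)
    return result
```

With three authors the first version issues four queries and the second issues one, and both return the same data. Counting queries per request like this is also one reasonable answer to "how would you detect it in our system?"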

### What's the difference between cache-aside and write-through caching?

Cache-aside checks the cache first and only writes to it after a cache miss pulls fresh data from the database, so the cache only ever holds data that's actually been requested. Write-through writes to the cache and database together on every write, keeping the cache always warm and consistent at the cost of slower writes. Most applications default to cache-aside for reads and handle writes with explicit invalidation.
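
The difference is easiest to see side by side. The sketch below is a toy under stated assumptions: plain dicts stand in for both the cache and the database, and `CacheAsideStore` and `WriteThroughStore` are hypothetical names rather than any library's API.

```python
class CacheAsideStore:
    """Reads populate the cache on a miss; writes go to the DB and invalidate."""

    def __init__(self, db):
        self.db = db          # dict standing in for the database
        self.cache = {}       # dict standing in for Redis or similar
        self.db_reads = 0     # how many times a read fell through to the database

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # only requested data ever lands in the cache
        return value

    def put(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)    # explicit invalidation on write


class WriteThroughStore(CacheAsideStore):
    """Same read path, but every write updates the DB and the cache together."""

    def put(self, key, value):
        self.db[key] = value
        self.cache[key] = value      # cache is warm for this key immediately
```

After a `put`, the cache-aside store pays one database read on the next `get`, while the write-through store serves it straight from the cache. The price is that every write now touches two systems.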

### Session-based or token-based auth — which should I know for interviews?

Both, and specifically the trade-off between them. Session-based auth keeps state on the server, which makes revocation instant but complicates horizontal scaling since every server needs access to the same session store. Token-based auth (typically JWT) is stateless and scales easily, but a token can't be revoked before it expires without extra infrastructure, which is why short-lived access tokens paired with a revocable refresh token are the common real-world pattern.
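
A short sketch makes the revocation trade-off visible. This is deliberately not a real JWT: it is a simplified HMAC-signed token built from the standard library, with a hypothetical `SECRET`, hypothetical function names, and the clock passed in as an integer. A real service would use a vetted JWT library instead.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret-change-me"  # hypothetical key, for illustration only


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).digest())


def issue_token(user_id: str, now: int, ttl_seconds: int = 900) -> str:
    """Create a signed token that expires `ttl_seconds` after `now`."""
    claims = {"sub": user_id, "exp": now + ttl_seconds}
    payload = _b64(json.dumps(claims).encode("utf-8"))
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str, now: int, revoked: frozenset = frozenset()) -> Optional[str]:
    """Return the user id if the token is valid, else None."""
    parts = token.split(".")
    if len(parts) != 2:
        return None
    payload, signature = parts
    try:
        expected = _sign(payload)
    except UnicodeEncodeError:
        return None
    # Constant-time comparison; no server-side session lookup is needed.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    claims = json.loads(_unb64(payload))
    if now >= claims["exp"]:
        return None
    # Early revocation is the part that needs extra state (a denylist here).
    if token in revoked:
        return None
    return claims["sub"]
```

Everything up to the expiry check is stateless, which is why this style scales so easily. The `revoked` set is the "extra infrastructure" from the answer above: the moment you need it, every server has to consult shared state again, so keeping access tokens short-lived limits how much that matters.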

### How do backend interviews test system design?

Mid and senior backend rounds include a design question like a URL shortener, rate limiter, notification service, or order system. Interviewers look for clear requirements gathering, a core algorithm or data model, a storage choice with reasoning, where logic should live, explicit failure behavior, and an honest discussion of scaling limits. There's rarely one right answer — they score the reasoning you can defend under follow-up questions.

### How should I prepare for a backend developer interview?

Study REST API design, databases and transactions, caching, auth, scaling, and system design, but the key is practising defending your reasoning out loud, because backend rounds are conversational and interviewers keep asking "why" and "what if traffic 10x's?" A voice-based mock interview that probes your trade-off thinking and gives feedback prepares you for that back-and-forth far better than silent reading.

### Are LeetCode and question dumps enough to prepare for a backend interview?

They help with part of it. LeetCode trains algorithmic problem-solving for the coding-round half of a backend interview, and question dumps build recognition of common topics like ACID and the N+1 query problem. Neither rehearses the conversational, trade-off-defending half of the round (explaining out loud why you chose Read Committed over Serializable for a specific feature, or holding up when an interviewer pushes back on your caching strategy), which is the half most backend rounds weight most heavily.

### What's the biggest gap between reading interview answers and passing a real backend interview?

The gap is verbal defense under follow-up questions. Reading a clear explanation of cache-aside versus write-through, or watching someone else solve a rate limiter, builds recognition — you can tell a correct answer is correct. Producing that same explanation from scratch, live, while an interviewer asks "what if two requests race right there," is a different and harder skill, and it's the one backend interviews are actually designed to test, since production systems fail in exactly those edge cases.
