---
title: AWS Interview Questions & Answers (2026): EC2, S3, IAM & VPC
description: The AWS interview questions that get asked in 2026 — EC2, S3, Lambda, IAM, VPC, storage classes and architecture scenarios — with real answers for each.
url: https://usegreenroom.app/blog/aws-interview-questions
last_updated: 2026-06-20
---



# AWS interview questions and answers

June 20, 2026 · 35 min read

![AWS interview questions and answers — cover from Greenroom, the AI mock interviewer](/assets/blog/aws-interview-questions-hero.webp)

Twelve minutes into a system design round at a Bangalore fintech, a candidate is asked to explain how S3 actually works. He leans back, the confidence of a man who has read exactly one Medium article on the topic, and says: "S3 is basically like Dropbox." The interviewer's pen stops moving. There's a pause — the kind of pause that has its own weather system. "Okay," she says slowly, "and what happens to your Dropbox folder if you upload the same file from two different laptops at the same time?" He does not have an answer for this. He has, in fact, never thought about this, because nobody thinks about this when they're using Dropbox to send their cousin wedding photos.

This is the recurring genre of AWS interview disaster: someone has used AWS — clicked some buttons in the console, deployed a side project, maybe even kept a t2.micro alive for eight months without remembering to turn it off — and mistakes "I have used this" for "I understand this." Another version of the same disaster: a candidate is asked to design a scalable web app on AWS and responds by reciting service names like alphabet soup — "so we'd use EC2, and S3, and also RDS, and Lambda, and CloudFront, and IAM for security, and—" at which point the interviewer, mercifully, interrupts to ask "okay, but how do these talk to each other, and what happens when traffic spikes 10x?" Silence. The alphabet soup had no broth.

And then there's the live-demo version, which is somehow worse because it has consequences in real time: a candidate pulls up their portfolio project running on a single EC2 instance to show off a feature, the interviewer (out of pure professional curiosity, not malice) opens dev tools and fires twenty rapid requests at the API just to see what happens, and the demo visibly chokes — because there was never an Auto Scaling group, never a load balancer, just one server quietly doing its best. The candidate's face in that moment deserves a museum exhibit.

None of these candidates were dumb. They'd just prepared for AWS interviews the way most people prepare for AWS interviews — by memorizing what each service *is*, not by being able to explain, defend, and adapt an *architecture* under live questioning. That gap is exactly what this guide is for. AWS is the dominant cloud, and its interviews test the core services (compute, storage, networking, identity) and how you architect with them for scale, reliability, and security — essential for cloud, DevOps, and backend roles. This guide covers the **AWS interview questions** that actually get asked — organized by area, with a real answer for each one and a note on what it's actually testing — plus a full worked system-design scenario, a section comparing how people typically (and inadequately) prep for this stuff, and an FAQ. (See also our [DevOps](/blog/devops-engineer-interview-questions) and [cloud engineer](/blog/cloud-engineer-interview-questions) guides.)

## Core compute services

### What is EC2, and how do you choose an instance type?

EC2 (Elastic Compute Cloud) gives you virtual servers — you pick an instance type, an OS image (AMI), and pay for compute by the second while it runs. Instance types are grouped by what they optimize for: general purpose (`t`/`m` families) balance compute, memory, and network for typical web apps; compute-optimized (`c` family) suit CPU-bound workloads like batch processing or gaming servers; memory-optimized (`r`/`x` family) suit in-memory caches and large databases; storage-optimized (`i`/`d` family) suit workloads needing high-IOPS local disk, like data warehouses. The interview signal isn't memorizing every family — it's being able to say "this workload is CPU-bound, so I'd reach for `c`-family over `m`-family" and explain why.

A detail that separates a memorized answer from a lived-in one: `t`-family instances (`t3`, `t4g`) are "burstable." They accumulate CPU credits while running below their baseline and spend them during spikes, which is great for a dev box or a low-traffic app but a genuine trap for a sustained, CPU-heavy production workload. In standard mode, once the credit balance hits zero the instance is throttled back to its baseline and performance cliffs hard; in unlimited mode (the launch default for `t3` and `t4g`) it keeps bursting but bills for the surplus credits, so the cliff shows up on the invoice instead. Candidates who've actually been burned by a `t3.micro` running out of CPU credits during a traffic spike tend to bring this up unprompted, and interviewers notice.
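To make the credit arithmetic concrete, here is a minimal sketch that estimates how long a burstable instance can hold a given load before its balance runs dry. The figures in the example (2 vCPUs, 12 credits earned per hour, a full balance of 288) are assumptions chosen to resemble a `t3.micro`; check the current EC2 documentation before quoting them in an interview.

```python
from typing import Optional


def minutes_until_credits_exhausted(
    credit_balance: float,
    credits_earned_per_hour: float,
    vcpus: int,
    utilization: float,
) -> Optional[float]:
    """Estimate how long a burstable instance can sustain a given load.

    One CPU credit is one vCPU running at 100% for one minute, so an
    instance at `utilization` (0.0 to 1.0) across `vcpus` vCPUs spends
    vcpus * utilization credits per minute while earning credits at a
    fixed hourly rate.
    """
    spent_per_minute = vcpus * utilization
    earned_per_minute = credits_earned_per_hour / 60
    net_drain = spent_per_minute - earned_per_minute
    if net_drain <= 0:
        return None  # at or below baseline: the balance never runs out
    return credit_balance / net_drain


# Assumed t3.micro-like figures: 2 vCPUs, 12 credits/hour, balance of 288.
sustained = minutes_until_credits_exhausted(288, 12, 2, utilization=1.0)
idle = minutes_until_credits_exhausted(288, 12, 2, utilization=0.05)
print(sustained, idle)
```

Under those assumed numbers, a fully loaded instance burns through a full balance in under three hours, which is why a sustained CPU-bound workload is a poor fit for this family.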

### When would you choose Lambda over EC2?

Lambda runs your code in response to an event — an API call, a file landing in S3, a queue message — without you provisioning or managing any server, and you pay only for the milliseconds it actually executes. Reach for Lambda when work is short-lived, event-driven, and spiky (a thumbnail generator that fires on S3 upload, a webhook handler, a cron-style scheduled job) because you get automatic scaling to zero and to thousands of concurrent invocations with no capacity planning. Reach for EC2 instead when you need long-running processes, full control over the OS and runtime, predictable steady-state load that makes paying for idle capacity cheaper than per-invocation pricing, or workloads that exceed Lambda's execution time limit (15 minutes) and memory ceiling (10GB). The honest interview answer acknowledges the tradeoff: Lambda removes ops burden but adds cold-start latency and a ceiling on execution time; EC2 gives full control but you own patching, scaling, and capacity planning.

### What actually causes a Lambda cold start, and how do you reduce one?

This is the natural senior follow-up to "when would you choose Lambda," and it's where a lot of candidates who've only read about serverless (rather than operated it) start improvising. A cold start happens when AWS has to provision a fresh execution environment for your function — download the code package, start the runtime, run any module-level initialization — before it can run your handler, because no warm environment was sitting around idle. Cold starts get worse with larger deployment packages, VPC-attached functions (which historically had to provision an ENI, though AWS has substantially improved this), and runtimes with heavier startup costs (a JVM-based Java function cold-starts noticeably slower than a Python or Node function).

The practical levers, in rough order of effectiveness: keep deployment packages small and dependencies trimmed (don't bundle a 200MB package when your handler uses a fraction of it); move expensive setup (DB connections, SDK clients) to module-level code outside the handler so it only runs once per warm container, not once per invocation; use **Provisioned Concurrency** to keep a set number of execution environments pre-initialized and ready, which directly trades cost for guaranteed low latency — the right call for a latency-sensitive API path, overkill for an internal batch job that tolerates an occasional slow invocation; and pick a lighter runtime if cold-start latency is genuinely on your critical path. A senior-grade answer also flags that cold starts are a *per-execution-environment* problem, not a per-request one — once a container is warm, every subsequent invocation it serves is fast, which is why cold starts hurt low-traffic or spiky functions far more than high-throughput ones that keep environments naturally warm.
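The "move expensive setup outside the handler" lever can be sketched as follows. `create_db_client` is a hypothetical stand-in for whatever slow initialization a real function does (opening a database connection, building an SDK client); the point is only where the call sits, not what it does.

```python
import time

INIT_COUNT = 0  # counts how often the expensive setup actually runs


def create_db_client() -> dict:
    """Hypothetical stand-in for expensive setup (DB connection, SDK client)."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"connected_at": time.time()}


# Module-level code runs once per execution environment (during the cold
# start); every warm invocation served by that environment reuses the result.
DB_CLIENT = create_db_client()


def handler(event: dict, context: object) -> dict:
    """Lambda-style entry point: reuses DB_CLIENT instead of rebuilding it."""
    return {"statusCode": 200, "client_created": DB_CLIENT["connected_at"]}


# Two invocations in the same (simulated) warm environment.
first = handler({}, None)
second = handler({}, None)
print(INIT_COUNT)
```

Had `create_db_client()` been called inside `handler`, the setup cost would be paid on every invocation rather than once per environment.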

### ECS vs Fargate vs EKS — how do you decide between them for containers?

All three run containers, but at different levels of abstraction. **ECS** is AWS's own container orchestrator — simpler than Kubernetes, tightly integrated with other AWS services, and a good default if you don't need Kubernetes-specific tooling or portability across clouds. **Fargate** is a serverless compute layer that can sit under ECS (or EKS) — you stop managing EC2 instances entirely; AWS runs your containers on infrastructure it manages, and you pay per task based on CPU/memory reserved. **EKS** is managed Kubernetes — choose it when your team already knows Kubernetes, you need its ecosystem (Helm charts, operators, multi-cloud portability), or you're migrating an existing Kubernetes workload into AWS. The decision framework interviewers want: ECS+Fargate for "we just want containers running with minimal ops," EKS for "we need Kubernetes specifically," and EC2-backed ECS/EKS only when you need to control the underlying instances (e.g., GPU workloads, specific instance types, or cost optimization via reserved/spot capacity).

### How do EC2, Lambda, ECS/Fargate, and EKS fit together as a decision framework?

A clean way to answer this in an interview is to walk the spectrum of control vs. operational burden. EC2 gives maximum control (you manage the OS, scaling, patching) for the most ops work. Lambda gives minimum control (AWS manages everything below your function code) for the least ops work, but only fits workloads under its time/memory limits. ECS and EKS sit in between — you define how containers run, but the orchestrator handles placement and restarts; layering Fargate underneath either removes the EC2-management piece entirely. A senior answer also flags that real systems mix these: a typical web product might run its API on ECS/Fargate, a background image-processing job on Lambda, and a single legacy stateful service on EC2 — the question is never "which one wins," it's "which one fits each workload."

## Storage

### S3 storage classes, and how do you pick one?

S3 stores objects (files) and exposes them over HTTP with 99.999999999% (11 nines) durability. Your data is stored redundantly across multiple facilities within a region, so durability is rarely the deciding factor; *access pattern and cost* are. **S3 Standard** suits frequently accessed data. **S3 Standard-IA** (Infrequent Access) and **One Zone-IA** cost less per GB but charge a retrieval fee, which makes them a good fit for backups you rarely read. The **S3 Glacier** classes (Instant Retrieval, Flexible Retrieval, Deep Archive) offer very low storage cost in exchange for higher retrieval cost and, for Flexible Retrieval and Deep Archive, retrieval times of minutes to hours (Instant Retrieval still returns objects in milliseconds), which is right for compliance archives you almost never touch. **S3 Intelligent-Tiering** automatically moves objects between tiers based on observed access patterns, which is the pragmatic default when you don't want to manually classify every object. The interview signal: don't just name the tiers. Explain that the real decision is "how often is this read, and how fast does a read need to come back," because that's what actually drives cost.

A worked example interviewers like: imagine a SaaS product storing user-uploaded PDFs that are read constantly in the first 30 days (onboarding documents getting reviewed) and almost never after. Naively, everything sits in S3 Standard forever, paying full price indefinitely. A **lifecycle policy** can automatically transition objects to Standard-IA after 30 days, then to Glacier Deep Archive after a year, without anyone manually moving files — and being able to describe that lifecycle transition out loud, with rough day counts, reads as someone who has actually managed a storage budget, not someone reciting a tier list.
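The lifecycle described above might be written down roughly like this. The dictionary follows the shape of the lifecycle configuration accepted by S3's `PutBucketLifecycleConfiguration` API; the rule ID and the `uploads/` prefix are invented for illustration, and `storage_class_at` is a hypothetical helper for reasoning about the rule, not part of any AWS SDK.

```python
import json

# Shape of an S3 lifecycle configuration. Rule ID and prefix are made up.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "age-out-onboarding-pdfs",
            "Status": "Enabled",
            "Filter": {"Prefix": "uploads/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Return the storage class an object would be in at a given age."""
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = LIFECYCLE_CONFIGURATION["Rules"][0]
print(json.dumps(LIFECYCLE_CONFIGURATION, indent=2))
print(storage_class_at(45, rule))
```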

### EBS vs EFS vs S3 — block, file, and object storage.

These solve different problems and interviewers want you to map use case to storage type, not just definitions. **EBS** is block storage: a virtual hard disk that normally attaches to one EC2 instance at a time (Multi-Attach on `io1`/`io2` volumes is the narrow exception), giving low-latency, high-IOPS access; use it for a database's data volume or a boot volume. **EFS** is a fully managed NFS file system that multiple EC2 instances can mount concurrently; use it when several instances need shared, simultaneous read/write access to the same files (shared application config, a content management system's upload directory). **S3** is object storage accessed over HTTP, not mounted as a filesystem; use it for static assets, backups, data lake storage, or anything accessed via API rather than a file path. A fast way to phrase the distinction out loud: "EBS is a disk for one server, EFS is a shared drive for many servers, S3 is an API for storing files at any scale."

### How does S3 achieve high durability and availability, and what's the tradeoff?

S3 Standard replicates each object synchronously across a minimum of three Availability Zones within a region before confirming the write — that's what gets you 11 nines of durability and 99.99% availability. The tradeoff most candidates miss: durability and availability are different guarantees. Durability means your data won't be *lost*; availability means it's *reachable right now*. A storage class like One Zone-IA sacrifices availability (and disaster resilience, since it lives in a single AZ) to cut cost, while still being durable within that zone. Being able to articulate that distinction — rather than treating "highly durable" and "highly available" as the same claim — is a strong signal in a storage-focused round.
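A quick way to feel the difference between the two guarantees is to turn each percentage into something tangible. The sketch below treats availability as a yearly downtime budget and durability as an expected annual object loss rate; both are simplified readings of the headline numbers, and the 10 million object count is just an example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes_per_year(availability_percent: float) -> float:
    """Availability: how long the data may be unreachable in a year."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


def expected_objects_lost_per_year(object_count: int, durability_nines: int) -> float:
    """Durability: expected number of objects lost outright in a year."""
    return object_count * 10 ** (-durability_nines)


downtime = allowed_downtime_minutes_per_year(99.99)
losses = expected_objects_lost_per_year(10_000_000, durability_nines=11)
print(round(downtime, 2), losses)
```

Four nines of availability still leaves room for close to an hour of unreachability per year, while eleven nines of durability on ten million objects works out to roughly one lost object every ten thousand years. They are answers to different questions.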

## Databases

### RDS vs DynamoDB — how do you choose?

**RDS** is managed relational database (Postgres, MySQL, MariaDB, SQL Server, Oracle) — choose it when your data has clear relational structure, you need complex joins and transactions (ACID), and your access patterns aren't fully known upfront. AWS handles patching, backups, and failover, but you still design a schema and write SQL. **DynamoDB** is a managed NoSQL key-value/document store built for single-digit-millisecond latency at any scale — choose it when access patterns are known and simple (fetch by key), you need to scale to very high throughput without managing read replicas yourself, or your data doesn't fit a relational model well. The classic interview follow-up is "what's the catch with DynamoDB" — the answer is that you must design your table around your access patterns *up front* (partition key, sort key, secondary indexes), because ad-hoc queries and joins that are trivial in SQL are awkward or impossible after the fact in DynamoDB.

### What's the difference between a DynamoDB partition key and a sort key, and why does hot partitioning matter?

The partition key determines which physical partition an item lives on — DynamoDB hashes it to distribute items across storage nodes. The sort key (optional) orders items that share the same partition key, enabling range queries within a partition (e.g., all orders for `customer_id=123`, sorted by `order_date`). "Hot partitioning" happens when too many requests hit the same partition key value — a celebrity user, a popular product ID — because that traffic can't be spread across nodes the way DynamoDB is designed to scale, causing throttling even though your overall table has capacity to spare. The fix interviewers want to hear: design partition keys with high cardinality and even access distribution, and consider write-sharding (appending a random suffix) for known hot keys.
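Write-sharding is easier to explain with a few lines in front of you. This is a minimal sketch, assuming a shard count of 10 and a `#` separator (both arbitrary choices): writers pick a random suffix, and readers have to fan out across every suffix and merge.

```python
import random
from typing import List

SHARD_COUNT = 10  # assumed; tune to how hot the key actually is


def sharded_partition_key(base_key: str, rng: random.Random) -> str:
    """Pick one of SHARD_COUNT suffixed keys so writes spread across partitions."""
    return f"{base_key}#{rng.randrange(SHARD_COUNT)}"


def all_shard_keys(base_key: str) -> List[str]:
    """Readers must query every shard key and merge the results."""
    return [f"{base_key}#{shard}" for shard in range(SHARD_COUNT)]


rng = random.Random(42)  # seeded only so the example is repeatable
writes = [sharded_partition_key("product-123", rng) for _ in range(1000)]
print(sorted(set(writes)))
```

The cost of the technique is on the read side: one logical lookup becomes `SHARD_COUNT` queries, so it is worth applying only to keys you know are hot.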

### What is a Global Secondary Index in DynamoDB, and when do you need one?

A base DynamoDB table only supports efficient lookups by its partition key (and sort key, if defined) — which is fine until product requirements introduce a second access pattern you didn't design for. A **Global Secondary Index (GSI)** lets you query the same data by a different partition/sort key combination, maintained as an asynchronously-updated copy of the relevant attributes. The classic example: a table keyed by `user_id` works great for "fetch this user's orders," but if a support team needs to look up orders by `order_status` across all users, you add a GSI with `order_status` as the partition key. The nuance interviewers probe for: GSIs have their own provisioned throughput and can throttle independently of the base table, and because they update asynchronously, a GSI read can briefly lag behind the most recent write — eventual consistency, not strong consistency, which matters if your support tool needs to reflect a write that happened a second ago.

### How does RDS Multi-AZ differ from a read replica, and when do you need each?

Candidates frequently conflate these, and the distinction is a reliable interview tell. **Multi-AZ** (in its classic single-standby form) maintains a synchronously replicated standby in a different Availability Zone purely for failover. You never query the standby directly; if the primary fails, AWS automatically promotes the standby and repoints the DNS endpoint, typically within a minute or two. It buys availability, not read capacity. A **read replica** is an asynchronously replicated copy you *can* query directly, specifically to offload read-heavy traffic from the primary (a reporting dashboard, a search feature). Because replication is asynchronous, a read replica can lag the primary by anywhere from milliseconds to seconds under load, so it's the wrong place to read data immediately after writing it if your application logic depends on reading its own write. Production systems commonly run both at once: Multi-AZ for failover protection, plus one or more read replicas (which can themselves be Multi-AZ) for read scaling. They solve different problems, and one doesn't substitute for the other.

## IAM and security

### What is IAM, and what's the difference between users, roles, and policies?

IAM (Identity and Access Management) controls who — or what — can do what in your AWS account. A **user** is a persistent identity, usually mapped to a real person or a long-lived application credential, with attached access keys. A **role** is a temporary identity that anything — a user, an EC2 instance, a Lambda function, another AWS account — can *assume* to get short-lived permissions without storing long-lived credentials. A **policy** is a JSON document attached to a user, group, or role that explicitly grants or denies specific actions on specific resources. The principle that ties them together, and the single most common interview talking point, is **least privilege**: grant only the permissions an identity actually needs, scoped as narrowly as possible, and prefer roles over long-lived user access keys wherever the workload allows it (e.g., an EC2 instance should assume an instance role, not store an IAM user's static keys).

A genuinely useful interview move, if asked to "show" an IAM policy rather than just describe one, is to be able to sketch the JSON shape from memory. A least-privilege policy granting a deploy role permission to upload and read objects in exactly one S3 bucket, and nothing else, looks roughly like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::my-app-deploys/*"
    },
    {
      "Effect": "Deny",
      "Action": "s3:DeleteObject",
      "Resource": "arn:aws:s3:::my-app-deploys/*"
    }
  ]
}
```

Walking through this out loud — "the `Resource` is scoped to one bucket's objects, not `*`; the actions are the minimum two needed for a deploy step; and I've added an explicit deny on delete because a deploy role has no business removing objects" — demonstrates least privilege as a habit rather than a slogan, which is exactly the gap between candidates who've read about IAM and candidates who've written IAM policies under a security review.
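If the interviewer follows up with "how does AWS decide whether a request is allowed," the core rule is short enough to sketch: an explicit deny always wins, an explicit allow is required, and everything else is implicitly denied. The evaluator below is a deliberately simplified model of that rule applied to the policy above. It ignores conditions, principals, resource-based policies, SCPs, and permissions boundaries, and it matches case-sensitively, which real IAM action matching does not.

```python
from fnmatch import fnmatchcase

POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": "arn:aws:s3:::my-app-deploys/*",
        },
        {
            "Effect": "Deny",
            "Action": "s3:DeleteObject",
            "Resource": "arn:aws:s3:::my-app-deploys/*",
        },
    ],
}


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def _matches(statement: dict, action: str, resource: str) -> bool:
    action_ok = any(fnmatchcase(action, a) for a in _as_list(statement["Action"]))
    resource_ok = any(fnmatchcase(resource, r) for r in _as_list(statement["Resource"]))
    return action_ok and resource_ok


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Explicit deny wins; otherwise an explicit allow is required."""
    matching = [s for s in policy["Statement"] if _matches(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in matching):
        return False
    return any(s["Effect"] == "Allow" for s in matching)


artifact = "arn:aws:s3:::my-app-deploys/build.zip"
print(is_allowed(POLICY, "s3:PutObject", artifact))
print(is_allowed(POLICY, "s3:DeleteObject", artifact))
```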

### Security groups vs network ACLs — what's the actual difference?

Both filter traffic, but at different layers and with different statefulness. **Security groups** attach to individual resources (an EC2 instance, an RDS instance) and are *stateful* — if you allow inbound traffic on a port, the corresponding outbound response is automatically allowed, you don't need a matching outbound rule. **Network ACLs** attach to a subnet and apply to everything in it, and are *stateless* — you must explicitly allow both the inbound request and the outbound response, since ACLs don't track connection state. Security groups only support "allow" rules; NACLs support both "allow" and explicit "deny" rules, which is why NACLs are the right tool for blocking a specific IP range at the subnet level (a security group literally cannot deny anything — it can only fail to allow). The interview-grade summary: security groups are your default, fine-grained, per-resource firewall; NACLs are a coarser, subnet-wide backstop, useful mainly for explicit denies.
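The allow-only versus allow-and-deny difference can be shown with a toy model. This sketch covers only the rule semantics (ordered NACL rules with first match winning, versus a security group's flat allow list); it does not model statefulness, protocols, or port ranges, and the IP ranges are documentation addresses picked for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Tuple


@dataclass(frozen=True)
class NaclRule:
    number: int
    cidr: str
    port: int
    action: str  # "allow" or "deny"


def nacl_decision(rules: List[NaclRule], source_ip: str, port: int) -> str:
    """NACL rules are checked in ascending number order; first match wins."""
    for rule in sorted(rules, key=lambda r: r.number):
        if ip_address(source_ip) in ip_network(rule.cidr) and port == rule.port:
            return rule.action
    return "deny"  # nothing matched: the implicit final deny applies


def security_group_decision(allowed: List[Tuple[str, int]], source_ip: str, port: int) -> str:
    """Security groups hold allow rules only; no match means not allowed."""
    for cidr, allowed_port in allowed:
        if ip_address(source_ip) in ip_network(cidr) and port == allowed_port:
            return "allow"
    return "deny"


nacl = [
    NaclRule(100, "203.0.113.0/24", 443, "deny"),  # block one bad range
    NaclRule(200, "0.0.0.0/0", 443, "allow"),      # allow everyone else
]
sg = [("0.0.0.0/0", 443)]

print(nacl_decision(nacl, "203.0.113.7", 443))
print(security_group_decision(sg, "203.0.113.7", 443))
```

The security group has no way to carve the bad range out of its `0.0.0.0/0` allow, which is exactly why the explicit deny lives in the NACL.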

### How would you design IAM permissions for a small engineering team, applying least privilege?

A strong answer walks through structure, not just principle. Create IAM roles per function (deploy-role, read-only-analyst-role, on-call-emergency-role) rather than per person, and have humans assume roles via federated SSO rather than holding individual IAM users with standing permissions. Scope policies to specific resources and actions: a deploy role for a single service shouldn't have `s3:*` across the whole account, it should have `s3:PutObject` on the one bucket it deploys to. Where IAM users still exist (legacy tooling, a service that can't assume a role), manage their permissions through IAM groups instead of attaching policies to each user individually. Require MFA for any human access, especially anything touching production, and rotate or eliminate long-lived access keys in favor of temporary credentials via roles. Finally, use a permissions boundary or AWS Organizations SCPs (Service Control Policies) to cap what even an admin-level role can do in sensitive accounts. That is defense in depth, not just one well-written policy.

## Networking and VPC

### What is a VPC, and what's the difference between a public and private subnet?

A VPC (Virtual Private Cloud) is your own logically isolated network within AWS — you control its IP range, subnets, route tables, and gateways. A **public subnet** has a route table entry pointing `0.0.0.0/0` to an **internet gateway**, so resources in it can reach (and be reached from) the public internet directly — typically where you'd put a load balancer or a bastion host. A **private subnet** has no route to an internet gateway, so resources there (an application server, a database) aren't directly internet-reachable, which is exactly the point: you don't want your database exposed to the open internet. The standard pattern interviewers expect you to describe: a load balancer sits in public subnets, application servers and databases sit in private subnets behind it, and the load balancer is the only thing actually exposed.
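"Public" and "private" are not properties you set on a subnet; they fall out of its route table. A minimal sketch of that check, representing a route table as a plain destination-to-target mapping (a simplification of the real API shape) with made-up resource IDs:

```python
from typing import Dict


def is_public_subnet(routes: Dict[str, str]) -> bool:
    """A subnet is public if its default route targets an internet gateway."""
    default_target = routes.get("0.0.0.0/0", "")
    return default_target.startswith("igw-")


# Route tables as destination -> target. IDs are invented for illustration.
public_routes = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-0abc1234"}
private_routes = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-0def5678"}
isolated_routes = {"10.0.0.0/16": "local"}

for name, routes in [
    ("public", public_routes),
    ("private", private_routes),
    ("isolated", isolated_routes),
]:
    print(name, is_public_subnet(routes))
```

The middle case is the one worth saying out loud: a subnet with a default route to a NAT gateway has outbound internet access and is still private.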

### What is a NAT gateway, and why does a private subnet still need internet access?

Resources in a private subnet have no route *in* from the internet, but they often need a route *out* — to download OS patches, call a third-party API, or reach an AWS service over the public endpoint. A **NAT gateway**, placed in a public subnet, lets private-subnet resources initiate outbound connections to the internet while remaining unreachable from it: the NAT gateway has a public IP and performs the translation, but nothing on the internet can open a connection *into* your private resources through it. This is the detail that separates a memorized definition from real understanding — an internet gateway is two-way (in and out), a NAT gateway is intentionally one-way (out only).

A practical follow-up worth pre-empting: NAT gateways are billed per hour *and* per GB processed, which surprises teams the first time a bill arrives after a private-subnet fleet does something data-heavy (like every instance independently pulling large container images through the NAT gateway instead of from a VPC endpoint). The fix — and a good thing to volunteer in an interview — is **VPC endpoints** (gateway endpoints for S3/DynamoDB, interface endpoints for most other services), which let private-subnet resources reach AWS services directly over AWS's internal network without traversing a NAT gateway at all, cutting both cost and latency for traffic that's AWS-to-AWS anyway.

### What is VPC peering, and when would you use it instead of Transit Gateway?

VPC peering connects two VPCs directly so resources in each can talk to each other using private IPs, as if they were on the same network. It's simple and has no extra cost beyond data transfer, but it isn't transitive (traffic can't hop through a third VPC, so there is no "hub and spoke") and it doesn't scale cleanly past a handful of VPCs because you need a dedicated peering connection between every pair. **Transit Gateway** acts as a central hub that many VPCs (and on-premises networks via VPN/Direct Connect) connect to once, simplifying routing as the number of VPCs grows. It's the right choice once you're past a small, static set of VPCs, especially across many accounts in an AWS Organization. The interview-grade answer: peering for a small, simple mesh of VPCs; Transit Gateway once that mesh would otherwise require dozens of point-to-point connections.
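The scaling argument is just counting. A full mesh of n VPCs needs n(n-1)/2 peering connections, while a hub needs one attachment per VPC. A small sketch of that arithmetic:

```python
def peering_connections_for_full_mesh(vpc_count: int) -> int:
    """Every pair of VPCs needs its own peering connection: n choose 2."""
    return vpc_count * (vpc_count - 1) // 2


def transit_gateway_attachments(vpc_count: int) -> int:
    """Each VPC attaches to the hub exactly once."""
    return vpc_count


for n in (3, 10, 30):
    print(n, peering_connections_for_full_mesh(n), transit_gateway_attachments(n))
```

Three VPCs need three peering connections, which is manageable; thirty need 435, each with its own route table entries to maintain.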

![AWS interview topics — EC2, S3, IAM, VPC, Lambda, load balancing](/assets/blog/pool-system-design.webp)

AWS rounds test core services and how you architect for scale and reliability.

## Scaling and reliability

### Application Load Balancer vs Network Load Balancer — when do you use each?

An **ALB** (Application Load Balancer) operates at layer 7 (HTTP/HTTPS) — it can route based on URL path or hostname, terminate TLS, and inspect request content, which makes it the right choice for typical web applications and microservices that need content-aware routing. An **NLB** (Network Load Balancer) operates at layer 4 (TCP/UDP) — it doesn't inspect content, but it handles extreme throughput and sudden traffic spikes with ultra-low latency and preserves the client's source IP, which matters for protocols where layer-7 inspection isn't needed or possible (raw TCP services, gaming backends, anything requiring a static IP per AZ). The quick framing: ALB when you need to route on what's *in* the request; NLB when you need raw speed and a static IP and don't need to look inside the packet.

### Regions vs Availability Zones — how do you design for high availability?

A **region** is a geographic area (e.g., `us-east-1`) containing multiple **Availability Zones**, each made up of one or more physically separate data centers with independent power, cooling, and networking, connected to the other AZs by low-latency links within the region. High availability within a region means deploying across at least two AZs, so a single data center failure doesn't take your application down: an ALB distributing traffic across instances in multiple AZs, an RDS Multi-AZ deployment with a synchronously replicated standby, an Auto Scaling group spanning AZs. High availability *across* regions (multi-region) is a much bigger step up in cost and complexity. It protects against a regional outage, but requires solving cross-region data replication and failover, and most companies don't need it until they're at a scale where regional outages are a real, costed risk. The interview signal: know that multi-AZ is the default expectation for "highly available," and multi-region is a deliberate, expensive escalation you justify with a specific business requirement, not a default.

### What should you actually monitor with CloudWatch, and how does it tie into Auto Scaling?

A surprising number of candidates can define CloudWatch ("it's AWS's monitoring service") but go blank when asked what they'd actually alarm on for a production web app. A reasonable, specific answer: at the infrastructure layer, CPU utilization, memory (via the CloudWatch agent, since EC2 doesn't expose memory natively), and disk space on hosts; at the load balancer layer, target response time, HTTP 5xx count, and unhealthy host count; at the application layer, custom metrics for request latency percentiles (p50/p95/p99 — averages hide the slow tail that actually upsets users) and business-relevant error rates; and at the database layer, CPU, connection count, and replica lag.
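The claim that averages hide the slow tail is easy to demonstrate. The sketch below uses an invented sample of 100 requests (95 fast, 5 very slow) and a simple nearest-rank percentile; monitoring systems may interpolate differently, so treat the method as one reasonable choice rather than what CloudWatch computes internally.

```python
import math
from statistics import mean
from typing import List


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct% of n), 1-indexed."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented sample: 95 requests at 100 ms, 5 requests at 2000 ms.
latencies_ms = [100] * 95 + [2000] * 5

print("mean:", mean(latencies_ms))
print("p50:", percentile(latencies_ms, 50))
print("p95:", percentile(latencies_ms, 95))
print("p99:", percentile(latencies_ms, 99))
```

The mean lands on a latency that no request actually experienced, and only the p99 surfaces the two-second responses that one user in twenty is hitting.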

The tie-in to Auto Scaling is where this becomes architecture, not just a dashboard: an Auto Scaling group's scaling policy reads a CloudWatch metric (commonly average CPU or, better for request-driven services, ALB request-count-per-target) and adds or removes instances to keep it near a target. The follow-up interviewers love here is "what's wrong with scaling purely on CPU for an I/O-bound API" — the answer is that a service waiting on database or network calls can have low CPU while still being overloaded in terms of concurrent connections or latency, so request-count or latency-based scaling policies often reflect real load better than CPU alone. CloudWatch Logs and Logs Insights queries, plus **CloudWatch Alarms** feeding an SNS topic that pages on-call, round out the observability picture a senior candidate is expected to sketch unprompted.
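The idea behind keeping a metric "near a target" can be sketched as a proportional calculation. This is a simplification for intuition only: the real Auto Scaling service manages alarms, warm-up, and cooldown behavior for you, and the function and parameter names here are hypothetical.

```python
import math


def desired_capacity(
    current_instances: int,
    observed_metric: float,
    target_metric: float,
    min_size: int,
    max_size: int,
) -> int:
    """Scale capacity in proportion to how far the metric is from target.

    `observed_metric` and `target_metric` are per-instance values, for
    example requests per target per minute.
    """
    raw = math.ceil(current_instances * observed_metric / target_metric)
    return max(min_size, min(max_size, raw))


# 4 instances each seeing 1500 requests/min against a target of 1000.
print(desired_capacity(4, 1500, 1000, min_size=2, max_size=20))
```

The same function with CPU as the metric shows the I/O-bound failure mode: if CPU sits at 30% against a 60% target, the formula recommends scaling in even while requests queue behind slow database calls.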

### Design a scalable, highly-available web application on AWS — walk through it.

This is the scenario question AWS rounds reach for once they're past trivia, and it rewards a structured walkthrough over a list of service names.

1. **Edge and DNS.** Route 53 for DNS, CloudFront in front of static assets (and often the whole app) for caching and to absorb traffic at the edge before it reaches your origin.
2. **Load balancing.** An ALB in public subnets across at least two AZs, terminating TLS and routing to your application tier.
3. **Compute.** An Auto Scaling group of EC2 instances (or an ECS/Fargate service) in private subnets across multiple AZs, scaling out on CPU/request-count metrics — this is where you justify EC2 vs. Fargate vs. Lambda based on the workload's shape, not by default.
4. **Data tier.** RDS Multi-AZ for relational data (synchronous standby for automatic failover) with read replicas if read load is heavy, or DynamoDB if access patterns are simple and you need to scale writes without managing replica fleets yourself.
5. **Caching.** ElastiCache (Redis/Memcached) in front of the database for hot reads, cutting database load and latency.
6. **Asset storage.** S3 for user uploads and static files, served through CloudFront rather than the application servers.
7. **Observability and resilience.** CloudWatch alarms on key metrics, health checks driving Auto Scaling and ALB target deregistration, and a deliberate answer for what happens if an AZ goes down (the architecture above already tolerates it, because nothing is single-AZ).

The follow-up interviewers love is "now traffic spikes 10x overnight — what breaks first, and what do you do?" The honest answer: your database is usually the bottleneck before compute is, because Auto Scaling groups and Fargate scale out fast, but a single RDS writer doesn't scale horizontally the same way — so the real answer is caching aggressively, adding read replicas ahead of time, considering DynamoDB for the hottest-path data, and pre-warming CloudFront/Auto Scaling if the spike is predictable (a product launch) rather than purely reactive.

A second follow-up that separates strong candidates from good ones: "what if it's not traffic that 10x'd, but the size of a single customer's data?" That's a different failure mode entirely — it's not about more requests, it's about a single DynamoDB partition or a single large RDS table getting hot or slow. The answer there leans on what you described earlier about partition key design and read replicas, which is exactly why interviewers ask scenario questions instead of trivia: they're checking whether your earlier answers were memorized facts or a connected mental model you can apply to a new twist.

### How do you architect a fault-tolerant application?

Fault tolerance means the system keeps working when a *component* fails, not just that it eventually recovers — the distinction interviewers are checking for is fault-tolerant (no visible disruption) vs. merely highly-available (brief disruption, then recovery). Practical levers: eliminate single points of failure by spreading every tier across multiple AZs; use health checks so a load balancer stops sending traffic to an unhealthy instance within seconds; make services stateless where possible so any instance can serve any request, with shared state pushed to RDS/DynamoDB/ElastiCache instead of living on a single box; use retries with exponential backoff and circuit breakers for calls to downstream services so one slow dependency doesn't cascade into a full outage; and design for graceful degradation — if a recommendation service is down, show the page without recommendations rather than failing the whole request.
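Retries with exponential backoff are worth being able to write on a whiteboard. A minimal sketch, using randomized ("full jitter") delays so that many clients retrying at once do not synchronize; the default attempt count and delays are arbitrary, and a real implementation should catch only errors known to be retryable rather than every exception.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:  # sketch only: narrow this to retryable errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter: anywhere up to the cap


# Usage: a dependency that fails twice, then recovers.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("downstream unavailable")
    return "ok"


delays = []
result = call_with_backoff(flaky, sleep=delays.append)  # record, don't sleep
print(result, len(delays))
```

Backoff handles a dependency that is briefly unhealthy; a circuit breaker covers the case where it stays down and retrying at all is wasted load.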

## Cost optimization

### Reserved Instances, Savings Plans, and Spot Instances — when does each make sense?

**On-Demand** pricing is the default: pay per hour or per second with no commitment, which suits unpredictable or short-lived workloads. **Reserved Instances** and **Savings Plans** commit to 1 or 3 years of usage in exchange for a significant discount (up to about 72%), which suits steady-state baseline load you can confidently predict, like the always-on portion of your fleet. **Spot Instances** run on AWS's spare capacity at up to 90% off. You pay the current Spot price rather than bidding (the old bidding model is gone), but the capacity can be reclaimed with two minutes' notice, so Spot suits fault-tolerant, interruptible workloads like batch processing, CI/CD runners, or stateless worker fleets that can simply restart a job elsewhere. A senior cost answer combines all three: Reserved/Savings Plans for your predictable baseline, On-Demand for the variable portion above that baseline, and Spot for anything interruptible, rather than running an entire fleet at On-Demand rates by default.
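The arithmetic behind that blended answer is easy to sketch. Every number below is a made-up assumption for illustration (a $0.10 hourly On-Demand rate, a 40% commitment discount, a 70% Spot discount, 730 hours in a month), not real AWS pricing.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly cost estimates


def monthly_cost(instances: float, hourly_rate: float, discount: float = 0.0) -> float:
    """Monthly cost of a group of instances at a discounted hourly rate."""
    return instances * hourly_rate * (1.0 - discount) * HOURS_PER_MONTH


def blended_monthly_cost(baseline: float, variable: float, interruptible: float,
                         on_demand_rate: float, commitment_discount: float,
                         spot_discount: float) -> float:
    """Baseline on a commitment, variable load On-Demand, interruptible work on Spot."""
    return (monthly_cost(baseline, on_demand_rate, commitment_discount)
            + monthly_cost(variable, on_demand_rate)
            + monthly_cost(interruptible, on_demand_rate, spot_discount))


# Hypothetical fleet: 10 always-on, 4 variable (on average), 6 interruptible.
all_on_demand = monthly_cost(20, 0.10)
blended = blended_monthly_cost(10, 4, 6, on_demand_rate=0.10,
                               commitment_discount=0.40, spot_discount=0.70)
print(round(all_on_demand, 2), round(blended, 2))  # 1460.0 861.4
```

With these assumed rates the same 20-instance fleet costs about 41% less when each slice runs on the pricing model that fits it.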

### What does "right-sizing" mean, and how do you actually do it?

Right-sizing means matching instance type and size to actual observed resource usage instead of over-provisioning "just in case." In practice: pull CloudWatch metrics (CPU, network, and memory if you have the CloudWatch agent installed, since EC2 doesn't report memory by default) over a representative window, and look for instances consistently running well under capacity. A common finding is `m5.xlarge` instances sitting at 8% average CPU that would run just as well on `m5.large` at roughly half the cost, provided memory usage also fits in half the RAM. AWS Compute Optimizer automates this analysis and gives concrete downsizing (or upsizing) recommendations based on real usage history. The interview-grade point: right-sizing isn't a one-time exercise. Usage patterns change as an application evolves, so it's something you revisit periodically, not a box you check once at launch.
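The screening step can be sketched in a few lines once the metrics are exported. The function below works on plain lists of CPU percentages and makes no AWS calls. The 20% average and 60% peak thresholds and the instance IDs are illustrative assumptions, and a real review would check memory and network the same way.

```python
from statistics import mean
from typing import Dict, List


def downsizing_candidates(cpu_samples_by_instance: Dict[str, List[float]],
                          avg_threshold: float = 20.0,
                          peak_threshold: float = 60.0) -> List[str]:
    """Return instance IDs whose average AND peak CPU are both low.

    Checking the peak as well as the average avoids flagging an instance
    that idles most of the day but genuinely needs its size for bursts.
    """
    candidates = []
    for instance_id, samples in sorted(cpu_samples_by_instance.items()):
        if not samples:
            continue  # no data: never recommend a change blind
        if mean(samples) < avg_threshold and max(samples) < peak_threshold:
            candidates.append(instance_id)
    return candidates


# Hypothetical CPU utilisation samples (percent) per instance.
samples = {
    "i-web-1": [6, 9, 8, 10, 7],   # steady 8% average: candidate
    "i-web-2": [55, 70, 62],       # busy: leave alone
    "i-batch-1": [5, 5, 95],       # idle with real bursts: leave alone
}
print(downsizing_candidates(samples))  # ['i-web-1']
```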

### Beyond compute — where else does AWS spend hide, and how do you find it?

Interviewers who've actually owned a cloud bill will sometimes push past "Reserved Instances and right-sizing" to ask where cost surprises tend to come from outside compute, because compute is usually the obvious line item, not the sneaky one. Real answers worth having ready: **data transfer** costs, especially cross-AZ and cross-region transfer, which are easy to ignore until a chatty microservice architecture racks up meaningful inter-AZ traffic just from services calling each other; **NAT gateway** per-GB processing charges, as mentioned above, often fixable with VPC endpoints; **unattached EBS volumes and old snapshots** that nobody deleted after decommissioning an instance, which is the single most common "oh, that's been silently costing money for eight months" finding in a cost audit; **over-provisioned DynamoDB read/write capacity** on tables still using provisioned mode instead of on-demand or auto-scaling; and **idle load balancers and NAT gateways** left running for an environment nobody tore down after a project ended. AWS Cost Explorer, AWS Budgets alerts, and AWS Cost Anomaly Detection are the tools for catching these proactively rather than discovering them at the end of the month. Naming the actual tool, not just "I'd monitor costs," is what makes this answer land as experience rather than theory.
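The unattached-volume finding is simple to sketch. The function below filters a list of dictionaries shaped like the `Volumes` entries in the EC2 DescribeVolumes response, where an unattached volume has `State` equal to `available`. It makes no AWS calls itself; the sample data and the $0.08 per GiB-month price are assumptions for illustration.

```python
from typing import Dict, List


def unattached_volume_report(volumes: List[dict],
                             price_per_gib_month: float = 0.08) -> Dict[str, object]:
    """Summarise EBS volumes that exist but are attached to nothing.

    Expects dicts with "VolumeId", "State" and "Size" (GiB) keys, modelled
    on DescribeVolumes output. The price is an assumed figure, not a quote.
    """
    idle = [v for v in volumes if v.get("State") == "available"]
    total_gib = sum(v["Size"] for v in idle)
    return {
        "volume_ids": [v["VolumeId"] for v in idle],
        "total_gib": total_gib,
        "estimated_monthly_cost": round(total_gib * price_per_gib_month, 2),
    }


# Hypothetical inventory: one volume in use, two left behind.
inventory = [
    {"VolumeId": "vol-aaa", "State": "in-use", "Size": 200},
    {"VolumeId": "vol-bbb", "State": "available", "Size": 100},
    {"VolumeId": "vol-ccc", "State": "available", "Size": 500},
]
report = unattached_volume_report(inventory)
print(report["volume_ids"], report["total_gib"])  # ['vol-bbb', 'vol-ccc'] 600
```

In a real audit the input would come from the EC2 API across every region, and the next step is usually a snapshot-then-delete policy rather than deleting on sight.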

<div class="verdict"><strong>The core truth:</strong> AWS interviews reward architectural thinking — picking the right service for the workload and explaining the tradeoff, not reciting service names. "I'd use S3 over EBS because this is unstructured, infrequently-read data accessed via API" is the signal; "S3 is object storage" is not.</div>

## How most people actually prep for AWS interviews — and where it falls short

Almost nobody preps for an AWS interview from zero. The realistic starting point is one of a handful of well-worn paths, and each one is genuinely useful for *something* — the trick is knowing what each one doesn't cover.

**GeeksforGeeks-style question dumps** are the first thing most candidates open, and for good reason: they're free, fast to skim, and decent at surfacing the *breadth* of topics — you'll walk away knowing that NAT gateways and VPC peering exist, which is more than nothing. What they don't do is simulate the follow-up. A dump gives you "Q: What is a NAT gateway? A: It allows outbound internet access for private subnets" and stops there — it never asks "okay, your private-subnet fleet's NAT gateway bill just tripled, why, and what do you do about it," which is the actual shape of the question in a real architecture round.

**A Cloud Guru, Udemy cert-prep courses, and similar certification-oriented training** are genuinely good — better than most people give them credit for — at teaching the services themselves in depth, often better than a quick blog post, because they're built to get you through the Solutions Architect or Developer exam. The honest tradeoff: certification exams are multiple-choice. They test recognition ("which of these four options correctly describes X") under no time pressure to *produce* an explanation from scratch, and they never push back on an architecture decision the way a live interviewer does. You can pass the SAA-C03 exam and still freeze when an interviewer asks "you just said you'd use DynamoDB here — walk me through what happens if that partition key turns out to be a celebrity's user ID."

**LeetCode and generic coding-round prep** matter for the algorithmic portion of many cloud/backend roles, but they're a different skill entirely from an AWS architecture round — solving a graph problem in 25 minutes doesn't train you to defend why you put a database in a private subnet. Candidates sometimes over-invest here because it's the most familiar grinding format, and under-invest in the service-and-architecture half of the interview as a result.

**A friend's WhatsApp-forwarded PDF of "AWS interview questions"** (every cohort has one, photocopied in spirit if not in form from some other cohort two years earlier) is a real artifact in how people actually prepare, and it's not worthless — it's usually a reasonably accurate list of *what gets asked*. Its failure mode is that it's a list of questions with terse, sometimes outdated or subtly wrong answers, with zero mechanism for someone to tell you that your answer to "EBS vs EFS vs S3" was vague, or that you said "highly available" when you meant "fault tolerant," or that you've been confusing Multi-AZ with a read replica this whole time (a genuinely common mix-up, see above).

**Generic ChatGPT prompting** ("explain VPC peering," "what's the difference between security groups and NACLs") is fast and often factually fine, and there's no shame in using it to fill a knowledge gap quickly. But reading a generated explanation is a passive, silent activity — there's no live follow-up, no one noticing that you said "stateful" when you meant "stateless," and critically, no practice *producing* the explanation yourself, out loud, under the mild social pressure of someone listening and about to ask "okay, but why."

Where Greenroom is meant to sit differently in this stack: it doesn't replace learning what the services do — none of the above approaches are a waste of time, and the cert courses in particular remain a genuinely strong way to learn AWS deeply. What Greenroom adds is the part none of those formats simulate — **spoken practice with live follow-ups** that behave like a real AWS architecture round actually behaves. You describe your design for a scalable web app, and the interviewer doesn't just nod — it asks "what if the database becomes the bottleneck," or "you said Multi-AZ, does that help with read scaling too?", or "your NAT gateway bill just spiked, walk me through why," the same way a real interviewer pushes on the weak point in your design rather than letting you read a definition and move on. It's the difference between recognizing a correct answer on a page and being able to produce one, defend it, and adapt it live — which is the actual skill an AWS interview is measuring.

## How to prepare

AWS rounds mix service knowledge with architecture scenarios, and the scenario questions are where most candidates lose points — not because they don't know the services, but because they haven't practiced reasoning through a design out loud under follow-up pressure. Walk through the "design a scalable web app" exercise above for a system you'd actually pick, then have someone push back on it: what if the database becomes the bottleneck, what if one AZ goes down, what's the cheapest way to handle a predictable traffic spike. For service-level depth, AWS's own documentation at docs.aws.amazon.com and the talks from AWS re:Invent remain the most accurate primary sources — worth cross-checking anything you've half-remembered from a course or a question dump before an interview, since AWS services genuinely do change year to year. [Greenroom](/) runs spoken technical interviews that follow up on your reasoning the way a real AWS-focused interview does. Pair it with our [DevOps](/blog/devops-engineer-interview-questions), [cloud engineer](/blog/cloud-engineer-interview-questions), and [system design](/blog/system-design-interviews-what-they-test) guides.

![A calm, structured interview checklist for technical prep](/assets/blog/pool-structured-screen.webp)

A structured run-through before the real round beats last-minute cramming on service names.

## Frequently asked questions

### What are the most common AWS interview questions?

Common AWS questions cover compute (EC2 instance types, when to choose Lambda vs ECS/Fargate vs EKS), storage (S3 storage classes, EBS vs EFS vs S3), databases (RDS vs DynamoDB and DynamoDB partition/sort keys), networking (VPC, public vs private subnets, NAT gateways, VPC peering vs Transit Gateway), identity (IAM users, roles, policies, least privilege, security groups vs NACLs), scaling and reliability (ALB vs NLB, regions vs AZs, fault tolerance), a system-design scenario like architecting a scalable web app, and cost optimization (Reserved/Spot Instances, right-sizing).

### What is the difference between EC2, S3, and EBS?

EC2 is compute: virtual servers you run applications on. S3 is object storage for files and unstructured data, accessed over HTTP with very high durability and virtually unlimited capacity. EBS is block storage that attaches to an EC2 instance like a virtual hard disk (normally one instance at a time, in the same Availability Zone), for low-latency, persistent volumes. In short: EC2 runs your app, S3 stores objects and files at scale, and EBS provides disk volumes for individual instances.

### What is IAM in AWS, and what does least privilege mean?

IAM (Identity and Access Management) controls who, or what, can do what in your AWS account. It uses users, groups, and roles, with policies (JSON documents) granting specific permissions on specific resources. Roles let services or users assume temporary permissions without long-lived credentials, which is generally safer than static access keys. Least privilege means granting only the permissions each identity actually needs and nothing more — a frequent interview talking point and the single principle that should guide every IAM design decision.
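A least-privilege policy is easier to recognise than to describe. The sketch below builds one as a Python dictionary and prints it as JSON: one action, one resource prefix, no wildcard actions. The bucket name, prefix, `Sid`, and function name are hypothetical placeholders.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """IAM policy document allowing reads of objects under one S3 prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadReportsOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],  # a single action, not "s3:*"
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            }
        ],
    }


policy = read_only_prefix_policy("example-reports-bucket", "reports/")
print(json.dumps(policy, indent=2))
```

The contrast to draw in an interview is with `"Action": "s3:*"` on `"Resource": "*"`, which works on day one and becomes the blast radius of every leaked credential afterwards.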

### When should I choose Lambda instead of EC2 or containers?

Choose Lambda for short-lived, event-driven work — a file-upload handler, a webhook, a scheduled job — where you want automatic scaling to zero and to high concurrency with no server management, and your workload fits within Lambda's execution time (15 minutes) and memory limits. Choose EC2 for long-running processes or full control over the runtime, and choose ECS/Fargate or EKS when you need to run containers with an orchestrator managing placement and restarts. Most real systems use a mix rather than picking one option for everything.
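For a sense of how small the Lambda side of a file-upload handler can be, here is a minimal Python handler sketch for an S3 upload notification. It only extracts the bucket and key from each record; the return shape and the sample event are assumptions for illustration, and real processing would replace the list-building step.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect the S3 location of every object named in the event.

    Object keys arrive URL-encoded in S3 event notifications (a space
    becomes "+"), so they are decoded before use.
    """
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append(f"s3://{bucket}/{key}")
    return {"processed": uploaded}


# Trimmed-down sample event with only the fields the handler reads.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-uploads"},
                "object": {"key": "photos/my+cat.jpg"}}}
    ]
}
print(handler(sample_event, None))  # {'processed': ['s3://example-uploads/photos/my cat.jpg']}
```

There is no server, process manager, or scaling configuration anywhere in that file, which is the whole argument for Lambda when the workload is short-lived and event-driven.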

### How do I design for high availability on AWS?

Deploy across at least two Availability Zones within a region for every tier — load balancer, compute, and database — so a single data center failure doesn't take the application down. Use an Auto Scaling group or managed container service spanning AZs, RDS Multi-AZ or a distributed database like DynamoDB, and health checks that remove unhealthy instances from rotation automatically. Multi-region adds protection against a full regional outage, but it's a significant jump in cost and complexity that's usually justified only by a specific business requirement, not adopted by default.

### How should I prepare for an AWS interview?

Learn the core services across compute, storage, networking, and identity, but spend more of your prep time on architectural reasoning — when to choose S3 over EBS, why a database sits in a private subnet, how to design for high availability across Availability Zones, and how to apply least privilege. Practise walking through a full scenario, like architecting a scalable web app or handling a sudden traffic spike, out loud with a voice-based mock interview that asks realistic follow-up questions, since that is where most real AWS rounds actually go.
