AWS Interview Questions: 25 Answers by Role and Scenario

Answer AWS interview questions by role with context, choice, trade-off, and result. Covers IAM, EC2, S3, VPC, Lambda, and RDS.

You already know what IAM is. You know S3 is durable to eleven nines, that Lambda is serverless, that a VPC is a virtual private cloud. The problem is that AWS interview questions don't actually test whether you memorized the docs — they test whether you can explain why you made a choice, what broke when you deployed it, and what you would do differently. Candidates who only studied definitions can answer the first question in every topic. They fall apart on the second.

The fix isn't more memorization. It's a different answer shape: context, choice, trade-off, result. Every section below shows you what that looks like for the services, scenarios, and judgment calls that actually show up in interviews — organized by role, not by the AWS console menu.

Which AWS Interview Questions Show Up by Role

The services are the same across seniority levels. The questions are not.

What Beginner Candidates Get Asked First

Entry-level interviews tend to open with the foundational five: IAM, S3, EC2, Lambda, and VPC. The questions sound simple — "what does IAM do?" or "what is the difference between a security group and a NACL?" — but the test isn't recall. It's whether you can explain the concept without defaulting to marketing language. "IAM manages who can do what in your AWS account" is a real answer. "IAM is a robust identity and access management service that enables granular policy-based permissions" is a cheat sheet.

For S3, expect questions about storage classes, versioning, and how you'd structure a bucket for a static website or a data archive. For EC2, you'll get asked about instance types and what stop vs. terminate actually means. For Lambda, the baseline question is usually "when would you use it?" — and the trap is listing features instead of naming a use case.

What Mid-Level Engineers Need to Be Ready For

At mid-level, the questions shift from "what is this" to "what would you do." Interviewers start asking about migrations, about choosing between services, and about debugging production failures. This is where memorized definitions stop working. You can know every Lambda feature and still fail the question "we had a Lambda timing out in production — walk me through how you'd debug it," because that question requires a mental model of the execution environment, not a feature list.

A common mid-level scenario: "your team is migrating a monolith to microservices on AWS — what services would you use and why?" The candidate who lists ECS, API Gateway, SQS, and RDS without explaining the reasoning sounds like they Googled the answer. The candidate who says "we chose SQS over SNS because our consumers needed to process at their own pace and we couldn't afford to lose messages during a downstream slowdown" sounds like they shipped it.

What Senior Interviewers Listen For

Senior interviews are mostly about judgment. The interviewer is not checking whether you know what CloudTrail does — they're watching how you reason about reliability, cost, and security under constraints. Questions like "how would you design for 99.99% availability across two regions?" or "we had a security incident — what's your first call?" are testing whether you think in trade-offs or in textbook answers.

According to feedback patterns collected by engineering hiring teams, the single clearest separator between junior and senior candidates is not depth of AWS knowledge — it's whether the candidate volunteers the downside of their own recommendation. Senior candidates say "we went multi-AZ, not multi-region, because the RPO was 15 minutes and the cost of active-active replication wasn't justified." Junior candidates say "multi-region is more available." Both statements are true. Only one sounds like someone who has owned the decision.

Answer AWS Interview Questions Like Someone Who Has Done the Job

The question isn't whether you know the service. It's whether your AWS interview answers sound like decisions or definitions.

Why Definitions Sound Safe but Weak

Definitions feel safe because they're verifiable. You can't be wrong about what S3 is. But the moment the interviewer asks "why S3 instead of EFS for this use case?" or "what went wrong when you tried to use Lambda for that batch job?" — the definition has nothing to offer. It describes the service. It doesn't describe you.

The failure mode looks like this: a candidate is asked about their experience with RDS. They say "RDS is a managed relational database service that supports MySQL, PostgreSQL, and several other engines, with automated backups and multi-AZ failover." The interviewer nods, then asks "what was the hardest part of running RDS in production?" The candidate has no answer, because they studied RDS but never operated it — and the definition didn't hide that for long.

The Four-Part Answer That Actually Lands

The pattern that works is: context, choice, trade-off, result. Context tells the interviewer what you were actually trying to solve. Choice names what you picked and why. Trade-off shows you understood the downside. Result closes the loop with something measurable or observable.

A real example: "We were storing user-uploaded media files for a social app. I chose S3 over building a file server on EC2 because we needed durability and didn't want to manage disk scaling. The trade-off was that S3 access patterns can get expensive with high-frequency small reads, so we added CloudFront in front of it. Result: storage costs dropped by 40% compared to our original EBS estimate, and we never had to think about capacity again." That answer takes thirty seconds. It covers the decision, the risk, and the outcome. It sounds like someone who shipped it.

How to Keep Answers Short Without Sounding Shallow

The instinct when you're nervous is to pad. You list more services, add more qualifications, cover more edge cases. The interviewer hears noise. Short answers land when they name the decision criteria and the outcome — not when they cover every possible scenario.

Plain English test: if you're using three AWS service names in a row without connecting them to a problem, you're probably listing. Stop. Pick the one that mattered most and explain why it mattered. "We used SQS because our downstream service was flaky and we needed the queue to absorb the backpressure" is more credible than "we used SQS, SNS, EventBridge, and Lambda to build a fully decoupled event-driven architecture."

IAM and Security: Answer the Question Behind the Question

IAM questions in AWS interview prep aren't really about IAM. They're about whether you think in terms of containment.

How Do IAM Users, Roles, and Policies Actually Differ?

Users are identities for humans. Roles are identities for services, applications, or cross-account access — they're assumed temporarily and don't have long-term credentials attached. Policies are the permission documents that say what any identity is allowed to do. The conceptual difference that interviewers care about: roles are preferred over users for anything that isn't a human logging in, because they don't produce access keys that get committed to GitHub.

The follow-up is almost always about least privilege or cross-account access. For least privilege, the question is usually "how would you give an application access to only what it needs?" For cross-account, it's usually "how do you let one AWS account assume a role in another?" Be ready for both.
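The cross-account half of that follow-up can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical trusted account ID (111122223333): the role in the target account carries a trust policy naming who may call sts:AssumeRole. The function name is invented for the example.

```python
import json

# Hypothetical ID of the account whose identities will assume the role.
TRUSTED_ACCOUNT_ID = "111122223333"


def build_trust_policy(trusted_account_id: str) -> dict:
    """Trust policy attached to the role in the target account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trusting the account root delegates the decision to that
                # account's own IAM policies.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_policy = build_trust_policy(TRUSTED_ACCOUNT_ID)
print(json.dumps(trust_policy, indent=2))
```

The trust policy is only one side. Identities in the trusted account still need their own permission to call sts:AssumeRole on the role's ARN, which is a detail worth saying out loud in the interview.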

How Would You Explain Least Privilege Without Sounding Scripted?

The scripted answer: "least privilege means giving only the permissions required to perform a task." The interviewer has heard this 200 times. The specific answer: "we had a Lambda that needed to read from one S3 bucket and write to a DynamoDB table. Instead of attaching AmazonS3FullAccess, we wrote an inline policy that allowed s3:GetObject on the objects in that specific bucket and dynamodb:PutItem on the specific table ARN. When we ran an access review six months later, we found three other Lambdas with wildcard permissions that had never been scoped down. That's the real cost of not starting with least privilege."

That answer demonstrates that you've actually done an access review, not just read the AWS IAM best practices guide.
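As a rough sketch of what that scoped policy might look like, here it is as a Python dictionary, with a small helper of the kind an access review could use to flag wildcard actions. The bucket name, table ARN, and function names are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical resource names, used only for illustration.
BUCKET_NAME = "example-uploads-bucket"
TABLE_ARN = "arn:aws:dynamodb:us-east-1:111122223333:table/example-events"


def build_scoped_policy(bucket_name: str, table_arn: str) -> dict:
    """Allow reading objects from one bucket and writing items to one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # Object-level actions match object ARNs, hence the /* suffix.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                "Effect": "Allow",
                "Action": "dynamodb:PutItem",
                "Resource": table_arn,
            },
        ],
    }


def has_wildcard_action(policy: dict) -> bool:
    """Return True if any statement uses '*' or 'service:*' as an action."""
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if any(a == "*" or a.endswith(":*") for a in actions):
            return True
    return False


scoped_policy = build_scoped_policy(BUCKET_NAME, TABLE_ARN)
print(json.dumps(scoped_policy, indent=2))
print(has_wildcard_action(scoped_policy))
```

The wildcard check is deliberately crude: it ignores Resource wildcards and NotAction. It is still enough to find the "never scoped down" functions the example answer describes.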

What Security Topics Matter Most in AWS Interviews?

KMS, Secrets Manager, MFA, and incident response tend to cluster together in senior interviews. The through-line is containment and verification. KMS is about controlling who can decrypt what: the interviewer wants to know whether you understand key policies and envelope encryption, not just that encryption exists. Secrets Manager is about keeping credentials out of environment variables and source code. MFA is about enforcing a second factor for the AWS account root user and privileged IAM users.

Incident response questions usually start with "you get a CloudTrail alert for unusual API calls from the root user. What do you do?" The answer the interviewer wants is sequential: revoke active sessions, rotate credentials, isolate affected resources, review CloudTrail for scope, notify the security team. The answer they don't want is "I would investigate and fix it."

S3, EC2, and Lambda: Say What You Would Pick, Not Just What They Are

AWS scenario questions about core compute and storage services are really tests of trade-off judgment.

When Should You Choose S3 Over Building Around Compute?

S3 is the right answer when you need durable, scalable object storage and don't need the filesystem semantics of EBS or EFS. The eleven-nines durability figure from AWS S3 documentation is real, but the more useful interview signal is knowing when S3's access model creates problems. S3 is not a database. Objects are written whole: there are no partial in-place updates and no multi-object transactions, and the conditional write headers S3 added in 2024 apply to one whole object at a time. If your application needs to read, modify, and write the same record atomically as a routine operation, you need something else.

For static assets, backups, and data archives, S3 with lifecycle policies is almost always the right answer. Lifecycle policies let you move objects to Glacier after 90 days automatically — the cost difference between S3 Standard and Glacier Deep Archive is roughly 20x, and most interview candidates don't mention this unless they've actually managed storage costs.
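To make the lifecycle and cost point concrete, here is a small sketch. The rule dictionary follows the shape boto3's put_bucket_lifecycle_configuration accepts; the rule ID, prefix, archive size, and per-GB prices are assumptions for illustration, so check current pricing for your region before quoting numbers.

```python
# Shape of a lifecycle configuration as accepted by boto3's
# put_bucket_lifecycle_configuration; the rule ID and prefix are hypothetical.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-after-90-days",
            "Status": "Enabled",
            "Filter": {"Prefix": "archive/"},
            # "GLACIER" is the Glacier Flexible Retrieval storage class;
            # "DEEP_ARCHIVE" would target Glacier Deep Archive instead.
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }
    ]
}

# Assumed list prices in USD per GB-month; retrieval and request fees excluded.
STANDARD_PRICE = 0.023
DEEP_ARCHIVE_PRICE = 0.00099


def monthly_storage_cost(size_gb: float, price_per_gb: float) -> float:
    """Storage-only cost for one month."""
    return size_gb * price_per_gb


size_gb = 10_000  # hypothetical 10 TB archive
standard = monthly_storage_cost(size_gb, STANDARD_PRICE)
deep_archive = monthly_storage_cost(size_gb, DEEP_ARCHIVE_PRICE)
ratio = standard / deep_archive
print(f"Standard: ${standard:.2f}  Deep Archive: ${deep_archive:.2f}  ratio: {ratio:.1f}x")
```

Under these assumed prices the ratio lands a little above 20x, which matches the rough figure above. Retrieval fees and minimum storage durations are the trade-off to mention alongside it.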

How Do You Talk About EC2 Without Listing Every Instance Family?

You don't need to know every instance type. You need to know the decision criteria: compute-optimized for CPU-bound workloads, memory-optimized for in-memory databases, general purpose for most web services. The more important EC2 concept for interviews is the stop-vs-terminate distinction and what it means for EBS volumes, plus when autoscaling groups make sense versus when they add unnecessary complexity.

The concrete example that lands well: "we had a legacy monolith that couldn't be containerized quickly. We ran it on EC2 with an autoscaling group behind an ALB, set a minimum of two instances for availability, and used scheduled scaling to handle known traffic peaks. It wasn't elegant, but it was stable and we could control the OS configuration in a way we couldn't with ECS."

When Does Lambda Help, and When Does It Get in the Way?

Lambda is excellent for event-driven, short-duration workloads: S3 event triggers, API Gateway backends, scheduled jobs under 15 minutes. The cold start problem is real but usually overstated for async workloads — it matters when you have latency-sensitive synchronous APIs. The timeout limit of 15 minutes is a hard architectural constraint: anything that can run longer needs ECS, Fargate, or a batch service.

Lambda breaks things under load in a specific way: if your downstream service can't handle the concurrency, Lambda's default behavior of scaling out to the account's concurrency limit (1,000 concurrent executions per Region by default) will overwhelm it. Reserved concurrency is the fix, but most candidates who haven't hit this in production don't mention it.
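A back-of-the-envelope model helps when explaining this. The sketch below uses Little's law (in-flight work is roughly arrival rate times duration) and assumes each concurrent execution holds one downstream connection. The workload numbers and function names are hypothetical.

```python
import math
from typing import Optional


def estimated_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Little's law: in-flight executions ~= arrival rate x average duration."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def downstream_connections(lambda_concurrency: int, reserved_concurrency: Optional[int]) -> int:
    """Assumes each concurrent execution holds one downstream connection."""
    if reserved_concurrency is None:
        return lambda_concurrency
    return min(lambda_concurrency, reserved_concurrency)


# Hypothetical workload: 500 requests/second at 2 seconds per invocation,
# feeding a database that accepts at most 200 connections.
needed = estimated_concurrency(500, 2.0)
uncapped = downstream_connections(needed, None)
capped = downstream_connections(needed, 100)
print(needed, uncapped, capped)
```

The cap protects the database, but invocations above it are throttled rather than served, so the answer should also say where the excess goes: a queue, retries, or an error to the caller.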

VPC and Load Balancing: Explain the Network Without Turning Robotic

Cloud interview questions about VPC networking are where candidates either sound like practitioners or like they're reading a diagram.

What Do Subnets, Route Tables, NAT, and Security Groups Actually Do?

A subnet is a range of IP addresses within a VPC. Public subnets have a route to an internet gateway; private subnets don't. Route tables control where traffic goes — a subnet is public because its route table has a 0.0.0.0/0 route pointing at an internet gateway, not because of any special flag. Security groups are stateful firewalls at the instance level. NACLs are stateless and operate at the subnet level.

The follow-up interviewers almost always ask: "how does a resource in a private subnet get outbound internet access?" The answer is NAT gateway — placed in a public subnet, with the private subnet's route table pointing 0.0.0.0/0 at the NAT gateway. If a candidate says "you just open the security group," they've demonstrated they don't understand the routing layer.

Why Do Private Subnet Problems Keep Showing Up in Interviews?

Because they're the most common real-world networking failure. The scenario: a Lambda or EC2 instance in a private subnet can't reach an external API. The instinct is to blame the security group. The actual root cause is usually one of three things: no NAT gateway, NAT gateway in the wrong subnet, or a route table that was never updated after a VPC change.

A concrete troubleshooting flow: check the route table for the private subnet first. If there's no 0.0.0.0/0 route, that's your answer. If the route exists and points at a NAT gateway, check whether the NAT gateway is in a public subnet and whether the public subnet's route table has an internet gateway route. Security groups are usually the last thing to check, not the first.
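The troubleshooting flow above can be sketched as a small decision function. The route tables here are a simplified in-memory model with hypothetical subnet and gateway IDs, not output from the EC2 API.

```python
from typing import Optional

# Hypothetical, simplified model: subnet ID -> list of (destination CIDR, target ID).
ROUTE_TABLES = {
    "subnet-private": [("10.0.0.0/16", "local"), ("0.0.0.0/0", "nat-0abc")],
    "subnet-public": [("10.0.0.0/16", "local"), ("0.0.0.0/0", "igw-0def")],
}
# Which subnet each NAT gateway was created in.
NAT_GATEWAY_SUBNET = {"nat-0abc": "subnet-public"}


def default_route_target(subnet_id: str) -> Optional[str]:
    """Return the target of the 0.0.0.0/0 route, or None if there is none."""
    for destination, target in ROUTE_TABLES.get(subnet_id, []):
        if destination == "0.0.0.0/0":
            return target
    return None


def diagnose_outbound(subnet_id: str) -> str:
    """Walk the routing checks in order, before looking at security groups."""
    target = default_route_target(subnet_id)
    if target is None:
        return "no default route: add 0.0.0.0/0 to the subnet's route table"
    if target.startswith("igw-"):
        return "subnet is public: default route goes to an internet gateway"
    if target.startswith("nat-"):
        nat_subnet = NAT_GATEWAY_SUBNET.get(target)
        if nat_subnet is None:
            return "default route points at a NAT gateway that does not exist"
        nat_target = default_route_target(nat_subnet)
        if nat_target is None or not nat_target.startswith("igw-"):
            return "NAT gateway is not in a public subnet"
        return "routing looks correct: check security groups and NACLs next"
    return f"unexpected default route target: {target}"


print(diagnose_outbound("subnet-private"))
```

The order of the checks is the point: routing first, filtering last.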

How Do You Explain ELB and High Availability Without Hand-Waving?

Load balancers exist to distribute traffic across multiple targets and to remove unhealthy targets from rotation automatically. The "high availability" claim only holds if you're actually running targets in multiple AZs — a load balancer with all targets in one AZ fails when that AZ has a problem. The health check configuration matters: if your health check interval is 30 seconds and your unhealthy threshold is 3, you can have 90 seconds of traffic hitting a broken target before it's removed. That's a real architecture decision, not a default to accept blindly.
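The arithmetic behind that 90-second figure is simple enough to sketch. This is an approximation that ignores the health check timeout and any deregistration delay; the function name is made up for the example.

```python
def worst_case_detection_seconds(interval_seconds: int, unhealthy_threshold: int) -> int:
    """Approximate time before a failing target is marked unhealthy."""
    return interval_seconds * unhealthy_threshold


# The configuration described above: 30-second interval, threshold of 3.
print(worst_case_detection_seconds(30, 3))  # 90

# A tighter, hypothetical configuration for comparison.
print(worst_case_detection_seconds(10, 2))  # 20
```

Tighter settings detect failures sooner but generate more health check traffic and are more likely to eject a target over a brief blip. That is the trade-off to name.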

Per AWS Elastic Load Balancing documentation, ALB operates at Layer 7 and supports path-based and host-based routing — useful when you're routing to multiple microservices behind one domain.

RDS, DynamoDB, and the Trade-offs Interviewers Actually Care About

AWS interview answers about databases fail in a predictable way: candidates describe both services accurately and then say "it depends on your use case." That's true and useless. The interviewer wants to know what it depends on.

How Do You Choose Between RDS and DynamoDB?

RDS is the right choice when your data has relationships, when you need ACID transactions across multiple tables, or when your team already knows SQL and the operational overhead of a managed relational database is acceptable. DynamoDB is the right choice when your access patterns are known and consistent, when you need single-digit millisecond latency at scale, or when you want to avoid managing schema migrations entirely.

The follow-up question is usually about access patterns: "what happens if you need to query DynamoDB by a field that isn't your partition key?" The answer is that you either add a global secondary index (GSI), ideally one you designed for upfront, or you do a full table scan, which is slow and expensive. That is the kind of operational detail that separates someone who has used DynamoDB from someone who read about it.

What Does a Strong Database Answer Sound Like in an Architecture Interview?

Something like: "we had a user activity feed: high write volume, simple access patterns, always fetching by user ID. We chose DynamoDB with user ID as the partition key and timestamp as the sort key. The trade-off was that we couldn't do complex queries, but we didn't need them. RDS would have been a poor fit for the write throughput and would have required us to manage connection pooling under load."

That answer names the workload shape, explains the choice in terms of the workload, and acknowledges the limitation honestly.
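The key design in that example answer can be illustrated with an in-memory stand-in. This is not the DynamoDB API, just a toy model (class and method names are hypothetical) showing why fetching by user ID touches one partition while filtering on a non-key attribute touches everything.

```python
from bisect import insort
from collections import defaultdict


class ActivityFeed:
    """Toy table keyed by (user_id, timestamp); not the DynamoDB API."""

    def __init__(self) -> None:
        self._partitions = defaultdict(list)

    def put(self, user_id: str, timestamp: int, event: str) -> None:
        # Items within a partition stay ordered by the sort key.
        insort(self._partitions[user_id], (timestamp, event))

    def query(self, user_id: str, since: int = 0) -> list:
        """Reads one partition only, like a Query on the partition key."""
        return [item for item in self._partitions.get(user_id, []) if item[0] >= since]

    def scan_by_event(self, event: str) -> list:
        """Reads every item, like a Scan filtered on a non-key attribute."""
        return [
            (user_id, ts)
            for user_id, items in self._partitions.items()
            for ts, ev in items
            if ev == event
        ]


feed = ActivityFeed()
feed.put("user-1", 1700000300, "comment")
feed.put("user-1", 1700000100, "login")
feed.put("user-2", 1700000200, "login")

print(feed.query("user-1"))
print(feed.scan_by_event("login"))
```

If "find all logins" became a real access pattern, that would be the signal to add an index keyed on the event type rather than keep scanning.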

What If the Interviewer Pushes on Performance or Cost?

DynamoDB on-demand pricing is convenient but can be expensive at high, predictable throughput — provisioned capacity with auto-scaling is usually cheaper if you can forecast. RDS performance questions often turn to read replicas and connection limits. Aurora is worth mentioning when the interviewer asks about scaling a relational workload, because Aurora's storage auto-scales and its reader endpoint abstracts read replica routing.

When Something Breaks, Senior Candidates Stop Guessing

AWS interview prep that skips troubleshooting scenarios is preparing for half the interview.

How Would You Troubleshoot a Lambda Timeout?

Start with CloudWatch Logs — look at the duration metric for recent invocations and check whether timeouts are happening consistently or only for certain input sizes. If duration is close to the configured timeout, the function itself is slow: profile the code, check for synchronous external calls, and look for N+1 database queries. If duration is normal but timeouts still appear, check the upstream trigger — an SQS queue with a visibility timeout shorter than the Lambda timeout will cause messages to be reprocessed while still in flight, which looks like timeouts but isn't.
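The visibility timeout check at the end of that flow is easy to express as a sketch. The six-times multiplier reflects AWS guidance for SQS event sources as I understand it; treat the exact threshold as an assumption and confirm it against current documentation. The function name is hypothetical.

```python
def visibility_timeout_issue(function_timeout_s: int, visibility_timeout_s: int) -> str:
    """Compare an SQS queue's visibility timeout with the Lambda timeout."""
    if visibility_timeout_s < function_timeout_s:
        return "messages can reappear while the function is still processing them"
    if visibility_timeout_s < 6 * function_timeout_s:
        return "works, but below the 6x function timeout that AWS guidance suggests"
    return "ok"


# Hypothetical configurations for a function with a 60-second timeout.
print(visibility_timeout_issue(60, 30))
print(visibility_timeout_issue(60, 120))
print(visibility_timeout_issue(60, 360))
```

The headroom above the function timeout leaves room for retries when an invocation is throttled partway through a batch.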

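The configuration detail that trips up mid-level candidates: Lambda inside a VPC. The old cold start penalty from attaching an ENI to each new execution environment largely went away in 2019, when AWS moved to shared network interfaces that are created when the function is configured. The more common VPC-related timeout today is a function in a private subnet calling an external API with no NAT gateway or VPC endpoint on the route, so the call hangs until the function times out. If cold start latency itself is pushing invocations over the timeout threshold, provisioned concurrency is the main lever.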

How Do You Work Through ALB 5xx Errors?

5xx errors behind an ALB come from two places. The load balancer generates them when it can't get a valid response from a target: 502 for a malformed or dropped response, 503 when there are no healthy targets, 504 when the target doesn't respond in time. Or the target returns its own error (typically 500) and the ALB passes it through. Start with target group health in the ALB console. If targets are unhealthy, the health check is failing and you need to check the health check path, port, and response code configuration. If targets are healthy but 5xx errors persist, the application itself is most likely returning errors: check application logs first.

The failure pattern that catches candidates: assuming the ALB is broken because the error says "502 Bad Gateway." The ALB is almost always fine. The target is usually returning a malformed response or closing the connection before the response completes. Trace the failure to the application layer before blaming the infrastructure.
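One way to make "trace it to the application layer" concrete is to compare the two status fields in ALB access logs: elb_status_code and target_status_code, where the latter is "-" when the target sent no response. The sketch below is a simplified triage helper with a hypothetical name and wording, not an exhaustive mapping of causes.

```python
def classify_5xx(elb_status_code: int, target_status_code: str) -> str:
    """Rough triage of a 5xx using two ALB access log fields."""
    if elb_status_code < 500:
        return "not a 5xx"
    if target_status_code != "-" and target_status_code == str(elb_status_code):
        return "target returned the error: check application logs"
    if elb_status_code == 502:
        return "bad or dropped response from target: check keep-alive and response format"
    if elb_status_code == 503:
        return "no healthy targets: check target group health"
    if elb_status_code == 504:
        return "target did not respond in time: check latency and timeouts"
    return "generated by the load balancer: check load balancer configuration"


print(classify_5xx(500, "500"))
print(classify_5xx(502, "-"))
print(classify_5xx(503, "-"))
print(classify_5xx(504, "-"))
```

Even in the load-balancer-generated cases, the next step usually leads back to the target: what it sent, whether it was healthy, or how long it took.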

How Do You Debug a Production Issue in a Private Subnet?

Step one: establish what can and can't be reached. Step two: check route tables before touching security groups. Step three: use VPC Flow Logs to confirm whether packets are being sent and whether they're being rejected at the network level or just not arriving. Step four: check security group rules for both the source and destination — outbound rules on the source and inbound rules on the destination both need to allow the traffic.
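Step three can be sketched as a small parser for VPC Flow Log records in the default space-separated format. The sample lines below are made up for illustration, and custom log formats would need a different field list.

```python
# Field order of the default VPC Flow Logs record format.
FLOW_LOG_FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_flow_record(line: str) -> dict:
    """Split one default-format record into named fields."""
    values = line.split()
    if len(values) != len(FLOW_LOG_FIELDS):
        raise ValueError(f"expected {len(FLOW_LOG_FIELDS)} fields, got {len(values)}")
    return dict(zip(FLOW_LOG_FIELDS, values))


def rejected_flows(lines: list) -> list:
    """Keep only records whose action is REJECT."""
    records = [parse_flow_record(line) for line in lines]
    return [r for r in records if r["action"] == "REJECT"]


# Hypothetical sample records.
SAMPLE_LINES = [
    "2 123456789010 eni-0a1b2c3d 10.0.1.15 203.0.113.10 49152 443 6 5 300 1700000000 1700000060 REJECT OK",
    "2 123456789010 eni-0a1b2c3d 10.0.1.15 10.0.2.20 49153 5432 6 12 840 1700000000 1700000060 ACCEPT OK",
]

for record in rejected_flows(SAMPLE_LINES):
    print(record["srcaddr"], "->", record["dstaddr"], "port", record["dstport"])
```

A REJECT points at a security group or network ACL. No record at all for the flow you expected points back at routing, which is why the route table check comes first.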

The phrase "the subnet is private" explains nothing about why connectivity is broken. Private just means there's no direct internet gateway route. It doesn't mean the resource can't communicate within the VPC, reach a VPC endpoint, or use a NAT gateway for outbound traffic. Candidates who treat "private subnet" as a root cause, rather than a starting condition, waste the interviewer's time.

What Senior Interviewers Listen for When the Answer Is Open-Ended

How Do You Discuss High Availability and Disaster Recovery?

Multi-AZ means your application survives the failure of one availability zone — RDS Multi-AZ, ALB across AZs, autoscaling groups with instances distributed across zones. Multi-region means your application survives the failure of an entire AWS region, which requires active-active or active-passive architecture, data replication across regions, and a routing layer like Route 53 with health checks. Backup-only is not high availability — it's recovery, and it has an RTO measured in hours, not minutes.

The business question behind the architecture question: what's your RTO and RPO? Recovery Time Objective is how long you can be down. Recovery Point Objective is how much data you can afford to lose. Every HA decision is a cost-benefit calculation against those two numbers, and senior candidates say so.

How Do You Explain Cost and FinOps Without Sounding Finance-First?

Cost control in AWS is an engineering problem, not a finance problem. Tagging resources by team, environment, and service is the foundation — without tags, you can't attribute costs and you can't find the anomalies. AWS Cost Anomaly Detection can alert you when spending spikes unexpectedly. Rightsizing EC2 instances using Compute Optimizer recommendations is usually the fastest win on an existing workload.

A real example: "we found a dev environment with a NAT gateway processing 2TB of traffic per month because a developer had accidentally pointed a data pipeline at a public endpoint instead of a VPC endpoint for S3. Switching to a VPC endpoint saved $90/month and took 20 minutes. The tag that flagged it as dev environment spend was what surfaced it." That kind of answer shows you've actually looked at a cost dashboard.
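The $90 figure in that example is easy to sanity-check. The sketch below assumes a NAT gateway data-processing price of $0.045 per GB (an assumed list price; it varies by region) and treats 1 TB as 1,000 GB. Gateway VPC endpoints for S3 carry no data-processing charge, which is why the saving equals the full processing cost.

```python
# Assumed NAT gateway data-processing price in USD per GB; varies by region.
NAT_PROCESSING_PRICE_PER_GB = 0.045


def nat_processing_cost(traffic_tb: float, price_per_gb: float = NAT_PROCESSING_PRICE_PER_GB) -> float:
    """Monthly data-processing charge for traffic routed through a NAT gateway."""
    return traffic_tb * 1000 * price_per_gb


monthly_saving = nat_processing_cost(2)
print(f"${monthly_saving:.2f} per month")
```

Being able to reconstruct the number from the unit price is what makes a cost anecdote believable in an interview.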

What Should You Say If You Built Something Hands-On?

Talk about ownership, not just participation. "I worked on a team that used AWS" is weak. "I owned the infrastructure for a service that processed 50k events per day. I designed the SQS-to-Lambda pipeline, set up CloudWatch alarms for DLQ depth, and reduced our processing latency by 60% by switching from a self-managed polling consumer to a Lambda event source mapping" is what the interviewer is listening for. Name the service, the problem it solved, what you measured, and what you changed.

Why Strong Candidates Still Sound Memorized

What Does the Template Answer Get Wrong?

Templates help anxious candidates get words out. That's their only real value. The problem is that templates produce answers with the right shape and no substance — every answer starts with "great question, so what we did was..." and ends with a vague positive outcome. After three or four of these, the interviewer knows the candidate is running a script, not recalling a real project.

Why Do Service Dumps Feel Convincing but Fail Follow-ups?

Listing five services in a single answer sounds comprehensive. It collapses immediately when the interviewer asks "why did you choose SQS over Kinesis for that?" and the candidate can't answer because they listed SQS because it sounded right, not because they made a deliberate choice. The habit of naming services without explaining the selection criterion is the single most common pattern in AWS interview feedback from engineering hiring managers — it reads as someone who studied the service catalog, not someone who made architecture decisions.

What Separates Someone Who Studied From Someone Who Shipped?

Not effort. Not intelligence. The candidate who studied hard and still sounds memorized prepared for the wrong version of the test. They memorized what services do. The interview is testing whether they can reason about why they'd choose one, what would go wrong, and how they'd know. That's a different preparation target entirely — and it's why the four-part answer pattern (context, choice, trade-off, result) matters more than any individual service definition.

How Verve AI Can Help You Prepare for Your Interview With AWS

The structural problem this article just described — that candidates know the services but can't reconstruct a coherent decision narrative under live pressure — doesn't get solved by reading more documentation. It gets solved by practicing the actual answer pattern out loud, getting feedback on where you slipped into listing instead of reasoning, and doing it again with a different follow-up question.

That's the specific gap Verve AI Interview Copilot is built to close. It listens in real time to your answer as you give it, tracks whether you're hitting the decision-reasoning pattern or drifting into a service dump, and responds to what you actually said, not a canned prompt. If you say "we used SQS" and stop there, Verve AI Interview Copilot will push the follow-up: why SQS, what was the alternative, what would have broken if you'd chosen differently. That's the pressure that exposes whether you actually own the answer.

Verve AI Interview Copilot runs mock interviews across the exact question types covered here — IAM scenarios, VPC troubleshooting, database trade-offs, open-ended architecture design — and stays invisible while doing it, so the practice environment feels like the real thing. The goal isn't to give you better scripts. It's to build the live reasoning habit that scripts can't replicate.

Conclusion

The shift this article is asking you to make is small in description and hard in practice: stop rehearsing AWS facts and start answering like someone who has owned the choice, lived through the failure, and can explain both in plain English. The candidates who do this consistently aren't smarter or more experienced — they've just practiced the right version of the question.

Pick three questions from the sections above — one IAM scenario, one troubleshooting flow, one trade-off question. Answer each one out loud using the context-choice-trade-off-result pattern. Record yourself if you can. Listen for the moments where you drift into listing service names without connecting them to a decision. Those are the gaps. Tighten them, and you'll sound less like a glossary and more like someone the team actually wants to hire.

Casey Rivera

Interview Guidance