A ranked list of the 20 HTML5 interview questions you’re most likely to get first, with interview-ready answers, likely follow-ups, and the practical details.
You don't need to memorize every HTML attribute that's ever existed. You need the HTML5 interview questions that actually show up in screening rounds, and you need to be able to answer them cleanly before the follow-up lands. That's the whole game at the junior and returning-developer level — not encyclopedic recall, but confident, specific answers that don't fall apart when the interviewer pushes one level deeper.
This list is ordered by how often each question appears first, not by how impressive it sounds to know. Work through it in sequence and you'll cover the ground that matters most in the time you actually have.
Use This List Like a 30-Minute Cram Sheet, Not a Textbook
How I prioritized the top 20
The ordering here comes from three overlapping signals: the questions that appear most often in junior frontend screening templates, the follow-up chains that interviewers reliably pull on, and the topic clusters that keep surfacing in frontend job descriptions across the stack. HTML5 interview prep resources tend to dump 50+ questions in alphabetical order, which is useless when you have an hour before a call. This list treats study order as a product decision — the questions at the top are the ones that block you from passing a screen if you blank on them, not the ones that are technically interesting.
According to MDN Web Docs, the features that define HTML5 — semantic elements, native media, form enhancements, and browser APIs — are also the features that appear most often in real frontend work, which is exactly why interviewers lean on them. The questions aren't arbitrary; they map to what you'll actually write.
How to use the answers without sounding memorized
Read the answer once to get the idea. Then close the tab and say it out loud in your own words. The goal isn't to recite the definition — it's to own the concept well enough to handle the follow-up. For each item below, the follow-up is the real test. If you can answer the main question and then explain why it matters or when you'd use it differently, you already sound more senior than most candidates at your level.
The 20 HTML5 Interview Questions You're Most Likely to Get First
These are the HTML5 interview questions that show up earliest and most often in frontend screens. Work through them in order.
1. What is HTML5?
HTML5 is the current standard for structuring content on the web. It introduced semantic elements, native audio and video support, form enhancements, and browser APIs like localStorage and Geolocation — all things that previously required plugins or JavaScript workarounds. The follow-up is almost always "what changed from older HTML?" — so have a one-sentence answer ready: the doctype got simpler, layout tags got meaningful, and the browser can now do things that used to need Flash or jQuery hacks.
2. What is the difference between HTML and HTML5?
The old answer — "HTML5 is the newer version" — is technically correct and completely useless in an interview. The real distinction is what HTML5 added: semantic structural tags like `header`, `main`, and `footer`; native `<audio>` and `<video>` without plugins; input types like `email`, `date`, and `number` with built-in validation; and APIs that let the browser handle geolocation, offline storage, and canvas drawing. A modern news page using `<article>` for each story, `<nav>` for the menu, and `<aside>` for related links is a better example than any definition.
3. What does the HTML5 doctype do?
`<!DOCTYPE html>` tells the browser to render the page in standards mode rather than quirks mode. Quirks mode is a legacy behavior where browsers imitate old, buggy rendering to support pages written before web standards were widely followed, and you don't want it. The old doctypes were long and referenced a DTD because they were tied to SGML validation rules. HTML5 dropped that dependency entirely, which is why the doctype is now just two words. The follow-up is usually "what happens without it?" The answer is inconsistent rendering across browsers, especially in older IE.
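A minimal sketch of a standards-mode page may make the doctype answer easier to demonstrate. The title text and the `mode` id are placeholders chosen for illustration; `document.compatMode` is the standard property that reports the rendering mode.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Standards mode check</title>
  </head>
  <body>
    <p id="mode"></p>
    <script>
      // "CSS1Compat" means standards mode; "BackCompat" means quirks mode.
      document.getElementById("mode").textContent = document.compatMode;
    </script>
  </body>
</html>
```

Deleting the first line and reloading should flip the reported value to "BackCompat", which is a quick way to show an interviewer you know what the doctype actually controls.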
4. What are semantic elements in HTML5?
Semantic elements are tags that describe the meaning of their content, not just how it looks. `<header>`, `<nav>`, `<main>`, `<article>`, `<section>`, `<aside>`, and `<footer>` tell both the browser and a screen reader what role each block plays on the page. The follow-up interviewers love is "why does that matter beyond cleaner code?" — and the real answer is accessibility and SEO. Screen readers use document structure to build navigation menus for users who can't see the page. Search engines use it to understand what content is primary. A `<div>` soup technically renders, but it communicates nothing.
5. When should you use section, article, and div?
`<article>` is for self-contained content that makes sense on its own: a blog post, a news story, a product card. `<section>` is for thematically grouped content that belongs to a larger whole, like chapters in a document or tabs in a UI. `<div>` is for when you need a container for styling or scripting and the content has no inherent semantic meaning. On a blog page: each post is an `<article>`, the sidebar is an `<aside>`, and the layout wrapper around everything is a `<div>`. The follow-up is usually about headings and the document outline. Older drafts of the spec described an outline algorithm in which each `<section>` opened a new outline scope, but browsers never implemented it and it has since been removed from the HTML Living Standard, so heading levels (`<h1>` to `<h6>`) should reflect the real nesting of your content.
6. What is the difference between id and class?
An `id` is unique — one element per page. A `class` can be shared across as many elements as you want. In CSS, `id` carries higher specificity, which makes it harder to override and a liability in component-heavy codebases. In markup it's fine to use `id` for anchor links or JavaScript hooks, but using it for styling in a design system is a smell because you can't reuse it and you'll fight the cascade. The follow-up is almost always about specificity — know that an `id` outweighs any number of classes in the cascade.
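The specificity point can be sketched in a few lines. The selector names (`promo`, `card`, `featured`, `highlight`) are made up for this example.

```html
<style>
  /* Specificity (1,0,0): a single id selector */
  #promo { color: green; }

  /* Specificity (0,3,0): three classes still lose to the one id above,
     even though this rule comes later in the stylesheet */
  .card.featured.highlight { color: red; }
</style>

<p id="promo" class="card featured highlight">This text renders green.</p>
```

This is the reason design systems tend to style with classes only: an id-based rule can only be overridden by another id, an inline style, or `!important`.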
7. What is the purpose of the alt attribute on images?
`alt` provides a text alternative for images when they can't be displayed — whether because the image failed to load, the user is on a screen reader, or a search engine is indexing the page. An empty `alt=""` tells assistive technology the image is decorative and should be skipped. A missing `alt` attribute is a WCAG failure and will come up if the interviewer asks about accessibility. The follow-up is usually "when would you leave alt empty?" — decorative images that add no information, like a background texture or a divider graphic.
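As a rough illustration of the two cases, assuming hypothetical image files:

```html
<!-- Informative image: the alt text says what the image conveys -->
<img src="sales-chart.png" alt="Bar chart of quarterly sales for the year">

<!-- Decorative image: empty alt tells assistive technology to skip it -->
<img src="divider.png" alt="">
```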
8. What is the difference between `<b>` and `<strong>`, and `<i>` and `<em>`?
`<b>` and `<i>` carry no extra importance or emphasis; browsers simply render them bold and italic by default. `<strong>` signals importance; `<em>` signals stress emphasis. Screen readers may change their tone for `<em>` and `<strong>`, though support is inconsistent. In practice, you'll almost always want the semantic versions because they communicate intent to browsers and assistive tools, not just visual renderers. The follow-up is whether you'd ever use `<b>` or `<i>`. Yes, for stylistic offset like a product name or a technical term, where you're not implying importance.
9. What are data attributes?
`data-*` attributes let you store custom data on HTML elements without using non-standard attributes or abusing `class`. A carousel component might use `data-slide-index="3"` to track state without touching JavaScript variables directly. They're accessible via `element.dataset` in JavaScript. The follow-up is usually about when not to use them — they're not a substitute for application state management and shouldn't hold sensitive data since they're visible in the DOM.
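A small sketch of the carousel idea, with made-up slide content. The one detail worth remembering is that `dataset` exposes `data-slide-index` as the camelCase key `slideIndex`, and the value is always a string.

```html
<ul id="carousel">
  <li data-slide-index="0">First slide</li>
  <li data-slide-index="1">Second slide</li>
  <li data-slide-index="2">Third slide</li>
  <li data-slide-index="3">Fourth slide</li>
</ul>

<script>
  const slide = document.querySelector('[data-slide-index="3"]');

  // data-slide-index becomes dataset.slideIndex; convert the string to a number.
  const index = Number(slide.dataset.slideIndex);

  // Writing to dataset updates the attribute in the DOM as well.
  slide.dataset.active = "true"; // adds data-active="true"
</script>
```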
10. What is the difference between `<script>`, `<script defer>`, and `<script async>`?
A plain `<script>` tag blocks HTML parsing while the script downloads and executes. `defer` downloads in parallel and executes after the document is parsed — preserving execution order. `async` downloads in parallel and executes as soon as it's ready, regardless of document state, which means order is not guaranteed. For most page scripts, `defer` is the right default. `async` is for independent third-party scripts like analytics where execution order doesn't matter. This question comes up in performance discussions and is a common follow-up after any question about page load.
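Side by side, the three loading modes look like this. The file names are placeholders.

```html
<head>
  <!-- Blocking: parsing stops until this downloads and runs -->
  <script src="legacy.js"></script>

  <!-- Deferred: both download in parallel with parsing, then run
       in document order (vendor.js before app.js) once parsing finishes -->
  <script defer src="vendor.js"></script>
  <script defer src="app.js"></script>

  <!-- Async: runs as soon as it arrives, with no ordering guarantee -->
  <script async src="analytics.js"></script>
</head>
```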
11. What is the purpose of the `<meta>` viewport tag?
`<meta name="viewport" content="width=device-width, initial-scale=1">` tells mobile browsers not to scale the page down to fit a desktop layout. Without it, a mobile browser will render the page as if it's 980px wide and then zoom out, making everything tiny. It's the foundational line for responsive design. The follow-up is usually about what other viewport values do — `maximum-scale` can prevent user zoom, which is an accessibility problem and something interviewers will probe if you mention it.
12. What is the difference between `<link>` and `<a>`?
`<link>` is a metadata element in `<head>` that defines relationships between the document and external resources — stylesheets, preloaded fonts, canonical URLs. `<a>` is a content element that creates a hyperlink for users to click. They're not interchangeable. A common follow-up is about `rel` values on `<a>` — specifically `rel="noopener noreferrer"` on links that open in a new tab, which prevents the target page from accessing the opener's window object.
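A short sketch of the contrast, using placeholder URLs on example domains:

```html
<head>
  <!-- <link>: document-level relationships, never rendered as clickable content -->
  <link rel="stylesheet" href="styles.css">
  <link rel="canonical" href="https://example.com/article">
</head>
<body>
  <!-- <a>: a hyperlink the user can activate -->
  <a href="https://example.org" target="_blank" rel="noopener noreferrer">
    External site
  </a>
</body>
```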
13. What is the `<canvas>` element?
`<canvas>` provides a bitmap drawing surface controlled entirely by JavaScript. It has no DOM representation of what's drawn — once pixels are on the canvas, they're just pixels. It's the right tool for games, real-time data visualization, image manipulation, and anything that requires frequent redraws at high frame rates. The follow-up is always the comparison with SVG — know that canvas is immediate-mode (draw and forget) while SVG is retained-mode (shapes stay in the DOM and can be styled and queried).
14. What is the difference between localStorage and sessionStorage?
Both store key-value pairs (as strings) in the browser. Neither has a built-in expiration, and unlike cookies, neither is sent to the server with HTTP requests. The difference is persistence: `localStorage` survives closing the browser and persists until explicitly cleared. `sessionStorage` is scoped to the tab and cleared when the tab closes. Neither is appropriate for sensitive data, because both are readable by any JavaScript on the page, which makes them vulnerable to XSS. The follow-up is usually "so where do you store auth tokens?" The short answer is httpOnly cookies, which JavaScript can't read.
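The two stores share the same API, which a brief sketch can show. The keys (`theme`, `draft`) are invented for illustration.

```html
<script>
  // Persists across browser restarts until removed.
  localStorage.setItem("theme", "dark");

  // Lives only as long as this tab. Values are strings, so objects
  // need to be serialized on the way in and parsed on the way out.
  sessionStorage.setItem("draft", JSON.stringify({ title: "Untitled" }));
  const draft = JSON.parse(sessionStorage.getItem("draft"));

  // getItem returns null for a missing key rather than throwing.
  const missing = localStorage.getItem("does-not-exist"); // null

  localStorage.removeItem("theme");
</script>
```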
15. What are the new form input types in HTML5?
The ones worth knowing for an interview: `email`, `url`, `tel`, `number`, `range`, `date`, `time`, `color`, and `search`. Older types like `checkbox`, `radio`, and `file` predate HTML5, so don't list them as new. The browser provides built-in validation or mobile-optimized keyboards for most of the new types: `email` triggers an email keyboard on iOS, `date` opens a date picker on Chrome. The follow-up is almost always "does that replace server-side validation?" No, never. Client-side validation is a UX convenience; server-side validation is the actual security boundary.
16. What do the required, pattern, and autocomplete attributes do?
`required` prevents form submission if the field is empty. `pattern` takes a regex and validates the input against it before submission. `autocomplete` hints to the browser what kind of data the field expects — `autocomplete="email"` helps password managers and autofill. These are UX and accessibility tools, not security tools. The follow-up is usually about `novalidate` on the `<form>` element — it disables all HTML5 validation, which you'd use when you're handling validation entirely in JavaScript and don't want the browser to interfere.
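Here is a hedged sketch of how these attributes combine on one form. The endpoint `/signup`, the field names, and the five-digit ZIP pattern are assumptions made for the example.

```html
<form action="/signup" method="post">
  <label for="email">Email</label>
  <input id="email" name="email" type="email"
         autocomplete="email" required>

  <label for="zip">ZIP code</label>
  <!-- pattern must match the whole value, so no ^ or $ is needed -->
  <input id="zip" name="zip" type="text" inputmode="numeric"
         pattern="[0-9]{5}" title="Five digits"
         autocomplete="postal-code">

  <label for="qty">Quantity</label>
  <input id="qty" name="qty" type="number" min="1" max="10" value="1">

  <button type="submit">Sign up</button>
</form>
```

Adding `novalidate` to the `<form>` tag would switch off all of these browser checks, so the server still has to validate every field.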
17. What is the `<picture>` element for?
`<picture>` lets you serve different image sources based on media queries or format support. A `<source>` element inside `<picture>` can specify a WebP version for browsers that support it, with a JPEG fallback in the `<img>` tag. It also handles art direction — serving a cropped portrait on mobile and a wide landscape on desktop. The follow-up is usually about `srcset` and `sizes` on `<img>`, which handles resolution switching without art direction. Know the difference: `<picture>` for format or composition changes, `srcset` for the same image at different resolutions.
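Both techniques can be sketched together. Image file names and the 600px breakpoint are placeholders.

```html
<!-- <picture>: art direction plus format fallback -->
<picture>
  <!-- Narrow viewports get the cropped portrait, WebP first -->
  <source media="(max-width: 600px)" srcset="hero-portrait.webp" type="image/webp">
  <source media="(max-width: 600px)" srcset="hero-portrait.jpg">
  <!-- Everything else gets the wide version, WebP if supported -->
  <source srcset="hero-wide.webp" type="image/webp">
  <!-- The <img> is required: it is the final fallback and carries the alt text -->
  <img src="hero-wide.jpg" alt="Team working around a whiteboard">
</picture>

<!-- srcset + sizes: same image, different resolutions -->
<img src="photo-800.jpg"
     srcset="photo-400.jpg 400w, photo-800.jpg 800w, photo-1600.jpg 1600w"
     sizes="(max-width: 600px) 100vw, 50vw"
     alt="Mountain lake at sunrise">
```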
18. What is the Geolocation API?
The Geolocation API lets a page request the user's physical location via `navigator.geolocation.getCurrentPosition()`. It requires explicit user permission — the browser shows a permission prompt, and if the user denies it, you get an error callback. Real use cases: centering a map, finding nearby stores, tagging a photo. The follow-up is about privacy and fallbacks — always handle the denial case gracefully, and never assume geolocation will succeed.
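A minimal sketch of the permission flow with the denial case handled. The element ids and message strings are made up for illustration.

```html
<button id="locate" type="button">Find stores near me</button>
<p id="geo-output" role="status"></p>

<script>
  const output = document.getElementById("geo-output");

  document.getElementById("locate").addEventListener("click", () => {
    if (!("geolocation" in navigator)) {
      output.textContent = "Location is not available in this browser.";
      return;
    }

    navigator.geolocation.getCurrentPosition(
      (position) => {
        const { latitude, longitude } = position.coords;
        output.textContent = `You are near ${latitude}, ${longitude}.`;
      },
      (error) => {
        // Denial is common, so offer a fallback instead of failing silently.
        output.textContent =
          error.code === error.PERMISSION_DENIED
            ? "Location denied. Enter a postal code instead."
            : "Could not determine your location.";
      },
      { timeout: 10000 } // give up after 10 seconds
    );
  });
</script>
```

Requesting location in response to a click, rather than on page load, tends to get fewer denials because the user understands why the prompt appeared.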
19. What is the role of ARIA in HTML?
ARIA (Accessible Rich Internet Applications) attributes add semantic meaning to elements that HTML alone can't describe — particularly custom interactive components. `aria-label` gives an element an accessible name when visible text isn't available. `aria-expanded` signals whether a dropdown is open. `role` overrides the implicit role of an element. The follow-up is "when shouldn't you use ARIA?" — the first rule of ARIA is to use native HTML instead wherever possible. A `<button>` is always better than a `<div role="button">` because it comes with keyboard focus and click behavior built in.
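One way to illustrate "native first, ARIA for state" is a disclosure button. The ids and link targets below are hypothetical.

```html
<!-- A native button already has focus and keyboard activation;
     ARIA only adds the state that HTML cannot express. -->
<button id="menu-toggle" type="button"
        aria-expanded="false" aria-controls="menu-list">
  Menu
</button>
<ul id="menu-list" hidden>
  <li><a href="/account">Account</a></li>
  <li><a href="/logout">Log out</a></li>
</ul>

<script>
  const menuToggle = document.getElementById("menu-toggle");
  const menuList = document.getElementById("menu-list");

  menuToggle.addEventListener("click", () => {
    const isOpen = menuToggle.getAttribute("aria-expanded") === "true";
    // Keep the announced state and the visible state in sync.
    menuToggle.setAttribute("aria-expanded", String(!isOpen));
    menuList.hidden = isOpen;
  });
</script>
```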
20. What is Subresource Integrity?
Subresource Integrity (SRI) lets you include an `integrity` attribute on `<script>` and `<link>` tags with a cryptographic hash of the expected file content. If the CDN serving the file is compromised and the content changes, the browser refuses to load it. It's a supply chain security control. The follow-up is about when you'd use it: primarily for third-party scripts loaded from external CDNs, not for your own bundled assets served from the same origin.
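In markup, SRI looks roughly like the sketch below. The CDN URL is a placeholder and the `integrity` value is deliberately not a real hash; you would generate the actual value from the exact file you intend to load.

```html
<!-- Placeholder hash: replace it with the base64 digest of the real file,
     for example from:
     openssl dgst -sha384 -binary library.min.js | openssl base64 -A -->
<script src="https://cdn.example.com/library.min.js"
        integrity="sha384-REPLACE_WITH_BASE64_HASH_OF_THE_FILE"
        crossorigin="anonymous"></script>
```

The `crossorigin` attribute matters here: for a cross-origin file, the browser can only verify the hash if the resource is fetched with CORS.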
Explain HTML vs HTML5 Without Sounding Like You're Reading the Docs
Why interviewers keep asking this one
This is a diagnostic question. Interviewers aren't testing whether you know the version number — they're checking whether you understand modern browser markup well enough to build with it. A candidate who says "HTML5 added new tags" is giving a vocabulary answer. A candidate who says "HTML5 gave us semantic layout elements, native media, form validation, and browser APIs that replaced a lot of what used to need JavaScript or plugins" is giving a working answer. The follow-up is almost always "can you give me an example?" — so have one ready.
How to answer it in one minute
Here's how I'd say it out loud: "HTML5 is the version of HTML that became a W3C Recommendation in 2014, and the language is now maintained as a living standard. The biggest practical changes were semantic structural tags like `<main>`, `<article>`, and `<nav>` that replaced generic divs; native `<audio>` and `<video>` that removed the Flash dependency; input types and validation attributes that gave forms real browser-level behavior; and APIs like localStorage and Geolocation that let the browser do things that used to need server round-trips or third-party libraries. The doctype also got simplified to just `<!DOCTYPE html>`, which triggers standards mode without the old SGML boilerplate." That's about 45 seconds. It covers the real changes, uses concrete examples, and doesn't trail off into version history.
According to the WHATWG HTML Living Standard, HTML is now maintained as a living standard rather than versioned releases — which is worth mentioning if the interviewer asks about "HTML6" or what comes next.
Know the Semantic Tags Interviewers Expect You to Use Without Thinking
Which tag goes where: header, nav, main, article, section, aside, footer
Think of a news website. The masthead with the logo and site navigation is `<header>`. The navigation links are `<nav>`. The primary content area — the feed of stories — is `<main>`. Each individual story is an `<article>` because it makes sense read on its own. A group of related stories under a topic heading is a `<section>`. The sidebar with trending topics or ads is `<aside>`. The copyright block at the bottom is `<footer>`. These aren't rigid rules, but that mental model covers 90% of interview questions about layout. `<header>` and `<footer>` can also appear inside `<article>` elements — they're not limited to the page level.
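The news-site mental model above could be sketched like this. The site name, headings, and links are invented for the example.

```html
<body>
  <header>
    <a href="/">The Daily Example</a>
    <nav aria-label="Primary">
      <a href="/world">World</a>
      <a href="/tech">Tech</a>
    </nav>
  </header>

  <main>
    <section aria-labelledby="tech-heading">
      <h1 id="tech-heading">Technology</h1>

      <article>
        <!-- header and footer scoped to this story, not the page -->
        <header>
          <h2>Story headline</h2>
          <p>By A. Reporter</p>
        </header>
        <p>Story text goes here.</p>
        <footer><p>Filed under Tech</p></footer>
      </article>
    </section>
  </main>

  <aside aria-label="Trending">
    <h2>Trending topics</h2>
  </aside>

  <footer>
    <p>Copyright notice</p>
  </footer>
</body>
```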
Why semantics matter for accessibility and maintenance
A screen reader user on a div-only page gets no structural landmarks to navigate by. With semantic HTML, the browser exposes a landmarks list (banner, navigation, main, complementary, contentinfo) that lets screen reader users jump directly to the section they want, the same way sighted users scan visually. That's not a nice-to-have; exposing structure programmatically is what WCAG 2.1 success criterion 1.3.1 (Info and Relationships) asks for. On the maintenance side, semantic markup is easier to debug in DevTools, easier for new team members to read, and more resilient to CSS refactors because the structure communicates intent independently of the styles.
The mistakes that make you sound junior
Three patterns come up in code reviews constantly. First: using `<section>` as a generic wrapper when `<div>` is correct. `<section>` implies a thematic grouping with a heading, not just a visual block. Second: wrapping every piece of content in `<article>` because it sounds more semantic. `<article>` means the content is independently distributable, not just "important." Third: adding a second `<nav>` in the `<footer>` without labeling it. Footer links are often a legitimate `<nav>`, but repeating the landmark without distinguishing each one via `aria-label` creates confusion for screen reader users. Interviewers who have done real code reviews will probe exactly these edge cases.
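For the navigation case, a sketch of two landmarks that a screen reader can tell apart. The labels "Primary" and "Footer" are one reasonable choice, not a required convention.

```html
<header>
  <nav aria-label="Primary">
    <a href="/">Home</a>
    <a href="/products">Products</a>
  </nav>
</header>

<footer>
  <!-- Same element, different label, so the landmarks list shows
       "Primary navigation" and "Footer navigation" -->
  <nav aria-label="Footer">
    <a href="/privacy">Privacy</a>
    <a href="/contact">Contact</a>
  </nav>
</footer>
```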
Forms Are Where Easy Answers Fall Apart
Which input types are actually worth memorizing?
For interviews and real product work: `email`, `url`, `tel`, `number`, `date`, `range`, `color`, `search`, `checkbox`, `radio`, and `file`. The reason these matter in interviews is that browsers do real work for you automatically: `email` validates format on submit, `date` renders a picker without JavaScript, `number` adds increment controls. The follow-up is usually "what does the browser give you for free with these?" The answer is validation, mobile keyboard optimization, and an implicit ARIA role for most of them. Labeling is still your job, so pair every input with a `<label>`.
What do required, pattern, minlength, maxlength, and autocomplete really do?
These are the validation and UX layer that lives in the HTML before JavaScript touches anything. `required` blocks submission on an empty field. `pattern` runs a regex check — useful for things like postal codes or phone formats. `minlength` and `maxlength` constrain string length. `autocomplete` tells the browser what kind of data to suggest, which matters for password managers and mobile autofill. None of these replace server-side validation. A user can open DevTools, remove the `required` attribute, and submit whatever they want. The browser layer is for UX; the server is the actual gatekeeper.
Why fieldset and legend matter more than people think
Group related inputs inside a `<fieldset>` and label the group with `<legend>`. For a shipping address form, that means one `<fieldset>` with a `<legend>` of "Shipping Address" wrapping the street, city, state, and zip fields. For a payment form, a separate `<fieldset>` for card details. Screen readers typically announce the legend when the user enters the group (some repeat it for each field), so a user hears "Shipping Address, Street" rather than just "Street", which matters when the same field label appears in multiple contexts. Most candidates skip this entirely, which is exactly why interviewers ask about it.
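The grouping described above might look like this in markup. The `/checkout` endpoint and the field ids are assumptions; the `autocomplete` tokens are standard values.

```html
<form action="/checkout" method="post">
  <fieldset>
    <legend>Shipping Address</legend>

    <label for="ship-street">Street</label>
    <input id="ship-street" name="ship-street"
           autocomplete="shipping street-address">

    <label for="ship-city">City</label>
    <input id="ship-city" name="ship-city"
           autocomplete="shipping address-level2">

    <label for="ship-zip">ZIP code</label>
    <input id="ship-zip" name="ship-zip"
           autocomplete="shipping postal-code">
  </fieldset>

  <fieldset>
    <legend>Card details</legend>

    <label for="cc-number">Card number</label>
    <input id="cc-number" name="cc-number"
           inputmode="numeric" autocomplete="cc-number">
  </fieldset>

  <button type="submit">Place order</button>
</form>
```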
Media and Graphics Questions Are Simpler When You Know What Each Tool Is For
When do you use audio and video instead of custom embeds?
Native `<audio>` and `<video>` give you built-in browser controls, keyboard accessibility, and no plugin dependency. The `<source>` element inside each lets you provide multiple formats (WebM and MP4 for video, OGG and MP3 for audio) so the browser picks what it supports. The `<track>` element adds captions and subtitles. The follow-up is usually about autoplay: browsers block autoplay with sound by default, and you need the `muted` attribute to autoplay video reliably. As MDN's media element documentation describes, the `controls` attribute exposes the browser's built-in playback UI; captions still have to come from `<track>`.
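A sketch of both situations, with placeholder media files:

```html
<!-- Main content video: user-controlled, multiple formats, captions -->
<video controls width="640" poster="talk-poster.jpg">
  <source src="talk.webm" type="video/webm">
  <source src="talk.mp4" type="video/mp4">
  <track kind="captions" src="talk.en.vtt" srclang="en" label="English" default>
  <!-- Shown only by browsers that do not support <video> at all -->
  <p><a href="talk.mp4">Download the video</a></p>
</video>

<!-- Background video: muted is what allows autoplay to start;
     playsinline keeps iOS from forcing fullscreen -->
<video autoplay muted loop playsinline src="background.mp4"></video>
```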
Canvas vs SVG: which one should you reach for?
Canvas is the right tool when you need pixel-level control and high-frequency redraws — games, real-time charts, image filters, generative art. Everything drawn to canvas is immediately rasterized; there's no DOM to query or style. SVG is the right tool for scalable graphics that need to stay crisp at any resolution, respond to CSS, or be accessible — icons, logos, data visualizations where individual elements need hover states or click handlers. A chart library that animates thousands of data points per second wants canvas. An icon system that needs to scale from 16px to 512px wants SVG.
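Drawing the same square both ways makes the immediate-mode versus retained-mode difference concrete. The ids and colors are arbitrary.

```html
<canvas id="chart" width="200" height="100"></canvas>

<svg width="200" height="100" viewBox="0 0 200 100"
     role="img" aria-label="Blue square">
  <rect id="box" x="20" y="20" width="60" height="60" fill="steelblue" />
</svg>

<script>
  // Canvas: draw and forget. After fillRect there is no "square" object,
  // only pixels, so hit testing and redraws are up to your own code.
  const ctx = document.getElementById("chart").getContext("2d");
  ctx.fillStyle = "steelblue";
  ctx.fillRect(20, 20, 60, 60);

  // SVG: the rect is a DOM node, so it can take events and be restyled.
  const box = document.getElementById("box");
  box.addEventListener("click", () => {
    box.setAttribute("fill", "tomato");
  });
</script>
```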
What is the picture element actually solving?
`<picture>` solves two problems that `<img>` alone can't handle cleanly: format fallback and art direction. For format fallback, you offer a WebP source first and a JPEG source as the fallback — the browser picks the best format it supports. For art direction, you serve a tightly cropped portrait image on mobile and a wide landscape version on desktop using `media` attributes on `<source>`. The browser makes the choice; you just define the options. The follow-up is about `srcset` on `<img>` — that's for serving the same image at different resolutions, not different compositions.
Storage and Browser APIs Are the Questions That Start Sounding More Practical
localStorage vs sessionStorage: what's the difference?
Both store string key-value pairs in the browser. Neither has a built-in expiration, and unlike cookies, neither is sent with HTTP requests. `localStorage` persists across sessions: close the browser, reopen it, the data is still there. `sessionStorage` is scoped to the tab and cleared when the tab closes. Neither should hold sensitive data like tokens or PII because any JavaScript running on the page can read them, making both vulnerable to XSS attacks. The follow-up is almost always "so where do you put auth tokens?" The answer is httpOnly cookies, which are inaccessible to JavaScript entirely.
What is the Geolocation API used for?
Real use cases: centering a map on the user's location, finding the nearest store or service, tagging content with a location, routing and navigation. The API requires explicit permission — `navigator.geolocation.getCurrentPosition()` triggers a browser prompt, and you must handle both the success callback and the error callback. The error case isn't rare; users deny location access regularly, especially on desktop. The follow-up is about accuracy — GPS gives you precise coordinates on mobile, but IP-based fallback on desktop can be off by miles.
Which HTML5 APIs do candidates get asked about next?
After localStorage and Geolocation, the most common adjacent questions are about the History API (`pushState` and `replaceState` for client-side routing), the Fetch API as the modern replacement for XMLHttpRequest, Web Workers for off-thread computation, and the Intersection Observer API for lazy loading and scroll-triggered animations. These questions are usually framed as "have you used X?" rather than "explain X in detail" — the interviewer is probing whether you've worked beyond basic markup into the browser platform. Honest, specific answers about where you've actually used them are more credible than rehearsed definitions.
The Answers That Sound Senior Are the Ones That Don't Forget Accessibility, Security, and Performance
What is an accessible name?
An accessible name is the text that assistive technology uses to identify an interactive element. For a button with visible text, the accessible name is that text. For an icon button with no visible label, you need `aria-label="Close dialog"` or `aria-labelledby` pointing to a visible element elsewhere in the DOM. The follow-up is "when should you use visible text instead of ARIA?" — always, when possible. Visible text benefits all users, not just screen reader users, and it's less fragile than ARIA attributes that can drift out of sync with the UI.
What HTML mistakes break accessibility fast?
Four patterns cause the most damage. Missing `<label>` elements on form inputs: screen readers can't announce what a field is for. Empty link text, like an `<a href="/report">` wrapping only an icon with no text alternative: the accessible name is empty. Bad heading order: jumping from `<h1>` to `<h4>` breaks the heading structure that screen reader users navigate by. And `tabindex` values above 0: they override the natural tab order and create a navigation nightmare for keyboard users. A login form with unlabeled password fields and a submit button that's a styled `<div>` will fail a basic accessibility audit immediately.
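A corrected version of that login form and icon link might look like the sketch below. The `/login` and `/report` paths and the icon path data are placeholders.

```html
<form action="/login" method="post">
  <!-- Each input has a label tied to it via for/id -->
  <label for="user">Username</label>
  <input id="user" name="user" type="text"
         autocomplete="username" required>

  <label for="pass">Password</label>
  <input id="pass" name="pass" type="password"
         autocomplete="current-password" required>

  <!-- A real button: focusable and keyboard-operable with no extra code -->
  <button type="submit">Log in</button>
</form>

<!-- Icon-only link: aria-label supplies the accessible name,
     and the decorative SVG is hidden from assistive technology -->
<a href="/report" aria-label="Download report">
  <svg aria-hidden="true" focusable="false" width="16" height="16" viewBox="0 0 16 16">
    <path d="M8 1v9M4 7l4 4 4-4M2 14h12" fill="none" stroke="currentColor" />
  </svg>
</a>
```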
Which security and performance details are worth mentioning?
For security: `rel="noopener noreferrer"` on any `<a target="_blank">` link prevents the opened page from accessing `window.opener` — a tabnabbing vector. The `sandbox` attribute on `<iframe>` restricts what the embedded content can do. Subresource Integrity (`integrity` attribute on `<script>` and `<link>`) validates that CDN-served files haven't been tampered with. For performance: `loading="lazy"` on images below the fold defers their fetch until they're near the viewport. `<link rel="preload">` tells the browser to fetch a critical resource early. `<link rel="preconnect">` warms up the DNS and TCP connection to a third-party origin before the resource is requested. `fetchpriority="high"` on the hero image nudges the browser to prioritize it over other in-flight requests. Dropping two or three of these into an answer about page performance is the detail that makes a candidate sound like they've actually shipped production pages.
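Several of these hints can be shown in one short page skeleton. All URLs and file names are placeholders, and the `sandbox` value is just one possible permission set.

```html
<head>
  <!-- Open the connection to a third-party origin before it is needed -->
  <link rel="preconnect" href="https://fonts.example.com" crossorigin>

  <!-- Fetch a critical resource early; fonts are fetched with CORS -->
  <link rel="preload" href="/fonts/body.woff2" as="font"
        type="font/woff2" crossorigin>
</head>
<body>
  <!-- Hero image: ask the browser to prioritize it -->
  <img src="hero.jpg" alt="Product on a desk" fetchpriority="high"
       width="1200" height="600">

  <!-- Below the fold: defer the fetch until it nears the viewport -->
  <img src="gallery-1.jpg" alt="Product detail" loading="lazy"
       width="600" height="400">

  <!-- Sandboxed embed: scripts allowed, but no forms, popups,
       or same-origin access unless explicitly granted -->
  <iframe src="https://widgets.example.com/embed"
          sandbox="allow-scripts" title="Embedded widget"></iframe>

  <a href="https://example.org" target="_blank" rel="noopener noreferrer">
    Partner site
  </a>
</body>
```

Explicit `width` and `height` on images are included because they let the browser reserve space before the image loads, which avoids layout shift.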
How Verve AI Can Help You Prepare for Your Frontend Engineer Interview
Knowing the right answer in a quiet room and delivering it under live interview pressure are two different skills. The structural problem isn't recall — it's that most candidates have never actually said these answers out loud to something that responds. Reading through a list gives you familiarity; being asked a follow-up you didn't expect is what exposes the gaps.
Verve AI Interview Copilot is built for exactly that gap. It listens in real-time to the live conversation — whether you're in a mock session or a real screen — reads what's actually being asked, and surfaces relevant context while you're answering. It's not a flashcard app. It responds to what you actually said, not a canned prompt, which means when the interviewer pivots from "what are semantic elements?" to "show me how you'd structure a news page," Verve AI Interview Copilot is already tracking the thread. The desktop app stays invisible during screen share at the OS level, so it's available without disrupting the interview dynamic. If you've worked through this list and want to pressure-test whether the answers hold up under follow-up, Verve AI Interview Copilot is the tool that runs mock interviews at the level of specificity that actually prepares you.
You Don't Need to Know Everything — You Need to Know These First
The fastest path through HTML5 interview prep isn't broader coverage. It's deeper confidence on the questions that come first. If you can explain the doctype, semantic layout, form validation, media elements, browser storage, and the accessibility and security details that most candidates skip, you've covered the ground that determines whether you pass a screening round.
Go back through the top 20 and say each answer out loud. Then say the follow-up out loud too. That's the practice that matters — not reading, but speaking. The candidate who can answer cleanly and then handle the push without wobbling is the one who moves forward.
James Miller
Career Coach

