Use MySQL create index fast queries decision tree to choose an index, query rewrite, or schema change after EXPLAIN shows the real bottleneck.
The ticket says "query is slow, add an index." That's the wrong starting point — and if you've ever added an index that the optimizer ignored, or watched write throughput drop after a migration, you already know why. The real question when you want to use MySQL create index for fast queries is whether the bottleneck is actually a missing index, a badly shaped SQL statement, or a schema that was never designed for this access pattern. Those three problems need different fixes, and conflating them is how you end up with a table carrying twelve indexes and still running full scans.
This guide gives you a repeatable decision tree: prove the bottleneck, read the EXPLAIN plan, decide between indexing, rewriting, or changing the schema, then validate the change safely before production traffic depends on it.
Start by proving the bottleneck is real, not just annoying
MySQL index optimization starts before you open a schema editor. The most common mistake is treating "the page is slow" as sufficient evidence that a query is the culprit. It often isn't — or it is, but not for the reason anyone assumed.
Stop guessing which layer is slow
Connection overhead, ORM hydration, N+1 query patterns, and application-level serialization all look like slow queries from the outside. Before you touch a schema, isolate the query layer from everything else. Enable the slow query log with `long_query_time` set to something honest — 100ms for OLTP, lower for latency-sensitive paths — and let it run against real traffic for at least one representative hour. What you get back is a ranked list of actual offenders, not a suspicion.
Performance Schema is more precise. The `events_statements_summary_by_digest` table gives you total execution count, cumulative latency, rows examined per execution, and rows sent. If a query runs 50,000 times per hour and examines 10,000 rows to return 1, that's the one to fix. A query that runs twice and takes two seconds is annoying but rarely the architectural problem.
Use the simplest slow query as your first test case
Pick one query, not a category of queries. A good first candidate is a `users` table filter with a `JOIN` to `orders` and an `ORDER BY created_at DESC LIMIT 20`. It's simple enough to reason about, but it exercises the three most common index-relevant patterns: selective filtering, join access order, and sort elimination. If you can make the right call on that query, you can make it on almost anything.
What this looks like in practice
A dashboard query at a SaaS company was timing out intermittently. The query filtered `orders` by `status = 'pending'` and `created_at > NOW() - INTERVAL 7 DAY`, joined to `users`, and sorted by `created_at DESC`. The team's first instinct was to add an index on `(status, created_at)`. Before they did, they checked rows examined in Performance Schema: 1.4 million rows examined, 47 returned. That ratio — 30,000:1 — is a scan problem.
But here's what the EXPLAIN showed: there was already an index on `status`. The optimizer was using it. The rows examined were high because `status = 'pending'` had terrible selectivity — 60% of rows in that table were pending. The index existed, the optimizer used it, and it still didn't help. The real fix was a partial index workaround (a generated column scoping to recent rows) and a query rewrite that added a tighter date range. No new index was the right answer.
That pattern — index present, optimizer using it, still slow — is more common than people admit. It's a signal to look at selectivity and query shape before reaching for `CREATE INDEX`.
Read the EXPLAIN plan before you touch production
The MySQL EXPLAIN plan is the single most important tool in this workflow. It tells you what the optimizer decided to do, not what you hoped it would do.
The plan tells you what the optimizer thinks is cheap
Run `EXPLAIN SELECT ...` and look at five fields: `type`, `key`, `rows`, `filtered`, and `Extra`.
`type` is the access method. From worst to best: `ALL` (full scan), `index` (full index scan), `range` (bounded scan), `ref` (non-unique lookup), `eq_ref` (unique join), `const` (single-row lookup by primary key or unique key). If you're seeing `ALL` on a large table, that's where the scan cost is coming from. `rows` is the optimizer's estimate of rows it will examine — not rows returned. `filtered` is the percentage of those rows that survive the WHERE clause. `Extra` carries critical signals: `Using filesort` means the sort can't be satisfied by the index, `Using temporary` means a temp table is being created, `Using index` means the query is covered by the index and never touches the base table.
`key` tells you which index was chosen. If it's `NULL`, no index is being used. If it's an index you expected to help but isn't listed, the optimizer rejected it — and you need to know why before adding another one.
When a full scan is fine and when it is a red flag
A full table scan on a 500-row lookup table that fits entirely in the buffer pool is not a problem. The optimizer knows this — it calculates whether the cost of a random-access index lookup exceeds the cost of a sequential scan, and for small tables it often prefers the scan. Don't fight it.
The scan becomes indefensible when the table has millions of rows, the query runs frequently, and `rows` in EXPLAIN is within an order of magnitude of the total table size. At that point, every execution is reading most of the table, and the cost compounds with traffic.
What this looks like in practice
Example 1 — Selective WHERE filter:
Before index: `type: ALL`, `rows: 2,400,000`, `Extra: Using where`. After adding `(account_id, status)`: `type: ref`, `key: idx_account_status`, `rows: 43`, `Extra: Using index condition`. Rows examined dropped by four orders of magnitude.
Example 2 — JOIN from the wrong table:
A JOIN between `events` (large) and `sessions` (small) was driving from `events`. EXPLAIN showed `type: ALL` on `events` as the outer table. Rewriting the query to hint the join order — or adding an index on `events.session_id` — changed the driving table and dropped rows examined from 800,000 to 12.
Example 3 — ORDER BY forcing filesort:
A query sorted by `updated_at DESC` but the index was on `(user_id, created_at)`. EXPLAIN showed `Extra: Using filesort`. Adding `(user_id, updated_at)` eliminated the filesort and cut query time from 340ms to 18ms on a production clone.
The official EXPLAIN output format documentation covers every field in detail. Read it once properly and you'll stop guessing at plan output.
Add an index only when the access pattern is doing real work
The decision to add index to a MySQL table is a write-side commitment, not just a read-side optimization. Every index you add is a data structure that MySQL maintains on every INSERT, UPDATE, and DELETE. That cost is real, and on write-heavy tables it compounds fast.
SELECT speed is the upside; write cost is the bill
An index on a table receiving 10,000 inserts per second adds roughly one B-tree write per index per row inserted. Three indexes means three additional write operations per insert, plus the page splits and rebalancing that come with high write volume. On InnoDB, secondary indexes also interact with the change buffer, which helps — but only for non-unique indexes and only when the index page isn't already in the buffer pool.
The write overhead shows up as increased lock contention, higher I/O on writes, and larger transaction log volume. On a table that's read 1,000 times per second but written 10,000 times per second, an index that saves 5ms on reads might cost 15ms on writes. That's a net loss.
Choose columns by selectivity, not by instinct
Selectivity is the ratio of distinct values to total rows. A column with 1,000,000 rows and 999,000 distinct values has near-perfect selectivity — a great index candidate. A `status` column with three possible values on a million-row table has terrible selectivity — an index on it alone will rarely help, because the optimizer knows it still has to read a third of the table.
The practical rule: index columns that appear in WHERE or JOIN conditions and have high cardinality. Use `SELECT COUNT(DISTINCT col) / COUNT(*) FROM table` to get a quick selectivity estimate. Anything below 5% deserves skepticism.
What this looks like in practice
A `payments` table with 8 million rows had a filter query: `WHERE customer_id = ? AND status = 'completed' ORDER BY created_at DESC LIMIT 10`.
A single-column index on `customer_id` helped but left the sort as a filesort. A single-column index on `status` was useless — 40% of rows were `completed`. A composite index on `(customer_id, status, created_at)` eliminated the filesort entirely and covered the filter with high selectivity on `customer_id`. Rows examined dropped from 14,000 to 10.
Adding a separate index on `created_at` on top of that composite would have been baggage — the optimizer would ignore it for this query shape, and it would cost write overhead on every payment insert. One well-designed composite index beat three narrow ones.
After adding that composite index to a table receiving ~2,000 inserts per minute, average insert latency increased by 0.3ms. Acceptable. On a table with 50,000 inserts per minute, the same math would require a more careful conversation.
Build composite indexes around WHERE, JOIN, ORDER BY, and pagination
MySQL index optimization for composite indexes follows one rule above all others: the leftmost prefix. The optimizer can use a composite index starting from the leftmost column and moving right — but it stops the moment it hits a gap or a range condition.
Get the column order wrong and the index feels broken
An index on `(created_at, user_id, status)` will not help a query that filters only on `user_id`. The optimizer needs `created_at` to be present in the WHERE clause to use this index at all. Put the most selective equality predicates first, range predicates after, and sort columns last — but only when they align with the query's ORDER BY direction.
The leftmost-prefix rule also means an index on `(a, b, c)` effectively gives you indexes on `(a)`, `(a, b)`, and `(a, b, c)` — but not `(b, c)` or `(c)` alone. This is why a well-designed composite index can replace multiple single-column indexes.
Do not index the whole sentence — index the path the optimizer walks
Map the query shape to index order: equality filters first, then range conditions, then the ORDER BY column if it matches the sort direction. If your query has `WHERE account_id = ? AND event_date BETWEEN ? AND ? ORDER BY event_date DESC`, the index should be `(account_id, event_date)` — `account_id` handles the equality, `event_date` handles both the range and the sort in one pass.
If you put `event_date` first, the index becomes useless for the `account_id` filter unless `event_date` is also in the WHERE clause as an equality condition.
What this looks like in practice
WHERE filter: `WHERE region_id = ? AND active = 1` → index on `(region_id, active)`. Both are equality predicates; order by selectivity (region_id is more selective).
JOIN: `events JOIN sessions ON events.session_id = sessions.id WHERE sessions.user_id = ?` → index on `events.session_id`. The join column on the inner table is the access path.
Keyset pagination: `WHERE (created_at, id) < (?, ?) ORDER BY created_at DESC, id DESC LIMIT 20` → index on `(created_at, id)`. This replaces offset-based pagination, which forces the optimizer to scan and discard rows. The composite index makes each page fetch a range scan from a known position.
One real case: a reporting query used two separate indexes — `(user_id)` and `(created_at)` — and the optimizer chose the wrong one depending on table statistics. Replacing both with `(user_id, created_at)` gave the optimizer one unambiguous path. Rows examined dropped from 180,000 to 22, and the query went from 1.2 seconds to 14ms. The MySQL documentation on multiple-column indexes explains the leftmost-prefix behavior in detail.
Rewrite the query when the index is fighting the SQL
Adding an index is the right answer when the access pattern is sound and the data volume justifies it. But sometimes the SQL itself is the problem — and no index will fix it.
Sometimes the SQL is the problem, not the index
Indexes are worth adding when the query shape allows the optimizer to use them efficiently. When the SQL wraps columns in functions, uses implicit type conversions, or applies conditions in an order that blocks index use, the right fix is to change the SQL, not add more indexes. The index exists; the query just can't reach it.
Functions on columns are the classic trap
`WHERE DATE(created_at) = '2024-01-15'` will not use an index on `created_at`. The function is applied to every row before the comparison, which means the optimizer has no way to use the B-tree structure — it has to evaluate the function on every row, which is a full scan. The same problem applies to `LOWER(email) = ?`, `YEAR(dob) = ?`, and any arithmetic like `price * 1.1 > ?`.
This is the most common case where EXPLAIN shows `type: ALL` on a table with an index that should be helping. It looks like a missing index. It isn't.
What this looks like in practice
Before (broken):
EXPLAIN: `type: ALL`, `rows: 4,200,000`, `Extra: Using where`. Index on `created_at` not used.
After (rewrite):
EXPLAIN: `type: range`, `key: idx_created_at`, `rows: 8,400`. Index used, filesort eliminated.
The rewrite costs nothing — no schema change, no migration, no downtime. One production case: an analytics query using `LOWER(email)` in a WHERE clause was scanning 6 million rows. The existing index on `email` was being ignored. Rewriting to `WHERE email = LOWER(?)` — moving the function to the parameter side — let the index work immediately. Latency dropped from 4.2 seconds to 11ms. No new index required.
Pick the right fix in MySQL 8.0: functional index, generated column, or rewrite
When a rewrite isn't possible — because the ORM generates the SQL, because the function is applied inconsistently across queries, or because you need the normalized value to be queryable in multiple contexts — MySQL 8.0 gives you two structured alternatives.
Functional indexes solve one narrow problem very well
A functional index in MySQL 8.0 lets you index an expression directly:
The optimizer will now use this index for `WHERE LOWER(email) = ?` without touching the base column. This is the right fix when the function-wrapped predicate is the only place you need the normalized value and you can't rewrite the SQL.
The limitation: functional indexes are narrow. They only help the exact expression they index. `LOWER(email)` and `LOWER(TRIM(email))` are different expressions and need different functional indexes. They also add write overhead just like regular indexes — every insert recalculates and stores the expression value.
Generated columns earn their keep when the expression is reused
A virtual generated column materializes the expression as a column that can be indexed normally:
The column appears in the schema, can be referenced in SELECT, used in multiple queries, and documented as a first-class concept. The index on it behaves like any other B-tree index. For a derived value that appears in reporting queries, API filters, and JOIN conditions, a generated column is operationally cleaner than a functional index — the team can see it, understand it, and query it directly.
What this looks like in practice
Same problem — email normalization — solved three ways:
Rewrite: Move `LOWER()` to the parameter. Works when you control the SQL. Zero schema change.
Functional index: Add `((LOWER(email)))` index. Works when you can't change the SQL. One migration, narrow scope.
Generated column + index: Add `email_lower` column and index it. Works when the normalized value is needed across multiple queries and you want it visible in the schema.
One team chose the generated column over the functional index specifically because their on-call engineers needed to be able to `SELECT email_lower` directly during incident investigation. The functional index was invisible in the schema; the generated column was self-documenting. That operational clarity was worth the slightly more complex migration. The MySQL 8.0 documentation on functional key parts and generated columns covers both approaches precisely.
Test the change safely, then prove the app got faster
The MySQL EXPLAIN plan on a staging clone is your first proof. Production latency metrics are your second. Neither one alone is sufficient.
Do not ship blind because the index name looks smart
MySQL 8.0 supports invisible indexes — indexes that exist in the schema but are not used by the optimizer unless you explicitly force them. This is the safest validation path:
Check EXPLAIN with `SET SESSION optimizer_switch = 'use_invisible_indexes=on'` to simulate the optimizer using the index. If the plan improves, make it visible. If it doesn't, drop it with no production impact.
On MySQL 5.7 or when invisible indexes aren't an option, test on a staging clone with a production data snapshot. Run the target query 1,000 times before and after the index, capture p50 and p95 latency, and check rows examined in Performance Schema. If both metrics improve, proceed.
Measure latency, not hope
The only valid success criteria are: rows examined decreased, p95 latency decreased, and the EXPLAIN plan shows the index being used. If rows examined dropped but latency didn't, check whether the query is now bottlenecked somewhere else — network, application layer, or a second slow query triggered by the first. If the plan shows the index but rows examined is still high, the index has low selectivity and you're back to the rewrite-or-schema decision.
What this looks like in practice
A safe rollout checklist:
- Capture baseline. Record p95 latency, rows examined per execution, and EXPLAIN output for the target query on staging with production-sized data.
- Create the index as invisible on the production table (or on staging first if the table is very large and ALTER time is a concern).
- Simulate the optimizer using it with `use_invisible_indexes=on` and re-run EXPLAIN. Verify `key` shows the new index and `rows` dropped significantly.
- Make the index visible and monitor p95 latency for 15 minutes under real traffic.
- Check write latency on the table for INSERT/UPDATE-heavy workloads. A 5% increase is usually acceptable; 20% is a conversation.
- Rollback trigger: if p95 latency on the target query doesn't improve within 15 minutes, or if write latency increases more than your threshold, drop the index. `ALTER TABLE ... DROP INDEX` is fast and low-risk.
Know when the new index is ignored and stop forcing it
You added the index. EXPLAIN still shows `type: ALL`. The key column is `NULL`. The optimizer is not confused — it made a decision, and there's a reason.
The optimizer is not being stubborn for fun
The most common reasons MySQL skips a new index: the index has low selectivity and the optimizer calculated that a full scan is cheaper (this is correct behavior, not a bug); the statistics are stale and the optimizer has a wrong estimate of cardinality; there are too many competing indexes and the optimizer chose a different one; or the query shape still wraps the column in a function or applies an implicit type cast that blocks index use.
Low selectivity is the most common culprit. If you add an index on a `boolean` column or a `status` field with three values on a large table, the optimizer will often ignore it because the index lookup plus random I/O to the base table costs more than a sequential scan.
Check stats before you blame the engine
Run `ANALYZE TABLE your_table` to refresh the index statistics. On large tables with skewed data distributions — where a small number of values appear far more frequently than others — the optimizer's cardinality estimates can be significantly wrong, leading it to either use an index it shouldn't or ignore one it should. After `ANALYZE TABLE`, re-run EXPLAIN and check whether the plan changed.
If the statistics look correct and the optimizer is still ignoring the index, check `SHOW INDEX FROM your_table` and look at the `Cardinality` column. A cardinality of 3 on a million-row table is a signal that this index will rarely be useful.
What this looks like in practice
One case: an index on `(tenant_id, event_type)` was added to a multi-tenant events table. EXPLAIN showed it being ignored in favor of a full scan. After `ANALYZE TABLE`, the cardinality estimate for `tenant_id` updated from 12 to 4,200. The optimizer immediately started using the index, and rows examined dropped from 2.1 million to 800. The data hadn't changed — only the statistics had.
A second case: an index on `(country_code)` was added to a `users` table where 70% of rows had `country_code = 'US'`. After `ANALYZE TABLE`, the optimizer correctly identified the low selectivity and continued ignoring the index. The right answer was to drop the index and rewrite the query to add a tighter secondary filter. Sometimes the optimizer is right, and the correct response is to stop adding indexes to that column.
FAQ
Q: How do I tell whether a slow MySQL query should be fixed with an index or rewritten instead?
Run EXPLAIN and look at whether the optimizer is using an existing index. If `type: ALL` is present and no function wraps the filtered column, a new index is likely the fix. If a function wraps the column — `DATE()`, `LOWER()`, arithmetic — the SQL is blocking index use and a rewrite is cheaper than a new index. If an index exists and is being used but rows examined is still high, the index has low selectivity and neither adding more indexes nor rewriting will help without a schema change.
Q: Which columns should I index first for a WHERE, JOIN, ORDER BY, or pagination query?
For composite indexes, put equality predicates first (most selective first among them), range predicates second, and the ORDER BY column last — but only when it matches the sort direction and follows the range column. For JOINs, index the column on the inner (driven) table that the outer table joins against. For keyset pagination, index the column pair used in the cursor condition — typically `(created_at, id)` — so each page is a range scan, not an offset scan.
Q: How can I prove with EXPLAIN that MySQL is actually using the new index?
After creating the index, run `EXPLAIN SELECT ...` and check three fields: `key` should show the new index name (not NULL), `type` should be `range`, `ref`, `eq_ref`, or `const` (not `ALL` or `index`), and `rows` should be dramatically lower than before. If `Extra` shows `Using index`, the query is covered by the index and never reads the base table — that's the best outcome. Compare rows examined in Performance Schema before and after to confirm the improvement is real, not just a plan artifact.
Q: When does a function on a column make a normal index useless, and what is the fix?
Any function applied to an indexed column in a WHERE clause prevents the optimizer from using the B-tree structure: `DATE(created_at)`, `LOWER(email)`, `YEAR(dob)`, `price * 1.1`. The optimizer has to evaluate the function per row, which forces a full scan. The fix depends on whether you control the SQL: if yes, rewrite to move the function to the parameter side (`created_at >= ? AND created_at < ?`). If no, use a MySQL 8.0 functional index on the expression or a virtual generated column that stores the normalized value.
Q: Should I use a functional index, a generated column, or a query rewrite in MySQL 8.0?
Rewrite first — it costs nothing and requires no schema change. Use a functional index when you can't rewrite the SQL and the expression is used in one specific query pattern. Use a generated column when the derived value is needed across multiple queries, when you want it visible in the schema for operational clarity, or when you need to SELECT it directly. Generated columns are more work to add but more transparent to operate.
Q: What is the real downside of adding another index to a busy production table?
Every index adds a write operation on INSERT, UPDATE, and DELETE. On write-heavy tables, this compounds into measurable latency increases and higher I/O. Indexes also consume disk space and buffer pool memory — memory that would otherwise cache data pages. On a table with many indexes, the optimizer also has more choices to evaluate, which can occasionally lead to worse plan selection when statistics are stale. The cost is real and proportional to write volume; it's not hypothetical.
Q: How can I add an index safely and verify the app got faster without creating regressions?
Use MySQL 8.0 invisible indexes: create the index as invisible, verify the plan improvement with `use_invisible_indexes=on`, then make it visible and monitor p95 latency and write latency for 15 minutes under real traffic. Set a rollback trigger before you start — if p95 doesn't improve or write latency exceeds your threshold, drop the index immediately. Capture rows examined and EXPLAIN output before and after as your proof of improvement.
Conclusion
The production decision was never "should we add an index?" It was always "what actually fixes this query with the least risk?" Those are different questions, and the second one has a structured answer: prove the bottleneck with the slow query log and Performance Schema, read the EXPLAIN plan before touching schema, choose between indexing and rewriting based on what's actually blocking the optimizer, and validate the change with invisible indexes and before-and-after latency measurements before you let production traffic depend on it.
Pick one slow query — the one that's been bothering you longest — and run it through this decision tree before you touch anything else in the schema. The answer will be clearer than you expect, and the fix will be smaller than you feared.
James Miller
Career Coach

