Strategy · 2026-06-01 · 9 min read

The technical SEO audit checklist I use for every new client

A detailed walk-through of the 40-point audit process that forms the foundation of every engagement — including what most tools miss.

Every engagement starts with an audit. Not because audits are impressive deliverables, but because you cannot prioritise what you haven't measured. The checklist below is the exact process I run — across 8 categories, 40 checks — before making a single recommendation.

Tools generate reports. This process generates a diagnosis. The difference is that a report tells you what's there; a diagnosis tells you what matters and what to do first.

An SEO audit is only as useful as its prioritisation. A list of 200 issues is not a strategy — it's noise.

Before you open any tool

The first step isn't crawling the site. It's understanding the business context: What are the primary conversion actions? Which pages drive revenue? Where does current organic traffic come from, and where does it drop off?

Get access to: Google Search Console, Google Analytics (or equivalent), the CMS, and a Cloudflare/CDN dashboard if available. An audit without GSC data is a blind audit.

The 40-point checklist

Crawl & Indexability

▸Robots.txt audit

Confirm no critical paths are disallowed. Check for wildcard rules that accidentally block CSS or JS.

▸XML sitemap validation

Sitemap submitted to GSC, no 4xx/5xx URLs included, updated automatically on publish.

▸Crawl depth analysis

Important pages reachable within 3 clicks from homepage. Deep-buried pages lose crawl priority.

▸Orphaned pages

Pages in sitemap or GSC with zero internal links. Crawlers find them, users don't.

▸Crawl budget check

Analyse server access logs (for example with Screaming Frog's Log File Analyser) to see where Googlebot actually spends its requests. Identify what fraction of crawl budget goes to low-value URLs (faceted nav, session parameters, infinite scroll).

▸noindex audit

Verify noindex isn't applied to pages that should be indexed. Check both the meta robots tag and the X-Robots-Tag HTTP header. A common cause is a staging noindex setting carried over into production.
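
The crawl depth and orphaned-page checks above can be sketched in a few lines once you have a link graph exported from a crawler. This is a minimal illustration, not a replacement for a crawl tool: the `links` graph, the `sitemap_urls` set, and every URL in them are made-up assumptions, and "orphan" here means a sitemap URL that no crawled page links to.

```python
from collections import deque


def crawl_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the homepage; depth = minimum number of clicks."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/services", "/blog"],
    "/services": ["/services/audit"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": ["/blog/post-3"],
}
# Hypothetical set of URLs listed in the XML sitemap.
sitemap_urls = {"/", "/services", "/services/audit", "/blog/post-3", "/old-landing-page"}

depths = crawl_depths(links, "/")

# Pages more than 3 clicks from the homepage.
too_deep = sorted(url for url, depth in depths.items() if depth > 3)

# Sitemap URLs that receive no internal link at all (homepage excluded).
linked = {target for targets in links.values() for target in targets}
orphans = sorted(sitemap_urls - linked - {"/"})

print(too_deep)  # ['/blog/post-3']
print(orphans)   # ['/old-landing-page']
```

Breadth-first search is used because it yields the shortest click path to each page, which is what the 3-click rule is measuring.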

Indexation & Coverage

▸GSC Coverage report

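Review all 'Excluded' URLs (labelled 'Not indexed' in the current Page indexing report). 'Crawled - currently not indexed' is the most important signal: Google fetched the page and chose not to index it, which usually points to thin or duplicate content.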

▸Canonicalisation

Self-referencing canonicals on all indexable pages. No canonical conflicts between paginated pages, parameter URLs, or www/non-www variants.

▸Duplicate content

Run a content similarity check. Internal duplicates dilute authority and confuse crawlers. Common in e-commerce (product variants, filtered pages).

▸Hreflang (if multilingual)

Correct language-region codes, reciprocal annotations, and no circular references.
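
The reciprocity part of the hreflang check can be sketched as below. It assumes you have already extracted each page's hreflang annotations into a dictionary; the `annotations` data and the example.com URLs are hypothetical.

```python
def missing_return_links(hreflang: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where the target page does not point back.

    `hreflang` maps a page URL to its annotations: {language-region code: alternate URL}.
    """
    missing = []
    for source, alternates in hreflang.items():
        for target in alternates.values():
            if target == source:
                continue  # self-referencing annotation, nothing to reciprocate
            back_links = hreflang.get(target, {})
            if source not in back_links.values():
                missing.append((source, target))
    return missing


# Hypothetical annotations: the German page forgets to reference the English one.
annotations = {
    "https://example.com/en/": {
        "en-gb": "https://example.com/en/",
        "de-de": "https://example.com/de/",
    },
    "https://example.com/de/": {
        "de-de": "https://example.com/de/",
    },
}

print(missing_return_links(annotations))
# [('https://example.com/en/', 'https://example.com/de/')]
```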

Site Architecture

▸URL structure

Clean, descriptive, lowercase, hyphenated. No unnecessary parameters, no dynamic IDs as primary URLs for important pages.

▸Internal link equity distribution

Map internal links from homepage to priority pages. High-priority pages should receive the most internal link equity.

▸Breadcrumbs

Present and marked up with BreadcrumbList schema. Supports crawlability and makes pages eligible for breadcrumb rich results in search listings.

▸Redirect chains

No chains longer than 1 hop. Every extra hop adds latency and spends a crawl request; whether it also loses a fraction of link equity is debated, but there is no upside to keeping the chain.
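
The redirect chain check can be sketched as follows, assuming the redirects have been exported from a crawl into a simple source-to-target mapping. The `redirects` data and URLs are hypothetical; a real audit would take them from the crawler's redirect report.

```python
def redirect_chain(url: str, redirects: dict[str, str], max_hops: int = 10) -> list[str]:
    """Follow a URL through a redirect map and return every URL visited, in order."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= max_hops:
        next_url = redirects[chain[-1]]
        chain.append(next_url)
        if chain.count(next_url) > 1:
            break  # redirect loop: stop once a URL repeats
    return chain


# Hypothetical redirect map: source URL -> redirect target.
redirects = {
    "http://example.com/a": "https://example.com/a",
    "https://example.com/a": "https://example.com/a/",
    "https://example.com/old": "https://example.com/new",
}

for start in ["http://example.com/a", "https://example.com/old"]:
    chain = redirect_chain(start, redirects)
    hops = len(chain) - 1
    status = "FIX: point directly at the final URL" if hops > 1 else "ok"
    print(f"{start}: {hops} hop(s), {status}")
```

The first URL takes two hops (protocol, then trailing slash) and gets flagged; the second is a single hop and passes.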

On-page & Content Signals

▸Title tag audit

Unique, under 60 characters, primary keyword front-loaded. Check for templated titles that duplicate across categories.

▸Meta description audit

Unique, under 155 characters, action-oriented. Not a ranking factor but directly impacts CTR.

▸H1 uniqueness

One H1 per page, matches searcher intent, not a verbatim copy of the title tag.

▸Heading hierarchy

Logical H2/H3/H4 structure. Broken hierarchy confuses both crawlers and LLMs extracting content structure.

▸Content depth vs. intent

Match content depth to query type. Informational queries need comprehensive coverage; transactional queries need clarity and conversion paths.

▸Content freshness signals

Published and modified dates accurate and marked up in schema. Stale dates on evergreen content weaken trust with both users and search engines.

▸Image alt text

Descriptive, not keyword-stuffed. Missing alt text on hero images hurts accessibility and image search visibility.
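
The title tag check above lends itself to a quick script once titles have been exported from a crawl. A minimal sketch, assuming a URL-to-title mapping; the `titles` data is made up and the 60-character limit is the one used in this checklist.

```python
from collections import Counter


def audit_titles(titles: dict[str, str], max_len: int = 60) -> dict[str, list[str]]:
    """Flag missing, over-long, and duplicated title tags. Returns only problem URLs."""
    normalised = {url: title.strip() for url, title in titles.items()}
    counts = Counter(title.lower() for title in normalised.values() if title)
    issues: dict[str, list[str]] = {}
    for url, title in normalised.items():
        problems = []
        if not title:
            problems.append("missing")
        else:
            if len(title) >= max_len:  # checklist rule: under 60 characters
                problems.append("too long")
            if counts[title.lower()] > 1:
                problems.append("duplicate")
        if problems:
            issues[url] = problems
    return issues


# Hypothetical crawl export: two category pages share a templated title.
titles = {
    "/shoes": "Shoes | Example Store",
    "/boots": "Shoes | Example Store",
    "/guide": "The complete guide to choosing running shoes for every surface and season",
    "/about": "About Example Store",
}

print(audit_titles(titles))
# {'/shoes': ['duplicate'], '/boots': ['duplicate'], '/guide': ['too long']}
```

The same function works for meta descriptions by passing `max_len=155`.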

Structured Data

▸Schema validation

Run all schema through Google Rich Results Test and Schema.org validator. Invalid schema is worse than no schema.

▸Entity schema (Person / Organisation)

The most commonly missed item on audits. Defines who you are to the Knowledge Graph and LLMs, and underpins generative engine optimisation (GEO).

▸Page-type schema

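Article, Product, Service, FAQPage, HowTo, BreadcrumbList, each applied correctly to the relevant page types. Note that Google has retired HowTo rich results and restricts FAQ rich results to a narrow set of sites, so treat those two as semantic markup rather than a route to rich results.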

▸LocalBusiness (if applicable)

NAP consistency between schema and Google Business Profile.
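
As an illustration of the entity schema item above, here is a minimal sketch that builds an Organisation JSON-LD snippet. Every value is a placeholder assumption (name, URLs, profiles), and the property set is a starting point rather than a complete definition. Schema.org itself uses the spelling `Organization`.

```python
import json

# Placeholder entity data: replace every value with the client's real details.
organisation = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Ltd",
    "url": "https://www.example.com/",
    "logo": "https://www.example.com/logo.png",
    # sameAs links the entity to its profiles elsewhere, which helps disambiguation.
    "sameAs": [
        "https://www.linkedin.com/company/example",
        "https://en.wikipedia.org/wiki/Example",
    ],
}

snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(organisation, indent=2)
    + "\n</script>"
)
print(snippet)
```

Whatever generates the markup, the output still needs to go through the validation step listed first in this category.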

Core Web Vitals & Performance

▸LCP measurement (field data)

From the Search Console CWV report (real-user field data), not a PSI lab run. Target: under 2.5s on mobile. Identify the LCP element for each page type.

▸INP measurement

Use Web Vitals Chrome extension on real interactions. Target: under 200ms. Focus on JS-heavy interactions first.

▸CLS measurement

Check for layout shift sources with DevTools Layout Shift regions. Common sources: images without dimensions, late-loading fonts.

▸TTFB

Time to First Byte above 600ms makes a good LCP hard to achieve regardless of other optimisations. Audit CDN configuration and server response time.

▸Third-party script audit

Identify all third-party scripts loading on key pages. Measure their thread blocking impact.
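
The LCP, INP, and CLS targets above can be turned into a small classifier for field data. The thresholds are Google's published good / poor boundaries, assessed at the 75th percentile; the sample values are hypothetical.

```python
# (good, poor) boundaries per metric. LCP and INP in milliseconds, CLS is unitless.
THRESHOLDS = {
    "LCP": (2500, 4000),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
}


def rate(metric: str, p75: float) -> str:
    """Classify a 75th-percentile field value as good, needs improvement, or poor."""
    good, poor = THRESHOLDS[metric]
    if p75 <= good:
        return "good"
    if p75 <= poor:
        return "needs improvement"
    return "poor"


# Hypothetical field data for one page type.
sample = {"LCP": 3100, "INP": 180, "CLS": 0.31}
for metric, value in sample.items():
    print(metric, value, rate(metric, value))
# LCP 3100 needs improvement
# INP 180 good
# CLS 0.31 poor
```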

Technical Infrastructure

▸HTTPS & mixed content

All resources served over HTTPS. Browsers block or auto-upgrade mixed content, which can break scripts, styles, and images and removes the secure-page indicator.

▸Mobile usability

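Google retired the GSC Mobile Usability report in late 2023, so check with Lighthouse and Chrome DevTools device emulation instead. Tap target sizes, font sizes, viewport configuration.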

▸JavaScript rendering

For JS-heavy sites: compare rendered DOM vs. source HTML for critical content. Use Google's URL Inspection 'View Tested Page' to see what Googlebot sees.

▸Server error monitoring

5xx errors in the GSC Crawl Stats report and in server logs. Persistent 500s on any URL are crawl waste.

▸Faceted navigation / parameter handling

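The single largest crawl budget drain on e-commerce sites. Google removed the URL Parameters tool from GSC in 2022, so this now has to be handled with robots.txt rules, canonical tags, and disciplined internal linking.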
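
The JavaScript rendering check in this category (rendered DOM vs. source HTML) can be approximated with the standard library once you have both versions of a page saved. A rough sketch: it compares visible words only, and the two HTML strings are made-up stand-ins for a raw response and a rendered snapshot.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible words, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.words: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def visible_words(html: str) -> list[str]:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


def missing_from_source(source_html: str, rendered_html: str) -> list[str]:
    """Words present in the rendered DOM but absent from the raw HTML source."""
    source = set(visible_words(source_html))
    return [word for word in visible_words(rendered_html) if word not in source]


# Hypothetical client-rendered page: the source is an empty shell.
source = '<html><body><div id="root"></div><script>render()</script></body></html>'
rendered = (
    '<html><body><div id="root"><h1>Technical SEO audits</h1>'
    "<p>Book a call</p></div></body></html>"
)

print(missing_from_source(source, rendered))
# ['Technical', 'SEO', 'audits', 'Book', 'a', 'call']
```

If the missing list contains the page's headings or main copy, the content depends entirely on rendering and deserves a closer look in URL Inspection.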

AI Visibility Layer

▸Entity disambiguation check

Ask ChatGPT, Perplexity, Gemini, and Claude: 'Who is [brand/person]?' Document what each returns. This is your GEO baseline.

▸llms.txt presence

A proposed convention: a Markdown file at /llms.txt in the site root that points AI crawlers to site structure and priority content. Support among AI providers is not guaranteed.

▸Answer-formatted content on key pages

Does the homepage directly answer 'What does [brand] do?' in the first 150 words? LLMs tend to extract from the opening of a page.

▸Citation footprint

Count credible external mentions of the brand. Wikipedia, industry publications, and authority directories are the highest-signal sources for LLM entity trust.

▸Consistent brand co-occurrence

Search '[brand] + [primary keyword]' — how many results appear? Low co-occurrence = weak entity-topic association in LLMs.
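
The answer-formatted content check above is a judgment call, but a crude first pass can be scripted: does the brand name appear alongside at least one descriptive term within the first 150 words? The function name, the sample copy, and the term list below are all hypothetical, and a pass here is no substitute for reading the page.

```python
import re


def answers_in_opening(text: str, brand: str, terms: list[str], limit: int = 150) -> bool:
    """True if the brand and at least one descriptive term occur in the first `limit` words."""
    opening = " ".join(text.split()[:limit]).lower()
    has_brand = brand.lower() in opening
    has_term = any(
        re.search(rf"\b{re.escape(term.lower())}\b", opening) for term in terms
    )
    return has_brand and has_term


# Hypothetical homepage copy.
homepage = (
    "Example Ltd is a technical SEO consultancy that audits and fixes crawl, "
    "indexation, and performance issues for e-commerce sites."
)

print(answers_in_opening(homepage, "Example Ltd", ["SEO", "audits"]))  # True
```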

How to prioritise

After running all 40 checks, I categorise findings into three buckets:

Critical (fix first): Anything preventing crawl, indexation, or rendering of important pages. Zero upside until resolved.

High leverage (fix second): LCP, entity schema, internal link structure, and content depth on high-priority pages.

Optimisation (fix third): Everything else — meta descriptions, CLS, pagination, third-party scripts.

What most tools miss

1. Rendered vs. source content discrepancy. Tools that don't render JavaScript report what's in the HTML source, not what Googlebot actually sees. On Next.js sites with heavy client-side rendering, these can be completely different documents.

2. The AI visibility layer. No mainstream SEO tool yet measures entity disambiguation, LLM citation likelihood, or co-occurrence signals. This is a manual check — ask the AI systems directly.

3. Crawl budget reality. Tools report crawl depth as a graph. What matters is the ratio of meaningful URLs to noise URLs being crawled. A site with 10,000 real pages and 500,000 parameter URLs could be burning up to 98% of its crawl budget on nothing if Googlebot requests them all evenly; only the server logs show the real split.

4. Business context alignment. A tool reports that you have 847 pages with thin content. That's data. The audit question is: which 12 of those pages are on paths that affect revenue? That requires understanding the business, not just the site.
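
The crawl budget ratio from point 3 can be measured directly from Googlebot requests in the server logs. A minimal sketch: the `NOISE_PARAMS` set and the sample URLs are assumptions to adapt per site, and the final lines reproduce the 98% arithmetic under the simplifying assumption that every URL is crawled equally often.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical list of query parameters that only create duplicate or filtered views.
NOISE_PARAMS = {"sessionid", "sort", "color", "size"}


def is_noise(url: str) -> bool:
    """True if the URL carries at least one parameter from NOISE_PARAMS."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(param.lower() in NOISE_PARAMS for param in query)


def noise_share(crawled_urls: list[str]) -> float:
    """Fraction of crawled URLs that are parameter noise."""
    if not crawled_urls:
        return 0.0
    return sum(is_noise(url) for url in crawled_urls) / len(crawled_urls)


# Hypothetical URLs requested by Googlebot, taken from a log sample.
crawled = [
    "/shoes",
    "/shoes?color=red",
    "/shoes?color=red&sort=price",
    "/blog/post-1?sessionid=abc",
]
print(f"{noise_share(crawled):.0%}")  # 75%

# The example from the text: 10,000 real pages vs. 500,000 parameter URLs.
real_pages, parameter_urls = 10_000, 500_000
print(f"{parameter_urls / (real_pages + parameter_urls):.1%}")  # 98.0%
```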

Deliverable format

The output of this audit is a prioritised action plan — not a 200-item issue list. Each finding is assigned a priority tier (Critical / High / Optimisation), an estimated effort level, and a rationale explaining why it matters in the context of this specific site's goals.
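
One way to keep that structure consistent is to record each finding with the three attributes named above and sort on them. This is only a sketch: the `Finding` class, the sample findings, and the choice to put lower-effort items first within a tier are illustrative assumptions, not a fixed part of the process.

```python
from dataclasses import dataclass

TIER_ORDER = {"Critical": 0, "High": 1, "Optimisation": 2}
EFFORT_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Finding:
    check: str
    tier: str       # Critical / High / Optimisation
    effort: str     # low / medium / high
    rationale: str  # why it matters for this specific site


def action_plan(findings: list[Finding]) -> list[Finding]:
    """Order findings by priority tier, then by effort (assumption: quick wins first)."""
    return sorted(findings, key=lambda f: (TIER_ORDER[f.tier], EFFORT_ORDER[f.effort]))


# Hypothetical findings from an audit.
findings = [
    Finding("Duplicate meta descriptions", "Optimisation", "low", "CTR on category pages"),
    Finding("Category pages blocked in robots.txt", "Critical", "low", "Revenue pages not crawlable"),
    Finding("LCP above target on product pages", "High", "medium", "Product pages drive conversions"),
    Finding("Main content missing from source HTML", "Critical", "high", "Depends on rendering"),
]

for item in action_plan(findings):
    print(f"[{item.tier}] {item.check} (effort: {item.effort}): {item.rationale}")
```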

The goal isn't to fix everything. It's to fix the right things first — and create a clear roadmap for the rest.