Every engagement starts with an audit. Not because audits are impressive deliverables, but because you cannot prioritise what you haven't measured. The checklist below is the exact process I run — across 8 categories, 40 checks — before making a single recommendation.
Tools generate reports. This process generates a diagnosis. The difference is that a report tells you what's there; a diagnosis tells you what matters and what to do first.
An SEO audit is only as useful as its prioritisation. A list of 200 issues is not a strategy — it's noise.
Before you open any tool
The first step isn't crawling the site. It's understanding the business context: What are the primary conversion actions? Which pages drive revenue? Where does current organic traffic come from, and where does it drop off? Without this, you'll optimise for rankings that don't convert.
Get access to: Google Search Console, Google Analytics (or equivalent), the CMS, and a Cloudflare/CDN dashboard if available. An audit without GSC data is a blind audit.
The 40-point checklist
Crawl & Indexability
▸Robots.txt audit
Confirm no critical paths are disallowed. Check for wildcard rules that accidentally block CSS or JS.
▸XML sitemap validation
Sitemap submitted to GSC, no 4xx/5xx URLs included, updated automatically on publish.
▸Crawl depth analysis
Important pages reachable within 3 clicks from homepage. Deep-buried pages lose crawl priority.
▸Orphaned pages
Pages in sitemap or GSC with zero internal links. Crawlers can still find them through the sitemap, but users navigating the site never will.
▸Crawl budget check
Analyse server access logs alongside a crawl (Screaming Frog's Log File Analyser works for this). Identify what fraction of crawl budget goes to low-value URLs (faceted nav, session parameters, infinite scroll).
▸noindex audit
Verify noindex isn't applied to pages that should be indexed. Common when staging configuration is pushed to production unchanged.
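Crawl depth and orphan detection both fall out of a breadth-first walk over the internal link graph. Here is a minimal sketch, assuming you have already exported that graph from a crawler as a mapping of URL to outgoing internal links. The `links` mapping, the `known_urls` set, and both function names are illustrative, not part of any tool's API. It also treats "orphaned" as "not reachable from the homepage", which is slightly broader than "zero internal links".

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the homepage: depth = minimum clicks to reach a URL."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def audit_depth(links, home, known_urls, max_depth=3):
    """Return (URLs deeper than max_depth, known URLs never reached via internal links)."""
    depths = click_depths(links, home)
    too_deep = sorted(url for url, depth in depths.items() if depth > max_depth)
    # known_urls is assumed to come from the sitemap and/or a GSC export.
    orphans = sorted(set(known_urls) - set(depths))
    return too_deep, orphans
```

Feed it the crawler's link export plus the sitemap URL list, then cross-check the two output lists against your priority pages.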
Indexation & Coverage
▸GSC Page indexing report (formerly Coverage)
Review all 'Not indexed' URLs (labelled 'Excluded' in older versions of the report). 'Crawled - currently not indexed' is the most important signal: Google's soft rejection of thin or duplicate content.
▸Canonicalisation
Self-referencing canonicals on all indexable pages. No canonical conflicts between paginated pages, parameter URLs, or www/non-www variants.
▸Duplicate content
Run a content similarity check. Internal duplicates dilute authority and confuse crawlers. Common in e-commerce (product variants, filtered pages).
▸Hreflang (if multilingual)
Correct language-region codes, reciprocal annotations, and no circular references.
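Reciprocity is the hreflang check that is easiest to automate. Here is a rough sketch, assuming you have already extracted each page's hreflang annotations into a dictionary. The `annotations` structure and the function name are my own for illustration.

```python
def missing_return_links(annotations: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Find hreflang annotations that are not reciprocated.

    annotations maps a page URL to {hreflang code: alternate URL}.
    Returns (page, alternate) pairs where the alternate does not point back.
    """
    missing = []
    for page, alternates in annotations.items():
        for alt_url in alternates.values():
            if alt_url == page:
                continue  # self-reference, nothing to reciprocate
            points_back = page in annotations.get(alt_url, {}).values()
            if not points_back:
                missing.append((page, alt_url))
    return missing
```

An alternate that was never crawled also shows up as missing here, which is usually worth investigating anyway.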
Site Architecture
▸URL structure
Clean, descriptive, lowercase, hyphenated. No unnecessary parameters, no dynamic IDs as primary URLs for important pages.
▸Internal link equity distribution
Map internal links from homepage to priority pages. High-priority pages should receive the most internal link equity.
▸Breadcrumbs
Present and marked up with BreadcrumbList schema. Supports crawlability and breadcrumb rich result eligibility.
▸Pagination handling
Google no longer uses rel='next'/'prev' as an indexing signal (confirmed in 2019), so don't rely on it. Ensure paginated content is reachable through plain links and consolidated where appropriate.
▸Redirect chains
No chains longer than 1 hop. Each extra hop adds latency, costs a crawl request, and may dilute link equity.
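Redirect chains are simple to flag once the redirects are in a source-to-target map. The sketch below assumes that map comes from a crawl export or the server config. The `redirects` dictionary and the function names are hypothetical.

```python
def resolve_chain(redirects: dict[str, str], start: str, max_hops: int = 10):
    """Follow `start` through the redirect map. Returns (chain of URLs, loop_detected)."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        if nxt in seen:  # the chain points back at itself
            return chain + [nxt], True
        chain.append(nxt)
        seen.add(nxt)
    return chain, False


def long_chains(redirects: dict[str, str], max_allowed_hops: int = 1) -> dict[str, list[str]]:
    """Report every redirect source whose chain loops or exceeds the allowed hop count."""
    report = {}
    for source in redirects:
        chain, looped = resolve_chain(redirects, source)
        hops = len(chain) - 1
        if looped or hops > max_allowed_hops:
            report[source] = chain
    return report
```

The fix for anything it reports is to point the first URL in the chain straight at the last one.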
On-page & Content Signals
▸Title tag audit
Unique, under 60 characters, primary keyword front-loaded. Check for templated titles that duplicate across categories.
▸Meta description audit
Unique, under 155 characters, action-oriented. Not a ranking factor but directly impacts CTR.
▸H1 uniqueness
One H1 per page, matches searcher intent, not duplicated in title tag.
▸Heading hierarchy
Logical H2/H3/H4 structure. Broken hierarchy confuses both crawlers and LLMs extracting content structure.
▸Content depth vs. intent
Match content depth to query type. Informational queries need comprehensive coverage; transactional queries need clarity and conversion paths.
▸Content freshness signals
Published and modified dates accurate, schema-marked. Stale dates on evergreen content are a soft trust signal problem.
▸Image alt text
Descriptive, not keyword-stuffed. Missing alt text on hero images hurts accessibility and image search visibility. It has no effect on LCP.
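The title and meta description checks are mostly length limits and duplicate detection, which makes them easy to script against a crawl export. This sketch assumes a mapping of URL to title text. `audit_titles` is an illustrative name, and the 60-character default simply mirrors the limit above.

```python
from collections import defaultdict


def audit_titles(titles: dict[str, str], max_length: int = 60) -> dict[str, list]:
    """Flag missing, over-length, and duplicated titles from a URL -> title mapping."""
    missing = sorted(url for url, text in titles.items() if not text.strip())
    too_long = sorted(url for url, text in titles.items() if len(text.strip()) > max_length)

    # Group URLs by a case- and whitespace-normalised title to catch templated duplicates.
    by_title = defaultdict(list)
    for url, text in titles.items():
        normalised = " ".join(text.lower().split())
        if normalised:
            by_title[normalised].append(url)
    duplicates = sorted(sorted(urls) for urls in by_title.values() if len(urls) > 1)

    return {"missing": missing, "too_long": too_long, "duplicates": duplicates}
```

Calling it with `max_length=155` on a URL-to-description mapping covers the meta description check the same way.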
Structured Data
▸Schema validation
Run all schema through Google Rich Results Test and Schema.org validator. Invalid schema is worse than no schema.
▸Entity schema (Person / Organisation)
Most missed item on audits. Defines who you are to the Knowledge Graph and LLMs. Required for GEO (generative engine optimisation) effectiveness.
▸Page-type schema
Article, Product, Service, FAQPage, HowTo, BreadcrumbList — applied correctly to relevant page types.
▸LocalBusiness (if applicable)
NAP consistency between schema and Google Business Profile.
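▸WebSite schema (formerly SiteLinksSearchBox)
Google retired the sitelinks search box in late 2024, so SearchAction markup no longer earns a SERP feature. WebSite markup still helps Google choose the site name shown in results. Minor but quick win.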
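Before pasting pages into a validator, it helps to know which schema types a page declares at all and whether each JSON-LD block even parses. This is a small sketch using only the Python standard library. The class and function names are mine, it only reads top-level `@type` values (it does not descend into `@graph`), and it is a pre-check, not a replacement for the Rich Results Test.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def schema_types(html: str) -> tuple[set[str], int]:
    """Return (top-level @type values found, number of blocks that are not valid JSON)."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    types, invalid = set(), 0
    for raw in extractor.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            invalid += 1
            continue
        for item in data if isinstance(data, list) else [data]:
            declared = item.get("@type") if isinstance(item, dict) else None
            if isinstance(declared, str):
                types.add(declared)
            elif isinstance(declared, list):
                types.update(t for t in declared if isinstance(t, str))
    return types, invalid
```

A homepage that returns no `Organization` or `Person` type is the quickest way to spot the missing entity schema.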
Core Web Vitals & Performance
▸LCP measurement (field data)
Use field data from the Search Console Core Web Vitals report, not PSI lab scores. Target: under 2.5s on mobile. Identify the LCP element for each page type.
▸INP measurement
Use the live metrics in the Chrome DevTools Performance panel (which absorbed the Web Vitals extension) on real interactions. Target: under 200ms. Focus on JS-heavy interactions first.
▸CLS measurement
Check for layout shift sources with DevTools Layout Shift regions. Common sources: images without dimensions, late-loading fonts, injected banners.
▸TTFB
Time to First Byte above 600ms makes a good LCP very hard to reach, because roughly a quarter of the 2.5s budget is gone before the browser receives any HTML. CDN and server response time audit.
▸Third-party script audit
Identify all third-party scripts loading on key pages. Measure their thread blocking impact. Tag managers, chat widgets, and analytics are common offenders.
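Google assesses Core Web Vitals at the 75th percentile of field data, so a single fast test run proves little. The sketch below rates a set of field samples against the published good and poor boundaries. The thresholds are Google's; the nearest-rank percentile and the function names are my simplification, and CrUX may compute percentiles differently.

```python
import math

# (good, poor) boundaries published by Google for each Core Web Vital.
THRESHOLDS = {
    "LCP": (2500, 4000),  # milliseconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless layout shift score
}


def p75(samples):
    """75th percentile using the nearest-rank method (requires a non-empty list)."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate(metric: str, samples) -> str:
    """Classify the 75th percentile of `samples` as good / needs improvement / poor."""
    good, poor = THRESHOLDS[metric]
    value = p75(samples)
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"
```

Run it per page type rather than site-wide, since a fast blog template can hide a slow product template in the aggregate.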
Technical Infrastructure
▸HTTPS & mixed content
All resources served over HTTPS. Browsers block or flag mixed content, which breaks page resources and undermines user trust.
▸Mobile usability
Google retired the GSC Mobile Usability report in December 2023, so check with Lighthouse or DevTools device emulation instead. Tap target sizes, font sizes, viewport configuration.
▸JavaScript rendering
For JS-heavy sites: compare rendered DOM vs. source HTML for critical content. Use Google's URL Inspection 'View Tested Page' to see what Googlebot sees.
▸Server error monitoring
5xx errors in the GSC Crawl Stats report and server logs. Persistent 500s on any URL are crawl waste.
▸Faceted navigation / parameter handling
The single largest crawl budget drain on e-commerce sites. GSC's URL Parameters tool was retired in 2022, so this now requires robots.txt rules, canonical tags, or noindex on parameter URLs.
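Robots.txt rules for parameter URLs are easy to get wrong in both directions, so I test candidate rules against sample URLs before deploying them. The sketch below is a simplified model of the wildcard and longest-match behaviour described in Google's robots.txt documentation. It assumes the rules for one user-agent group have already been extracted into (kind, pattern) tuples. The function names are hypothetical, and it is not a full robots.txt parser.

```python
import re


def rule_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, optional trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    """Apply the longest matching rule; on equal length, 'allow' wins. Default is allowed."""
    verdict, best_len = True, -1
    for kind, pattern in rules:
        # re.match anchors at the start of the path, like robots.txt prefix matching.
        if not pattern or not rule_regex(pattern).match(url_path):
            continue
        allow = kind == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and allow):
            verdict, best_len = allow, len(pattern)
    return verdict
```

For example, with the rules `("disallow", "/*?*color=")`, `("disallow", "/*.js$")` and `("allow", "/assets/")`, a filtered URL such as `/shoes?color=red` is blocked, while `/assets/app.js` stays crawlable because the allow rule is the longer match.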
AI Visibility Layer
▸Entity disambiguation check
Ask ChatGPT, Perplexity, Gemini, and Claude: 'Who is [brand/person]?' Document what each returns. This is your GEO baseline.
▸llms.txt presence
Markdown file at the site root (a proposed convention, not yet a standard) intended to help AI crawlers understand site structure and priority content.
▸Answer-formatted content on key pages
Does the homepage directly answer 'What does [brand] do?' in the first 150 words? LLMs tend to extract from the opening of a page.
▸Citation footprint
Count credible external mentions of the brand. Wikipedia, industry publications, and authority directories are the highest-signal sources for LLM entity trust.
▸Consistent brand co-occurrence
Search '[brand] + [primary keyword]' — how many results appear? Low co-occurrence = weak entity-topic association in LLMs.
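For the llms.txt check, the public proposal describes a Markdown layout: an H1 with the site name, a blockquote summary, then H2 sections containing link lists. Below is a minimal generator sketch under that assumption. The function name and the example content are made up, and whether any given AI crawler actually reads the file is not something I can confirm.

```python
def build_llms_txt(name: str, summary: str, sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render an llms.txt body from a site name, a one-line summary, and link sections.

    sections maps a heading to a list of (link title, URL, optional note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in pages:
            entry = f"- [{title}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Keep the link lists short and limited to priority pages, in the same spirit as the rest of this audit.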
How to prioritise
After running all 40 checks, I categorise findings into three buckets:
Critical (fix first): Anything preventing crawl, indexation, or rendering of important pages. Nothing else on the list has any upside until these are resolved.
High leverage (fix second): LCP, entity schema, internal link structure, and content depth on high-priority pages.
Optimisation (fix third): Everything else — meta descriptions, CLS, pagination, third-party scripts.
What most tools miss
Automated tools like Screaming Frog, Semrush, and Ahrefs cover crawl mechanics well. They miss four things that I check manually on every audit:
1. Rendered vs. source content discrepancy. Tools that don't render JavaScript report what's in the HTML source, not what Googlebot actually sees. On Next.js sites with heavy client-side rendering, these can be completely different documents.
2. The AI visibility layer. No mainstream SEO tool yet measures entity disambiguation, LLM citation likelihood, or co-occurrence signals. This is a manual check — ask the AI systems directly.
3. Crawl budget reality. Tools report crawl depth as a graph. What matters is the ratio of meaningful URLs to noise URLs being crawled. A site with 10,000 real pages and 500,000 parameter URLs is burning 98% of its crawl budget on nothing.
4. Business context alignment. A tool reports that you have 847 pages with thin content. That's data. The audit question is: which 12 of those pages are on paths that affect revenue? That requires understanding the business, not just the site.
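The crawl budget figure in point 3 is simple arithmetic: 500,000 / (10,000 + 500,000) ≈ 98%, assuming crawl requests are spread evenly across URLs. In practice I measure the ratio from the URLs Googlebot actually requested. The sketch below assumes those URLs have already been pulled from server logs and, by default, treats any URL with a query string as noise. That default is a deliberate simplification, and the function names are illustrative.

```python
from urllib.parse import urlsplit


def expected_noise_share(real_pages: int, noise_urls: int) -> float:
    """Share of crawl budget spent on noise if every URL is crawled equally often."""
    return noise_urls / (real_pages + noise_urls)


def noise_share(crawled_urls, is_noise=None) -> float:
    """Share of logged crawl requests that hit noise URLs.

    crawled_urls: one entry per Googlebot request taken from the server logs.
    is_noise: optional predicate; defaults to 'URL has a query string'.
    """
    if is_noise is None:
        is_noise = lambda url: bool(urlsplit(url).query)
    urls = list(crawled_urls)
    if not urls:
        return 0.0
    return sum(1 for url in urls if is_noise(url)) / len(urls)
```

Pass your own `is_noise` predicate when the site has legitimate parameter URLs, such as paginated category pages you do want crawled.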
Deliverable format
The output of this audit is a prioritised action plan — not a 200-item issue list. Each finding is assigned a priority tier (Critical / High / Optimisation), an estimated effort level, and a rationale explaining why it matters in the context of this specific site's goals.
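One way to keep that structure honest is to record every finding in the same shape and let the ordering fall out of the data. This is a sketch only: the three tiers come from the buckets above, while the low/medium/high effort scale and the choice to put cheaper fixes first within a tier are my assumptions for illustration.

```python
from dataclasses import dataclass

TIER_ORDER = {"Critical": 0, "High": 1, "Optimisation": 2}
EFFORT_ORDER = {"low": 0, "medium": 1, "high": 2}  # assumed effort scale


@dataclass
class Finding:
    check: str       # which checklist item raised it
    tier: str        # "Critical", "High", or "Optimisation"
    effort: str      # "low", "medium", or "high"
    rationale: str   # why it matters for this site's goals


def action_plan(findings: list[Finding]) -> list[Finding]:
    """Order findings by priority tier, then by effort (cheapest first) within a tier."""
    return sorted(findings, key=lambda f: (TIER_ORDER[f.tier], EFFORT_ORDER[f.effort]))
```

The rationale field is the part no tool fills in for you, which is why it sits in the structure as a required value.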
The goal isn't to fix everything. It's to fix the right things first — and create a clear roadmap for the rest.