After 500 Website Audits, I Built the Tool That Didn't Exist
Most audit tools still check for 2020 web standards. After manually testing the same issues across 500+ sites, I built GuardianScan to catch what Lighthouse, WAVE, and Screaming Frog miss in 2025.
There's a moment in every website audit where I open four different tools, cross-reference their outputs, and manually check another dozen things they all miss. It's tedious. And after doing this 500 times—literally 500+ sites across different teams and companies between 2010 and 2025—it started to feel less like diligence and more like a system failure.
Not my failure—the tools' failure. They're all built for a web that existed five years ago.
I kept thinking: someone must have built a comprehensive tool that covers everything that matters in 2025—Core Web Vitals, WCAG 2.2, modern framework optimisation, AI search readiness, security headers. Someone must have recognised that checking these things separately wastes 20 minutes per audit.
Turns out, nobody had. So in early 2025, I built GuardianScan.
The Pattern I Couldn't Unsee
If you've read my post on pattern matching in decision making, you know our brains are wired to spot repetition. After about the fiftieth audit in 2022, mine started flagging the same issues on autopilot:
- Missing or misconfigured security headers (CSP, HSTS, X-Frame-Options)
- Images without explicit width/height causing layout shift (a detection sketch follows this list)
- Poor font-loading strategies delaying LCP and triggering layout shift
- Accessibility violations that automated tools miss (but are obvious in manual testing)
- Next.js apps shipping massive JavaScript bundles for mostly static content
- Heavy client-side rendering on pages with minimal interactivity
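Here's the kind of check the images item implies: a minimal sketch you can run in the browser console or a headless browser's page.evaluate(). It's illustrative rather than GuardianScan's implementation, and since CSS sizing (aspect-ratio or fixed dimensions) also prevents shift, treat hits as candidates for review, not automatic failures.

```ts
// Minimal sketch: list <img> elements with no explicit width/height
// attributes, the usual cause of image-driven layout shift.
function findUnsizedImages(): string[] {
  return Array.from(document.querySelectorAll<HTMLImageElement>('img'))
    .filter((img) => !(img.hasAttribute('width') && img.hasAttribute('height')))
    .map((img) => img.currentSrc || img.src);
}

// Logs the offending image URLs for manual review.
console.log(findUnsizedImages());
```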
The pattern was clear: modern frameworks had evolved faster than audit tools.
Lighthouse is brilliant for what it does, but it's framework-agnostic by design. It can't tell you if you're using React Server Components correctly, or if your Next.js app is misconfigured for optimal performance. Accessibility checkers like WAVE and Axe catch maybe 30-40% of WCAG violations according to WebAIM research—the rest require manual testing. SEO tools like Screaming Frog focus on meta tags and sitemaps but miss Core Web Vitals entirely.
And nobody was checking all of this together comprehensively in under a minute. I know because I searched. Spent three weeks in late 2024 evaluating every audit tool I could find. They all either focused on legacy compatibility or missed modern requirements entirely. None of them checked for AI search engine readiness—which in 2025 is inexcusable.
The Gap Between Standards and Reality
Here's what changed between 2020 and 2025 that most audit tools haven't caught up with:
AI Search Engines Changed Everything
In 2020, Google was the dominant search engine. In 2025, ChatGPT search, Perplexity, and Google's AI Overviews account for a significant portion of traffic. But they don't crawl websites the same way.
AI search engines prioritise structured data. They need clean Schema.org markup, semantic HTML, clear content hierarchy, and direct answers to questions. A site optimised for traditional SEO might be invisible to AI crawlers.
I audited a SaaS company's site in August 2025 that ranked well on Google but got zero traffic from AI search engines. Their structured data was malformed, their FAQ section wasn't marked up properly, and their article schema was missing critical fields. Traditional SEO tools gave them a 94/100. They were invisible to Perplexity.
Most audit tools still don't check for this. They'll validate your meta description but won't tell you if your Schema.org markup is parseable by AI or if your content is structured for featured snippets and direct answers.
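What well-formed markup looks like is concrete and checkable. Here's a minimal sketch of FAQPage JSON-LD rendered from a React/Next.js component; the component and prop names are mine, not a prescribed API:

```tsx
// Minimal sketch: emit FAQPage JSON-LD that AI crawlers and rich results
// can parse directly. Component and prop names are illustrative.
type Faq = { question: string; answer: string };

export function FaqJsonLd({ faqs }: { faqs: Faq[] }) {
  const schema = {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    mainEntity: faqs.map((faq) => ({
      '@type': 'Question',
      name: faq.question,
      acceptedAnswer: { '@type': 'Answer', text: faq.answer },
    })),
  };

  return (
    <script
      type="application/ld+json"
      // JSON-LD has to be raw JSON, not escaped React children.
      dangerouslySetInnerHTML={{ __html: JSON.stringify(schema) }}
    />
  );
}
```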
Core Web Vitals Evolved And Tools Didn't Notice
In March 2024, Google replaced First Input Delay (FID) with Interaction to Next Paint (INP). This wasn't a minor tweak—it changed how we measure responsiveness. FID only measured the delay before the first interaction started processing. INP measures the full cycle: input delay, processing time, and rendering delay.
The threshold is strict: under 200 milliseconds. Sites that felt "snappy enough" started failing. I saw this firsthand on a SaaS dashboard in April 2024—interactions that seemed instant were taking 340ms when I measured the full paint cycle. The Search Console data showed rankings dropping week over week.
Most audit tools I was using in late 2024? Still reporting FID. Or worse, reporting both FID and INP without explaining which one actually affects rankings (spoiler: it's INP).
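If you want to see what your users actually experience, Google's web-vitals library exposes an onINP helper for field measurement. A minimal sketch; the /analytics endpoint is a placeholder:

```ts
// Minimal sketch: report real-user INP with the web-vitals library (v3+).
import { onINP } from 'web-vitals';

onINP((metric) => {
  // metric.value is the interaction latency in milliseconds;
  // under 200ms counts as "good" for Core Web Vitals.
  navigator.sendBeacon(
    '/analytics', // placeholder endpoint
    JSON.stringify({
      name: metric.name,     // 'INP'
      value: metric.value,   // e.g. 340
      rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
    }),
  );
});
```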
WCAG Raised the Bar And Most Tools Didn't Notice
WCAG 2.2 added nine new success criteria, and they're not trivial:
- Touch targets - WCAG 2.2's new Target Size (Minimum) criterion requires at least 24×24 CSS pixels, with 44×44 still the enhanced (AAA) recommendation. Harder than it sounds on dense interfaces. I audited a SaaS dashboard in June where icon buttons were 32×32px: above the new floor, but short of the 44×44 target most design systems aim for, and their previous audit tool (still checking against 2.1) didn't evaluate target size at all. (A detection sketch follows this list.)
- Forms can't require redundant entry - Breaks a lot of checkout flows. Asking users to enter their address twice? That's now a Level A violation. (GuardianScan flags potential violations by detecting duplicate input patterns.)
- Authentication must offer accessible alternatives - No more "remember this grid of images" CAPTCHA. You need password managers, biometric logins, or other accessible options. (GuardianScan detects common inaccessible auth patterns.)
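To show what automated detection of the first criterion can look like, here's a minimal sketch that flags interactive elements rendered below the 24×24 CSS pixel minimum. Target Size (Minimum) has exceptions (inline links, spacing-based equivalence), so these are candidates for review rather than automatic failures, and this isn't GuardianScan's actual check:

```ts
// Minimal sketch: flag interactive elements rendered below the WCAG 2.2
// Target Size (Minimum) of 24x24 CSS pixels.
const MIN_TARGET_PX = 24;

function findSmallTargets(): Element[] {
  const selector = 'a, button, input, select, [role="button"]';
  return Array.from(document.querySelectorAll(selector)).filter((el) => {
    const { width, height } = el.getBoundingClientRect();
    // Skip elements that aren't rendered at all.
    return width > 0 && height > 0 && (width < MIN_TARGET_PX || height < MIN_TARGET_PX);
  });
}

console.log(findSmallTargets());
```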
Government websites need WCAG 2.1 Level AA compliance by April 2026, but forward-thinking organisations are already targeting 2.2. The gap between what's legally required and what's actually good practice keeps widening.
And here's the kicker: automated accessibility audits catch 30-40% of issues at best according to WebAIM's research. The rest—colour contrast in complex layouts, keyboard navigation logic, screen reader compatibility—require manual testing or tools that actually understand context.
Next.js Changed the Defaults
Next.js 15, released in October 2024, fundamentally changed how caching works. Fetch requests, GET route handlers, and client navigations are no longer cached by default. This is a massive shift from previous versions where aggressive caching was the default.
For teams migrating from Next.js 14 or earlier, this means apps that performed well might suddenly feel sluggish—not because they're slower, but because the framework stopped doing invisible performance work. You have to opt into caching explicitly now.
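For teams doing that migration, the opt-in is explicit and per request. A minimal sketch; the API URL is a placeholder:

```ts
// Minimal sketch: opting back into caching in Next.js 15, where fetch is
// no longer cached by default. The URL is a placeholder.

// Cache until explicitly revalidated (closest to the old default behaviour):
export async function getStaticProducts() {
  const res = await fetch('https://api.example.com/products', {
    cache: 'force-cache',
  });
  return res.json();
}

// Or cache with time-based revalidation, here refreshed at most every 60 seconds:
export async function getFreshProducts() {
  const res = await fetch('https://api.example.com/products', {
    next: { revalidate: 60 },
  });
  return res.json();
}
```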
Most audit tools? They have no idea. They'll measure your LCP and INP scores, but they won't tell you that your Next.js site is shipping 400KB of JavaScript for what appears to be a mostly static page.
React Server Components Became Mainstream But Audits Didn't
About 90% of new enterprise applications are now cloud-native, built with modern frameworks. React Server Components (RSC) became the default for Next.js 13+ apps, yet I kept seeing the same mistakes.
I've audited dozens of Next.js apps where everything runs client-side—massive bundles, poor hydration performance, unnecessary rerenders—when half the components could be server-rendered with zero client JavaScript.
Example from July 2025: An e-commerce site had a 287KB JavaScript bundle for a product listing page with minimal interactivity. The bundle size alone suggested potential over-rendering on the client. After optimisation, they cut it to 134KB and dropped their LCP from 4.1s to 2.3s.
Most tools just report the bundle size. They don't flag that a product listing page probably shouldn't need 287KB of JavaScript to render.
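The fix is usually structural rather than clever: render the listing on the server and ship only the genuinely interactive pieces to the client. A minimal sketch of that split; the file layout, fetchProducts helper, and /api/cart route are hypothetical:

```tsx
// app/products/page.tsx: a Server Component (the App Router default).
// The list renders on the server and ships no client JavaScript of its own.
import { AddToCartButton } from './AddToCartButton';
import { fetchProducts } from '@/lib/products'; // hypothetical data helper

export default async function ProductsPage() {
  const products: Array<{ id: string; name: string }> = await fetchProducts();
  return (
    <ul>
      {products.map((product) => (
        <li key={product.id}>
          {product.name}
          {/* Only this small island opts into the client bundle. */}
          <AddToCartButton productId={product.id} />
        </li>
      ))}
    </ul>
  );
}
```

And the one piece that actually needs to run in the browser:

```tsx
// app/products/AddToCartButton.tsx: marked 'use client' because it needs
// an event handler; everything around it stays server-rendered.
'use client';

export function AddToCartButton({ productId }: { productId: string }) {
  return (
    <button onClick={() => fetch('/api/cart', { method: 'POST', body: productId })}>
      Add to cart
    </button>
  );
}
```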
What I Learned Building the Solution
Building GuardianScan forced me to think deeply about what actually matters in a website audit. Not what's easy to measure, but what creates real value.
Comprehensive but Fast
The ideal audit would check everything that matters for modern websites. Every WCAG 2.2 criterion, every performance metric, every security header, SEO fundamentals, AI search readiness. Most tools either check a subset of these or take forever to run.
I've written before about systems thinking and feedback loops. The value of feedback is inversely proportional to its latency. Feedback that arrives in five minutes is far more useful than feedback that arrives in an hour, even if it's slightly less comprehensive.
So I optimised for a 45-second comprehensive scan. That's fast enough to run before every deploy. Fast enough to run during a meeting. Fast enough that you actually use it, rather than putting it off until "later."
This meant ruthless focus on modern standards:
- Check everything that matters for 2025 web development
- Skip legacy browser compatibility checks (if you're supporting IE11, this isn't your tool)
- Focus on what actually affects users and search visibility
- Provide actionable diagnostics, not vague scores
Context Beats Scores
Lighthouse gives you a performance score out of 100. It's satisfying—people love scores—but it's also reductive. A score of 78 doesn't tell you whether the problem is server response time, render-blocking resources, or massive images.
I wanted GuardianScan to surface insights, not just numbers. Instead of "Performance: 78/100," it tells you:
- LCP is 3.2s (target: under 2.5s), caused by an unoptimised hero image at /images/hero.jpg
- INP is 340ms (target: under 200ms), likely from 147KB of client-side JavaScript
- CLS is 0.15 (target: under 0.1), from images without width/height attributes
Same data, but actionable. You know exactly what to fix and why it matters.
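If it helps to see the distinction as data, here's a hypothetical shape for that kind of diagnostic. It's illustrative, not GuardianScan's actual data model:

```ts
// Hypothetical shape: a score tells you something is wrong; a diagnostic
// tells you what, where, and against which threshold.
interface Diagnostic {
  metric: 'LCP' | 'INP' | 'CLS';
  value: number;          // measured value (ms, or unitless for CLS)
  threshold: number;      // Core Web Vitals "good" threshold
  cause: string;          // what the scan attributes the result to
  recommendation: string; // the concrete fix to try first
}

const example: Diagnostic = {
  metric: 'LCP',
  value: 3200,
  threshold: 2500,
  cause: 'unoptimised hero image at /images/hero.jpg',
  recommendation: 'serve a responsive, compressed image and preload it',
};
```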
This is where framework-aware detection pays off. GuardianScan can identify Next.js sites (from meta tags and bundle paths) and flag when images aren't using modern optimisation, or when fonts are loading inefficiently. Generic tools can't make these connections—they don't know what framework you're using or what best practices apply.
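The detection itself doesn't need to be exotic. Rendered Next.js pages carry recognisable fingerprints, such as the __NEXT_DATA__ script tag and /_next/ asset paths. A minimal sketch, run with Node 18+ where fetch is global (illustrative, not GuardianScan's implementation):

```ts
// Minimal sketch: guess whether a URL serves a Next.js site from the
// fingerprints its rendered HTML usually contains.
async function looksLikeNextJs(url: string): Promise<boolean> {
  const html = await (await fetch(url)).text();
  return (
    html.includes('id="__NEXT_DATA__"') ||         // Pages Router payload
    html.includes('/_next/static/') ||             // framework asset paths
    /<meta[^>]+name="next-head-count"/i.test(html) // Pages Router head marker
  );
}

looksLikeNextJs('https://example.com').then(console.log);
```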
The 70% Rule And Why That's Enough for Most Teams
Here's an uncomfortable truth I encountered early: perfect accessibility auditing requires human judgement. Automated tools catch the mechanical stuff—missing alt text, insufficient colour contrast, invalid ARIA—but they can't evaluate whether your alt text is meaningful, or whether your keyboard navigation flow makes logical sense.
WebAIM's research consistently shows automated tools catch 30-40% of WCAG issues. I aimed higher. GuardianScan catches about 70% by combining standard automated checks with smarter pattern detection (like finding navigation links missing accessible names, or detecting keyboard focus traps in modals).
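As an example of what that pattern detection can look like, here's a minimal sketch that flags links without an accessible name. A full accessible-name computation per the ARIA spec is more involved, so treat this as the shape of the idea rather than GuardianScan's actual check:

```ts
// Minimal sketch: find links exposing no accessible name, i.e. no visible
// text, no aria-label/aria-labelledby, and no alt text on a nested image.
function findUnnamedLinks(): HTMLAnchorElement[] {
  return Array.from(document.querySelectorAll<HTMLAnchorElement>('a[href]'))
    .filter((link) => {
      const text = link.textContent?.trim();
      const ariaLabel = link.getAttribute('aria-label')?.trim();
      const labelledBy = link.getAttribute('aria-labelledby');
      const imgAlt = link.querySelector('img[alt]')?.getAttribute('alt')?.trim();
      return !text && !ariaLabel && !labelledBy && !imgAlt;
    });
}

console.log(findUnnamedLinks());
```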
I leaned into the limitation rather than overpromising. GuardianScan catches the 70% of accessibility issues that are objectively testable, clearly documents what it checks, and reminds you that manual testing is still necessary for full WCAG 2.2 compliance.
Honest tools beat tools that overpromise. If you're checking 47 specific criteria, say that. Don't imply you're checking everything when you're not.
Modern Standards as a Feature
Most audit tools are conservative by nature. They check against established standards because those are well-documented and legally defensible. But "established standards" often means "what was true two years ago."
I decided to position GuardianScan around comprehensive modern standards—not as an early adopter gimmick, but because that's what actually matters if you're building websites in 2025:
- Check INP, not FID - Google replaced FID with INP in March 2024. Your site needs an INP under 200ms to pass Core Web Vitals. Most tools still report the old metric.
- Target WCAG 2.2 Level AA - Government sites need WCAG 2.1 by April 2026, but 2.2 is where compliance is heading. The nine new criteria (24×24px minimum touch targets, redundant entry prevention, accessible authentication) aren't optional.
- Validate AI search readiness - Check Schema.org markup quality, content structure for AI parsing, FAQ schema, article schema, breadcrumb navigation. If AI search engines can't parse your content, you're invisible to half your potential traffic.
- Detect framework-specific issues - Identify Next.js sites and flag when they're shipping disproportionately large bundles for static content. A blog post shouldn't need 300KB of JavaScript.
- Modern security standards - CSP, HSTS, X-Frame-Options, and the other headers that actually matter in 2025 (a sketch follows this list).
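The header side of that last item is straightforward to sketch. A minimal example, run with Node 18+ where fetch is global; the header names are real, the reporting format is illustrative:

```ts
// Minimal sketch: check which of the expected security headers a URL returns.
const EXPECTED_HEADERS = [
  'content-security-policy',
  'strict-transport-security',
  'x-frame-options',
  'x-content-type-options',
  'referrer-policy',
];

async function checkSecurityHeaders(url: string) {
  // Some servers handle HEAD poorly; a plain GET works everywhere.
  const res = await fetch(url, { redirect: 'follow' });
  return Object.fromEntries(
    EXPECTED_HEADERS.map((name) => [name, res.headers.get(name) ?? 'missing']),
  );
}

checkSecurityHeaders('https://example.com').then(console.table);
```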
This creates a clear tradeoff: if you're maintaining legacy systems or supporting old browsers, GuardianScan isn't for you. But if you're building modern web applications for 2025 and beyond, it checks everything comprehensively.
The Product as System Design
Building GuardianScan reinforced something I've learned over fifteen years: products are systems, and systems thinking applies to how you design them.
The temptation is to add features. More checks, more frameworks, more integrations, more customisation. Every feature feels valuable in isolation. But features interact—they create complexity, cognitive load, maintenance burden.
I kept asking: what's the simplest system that delivers the core value? And the core value isn't "comprehensive auditing"—it's "quickly identify the issues that actually matter."
So GuardianScan does comprehensive checks across seven categories:
- Modern Standards (Core Web Vitals, framework detection, INP measurement)
- Performance (LCP, INP, CLS, resource optimisation, bundle analysis)
- AI Search Optimisation (Schema.org markup, content structure, FAQ/Article schema, semantic HTML)
- SEO Fundamentals (meta tags, sitemaps, internal linking, crawlability)
- Security (HTTPS, security headers, CSP configuration)
- Accessibility (WCAG 2.2 Level AA automated checks, touch targets, keyboard navigation)
- Code Quality (bundle sizes, console errors, render-blocking resources)
Not 300 legacy compatibility checks. Not customisable rulesets. Just comprehensive coverage of what actually matters for modern web development in 2025.
This is what I call "productive chaos" from my earlier post—clear boundaries, flexible execution. The boundaries are non-negotiable (these specific checks, this performance target, this accuracy threshold). The execution is optimised for speed and clarity.
Who This Is Actually For
I built GuardianScan for myself. That sounds narcissistic, but it's actually the best product development strategy I know: solve a problem you have, that you understand deeply, that you can evaluate honestly.
But "myself" is a proxy for three groups I've worked with extensively over fifteen years:
Agencies checking client work before delivery. You need to catch embarrassing issues before the client does—like the time an agency shipped a site with a 0.42 CLS score (should be under 0.1) and the client noticed the layout jumping on mobile. You need to demonstrate technical competence. You need it to be fast because you're juggling twelve projects. You need reports that clients can understand without a CS degree.
Freelancers doing health checks on new projects. You inherit a codebase and need to quickly assess what's broken. I've done this dozens of times—stakeholder says "our site feels slow," you audit it, find 47 issues, and need to prioritise the five that matter most. You need to justify your recommendations with data ("your INP is 380ms, Google's threshold is 200ms"), not just opinions.
In-house teams monitoring post-deployment regressions. You shipped something. Did it break accessibility? Did the performance degrade? Did someone accidentally commit a 2MB unoptimised image? (Yes, I've seen this happen three times in the past year.) You need to know immediately, not during the quarterly audit.
All three need the same thing: fast, accurate, modern-standard checks that surface actionable issues. Not vanity metrics, not academic perfectionism—practical, fixable problems.
What I'm Not Solving
It's worth being explicit about what GuardianScan doesn't do:
It doesn't replace manual accessibility testing. If you need full WCAG compliance, you still need human evaluators. GuardianScan catches the automated 70%, which is valuable, but it's not certification.
It doesn't do ongoing monitoring. It's a point-in-time audit, not a monitoring service like Sentry or LogRocket. You run it when you need it—before launch, after deploys, when something feels wrong.
It doesn't customise to your exact standards. Some tools let you configure thresholds, disable checks, add custom rules. GuardianScan is opinionated: these are the modern standards, these are the thresholds that matter. If you disagree, it's probably not the right tool for you.
It doesn't audit authenticated content. It checks public pages. If your app requires login, GuardianScan can't reach those screens. This is a genuine limitation I might address later, but for now, it's out of scope.
It can't analyse your source code. GuardianScan scans the live site—what gets delivered to browsers. For deeper insights like checking individual component usage, analysing unused dependencies, or reviewing build configurations, you'd need source code access. Future integrations (like build plugins or GitHub apps) could add this, but the current focus is on what's detectable from the live site: performance, security headers, accessibility, and SEO.
Being clear about limitations isn't weakness—it's honesty. Every tool makes tradeoffs. I'd rather be upfront about mine than overpromise.
What This Means for Your Next Audit
If you're still manually checking sites across four different tools, you're spending 20-30 minutes per audit on work that could be automated. Over a year, that's dozens of hours.
Here's what you can do instead:
For Next.js sites specifically: Detect when you're shipping heavy JavaScript bundles for mostly static content, validate your Core Web Vitals meet Google's thresholds (LCP under 2.5s, INP under 200ms, CLS under 0.1), and catch accessibility issues before they become compliance problems.
For any modern site: Get comprehensive checks covering performance, AI search readiness, security (CSP, HSTS, X-Frame-Options), accessibility (WCAG 2.2 Level AA automated checks), SEO fundamentals, and code quality—in 45 seconds.
For agencies and freelancers: Run audits before client delivery, justify your recommendations with specific data points, and catch issues that would otherwise embarrass you post-launch.
Pre-Orders Open November 1st
I'm opening pre-orders on November 1st at guardianscan.ai. The tool is functional—I've been testing it on real sites for months—but I want to refine the reporting UI and add a few more framework-specific checks before the full launch.
Pricing is £24. That's intentionally positioned as "cheaper than an hour of your time." If it saves you thirty minutes on one audit, it's paid for itself. If you run ten audits a year, the ROI is obvious.
I'm not trying to build a SaaS empire or raise venture capital. I'm solving a real problem that I have, that other developers and agencies have, in a way that existing tools don't. If it's useful to a hundred people, that's success. If it's useful to a thousand, even better.
What I'd Do Differently
Building GuardianScan taught me things I wish I'd known earlier:
Start with the positioning, not the features. I spent weeks debating whether to include X or Y check before realising the real question was: who is this for, and what problem does it solve for them? Once I answered that, feature decisions became obvious.
Opinionated tools are easier to build and more valuable. Every time I considered adding customisation—"let users configure thresholds" or "support custom rulesets"—it would have created complexity that didn't serve the core value. The best tools have a point of view.
Speed is a feature, not a constraint. I could have built a more comprehensive audit that takes five minutes. But optimising for 45 seconds forced me to prioritise ruthlessly, which made the tool better. Constraints drive clarity.
Modern standards are a competitive moat. Most tools optimise for compatibility with legacy systems. Building comprehensive checks for modern frameworks (Next.js 15, React 19), modern standards (WCAG 2.2, INP), and emerging requirements (AI search optimisation) makes you less universally applicable but far more valuable to the people who matter—developers building new things, not maintaining legacy systems that should have been retired years ago.
The Pattern Continues
After building GuardianScan, I'm noticing the same pattern in other areas. Tools that were state-of-the-art in 2020 haven't kept pace with how we build software in 2025. The gap between modern practices and available tooling keeps widening.
Maybe this is always true. Technology evolves faster than the ecosystem around it. Someone builds a great tool for the current paradigm, and three years later, the paradigm has shifted but the tool hasn't.
This creates opportunity—not just commercial opportunity, but the opportunity to build things that should exist but don't. To notice gaps that others overlook because they're too close to the existing solutions.
That's what GuardianScan is: a gap I noticed, a tool I needed, a pattern I couldn't unsee.
If you've ever had that feeling—"why doesn't this tool exist?"—you might be onto something. Sometimes the answer is "because it's hard." But sometimes the answer is "because nobody's built it yet."
And that's an invitation.
Want to try GuardianScan when it launches? Pre-orders open November 1st at guardianscan.ai. Get notified when the tool goes live and catch performance, accessibility, and security issues before they reach production.
Michael Pilgram is the founder of Numen Technology, a UK web development company. Over fifteen years working across different teams and industries, he's built hundreds of websites and conducted 500+ technical audits for e-commerce, SaaS, and publishing platforms. He writes about web development, systems thinking, and decision-making at michaelpilgram.co.uk.
Get posts like this delivered to your feed reader:
Subscribe via RSS