
We Built an Intelligent Content Audit System for a Healthcare Website on Sitecore

Learn how we built an AI-powered content audit system for a healthcare site on Sitecore, evaluating 4,200+ pages across providers, services, conditions, and locations to boost SEO and ensure accuracy.


The best automation doesn't announce itself. It works quietly in the background, surfacing insights that would take a human team months to compile, and presenting them in a way that makes the next action obvious. This is precisely what we set out to build when a regional healthcare network came to us with a problem that will sound familiar to anyone managing enterprise content: thousands of pages, no clear picture of what was current, and a nagging sense that outdated information was hurting both patients and search rankings.

The Scale of the Problem

Healthcare content presents unique challenges that most industries never encounter. Medical information isn't just marketing copy that ages gracefully into irrelevance. Outdated clinical guidance can mislead patients. Provider pages with departed physicians frustrate people seeking care. Service pages describing discontinued programs waste everyone's time. And search engines have become remarkably good at detecting when content no longer deserves to rank.

Our client's Sitecore instance contained over 4,200 pages spanning five distinct content types: provider profiles, service line pages, condition and treatment content, location pages, and a medical blog that had been publishing weekly for six years. The content team consisted of three people. Manual auditing at any meaningful frequency was simply not possible.

The traditional approach would involve spreadsheets, sampling, and educated guesses about what needed attention. We proposed something different: an agentic system that could evaluate every page against intelligent criteria, learn what "fresh" means for each content type, and produce actionable recommendations rather than overwhelming data dumps.

Understanding Content Freshness as a Spectrum

Before writing a single line of automation code, we needed to reframe how the organization thought about content freshness. The binary of "current" versus "stale" doesn't capture the nuance that healthcare content requires.

We developed a freshness taxonomy with four states: evergreen content that should remain stable, time-sensitive content requiring regular verification, triggered content that needs updates based on external events, and decay-prone content that loses value predictably over time.

Provider pages, for instance, contain elements from multiple categories. A physician's medical school and board certifications are evergreen. Their office locations and accepted insurance plans are triggered by external changes. Their clinical interests and patient reviews are decay-prone as preferences shift and new feedback accumulates.

This framework became the foundation for our automated evaluation logic. Instead of asking "when was this page last modified?" we designed the system to ask "what kind of freshness does each element require, and is that requirement being met?"
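One way to make the taxonomy usable by automation is to store it as data, so every parsed field carries its freshness category. The following is a minimal Python sketch, not the production system. It assumes a page has already been parsed into named fields, and the field names (`medical_school`, `accepted_insurance`, and so on) are hypothetical rather than the client's Sitecore template fields.

```python
from enum import Enum


class Freshness(Enum):
    """The four freshness states in the taxonomy."""
    EVERGREEN = "evergreen"            # should remain stable
    TIME_SENSITIVE = "time_sensitive"  # requires regular verification
    TRIGGERED = "triggered"            # update when an external event occurs
    DECAY_PRONE = "decay_prone"        # loses value predictably over time


# Hypothetical provider-page field names, assigned to categories
# following the provider example in the text.
PROVIDER_FIELDS = {
    "medical_school": Freshness.EVERGREEN,
    "board_certifications": Freshness.EVERGREEN,
    "office_locations": Freshness.TRIGGERED,
    "accepted_insurance": Freshness.TRIGGERED,
    "clinical_interests": Freshness.DECAY_PRONE,
    "patient_reviews": Freshness.DECAY_PRONE,
}


def fields_by_category(mapping: dict[str, Freshness]) -> dict[Freshness, list[str]]:
    """Group field names by freshness category so each group gets its own checks."""
    groups: dict[Freshness, list[str]] = {category: [] for category in Freshness}
    for field_name, category in mapping.items():
        groups[category].append(field_name)
    return groups
```

Grouping first means the evaluation step can run one kind of check per category instead of branching on every individual field.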

The Architecture of Intelligent Auditing

Our system connects three capabilities that work together: content extraction and parsing, contextual evaluation against freshness rules, and prioritized recommendation generation.

The extraction layer pulls content from Sitecore's content tree via the Item Web API, preserving the hierarchical relationships and metadata that inform our evaluation. We parse each page into semantic components rather than treating it as a monolithic block of text. A provider page becomes a structured object with discrete fields for credentials, locations, specialties, biography, and contact information.
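Parsing a page into semantic components might look like the sketch below. It assumes, hypothetically, that the extraction step has already flattened an item into a dict of field name to raw string and that multi-value fields are pipe-delimited. Neither the field names nor the delimiter reflect the Item Web API's actual response format.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderPage:
    """Structured view of one provider page, one attribute per component."""
    item_path: str
    credentials: list[str] = field(default_factory=list)
    locations: list[str] = field(default_factory=list)
    specialties: list[str] = field(default_factory=list)
    biography: str = ""
    phone: str = ""


def _split(raw: str) -> list[str]:
    # Assumption: multi-value fields arrive as pipe-delimited strings.
    return [part.strip() for part in raw.split("|") if part.strip()]


def parse_provider(item_path: str, fields: dict[str, str]) -> ProviderPage:
    """Turn a flat field dict (hypothetical field names) into a ProviderPage."""
    return ProviderPage(
        item_path=item_path,
        credentials=_split(fields.get("Credentials", "")),
        locations=_split(fields.get("Locations", "")),
        specialties=_split(fields.get("Specialties", "")),
        biography=fields.get("Biography", "").strip(),
        phone=fields.get("Phone", "").strip(),
    )
```

Missing fields default to empty values, so a sparsely populated page still parses and the evaluation layer can flag the gaps.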

The evaluation layer applies different rules to each component based on its freshness category. Evergreen elements are checked for accuracy against authoritative sources when possible. Time-sensitive elements are flagged based on age thresholds calibrated to each content type. Triggered elements are monitored through integration with external data sources such as the National Provider Identifier (NPI) registry, which supplies physician identity and practice-location data.
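For time-sensitive elements, the age check reduces to comparing the last verification date with a per-content-type window. In this sketch the window lengths are invented placeholders, not the thresholds the client used.

```python
from datetime import date, timedelta

# Hypothetical verification windows (in days) for time-sensitive elements,
# calibrated per content type. Real values would come from the content team.
AGE_THRESHOLDS_DAYS = {
    "provider": 180,
    "service": 365,
    "condition": 365,
    "location": 90,
    "blog": 730,
}


def is_overdue(content_type: str, last_verified: date, today: date) -> bool:
    """True if a time-sensitive element has gone unverified past its window."""
    window = timedelta(days=AGE_THRESHOLDS_DAYS[content_type])
    return today - last_verified > window
```

Keying thresholds on content type is a simplification; per-field windows would fit into the same lookup without changing the comparison.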

The recommendation layer synthesizes findings into prioritized action items. Rather than presenting a list of 847 pages that need "some kind of attention," the system outputs specific tasks: "Update insurance information for 23 providers in the cardiology service line" or "Verify clinical accuracy of 7 condition pages referencing treatment protocols published before 2022."
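The step from raw findings to tasks like "Update insurance information for 23 providers in the cardiology service line" is essentially grouping and counting. A minimal sketch, assuming each finding already records an issue label, a service line, and the noun to count; the `Finding` shape is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    page_id: str
    issue: str   # e.g. "Update insurance information"
    group: str   # service line, e.g. "cardiology"
    noun: str    # what the count refers to, e.g. "providers"


def to_tasks(findings: list[Finding]) -> list[str]:
    """Collapse per-page findings into counted tasks, largest batch first."""
    pages: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for f in findings:
        # A set of page ids, so duplicate findings on one page count once.
        pages[(f.issue, f.group, f.noun)].add(f.page_id)
    ranked = sorted(pages.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [
        f"{issue} for {len(ids)} {noun} in the {group} service line"
        for (issue, group, noun), ids in ranked
    ]
```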

Provider Pages: Verifying the Human Element

Provider pages required the most sophisticated handling because they represent individual people whose professional circumstances change frequently. Physicians join and leave practices, earn new certifications, shift their clinical focus, and update their availability.

Our automation cross-references provider data against three external sources: the NPI registry maintained by CMS, state medical board databases, and the health system's own credentialing records accessed through a secure internal API. Discrepancies trigger automatic flagging with specific recommendations.
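The core of the cross-reference is a field-by-field comparison. The sketch below assumes each source (NPI registry, state board, internal credentialing) has already been fetched and normalized into a flat dict that uses the same hypothetical field names as the page. Fetching, authentication, and per-source field mapping are omitted, and the matching rule (case-insensitive, order-insensitive for lists) is an assumption.

```python
def _norm(value):
    """Normalize strings and lists so cosmetic differences don't count."""
    if isinstance(value, (list, tuple, set)):
        return sorted(str(v).strip().lower() for v in value)
    return str(value).strip().lower()


def find_discrepancies(page: dict, sources: dict[str, dict]) -> list[str]:
    """Compare page fields with each source's record; describe every mismatch."""
    flags = []
    for source_name, record in sources.items():
        for key, expected in record.items():
            actual = page.get(key)
            if actual is None:
                continue  # the page doesn't display this field
            if _norm(actual) != _norm(expected):
                flags.append(f"{key}: page says {actual!r}, {source_name} says {expected!r}")
    return flags
```

Each flag names the field and the disagreeing source, which is the context a reviewer needs to decide whether the page or the record is wrong.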

We discovered that 12% of provider pages listed credentials that couldn't be verified against current registry data. Some were minor variations in how certifications were described. Others revealed physicians who had left the practice but whose pages remained live. Three pages referenced board certifications that had lapsed.

The system also evaluates the qualitative elements of provider pages. Biography text is analyzed for dated references, such as a claim of "over 20 years of experience" when the physician's medical school graduation date implies 28 years. Patient testimonials are flagged when they describe experiences more than three years old, because healthcare delivery changes substantially over time.
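Both checks can be approximated with simple date arithmetic. This sketch assumes the graduation year and testimonial dates are available as structured data; the regex and the two-year tolerance are illustrative choices, not the system's actual rules.

```python
import re
from datetime import date

# Matches phrases like "20 years of experience" or "15+ years of experience".
_EXPERIENCE = re.compile(r"(\d{1,2})\+?\s+years of experience", re.IGNORECASE)


def dated_experience_claims(bio: str, grad_year: int, today: date,
                            tolerance: int = 2) -> list[tuple[int, int]]:
    """Return (stated, implied) pairs where the bio undershoots years since graduation."""
    implied = today.year - grad_year
    return [
        (int(m.group(1)), implied)
        for m in _EXPERIENCE.finditer(bio)
        if implied - int(m.group(1)) > tolerance
    ]


def stale_testimonials(dates: list[date], today: date, max_years: int = 3) -> list[date]:
    """Testimonial dates older than max_years before today."""
    try:
        cutoff = today.replace(year=today.year - max_years)
    except ValueError:  # today is Feb 29 and the target year is not a leap year
        cutoff = today.replace(year=today.year - max_years, day=28)
    return [d for d in dates if d < cutoff]
```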

Service Line Pages: Mapping Capability to Currency

Service pages describe what the healthcare system offers across specialties like cardiology, oncology, orthopedics, and primary care. These pages serve both patients researching their options and referring physicians evaluating where to send their patients.

Our freshness evaluation for service pages focuses on three dimensions: clinical accuracy, operational currency, and competitive positioning.

Clinical accuracy assessment compares described treatments and technologies against current standard-of-care guidelines. The system flags pages that emphasize capabilities which have become routine rather than differentiating, or that fail to mention newer approaches the health system has adopted.

Operational currency verification checks that referenced locations, hours, scheduling processes, and contact information remain accurate. Integration with the health system's operational data identified 34 service pages referencing phone numbers that now route to general switchboards rather than dedicated service line coordinators.
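A phone-number check of this kind can be sketched as extraction plus set intersection. It assumes US ten-digit numbers and a list of general switchboard numbers supplied from the health system's operational data; both the regex and that input are assumptions.

```python
import re

# US-style numbers: (555) 010-2000, 555-010-2000, 555.010.2000, ...
_PHONE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")


def normalize_phones(text: str) -> set[str]:
    """Extract phone numbers from page text as bare ten-digit strings."""
    return {"".join(m.groups()) for m in _PHONE.finditer(text)}


def switchboard_numbers_on_page(page_text: str, switchboards: set[str]) -> set[str]:
    """Numbers on a service page that route to a general switchboard."""
    return normalize_phones(page_text) & switchboards
```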

Competitive positioning analysis evaluates whether service descriptions reflect the health system's current market position. Pages emphasizing distinctions that competitors have since matched, or failing to highlight unique capabilities, receive lower freshness scores.

Condition Pages: The Challenge of Medical Accuracy

Condition and treatment pages presented our greatest responsibility. Patients read these pages when they're scared, confused, and seeking guidance. Outdated medical information isn't just a brand problem; it's an ethical issue.

We implemented a multi-layer verification approach. The first layer checks publication dates of any cited clinical studies or guidelines. Medical knowledge evolves continuously, and a page about diabetes management citing research from 2018 may be presenting outdated recommendations.
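The first layer can be approximated by pulling years out of each citation string and flagging citations whose newest year predates a cutoff. This is a minimal sketch; the year pattern and the cutoff are illustrative assumptions, and a production version would read structured citation metadata where it exists.

```python
import re

# Four-digit years from 1950 through 2099.
_YEAR = re.compile(r"\b(19[5-9]\d|20\d{2})\b")


def outdated_citations(citations: list[str], cutoff_year: int) -> list[str]:
    """Citations whose most recent year falls before cutoff_year."""
    flagged = []
    for citation in citations:
        years = [int(y) for y in _YEAR.findall(citation)]
        if years and max(years) < cutoff_year:
            flagged.append(citation)
    return flagged
```

Citations with no detectable year are left unflagged here; in practice they may deserve their own review queue.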

The second layer performs semantic analysis to identify claims that may require verification. Statements about treatment efficacy, recovery timelines, or risk factors are flagged for clinical review when they appear to conflict with current evidence or when the underlying sources are no longer current.
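The text describes semantic analysis; the keyword version below is a deliberately simple stand-in that only shows the shape of the step: split text into sentences and tag those that look like efficacy, recovery-timeline, or risk claims. The patterns are hypothetical, and deciding whether a tagged claim conflicts with current evidence is left to clinical review.

```python
import re

# Hypothetical cue patterns for the three claim types.
CLAIM_PATTERNS = {
    "efficacy": re.compile(r"\b(effective|success rate|cures?|reduces?)\b", re.I),
    "recovery_timeline": re.compile(
        r"\b(recover\w*|back to normal)\b.*\b\d+\s+(days?|weeks?|months?)\b", re.I
    ),
    "risk_factor": re.compile(r"\b(risk|increases? (your|the) chance)\b", re.I),
}


def claims_for_review(text: str) -> list[tuple[str, str]]:
    """Return (claim_type, sentence) pairs for sentences matching a pattern."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tagged = []
    for sentence in sentences:
        for label, pattern in CLAIM_PATTERNS.items():
            if pattern.search(sentence):
                tagged.append((label, sentence))
    return tagged
```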

The third layer monitors external medical literature for updates relevant to each condition page. When major clinical guidelines are revised or significant new research is published, affected pages are automatically queued for review.
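Once a guideline revision is detected, queueing the affected pages is a lookup in an index from guideline to dependent pages. In this sketch the guideline identifiers, page paths, and index are hypothetical; how revisions are detected is out of scope.

```python
from collections import deque

# Hypothetical index: which condition pages depend on which guideline.
PAGES_BY_GUIDELINE = {
    "hypertension-guideline": ["/conditions/high-blood-pressure", "/conditions/heart-disease"],
    "diabetes-guideline": ["/conditions/type-2-diabetes"],
}


def enqueue_affected(revised: list[str], index: dict[str, list[str]], queue: deque) -> deque:
    """Append each affected page once, preserving the order revisions arrive."""
    seen = set(queue)
    for guideline in revised:
        for page in index.get(guideline, []):
            if page not in seen:
                queue.append(page)
                seen.add(page)
    return queue
```

Unknown guidelines are ignored, and pages already waiting for review are not queued twice.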

This approach identified 67 condition pages requiring clinical review, with 23 flagged as high priority due to references to treatment approaches that have been superseded by newer protocols. The content team was able to work with clinical advisors to prioritize updates based on patient impact rather than arbitrary schedules.

Location Pages: The Moving Target

Healthcare organizations constantly adjust their physical footprint. Clinics open and close, services migrate between facilities, and operational details shift with regulatory requirements and market conditions.

Location pages were evaluated against real-time operational data including current hours of operation, services offered at each site, parking and transportation information, and accessibility features. The system also verified that embedded maps and directions remained accurate as road networks and landmarks change.
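The location comparison can be sketched as a field-by-field diff against an operational record. The field names and the `status` value are hypothetical; services get a set comparison so that only services listed on the page but no longer offered are flagged.

```python
def location_mismatches(page: dict, ops: dict) -> dict:
    """Fields where a location page disagrees with operational data."""
    issues = {}
    for key in ("hours", "services", "parking", "accessibility"):
        if key not in ops:
            continue  # no operational data to compare against
        shown, actual = page.get(key), ops[key]
        if key == "services":
            extra = set(shown or []) - set(actual)
            if extra:
                issues[key] = sorted(extra)  # listed on the page but no longer offered
        elif shown != actual:
            issues[key] = {"page": shown, "ops": actual}
    if ops.get("status") == "closed":
        issues["status"] = "location closed; page still live"
    return issues
```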

We discovered that 28% of location pages contained at least one piece of operationally inaccurate information. Most were minor issues like outdated holiday hours or parking validation details. Several were more significant, including two pages for locations that had closed and three that listed services no longer offered at those sites.

The Medical Blog: Managing Six Years of Clinical Content

The blog archive contained 312 posts spanning topics from seasonal health tips to explanations of complex surgical procedures. Unlike marketing blogs where older content simply becomes less relevant, medical blog posts can become actively harmful if they present outdated clinical guidance.

Our automated analysis categorized blog posts into three groups based on their content characteristics and decay risk.

Seasonal and awareness content, such as flu prevention tips or heart health month features, was evaluated for clinical accuracy of any specific recommendations while acknowledging that the general messaging often remains appropriate across years.

Procedure and treatment explainers were subjected to rigorous currency checks. Posts describing specific surgical techniques, medication protocols, or diagnostic approaches were cross-referenced against current clinical guidelines. Twelve posts were flagged as requiring updates due to evolved standard of care.

Patient story and community content was evaluated with lighter criteria, focusing on ensuring that referenced physicians remained with the practice and that any clinical claims in patient narratives still aligned with current medical understanding.

The system then assigned each post one of four recommendations: archive and redirect (the post is no longer accurate), update (it needs minor clinical corrections), refresh (it would benefit from contemporary examples or statistics), or leave alone (it remains accurate and valuable).
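The four outcomes form a small decision rule. This sketch assumes review has already produced a clinical status label and a flag for dated examples; both inputs are hypothetical. Clinical status is checked first because accuracy outranks freshness.

```python
from enum import Enum


class Action(Enum):
    ARCHIVE_AND_REDIRECT = "archive and redirect"
    UPDATE = "update"
    REFRESH = "refresh"
    LEAVE_ALONE = "leave alone"


def recommend(clinical_status: str, dated_examples: bool) -> Action:
    """Map one post's review results to a recommendation.

    clinical_status is a hypothetical label: "outdated", "minor_errors", or "accurate".
    """
    if clinical_status == "outdated":
        return Action.ARCHIVE_AND_REDIRECT
    if clinical_status == "minor_errors":
        return Action.UPDATE
    if clinical_status != "accurate":
        raise ValueError(f"unknown clinical status: {clinical_status!r}")
    return Action.REFRESH if dated_examples else Action.LEAVE_ALONE
```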

The SEO Impact of Systematic Freshness

Search engines have become sophisticated at evaluating content quality and freshness, particularly for health-related queries, which Google classifies as "Your Money or Your Life" (YMYL) topics and holds to heightened standards in its Search Quality Rater Guidelines. Our systematic approach to content currency produced measurable improvements in organic search performance.

Within four months of implementing the prioritized refresh plan, the healthcare system saw a 23% increase in organic traffic to condition pages. Provider pages appeared in featured snippets at twice their previous rate. The medical blog began ranking for queries for which it had previously been invisible, as refreshed posts demonstrated the currency signals search algorithms reward.

Beyond traffic gains, the audit revealed content gaps that became opportunities. Analysis of condition pages showed that the health system offered services for conditions it had never created content to address. The competitive intelligence component identified topics where competitors ranked well that our client had not covered at all.
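Both gap analyses reduce to a set difference after light normalization. This sketch assumes the inputs are plain lists of condition or topic names: conditions the health system treats versus conditions with pages, or competitor topics versus the client's own. The slug rule is an assumption.

```python
def _slug(name: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(name.lower().replace("-", " ").split())


def content_gaps(wanted: set[str], covered: set[str]) -> list[str]:
    """Items in `wanted` with no matching entry in `covered`, sorted."""
    covered_slugs = {_slug(c) for c in covered}
    return sorted(w for w in wanted if _slug(w) not in covered_slugs)
```

The same function serves both cases: `content_gaps(conditions_treated, condition_pages)` and `content_gaps(competitor_topics, our_topics)`.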

Building Sustainable Freshness Operations

The true value of this system extends beyond the initial audit. By establishing automated monitoring and intelligent alerting, we transformed content freshness from a periodic project into an ongoing operational capability.

The system now runs continuous evaluation, flagging content that crosses freshness thresholds rather than waiting for manual review cycles. Content teams receive weekly digests prioritizing the highest-impact updates. Clinical advisors are engaged only when medical accuracy questions arise, rather than being asked to review pages that simply need operational updates.
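The routing rule in the weekly digest can be sketched in a few lines: the content team sees the highest-impact alerts, and clinical advisors see only medical-accuracy alerts. The `Alert` fields, the kind labels, and the impact score are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    page: str
    kind: str      # hypothetical labels: "medical_accuracy" or "operational"
    impact: float  # hypothetical impact score; higher means more urgent


def weekly_digests(alerts: list[Alert], top_n: int = 10) -> dict[str, list[Alert]]:
    """Rank alerts by impact and route them to the right audience."""
    ranked = sorted(alerts, key=lambda a: a.impact, reverse=True)
    return {
        "content_team": ranked[:top_n],
        "clinical_advisors": [a for a in ranked if a.kind == "medical_accuracy"],
    }
```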

Dashboard reporting gives leadership visibility into content health metrics they never had access to before. They can see freshness scores by content type, track improvement over time, and understand where content investments are needed before problems become visible to patients or search engines.
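The dashboard metrics can be sketched as an average per content type plus a snapshot-to-snapshot difference. This assumes, hypothetically, that each page already has a numeric freshness score on a 0 to 100 scale.

```python
from collections import defaultdict
from statistics import mean


def scores_by_type(pages: list[tuple[str, float]]) -> dict[str, float]:
    """Average freshness score per content type from (type, score) pairs."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for content_type, score in pages:
        buckets[content_type].append(score)
    return {t: round(mean(scores), 1) for t, scores in buckets.items()}


def improvement(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Change in average score for content types present in both snapshots."""
    return {t: round(after[t] - before[t], 1) for t in after if t in before}
```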

The Human Element in Intelligent Automation

The system we built doesn't replace human judgment; it amplifies it. Content strategists still decide which updates to prioritize. Clinical advisors still verify medical accuracy. Writers still craft the refreshed content.

What automation eliminated was the tedious work of identifying what needed attention: the countless hours of clicking through pages, checking dates, cross-referencing information, and compiling spreadsheets. That work now happens continuously and invisibly, surfacing only the insights and recommendations that require human action.

This is what I mean when I talk about human-centered automation. The system was designed around the humans who would use it, not around the technology that powers it. Every output is formatted to support human decision-making. Every alert includes enough context to act without additional research. Every recommendation connects to the outcome it serves.

Healthcare content will always require human expertise, human empathy, and human judgment. Intelligent automation simply ensures that those human capabilities are directed where they matter most, rather than consumed by the mechanical work of keeping thousands of pages current.

For organizations managing complex content at scale, this represents a fundamental shift in what's possible. Not automation that replaces teams, but automation that makes teams capable of maintaining quality standards that would otherwise be unachievable.

W. S. Benks

Director of AI Systems and Automation

HT Blue