Content Audit Crawler
A headless-browser crawler that automated the audit of a large, content-heavy Drupal 8 site. It ingested a JSON list of URLs, checked each page for broken links, missing videos, and unlinked related stories, and then output a JSON report of the flagged pages.
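The pipeline described above might look roughly like the sketch below. It is a minimal illustration, not the project's actual code. It assumes Node.js 18+ (for the global `fetch`) and the `puppeteer` package. The CSS selectors, the helper names `auditPage`, `checkStatus`, and `crawl`, and the rule that a video counts as "missing" when its embed has an empty `src` are all hypothetical. The flagging rules sit in a pure `auditPage` function so they can be checked without launching a browser.

```javascript
// Sketch only: selectors, helper names, and the "missing video" rule are hypothetical.
// Assumes Node.js 18+ (global fetch) and the puppeteer package for crawl().
const fs = require("node:fs/promises");

// Hypothetical selectors for the Drupal 8 theme under audit.
const SELECTORS = {
  links: "main a[href]",
  videos: "main iframe",
  related: ".related-stories li",
};

// Pure flagging step: takes facts already pulled from one page and returns
// a list of problem descriptions (an empty array means the page passed).
function auditPage({ links, videos, relatedStories }) {
  const problems = [];
  for (const { href, status } of links) {
    // null means no response came back (DNS error, refused connection, etc.)
    if (status === null || status >= 400) {
      problems.push(`broken link: ${href} (${status ?? "no response"})`);
    }
  }
  for (const src of videos) {
    // Assumption: a "missing video" is an embed whose src is empty.
    if (src.trim() === "") problems.push("missing video: embed with empty src");
  }
  for (const story of relatedStories) {
    if (!story.href) problems.push(`unlinked related story: ${story.title}`);
  }
  return problems;
}

// Returns the HTTP status for a URL, or null if the request failed outright.
async function checkStatus(href) {
  try {
    const res = await fetch(href, { method: "HEAD", redirect: "follow" });
    return res.status;
  } catch {
    return null;
  }
}

// Reads a JSON array of URLs, audits each page, writes flagged pages as JSON.
async function crawl(inputPath, outputPath) {
  const puppeteer = require("puppeteer"); // loaded lazily so auditPage works without it
  const urls = JSON.parse(await fs.readFile(inputPath, "utf8"));
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const report = [];
  try {
    for (const url of urls) {
      try {
        await page.goto(url, { waitUntil: "networkidle2" });
      } catch (err) {
        report.push({ url, problems: [`page failed to load: ${err.message}`] });
        continue;
      }
      const hrefs = await page.$$eval(SELECTORS.links, (as) => as.map((a) => a.href));
      const videos = await page.$$eval(SELECTORS.videos, (els) =>
        els.map((el) => el.getAttribute("src") || "")
      );
      const relatedStories = await page.$$eval(SELECTORS.related, (items) =>
        items.map((li) => ({
          title: li.textContent.trim(),
          href: li.querySelector("a[href]")?.href ?? null,
        }))
      );
      // Check each distinct http(s) link once per page; skip mailto:, tel:, etc.
      const links = [];
      for (const href of new Set(hrefs.filter((h) => h.startsWith("http")))) {
        links.push({ href, status: await checkStatus(href) });
      }
      const problems = auditPage({ links, videos, relatedStories });
      if (problems.length > 0) report.push({ url, problems });
    }
  } finally {
    await browser.close();
  }
  await fs.writeFile(outputPath, JSON.stringify(report, null, 2));
  return report;
}
```

A call such as `crawl("urls.json", "report.json")` would read an array of URL strings (an assumed input shape) and write only the pages that failed a check. The sketch visits pages one at a time in a single tab to stay simple. Link statuses come from `HEAD` requests; some servers answer `HEAD` with 405, so a real run might retry those with `GET` before flagging them.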
Built internally at a digital marketing agency to turn a 20+ hour manual task into a 5-minute process. My first hands-on project with Node.js, Puppeteer, and asynchronous JavaScript.
Highlights
- 20+ hour manual audit → 5-minute automated process
- 10,000+ pages crawled and validated
- Lightweight, minimal dependencies, JSON in/out