Scrape Runner Tool
2022
NodeJS, Puppeteer
A configurable web scraping framework built with NodeJS and Puppeteer, designed for running multiple scraping jobs with scheduling and error handling.
The Story
After building several one-off scrapers, I wanted a reusable framework that could handle common scraping patterns. Scrape Runner was designed to be configurable and robust.
Architecture
- Plugin-based scraper definitions
- Puppeteer for JavaScript-heavy sites
- Built-in retry and error handling
- Scheduling with cron expressions
- Output to multiple formats (JSON, CSV, DB)
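The original source is not shown here, so the following is only a minimal sketch of how these pieces could fit together. The plugin shape (`name`, `schedule`, `url`, `scrape`), the helper names `withRetry` and `toCsv`, and the example URL and selector are all assumptions made for illustration. The `page` argument is expected to behave like a Puppeteer `Page`, whose `goto` and `$$eval` methods are real Puppeteer APIs.

```javascript
// Hypothetical plugin shape; the project's actual interface is not documented here.
const exampleScraper = {
  name: 'example-headlines',
  schedule: '0 * * * *', // cron expression: minute 0 of every hour
  url: 'https://example.com/',
  // `page` should behave like a Puppeteer Page (goto, $$eval).
  async scrape(page) {
    await page.goto(this.url, { waitUntil: 'networkidle2' });
    return page.$$eval('h1', (nodes) =>
      nodes.map((n) => ({ title: n.textContent.trim() }))
    );
  },
};

// Call an async function, retrying up to `retries` more times on failure
// and waiting `delayMs` between attempts. Rethrows the last error.
async function withRetry(fn, { retries = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Serialize an array of flat objects to CSV, quoting fields that
// contain commas, double quotes, or newlines.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const escape = (value) => {
    const text = String(value ?? '');
    return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const lines = rows.map((row) => headers.map((h) => escape(row[h])).join(','));
  return [headers.join(','), ...lines].join('\n');
}
```

With a real browser, a run might look like `withRetry(() => exampleScraper.scrape(page))`, with the resulting rows passed to `toCsv` or written out as JSON. Keeping the plugin a plain object means a new scraper is just a new file exporting that shape.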
Technical Highlights
The framework used a job queue to manage multiple concurrent scrapes. Puppeteer handled dynamic content, while a plugin system allowed new scrapers to be defined without touching core code.
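To illustrate the job queue idea, here is a small concurrency-limited runner. It is a sketch under assumptions, not the project's actual queue: the function name `runQueue`, the default limit of 2, and the result shape (modeled on `Promise.allSettled`) are all hypothetical.

```javascript
// Run async jobs with at most `concurrency` in flight at once.
// Each job is a zero-argument function returning a promise.
// Results keep the input order; a failed job does not stop the others.
async function runQueue(jobs, concurrency = 2) {
  const results = new Array(jobs.length);
  let next = 0; // index of the next job to claim

  // Each worker repeatedly claims the next unstarted job until none remain.
  async function worker() {
    while (next < jobs.length) {
      const index = next++;
      try {
        results[index] = { status: 'fulfilled', value: await jobs[index]() };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  const workerCount = Math.min(concurrency, jobs.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In a Puppeteer setup, each job would typically open its own page with `browser.newPage()`, run one scraper against it, and close the page afterwards. Capping the number of workers keeps the number of open pages, and therefore memory use, bounded.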
Project Discontinued
Moved on for now...