Scrape Runner Tool

2022
NodeJS, Puppeteer

A configurable web scraping framework built with NodeJS and Puppeteer, designed to run multiple scraping jobs with scheduling and error handling.

The Story

After building several one-off scrapers, I wanted a reusable framework that could handle common scraping patterns. Scrape Runner was designed to be configurable and robust.

Architecture

  • Plugin-based scraper definitions
  • Puppeteer for JavaScript-heavy sites
  • Built-in retry and error handling
  • Scheduling with cron expressions
  • Output to multiple formats (JSON, CSV, DB)
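To make the list above more concrete, here is a minimal sketch of what a plugin-based scraper definition with built-in retry could look like. The original code is not shown on this page, so every name here (`ScraperPlugin`, `register`, `runWithRetry`, the `schedule` and `maxRetries` fields) is hypothetical. In a real plugin, `scrape()` would presumably drive a Puppeteer page; it is left abstract here so the sketch runs on its own.

```typescript
// Hypothetical plugin shape; the project's actual interface is not shown in the source.
interface ScraperPlugin<T> {
  name: string;
  schedule: string; // cron expression, e.g. "0 * * * *" for hourly
  maxRetries: number; // retries allowed after the first attempt
  scrape(): Promise<T[]>; // a real plugin would open a Puppeteer page here
}

// Core code only sees the registry, so new scrapers never touch it.
const registry = new Map<string, ScraperPlugin<unknown>>();

function register(plugin: ScraperPlugin<unknown>): void {
  if (registry.has(plugin.name)) {
    throw new Error(`Duplicate scraper: ${plugin.name}`);
  }
  registry.set(plugin.name, plugin);
}

// Retry with exponential backoff: baseDelayMs, then 2x, 4x, ...
async function runWithRetry<T>(
  plugin: ScraperPlugin<T>,
  baseDelayMs = 500
): Promise<T[]> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= plugin.maxRetries; attempt++) {
    try {
      return await plugin.scrape();
    } catch (err) {
      lastError = err;
      if (attempt < plugin.maxRetries) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // All attempts failed; surface the last error to the caller.
  throw lastError;
}
```

Keeping the retry loop in the runner rather than in each plugin means a scraper definition only has to describe what to fetch, not how to recover from failures.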

Technical Highlights

The framework used a job queue system to manage multiple concurrent scrapes. Puppeteer handled dynamic content while a plugin system allowed defining new scrapers without touching core code.
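As a rough illustration of the job queue idea, the sketch below runs a list of scrape jobs with a fixed concurrency limit and records each outcome without letting one failure stop the rest. It is an assumption about how such a queue might work, not the project's actual implementation; `Job`, `JobResult`, and `runQueue` are hypothetical names.

```typescript
// Hypothetical types; the original queue implementation is not shown in the source.
type Job<T> = () => Promise<T>;
type JobResult<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Runs jobs with at most `concurrency` in flight; results keep input order.
async function runQueue<T>(
  jobs: Job<T>[],
  concurrency: number
): Promise<JobResult<T>[]> {
  const results: JobResult<T>[] = new Array(jobs.length);
  let next = 0; // index of the next unclaimed job

  // Each worker claims the next job until none are left.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await jobs[index]() };
      } catch (error) {
        // A failed scrape is recorded, not rethrown, so other jobs continue.
        results[index] = { ok: false, error };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, jobs.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A limit like this matters with headless browsers in particular, since each Puppeteer page consumes noticeable memory and CPU.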

What I Learned

Headless Browsers, Job Scheduling, Error Recovery

Project Discontinued

Moved on for now...