1 comment

  • mckinnonr 2 hours ago

    I keep a few vector DBs in sync with scraped data for a few projects, and it always takes me a few hours to fire up the scrapers, embed everything, and re-sync the databases. As a result, my vector DBs usually contain stale info.

    I built this to create "strategies" that find the correct selectors in a site's HTML for scraping jobs; you can then re-use a strategy across different URLs on that site. You can also create schedules that check whether the data has changed and, if so, send the changes to your webhook.

    Here's a good example use case: scraping forum data to keep up with new responses. You configure a strategy once, then a job runs it on whatever interval you choose and delivers a webhook to your API whenever the content changes, so you can re-embed or otherwise process the changed data.
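    For anyone wiring up the consumer side: the handler can be as simple as hashing incoming content and only re-embedding when the hash changes. A rough sketch (the payload shape with `url` and `content` keys is my assumption, not the service's actual webhook schema):

    ```python
    import hashlib

    def handle_webhook(payload: dict, seen_hashes: dict) -> bool:
        """Return True if the content changed and should be re-embedded.

        `payload` is a hypothetical webhook body with "url" and "content"
        keys; `seen_hashes` maps URL -> last seen content hash.
        """
        digest = hashlib.sha256(payload["content"].encode()).hexdigest()
        if seen_hashes.get(payload["url"]) == digest:
            return False  # content unchanged, skip re-embedding
        seen_hashes[payload["url"]] = digest
        # ...re-embed payload["content"] and upsert into the vector DB here
        return True
    ```

    This keeps the scraper side stateless about your DB: the webhook consumer decides what's actually new.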

    This is a side project, so it's free for now. Give it a try, and I'd love some constructive feedback if you're open to sharing it!