4 comments

  • nirvanist 14 hours ago

    Modern web pages are cluttered with tracking scripts, analytics, styling, ads, and interactive elements that waste tokens and dilute semantic meaning when processing content for AI systems. This library strips away the noise to give you clean, meaningful HTML that:

    - Reduces token count by 60-90% (lower API costs)
    - Improves embedding quality (less noise = better semantic search)
    - Speeds up processing (smaller payloads = faster inference)
    - Preserves structure (headings, paragraphs, links stay intact)
    - Zero dependencies (pure JavaScript, no bloat)
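
    To give a rough idea of the kind of cleanup described above, here is a minimal sketch. It is not the library's actual API or implementation: the name stripNoise and the list of removed tags and attributes are assumptions made for illustration, and regex-based stripping is fragile on malformed HTML, so a real tool would more likely use a proper parser.

```javascript
// Hypothetical sketch only; stripNoise is an illustrative name, not the library's API.
function stripNoise(html) {
  return html
    // Drop whole elements that carry no readable content.
    .replace(/<(script|style|noscript|iframe|svg)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop presentational and tracking attributes (class, style, id, data-*, on* handlers).
    .replace(/\s+(?:class|style|id|data-[\w-]+|on\w+)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, "")
    // Collapse runs of whitespace left behind by the removals.
    .replace(/\s+/g, " ")
    .trim();
}

const sample = `
<div class="post" data-track="42">
  <script>trackPageView();</script>
  <style>.post { color: red; }</style>
  <h1 id="title" style="margin:0">Hello</h1>
  <p onclick="ping()">Read <a href="/docs" class="link">the docs</a>.</p>
</div>`;

const cleaned = stripNoise(sample);
console.log(cleaned);
// <div> <h1>Hello</h1> <p>Read <a href="/docs">the docs</a>.</p> </div>
```

    Headings, paragraphs, and link targets survive, while scripts, styles, and tracking attributes are gone, which is where most of the token savings would come from in this sketch.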

  • html5ninja 12 hours ago

    A colleague shared it with me, and I found it pretty cool because it’s simple. We’re actually going to use this in our scraping workflow. Thanks!

  • ioniq 13 hours ago

    Any chance you’ll add a chunking strategy? If not, I’d love to know what strategy you use for chunking.

      nirvanist 12 hours ago

      Thank you for the comment. Probably not in this module, but I’m definitely thinking about how to implement this.