Quick technical notes for HN:
Why no AI?
The irony of sending PII to an AI model to detect PII is lost on most "privacy" APIs. This is pure algorithmic detection – the same approach your credit card company uses to validate card numbers.
What's validated (not just pattern-matched):
- Credit cards → Luhn checksum
- Aadhaar → Verhoeff (the algorithm that catches single-digit and transposition errors)
- IBAN → Mod 97 (same as banks use)
- Singapore NRIC → Mod 11 with offset
- Brazilian CPF → Dual Mod 11
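For readers who haven't written these checks before, here is a minimal sketch of two of them, Luhn and IBAN Mod 97. It shows the general algorithms only; the function names `luhn_valid` and `iban_valid` are made up for illustration and are not taken from the project's code.

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # every second digit, counting from the right
            d *= 2
            if d > 9:           # same as summing the two digits of the product
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """IBAN check: move the first four chars to the end, map letters to 10..35, mod 97 must be 1."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

A regex alone would accept any 16 digits; the checksum step is what rejects most random or mistyped numbers. This sketch skips per-country IBAN length rules, which a real validator would also check.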
Latency breakdown:
- Heuristic scan: O(n) single pass for trigger characters (@, -, digits)
- Pattern matching: Only runs if triggers found
- Validation: Only on pattern matches
- Total: 2-5ms for /fast, 5-15ms for /deep
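The staged approach above can be sketched roughly like this. Everything here is an assumption for illustration: the `TRIGGERS` set, the two regexes, and the `scan` function with its `validators` argument are hypothetical and much simpler than whatever the real endpoints do.

```python
import re

# Hypothetical trigger set: characters that make a closer look worthwhile.
TRIGGERS = set("@-0123456789")

# Hypothetical, deliberately simple patterns (not the project's real ones).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def has_trigger(text: str) -> bool:
    """Stage 1: one O(n) pass looking for any trigger character."""
    return any(ch in TRIGGERS for ch in text)


def scan(text: str, validators=None):
    """Stage 2 runs regexes only if stage 1 fired; stage 3 validates only regex hits."""
    validators = validators or {}
    if not has_trigger(text):
        return []                      # cheap exit: no regex work at all
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            check = validators.get(label)
            # A label without a validator is accepted on the pattern match alone.
            if check is None or check(match.group()):
                hits.append((label, match.group()))
    return hits
```

The point of the ordering is that most text never reaches the expensive stages: input with no trigger characters returns after a single pass, and checksum code only runs on the handful of substrings that already look like an identifier.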
False positive mitigation:
- "Order ID: 123-45-6789" won't trigger SSN (negative context)
- Timestamps won't match phone patterns (separator requirements)
- Random 16-digit numbers won't trigger credit card (Luhn must pass)
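As a rough idea of how a negative-context rule might work, here is a minimal sketch. The word list, the 24-character window, and the `find_ssns` name are all assumptions chosen for the example, not the project's actual rules.

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical list of words that suggest the number is not an SSN.
NEGATIVE_CONTEXT = re.compile(r"\b(?:order|invoice|tracking|ticket)\b", re.IGNORECASE)


def find_ssns(text: str, window: int = 24):
    """Return SSN-shaped matches, skipping those preceded by a negative-context word."""
    hits = []
    for match in SSN_RE.finditer(text):
        # Look only at a short window of text just before the match.
        before = text[max(0, match.start() - window):match.start()]
        if NEGATIVE_CONTEXT.search(before):
            continue
        hits.append(match.group())
    return hits
```

With these assumptions, `find_ssns("Order ID: 123-45-6789")` returns an empty list, while `find_ssns("SSN: 123-45-6789")` returns the match. The trade-off is the usual one: a wider window or longer word list cuts false positives but risks hiding real identifiers that happen to sit near one of those words.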
The project is great, honestly. But I put a space in an email address by mistake, and it wasn't censored.