1 comment

  • rs1996 18 hours ago

    We built a consumer app that does deep ingredient and health analysis (food, supplements, skincare, cat treats, etc.) using llama-3.3-70b in production.

    Some numbers from the last month:

    - ~3.0M tokens processed
    - ~$2.07 total inference cost
    - ~0.5–0.6 cents per scan
    - Median latency ~3s, typical range 3–5s
    - Long prompts, structured outputs, ingredient-level caching (rough sketch below)
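
    To make the unit economics concrete: $2.07 over ~3.0M tokens works out to roughly $0.69 per million tokens, and at ~0.5–0.6 cents per scan that's on the order of 8k tokens per scan. Caching is what keeps repeat ingredients from being re-billed. Here's a minimal sketch of the shape of it (illustrative names, SQLite as a stand-in for the real store, and `call_model` standing in for whatever inference client you use; this is not our production code):

    ```python
    # Illustrative sketch of ingredient-level caching; not production code.
    # Analyses are keyed by normalized ingredient, so a label with 20
    # ingredients may only need inference for the few we've never seen.

    import hashlib
    import json
    import sqlite3
    from typing import Callable

    PROMPT_VERSION = "v1"  # bump to invalidate cached analyses when the prompt changes

    def normalize(ingredient: str) -> str:
        # Collapse case and whitespace so "Citric Acid " and "citric acid" share a key.
        return " ".join(ingredient.lower().split())

    def cache_key(ingredient: str) -> str:
        raw = f"{PROMPT_VERSION}:{normalize(ingredient)}"
        return hashlib.sha256(raw.encode()).hexdigest()

    class IngredientCache:
        def __init__(self, path: str = "ingredients.db"):
            self.db = sqlite3.connect(path)
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS analyses (key TEXT PRIMARY KEY, result TEXT)"
            )

        def get(self, ingredient: str) -> dict | None:
            row = self.db.execute(
                "SELECT result FROM analyses WHERE key = ?", (cache_key(ingredient),)
            ).fetchone()
            return json.loads(row[0]) if row else None

        def put(self, ingredient: str, result: dict) -> None:
            self.db.execute(
                "INSERT OR REPLACE INTO analyses VALUES (?, ?)",
                (cache_key(ingredient), json.dumps(result)),
            )
            self.db.commit()

    def analyze_scan(
        ingredients: list[str],
        cache: IngredientCache,
        call_model: Callable[[list[str]], dict],
    ) -> dict:
        """Analyze one scanned label, paying for inference only on cache misses."""
        results, misses = {}, []
        for ing in ingredients:
            cached = cache.get(ing)
            if cached is not None:
                results[normalize(ing)] = cached
            else:
                misses.append(ing)
        if misses:
            # One batched structured-output call covers every uncached ingredient.
            # call_model should return {normalized_ingredient: analysis_dict}.
            for ing, analysis in call_model(misses).items():
                cache.put(ing, analysis)
                results[normalize(ing)] = analysis
        return results
    ```

    Versioning the cache key on the prompt matters: when the analysis prompt changes, old entries invalidate themselves instead of serving stale results.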

    This isn’t a demo or batch job — it’s a real latency-constrained mobile workload with thousands of active scanning users.

    The main takeaway for us was that deep, high-quality inference can be surprisingly cheap and predictable if you design for it intentionally: cache aggressively, keep outputs structured and tight, and pick a model sized to the task.
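
    On the "structured and tight" point, here's a small example of what a per-ingredient output contract can look like (hypothetical fields, not our actual schema). Capping the output shape keeps tokens, and therefore cost and latency, predictable:

    ```python
    # Illustrative structured-output contract; field names are hypothetical.
    import json

    INGREDIENT_SCHEMA = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "category": {"type": "string"},  # e.g. "preservative", "sweetener"
            "risk_level": {"enum": ["low", "moderate", "high"]},
            "summary": {"type": "string", "maxLength": 280},  # hard cap on verbosity
        },
        "required": ["name", "category", "risk_level", "summary"],
        "additionalProperties": False,
    }

    def validate(raw: str) -> dict:
        """Parse and sanity-check a model response; fail fast on malformed output."""
        data = json.loads(raw)
        missing = [k for k in INGREDIENT_SCHEMA["required"] if k not in data]
        if missing:
            raise ValueError(f"model response missing fields: {missing}")
        return data
    ```

    Validating up front means a malformed response surfaces immediately instead of propagating into the app.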

    Happy to answer questions or share more details if useful.