We built a consumer app that does deep ingredient and health analysis (food, supplements, skincare, cat treats, etc.) using llama-3.3-70b in production.
Some numbers from the last month:
- ~3.0M tokens processed
- ~$2.07 total inference cost
- ~0.5–0.6 cents per scan
- Median latency ~3s, typical range 3–5s
- Long prompts, structured outputs, and ingredient-level caching (sketch of the caching pattern below)
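Since the caching is what drives most of the cost savings, here's a minimal sketch of the ingredient-level caching pattern, not our actual code. It assumes an OpenAI-compatible endpoint (which most llama-3.3-70b hosts expose); the base URL, model id, and JSON keys are placeholders.

```python
import hashlib
import json

from openai import OpenAI  # most llama-3.3-70b hosts expose an OpenAI-compatible API

# Placeholder endpoint/key: point this at whichever provider hosts the model.
client = OpenAI(base_url="https://api.your-provider.com/v1", api_key="YOUR_KEY")

# In-memory cache keyed by normalized ingredient name. In production you'd
# back this with Redis or SQLite; a dict keeps the sketch self-contained.
_cache: dict[str, dict] = {}


def _cache_key(ingredient: str) -> str:
    # Normalize so "Citric Acid" and " citric acid" share one entry.
    return hashlib.sha256(ingredient.strip().lower().encode()).hexdigest()


def analyze_ingredient(ingredient: str) -> dict:
    key = _cache_key(ingredient)
    if key in _cache:
        return _cache[key]  # cache hit: zero inference cost

    resp = client.chat.completions.create(
        model="llama-3.3-70b",  # exact model id varies by provider
        messages=[
            {
                "role": "system",
                "content": "You analyze one ingredient. Reply with a JSON object "
                           "with keys: name, risk_level, summary.",
            },
            {"role": "user", "content": f"Analyze this ingredient: {ingredient}"},
        ],
        response_format={"type": "json_object"},  # structured output
    )
    result = json.loads(resp.choices[0].message.content)
    _cache[key] = result
    return result


def analyze_scan(ingredients: list[str]) -> list[dict]:
    # A scan only pays for ingredients it hasn't seen before; common ones
    # (water, citric acid, glycerin, ...) are almost always cache hits.
    return [analyze_ingredient(i) for i in ingredients]
```

Ingredient lists overlap heavily across products, so once the cache is warm a typical scan only pays inference for the handful of ingredients it hasn't seen before, which is roughly what keeps per-scan cost in the half-cent range.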
This isn’t a demo or batch job — it’s a real latency-constrained mobile workload with thousands of active scanning users.
The main takeaway for us was that deep, high-quality inference can be surprisingly cheap and predictable if you design for it intentionally.
Happy to answer questions or share more details if useful.