1 comment

  • rs1996 18 hours ago

    We built a consumer app that does deep ingredient and health analysis (food, supplements, skincare, cat treats, etc.) using llama-3.3-70b in production.

    Some numbers from the last month:

    - ~3.0M tokens processed
    - ~$2.07 total inference cost
    - ~0.5–0.6 cents per scan
    - Median latency ~3s, typical range 3–5s
    - Long prompts, structured outputs, ingredient-level caching (rough sketch below)
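
    To make the unit economics concrete: $2.07 over ~3.0M tokens works out to roughly $0.69 per million tokens, and at ~0.5–0.6 cents per scan that's on the order of 8k tokens per scan. Caching is what keeps repeat ingredients from being re-billed. Here's a minimal sketch of the shape of it (illustrative names, SQLite as a stand-in for the real store, and `call_model` standing in for whatever inference client you use; this is not our production code):

    ```python
    # Illustrative sketch of ingredient-level caching; not production code.
    # Analyses are keyed by normalized ingredient, so a label with 20
    # ingredients may only need inference for the few we've never seen.

    import hashlib
    import json
    import sqlite3
    from typing import Callable

    PROMPT_VERSION = "v1"  # bump to invalidate cached analyses when the prompt changes

    def normalize(ingredient: str) -> str:
        # Collapse case and whitespace so "Citric Acid " and "citric acid" share a key.
        return " ".join(ingredient.lower().split())

    def cache_key(ingredient: str) -> str:
        raw = f"{PROMPT_VERSION}:{normalize(ingredient)}"
        return hashlib.sha256(raw.encode()).hexdigest()

    class IngredientCache:
        def __init__(self, path: str = "ingredients.db"):
            self.db = sqlite3.connect(path)
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS analyses (key TEXT PRIMARY KEY, result TEXT)"
            )

        def get(self, ingredient: str) -> dict | None:
            row = self.db.execute(
                "SELECT result FROM analyses WHERE key = ?", (cache_key(ingredient),)
            ).fetchone()
            return json.loads(row[0]) if row else None

        def put(self, ingredient: str, result: dict) -> None:
            self.db.execute(
                "INSERT OR REPLACE INTO analyses VALUES (?, ?)",
                (cache_key(ingredient), json.dumps(result)),
            )
            self.db.commit()

    def analyze_scan(
        ingredients: list[str],
        cache: IngredientCache,
        call_model: Callable[[list[str]], dict],
    ) -> dict:
        """Analyze one scanned label, paying for inference only on cache misses."""
        results, misses = {}, []
        for ing in ingredients:
            cached = cache.get(ing)
            if cached is not None:
                results[normalize(ing)] = cached
            else:
                misses.append(ing)
        if misses:
            # One batched structured-output call covers every uncached ingredient.
            # call_model should return {normalized_ingredient: analysis_dict}.
            for ing, analysis in call_model(misses).items():
                cache.put(ing, analysis)
                results[normalize(ing)] = analysis
        return results
    ```

    Versioning the cache key on the prompt matters: when the analysis prompt changes, old entries invalidate themselves instead of serving stale results.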

    This isn’t a demo or batch job — it’s a real latency-constrained mobile workload with thousands of active scanning users.

    The main takeaway for us was that deep, high-quality inference can be surprisingly cheap and predictable if you design for it intentionally: cache aggressively, keep outputs structured and tight, and pick a model sized to the task.
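
    On the "structured and tight" point, here's a small example of what a per-ingredient output contract can look like (hypothetical fields, not our actual schema). Capping the output shape keeps tokens, and therefore cost and latency, predictable:

    ```python
    # Illustrative structured-output contract; field names are hypothetical.
    import json

    INGREDIENT_SCHEMA = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "category": {"type": "string"},  # e.g. "preservative", "sweetener"
            "risk_level": {"enum": ["low", "moderate", "high"]},
            "summary": {"type": "string", "maxLength": 280},  # hard cap on verbosity
        },
        "required": ["name", "category", "risk_level", "summary"],
        "additionalProperties": False,
    }

    def validate(raw: str) -> dict:
        """Parse and sanity-check a model response; fail fast on malformed output."""
        data = json.loads(raw)
        missing = [k for k in INGREDIENT_SCHEMA["required"] if k not in data]
        if missing:
            raise ValueError(f"model response missing fields: {missing}")
        return data
    ```

    Validating up front means a malformed response surfaces immediately instead of propagating into the app.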

    Happy to answer questions or share more details if useful.