1 comment

  • perinban 3 hours ago

    Hey HN! I built MetaXuda after getting tired of "buy Windows for ML" advice when working on Apple Silicon.

    Problem: GPU acceleration in the mainstream ML stack is CUDA-only. XGBoost's GPU backend requires CUDA, and scikit-learn only gets GPU speedups via CUDA-based cuML, so there is no native macOS GPU path. Existing translation layers (ZLUDA) add overhead.

    Solution: Native Rust + Metal runtime from scratch.

    Key features:
    - 1.1 TOPS throughput (95% of M1 theoretical peak)
    - Tokio async scheduler with zero race conditions (see the sketch after this list)
    - Multi-tier memory: GPU → RAM → SSD (handles 100 GB+ workloads)
    - 230+ GPU ops (math, transforms, ML primitives)
    - CUDA-style APIs for easy library integration
    - Bypasses Numba's execution path
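
    On the scheduler: the following is not MetaXuda's actual code (GpuOp and submit_to_metal are invented names), just a minimal sketch of the Tokio message-passing pattern that avoids data races by construction. Every op goes through one mpsc channel into a single task that owns the GPU queue, so no mutable state is shared across threads.

        use tokio::sync::{mpsc, oneshot};

        // An op plus a one-shot channel to hand its result back.
        struct GpuOp {
            kernel: String,
            done: oneshot::Sender<Vec<f32>>,
        }

        // Stand-in for a real Metal command-buffer submission.
        fn submit_to_metal(kernel: &str) -> Vec<f32> {
            println!("dispatching {kernel}");
            vec![0.0; 4]
        }

        #[tokio::main]
        async fn main() {
            let (tx, mut rx) = mpsc::channel::<GpuOp>(64);

            // Single owner of the GPU queue; submissions are serialized
            // by design, so there is nothing to race on.
            tokio::spawn(async move {
                while let Some(op) = rx.recv().await {
                    let result = submit_to_metal(&op.kernel);
                    let _ = op.done.send(result);
                }
            });

            let (done, got) = oneshot::channel();
            tx.send(GpuOp { kernel: "saxpy".into(), done }).await.unwrap();
            println!("result: {:?}", got.await.unwrap());
        }

    The tradeoff is that one channel serializes submissions; fanning out to multiple Metal queues would need one such owner task per queue.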

    Technical approach:
    - No CUDA/ZLUDA code reuse (licensing + performance reasons)
    - PyO3 wrapper for Python (see the sketch after this list)
    - Arrow-based quantization in-kernel
    - 93.37% GPU utilization cap to prevent starving macOS itself
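
    For the Python boundary, here is a minimal PyO3 sketch (pyo3 0.21+ Bound API); metaxuda_sketch and scale are placeholder names, not the real package surface:

        use pyo3::prelude::*;

        // Toy "GPU" op: a real binding would hand the buffer to the
        // Metal runtime instead of computing on the CPU here.
        #[pyfunction]
        fn scale(data: Vec<f32>, factor: f32) -> PyResult<Vec<f32>> {
            Ok(data.into_iter().map(|x| x * factor).collect())
        }

        #[pymodule]
        fn metaxuda_sketch(m: &Bound<'_, PyModule>) -> PyResult<()> {
            m.add_function(wrap_pyfunction!(scale, m)?)?;
            Ok(())
        }

    Built with maturin, Python would then call metaxuda_sketch.scale([1.0, 2.0], 3.0).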

    Known limitations:
    - Metal stream limits are still undocumented by Apple
    - CUDA API coverage is incomplete (in progress)
    - Some calls block deliberately, trading raw speed for stability (see the sketch after this list)
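
    On that last point, a hedged sketch (not MetaXuda's code; wait_until_completed is a stand-in for a real Metal wait) of the standard Tokio way to keep a deliberate blocking wait from stalling the async scheduler:

        use std::time::Duration;

        // Pretend this blocks until a Metal command buffer completes.
        fn wait_until_completed() {
            std::thread::sleep(Duration::from_millis(5));
        }

        #[tokio::main]
        async fn main() {
            // The blocking wait runs on Tokio's dedicated blocking pool,
            // so async worker threads stay responsive while it sleeps.
            tokio::task::spawn_blocking(wait_until_completed)
                .await
                .expect("blocking task panicked");
            println!("command buffer done");
        }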

    pip install metaxuda

    Open to questions on Metal vs CUDA architecture, Rust async patterns, or Apple GPU quirks. Also looking for feedback on scheduler design.

    License inquiries: p.perinban@gmail.com