My name is Saeed. I've worked as a deep learning specialist since 2019, and for the past two years I've been writing optimized algorithms in CUDA/C++. Recently, I decided to build a vector engine that consumes fewer resources than vector databases without sacrificing accuracy or throughput. So we created brinicle.
It is a production-oriented ANN index engine (not a full vector database) designed to stay usable under strict resource budgets. It focuses on disk-first operation and low memory overhead while still supporting the operations you typically need in a real service: build/load, search, insert/upsert/delete, and rebuild.
It can handle parallel read/write/rebuild requests without corrupting the index. I compared it with vector databases and with in-process libraries like FAISS and hnswlib. In one experiment, I gave a Docker container one CPU and 256 MB of RAM and benchmarked them all under that limit. Only brinicle and Chroma survived; the rest, including Milvus, Qdrant, and Weaviate, were OOMKilled (killed for exceeding the memory limit).
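For reference, the resource caps described above can be imposed with standard Docker flags. This is a sketch of the constraint, not the actual benchmark harness; the image name `vector-bench` is a placeholder for whatever benchmark image you build.

```shell
# Cap the container at 1 CPU and 256 MB of RAM, with no extra swap
# (--memory-swap equal to --memory disables swapping past the cap),
# so any engine that exceeds the budget gets OOMKilled.
docker run --rm \
  --cpus=1 \
  --memory=256m \
  --memory-swap=256m \
  vector-bench
```

With swap disabled like this, the kernel's OOM killer terminates the process the moment it exceeds 256 MB, which is what makes the survival comparison meaningful.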
brinicle is an index engine; it does not aim to provide database features like payload indexing, distributed replication, auth, or multi-tenancy. If you need those features, use a vector database. This separation is intentional. The benchmarks (linked below) show why a full DB stack often has a baseline memory footprint that is incompatible with extreme RAM caps, even before you start tuning.
Repo: https://github.com/bicardinal/brinicle
Benchmarks: https://brinicle.bicardinal.com/benchmark
Website: https://brinicle.bicardinal.com