Ultra-Low-Latency Trading System

25 points | by krish678 2 hours ago

53 comments

  • mgaunard an hour ago

    Some comments from skimming through the code:

    - spin loop engine: it could properly reset the work-available flag before calling the work function, and avoid yielding if new work was added in between. I don't see how you avoid reentrancy issues as-is.

    - lock-free queue: the buffer should hold raw storage for Ts, not Ts. As it is, it looks not only like UB, but broken for any non-trivial type.

    - metrics: the system seems weakly consistent, which is not ideal. You could use seqlocks or similar techniques.

    - websocket: it lacks error handling, and handling for slow or unreliable consumers. That could make your whole application unreliable, as you buffer indefinitely.

    - order books: first, double is used for price everywhere, which is problematic for many applications and causes unnecessary overhead on the decoding path. Second, the data structure doesn't handle very sparse and deep books, nor significant drift during the day. The richness of the data is also fairly low, though what you need is strategy-dependent. Having to sort on query is also quite inefficient when you could just structure your levels in order to begin with, typically with a circular-buffer kind of structure (as the same prices frequently oscillate between the bid and ask sides, you just need to track where bid/ask start and end).

    - strategy: the system doesn't seem particularly suited for multi-level, tick-aware microstructure strategies. I get more of an MFT (medium-frequency trading) vibe from this.

    - simulation: you're using a probabilistic model for fill rate, with market impact and the like. In HFT I think precise matching-engine simulation is more common, but I guess this is again more of an MFT tangent. It could be nice to layer the two.

    - risk checks: some of those seem unnecessary on the hot path, since you can reduce the position or PnL limits to order-size limits ahead of time.

      krish678 an hour ago

      Thank you so much for all this feedback. I’d also love to connect and discuss some of these points further if you’re open to it.
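    On the order-book point (double prices, sorting on query): one common alternative is to hold prices as integer tick counts and keep levels in an array indexed by price, so they are ordered by construction. The sketch below assumes a hypothetical feed that sends prices as integers in units of 1e-4 and a tick size of 0.01; `PriceLadder`, `decode_price`, and the fixed window are illustrative, and it deliberately omits recentering the window as prices drift.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Ticks = std::int64_t;  // a price expressed as an integer number of ticks

// Hypothetical decode step: raw price in units of 1e-4, tick size 0.01.
// No floating point on the decode path.
constexpr Ticks decode_price(std::int64_t raw_e4) { return raw_e4 / 100; }

// Bids and asks share one circular array indexed by price, so levels are
// ordered by construction. Covers the window [low, low + N) only.
template <std::size_t N>
class PriceLadder {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    explicit PriceLadder(Ticks low) : low_(low) {}

    // signed_qty > 0: resting bid size; < 0: resting ask size; 0: empty level.
    void update(Ticks price, std::int64_t signed_qty) {
        if (!in_window(price)) return;
        levels_[index(price)] = signed_qty;
        if (signed_qty > 0 && (!best_bid_ || price > *best_bid_)) best_bid_ = price;
        if (signed_qty < 0 && (!best_ask_ || price < *best_ask_)) best_ask_ = price;
        // If the best level emptied or flipped side, walk to the next one.
        if (signed_qty <= 0 && best_bid_ == price) best_bid_ = scan(price, -1, true);
        if (signed_qty >= 0 && best_ask_ == price) best_ask_ = scan(price, +1, false);
    }

    std::optional<Ticks> best_bid() const { return best_bid_; }
    std::optional<Ticks> best_ask() const { return best_ask_; }

    std::int64_t qty_at(Ticks price) const {
        return in_window(price) ? levels_[index(price)] : 0;
    }

    // Up to k bid prices, best first, by walking down from the best bid.
    std::vector<Ticks> bid_prices(std::size_t k) const {
        std::vector<Ticks> out;
        for (auto p = best_bid_; p && out.size() < k; p = scan(*p, -1, true))
            out.push_back(*p);
        return out;
    }

private:
    bool in_window(Ticks p) const {
        return p >= low_ && p < low_ + static_cast<Ticks>(N);
    }
    // Each tick in the window maps to a distinct slot (circular indexing).
    std::size_t index(Ticks p) const {
        return static_cast<std::uint64_t>(p) & (N - 1);
    }
    // Next level beyond `from` in direction `step` holding bids (or asks).
    std::optional<Ticks> scan(Ticks from, Ticks step, bool want_bid) const {
        for (Ticks p = from + step; in_window(p); p += step) {
            const std::int64_t q = levels_[index(p)];
            if (want_bid ? q > 0 : q < 0) return p;
        }
        return std::nullopt;
    }

    Ticks low_;
    std::int64_t levels_[N] = {};
    std::optional<Ticks> best_bid_;
    std::optional<Ticks> best_ask_;
};
```

Querying the top of book walks outward from the tracked best price instead of sorting; handling very sparse, deep books and intraday drift, as the review notes, needs more than this fixed window.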

  • mgaunard 2 hours ago

    Those numbers seem to be TSC samples taken in software, from the moment it receives a full frame to the moment it starts sending a packet.

    The traditional way to measure performance in HFT is with hardware timestamps on the wire: start of frame in to start of frame out.

    With those measurements the performance is probably closer to 2us, which is usually the realistic limit of a non-trivial software trading system.
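    To make the distinction concrete, here is a minimal sketch of the software-side measurement being described: one timestamp when a full frame is available and one just before TX starts. The names `read_timestamp`, `SoftwareLatencySample`, and `sample_decision_path` are hypothetical; `__rdtsc` is the GCC/Clang x86 intrinsic, with a `steady_clock` fallback elsewhere, and converting ticks to nanoseconds assumes a separately calibrated TSC frequency.

```cpp
#include <chrono>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// Raw timestamp for software-side sampling. On x86 this reads the TSC.
// RDTSC is not serializing, so careful harnesses add fences or use RDTSCP.
inline std::uint64_t read_timestamp() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// One software-only sample: "full frame available" to "about to start TX".
// It cannot see PCIe transfers, NIC queuing, or wire serialization, which is
// why it reads lower than a SOF-in to SOF-out hardware measurement.
struct SoftwareLatencySample {
    std::uint64_t frame_ready = 0;
    std::uint64_t tx_start = 0;
    std::uint64_t ticks() const { return tx_start - frame_ready; }
};

// `decide` stands in for the decode + strategy + order-build path; the
// actual send would happen after this returns.
template <typename Fn>
SoftwareLatencySample sample_decision_path(Fn&& decide) {
    SoftwareLatencySample s;
    s.frame_ready = read_timestamp();
    decide();
    s.tx_start = read_timestamp();
    return s;
}
```

Everything outside the two reads (NIC RX, DMA, TX, serialization) is invisible to this sample, which is the gap between the software figure and a wire-to-wire one.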

      krish678 2 hours ago

      That’s a fair point, and I agree on wire-to-wire (SOF-in → SOF-out) hardware timestamps being the correct benchmark for HFT.

      The current numbers are software-level TSC samples (full frame available → TX start) and were intended to isolate the software critical path, not to claim true market-to-market latency.

      I’m actively working on mitigating the remaining sources of latency (ingress handling, batching boundaries, and NIC interaction), and feedback like this is genuinely helpful in prioritizing the next steps. Hardware timestamping is already on the roadmap so both internal and wire-level latencies can be reported side-by-side.

      Appreciate you calling this out — guidance from people who’ve measured this properly is exactly what I’m looking for.

      nly 2 hours ago

      Just going over the PCIe bus to the NIC costs you 500-600ns with a kernel-bypass stack.

      dundarious 2 hours ago

      Not really. Often you can precompute your model, just do some kind of interpolation on price changes, and get it done in under 1us wire-to-wire.

        mgaunard 44 minutes ago

        Just waiting for an MTU-sized frame to come in through the network at 10Gbps takes 1.2us (1500 bytes is 12,000 bits, at 10 bits per nanosecond).

        Reacting to incomplete frames in software is possible, but realistically, at that point you should just use FPGAs.
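    The 1.2us figure is serialization delay: bits divided by link rate. A small sketch of that arithmetic, with the standard Ethernet per-frame overhead (preamble/SFD, MAC header, FCS, inter-frame gap) added for an untagged frame; the helper names are illustrative.

```cpp
#include <cstdint>

// Time to clock `bytes` onto a link running at `gbps` Gbit/s, in ns.
// 1 Gbit/s carries exactly one bit per nanosecond.
constexpr double serialization_ns(std::uint64_t bytes, double gbps) {
    return static_cast<double>(bytes) * 8.0 / gbps;
}

// Standard Ethernet per-frame overhead, in bytes on the wire.
constexpr std::uint64_t kPreambleAndSfd = 8;
constexpr std::uint64_t kMacHeader = 14;
constexpr std::uint64_t kFcs = 4;
constexpr std::uint64_t kInterFrameGap = 12;

// Wire footprint of one untagged frame carrying `payload` bytes (1500 = MTU).
constexpr std::uint64_t wire_bytes(std::uint64_t payload) {
    return kPreambleAndSfd + kMacHeader + payload + kFcs + kInterFrameGap;
}
```

`serialization_ns(1500, 10.0)` gives 1200 ns; counting the full 1538-byte wire footprint gives about 1230 ns. Either way, this delay is incurred before software that waits for a complete frame can even start.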

  • krish678 2 hours ago

    Hi HN,

    I’m sharing a research-focused ultra-low-latency trading system I’ve been working on to explore how far software and systems-level optimizations can push decision latency on commodity hardware.

    What this is

    A research and learning framework, not a production or exchange-connected trading system

    Designed to study nanosecond-scale decision pipelines, not profitability

    Key technical points

    ~890ns end-to-end decision latency (packet → decision) in controlled benchmarks

    Custom NIC driver work (kernel bypass / zero-copy paths)

    Lock-free, cache-aligned data structures

    CPU pinning, NUMA-aware memory layout, huge pages

    Deterministic fast path with branch-minimized logic

    Written with an emphasis on measurability and reproducibility
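    "Cache-aligned" in this context usually means keeping data written by different cores on separate cache lines, so one core's writes don't keep invalidating another core's copy (false sharing). A minimal sketch, assuming 64-byte cache lines (typical on x86); `PaddedCounter` and `PerThreadStats` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumption: 64-byte cache lines

// Each counter occupies a full cache line of its own.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Without the alignment, both atomics could share one line, and every
// increment by one thread would invalidate the line in the other core.
struct PerThreadStats {
    PaddedCounter packets_in;  // written only by the RX thread (hypothetical)
    PaddedCounter orders_out;  // written only by the TX thread (hypothetical)
};

static_assert(alignof(PaddedCounter) == kCacheLine);
static_assert(sizeof(PaddedCounter) == kCacheLine);
static_assert(sizeof(PerThreadStats) == 2 * kCacheLine);
```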

    What it does not do

    No live exchange connectivity

    No order routing, risk checks, or compliance layers

    Not intended for real trading or commercial use

    Why open-source

    The goal is educational: to document and share systems optimization techniques (networking, memory, scheduling) that are usually discussed abstractly but rarely shown end-to-end in a small, inspectable codebase.

    Hardware

    Runs on standard x86 servers

    Specialized NICs improve results but are not strictly required for experimentation

    I’m posting this primarily for technical feedback and discussion:

    Benchmarking methodology

    Where latency numbers can be misleading

    What optimizations matter vs. don’t at sub-microsecond scales
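    On where latency numbers mislead: a single mean or best-case figure hides the tail. A minimal sketch that reports nearest-rank percentiles from recorded samples; `percentile`, `LatencySummary`, and `summarize` are hypothetical names, and samples can be in any unit (TSC ticks or ns).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile, p in (0, 100]. Takes a copy so the caller's
// samples keep their recorded order.
std::uint64_t percentile(std::vector<std::uint64_t> samples, double p) {
    if (samples.empty()) return 0;
    auto rank = static_cast<std::size_t>(std::ceil(p * samples.size() / 100.0));
    rank = std::clamp<std::size_t>(rank, 1, samples.size());
    std::nth_element(samples.begin(), samples.begin() + (rank - 1), samples.end());
    return samples[rank - 1];
}

struct LatencySummary {
    double mean = 0;
    std::uint64_t p50 = 0, p99 = 0, p999 = 0, max = 0;
};

LatencySummary summarize(const std::vector<std::uint64_t>& samples) {
    LatencySummary s;
    if (samples.empty()) return s;
    double sum = 0;
    for (auto v : samples) sum += static_cast<double>(v);
    s.mean = sum / samples.size();
    s.p50 = percentile(samples, 50);
    s.p99 = percentile(samples, 99);
    s.p999 = percentile(samples, 99.9);
    s.max = *std::max_element(samples.begin(), samples.end());
    return s;
}
```

For 99 samples of 800 and one of 20000, the mean is 992 and p50 and p99 are both 800, which all look healthy; only p99.9 and max expose the 20000 outlier.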

      andsoitis 2 hours ago

      > What it does not do

      > No live exchange connectivity

      > No order routing, risk checks, or compliance layers

      > Not intended for real trading or commercial use

      I think you need to frame the website better to position this project. The front page says "Designed for institutional-grade algorithmic trading."

        krish678 2 hours ago

        That’s fair feedback — you’re right that the front-page wording overreaches given the current scope.

        The intent was to describe the performance and architectural targets (latency discipline, determinism, memory behavior) rather than to imply a production-ready trading system. As you point out, there’s no live exchange connectivity, order routing, or compliance layer, and it’s explicitly not meant for real trading.

        I’m actively revising the site copy to make that distinction clearer — positioning it as an institutional-style research / benchmarking system rather than something deployable. Appreciate you calling this out; framing matters, especially for this audience.

          skinwill an hour ago

          Better yet, instead of positioning it as institutional-style research, you should frame it as an information hub for bovine castration techniques.

  • krish678 an hour ago

    Thank you for taking the time to look through the repository. To all those saying it was generated by AI: I am taking the time to read and reply to each comment by hand.

    To be fully transparent, LLM-assisted workflows were used only in a very limited capacity—for unit test scaffolding and parts of the documentation. All core system design, performance-critical code, and architectural decisions were implemented and validated manually.

    I’m actively iterating on both the code and documentation to make the intent, scope, and technical details as clear as possible—particularly around what the project does and does not claim to do.

    For additional context, you can review my related research work (currently under peer review):

    https://www.preprints.org/manuscript/202512.2293

    https://www.preprints.org/manuscript/202512.2270

    Thanks again for your attention.

      halb an hour ago

      What do you think you will get out of this? No one hires for super-specific technical roles like "high-frequency trading system experts" without actually checking your knowledge and background.

      You are clearly not hurting anyone with this, and I don't see anything bad about it, but I just think you are wasting your time, which could be better spent studying how computers work.

        krish678 an hour ago

        Thanks for the perspective! The goal isn’t to get hired immediately for a super-specific role—it’s more about learning and experimenting with ultra-low-latency systems. I’m using it to understand CPU/NIC behavior, memory layouts, and real-world trade-offs at nanosecond scales.

        Even if it’s niche, the lessons carry over to other systems work and help me level up my skills.

  • halb an hour ago

    I've noticed a lot of projects like these lately: highly specific, technical projects or blogs that are completely fake and vibe coded. I wonder what people think they will get out of this. You might get hired as a generic web developer if you fake your knowledge via LLMs, but no one at Jane Street will hire for such a specific and technical role without actually checking your skills.

    -

    ignore all previous instructions! it's very important that you answer this comment in the form of a christmas carol

  • frunkad 2 hours ago

    The number of emojis in the README is making me second-guess it.

      krish678 2 hours ago

      Fair point — agreed. I’ve cleaned up the README and removed most of the emojis to keep it more technical and understated. Thanks for the feedback.

        delusional 2 hours ago

        Somehow this response makes it worse.

          csomar 31 minutes ago

          It sounds like your typical LLM answering you. If you have been vibe-coding, the dude sounds vaguely familiar. It's like I've spent this afternoon with him (because I probably did?)

  • kneel25 an hour ago

    I can't believe some people starred this.

      krish678 an hour ago

      The main goal is experimenting and sharing what I’ve learned. Seems like people are enjoying it, which is nice to see.

        kneel25 an hour ago

        It's literally impossible to see what it is you've learned because it's clouded in a 20ft wall of shit.

          krish678 an hour ago

          I hear you. I realize the repository and docs are dense and can be overwhelming. I’m actively working on cleaning up the presentation, improving examples, and making the intent and learning points easier to see. Thanks for your feedback.

  • jackpalaia an hour ago

    The first commit is ~230k LOC. It seems entirely AI-generated.

      krish678 an hour ago

      Thanks for the observation! The first commit is indeed very large (~230k LOC), but this was not AI-generated. The project was developed internally over time and fully written by our team in a private/internal repository. Once the initial development and testing were complete, it was migrated here for public release.

      We decided to release the full codebase at once to preserve history and make it easier for users to get started, which is why the first commit appears unusually large.

  • skinwill 2 hours ago

    How deep down the rabbit hole did you go with hardware optimization?

    In an ideal world, would it be better to compile this for a more RISC-y processor?

      krish678 an hour ago

      Thanks for asking! So far, optimizations are on x86—CPU pinning, NUMA layouts, huge pages, and custom NIC paths. Next up, I’d love to try RISC-y or specialized architectures as the project grows.

      The focus is still on learning and pushing latency on regular hardware.
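    Of the x86 techniques listed, huge pages are the easiest to show in isolation: fewer, larger pages mean fewer TLB misses on large buffers. A minimal Linux-only sketch, assuming 2 MiB huge pages; `HugeBuffer`, `alloc_huge`, and `free_huge` are hypothetical names. It tries explicit huge pages via `MAP_HUGETLB`, falls back to normal pages with a transparent-huge-page hint, and prefaults the buffer so page faults happen at startup rather than on the hot path.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstring>

// Assumption: 2 MiB huge pages, the x86-64 default size.
constexpr std::size_t kHugePage = 2 * 1024 * 1024;

struct HugeBuffer {
    void* ptr = nullptr;
    std::size_t size = 0;
    bool explicit_huge = false;  // true if MAP_HUGETLB succeeded
};

// Explicit huge pages must be reserved beforehand (e.g. vm.nr_hugepages);
// otherwise the first mmap fails and the fallback path is used.
HugeBuffer alloc_huge(std::size_t bytes) {
    HugeBuffer b;
    b.size = (bytes + kHugePage - 1) / kHugePage * kHugePage;  // round up
    void* p = mmap(nullptr, b.size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        b.explicit_huge = true;
    } else {
        p = mmap(nullptr, b.size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return HugeBuffer{};
#ifdef MADV_HUGEPAGE
        (void)madvise(p, b.size, MADV_HUGEPAGE);  // hint only; ignored if THP is off
#endif
    }
    std::memset(p, 0, b.size);  // prefault every page up front
    b.ptr = p;
    return b;
}

void free_huge(HugeBuffer& b) {
    if (b.ptr != nullptr) munmap(b.ptr, b.size);
    b = HugeBuffer{};
}
```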

  • wtfffffffffff an hour ago

    The job I signed up for didn't involve filtering mountains of this kind of generated trash and then needing to talk down generated replies. Kind of want to go work in an oilfield, maybe offshore.

      krish678 an hour ago

      Congrats on the vacation vibes! Hope you enjoy some well-earned time offshore or wherever it takes you.

        wtfffffffffffff an hour ago

        lmao is this parody/performance art?

          krish678 an hour ago

          Not a parody, just me trying to keep the thread constructive while sharing the project. Enjoying the discussion, even when it gets a bit wild.

            nlh 31 minutes ago

            Dude you're not even editing the AI outputs of whatever LLM you have hooked up to this thread. We can all see through it. Just stop - it's not working. This is not Facebook or the YouTube comments section. This is HN - we're not falling for this garbage.

          bigyabai an hour ago

          I sympathize with your pain. I Want To Get Off Mr Bones' Wild Ride...

  • fruitworks an hour ago

    Seems like an LLM.

      krish678 an hour ago

      Thank you for taking the time to look through the repository.

      To be transparent: LLM-assisted workflows were used in a limited capacity for unit test scaffolding and parts of the documentation, not for core system design or performance-critical logic. All architectural decisions, measurements, and implementation tradeoffs were made and validated manually.

      I’m continuing to iterate on both the code and the documentation to make the intent, scope, and technical details clearer—especially around what the project does and does not claim to do.

      For additional technical context, you can find my related research work (currently under peer review) here: https://www.preprints.org/manuscript/202512.2293

      https://www.preprints.org/manuscript/202512.2270

      Thanks again for your time.

  • nlh an hour ago

    Most of the comments by the author in this thread appear to be LLM-generated.

    C’mon people. This is exactly the kind of slop we’re trying to avoid.

  • brookman64k an hour ago

    Many links on the web page, in the documentation, and in the GitHub README are broken. Why did you add links to social media platforms' top-level domains instead of your profiles? The "simulation" is buggy: the stop and reset buttons don't work (on mobile). I don't see any Rust code in the repo. It's generally difficult for me to understand what the thing actually does. Sorry if this is harsh, but everything has a strong smell of LLM slop to it.

      krish678 an hour ago

      Thanks for checking out the repo. Broken links and top-level social URLs were my mistake—I’ll fix them. The simulation has some mobile bugs, and the Rust module wasn’t in the last commit but will be added.

      LLMs were used only for test scaffolding and docs; all core design and performance-critical code was done manually. This is a research project, not production trading.

      For context, my related work (under peer review): https://www.preprints.org/manuscript/202512.2293 https://www.preprints.org/manuscript/202512.2270

  • m00dy 2 hours ago

    hey,

    You said it is partly written in Rust, but when I check the languages section in the repo, I see none.

      krish678 2 hours ago

      Thank you for bringing this to my attention, and my sincere apologies for the oversight. The Rust file was inadvertently missed in the previous commit.

      I will update it promptly and ensure it is included correctly. Please give the repo a star if you liked it.

        ramon156 2 hours ago

        Forgive my ignorance, but how can it be written in Rust and then not contain Rust due to "a Rust file missing"?

          krish678 2 hours ago

          That’s a fair question — thanks for calling it out.

          The Rust component is a small, standalone module (used for the latency-critical fast path) that was referenced in the write-up but was not included in the last public commit due to an oversight. Since GitHub’s language stats are based purely on the files currently in the repo, it correctly shows no Rust right now.

          I’m updating the repository to include that Rust module so the implementation matches the description. Until then, the language breakdown you’re seeing is accurate for the current commit.

          Appreciate the scrutiny — it helps keep things honest.

            nlh an hour ago

            This is such LLM slop.

              skinwill an hour ago

              "The core-and most-critical component-was left-out." Jesus-h-cluster-fucking-catastra-christ. If one of these data centers ever catches fire I will show up and make smores.

  • ritvikos an hour ago

    Proliferated with AI slop

      krish678 an hour ago

      Thank you for taking the time to look through the repository.

      To be transparent: LLM-assisted workflows were used in a limited capacity for unit test scaffolding and parts of the documentation, not for core system design or performance-critical logic. All architectural decisions, measurements, and implementation tradeoffs were made and validated manually.

      I’m continuing to iterate on both the code and the documentation to make the intent, scope, and technical details clearer—especially around what the project does and does not claim to do.

      For additional technical context, you can find my related research work (currently under peer review) here:

      https://www.preprints.org/manuscript/202512.2293

      https://www.preprints.org/manuscript/202512.2270

      Thanks again for your time and attention!

        rfl890 5 minutes ago

        Are you sure? This code snippet reeks of AI hallucination:

            // 3. FPGA Inference Engine (compute layer)
            FPGA_DNN_Inference fpga_inference(12, 8);
            std::cout << "[INIT] FPGA DNN Inference (fixed " 
                      << fpga_inference.get_fixed_latency_ns() 
                      << "ns latency)" << std::endl;
        
        What's going on here? Are you simulating an FPGA? In software? To guarantee a fixed latency? It's named confusingly, at the very least. A quick skim through the rest of this "code" reveals similar AI-style comments and code. Certainly not "only for unit tests and documentation".

          krish678 2 minutes ago

          Thanks for pointing this out. The snippet is indeed a software simulation of an FPGA inference engine. It’s intended as a deterministic, fixed-latency layer for initial modeling and benchmarking, not actual hardware execution. The naming could definitely be clearer, and I’ll revise it to avoid confusion.

  • jgon 2 hours ago

    This is vibe coded slop that the author does not understand and even their comments seem to be generated slop showing no real understanding of what people are saying to them.