The defining curve of electronic market structure — log-scale descent from human-speed to physics-limited
Every data point in the descent — from dial-up to FPGA, color-coded by technology era
| Year | Latency | Medium / Technology | Context | Era |
|---|---|---|---|---|
| 1995 | 1000ms | Dial-up modem | Seconds to execute. Phone-based trading. | -- |
| 2000 | 100ms | T1 lines to exchanges | Electronic exchanges. Sub-second execution becomes possible. | -- |
| 2005 | 10ms | Co-located fiber | Servers in exchange data centers. Reg NMS drives speed competition. | -- |
| 2007 | 1ms | Optimized fiber + co-lo | Spread Networks. Sub-millisecond matching engines. | -- |
| 2010 | 100µs | FPGA + co-location | Hardware acceleration. FPGA processes market data in microseconds. | -- |
| 2012 | 10µs | Microwave + FPGA | McKay Brothers microwave. Chicago-NJ in 4.1ms one-way (vs 6.5ms fiber). | -- |
| 2015 | 5µs | Millimeter wave + custom NIC | Kernel bypass networking. Custom network cards process packets in nanoseconds. | -- |
| 2018 | 1µs | Laser + FPGA + custom silicon | Sub-microsecond tick-to-trade. ASIC-based feed handlers. | -- |
| 2023 | 500ns | Custom ASIC + integrated optics | Nanosecond-scale decisions. Physics limit (speed of light) becomes binding constraint. | -- |
Network Infrastructure
| Medium | Speed (fraction of c) | Latency/km | Bandwidth | Weather Sensitivity | Cost Tier |
|---|---|---|---|---|---|
| Fiber Optic | -- | -- | Terabits/s | -- | -- |
| Microwave | -- | -- | ~1 Gbps | -- | -- |
| Millimeter Wave | -- | -- | ~10 Gbps | -- | -- |
| Free-Space Optical (Laser) | -- | -- | ~100 Gbps | -- | -- |
CME Aurora, IL ↔ NYSE Mahwah, NJ. The most fought-over data path in finance. Fiber: ~6.0ms one-way. Microwave: ~4.0ms one-way. The ~2ms gap exists because light in glass travels at roughly two-thirds of c while microwaves in air travel at nearly c, and that gap is worth hundreds of millions annually. Spread Networks spent $300M laying a fiber route in 2010; microwave made it obsolete within two years.
LD4 Slough ↔ FR2 Frankfurt. European backbone. Microwave towers across the North Sea and Low Countries. Shorter distance means tighter absolute latency margins. Sub-4ms one-way via microwave. The Channel crossing complicates microwave line-of-sight and requires relay towers on high points in Belgium and the Netherlands.
Kanto ↔ Kansai. JPX arbitrage route. Mountainous terrain limits microwave options — Japan's topography favors fiber or millimeter-wave solutions with more relay hops. Sub-3ms one-way is the target. Less competitive than US/EU routes but still multi-million dollar infrastructure.
The refractive index problem: Light in a vacuum travels at c = 299,792 km/s. In fiber optic cable, light travels through a glass core (silica, SiO₂) with a refractive index of n ≈ 1.47. The effective speed is c/n ≈ 203,940 km/s — about 68% of c. In air, n ≈ 1.0003, so microwave propagation is effectively at c.
The physics: The refractive index arises because the light's electric field drives the bound electrons of the silicon and oxygen atoms in the glass. The oscillating electrons radiate secondary waves that lag the driving field, and the superposition of the original and secondary waves advances more slowly than c. (The popular picture of photons being absorbed and re-emitted is misleading: that process would scatter the light in random directions.) Air molecules are too sparse to cause significant delay at microwave frequencies.
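The refractive-index arithmetic above can be checked in a few lines. This is a minimal sketch: the function names are illustrative, the indices are the ones quoted in the text, and the path is assumed to be a straight line with no equipment delay.

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s

N_FIBER = 1.47    # silica core, value quoted in the text
N_AIR = 1.0003    # air, value quoted in the text


def propagation_speed_km_s(n: float) -> float:
    """Signal speed in a medium with refractive index n."""
    return C_KM_PER_S / n


def one_way_latency_ms(distance_km: float, n: float) -> float:
    """One-way propagation delay over a straight path, in milliseconds."""
    return distance_km / propagation_speed_km_s(n) * 1_000.0


if __name__ == "__main__":
    for name, n in (("fiber", N_FIBER), ("air", N_AIR)):
        per_km_us = one_way_latency_ms(1.0, n) * 1_000.0
        speed = propagation_speed_km_s(n)
        print(f"{name}: {speed:,.0f} km/s, {per_km_us:.2f} µs per km")
```

Per kilometre this works out to roughly 4.9µs in fiber against 3.3µs in air, which is where the fiber-versus-microwave gap on long routes comes from.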
Compute Hardware
| Hardware | Typical Latency | Flexibility | Power | Use Case |
|---|---|---|---|---|
| CPU (x86) | ~1-10μs | Highest | 150-300W | -- |
| GPU | ~10-100μs | Medium | 300-700W | -- |
| FPGA | ~100ns-1μs | Medium | 10-75W | -- |
| ASIC | <100ns | None | 5-50W | -- |
What is an FPGA? A Field-Programmable Gate Array is a chip containing millions of configurable logic blocks (CLBs) connected by a programmable interconnect fabric. Unlike a CPU — which fetches, decodes, and executes instructions sequentially through a pipeline — an FPGA implements logic directly in hardware. There is no instruction fetch. There is no branch prediction. There is no cache miss. The logic IS the circuit. When a market data packet arrives at the FPGA's input pins, the processing begins at the speed of electrical signal propagation through the gate fabric — nanoseconds, not microseconds.
How does it achieve nanosecond latency? Consider parsing a FIX message to extract a price update. On a CPU: the NIC DMAs the packet into memory, the kernel processes the interrupt, copies the packet to userspace (or you bypass the kernel), your application reads the bytes, branches through parsing logic, updates the order book, evaluates the strategy, and builds a response. That path involves dozens of pipeline stages, cache accesses, and branch predictions. On an FPGA: the Ethernet frame enters the FPGA's integrated MAC, the FIX parser is a hardware state machine that extracts fields as bits arrive (no store-and-forward), the order book update is a parallel lookup in on-chip BRAM, the strategy evaluation is a combinational circuit, and the outbound order is serialized onto the wire, all in a single pass through the gate fabric. Wire-to-wire in under 1 microsecond.
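The "extract fields as bits arrive" idea can be modelled in software as a byte-at-a-time state machine. The sketch below is only a behavioural model, not RTL. The class and function names are hypothetical, it assumes well-formed `tag=value` fields separated by SOH (0x01), and it uses FIX tag 270 (MDEntryPx) as the price field.

```python
SOH = 0x01          # FIX field delimiter
EQUALS = ord("=")


class StreamingFixFieldExtractor:
    """Consumes one byte per call and keeps only a few registers of state."""

    def __init__(self, wanted_tag: int):
        self.wanted_tag = wanted_tag
        self._in_tag = True          # True while reading the tag digits
        self._tag = 0                # tag number accumulated so far
        self._value = bytearray()    # value bytes of the wanted field only

    def feed(self, byte: int):
        """Return the wanted field's value when its delimiter arrives, else None."""
        if self._in_tag:
            if byte == EQUALS:
                self._in_tag = False
            else:
                # No input validation: assumes the byte is an ASCII digit.
                self._tag = self._tag * 10 + (byte - ord("0"))
            return None
        if byte != SOH:
            if self._tag == self.wanted_tag:
                self._value.append(byte)
            return None
        # Delimiter seen: emit if this was the wanted field, then reset.
        result = bytes(self._value) if self._tag == self.wanted_tag else None
        self._in_tag, self._tag = True, 0
        self._value = bytearray()
        return result


def extract_first(message: bytes, tag: int):
    """Return (value, index of the byte at which the value became available)."""
    parser = StreamingFixFieldExtractor(tag)
    for position, byte in enumerate(message):
        value = parser.feed(byte)
        if value is not None:
            return value, position
    return None, len(message)
```

Feeding `b"8=FIX.4.4\x0135=X\x01270=101.25\x01271=500\x01"` yields the price as soon as the delimiter after `101.25` is seen, before the rest of the message has been read. That is the software analogue of not waiting for the full frame.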
Key platforms: Xilinx (now AMD) Alveo U250/U55C and Intel (formerly Altera) Stratix 10 are the dominant HFT FPGA platforms. The Alveo U250 offers ~1.7M LUTs, about 54MB of on-chip SRAM (block RAM plus UltraRAM), and 100GbE integrated MACs. Firms like Optiver, Citadel Securities, and Jump Trading have dedicated FPGA engineering teams of 20-50 people, each maintaining custom RTL codebases of hundreds of thousands of lines of SystemVerilog.
Feed handler: Ethernet MAC → IP/UDP parsing → exchange market-data protocol decode (ITCH, ARCA, CME MDP3) → order book reconstruction. All in streaming logic, no store-and-forward.
Pre-trade risk: Position limits, order rate limits, price band checks, fat-finger guards. These are simple comparators and counters that must never be bypassed, and on an FPGA they add <10ns.
Order encoder: Strategy decision → FIX/native order-entry protocol encoding (e.g., OUCH) → TCP/IP framing → wire. The inverse of the feed handler, equally latency-critical.
Simple strategies: Market making with deterministic spread logic, statistical arbitrage triggers, cross-venue price comparisons. Anything that can be expressed as combinational logic or simple state machines.
Complex strategy logic: Portfolio optimization, multi-leg options pricing, machine learning inference, regime detection. These require floating-point arithmetic, large memory, and algorithmic flexibility that FPGAs handle poorly.
Risk management: End-of-day P&L, portfolio Greeks, VaR calculations, margin monitoring. Important but not latency-sensitive — milliseconds are fine.
Configuration & control: Parameter updates to the FPGA (spread widths, position limits, symbol universe), monitoring dashboards, logging, compliance recording.
Recovery & resilience: Gap detection, sequence number recovery, reconnection logic, failover coordination. The CPU manages the system's lifecycle; the FPGA handles the hot path.
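The pre-trade risk checks listed above really are just comparators and counters. Here is a minimal software sketch of that gate, assuming hypothetical names and limits. It ignores open (unfilled) orders when checking position, which a production check would not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskLimits:
    max_order_qty: int          # fat-finger guard
    max_position: int           # absolute position limit
    price_band_ticks: int       # allowed distance from the reference price
    max_orders_per_window: int  # order rate limit


class PreTradeRisk:
    def __init__(self, limits: RiskLimits):
        self.limits = limits
        self.position = 0           # signed net position
        self.orders_in_window = 0   # counter, cleared by reset_window()

    def check(self, side: str, qty: int, price_ticks: int, reference_ticks: int):
        """Return (accepted, reason). Every branch is a single comparison."""
        lim = self.limits
        signed_qty = qty if side == "buy" else -qty
        if qty <= 0 or qty > lim.max_order_qty:
            return False, "fat_finger"
        if abs(price_ticks - reference_ticks) > lim.price_band_ticks:
            return False, "price_band"
        if abs(self.position + signed_qty) > lim.max_position:
            return False, "position_limit"
        if self.orders_in_window >= lim.max_orders_per_window:
            return False, "rate_limit"
        self.orders_in_window += 1
        return True, "ok"

    def on_fill(self, side: str, qty: int) -> None:
        """Update the position counter when an order fills."""
        self.position += qty if side == "buy" else -qty

    def reset_window(self) -> None:
        """Called by a timer at the start of each rate-limit window."""
        self.orders_in_window = 0
```

In hardware the four comparisons would be evaluated in parallel rather than in sequence. The sequential `if` chain here only fixes which rejection reason is reported first.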
Software Stack
Kernel bypass eliminates the entire Linux networking stack from the critical path. Standard path: NIC → DMA to ring buffer → interrupt → kernel softirq → sk_buff allocation → protocol processing → socket buffer → copy to userspace → application. With kernel bypass (DPDK, Solarflare OpenOnload, ef_vi, or ExaNIC): NIC → DMA directly to userspace-mapped memory → application polls the buffer. No interrupts, no context switches, no copies. Saves ~10µs per packet. This is table stakes: every serious HFT firm uses kernel bypass.
Lock-free data structures replace mutexes and condition variables with atomic compare-and-swap (CAS) operations and memory ordering guarantees. A lock acquisition on a contended mutex can cost 5-15µs (futex syscall, context switch, scheduler). A CAS operation costs ~20ns. Lock-free SPSC (single-producer, single-consumer) ring buffers for inter-thread communication are the standard pattern. The Disruptor pattern (LMAX) demonstrated this at scale in 2011.
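The SPSC ring buffer mentioned above is simple because each index has exactly one writer. The sketch below models that index protocol in Python purely for illustration, with hypothetical names. Python has no user-level atomics, so the release/acquire ordering that a C++ or Rust version needs appears only as comments, and the sketch assumes CPython, where the interpreter lock makes the index stores appear atomic. It shows the structure, not the performance.

```python
class SpscRingBuffer:
    """Single-producer, single-consumer queue; None is reserved to mean 'empty'."""

    def __init__(self, capacity: int):
        if capacity < 2 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two >= 2")
        self._mask = capacity - 1
        self._slots = [None] * capacity   # pre-allocated, never resized
        self._head = 0   # next slot to read; written only by the consumer
        self._tail = 0   # next slot to write; written only by the producer

    def try_push(self, item) -> bool:
        """Producer side. Returns False instead of blocking when full."""
        tail = self._tail
        if tail - self._head == len(self._slots):
            return False
        self._slots[tail & self._mask] = item
        self._tail = tail + 1   # publish; a release store in C++/Rust
        return True

    def try_pop(self):
        """Consumer side. Returns None instead of blocking when empty."""
        head = self._head
        if head == self._tail:  # an acquire load of tail in C++/Rust
            return None
        item = self._slots[head & self._mask]
        self._slots[head & self._mask] = None
        self._head = head + 1   # hand the slot back to the producer
        return item
```

The power-of-two capacity lets the slot index be computed with a mask rather than a modulo, and the never-wrapping counters make the full and empty conditions distinguishable without a spare slot.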
Custom memory allocators replace malloc/free (which use mmap/brk syscalls and maintain free lists) with pre-allocated pools of fixed-size objects. No syscalls, no fragmentation, deterministic allocation time. Object pools are initialized at startup and never freed. This eliminates ~1µs of jitter per allocation and, critically, eliminates the tail latency spikes from garbage collection or heap compaction.
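The fixed-size object pool described above can be sketched as follows. The names are hypothetical, and because Python is itself garbage-collected this illustrates only the pattern (allocate everything at startup, recycle on the hot path), not the timing benefit a C++ pool would deliver.

```python
class Order:
    """Fixed-layout record; __slots__ avoids a per-instance dict."""
    __slots__ = ("symbol", "price_ticks", "qty")

    def __init__(self):
        self.reset()

    def reset(self) -> None:
        self.symbol = ""
        self.price_ticks = 0
        self.qty = 0


class OrderPool:
    """All objects are created up front; acquire/release never allocate."""

    def __init__(self, size: int):
        self._free = [Order() for _ in range(size)]

    def acquire(self) -> Order:
        if not self._free:
            # Fail loudly rather than fall back to the general allocator.
            raise RuntimeError("pool exhausted")
        return self._free.pop()

    def release(self, order: Order) -> None:
        order.reset()               # scrub state before reuse
        self._free.append(order)

    def available(self) -> int:
        return len(self._free)
```

Sizing the pool for the worst case and treating exhaustion as an error is one possible design choice. It keeps allocation time constant at the cost of reserving memory that may never be used.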
FPGA on the critical path removes software entirely for feed handling and order encoding. The software stack applies only to the strategy layer and non-latency-sensitive operations. The critical path — market data in, order out — never touches a CPU.
When you've eliminated every software microsecond, what's left is the speed of light
| Route | Distance | Fiber (c/1.47) | Microwave (≈c) | Physics Limit |
|---|---|---|---|---|
| Chicago ↔ New Jersey | ~1,200 km | ~5.88ms | ~4.00ms | 4.00ms one-way |
| London ↔ Frankfurt | ~640 km | ~3.14ms | ~2.13ms | 2.13ms one-way |
| Tokyo ↔ Osaka | ~500 km | ~2.45ms | ~1.67ms | 1.67ms one-way |
| Within data center | ~100m | ~0.49µs | N/A | 0.33µs |
What's left to optimize when you're at the speed of light?
1. Path shortening. Real fiber routes are 20-40% longer than the geodesic because they follow railroad rights-of-way, avoid mountains, and navigate urban infrastructure. Microwave paths are closer to geodesic but require line-of-sight relay towers every 50-80 km. Millimeter-wave and free-space laser links can also follow near-geodesic paths, but their shorter range means more closely spaced relay points, and they are more susceptible to atmospheric attenuation.
2. Processing latency. Even at 500ns wire-to-wire on an FPGA, there are gates to optimize. Rebalancing pipeline stages lets the design run at a higher clock frequency, which shortens every cycle. Combinational shortcuts (look-ahead adders, parallel prefix structures) save individual clock cycles. When your competitor is at 500ns and you're at 480ns, those 20ns matter: it's the difference between being first or second in the matching engine queue.
3. Serialization delay. A 64-byte Ethernet frame at 10GbE takes 51.2ns to serialize. At 100GbE it takes 5.12ns. Moving to 100GbE (or 400GbE) reduces serialization delay by an order of magnitude. This is a real and measurable improvement for minimum-size order packets.
4. Switch hop elimination. Every network switch adds 200-400ns of latency. Direct NIC-to-NIC connections (cross-connects within a data center) eliminate switch hops. Some firms negotiate with exchanges for dedicated cross-connects to the matching engine.
5. Hollow-core fiber. Hollow-core fiber is an emerging technology where light travels through an air core (n ≈ 1.0) instead of glass. This would give fiber close to the speed advantage of microwave with the reliability advantage of a physical cable. Currently expensive and limited in availability, but it represents the next frontier.
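The serialization and switch-hop figures in the list above follow from simple arithmetic. A minimal sketch, with illustrative function names: it counts only the frame bytes (no preamble or inter-frame gap), and the 300ns default per switch is an assumed midpoint of the 200-400ns range quoted in the text.

```python
def serialization_delay_ns(frame_bytes: int, link_gbps: float) -> float:
    """Time to clock a frame onto the wire: bits / (Gbit/s) gives nanoseconds."""
    return frame_bytes * 8 / link_gbps


def path_overhead_ns(frame_bytes: int, link_gbps: float,
                     switch_hops: int, per_switch_ns: float = 300.0) -> float:
    """Serialization delay plus an assumed fixed cost per switch hop."""
    return serialization_delay_ns(frame_bytes, link_gbps) + switch_hops * per_switch_ns


if __name__ == "__main__":
    for gbps in (10, 100, 400):
        print(f"64-byte frame at {gbps}GbE: {serialization_delay_ns(64, gbps):.2f} ns")
    print(f"two switch hops at 10GbE: {path_overhead_ns(64, 10, 2):.1f} ns")
```

With these assumptions, two switch hops cost roughly ten times more than serializing a minimum-size frame at 10GbE, which suggests why removing hops is attractive even before upgrading link speed.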
Where the matching engines live — and why even rack placement matters
| Exchange | Location | Facility |
|---|---|---|
| CME Group | Aurora, IL | CME co-lo facility. Futures, options on futures. E-mini S&P, crude oil, treasury futures. The Globex matching engine runs here. |
| NYSE | Mahwah, NJ | 400,000 sq ft facility. Equities matching engine. The physical successor to the trading floor at 11 Wall Street. |
| NASDAQ | Carteret, NJ | Equinix NY11. NASDAQ matching engine. Also hosts dark pools and ATS operators seeking proximity. |
| CBOE | Secaucus, NJ | Options matching engine. SPX options, VIX futures. Equinix NY4/NY5 complex. |
Historical gravity: Wall Street was in Manhattan. Early electronic trading infrastructure was built nearby. As trading went electronic and matching engines needed space, power, and cooling that Manhattan couldn't provide, data centers moved across the Hudson to northern New Jersey — close enough to maintain low-latency connections to Wall Street firms, but with the space, power grid capacity, and cost structure needed for warehouse-scale computing.
The NJ data center corridor: Mahwah, Secaucus, Carteret, and Weehawken form a dense corridor of financial data centers. Equinix alone operates six major facilities in the area (NY1-NY9). CyrusOne, QTS, and Digital Realty have additional campuses. The density creates a network effect: once the exchanges are there, the firms must be there, which attracts more exchanges, dark pools, and service providers.
Power: Northern NJ has robust grid infrastructure. A single co-location facility can draw 20-40 MW. The area benefits from multiple utility feeds and proximity to natural gas generation.
The cable length problem: Even within a data center, cable length varies. A rack 10 meters from the matching engine switch has ~33ns less latency than a rack 20 meters away if the signal moved at c, and closer to ~49ns in glass fiber (n ≈ 1.47). Over a year of trading, a consistent advantage of that size translates to thousands of queue-priority wins. Regulators and exchanges recognized this asymmetry.
Equalized cable lengths: Most major exchanges now mandate equal cable lengths to all co-located participants. Every participant's cross-connect to the matching engine is the same length — excess cable is coiled on the rack. CME's Aurora facility and NYSE's Mahwah facility both implement equalized cables. This doesn't eliminate the co-location advantage over remote participants, but it ensures fairness among co-located firms.
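The cable-length arithmetic and the equalization rule can be sketched together. The function names and rack labels are hypothetical, and the fiber index of 1.47 is the value quoted earlier for silica. Copper cabling would have a different velocity factor.

```python
C_M_PER_NS = 0.299792458  # metres travelled per nanosecond in vacuum


def cable_delay_ns(length_m: float, n: float = 1.0) -> float:
    """One-way delay through a cable of the given length and refractive index."""
    return length_m * n / C_M_PER_NS


def equalizing_slack_m(run_lengths_m: dict) -> dict:
    """Extra cable each participant must coil so every run matches the longest."""
    longest = max(run_lengths_m.values())
    return {name: longest - length for name, length in run_lengths_m.items()}


if __name__ == "__main__":
    print(f"10 m at c:     {cable_delay_ns(10):.1f} ns")
    print(f"10 m in fiber: {cable_delay_ns(10, n=1.47):.1f} ns")
    print(equalizing_slack_m({"rack_a": 10.0, "rack_b": 20.0}))
```

A 10 meter difference is worth about 33ns at c and about 49ns in glass fiber. Equalization removes it by padding the shorter run with coiled cable.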
What you're actually buying: Co-location isn't just about cable length. It's about being on the same switch fabric as the matching engine, eliminating WAN hops, and having deterministic latency. A co-located firm has ~1µs round-trip to the matching engine. A firm in Manhattan has ~200µs. A firm in Chicago has ~8ms. The co-location advantage is 200× over Manhattan and 8,000× over Chicago.
The complete path from market event to order acknowledgment — annotated with approximate latency at each hop