Trade Fill Archive

Cross-exchange trade data for quantitative analysis and backtesting. Tick-level fills from Hyperliquid L1 and Binance spot markets, aligned by timestamp, coin, and side — ready for signal research, execution analysis, and market microstructure study.
Hyperliquid L1 · Binance Spot · 23 Coins · 2020 – Present

What is this?

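An archive of trade fills from two major crypto exchanges, structured for research: every Hyperliquid L1 fill in a rolling 64-day window, and every Binance spot trade for the mapped USDT pairs from January 2020 onward.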

The Data

Two exchange feeds, one timestamp axis.

Hyperliquid L1 Fills

On-chain
Source: s3://hl-mainnet-node-data/node_fills/
Window: Rolling ~2 months (latest 64 days)
Granularity: Every fill, ms timestamps
Format: LZ4-compressed NDJSON, hourly files
Unique fields: wallet, closedPnl, startPosition, dir
Size: ~30 GB / 64 days
Coins: 323 (perps + spot + indices)
Wallets: ~19,000 unique addresses / day

Binance Spot Trades

CEX
Source: data.binance.vision (public, free)
Window: Jan 2020 – present (5.5 years)
Granularity: Every trade, μs timestamps
Format: CSV (zipped), daily + monthly archives
Unique fields: trade_id, quote_qty, is_buyer_maker
Size: ~150 GB / 5.5 years (est.)
Symbols: 18 USDT pairs (of 23 mapped)
Depth: 3M+ trades/day for BTC alone
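Reading one of these zipped archives needs only the standard library. The sketch below is an illustration, not the project's binance_dl.py: `iter_zip_rows` is a hypothetical helper, and it assumes each archive contains plain UTF-8 CSV members. A header row, if a file has one, is yielded like any other row.

```python
import csv
import io
import zipfile


def iter_zip_rows(zip_source):
    """Yield CSV rows (lists of strings) from every .csv member of a zip archive.

    `zip_source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(".csv"):
                continue
            with zf.open(name) as raw:
                # Decode lazily so large daily files are not loaded into memory at once.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.reader(text)
```

Because the function is a generator, a full day of trades can be streamed row by row into a downstream normalizer.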

Common Fields (Cross-Exchange)

Concept | Hyperliquid | Binance | Join Key
Price | px | price | Direct comparison
Size | sz | qty | Same base unit
Timestamp | time (ms epoch) | time (μs epoch) | Normalize to ms
Side | side (B/A) | is_buyer_maker (bool) | B = !is_buyer_maker, A = is_buyer_maker
Coin | coin (e.g. BTC) | symbol (e.g. BTCUSDT) | Strip USDT suffix
Notional | px × sz | quote_qty | HL in USDC, Binance in USDT
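The join rules in this table can be sketched as two small converters into a shared record. This is a minimal illustration under stated assumptions, not the project's normalizer: the `Trade` class and both function names are hypothetical, the Binance column order (trade_id, price, qty, quote_qty, time, is_buyer_maker) is assumed, the HL fill is assumed to be an already-parsed dict whose `side` is the wallet's own side, and timestamps are treated as microseconds only when their magnitude says so, in case some files carry milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    venue: str
    coin: str
    ts_ms: int
    px: float
    sz: float
    taker_side: str  # "B" = aggressor bought, "A" = aggressor sold


def binance_to_trade(symbol, row):
    """Convert one Binance spot trade CSV row (list of strings) to a Trade."""
    ts = int(row[4])
    # Assumption: values >= 1e14 are microseconds, smaller values are milliseconds.
    ts_ms = ts // 1000 if ts >= 10**14 else ts
    buyer_is_maker = row[5].strip().lower() == "true"
    coin = symbol[:-4] if symbol.endswith("USDT") else symbol
    # If the buyer was the maker, the aggressor sold.
    side = "A" if buyer_is_maker else "B"
    return Trade("binance", coin, ts_ms, float(row[1]), float(row[2]), side)


def hl_to_trade(fill):
    """Convert one parsed HL fill dict to a Trade."""
    side = fill["side"]
    # Assumption: for a passive (crossed == False) fill the aggressor is the other side.
    if not fill.get("crossed", True):
        side = "A" if side == "B" else "B"
    return Trade("hl", fill["coin"], int(fill["time"]),
                 float(fill["px"]), float(fill["sz"]), side)
```

Storing the aggressor side on both venues keeps later comparisons (order flow imbalance, VPIN) on one convention.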

HL-Only Fields (Not on Binance)

Field | Type | Research Value
wallet | address | Track individual trader PnL, identify smart money
closedPnl | USDC | Realized profit/loss per fill; aggregate by wallet for trader skill
startPosition | float | Position size before fill; reveals conviction, scaling behavior
dir | enum | Open/Close/Flip; distinguishes entry from exit, long from short
crossed | bool | Aggressive (taker) vs passive (maker); execution quality signal
fee | USDC | Negative = rebate (maker), positive = fee (taker)

HL Coin | Binance Symbol | Binance Spot | Note
BTC | BTCUSDT | Since 2017 | Full history
ETH | ETHUSDT | Since 2017 | Full history
SOL | SOLUSDT | Since 2020 |
XRP | XRPUSDT | Since 2018 |
DOGE | DOGEUSDT | Since 2019 |
BNB | BNBUSDT | Since 2017 |
AVAX | AVAXUSDT | Since 2020 |
SUI | SUIUSDT | Since 2023 |
AAVE | AAVEUSDT | Since 2020 |
WLD | WLDUSDT | Since 2023 |
WIF | WIFUSDT | Since 2024 |
ENA | ENAUSDT | Since 2024 |
TRUMP | TRUMPUSDT | Since 2025 |
VIRTUAL | VIRTUALUSDT | Since 2024 |
ONDO | ONDOUSDT | Since 2024 | Gaps after Jun 29
PENGU | PENGUUSDT | Since 2024 |
INIT | INITUSDT | Since 2025 |
BERA | BERAUSDT | Since 2025 |
HYPE | Not listed | | HL-native token
FARTCOIN | Not listed | | HL-native / Solana
MOODENG | Not listed | | HL-native / Solana
POPCAT | Not listed | | HL-native / Solana
GRASS | Not listed | | HL-native / Solana

Analysis Framework

What this data enables, mapped to available tooling.

Data Pipeline

Ingest: hl_trade_dl, binance_dl
Normalize: Align timestamps, map symbols
Feature Eng: tsfresh, VPIN / OFI
Model: LightGBM, HMM / GARCH
Backtest: alpha_research engine
Evaluate: Sharpe, PnL, regime splits

Research Directions

Cross-Exchange Arbitrage

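Compare HL fill prices to Binance trades at the same millisecond. Measure price impact, latency premium, and venue selection alpha. Where the HL market is a perpetual, the raw difference against Binance spot also includes the perp-spot basis, so separate that component before attributing the gap to latency or impact. HL has wallet-level attribution that Binance lacks.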

alpha_research/src/alpha/
26 alpha implementations including momentum, mean reversion, carry
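Exact same-millisecond matches between venues are sparse, so one common substitute is an as-of match: pair each HL fill with the latest Binance trade at or before its timestamp. The sketch below shows that substitute, not a method taken from alpha_research. The function names are hypothetical, and the reference timestamps are assumed to be sorted ascending and already normalized to milliseconds.

```python
from bisect import bisect_right


def asof_price(ref_ts, ref_px, ts_ms):
    """Return the latest reference price at or before ts_ms, or None if none exists.

    ref_ts must be sorted ascending; ref_px[i] is the price of the trade at ref_ts[i].
    """
    i = bisect_right(ref_ts, ts_ms)
    return ref_px[i - 1] if i else None


def spread_bps(hl_px, ref_px):
    """Signed HL-minus-reference price difference in basis points."""
    return (hl_px - ref_px) / ref_px * 1e4
```

A positive value means the HL fill printed above the most recent Binance trade; for perpetuals part of that gap is basis rather than latency.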
Smart Money Tracking

HL exposes wallet addresses on every fill. Rank wallets by cumulative closedPnl, then forward-test: do top wallets predict direction? Cluster by trading style using dir, crossed, position sizing.

ts_embed/skills/embedding/
Embed trade sequences, find similar patterns via vector search
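The ranking step described above can be sketched in a few lines. This is a minimal illustration: `rank_wallets` is a hypothetical name, and each fill is assumed to be a dict carrying `wallet` and `closedPnl` keys, with `closedPnl` possibly stored as a string.

```python
from collections import defaultdict


def rank_wallets(fills, top_n=10):
    """Return the top_n (wallet, cumulative closedPnl) pairs, best first."""
    pnl = defaultdict(float)
    for fill in fills:
        pnl[fill["wallet"]] += float(fill["closedPnl"])
    return sorted(pnl.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

For a forward test, the ranking would be computed on one window and the selected wallets evaluated only on fills that come after it, to avoid look-ahead.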
Regime Detection

Binance's 5.5-year history covers multiple regimes: the 2020 COVID crash, the 2021 bull run, the 2022 bear market, and the 2023-24 recovery. Train an HMM on volatility and volume to label regimes, then condition alpha signals on the inferred regime.

hmmlearn + arch (GARCH)
Already used in alpha_research for regime-conditional strategies
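Before an HMM can be fitted, the bar data has to be turned into a feature matrix. The sketch below covers only that preparation step, under assumptions: `regime_features` is a hypothetical helper, the inputs are per-bar closes and volumes, and the two features (rolling standard deviation of log returns, log volume) are one plausible choice rather than the set used in alpha_research. The resulting rows could then be passed to a model such as hmmlearn's GaussianHMM.

```python
import math


def regime_features(closes, volumes, window=3):
    """Return [rolling_vol, log1p(volume)] rows, one per bar from index `window` on.

    rolling_vol is the sample standard deviation of the `window` log returns
    ending at that bar. Requires window >= 2.
    """
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    rows = []
    for i in range(window, len(closes)):
        w = rets[i - window:i]  # returns ending at bars i-window+1 .. i
        mean = sum(w) / window
        vol = math.sqrt(sum((r - mean) ** 2 for r in w) / (window - 1))
        rows.append([vol, math.log1p(volumes[i])])
    return rows
```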
Market Microstructure

Compute VPIN (volume-synchronized probability of informed trading) from tick data. HL's crossed flag directly labels aggressor vs passive. Compare maker/taker composition between DEX (HL) and CEX (Binance).

vpin repo + svpy/pyprod/
Volume profile analysis, intraday session management
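As a rough illustration of the VPIN calculation, the sketch below fills equal-volume buckets and averages the absolute buy/sell imbalance over the last `n_buckets`. It is a simplified variant, not the vpin repo's implementation: it classifies volume with the per-trade aggressor flag (available here from `crossed` or `is_buyer_maker`) instead of the bulk volume classification used in the original VPIN formulation, and the function name and input shape are hypothetical.

```python
def vpin(trades, bucket_volume, n_buckets):
    """Compute VPIN after each completed volume bucket.

    trades: iterable of (size, is_buy) in time order, is_buy = aggressor bought.
    Returns one value per bucket once n_buckets buckets are available.
    """
    imbalances, out = [], []
    buy = sell = filled = 0.0
    for size, is_buy in trades:
        remaining = size
        while remaining > 0:
            room = bucket_volume - filled
            take = min(remaining, room)
            if is_buy:
                buy += take
            else:
                sell += take
            if remaining >= room:
                # Bucket is full: record its imbalance and start a new one.
                imbalances.append(abs(buy - sell))
                buy = sell = filled = 0.0
                if len(imbalances) >= n_buckets:
                    window = imbalances[-n_buckets:]
                    out.append(sum(window) / (n_buckets * bucket_volume))
            else:
                filled += take
            remaining -= take
    return out
```

Trades larger than the remaining bucket capacity are split across buckets, so every bucket holds the same volume.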
Execution Quality

HL fills include fee (negative = maker rebate). Compare effective cost (fee + slippage) between HL and Binance for the same coin. Build venue selection model: when is HL cheaper than Binance?

svstrat/src/pyexecution/
Execution algorithms, optimal routing, cost analysis
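One way to put "fee + slippage" on a single scale is to express both in basis points of the fill's notional. The sketch below is an assumption-laden illustration, not the svstrat cost model: `effective_cost_bps` is a hypothetical name, `ref_px` is whatever benchmark price is chosen (for example the as-of price on the other venue), and `fee` is in the quote currency with negative values meaning a rebate.

```python
def effective_cost_bps(px, sz, fee, ref_px, is_buy):
    """Slippage versus ref_px plus fee, in basis points. Positive = cost."""
    sign = 1.0 if is_buy else -1.0
    # Buying above the reference or selling below it both count as positive slippage.
    slippage = sign * (px - ref_px) / ref_px
    fee_rate = fee / (px * sz)
    return (slippage + fee_rate) * 1e4
```

Comparing this number for matched HL and Binance fills in the same coin gives a per-trade answer to which venue was cheaper.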
Time Series Forecasting

Aggregate tick data to bars (1m, 5m, 1h). Run ARIMA/GARCH for short-term vol forecasting. Use tsfresh for automated feature extraction. Ensemble with LightGBM, CatBoost for cross-sectional prediction.

tsprojection: pmdarima, tsfresh, optuna
Bayesian hyperparameter tuning, multi-model ensemble
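The tick-to-bar aggregation can be sketched with plain Python. This is a minimal example under assumptions: `to_bars` is a hypothetical helper, trades arrive as (ts_ms, px, sz) tuples sorted by time, and intervals without trades produce no bar rather than a forward-filled one.

```python
def to_bars(trades, bar_ms=60_000):
    """Aggregate time-sorted (ts_ms, px, sz) trades into OHLCV bars of bar_ms width."""
    bars = []
    for ts_ms, px, sz in trades:
        start = ts_ms - ts_ms % bar_ms  # floor to the bar boundary
        if bars and bars[-1]["start_ms"] == start:
            bar = bars[-1]
            bar["high"] = max(bar["high"], px)
            bar["low"] = min(bar["low"], px)
            bar["close"] = px
            bar["volume"] += sz
        else:
            bars.append({"start_ms": start, "open": px, "high": px,
                         "low": px, "close": px, "volume": sz})
    return bars
```

Passing `bar_ms=300_000` or `bar_ms=3_600_000` yields the 5m and 1h bars mentioned above.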
Layer | Tool | Repo | Use in This Project
Data Ingest | boto3 (S3), urllib (CDN) | hyperliquid | HL fill download (requester-pays S3), Binance public archive
Data Ingest | yfinance, tradingview-ta | etfdata | Reference prices, technical indicators for cross-validation
Backtest | Custom engine (Sharpe, PnL) | alpha_research | Walk-forward backtest on HL+Binance alpha signals
Alpha Library | 26 signal implementations | alpha_research | Momentum, mean reversion, carry, HMM regime, seasonality
Portfolio | cvxpy, cvxopt | svstrat, cyfopt | Optimal position sizing, risk parity, min-variance
Time Series | pmdarima, arch | tsprojection | ARIMA/GARCH forecasting on aggregated bars
Feature Eng | tsfresh | tsprojection | Automated feature extraction from trade sequences
ML Models | LightGBM, CatBoost, XGBoost | svpy, nova | Cross-sectional return prediction, regime classification
Regime | hmmlearn | alpha_research | Hidden Markov Model for volatility regime detection
Embeddings | PyTorch, sentence-transformers | ts_embed | Embed trade sequences, semantic pattern matching
Vector Search | Qdrant, LanceDB | ts_embed, docvec | Find similar trading patterns across time
Risk | Custom metrics | svrisk | Drawdown, VaR, portfolio risk decomposition
Execution | Execution algorithms | svstrat | Venue selection, cost analysis, optimal routing
Visualization | Plotly, matplotlib | svpy, alpha_research | Interactive PnL curves, heatmaps, signal plots
Experiment | MLflow, wandb | nova, tsprojection | Track model versions, hyperparameter sweeps
Compute | Vast.ai (GPU), Dask | gpu-workers | Distributed feature engineering, model training
Optimization | optuna | tsprojection | Bayesian hyperparameter tuning for alpha parameters
Database | PostgreSQL, Redis | svpy, nova | Persistent trade storage, cached feature sets

Infrastructure

How the data gets from exchange to analysis-ready.

Download Infrastructure

Compute: Vast.ai RTX A4000, Delaware
Disk: 500 GB NVMe
Network: 7.8 Gbps (same-region to S3)
Cost: $0.12/hr ($2.88/day)
HL egress: $0 (same AWS region)
BN egress: $0 (public CDN, no charge)
Parallelism: 6 worker threads per job
Idempotent: Safe to re-run (size-based skip)
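The size-based skip that makes re-runs safe can be illustrated as follows. This is a sketch of the general idea, not code from the download scripts: `needs_download` is a hypothetical name, and the remote size is assumed to come from the object listing or response headers.

```python
import os


def needs_download(local_path, remote_size):
    """Return True unless a local file exists with exactly the remote size."""
    try:
        return os.path.getsize(local_path) != remote_size
    except OSError:
        # Missing or unreadable file: fetch it.
        return True
```

A size match does not prove the content is identical, so this check trades a small risk of undetected corruption for avoiding checksum work on every run.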

Data Scripts

hl_trade_dl_standalone.py: HL S3 downloader
  probe: Verify credentials + bucket access
  list: Inspect keys, validate date regex
  download: Parallel download with retry + streaming
binance_dl.py: Binance public archive downloader
  download: Daily trades for a date range
  backfill: Monthly archives for bulk history
  symbols: HL ↔ Binance coin mapping