A Comprehensive Study and Benchmarking of Transferable LOB Representations
The first benchmark that unifies data, preprocessing, and evaluation for LOB representation learning on China A-share markets.
The paper follows the structure of the accompanying Chinese explainer by tracing the evolution from industry demands to methodological advances. On high-frequency trading desks and in quantitative research teams, analysts routinely re-engineer bespoke indicators for each task. Such pipelines struggle to transfer across markets or asset classes, and they rarely expose the intrinsic quality of the learned representations.
LOBench reframes the problem through a set of guiding questions about what LOB representations should capture and where they pay off. This perspective connects academic advances—self-attention, temporal convolutions, latent diffusion—to concrete trading scenarios such as short-term trend forecasting and liquidity risk assessment.
The Limit Order Book (LOB) records the full depth of bid and ask intentions across price levels and is indispensable for understanding microstructure dynamics, liquidity formation, and price discovery. This work revisits LOB modelling through the lens of representation learning. Instead of crafting task-specific predictors, the authors compare canonical CNN/RNN baselines, Transformer families, and recent time-series encoders under a unified experimental protocol.
LOBench consolidates the fragmented landscape by offering consistent data curation, an encoder–decoder benchmarking pipeline, and reproducible downstream evaluations. The benchmark targets two central questions: representation sufficiency—whether an encoder can capture strong temporal autocorrelation, cross-level constraints, and heterogeneous feature scales inherent to LOBs; and representation necessity—whether transferable embeddings outperform bespoke task designs in both accuracy and development efficiency.
All experiments are conducted on newly curated China A-share datasets that reflect T+1 settlement, price-limit rules, and retail-driven order flows. The resulting analysis produces practical guidelines for building general-purpose, finance-ready time-series representations.
Unified schemas for data formatting, normalization, slicing, and labelling underpin fair comparisons across reconstruction, trend prediction, and imputation tasks.
Curated desensitized snapshots from Shenzhen Stock Exchange securities in 2019 capture liquidity tiers, sector diversity, and distinct institutional constraints.
Benchmarked architectures span DeepLOB, TransLOB, SimLOB, TimesNet, iTransformer, TimeMixer, and other state-of-the-art time-series encoders.
Lightweight feed-forward decoders probe how learned embeddings generalize to downstream tasks, highlighting when representation learning pays off.
LOBench decouples representation learning from task-specific objectives. Each model is trained as an encoder–decoder pair on standardized LOB windows; downstream experiments reuse the frozen encoder with simple decoders to isolate the contribution of the learned representations. The protocol mirrors the workflow described in the Chinese article: data alignment → representation learning → task transfer → quantitative diagnosis.
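A minimal PyTorch sketch of this probing setup, assuming the pretrained encoder maps a standardized LOB window to a fixed-size embedding; the `LinearProbe` name, hidden width, and three-class trend target are illustrative placeholders rather than LOBench's exact decoder heads.

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Frozen encoder + lightweight feed-forward decoder for a downstream task."""

    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int = 3):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():  # freeze the pretrained representation
            p.requires_grad = False
        self.head = nn.Sequential(           # simple decoder isolates embedding quality
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                # representations are reused, not fine-tuned
            z = self.encoder(x)              # assumed output shape: (batch, embed_dim)
        return self.head(z)
```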
The dataset contains 10-level bid/ask prices and volumes, order-flow derived features (mid-price, spread, depth imbalances), and calendar indicators. Raw transactions drawn from the Shenzhen Stock Exchange in 2019 are aggregated following exchange regulations, with outlier handling for suspended sessions and capped limit-move days. Sliding windows of 120 time steps form the input tensors, while target windows cover both reconstruction horizons and forward-looking prediction spans.
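A short NumPy sketch of the slicing step, assuming snapshots arrive as a time-ordered feature matrix; the `make_windows` helper and the 20-step prediction horizon are illustrative assumptions, and only the 120-step input window comes from the paper.

```python
import numpy as np

def make_windows(snapshots: np.ndarray, window: int = 120, horizon: int = 20):
    """Slice a (T, F) matrix of LOB snapshots into overlapping input/target windows.

    snapshots: time-ordered features, e.g. 10-level bid/ask prices and volumes
               plus derived columns (mid-price, spread, depth imbalances).
    window:    input length in time steps (120 in the benchmark).
    horizon:   forward-looking span for prediction targets (illustrative value).
    """
    inputs, targets = [], []
    for t in range(len(snapshots) - window - horizon + 1):
        inputs.append(snapshots[t : t + window])
        targets.append(snapshots[t + window : t + window + horizon])
    return np.stack(inputs), np.stack(targets)
```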
Weighted MSE and unweighted MSE capture reconstruction and imputation fidelity; weighted cross-entropy and macro-F1 score quantify trend prediction accuracy under class imbalance. Every benchmark run logs training time, parameter count, and FLOPs, enabling a nuanced comparison between representation quality and computational overhead.
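For intuition, a hedged sketch of a weighted reconstruction loss that up-weights the best book levels; the geometric decay scheme is an assumption for illustration, not necessarily the weighting used in the paper.

```python
import numpy as np

def weighted_mse(pred: np.ndarray, target: np.ndarray, decay: float = 0.9) -> float:
    """MSE with per-level weights emphasising best-level liquidity.

    pred, target: arrays of shape (batch, time, levels) for one feature
                  (e.g. ask prices) across the 10 book levels.
    decay:        geometric down-weighting of deeper levels (illustrative choice).
    """
    levels = pred.shape[-1]
    w = decay ** np.arange(levels)   # level 1 (best quote) receives the largest weight
    w = w / w.sum()
    return float(np.mean(((pred - target) ** 2) * w))
```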
The evaluation is organised into two groups, echoing the narrative set out in the Chinese article. Group 1 compares representation models in a reconstruction-only setting. Group 2 freezes the encoders and attaches lightweight heads for downstream prediction and imputation. All models are trained with Adam, cosine learning-rate decay, and early stopping based on validation weighted MSE.
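A condensed PyTorch sketch of that training protocol; the learning rate, epoch budget, and patience are placeholder values, and the plain `MSELoss` stands in for the weighted variant described above.

```python
import torch

def train(model, train_loader, val_loader, epochs: int = 100, patience: int = 10):
    """Adam + cosine learning-rate decay + early stopping on the validation loss."""
    criterion = torch.nn.MSELoss()   # substitute the weighted MSE in practice
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    best_val, wait = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()
        sched.step()
        model.eval()
        with torch.no_grad():
            val = sum(criterion(model(x), y).item() for x, y in val_loader)
        if val < best_val:           # early stopping on the validation metric
            best_val, wait = val, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return model
```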
Transformer-centric encoders dominate. TimesNet delivers the lowest reconstruction error across all five benchmark stocks, closely followed by iTransformer and TimeMixer. CNN-based DeepLOB and recurrent baselines struggle to separate nearby price curves, especially during volatility spikes. Introducing weighted losses consistently lowers MSE and reduces macro-F1 variance, underscoring the importance of prioritising best-level liquidity information.
Transferability aligns with representation quality. When encoders are reused for trend prediction, performance improvements mirror reconstruction rankings: the best representations yield the highest macro-F1 and the most stable calibration across stocks. For imputation, Transformer variants recover masked depth values with 10–15% lower error than convolutional counterparts, highlighting their ability to model cross-level dependencies.
Efficiency matters. LOBench reports training time in seconds, revealing that iTransformer and TimeMixer strike a favourable balance between accuracy and runtime, while TimesNet trades marginally higher cost for superior fidelity. These statistics provide actionable guidance for practitioners who must weigh latency constraints against modelling accuracy.
The public repository ships with preprocessing scripts, benchmark configurations, and experiment logs to replicate every figure reported in the paper, while the processed dataset is available through the HuggingFace hub. To comply with exchange policies, the released dataset consists of desensitized LOB snapshots; future updates will broaden the stock universe and extend coverage to additional trading venues. The team plans to enrich LOBench with self-supervised objectives, anomaly detection tasks, and risk-oriented evaluation metrics, inviting the community to collaborate on high-frequency financial representation learning.
@misc{zhong2025representationlearninglimitorder,
  title={Representation Learning of Limit Order Book: A Comprehensive Study and Benchmarking},
  author={Muyao Zhong and Yushi Lin and Peng Yang},
  year={2025},
  eprint={2505.02139},
  archivePrefix={arXiv},
  primaryClass={cs.CE},
  url={https://arxiv.org/abs/2505.02139},
}