Regular Paper

HXPY: A High-Performance Data Processing Package for Financial Time-Series Data

The Hong Kong University of Science and Technology, Hong Kong, China
International Digital Economy Academy, Shenzhen 518048, China
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511455, China

Abstract

A tremendous amount of data is generated by global financial markets every day, and such time-series data needs to be analyzed in real time to explore its potential value. In recent years, we have witnessed the successful adoption of machine learning models on financial data, where the importance of accuracy and timeliness demands highly efficient computing frameworks. However, traditional financial time-series data processing frameworks suffer from performance degradation and adaptation issues, such as the handling of outliers caused by stock suspensions in Pandas and TA-Lib. In this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for financial time-series data. HXPY supports a range of acceleration techniques, such as streaming algorithms, vectorized instruction sets, and memory optimization, together with various functions such as time window functions, group operations, down-sampling operations, cross-section operations, row-wise or column-wise operations, shape transformations, and alignment functions. The results of benchmark and incremental analysis demonstrate the superior performance of HXPY compared with its counterparts. On data sizes ranging from MiBs to GiBs, HXPY significantly outperforms other in-memory dataframe libraries, in some cases by up to hundreds of times.
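
To make the idea of a streaming time window function concrete, the following is a minimal sketch in plain Python. It is not HXPY's API or implementation: the function name `rolling_mean_streaming` and its parameters are hypothetical, and the choice to treat missing quotes (for example, during a stock suspension) as NaN values that are skipped is an assumption made for illustration. Each new observation updates a running sum and a count of valid values in constant time, instead of recomputing the mean over the whole window.

```python
import math
from collections import deque


def rolling_mean_streaming(values, window):
    """Rolling mean over the last `window` observations, O(1) work per step.

    Hypothetical helper for illustration only (not part of HXPY).
    NaN entries stay in the window so positions remain aligned, but they
    are excluded from both the running sum and the valid count.
    """
    buf = deque()   # the observations currently inside the window
    total = 0.0     # running sum of the non-NaN values in the window
    valid = 0       # number of non-NaN values in the window
    out = []
    for x in values:
        buf.append(x)
        if not math.isnan(x):
            total += x
            valid += 1
        if len(buf) > window:
            old = buf.popleft()  # evict the oldest observation
            if not math.isnan(old):
                total -= old
                valid -= 1
        out.append(total / valid if valid else math.nan)
    return out


# The third observation is missing, as it might be on a suspension day.
prices = [10.0, 11.0, math.nan, 13.0, 14.0]
result = rolling_mean_streaming(prices, window=3)
```

With these inputs, the last window covers `[nan, 13.0, 14.0]` and the mean is taken over the two valid prices only. A running sum can accumulate floating-point error over long series, so a production implementation would likely need a more careful update rule.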

Electronic Supplementary Material

Video
JCST-2209-12879-video.mp4
Download File(s)
JCST-2209-12879-Highlights.pdf (1,020.1 KB)

Journal of Computer Science and Technology
Pages 3-24
Cite this article:
Guo J, Peng J, Yuan H, et al. HXPY: A High-Performance Data Processing Package for Financial Time-Series Data. Journal of Computer Science and Technology, 2023, 38(1): 3-24. https://doi.org/10.1007/s11390-023-2879-5


Received: 30 September 2022
Revised: 29 October 2022
Accepted: 10 January 2023
Published: 28 February 2023
© Institute of Computing Technology, Chinese Academy of Sciences 2023