Taiga: Performance Optimization of the C4.5 Decision Tree Construction Algorithm

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Technology Innovation Center at Yinzhou, Yangtze Delta Region Institute of Tsinghua University, Yinzhou 315100, China.
Classification is an important machine learning problem, and decision tree construction algorithms are an important class of solutions to it. RainForest is a scalable framework for implementing decision tree construction algorithms. It comprises several algorithms, the best of which is a hybrid of a traditional recursive implementation and an iterative implementation that uses more memory but involves fewer write operations. We propose an optimized algorithm inspired by RainForest. By using a more sophisticated criterion for switching between the two algorithms, we obtain a performance gain even when all statistical information fits in memory. Evaluations show that our method achieves an average speedup of 2.8 times over the traditional recursive implementation.


Tsinghua Science and Technology
Pages 415-425
Received: 04 June 2015
Accepted: 11 July 2015
Published: 11 August 2016
