Volume 1, Issue 1


DeepEye: An Automatic Big Data Visualization Framework

Xuedi Qin, Yuyu Luo, Nan Tang, Guoliang Li
Department of Computer Science, Tsinghua University, Beijing 100084, China.
Qatar Computing Research Institute, HBKU, Qatar.

Abstract

Data visualization transforms data into images to aid understanding; it is therefore an invaluable tool for explaining the significance of data to visually inclined people. Given a (big) dataset, the essential task of visualization is to tell compelling stories with the data by selecting, filtering, and transforming it, and by picking the right visualization type, such as a bar chart or a line chart. Our ultimate goal is to automate this task, which currently requires heavy user intervention in existing visualization systems. Such an automated system faces three main challenges: (1) Visualization verification: determining whether a visualization of a given dataset is interesting from the viewpoint of human understanding; (2) Visualization search space: a "boring" dataset may become interesting after an arbitrary combination of operations such as selections, joins, and aggregations; (3) On-time responses: the system must respond quickly enough not to exhaust the user’s patience. In this paper, we present the DeepEye system to address these challenges. DeepEye addresses the first challenge by training a binary classifier to decide whether a particular visualization is good for a given dataset, and by using a supervised learning-to-rank model to rank the visualizations classified as good. For the second challenge, it considers popular visualization operations, such as grouping and binning, that manipulate the data and thereby determine the search space. It tackles the third challenge by incorporating database optimization techniques for sharing computations and pruning.

Keywords: big data, automatic data visualization, visualization verification, visualization ranking, visualization search space
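
The pipeline outlined in the abstract (enumerate candidate visualizations, keep those judged good, then rank them) can be illustrated with a minimal sketch. This is not the DeepEye implementation: the toy table, the function names, the hand-written rule in `is_good` (standing in for the trained binary classifier), and the value-spread heuristic in `score` (standing in for the learning-to-rank model) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

# Toy table; the columns and values are invented for this example.
ROWS = [
    {"carrier": "AA", "month": 1, "delay": 12.0},
    {"carrier": "AA", "month": 2, "delay": 7.0},
    {"carrier": "UA", "month": 1, "delay": 20.0},
    {"carrier": "UA", "month": 2, "delay": 3.0},
    {"carrier": "DL", "month": 1, "delay": 5.0},
    {"carrier": "DL", "month": 2, "delay": 9.0},
]

AGGREGATES = {"sum": sum, "avg": mean}
CHART_TYPES = ("bar", "line")


def group_and_aggregate(rows, x, y, agg):
    """Group rows by column x and aggregate column y within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[x]].append(row[y])
    return {key: AGGREGATES[agg](vals) for key, vals in sorted(groups.items())}


def enumerate_candidates(rows, x_columns, y_columns):
    """Search space: every (x, y, aggregate, chart type) combination."""
    for x, y, agg, chart in product(x_columns, y_columns, AGGREGATES, CHART_TYPES):
        yield {"x": x, "y": y, "agg": agg, "chart": chart,
               "data": group_and_aggregate(rows, x, y, agg)}


def is_good(candidate):
    """Hypothetical stand-in for the binary classifier (a hand-written rule).

    Assumed rule: line charts need a numeric x axis, bar charts a categorical one.
    """
    x_is_numeric = all(isinstance(k, (int, float)) for k in candidate["data"])
    return x_is_numeric if candidate["chart"] == "line" else not x_is_numeric


def score(candidate):
    """Hypothetical stand-in for the ranking model: spread of aggregated values."""
    values = list(candidate["data"].values())
    return max(values) - min(values)


def recommend(rows, x_columns, y_columns, k=3):
    """Enumerate candidates, filter with is_good, and return the top-k by score."""
    good = [c for c in enumerate_candidates(rows, x_columns, y_columns) if is_good(c)]
    return sorted(good, key=score, reverse=True)[:k]


top = recommend(ROWS, ["carrier", "month"], ["delay"])
for c in top:
    print(c["chart"], c["x"], c["agg"], c["y"], c["data"])
```

In a learned system, `is_good` and `score` would be replaced by models trained on labeled visualizations. The enumeration step is where computation sharing and pruning would apply, since many candidates reuse the same grouping.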


Publication history

Received: 05 September 2017
Accepted: 01 December 2017
Published: 25 January 2018
Issue date: March 2018

Copyright

© The author(s) 2018

Acknowledgements

This work was supported by the National Key Basic Research and Development (973) Program of China (No. 2015CB358700) and the National Natural Science Foundation of China (Nos. 61373024, 61632016, 61422205, and 61472198).
