Volume 9, Issue 3




Correlation-aware probabilistic data summarization for large-scale multi-block scientific data visualization

Yang Yang¹, Kecheng Lu², Yu Wu¹, Yunhai Wang², Yi Cao¹
¹ Institute of Applied Physics and Computational Mathematics, Beijing 100094, China
² School of Computer Science and Technology, Shandong University, Qingdao 266237, China

Abstract

In this paper, we propose a correlation-aware probabilistic data summarization technique to efficiently analyze and visualize large-scale multi-block volume data generated by massively parallel scientific simulations. The core of our technique is correlation modeling of distribution representations of adjacent data blocks using copula functions and accurate data value estimation by combining numerical information, spatial location, and correlation distribution using Bayes’ rule. This effectively preserves statistical properties without merging data blocks in different parallel computing nodes and repartitioning them, thus significantly reducing the computational cost. Furthermore, this enables reconstruction of the original data more accurately than existing methods. We demonstrate the effectiveness of our technique using six datasets, with the largest having one billion grid points. The experimental results show that our approach reduces the data storage cost by approximately one order of magnitude compared to state-of-the-art methods while providing a higher reconstruction accuracy at a lower computational cost.
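
As an illustration only, the following Python sketch shows one common way to couple the value distributions of two adjacent blocks with a copula. The abstract does not state which copula family or marginal representation the authors use, so the Gaussian copula, the empirical (rank-based) marginals, the synthetic data, and all function names here are assumptions made for the example; it is not the paper's implementation and it omits the spatial and Bayes' rule components.

```python
import bisect
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal used as the copula's latent space


def empirical_cdf(sorted_vals, x):
    """Empirical CDF of x, kept strictly inside (0, 1) so inv_cdf stays finite."""
    n = len(sorted_vals)
    rank = bisect.bisect_right(sorted_vals, x)
    return min(max(rank, 1), n) / (n + 1)


def empirical_quantile(sorted_vals, u):
    """Nearest-rank inverse of the empirical CDF above."""
    n = len(sorted_vals)
    idx = math.ceil(u * (n + 1)) - 1
    return sorted_vals[min(max(idx, 0), n - 1)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_gaussian_copula(xs, ys):
    """Estimate the Gaussian copula parameter from paired samples.

    Each value is mapped to a normal score through its own marginal CDF,
    so the result describes dependence only, not the marginals.
    """
    sx, sy = sorted(xs), sorted(ys)
    zx = [STD_NORMAL.inv_cdf(empirical_cdf(sx, x)) for x in xs]
    zy = [STD_NORMAL.inv_cdf(empirical_cdf(sy, y)) for y in ys]
    return pearson(zx, zy)


def sample_pairs(rho, sorted_x, sorted_y, count, rng):
    """Draw correlated value pairs from the copula and the two marginals."""
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    pairs = []
    for _ in range(count):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + residual * rng.gauss(0.0, 1.0)
        pairs.append((empirical_quantile(sorted_x, STD_NORMAL.cdf(z1)),
                      empirical_quantile(sorted_y, STD_NORMAL.cdf(z2))))
    return pairs


# Synthetic stand-ins for paired values from two adjacent blocks (assumed data).
rng = random.Random(0)
block_a = [rng.gauss(10.0, 2.0) for _ in range(2000)]
block_b = [0.8 * a + rng.gauss(0.0, 1.0) for a in block_a]

rho = fit_gaussian_copula(block_a, block_b)
pairs = sample_pairs(rho, sorted(block_a), sorted(block_b), 2000, rng)
print(f"fitted copula correlation: {rho:.3f}")
```

In this sketch the per-block marginals and the single parameter `rho` are stored separately, which is the general appeal of a copula: the dependence between neighbouring blocks can be recorded without merging their data.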

Keywords: correlation-awareness, large-scale data, multi-block methods, probabilistic data summarization


Publication history

Received: 07 March 2022
Accepted: 01 July 2022
Published: 18 March 2023
Issue date: September 2023

Copyright

© The Author(s) 2023.

Acknowledgements

This work was supported by the Chinese Postdoctoral Science Foundation (2021M700016).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
