The prediction of biogeographical patterns from a large number of driving factors with complex interactions, correlations and non-linear dependences requires advanced analytical methods and modeling tools. This study compares statistical and machine learning-based models for predicting biogeographical patterns of fungal productivity, as a case study for thoroughly assessing how well alternative modeling approaches provide accurate and ecologically consistent predictions.
We evaluated and compared the performance of two statistical modeling techniques, namely, generalized linear mixed models and geographically weighted regression, and four techniques based on different machine learning algorithms, namely, random forest, extreme gradient boosting, support vector machine and artificial neural network to predict fungal productivity. Model evaluation was conducted using a systematic methodology combining random, spatial and environmental blocking together with the assessment of the ecological consistency of spatially-explicit model predictions according to scientific knowledge.
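The blocking idea behind this evaluation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single environmental covariate (hypothetical annual precipitation values) and forms blocks by rank-based binning, whereas dedicated tools may cluster several covariates at once. The function names `environmental_blocks` and `blocked_folds` are made up for this illustration.

```python
from collections import defaultdict


def environmental_blocks(values, n_blocks):
    """Assign each sample to a block by the rank of one environmental covariate.

    Samples are sorted by `values` and cut into `n_blocks` contiguous groups of
    near-equal size, so each block covers a distinct part of the gradient.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    blocks = [0] * len(values)
    for rank, idx in enumerate(order):
        blocks[idx] = rank * n_blocks // len(values)
    return blocks


def blocked_folds(blocks):
    """Yield (train_indices, test_indices), holding out one block at a time."""
    members = defaultdict(list)
    for idx, block in enumerate(blocks):
        members[block].append(idx)
    for held_out in sorted(members):
        test = members[held_out]
        train = [i for b, idxs in members.items() if b != held_out for i in idxs]
        yield sorted(train), test


# Hypothetical annual precipitation (mm) at eight sampling plots (assumed data).
precipitation = [410, 980, 620, 455, 1120, 700, 530, 890]
blocks = environmental_blocks(precipitation, n_blocks=4)
folds = list(blocked_folds(blocks))
```

Unlike random folds, each held-out block here covers a part of the precipitation gradient that the training plots do not sample, so the test error reflects prediction under environmental conditions absent from the training data. Spatial blocking follows the same pattern with blocks defined by location instead of by a covariate.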
Fungal productivity predictions were sensitive to the modeling approach and the number of predictors used. Moreover, the importance assigned to different predictors varied between machine learning modeling approaches. Decision tree-based models increased prediction accuracy by more than 10% compared to other machine learning approaches, and by more than 20% compared to statistical models, and resulted in higher ecological consistency of the predicted biogeographical patterns of fungal productivity.
Decision tree-based models were the best approach for prediction both in sampling-like environments and in extrapolation beyond the spatial and climatic range of the modeling data. In this study, we show that proper variable selection is crucial to create robust models for extrapolation in biophysically differentiated areas. Variable selection reduces the dimensions of the ecosystem space described by the predictors of the models, resulting in higher similarity between the modeling data and the environmental conditions over the whole study area. When dealing with spatial-temporal data in the analysis of biogeographical patterns, environmental blocking is postulated as a highly informative technique to be used in cross-validation to assess the prediction error over larger scales.
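The link between the number of predictors and the similarity between modeling data and study area can be made concrete with a small sketch. It rests on assumptions not taken from the study: similarity is approximated by a simple min-max envelope check, and the plots, grid cells and predictor names (`precip`, `temp`, `slope`) are hypothetical.

```python
def inside_envelope(train_rows, target_row, predictors):
    """True if the target falls within the training min-max range of every predictor."""
    for name in predictors:
        column = [row[name] for row in train_rows]
        if not min(column) <= target_row[name] <= max(column):
            return False
    return True


def envelope_coverage(train_rows, target_rows, predictors):
    """Share of target locations that need no extrapolation for the given predictors."""
    hits = sum(inside_envelope(train_rows, row, predictors) for row in target_rows)
    return hits / len(target_rows)


# Hypothetical sampling plots (modeling data) and grid cells (prediction area).
plots = [
    {"precip": 500, "temp": 9.0, "slope": 5},
    {"precip": 800, "temp": 12.0, "slope": 12},
    {"precip": 650, "temp": 10.5, "slope": 8},
]
grid = [
    {"precip": 550, "temp": 9.5, "slope": 30},
    {"precip": 700, "temp": 11.0, "slope": 10},
    {"precip": 780, "temp": 13.5, "slope": 6},
    {"precip": 600, "temp": 10.0, "slope": 20},
]

full = envelope_coverage(plots, grid, ["precip", "temp", "slope"])
reduced = envelope_coverage(plots, grid, ["precip"])
```

In this toy case, only one of the four grid cells lies inside the training envelope when all three predictors are used, while all four do when the model is restricted to precipitation. Each added predictor can only shrink the share of the study area that the modeling data represent, which is one way to read the role of variable selection in extrapolation.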
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.