Project-4 Deep Learning and Bayesian Inference to Characterize Spatially Continuous Global Soil Moisture Error Patterns
1 Research Background
Estimating accurate soil moisture (SM) information from space and knowing the error characteristics of the data are of great importance in applying satellite-based SM data in many Earth Science disciplines. Three of the most popular methods used to calculate independent errors of global-scale SM data sources are the instrumental variable (IV) method, triple collocation analysis (TCA), and quadruple collocation analysis (QCA). Although the IV and QCA methods were proposed to overcome the limitations of TCA by making fewer assumptions, we are still limited in our ability to investigate the error characteristics of satellite- and model-based SM data because of intrinsic limitations in retrieving SM over forest and extremely dry areas, and over areas that are prone to radio frequency interference.
2 Research Objectives
In this project, I tried to provide spatially complete error maps of SM data from Soil Moisture Active Passive (SMAP), Advanced Scatterometer (ASCAT), Soil Moisture and Ocean Salinity (SMOS), Advanced Microwave Scanning Radiometer 2 (AMSR2), and Cyclone Global Navigation Satellite System (CYGNSS). We calculated these spatial distributions using the ensemble predictions from different models because Bayesian model averaging has demonstrated several advantages: 1) Improved accuracy 2) Reduced variance (better generalization) 3) Explicit accounting for uncertainty in the choice of model.
I also constructed models from the following machine learning and regression methods: 1) Deep learning models with different generalization methods 2) Hierarchical models for assessing the amount of uncertainty 3) Simple machine learning approaches from a support vector machine, random forest, eXtreme Gradient Boosting 4) Lasso/Ridge regression models.
Laslty, I focused on investigating the uncertainty analysis for model parameters by evaluating credible intervals as they help us to analyze different predictor variables and assess their importance in SM data error characterizations. Specifically, I used non-informative priors for regression parameters and hyper-parameters. I used the No-U-Turn Sampler (an extension of the Hamiltonian Monte Carlo Markov Chain; sampling approach) and Variational Inferences (approximation approach) methods to estimate these parameters.