Machine Learning Emulation of Microwave Forward Radiative Transfer

Emulating the functional response of a complex radiative transfer model with deep learning extends the model's value, allowing it to run on cutting-edge distributed computing while preserving its core scientific reliability.

Overview

Quite often the data we have are not the data we want, and the data we want are not the data we need. This endless cycle of data collection, analysis and more data collection is fundamental to advancing atmospheric science, and in particular, making the best use of remote sensing of the atmosphere.

Modeling the scattering, absorption, and transmission of electromagnetic radiation as it traverses an atmospheric state (Radiative Transfer Modeling, or RTM) is part of the complex and iterative process of retrieving the three-dimensional atmospheric state from satellite sensor observations. RTM is thus fundamental to fully using these remotely sensed data in real-time weather analysis and forecasting. The growing supply of observations from the proliferation of low-cost satellites offers new and unique insights into the atmosphere; however, current RTM calculations are computationally expensive and do not scale well with this rapid growth in sensor observations.

To help ensure that more really is better, AER investigated machine learning approaches in RTM to improve computational speed while maintaining fidelity in the retrievals. Our investigation showed positive results using a deep neural network (DNN) to emulate high-fidelity iterative RTM results, with training data sampled from the NOAA-18 microwave sounder. These results help to inform future development of light-weight models that could be integrated with larger numerical weather prediction (NWP) systems, further streamlining the process from raw sensor data to actionable weather forecasts. For more information on the larger efforts to integrate machine learning into atmospheric remote sensing and NWP systems, please see Leveraging Modern Artificial Intelligence for Remote Sensing and NWP: Benefits and Challenges, and Artificial Intelligence May Be Key to Better Weather Forecasts.

Approach

Retrieving an atmospheric state from a set of satellite-observed radiances requires multiple iterations of an RTM to match a modeled atmospheric profile to an observed set of radiances.  This inverse problem requires iteration because solutions are not unique—multiple profiles could produce the same set of radiances, and the fundamental task is identifying the best or most likely profile.  This computationally expensive process impedes integration of new atmospheric data into current operational systems.  This expense is rooted in the full-physics approach to solving the problem, motivating the question: is there another way?
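The iterative matching described above can be sketched as a Gauss-Newton-style loop. The toy linear forward model and the function names below are illustrative assumptions, not the operational system:

```python
import numpy as np

def forward_model(x, K):
    """Stand-in for an RTM: maps a profile x to a set of radiances."""
    return K @ x

def retrieve(y_obs, K, x0, n_iter=10):
    """Iteratively adjust the profile until modeled radiances match y_obs."""
    x = x0.copy()
    for _ in range(n_iter):
        residual = y_obs - forward_model(x, K)
        # Least-squares step: solve K @ dx ~= residual for the profile update
        dx, *_ = np.linalg.lstsq(K, residual, rcond=None)
        x = x + dx
    return x

rng = np.random.default_rng(0)
K = rng.normal(size=(20, 5))        # 20 channels, 5-level toy profile
x_true = rng.normal(size=5)
y_obs = forward_model(x_true, K)
x_hat = retrieve(y_obs, K, np.zeros(5))
print(np.allclose(x_hat, x_true, atol=1e-6))  # True
```

With a linear toy model the loop converges immediately; a real RTM is nonlinear, which is what forces the repeated, expensive forward-model evaluations.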

Advances in cloud-hosted computing infrastructure offer an opportunity to explore the RTM inverse problem at the intersection of atmospheric and data sciences. A machine learning approach, trading the runtime physics of RTM for a well-trained but computationally simpler neural network, may provide a means to accommodate the proliferation of raw sensor data without the present bottlenecks in processing.

To provide initial insights, we sought to emulate current full-physics RTM results with a machine learning model. Specifically, we intended that this model:

  1. Faithfully reproduce model outputs, given model inputs
    1. Across a training data set
    2. Across a testing data set
  2. Reproduce those outputs faster than the physics-based model

We explored candidate models using three well-known approaches: a partial least-squares (PLS) method; a shallow neural network (SNN); and a deep neural network (DNN).

Partial Least Squares (PLS) regression combines classical least squares with a statistical approach that remains usable when the predictor matrix has more variables than observations, or when predictors are multicollinear. In our case, the predictor matrix represents the profiles and the predicted matrix is the channel response to those profiles. The PLS results form our baseline of performance: we expect this linear approach to work, but not nearly as well as a nonlinear SNN or DNN. Comparing any architecture we devise against the best available linear approach (PLS, in our case) provides a frame of reference for any incremental gain in accuracy beyond a simple linear regression.

The second method applied was a simple shallow neural network (SNN).  The inspiration for this network was taken from Taylor et al. 2015 [https://doi.org/10.1016/J.JQSRT.2015.08.018].  The network consists of the input layer, followed by two hidden layers and the output layer.  The first hidden layer was set to 2,028 neurons and the second to 514, yielding a network with 2,945,330 trainable parameters.  In theory, a feedforward neural network of this form with nonlinear activation functions is a universal function approximator: given enough degrees of freedom (neurons), it can learn the mathematical mapping between inputs and outputs.  The data used to train this network are described in the Data section below.  This approach produces a model that can learn and generalize, but it is not able to exactly match the CRTM results (at least not to within a standard deviation of error < 0.05 K), either in training or testing.
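The forward pass of this architecture can be sketched in numpy; only the layer widths (652 → 2,028 → 514 → 20) come from the text, while the random placeholder weights and the ReLU activation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [652, 2028, 514, 20]        # input, two hidden layers, output
weights = [rng.normal(scale=0.01, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def snn_forward(x):
    """Two-hidden-layer MLP: nonlinear hidden layers, linear output."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)   # ReLU on hidden layers only
    return h                          # 20 channel radiances

batch = rng.normal(size=(8, 652))
out = snn_forward(batch)
print(out.shape)  # (8, 20)
```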

The third method applied was a very deep neural network (DNN).  We experimented with multiple iterations on this theme, but the basic network architecture remained the same.  This network consisted of multiple, locally connected layers.  The inspiration here was to create a network that, in some fashion, could more readily learn something like a weighting function, with the locally connected layers acting as filters across the atmospheric profiles of temperature, water vapor, ozone, and cloud properties.
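To illustrate the idea, the sketch below implements a single locally connected 1-D layer in numpy. Unlike a convolution, each output position has its own unshared filter weights, letting different filters specialize to different altitude ranges of the profile. The profile length and kernel width are illustrative assumptions:

```python
import numpy as np

def locally_connected_1d(x, W, b):
    """x: (levels,), W: (positions, kernel), b: (positions,).

    Slides a window along the profile like a convolution, but with a
    separate weight vector at every position (no weight sharing).
    """
    positions, kernel = W.shape
    out = np.empty(positions)
    for p in range(positions):
        out[p] = x[p:p + kernel] @ W[p] + b[p]   # unshared weights per level
    return out

rng = np.random.default_rng(1)
levels, kernel = 91, 5                  # e.g. a 91-level profile, width-5 filters
positions = levels - kernel + 1         # 87 output positions
x = rng.normal(size=levels)
W = rng.normal(size=(positions, kernel))
b = np.zeros(positions)
y = locally_connected_1d(x, W, b)
print(y.shape)  # (87,)
```

Stacking such layers, one per profile variable, is one way a network could build up something resembling per-channel weighting functions.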

Figure 1. Deep Neural Network Architecture

 

Data

The data used for the radiative transfer model emulator were derived from the Community Radiative Transfer Model (CRTM) applied to the NOAA-18 microwave sounder. We gathered data for multiple orbits across multiple days, both the CRTM channel radiances and the corresponding European Centre for Medium-Range Weather Forecasts (ECMWF) model profile data, and processed them into formats amenable for use in TensorFlow and other statistical packages.  The targets were the 20 channel radiances from the CRTM model, and the inputs were the ECMWF profiles.  The input data comprised 652 features, with each row representing a profile/radiance pair.  We performed standard scaling and one-hot encoding on the data and proceeded to run two distinct types of statistical methods and compare the results.
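The preprocessing steps can be sketched with scikit-learn on synthetic data; the surface-type column and the array sizes below are illustrative assumptions:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder

rng = np.random.default_rng(7)
profiles = rng.normal(loc=250.0, scale=30.0, size=(100, 4))  # e.g. temperatures (K)
surface = rng.choice(["ocean", "land", "snow", "ice"], size=(100, 1))

# Standard-scale continuous profile variables to zero mean, unit variance
scaled = StandardScaler().fit_transform(profiles)
# One-hot-encode the categorical surface-type column (4 categories -> 4 columns)
onehot = OneHotEncoder().fit_transform(surface).toarray()

features = np.hstack([scaled, onehot])
print(features.shape)  # (100, 8): 4 scaled columns + 4 one-hot columns
```

Scaling keeps physically large-valued inputs (temperatures, pressures) from dominating the network's gradients, while one-hot encoding avoids imposing a false ordering on categorical fields.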

We randomly sampled 82 orbits of data (about 10,000 profiles from each orbit).  The sampling resulted in 777,060 rows of data.  We trained the algorithms on this dataset; it represents a robust sample of the roughly 16 million rows present across all 82 orbits.

The breakdown by surface type was approximately 60% of the profiles over ocean, 20% over land, 10% each for snow and ice.

Computation

All computations in this project were done on cloud resources. Data were stored in Amazon Web Services (AWS) Simple Storage Service (S3). Training was done with the AWS-managed ML service SageMaker. This allowed us to quickly scale computational resources as needed, economically training very large models on gigabyte- to terabyte-scale datasets with a well-known set of R and Python tools through a common Jupyter Notebook interface.

Results

The results of the training, as well as three subsequent experiments, are presented in Figures 2 and 3.  Figure 2 represents the training results for the DNN architecture.  In this case, we used 70% of one orbit to train the DNN and the remaining 30% of the orbit to validate the training.  The maps show a side-by-side comparison of brightness temperature for the model (CRTM) and the DNN emulator.  The lower panel in Figure 2 shows the goodness of fit to the validation data withheld from training, demonstrating that the DNN can effectively reproduce the model results (when restricted to the same orbit).

The results of the three experiments are presented in a series of plots.  First, panels display the mapped brightness temperature for channel 1.  These panels (Figure 3) show the model (CRTM), the model emulator (PLS, SNN, DNN), and the mapped difference between CRTM and the emulator, along with a goodness-of-fit plot listing statistics on the overall accuracy of each approach.  The results show an overall increase in accuracy as we move from simple (PLS), to complex (SNN), to very complex (DNN).  The PLS results represent the best fit to the target data (i.e., the test data are in sample for the PLS).  The S/DNN results were generated by applying the models trained on the data described in the previous section to a day's worth of ECMWF profiles (8 Sep 2018, a day not included in S/DNN training) and comparing to the corresponding CRTM model runs.  The S/DNN architectures fit these data better than the best-fit linear approach and thus generalize to cases not encountered in training.

We consider this finding significant: given sufficient training data, the S/DNN approaches have a good chance of generalizing, that is, learning the mapping of inputs to outputs for arbitrary cases and becoming suitable emulators of the physical RT model.
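The goodness-of-fit comparisons above rest on simple per-channel error statistics, which can be sketched as follows (the arrays are synthetic; the 0.05 K threshold echoes the criterion noted earlier):

```python
import numpy as np

def fit_stats(bt_model, bt_emulator):
    """Per-channel bias, error standard deviation, and RMSE.

    Both arrays are shaped (profiles, channels), in brightness temperature (K).
    """
    err = bt_emulator - bt_model
    bias = err.mean(axis=0)
    std = err.std(axis=0)
    rmse = np.sqrt((err ** 2).mean(axis=0))
    return bias, std, rmse

rng = np.random.default_rng(3)
bt_crtm = 200.0 + 80.0 * rng.random(size=(5000, 20))          # reference model
bt_dnn = bt_crtm + rng.normal(scale=0.03, size=bt_crtm.shape)  # emulator output
bias, std, rmse = fit_stats(bt_crtm, bt_dnn)
print(bool((std < 0.05).all()))  # True: this synthetic emulator meets the threshold
```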

  

Figure 2. Example training exercise for DNN on one orbit of data from November of 2017

 

This exercise demonstrated that the DNN can replicate the physical model in training, which is noteworthy given that other techniques cannot.  Not surprisingly, PLS cannot.  We also tested extreme gradient boosted trees (XGBoost), an ensemble of shallow learners; it, too, was unable to reproduce the training data (in-sample predictions).

Figure 3a. PLS example results

 

Figure 3b. Example SNN performance

 

Figure 3c. Example DNN performance

 

In this initial effort, we have shown that an ML-informed RTM can learn the mapping from inputs to outputs, and that the ML system generalized better than a linear benchmark (indicative of effective learning, not over-fitting).  That is, we can faithfully reproduce a full physics-informed RTM using machine learning.

With respect to the speed of the model, there are a few things to consider.  The first is that with standard ML libraries, once a model is trained it is readily deployable in a GPU environment.  This means that any speed-up only requires additional GPU resources and not refactoring of code.  Further, the ability to emulate the radiative transfer forward process indicates that containing the inverse process within an ML model is achievable.

Our findings suggest that a Deep Neural Network (DNN), trained on a sufficient quantity of data, will contain within its weights and activation functions the ability to "look up" the most likely profile.  Another way to think of it is that the iterations performed in training the DNN may effectively encapsulate the iterative process undertaken in the traditional maximum-likelihood approach.

Conclusion

This effort offers a path forward to the integration of scientifically verifiable and physically realistic modeling in complex domains with modern cloud-based machine learning algorithms. The emulation in this effort did not seek to improve on the physics-based models AER has developed in the radiative transfer domain. Rather, the intent was to create functional mappings of those processes into a machine learning model that can be scaled across cloud resources to meet the computational needs of the satellite observation data stream. 

In this effort we attempted a large-scale, one-for-one emulation returning a very complex model that captures the whole domain of microwave RT on a global scale. These results suggest that similar approaches, emulating smaller subsets of other domains, might yield light-weight models more appropriate for integration with operational NWP systems.

The authors would like to acknowledge the support of AER internal research and development funds and the collaboration of NOAA STAR in this work.