Our neural network learns from past growing conditions and their associated outcomes (Williams et al 2024; Lobell et al 2015) in order to predict future yields given climate projections (Williams et al 2024).
More details about inputs, outputs, and climate variables >>
The inputs to the neural network include climate variables, year, and baseline yield (mean and std of all historic yields for the cell). The output is the mean and std of the predicted change in yield for that grid cell. Note that our scientific paper (preprint) considers other non-normal distribution shapes. Anyway, yield data come from SCYM (Lobell et al 2015). Historic growing conditions and future climate projections come from CHC-CMIP6 (Williams et al 2024). The climate variables are as follows (Williams et al 2024), where all are daily:
- rhn: overall relative humidity
- rhx: relative humidity peak
- tmax: maximum temperature
- tmin: minimum temperature
- chirps: precipitation
- svp: saturation vapor pressure
- vpd: vapor pressure deficit
- wbgt: wet bulb globe temperature
These daily climate values are summarized to the min, max, mean, and std of daily values per month per grid cell before going to the neural network. We use the dataset variable names for consistency.
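To make this summarization step concrete, here is a minimal sketch using pandas. The table layout and column names (`geohash`, `date`, `tmax`) are illustrative assumptions, not the actual pipeline.

```python
import pandas as pd

# Hypothetical daily records: one row per grid cell (geohash) per day.
daily = pd.DataFrame({
    "geohash": ["9zqx", "9zqx", "9zqx", "9zwp"],
    "date": pd.to_datetime(["2021-07-01", "2021-07-02", "2021-08-01", "2021-07-01"]),
    "tmax": [31.2, 33.5, 29.8, 30.1],
})

# Summarize daily values to min / max / mean / std per month per grid cell,
# matching the statistics fed to the neural network.
monthly = (
    daily
    .assign(month=daily["date"].dt.to_period("M"))
    .groupby(["geohash", "month"])["tmax"]
    .agg(["min", "max", "mean", "std"])
    .reset_index()
)
print(monthly)
```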
First, we divide up the US Corn Belt into a geographic grid, breaking up the problem so it becomes tractable for modeling. Next, our network goes year by year, forecasting the yields for each cell ("neighborhood") in that grid. Within these small areas, the network describes the range of expected changes to yield as a mean and standard deviation per cell. This is further discussed in our scientific paper (preprint).
More details about the grid >>
We divide the US Corn Belt into small groups of fields to create a grid where each cell is about 28 by 20 kilometers in size (Haugen 2020). This happens through four-character geohashing, an algorithm that helps create these grids (Niemeyer 2008). Every piece of land ends up in exactly one grid cell, but some areas dedicate more land to growing maize than others. Therefore, our model gives more weight to a cell with more corn than to a cell with less.
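For the curious, here is a minimal, self-contained sketch of the standard geohash algorithm (Niemeyer 2008) at four characters of precision. It is illustrative only; an existing geohash library would work just as well.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def encode_geohash(lat, lon, precision=4):
    """Encode a latitude / longitude pair as a geohash string.

    Alternates between bisecting the longitude and latitude ranges,
    emitting one bit per bisection, then packs the bits into base32
    characters (5 bits per character).
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        value, rng = (lon, lon_range) if use_lon else (lat, lat_range)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)

# A point in central Iowa maps to a single four-character cell code.
print(encode_geohash(42.0, -93.5))
```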
This interactive lets us try out different settings for our neural network. These different configuration options are often called hyper-parameters. Our tool looks at different depths, L2 penalties, and dropout rates, as well as removing individual variables from the model, to see how it performs. See hyper-parameter details to learn more about what each setting does.
Details of hyper-parameters >>
Optimizing the model requires a lot of experimentation. In general, we can follow some basic rules:
- A large gap between training set and validation set MAE suggests the model pays too much attention to noise in the data, or is "overfit"; in this case, we may need to increase L2 / dropout or decrease depth (Koehrsen 2018).
- Error too far to the upper right of the chart suggests the model is "underfit"; we may have to increase depth or decrease dropout / L2 (Koehrsen 2018).
All that said, sometimes changing the parameters does not have the intended effect. Instead, data scientists building these models simply have to try a lot of different combinations of hyper-parameters (a short code sketch follows the list below):
- Depth: The number of layers in the network (Brownlee 2020). Deeper networks can learn more sophisticated lessons from the data but can require more data to train (Brownlee 2019); otherwise, they may memorize examples instead of learning broader trends (Brownlee 2020).
- L2: A penalty on the model for very strong pathways between neurons, meaning large weights (Oppermann 2020). Conceptually, this avoids focusing too much on small specifics in the data unless there is strong evidence for that connection. Turning this "knob" up sets a higher bar for the evidence required for those strong connections.
- Dropout: Randomly turn off neurons while training (Srivastava et al 2014). Conceptually, this encourages finding multiple ways to use inputs in making predictions. This avoids relying on one small aspect of the data (though at the risk of disrupting learning). This "knob" turned up means neurons are disabled with higher frequency.
- Blocks: Sometimes certain data can be distracting to the model or simply having a large number of variables can become overwhelming (Karanam 2021). This option lets us take out individual climate variables.
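To make these knobs concrete, here is a minimal sketch of how depth, L2, and dropout might appear in a network definition. The layer width, optimizer, and output setup are placeholders, not the configuration from our paper.

```python
import tensorflow as tf

def build_model(depth, l2, dropout, n_inputs, width=64):
    """Sketch of a feed-forward network exposing the three hyper-parameters."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(n_inputs,)))
    for _ in range(depth):  # Depth: number of hidden layers
        model.add(tf.keras.layers.Dense(
            width,
            kernel_regularizer=tf.keras.regularizers.L2(l2),  # L2: penalty on large weights
        ))
        model.add(tf.keras.layers.LeakyReLU())
        model.add(tf.keras.layers.Dropout(dropout))  # Dropout: randomly disable neurons during training
    model.add(tf.keras.layers.Dense(2))  # outputs: predicted mean and std of yield change
    model.compile(optimizer="adam", loss="mean_absolute_error")
    return model

model = build_model(depth=3, l2=0.01, dropout=0.1, n_inputs=40)
```

The "Blocks" option is not a layer setting at all: it simply removes the corresponding climate variable's columns from the input before training.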
Clicking the "Try Model" button will test a new model where we split the historic data such that the network trains from one piece (training set) before we grade it on the part it has not seen before (validation set). We score performance with
mean absolute error (
Acharya 2021). For example, if the model predicts that a neighborhood's yield will drop by 10% and it actually drops by 15%, the error would be 5%. Our goal: go left on the horizontal axis (minimize error predicting
mean) and to the bottom on the vertical axis (minimize error predicting
std).
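For reference, mean absolute error is just the average of those per-neighborhood gaps. A toy calculation, with made-up numbers:

```python
import numpy as np

# Hypothetical predicted vs. actual yield changes (as % of baseline yield).
predicted = np.array([-10.0, 5.0, 0.0])
actual = np.array([-15.0, 3.0, 2.0])

mae = np.mean(np.abs(predicted - actual))  # (5 + 2 + 2) / 3
print(mae)  # 3.0
```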
⚡ This section contains interactive components. It will take just a moment to load the lab.
Load Interactives >>
Hyper-parameters: Visualization that allows users to try different neural network configurations and see their validation set performance. How would you balance the error between predicted mean and predicted standard deviation?
💾 Alternative data downloads:
- Sweep Ag All: Information about candidates considered in our model sweep.
These are CSV files which can be opened in spreadsheet software. These are available under the CC-BY-NC License.
⌨️ The hyper-parameters visualization has the following controls:
- Esc: Exit the visualization
- n: Change depth
- l: Change L2
- d: Change dropout
- b: Change variable block
- t: Try the currently selected configuration
- s: Execute model sweep
The visualization will need focus in order to receive keyboard commands.
It can be helpful to see how the neural network responds to different configurations. However, we can also ask a computer to optimize these metrics through a sweep in which it tries many combinations of parameters to find the best one (Joseph 2018). Execute this by clicking / tapping the "Run Sweep" button. For more details, see our scientific paper (preprint). We use that optimized configuration in the rest of this explorable explanation.
More about why the sweep selected this configuration >>
Even with a computer trying hundreds of models, we still have to make some choices:
- Do we favor the model's ability to predict the mean equally with its ability to predict the standard deviation?
- After choosing a model configuration, we retrain with all available data. So, do we want the model with the best metrics in the validation set, or one with slightly worse performance but less overfitting, which might do better with more examples to learn from?
- Do we want a deeper model which might benefit from more data, or a shallower model that seems to perform well in the smaller training dataset used in the sweep?
Sadly there is no one right answer but, in the selection we are using for the rest of this online lab, we prefer the ability to predict mean over std by a ratio in weights of 3 to 1. We also elect for a slightly deeper model in anticipation of retraining with all historic data.
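In other words, when ranking candidates from the sweep, the two validation errors could be combined into a single score with that 3-to-1 weighting. The weights come from the text above; the simple linear combination below is an illustrative assumption.

```python
def selection_score(mean_mae, std_mae):
    """Lower is better: weight mean-prediction error 3x as heavily as std error."""
    return 3.0 * mean_mae + 1.0 * std_mae

# Example: a candidate with 6% mean MAE and 2% std MAE.
print(selection_score(0.06, 0.02))  # 0.20
```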
The errors in the visualization reflect a smaller dataset than the one used to train the final model. After retraining with more information, the model sees an MAE of 6% for mean prediction and 2% for std prediction.
More about model performance and uncertainty >>
During the model sweep, we have to divide up our data into three different sets. In this process, we split the data by year. For example, all of the observations from 2011 end up in only one of the sets (Shah 2017; a short sketch of this year-based split follows the list below):
- The validation set allows us to compare different models together (what is used in the visualization).
- The test set provides a prediction of future performance.
- Only the training set teaches the model while we try to optimize parameters like L2, dropout, etc.
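Here is a minimal sketch of that year-based split. The table contents and the years assigned to each set are placeholders, not the actual assignments.

```python
import pandas as pd

# Hypothetical observations table; the real data have many more columns.
observations = pd.DataFrame({
    "geohash": ["9zqx", "9zqx", "9zwp", "9zwp", "9zqx", "9zwp"],
    "year": [2010, 2011, 2011, 2013, 2016, 2018],
    "yield_change": [0.02, -0.05, -0.03, 0.01, -0.10, 0.04],
})

# Assign whole years to each split so no single year leaks across sets.
test_years = {2011, 2016}        # placeholder choices
validation_years = {2013, 2018}  # placeholder choices

test = observations[observations["year"].isin(test_years)]
validation = observations[observations["year"].isin(validation_years)]
train = observations[~observations["year"].isin(test_years | validation_years)]
```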
That said, after determining our preferred model configuration, we can have the model learn from the combined validation and training sets before we make a final estimation of the error we expect in the future using the test set. In this final trial, we see an MAE of 6% for mean prediction and 2% for std prediction. We can later use an understanding of this model uncertainty in the Monte Carlo simulations.
Anyway, with this neural network built, we run Monte Carlo simulations to see what the future of MPCI and YP might look like while incorporating these measures of neural network uncertainty.
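As a purely illustrative sketch of how neural network uncertainty might be folded into a Monte Carlo draw for one grid cell (the actual simulation design is described in the preprint), the predicted distribution parameters can be perturbed by the model's own test-set error before sampling:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical neural network outputs for one grid cell.
predicted_mean, predicted_std = -0.05, 0.10   # predicted change in yield (fraction of baseline)
mean_mae, std_mae = 0.06, 0.02                # model error estimates from the test set

n_draws = 10_000
# Perturb the predicted distribution parameters by the model's own error,
# then sample a yield change for each simulated year.
sampled_means = predicted_mean + rng.normal(0.0, mean_mae, n_draws)
sampled_stds = np.abs(predicted_std + rng.normal(0.0, std_mae, n_draws))
yield_changes = rng.normal(sampled_means, sampled_stds)

print(yield_changes.mean(), yield_changes.std())
```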
Citations for this section
The citations for this section are as follows:
- S. Acharya, “What are RMSE and MAE?” Towards Data Science, May 13, 2021. [Online]. Available: https://towardsdatascience.com/what-are-rmse-and-mae-e405ce230383
- P. Baheti, “The Essential Guide to Neural Network Architectures.” V7Labs, Jul. 08, 2021. [Online]. Available: https://www.v7labs.com/blog/neural-network-architectures-guide
- J. Brownlee, “How Much Training Data is Required for Machine Learning?” Guiding Tech Media, May 23, 2019. [Online]. Available: https://machinelearningmastery.com/much-training-data-required-machine-learning/
- J. Brownlee, “How to Control Neural Network Model Capacity With Nodes and Layers.” Guiding Tech Media, Aug. 25, 2020. [Online]. Available: https://machinelearningmastery.com/how-to-control-neural-network-model-capacity-with-nodes-and-layers/
- B. Haugen, “Geohash Size Variation by Latitude.” Mar. 14, 2020. [Online]. Available: https://bhaugen.com/blog/geohash-sizes/
- R. Joseph, “Grid Search for Model Tuning.” Towards Data Science, Dec. 29, 2018. [Online]. Available: https://towardsdatascience.com/grid-search-for-model-tuning-3319b259367e
- S. Karanam, “Curse of Dimensionality — A ‘Curse’ to Machine Learning.” Towards Data Science, Aug. 10, 2021. [Online]. Available: https://towardsdatascience.com/curse-of-dimensionality-a-curse-to-machine-learning-c122ee33bfeb
- W. Koehrsen, “Overfitting vs. Underfitting: A Complete Example.” Towards Data Science, Jan. 28, 2018. [Online]. Available: https://towardsdatascience.com/overfitting-vs-underfitting-a-complete-example-d05dd7e19765
- D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” 2014. doi: 10.48550/ARXIV.1412.6980.
- I. Loshchilov and F. Hutter, “Decoupled Weight Decay Regularization,” in International Conference on Learning Representations, 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:53592270
- D. B. Lobell, D. Thau, C. Seifert, E. Engle, and B. Little, “A scalable satellite-based crop yield mapper,” Remote Sensing of Environment, vol. 164, pp. 324–333, Jul. 2015, doi: 10.1016/j.rse.2015.04.021.
- A. Maas, A. Hannun, and A. Ng, “Rectifier Nonlinearities Improve Neural Network Acoustic Models,” in Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia: JMLR, 2013.
- NCEI, “Corn Belt.” NOAA, 2024. [Online]. Available: https://www.ncei.noaa.gov/access/monitoring/reference-maps/corn-belt
- G. Niemeyer, “geohash.org is public!” Labix Blog, Feb. 26, 2008. [Online]. Available: https://web.archive.org/web/20080305102941/http://blog.labix.org/2008/02/26/geohashorg-is-public/
- A. Oppermann, “Regularization in Deep Learning — L1, L2, and Dropout.” Towards Data Science, Feb. 19, 2020. [Online]. Available: https://towardsdatascience.com/regularization-in-deep-learning-l1-l2-and-dropout-377e75acc036
- T. Shah, “About Train, Validation and Test Sets in Machine Learning.” Towards Data Science, Dec. 06, 2017. [Online]. Available: https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7
- D. Shiffman, The nature of code: simulating natural systems with JavaScript. San Francisco: No Starch Press, 2024.
- N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929–1958, 2014.
- E. Williams, C. Funk, P. Peterson, and C. Tuholske, “High resolution climate change observations and projections for the evaluation of heat-related extremes,” Sci Data, vol. 11, no. 1, p. 261, Mar. 2024, doi: 10.1038/s41597-024-03074-w.
Some definition citations are in the glossary.
Data downloads for methods
The data downloads for this methods section ("next, a neural network") are as follows:
These are available under the CC-BY-NC License.