The Science and Practice of Seasonal Climate Forecasting
The effect of time-averaging on a weak but consistent climate influence
To understand the effect of long-time averaging more fully, consider that the average effect of a La Nina on rainfall in a given Southern Hemisphere country in the South American extratropics during the November through February period is to decrease rainfall by 120 mm relative to the average amount over the four months. This deficit results in a rainfall total lower than what has occurred in 70% of the years during a recent 30-year period, and therefore higher than what occurs in 30% of the years. Such a deficit would be noticeable, and might well have practical consequences. Now let's focus on a single 5-day period within the 4-month season, such as December 13-17, and assume that the best guess of the expected effect of the La Nina is proportional to that found over the 4-month period. Since there are about 24 5-day periods in the 4-month season, the expected reduction would be 120/24, or 5 mm. This amount would be difficult to notice against the year-to-year variability of 5-day rainfall totals during mid-December in Uruguay: in some years the 5-day total would be 3 mm or less, while in others it could exceed 75 mm. In fact, the expected 5-day rainfall reduction due to La Nina produces a total that is lower than what occurs in 54% of the years during a recent 30-year period, and therefore higher than what occurs in 46% of the years. The smallness of these departures from 50% reflects the small effect of La Nina on 5-day rainfall in this South American location in summer, since it is overshadowed by the day-to-day weather variability that dominates 5-day rainfall totals. Over longer periods, the day-to-day variations tend to cancel out, and the more subtle effect of ENSO remains as a much more obvious residual.
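The arithmetic behind this example can be sketched briefly. The 120 mm seasonal deficit and the 24 five-day periods are from the text; the 35 mm standard deviation of 5-day totals is a hypothetical value chosen only to illustrate how summing roughly independent periods raises the signal-to-noise ratio by the square root of the number of periods:

```python
import math

# Illustrative signal-to-noise arithmetic for time averaging.  The 120 mm
# seasonal deficit and ~24 pentads per season are from the text; the 35 mm
# noise level per 5-day period is a hypothetical assumption.
seasonal_signal = 120.0          # mm deficit over the 4-month season
n_pentads = 24                   # about 24 five-day periods per season
pentad_signal = seasonal_signal / n_pentads
print(pentad_signal)             # 5.0 mm per 5-day period

pentad_noise_sd = 35.0           # assumed std. dev. of 5-day totals (mm)
# For roughly independent pentads, the noise in the seasonal sum grows
# only as sqrt(24), while the signal grows by the full factor of 24:
seasonal_noise_sd = pentad_noise_sd * math.sqrt(n_pentads)

snr_pentad = pentad_signal / pentad_noise_sd
snr_season = seasonal_signal / seasonal_noise_sd
print(round(snr_season / snr_pentad, 2))  # ~ sqrt(24), about 4.9
```

This is why a 5 mm shift is invisible in a single pentad but the 120 mm seasonal shift stands out against the (partially cancelled) weather noise.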
(Note that while ENSO is used as the climate-affecting factor in this example, there are other sources of climate variability besides ENSO, as for example the effects of SST anomalies in locations other than in the tropical Pacific Ocean.)
3. The IRI Forecast Process
Since late 1997, the IRI has issued near-global forecasts of anomalies of temperature and precipitation four times per year. Each forecast contains an outlook for the coming 3-month season and for the 3-month season following that one. Forecasts are issued in late March, June, September, and December. The forecast made in late June, for example, would contain outlooks for the periods of July-September and October-December. In the discussions that follow, the two seasons being forecast are called the two target seasons.
Because anomalies of SST, particularly tropical SST, are known to be a fundamental driver of atmospheric climate anomalies, forecasting the SST anomalies for the two target seasons is the first part of the climate forecasting task. Since SST anomalies usually do not change quickly, the observed SST anomalies for the most recent month are used (persisted) for the first forecast period, in addition to forecast SST anomalies. The version using persisted SST anomalies can be thought of as a baseline forecast that is roughly representative, and it may in fact be weighted fairly highly for certain regions governed by SST in locations where the SST forecast is known to be very uncertain. That is, when the forecast SST has low confidence, the persisted SST anomaly is more likely to be used as a fallback. In many cases the persisted and the forecast SST differ by only small amounts over appreciable areas of the globe, and by moderate amounts only in a minority of locations. The second target season cannot be approximated using persisted SST anomalies, because the anomalies may change significantly in many regions over 4 to 6 months.
The SST forecasts are made using a combination of dynamical and statistical models. During the 1997-2001 period, coupled ocean-atmosphere model predictions of tropical Pacific SST were run covering the two forecast target periods. During the late 1990s and through the time of this writing in 2001, the model given the most weight for Pacific SST is that of the National Centers for Environmental Prediction (NCEP). The NCEP forecasts cover the area from 30N to 25S, and 70W to 120E. Predictions from other dynamical models are also examined, such as the COLA model and the model run at the European Centre for Medium-Range Weather Forecasts (ECMWF). Forecasts of the tropical Atlantic SST are made using the statistical canonical correlation analysis (CCA) model of CPTEC/INPE in Brazil, with the tropical Atlantic and Pacific SST fields as predictors. Similarly, forecasts of the Indian Ocean are presently done at the IRI using a CCA model, with the observed Indo-Pacific SST anomalies and the forecasts of the Pacific SST field as predictors. This Indian Ocean CCA model makes use of, among other things, the observed tendency for the Indian Ocean's SST anomalies to approximately follow the ENSO-related SST anomalies of the tropical Pacific, with a lag of about one season. Prediction of the Indian Ocean, and especially the tropical Atlantic, is more difficult than prediction of the tropical Pacific, and improved dynamical models for these basins are needed. Examination of the dynamical model forecasts of global SST run at the ECMWF indicates that they may have sufficient skill to replace the statistical predictions now used for the Atlantic and Indian oceans.
For the extratropical latitudes of all of the oceans, the SST anomalies of the most recent month are slowly damped toward the target period's observed climatological normals, such that the anomalies become about three-quarters of their observed value by the middle month of the initial target period and about one-fifth of their observed value by the middle month of the second target period. This crude forecasting procedure for extratropical SST is based on the fact that extratropical SST often cannot be predicted with adequate skill, and that SST outside the tropics has a less significant influence on continental climate than does tropical SST.
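The damped-persistence procedure can be sketched as follows. The damping factors (three-quarters by the middle month of the first target season, one-fifth by the middle month of the second) are from the text; the month offsets of those middle months and the linear interpolation between anchor points are illustrative assumptions, not the IRI's actual implementation:

```python
# Damped persistence for extratropical SST anomalies (illustrative sketch).
# Anchor damping factors follow the text; the month offsets (2 and 5 months
# ahead) and linear interpolation between anchors are assumptions.
ANCHORS = [(0, 1.0), (2, 0.75), (5, 0.20)]  # (months ahead, damping factor)

def damping_factor(months_ahead: float) -> float:
    """Linearly interpolate the damping factor between anchor points."""
    if months_ahead >= ANCHORS[-1][0]:
        return ANCHORS[-1][1]
    for (m0, f0), (m1, f1) in zip(ANCHORS, ANCHORS[1:]):
        if m0 <= months_ahead <= m1:
            w = (months_ahead - m0) / (m1 - m0)
            return f0 + w * (f1 - f0)
    return 1.0

observed_anomaly = 0.8  # deg C in the most recent month (hypothetical)
print(observed_anomaly * damping_factor(2))  # 0.6 deg C, first target mid-month
print(observed_anomaly * damping_factor(5))  # 0.16 deg C, second target mid-month
```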
Once the data for the persisted and the forecast versions of the global SST are in hand, the second step of the two-tiered forecasting task can begin. This consists of using the SST as a prescribed condition for the running of several atmospheric general circulation models (GCMs) to predict the global climate. The GCMs predict a large set of features of the atmosphere over the globe, such as geopotential height, surface pressure, winds, moisture, and so forth. The predicted variables of most interest to us are the surface air temperature and precipitation. The SST predictions are called boundary conditions because they represent the lower boundary of the atmosphere in the GCM runs. Three models are currently (as of early 2001) run for each forecast: The MRF9 from the NCEP, the ECHAM3 from Max Planck Institute, and the CCM3 from the National Center for Atmospheric Research (NCAR). These models are run at T42 spatial resolution, which is approximately 2.8 degrees of longitude and latitude, and using about 18 vertical levels. During the years 2001 and 2002, more refined versions of these models (e.g. higher spatial resolution [such as T63], or improved physical representations), and perhaps additional models to the current three, may be used. (Additional models may be, for example, those of COLA in Calverton, Maryland, or of NASA/Goddard in Greenbelt, Maryland.) Ensembles of 10 runs are produced from each of the models, where the different ensemble members are exposed to the same predicted (or persisted) SST boundary conditions but are initialized with differing atmospheric initial conditions. The fact that the initial atmospheric conditions are not the actual observed ones does not degrade the seasonal forecast, since those conditions lose their predictable impact after the first 1 to 2 weeks as discussed above in the context of the prediction of individual weather events. 
Thus, while the forecast results using different initial conditions may differ from one another quite substantially, their accuracy would not benefit from using the actually observed conditions. The use of ensembles provides some idea of the probability distribution of outcomes, as well as the mean outcome, which may reasonably be regarded as a best guess for the forecast. The use of several GCMs extends the area of the globe having some reliable forecast indication, given that each model's strengths and weaknesses differ and the weaknesses of one may be counterbalanced by the strengths of another. In total, there are thus 30 runs from the three models, and double this number for the first target period if we include the persisted SST runs as well, which may be compared with the predicted SST runs to estimate the sensitivity of the climate forecasts to the SST differences.
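The way an ensemble implies a probability distribution can be shown with a small sketch. The rainfall amounts and tercile boundaries below are hypothetical; the point is simply that the fraction of members falling in each tercile serves as a raw forecast probability:

```python
# Turning a 10-member ensemble into raw tercile probabilities.
# All numbers below are hypothetical illustrations.
lower_bound, upper_bound = 210.0, 290.0   # mm, climatological tercile boundaries
members = [180, 205, 230, 240, 250, 255, 270, 285, 300, 320]  # ensemble (mm)

n = len(members)
below = sum(m < lower_bound for m in members) / n
near = sum(lower_bound <= m <= upper_bound for m in members) / n
above = sum(m > upper_bound for m in members) / n
print(below, near, above)  # 0.2 0.6 0.2
```

These raw fractions are exactly the "uncorrected distribution of ensemble members across the terciles" that the calibration methods described below adjust.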
One way that the reliability of each model's forecasts in a given location and season is assessed is through hindcast experiments, in which the observed SST of past years is used as a boundary condition and the models are run to simulate the seasonal atmospheric climate. The real observations are then examined to determine the model's accuracy. This kind of model skill evaluation, when extended over a sufficiently large number of years, indicates how realistically the model responds to SST when the SST is perfectly known. In real forecasting, of course, the SST itself is predicted, and the SST prediction will be imperfect. Thus, the skill estimates obtained from this sort of model hindcasting using observed SST can be considered an upper limit of forecast skill. More realistic estimates of model skill are produced in a so-called retrospective forecasting exercise, in which observed SST information is not permitted once the model run has begun. Rather, SST hindcasts are used as the boundary conditions for the GCM. Skill evaluations usually confirm the existence of regions that are observed to have predictive skill due to associations with recurring SST anomaly patterns of certain types, such as those of El Nino and La Nina. A weakness of relying on skill computed over a large number of years is that a great variety of climate situations has taken place, and the instances of highest skill may be concentrated more in certain climate situations than in others. It is possible to analyze the accuracy in terms of such climate situations by looking at those cases alone. However, when there are only a few (e.g. 6 or fewer) cases of a given type of situation, there is a danger that the result could have been accidental, or partly so. Even so, the results may be suggestive and thus informative.
Whether stratified by climate situation or not, estimates of skill using hindcasting help provide evidence of the regions, and times of the year, in which a given GCM’s forecasts can be relied upon, and to what extent.
The output of dynamical models may have systematic errors due to the model physics or the parameterizations that are used to represent certain physical processes indirectly. For example, a model may tend to forecast too much precipitation in eastern Africa in November and December (the "short rains"), but only when the rainfall is predicted to be above normal, such as during El Ninos. This kind of error is known as a conditional bias; it could be corrected by applying a downward adjustment of the rainfall only in the upper portion of the rainfall forecast distribution, with the adjustment larger as the predicted rainfall is larger. Other examples of systematic errors are constant biases (e.g. always forecasting 50 mm more than the actual rainfall), insufficient variance (wettest and driest periods predicted too weakly), or combinations of these. The presence of most types of systematic error would degrade skill results as reflected in most skill measures, although not necessarily all of them. (For example, a constant bias or a perfectly linear conditional bias would not degrade the temporal correlation skill score.) At the IRI, systematic errors are presently being corrected in two ways. In one method, a contingency table of the ensemble mean forecast versus observed temperature or precipitation is used, expressed in terms of terciles (the lower, middle and upper thirds of the climatological distribution, to be discussed in more detail below). There are 9 cells in the table, one for each combination of the forecast tercile and the observed tercile associated with that forecast. High skill would be revealed by a large percentage of entries in the three cells for corresponding (matched) terciles in the forecasts and observations, while low skill would show up as a near-uniform distribution across the observation terciles for each forecast tercile.
However, systematic errors would also be revealed by the table, as, for example, when forecasts for above normal temperature result in near-normal observed temperature more frequently than above normal temperature. The statistical corrections use the table to translate the forecast tercile into its historically observed implication for the observed tercile. The forecast is thereby "rebuilt". Another method to evaluate the tercile forecast probabilities is in terms of the percentages of the ensemble members falling into each tercile—that is, the frequency distribution of the individual ensemble forecasts across the tercile categories. This distribution (the implied forecast probability distribution) is then compared to the frequency of occurrence of the subsequent observations in the historical records. An attempt is then made to adjust the model’s implied forecast probabilities to get better reliability, based on the forecast versus observed distributions. Here, "reliability" refers to the correspondence between the probability forecast and the frequency distribution of the subsequent observations (of tercile occurrences) following that probability forecast, based on a large enough number of cases to make such an evaluation. For example, suppose all the cases in which the forecast had just one out of 10 ensemble members in the above normal tercile were collected and examined, and suppose the observations showed that the above normal tercile occurred 25% of the time. Then the implied above-normal tercile forecast probability of 10% from forecasts of this type would be calibrated upward accordingly. This method of correcting systematic errors differs from the first method discussed above in that characteristics of the forecast ensemble distribution are used, as opposed to just the ensemble mean. This is a more refined approach, but it has a weakness of being based on too few cases having each of the many possible probability types. 
The first method is less refined but is more assured of getting sufficient sample sizes for each of the three forecast terciles.
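The first (contingency-table) method can be sketched concretely. The hindcast counts below are hypothetical; the procedure is simply to replace a model's forecast tercile by the conditional frequencies of the observed terciles that historically followed that forecast:

```python
# Sketch of the contingency-table correction: a 3x3 table of hindcast
# ensemble-mean forecast tercile vs. observed tercile.  Counts are
# hypothetical illustrations, not real hindcast statistics.
TERCILES = ("below", "near", "above")

# counts[f][o]: number of hindcast years with forecast tercile f, observed o
counts = {
    "below": {"below": 6, "near": 3, "above": 1},
    "near":  {"below": 3, "near": 4, "above": 3},
    "above": {"below": 1, "near": 4, "above": 5},
}

def corrected_probabilities(forecast_tercile: str) -> dict:
    """Rebuild a categorical forecast as the observed frequencies that
    historically followed that forecast category."""
    row = counts[forecast_tercile]
    total = sum(row.values())
    return {o: row[o] / total for o in TERCILES}

# An "above" forecast is rebuilt as the probabilities 10%-40%-50%:
print(corrected_probabilities("above"))
```

With three forecast categories and 30 hindcast years, each row of the table rests on roughly 10 cases, which is why this method is more assured of adequate sample sizes than the ensemble-distribution method.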
In addition to dynamical models and their statistical corrections, purely statistical forecast models are also used to some extent, particularly when there is a well-defined climate event such as an El Nino or La Nina. Statistical models analyze historical data and identify relationships between precursors and consequences. They are distinct from dynamical models in that they make no direct use of physical equations. In a statistical approach, certain variables are typically designated as predictors, while others are predictands, the quantities to be predicted. A common predictor choice is the SST anomaly field in recent months in regions considered pertinent to the predictand, and the predictand in our case is the temperature or precipitation in the future season and region of interest. Typical methods are simple or multiple linear regression, or their multivariate (pattern) counterparts: canonical correlation analysis (CCA) and singular value decomposition (SVD). The amplitudes of empirical orthogonal functions (EOFs) of the predictor fields are sometimes used as the predictors in order to concentrate the predictor data into a small number of variables believed to be meaningful. A simpler statistical method, and one that has been used at the IRI, is that of composites. A composite is a quantitative summary of what was observed in the past under climate circumstances deemed similar to the one in progress for the current forecast, such as an El Nino. An objective criterion would be used to identify past years having an analogous climate state, to within some tolerance. The subsequent climate result would then be summarized through its mean and its variance, where the variance could be couched in terms of a frequency distribution with respect to the terciles of the climatological distribution.
Such a frequency distribution may then be interpreted as representing the forecast probabilities for the terciles, perhaps subject to some conservative adjustment for sampling variability. A closely related method would use a set of best analogs to the current climate state, and use them in the same way as in the composite method. The only difference between composites and analogs is that composites are usually used with respect to a well-defined climate phenomenon such as an ENSO phase, while analogs can be identified for any climate state whatsoever. A danger in the use of analogs is that there may be only a very few (e.g. 5 or fewer) cases that can be considered reasonably good analogs, and the uncertainty related to the small sample could cast doubt on the reliability of the resulting forecast.
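The composite procedure can be sketched in a few lines. The year list, rainfall totals, and tercile boundaries below are all hypothetical; the structure, selecting analogous past years and tabulating the tercile frequencies of what followed, is what matters:

```python
# Composite sketch: tercile frequencies of seasonal rainfall in past
# El Nino years.  Years, rainfall totals (mm), and tercile boundaries
# are hypothetical illustrations.
rainfall = {1982: 150, 1986: 190, 1987: 240, 1991: 205, 1994: 230, 1997: 160}
el_nino_years = [1982, 1986, 1987, 1991, 1994, 1997]
lower_bound, upper_bound = 200.0, 260.0  # climatological tercile boundaries

cases = [rainfall[y] for y in el_nino_years]
n = len(cases)
freq = {
    "below": sum(v < lower_bound for v in cases) / n,
    "near":  sum(lower_bound <= v <= upper_bound for v in cases) / n,
    "above": sum(v > upper_bound for v in cases) / n,
}
# With only 6 cases, these raw frequencies would normally be damped
# toward 33% before being issued as forecast probabilities.
print(freq)  # {'below': 0.5, 'near': 0.5, 'above': 0.0}
```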
During the 1997-2001 period, the IRI has used three dynamical model forecasts (ECHAM3, CCM3, and NCEP-MRF9) for the global temperature and precipitation for each of the two target seasons. Two of the models (ECHAM3 and CCM3) are run with persisted (from the most recent month) SST as well as with forecast SST. Each of these forecasts is shown in terms of the ensemble mean, and the uncorrected distribution of the 10 ensemble members across the terciles. Also at hand are the spatial distributions of the hindcast skill of each model for the case of perfectly known SST, expressed as a temporal correlation coefficient, and results of the two types of statistical correction discussed above. In addition, purely statistical forecast guidance is sometimes present for certain regions, particularly when a warm or cold ENSO episode is in progress. An analog forecast keyed to the Nino 3.4 SST is also run for some regions, using an average of the 10 most similar past years. All of the tools mentioned so far are derived at the IRI. These are further supplemented by any forecasts that have already been produced for the same or similar target periods at climate outlook fora for various regions. It is IRI policy to respect the forecasts that are in effect at the time IRI issues its forecast. In fact, even without such locally produced forecasts, the IRI advises that its own forecasts be further detailed and/or interpreted by the local weather and climate authorities. There are instances in which the IRI forecasts do not agree with the forecasts produced locally at a climate outlook forum. This is usually the case when the dynamical models do not support the features indicated in the forum forecast, which may have been derived more on the basis of statistical indicators or other nationally or otherwise locally accepted criteria. In those cases the IRI forecast is usually brought into closer alignment with the forum forecast by compromising.
In other words, the forum forecast is used as a fairly heavily weighted input to the final IRI forecast. Nonetheless, some disagreement may remain, and as always, users are advised to acknowledge the difference but follow the forecast officially issued by their local authorities. Because the IRI’s final forecast is the result of a combination of many inputs, and the combining process is partly objective and partly subjective (involving human deliberation and intuition), a final IRI forecast is called a net assessment.
The part of the forecast process that has been subjective—the combining of the forecasts of the different models on the basis of their historical track records—is currently being objectified and automated. The time saved by the automation is expected to enable net assessments to begin being issued once per month rather than once per 3 months sometime during year 2001. Even before that increase in frequency of issuance occurs, the forecasts of each dynamical model are made available, individually, on the web each month at http://iri.columbia.edu/climate/forecast/net_asmt/
The net assessment is issued in the form of maps that show regions having homogeneous forecast probabilities for the below, near and above normal terciles. The probabilities always add to 100%. While the probabilities within one region are shown as being uniform, this is an approximation to what is actually a more locally varying field of probability. However, climate anomalies usually occur on a large scale, and the regions are drawn to represent the average over the entire region. In cases where there is an inner core of much stronger probabilities, this would be shown. Similarly, the borders of a forecast region are intended to be approximate. Since the borders cannot be judged exactly, the probabilities near the border of a region surrounded by a "C" region (indicating climatological probabilities) may be considered to be about half as far from 33% as those in the region's interior. Where climatological probabilities are shown, there are no indications for the forecast and the probabilities for all terciles are indicated to be 33.3%. Where a non-climatology forecast is given, the probabilities favor one or more of the terciles over the other(s), and the probabilities for these are more than 33.3%. The shifted probabilities are given in multiples of 5, e.g. 25%-35%-40% slightly favors the above normal tercile. Stronger imbalances in the probabilities may also be given, although the most likely category is rarely given a probability of more than 70%. The probabilities indicate the direction of the forecast as well as the amount of confidence in the forecast. A forecast with complete certainty would have one of the terciles assigned 100% and the other two terciles 0%. The state-of-the-art in climate prediction never allows for such high levels of certainty.
Note that in daily weather forecasting, a 1-day forecast could sometimes be assigned with such certainty (at least when rounded to multiples of 5), such as a 100% probability of below-normal temperature in an arctic air mass that is expected to occupy a region for several days. The lack of certainty in climate forecasts tends to result in more uniform tercile forecast probabilities. Such uncertainty stems from (1) the lack of a one-to-one relationship between SST anomalies and the climate, due to the unpredictable day-to-day weather (storms, fronts, etc.) related to internal atmospheric processes, and (2) the imperfect forecasts of the SST. Note that in some regions and seasons, there is little or no recognizable impact of the SST upon the climate. The user of the IRI's forecasts should use the probabilities shown on the maps, and not simply regard the most likely tercile as the main content of the forecast. The user should be particularly aware of the forecast probabilities of the non-favored terciles, as these probabilities are never small enough to disregard.
In addition to maps showing the probabilities of the terciles, forecast maps are shown for the probability of extremely above or below normal precipitation for the first target period. Here "extreme" means in the lower or upper 15 percent of the climatological distribution. Three levels of increased risk are defined: slightly enhanced risk, enhanced risk, and greatly enhanced risk. For slightly enhanced risk, there is a 25-40% probability that precipitation will be within the indicated extreme, i.e. wet or dry. This represents an approximate doubling of the climatological risk of 15%. For enhanced risk, there is a 40-50% probability that precipitation will be within the indicated extreme. This represents an approximate tripling of the climatological risk. For greatly enhanced risk, the probability that precipitation will be within the indicated extreme exceeds 50%, i.e. the indicated extreme is the most likely outcome.
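The three risk levels can be expressed as a simple classification rule. The thresholds (25-40%, 40-50%, above 50%, against a 15% climatological base rate) are those defined in the text; the handling of the exact 40% and 50% boundary values is an assumption:

```python
# Map a forecast probability of an extreme outcome (outer 15% of the
# climatological distribution) to the risk labels defined in the text.
# Treatment of the exact boundary values is an assumption.
def extreme_risk_label(p: float) -> str:
    """Classify the probability of an extreme (climatological risk: 15%)."""
    if p > 0.50:
        return "greatly enhanced risk"   # the extreme is the most likely outcome
    if p >= 0.40:
        return "enhanced risk"           # roughly triple the 15% base risk
    if p >= 0.25:
        return "slightly enhanced risk"  # roughly double the 15% base risk
    return "no enhanced risk"

print(extreme_risk_label(0.30))  # slightly enhanced risk
print(extreme_risk_label(0.45))  # enhanced risk
print(extreme_risk_label(0.55))  # greatly enhanced risk
```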
The IRI’s global forecasts are verified with respect to the observations. The observed climate anomalies are posted alongside the forecasts on the forecast pages of IRI’s web site: http://iri.columbia.edu/climate/forecast/net_asmt/. It should be noted that these observed maps use data that become available in "real time" and have not yet been checked carefully. While this preliminary data set is usually fairly accurate, revisions are likely during the following few months and a small set of these could be significant. Additionally, one component of the precipitation verification data is satellite estimates, which usually differ from local ground truth gage-measured rainfalls.
The skill levels of IRI's seasonal predictions are difficult to establish over as brief a period as they have been issued. Characterizing the variation of skill by location, season, and lead time would require more than a handful of years of forecasts. Nonetheless, skill scores may have some meaning when averaged over all seasons and/or over large continents. Several skill scores have been computed for the Net Assessments issued through 1999. An easily understood skill score is the Heidke score. The Heidke score is a measure of categorical hits versus misses, where a hit is defined as an observation in the same tercile as that which was forecast with the highest probability. Note that this is a misleading interpretation of a probability forecast, as it picks a single category to represent the forecast even though the forecast probability of that category was far less than 100% and was often less than 50% (but more than 33%). There is a 33% chance of a hit in the absence of any skill. When this number of hits occurs, the Heidke skill score is defined to be zero. When all forecasts are hits, the Heidke score is defined as 100. Intermediate skill levels are defined as linear interpolations between the chance expectation and all hits. Negative scores are possible, and occur when the number of hits is lower than would be expected by chance. A more refined skill score, called the rank probability skill score, takes the forecast probabilities into account, and gives bigger rewards for strong probability forecasts that verify than for weak ones that verify. It also penalizes more strongly incorrect forecasts that had been issued with high probability. Because of the weak to moderate probability shifts of most climate forecasts, rank probability scores are not expected to be high, even in the case where most of the verifications agree with the forecast category that had highest probability. This is an accurate reflection of the state-of-the-art in climate forecasting.
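The Heidke score as defined here (zero at the chance hit rate of one third, 100 for all hits, linear in between) reduces to a one-line formula:

```python
# Heidke skill score as defined in the text: 0 at the chance rate of hits
# (one third of forecasts), 100 when all forecasts are hits, linear in
# between, and negative when hits fall below the chance expectation.
def heidke_score(hits: int, total: int) -> float:
    expected = total / 3.0
    return 100.0 * (hits - expected) / (total - expected)

print(heidke_score(4, 12))            # 0.0 (exactly the chance rate)
print(heidke_score(12, 12))           # 100.0
print(round(heidke_score(6, 12), 1))  # 25.0
print(heidke_score(2, 12) < 0)        # True (worse than chance)
```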
Eventually, the ROC (relative operating characteristics) score will also be computed. The ROC score is also intended for probability forecasts rather than deterministic forecasts. In order to be meaningful, the ROC score needs a considerably longer period of forecasts than the IRI has presently. A large sample is necessary because the score describes the rates of hits versus false alarms for a set of different forecast probability thresholds defined to represent an affirmative forecast for a category. For example, any forecast of more than 40% might at first be taken as a "yes" forecast for the given category, and those forecasts would be compared with the frequency of the category's occurrence in the observations. Then the threshold would be raised to 50%, then 60%, and so forth. These thresholds may be thought of in terms of the implications of the forecast to various users, some of whom are alarmed very easily (e.g. at the 40% probability level), and some of whom only worry with a much higher probability (e.g. 70%) of the categorical outcome. The set of forecasts with any given probability threshold is much smaller than the set of all forecast cases, so several years of forecasts is inadequate for this detailed type of skill diagnostic. At least 10 years might begin to supply the needed numbers of cases. The rewards of the ROC score are great, however, since this score can provide a detailed set of calibrations for the forecast system, in which systematic forecast errors could be identified and corrected.
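The hit rate and false-alarm rate at a single probability threshold, the building block of the ROC diagnostic, can be sketched as follows. The forecast probabilities and observed outcomes are hypothetical:

```python
# ROC sketch: hit rate vs. false-alarm rate when a probability threshold
# defines a "yes" forecast for a category.  The forecast probabilities and
# observed outcomes (1 = category occurred) are hypothetical.
forecast_probs = [0.45, 0.20, 0.60, 0.35, 0.70, 0.30, 0.55, 0.25]
occurred = [1, 0, 1, 0, 1, 1, 0, 0]

def roc_point(threshold: float):
    """Return (hit rate, false-alarm rate) at one probability threshold."""
    yes = [p >= threshold for p in forecast_probs]
    events = sum(occurred)
    non_events = len(occurred) - events
    hits = sum(y and o for y, o in zip(yes, occurred))
    false_alarms = sum(y and not o for y, o in zip(yes, occurred))
    return hits / events, false_alarms / non_events

# Sweeping the threshold traces out the ROC curve:
for thr in (0.4, 0.5, 0.6):
    print(thr, roc_point(thr))
```

With real forecasts, each threshold isolates only a small subset of cases, which is why the text notes that many years of forecasts are needed before these rates become stable.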
For the Net Assessments for the first season from Oct-Nov-Dec 1997 through Oct-Nov-Dec 1999 (a total of 9 forecasts), the global average of the Heidke skill score for precipitation was 12. During that period the lowest global average Heidke score was 4, for the Jul-Aug-Sep 1998 Net Assessment, and the highest was 16, for the Apr-May-Jun 1999 and the Oct-Nov-Dec 1999 Net Assessments. Thus, the scores have all been positive, but modest. In the tropics (25 degrees latitude or less), precipitation scores have been higher, with an average score of 17, lowest score –1, and highest score 34. For the second season, the global and tropical mean scores for this period were slightly higher, at 13 and 19, respectively. This is considered accidental, as scores are expected to decrease as forecast lead time increases. The average percentages of area given non-climatology precipitation forecasts for the first and second lead times for the globe during this period were 47% and 31%, respectively. One possible reason for the slightly higher skill score for the longer lead might be that the forecasters assigned non-climatology forecasts to too great an area for the first lead time, but not the second lead time. An equally likely explanation is that with only 9 forecasts, the mean differences in scores by lead time are insignificant, and will not continue this way as the sample size of forecasts increases. Although the mean global skills were modest, skills for some regions were much higher. For example, for southeastern South America (e.g. Uruguay and vicinity), mean skill for the first season was 50, while in eastern Africa it was 41. The score in northeastern Brazil was 29, and in all of Africa and in Indonesia it was 26. Because all of the above areas are known to be affected by ENSO, and ENSO was active during the period, it is not surprising that the precipitation forecasts there had skill.
In general, climate forecast skills are higher for temperature (averaging approximately 20 globally) than for precipitation, higher in the tropics than in the extratropics, and, within the extratropics, higher in winter than in summer. Globally averaged skills are higher during warm and cold ENSO episodes than in neutral ENSO situations. In many parts of the world, climate forecast skills are positive but not high. This means that small probability tilts away from 33%-33%-33% are typical, such as 20%-35%-45%, where the above normal tercile is favored but still less than 50% likely. In some situations a much more confident forecast can be made, such as the 10%-30%-60% above-normal precipitation forecast that was made for western equatorial South America (coast of Ecuador and Peru) for January-March 1998 during the strong El Nino.
4. The probabilistic IRI forecast product - what are terciles and why do they matter?
Terciles are three ranges, or intervals, of values of a variable (e.g. precipitation or temperature) that are defined to describe the lower, middle, and upper thirds of the climatologically expected distribution of values. For example, let us consider the total rainfall in Pretoria, South Africa in the 3-month period of January-March. Suppose we use the recent 30-year period of 1971-2000 to gather observed rainfall totals for this season, and examine them. To help determine the three tercile ranges, we order the rainfall data from the highest to the lowest values—i.e. we rank the data. The highest value is ranked 1, the second highest 2, and so forth. The highest third of the 30 values, i.e. the highest 10 values, span the range of the upper tercile. The middle 10 values (the 11th to the 20th-ranked values) span the range of the middle tercile, and the lowest 10 (ranks of 21 to 30) values span the range of the lower tercile. To fully define the terciles, all that is really necessary are two boundaries: the borderline between the upper and the middle tercile, and the borderline between the middle and the lower tercile. The upper limit of the upper tercile and the lower limit of the lower tercile are not necessary, because amounts higher than the highest observation or lower than the lowest observation would still be placed into the upper or the lower tercile, respectively. In our present example, with 10 cases forming each tercile, the boundary between the lower and the middle tercile would be defined as the average of the highest observation in the lower tercile and the lowest observation in the middle tercile – that is, the average of the 20th and the 21st ranked observations. This eliminates the gap between the two observations (assuming that they are not tied) by halving the difference. In similar fashion, the boundary between the middle and the upper tercile would be the average of the observations ranked 10th and 11th. 
With the two borderlines defined, any forecast or future observation can be classified as being in one of the three terciles. Note that if a different 30-year period had been used for the representative period to describe the climate at Pretoria, then the ranges would typically turn out to be slightly different. This brings out the fact that the tercile boundaries are only statistical estimates of the true underlying tercile boundaries. When the number of cases is not evenly divisible by 3, then the determination of the tercile boundaries using the above-described ranking technique would be modified slightly, but would still be precisely determinable.
Some researchers choose to determine tercile boundaries using techniques other than ranking. These alternate methods often involve fitting the data to a statistical distribution such as a Gaussian (often used for temperature) or a gamma (often used for precipitation) distribution. The advantage of such techniques is that they conveniently circumvent the problem of random irregularities in the distribution that can affect the exact boundary outcomes resulting from ranking. For example, sometimes there are gaps or clusters of values that are related only to the fact that the sample size of 30 is not large enough to remove random irregularities. It should be noted that some irregularities may not be random, but rather physically based. If increasing the sample size does not largely eliminate them, this would provide evidence that they are not random. However, in many instances we do not have the luxury of a much larger sample and must proceed with 30 or sometimes even fewer cases. The disadvantage of statistical fitting techniques is that they usually carry underlying assumptions, e.g. for the Gaussian and gamma the underlying population distributions are assumed to fit these theoretical structures. However, when the fit is satisfactory, the distributions can be very helpful.
Forming terciles for temperature using a Gaussian distribution fit.
The interannual variability of temperature is well fit by a Gaussian distribution for locations not near coastlines or important terrain features like a large mountain range, either of which can produce physically-based irregularities in the distribution. Here the procedure for using a Gaussian model to fit temperature will be described, following a warning about using this for non-Gaussian distributions such as precipitation. A Gaussian distribution is not a satisfactory fit for precipitation when there is a positive skew in the precipitation data—i.e. when the highest values are farther above the mean than the lowest values are below the mean, and where the middle-ranked value (the median) is substantially lower than the mean. Three-month total precipitation is Gaussian only in very wet climates, or when very long totaling periods (e.g. >2 year totals) are used. Otherwise, an alternative distribution such as the gamma distribution can be fit, or a mathematical transformation can be applied to the original precipitation data to make it more nearly Gaussian before fitting. For a variable thought to be approximately Gaussian, such as temperature, the procedure of defining the terciles using a Gaussian fit is straightforward. Suppose that one uses a 30-year base period and hence has 30 temperature values for a given season and location. First, the mean is calculated. Calculation of the standard deviation follows. The standard deviation measures the amount of variability of the individual cases about the mean. It is calculated by subtracting the mean from each case (giving the difference, or departure, from the mean), squaring the departure, adding all of the squared departures, dividing the sum by the number of them (30 in this example), and finally taking the square root. 
The critical tercile boundaries are then, for the lower/middle tercile boundary, equal to the mean minus 0.43075 times the standard deviation, and for the middle/upper tercile boundary, the mean plus 0.43075 times the standard deviation. For example, if the mean is 30 degrees and the year-to-year standard deviation is 4 degrees, then the two tercile boundaries are 28.28 and 31.72. The factor 0.43075 is used because this fraction of one standard deviation below and above the mean marks the 33.33 and the 66.67 percentiles of the Gaussian distribution, respectively, as seen in tables showing the areas under the Gaussian probability density curve. In computing the standard deviation, the sampling-based irregularities encountered when using the ranking method are in effect smoothed, because the standard deviation accounts for the values of all observations collectively, and does not take the pattern formed by ranking the data as "literally" as is done in the ranking method. It should be noted once again, however, that in cases where there is a physical reason to expect an irregular temperature distribution (e.g. on the leeward side of a large mountain range, where the presence and absence of chinook conditions could cause an asymmetric and/or two-peaked frequency distribution), the ranking approach would likely produce the most realistic tercile boundaries despite its sampling problems.
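The mean, the standard deviation, and the ±0.43075 rule can be written out directly (a sketch; note that it uses the population form of the standard deviation, dividing by n as the text describes):

```python
import math

def gaussian_tercile_boundaries(values):
    """Tercile boundaries from a Gaussian fit to the data."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation: sum of squared departures from
    # the mean, divided by n, then square-rooted.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = 0.43075   # z-score of the 66.67th percentile of the Gaussian
    return mean - z * std, mean + z * std
```

For any sample whose mean is 30 and whose standard deviation is 4, this reproduces the boundaries 28.28 and 31.72 quoted in the example above.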
Terciles are used to represent three broad sectors of the probability distribution that are equally likely, climatologically. Recall that for each location and season, the terciles correspond to actual temperature or precipitation ranges, based on the set of historical observations. In using tercile forecasts, users need to know the ranges (i.e. the two main cutoff values that define the terciles) to which the terciles refer for the location/season of concern. These are given on the web, usually on a map other than the one showing the probability forecasts since all this information would be too much to post on one map. Without any forecast clues, the probability that any of the three outcomes will occur is one-third, or 33.3%, which means that if the situation could be "rerun" many times, each outcome would occur one out of three times. However, with forecast clues, such as the presence of an El Nino, a La Nina, or other climate event, the probabilities of the terciles might no longer be equal, so that the probability of one (or two) of them would be greater than 33.3% and the remaining one(s) less than 33.3%. This deviation from the climatological 33.3-33.3-33.3% represents a forecast, because it suggests increases and decreases in the likelihoods of occurrence of terciles relative to the likelihoods reflected in the long-term observations. Forecasts are expressed in terms of the likelihood of terciles because of the typically large amount of uncertainty in the forecasts. This uncertainty makes the forecasting of exact temperatures, or amounts of precipitation, misleading, since large errors are often likely. (Such errors would not be as large, however, as the errors that would result from random guessing, or from always forecasting the climatological average.) The use of tercile probabilities provides both the direction of the forecast relative to climatology, as well as the uncertainty of the forecast. 
For example, suppose a forecast calls for precipitation probabilities of 20% for the dry tercile, 35% for the middle tercile, and 45% for the wet tercile. Since the wet tercile is above 33.3% and the dry tercile is below 33.3%, this forecast suggests that above normal precipitation is more likely than it usually is, and below normal is less likely than usual. Note, however, that there is much uncertainty implied in the forecast. Even though it is in the direction of above-normal precipitation, the probability for above normal precipitation is still less than 50%. And the probability of below normal precipitation is still 20%, implying that one time out of 5 cases of this climate situation, below normal precipitation would be expected. It is clear that even though this forecast shows a tilt of the odds toward wetness relative to the climatological probabilities, there is much uncertainty in the outlook.
The use of terciles is only one way out of many to express forecasts that have appreciable uncertainty. The use of three equally likely categories could be replaced by the use of 2, 4, 5, 10, or any number of equally likely categories. The use of unequally likely categories is also possible. However, in any of these cases the same kind of information would be conveyed as when using three categories. For example, if 9 equally likely categories (naniles) were used, the climatological probabilities would be 11.1% for each of them. In the case of the forecast example used above, the 20%-35%-45% tercile probabilities might be expressed in a nanile forecast as 5%-7%-8%-10%-12%-13%-15%-16%-14%. The nanile forecast gives basically the same forecast information as the tercile forecast, except that each tercile now has three subsectors of its own, and the probabilities are allocated accordingly. Because the state-of-the-art in climate forecasting is not yet sufficiently refined to be able to confidently define irregularities within each tercile, the forecast distribution using naniles would generally be expected to be just a smoothly interpolated version of the tercile forecast. That is, no additionally resolved features of the probability distribution ("bumps") would be expected to appear in the nanile forecast. The overall message of uncertainty would remain the same in naniles as in terciles, showing a general preference toward higher than normal precipitation but yet a non-negligible probability of below normal precipitation. However, there is something to be said for a nanile, or even higher-order categorical forecast. Because different users are concerned with different portions of the distribution, a more highly resolved forecast format would be preferable even merely as a smooth interpolation of what is expressed in terciles. 
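The consistency between the two formats can be checked by summing the nanile probabilities within each tercile, using the numbers from the example above:

```python
def naniles_to_terciles(nanile_probs):
    """Collapse 9 nanile probabilities (in percent) into 3 tercile probabilities."""
    if len(nanile_probs) != 9:
        raise ValueError("expected exactly 9 nanile probabilities")
    # Each tercile is the sum of three consecutive naniles.
    return [sum(nanile_probs[3 * i:3 * i + 3]) for i in range(3)]

naniles = [5, 7, 8, 10, 12, 13, 15, 16, 14]
terciles = naniles_to_terciles(naniles)   # → [20, 35, 45]
```

The groups of three sum back to the 20%-35%-45% tercile forecast, confirming that the nanile version only redistributes probability within each tercile.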
Taken still farther, the probability forecast could be expressed in a format such that the probability of any interval could be extracted, not just the pre-set tercile intervals. For example, 400 tiny-interval categories could be defined that would allow probabilities to be accumulated between any two cutoffs desired by various users of the forecast.
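The idea of accumulating probability between arbitrary cutoffs can be illustrated with a finely binned forecast distribution (a sketch only; a Gaussian with illustrative mean and standard deviation stands in for whatever distribution a real forecast system would supply):

```python
from statistics import NormalDist

def prob_between(dist, lo, hi, n_bins=400):
    """Approximate P(lo < X < hi) by summing many tiny-interval bins.

    The bins span the mean +/- 5 standard deviations; each bin's
    probability is the difference of the CDF at its edges, and only
    the part of each bin inside [lo, hi] is counted.
    """
    left = dist.mean - 5 * dist.stdev
    right = dist.mean + 5 * dist.stdev
    width = (right - left) / n_bins
    total = 0.0
    for i in range(n_bins):
        a = left + i * width
        b = a + width
        a, b = max(a, lo), min(b, hi)   # clip the bin to the cutoffs
        if a < b:
            total += dist.cdf(b) - dist.cdf(a)
    return total

forecast = NormalDist(mu=30, sigma=4)   # illustrative forecast distribution
p = prob_between(forecast, 28, 33)      # probability between two user-chosen cutoffs
```

A user interested in, say, outcomes between 28 and 33 degrees could extract that probability directly, without reference to any pre-set category boundaries.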
As an alternative to using climatologically equally likely categories like terciles, a probability forecast could be expressed as a confidence interval with respect to a lower and an upper limit of temperature or precipitation. For example, consider the temperature climatology discussed above, with a mean of 30 and a standard deviation of 4. If a Gaussian distribution is assumed, then there is climatologically an 80% probability that the temperature will fall within the interval defined by 30 - 1.2816 times the standard deviation of 4, to 30 + 1.2816 times 4. This interval ranges from 24.87 to 35.13. An 80% confidence interval could be given for the forecast as well, to be compared to the 80% climatological confidence interval. The forecast interval might be displaced toward warmer or colder temperatures, and also might span a somewhat narrower interval than the 10.25-wide climatological interval. For a non-Gaussian distribution such as precipitation, confidence intervals are also possible, and would require a way of estimating the climatological distribution other than through a Gaussian fit. As discussed above, a ranking method or a fitting method could be used to estimate the climatological distribution. If a fitting method were chosen, either the data would be transformed to become symmetric (non-skewed) and then Gaussian assumptions accepted, or a model that accommodates skewed distributions (such as the gamma) used. Once the percentile points of the distribution are known, confidence intervals can be constructed.
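The 80% climatological interval can be computed directly from the Gaussian quantiles; the 1.2816 factor is the z-score of the 90th percentile, available from the standard library's inverse CDF:

```python
from statistics import NormalDist

def central_interval(mean, std, coverage=0.80):
    """Central confidence interval for a Gaussian climatology."""
    # For an 80% central interval, 10% lies in each tail, so the
    # half-width uses the z-score of the 90th percentile (about 1.2816).
    tail = (1.0 - coverage) / 2.0
    z = NormalDist().inv_cdf(1.0 - tail)
    return mean - z * std, mean + z * std

lo, hi = central_interval(30, 4)   # the example climatology above
```

With the example mean of 30 and standard deviation of 4, this returns the 24.87-to-35.13 interval quoted in the text; changing `coverage` gives intervals at other confidence levels.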
Note that confidence intervals are expressed in values of the actual temperature or precipitation, rather than in terms of probability values of the predetermined climatological terciles of those variables. This means that maps showing the upper or lower limits of confidence intervals might show great spatial variations, depending on the geography of the region being shown (e.g. especially in and around major terrain features or near coastlines), in similar fashion to maps showing the climatological value. This can cause display problems where spatial gradients are very sharp, and look-up tables may be necessary where maps do not suffice. Maps showing tercile probabilities are comparatively free of these major discontinuities, but have the disadvantage that users ultimately need to know the implied actual values, in physical units, of the variable of interest—temperature in degrees C or F, and precipitation in millimeters or inches. The maps of these tercile cutoff values would have large spatial variations.
One might wonder whether, and how, forecasts that have only modest levels of skill can be used beneficially in decision making. The answer to these valid questions is that they can indeed be used beneficially, provided they are used carefully. If used inappropriately, they may not be beneficial and in fact could be detrimental, at least in the short term. The economic benefits from using the forecasts properly should be expected to accrue over an extended time frame, and to accrue somewhat irregularly, rather than to appear immediately and to occur uniformly from one year to the next. This expectation follows from the fact that climate forecasts are expected to be incorrect some of the time, but to be correct more frequently than incorrect. (By "correct", we mean that the forecast verified at least in terms of the direction of the deviation from climatology, or in terms of some other simple quantifiable criterion such as the size of the error compared with the size expected on average from random forecasts, or from climatological forecasts.) Because of the forecasts’ uncertainty, decisions should be made cautiously, but using criteria that are consistent over time, in accordance with the probability anomalies that are forecast. The cost of taking precautions must be weighed against the savings that the precautions would bring if the unwanted climate event occurred. These judgments can be made using the probabilities supplied in the forecasts. Because some businesses cannot necessarily count on existing for the long time frame needed to be sure to benefit from using climate forecasts, a major consideration is the short-term consequences of a loss during the coming individual season that would be caused by a forecast that does not verify. If such a loss would result in an unacceptable probability of bankruptcy, then the long-term use of the climate forecasts would need to be modified accordingly. 
One strategy for doing this would be to act on the direction of the forecast but replace the given probabilities with more conservative probabilities—i.e. ones that more weakly depart from climatology. A detailed cost-benefit analysis, which would differ greatly from one business to another, would yield decision recommendations, depending on all relevant potential rewards and penalties. Such an analysis is strongly encouraged, as it is a vital part of the task of climate forecast applications.
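One standard way to formalize the weighing of precaution cost against potential savings is the simple cost-loss model: take the precaution whenever the forecast probability of the adverse event exceeds the ratio of the protection cost to the loss it would avoid. This is only a sketch of that idea, with hypothetical numbers, not a substitute for the business-specific cost-benefit analysis the text recommends:

```python
def should_protect(event_prob, cost, loss):
    """Cost-loss decision rule.

    Protecting always costs `cost`; not protecting loses `loss` if the
    adverse event occurs. The expected losses of the two choices are
    equal when event_prob == cost / loss, so protect above that
    threshold.
    """
    return event_prob > cost / loss

# Hypothetical: protection costs 10, the adverse event would cost 40,
# and the forecast gives the event a 45% probability. The break-even
# threshold is 10/40 = 0.25, so protecting is worthwhile here.
decision = should_protect(0.45, cost=10, loss=40)
```

A more conservative user, as described above, might shrink `event_prob` toward the climatological value before applying the rule, acting on the direction of the forecast but with weaker departures from climatology.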