Understanding the Probability Density and Probability of Exceedance
Forecast Graphs, and Using the User-Specified
Probability Look-up Utility and the
Quantile Table

Why Probability Forecasts are Necessary
Climate predictions for the upcoming few seasons are expressed in terms of probabilities, because the uncertainty is too large to forecast a single number or a narrow range. For any given location and season of the year, without any knowledge of the climate situation for the current year, a probability distribution for the expected climate can be formed from the observations over a historical period. These observations describe what is typical, what is less typical but possible, and what is extremely unlikely. The probability distribution derived from the historical observations is known as a climatological probability distribution. For example, in the city of Iloilo, on Panay Island in the Philippines, there may typically be a 50% probability that the rainfall during the period of May through July will total 658 mm or more, and a 20% probability that it will be greater than 868 mm. Any rainfall amount has its typical probability of being exceeded. A climate forecast, by contrast, gives a probability distribution of the same kind, except that it takes into account information for the current year. The distribution may be shifted from the above-mentioned climatological probability distribution--i.e. the probabilities may differ from the "typical" probabilities. For example, a precipitation forecast may indicate that drier than normal conditions are more likely than usual for the coming season, and that wetter than normal conditions are less likely than usual. The probabilities given in the forecast would tell us how much more likely dry conditions of any severity level are than they would be on average, and how much less likely wet conditions of any level are than they are on average. In some cases, there are no useful hints about the upcoming season, and the forecast would simply be the climatological distribution. Although this may sound useless, it can still be helpful because not everyone knows what the climatological distribution covers--i.e. what rainfall amounts are possible (e.g. have ever occurred), versus what amounts are practically impossible. Furthermore, when an actual probability forecast is the same as the climatological probability distribution, that result is reached only after a thorough examination has been completed, and is therefore an informed outcome.

It is not easy to express a probability forecast such as that described above, because any interval of precipitation amount, or range, would have its normal (or climatological) probability, and would also have its probability as given by the forecast for the particular upcoming season of interest. How do we list the different possible precipitation intervals, and how many intervals should be listed? How small or large should the intervals be so that they will be relevant to the most people? Furthermore, we have been discussing just one particular location or station. There are many possible locations. Can a map be produced that would summarize the shift in the probability distribution over an extended area, such as over all of Africa? In this product, we focus on one location at a time. Other products use maps to show the general direction of the forecast over large areas. When one location is chosen, a greater amount of detail can be provided about the forecast probabilities.

This set of products has four parts. Two of them are graphs that are highly related to one another. One is called a probability density graph, and one is called a probability of exceedance graph. Each shows the nature of the probability forecast for a given location and season in a complete and general way. Another part is a quantile table that gives probabilities of different portions of the historical range of possibilities. The final part is a probability look-up utility that provides the probability of any user-supplied range of precipitation amount, and compares that probability to the climatological probability of the same interval based on the historical record. These four product parts will be referred to in the material presented below.

Meaning of the Two Graphs
The probability density and probability of exceedance graphs both have the value of a climate variable on the horizontal axis. In our case, the climate variable is 3-month precipitation total. On the vertical axis, probability is shown. Probability is expressed either as a percentage, ranging from 0 to 100, or as a fraction, ranging from 0 to 1. So both graphs say something about the probability for a value of precipitation, ranging from precipitation values lower than have ever occurred historically (but never lower than zero) to values higher than have ever occurred. How the precipitation amount and probability are related to one another differs between the two graphs. Whether the graph is one of probability density or probability of exceedance can be told by the label on the vertical axis on the left side of the graph. The difference can also be seen quickly from the features of the curves they contain, to be discussed next.

The probability density graph has a curve that begins at low values for very low values of precipitation, rises to a maximum point, and then declines back to lower values and finally to zero for very high values of precipitation. The curve shows the relative likelihood that the precipitation amounts shown on the horizontal axis will occur. Amounts that correspond to the highest part of the curve have the highest relative probabilities, while amounts that correspond to low values of the curve (e.g. in the left or right tails) have lower probabilities of occurring. The curve may not go all the way down to zero on the left side, if some years have no precipitation at all for the given season. This would imply that the probability of getting no precipitation is greater than zero.

The probability of exceedance graph has a curve that usually begins at 100% for very low values of precipitation, and declines through intermediate levels toward zero, reaching zero at very high precipitation values. This curve is a backwards-cumulative form of the probability density curve. (It would be a forward cumulative form if the curve increased, rather than decreased, from left to right.) Its downward slope is steepest where the value of the probability density curve is highest. The probability of exceedance graph gives the probability that the precipitation quantity shown on the horizontal axis will be exceeded. The probability density and the probability of exceedance graphs have a one-to-one relationship with each other. That is, any probability density graph has only one probability of exceedance graph that corresponds to it, and vice versa. (A technical note: The probability of exceedance graph shows what percentage of the area under the probability density graph lies to the right of the value on the horizontal axis.) Which type of graph is preferred depends on one's purpose. Some purposes are met by either graph, and in those cases the choice may come down to personal preference. Both types of graph will be incorporated into most of the discussions to follow.
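
The technical note above can be checked numerically. The following is a minimal sketch only: it assumes a plain Gaussian density with a hypothetical center of 650 mm and spread of 200 mm (values invented for illustration, not taken from any station), whereas the product itself uses a power-transformed Gaussian. It compares the area under the density curve to the right of a threshold with the probability of exceedance computed directly.

```python
import math

MEAN_MM = 650.0   # hypothetical climatological center (assumption)
SD_MM = 200.0     # hypothetical standard deviation (assumption)

def density(x, mean=MEAN_MM, sd=SD_MM):
    """Gaussian probability density at precipitation amount x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def exceedance(x, mean=MEAN_MM, sd=SD_MM):
    """Probability that precipitation exceeds x (1 minus the cumulative probability)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

def area_to_right(x, upper=3000.0, steps=20000):
    """Trapezoid-rule area under the density curve from x to a far upper bound."""
    h = (upper - x) / steps
    total = 0.5 * (density(x) + density(upper))
    for i in range(1, steps):
        total += density(x + i * h)
    return total * h

# The two columns should agree closely at every threshold.
for amount in (400.0, 650.0, 900.0):
    print(amount, round(exceedance(amount), 4), round(area_to_right(amount), 4))
```

Because the exceedance value is the area to the right of the threshold, its slope at any point is the negative of the density there, which is why the exceedance curve falls fastest where the density curve peaks.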

In both graphs, the black curve represents an estimate of the climatological distribution, based on the 30-year period of observations currently in effect. The climatological distribution shows probabilities for precipitation amounts without any knowledge of the current climate situation. These probabilities are based only on the season and location. The red curve, on the other hand, shows the forecast probability when the available current climate forecast information is taken into account. By examining the relationship between the red and black curves, one can compare the forecast for this year with the climatological average for the station and season. For example, a weak tendency toward drier than normal conditions would be shown if the red curve lies slightly to the left (toward lower precipitation amounts) of the black curve. Sometimes the red and black curves may coincide. This would mean that there are no recognizable current indications that the climate will deviate in any way (in either direction, or in terms of the size of the range of possibilities) from its climatological distribution. This would not be a forecast for near normal (or average) conditions, but rather a forecast for equal chances for anything to happen that has happened during the 30-year climatological period, and with the same probabilities as indicated by the observed relative frequencies over that 30-year period. A forecast showing a specific preference for near-normal conditions would have red and black curves that do not coincide. In such a forecast favoring near normal precipitation, in the density graph the red curve would be narrower than the black curve and would have a higher maximum value than the black curve. Additionally, the red curve's maximum would correspond to a precipitation amount that is near that indicated by the black curve's maximum. In the probability of exceedance graph, a forecast with a preference for near-normal conditions would have a red curve with a steeper slope than the black curve's slope near the middle part of the distribution, and the two curves would intersect at or near that middle part.
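
The features of a forecast favoring near-normal conditions can be illustrated with a small sketch. It assumes plain Gaussian curves with hypothetical parameters (a shared center of 650 mm, with spreads of 200 mm for climatology and 150 mm for the forecast); these numbers are invented for illustration and the real product fits a power-transformed Gaussian.

```python
from statistics import NormalDist

# Hypothetical curves (assumptions, not station data).
clim = NormalDist(mu=650.0, sigma=200.0)   # black curve: climatology
fcst = NormalDist(mu=650.0, sigma=150.0)   # red curve: forecast favoring near normal

def exceedance(dist, x):
    """Probability that precipitation exceeds x under the given distribution."""
    return 1.0 - dist.cdf(x)

# Density graph: the red peak is taller than the black peak.
peak_ratio = fcst.pdf(650.0) / clim.pdf(650.0)

# Exceedance graph: the curves cross at the center, and the red one is steeper there.
crossing_gap = exceedance(fcst, 650.0) - exceedance(clim, 650.0)

# Forecast probability of the climatological near-normal tercile.
lower, upper = clim.inv_cdf(1 / 3), clim.inv_cdf(2 / 3)
near_normal_prob = fcst.cdf(upper) - fcst.cdf(lower)

print(round(peak_ratio, 3), round(crossing_gap, 6), round(100 * near_normal_prob, 1))
```

In this sketch the narrower red curve raises the near-normal probability above the climatological one-third and lowers both outer terciles equally, since the center is not shifted.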

Using either graphical form, it is possible to determine the forecast probability that the precipitation amount will be between any lower limit and upper limit. This probability can then be compared with the climatological probability, or the historical average probability for the precipitation being between the same two limits. A utility to automatically compute these is provided here, as will be discussed below. 

The information shown on the graphs is consistent with the information given in the forecast maps of the IRI's seasonal forecasts that have been issued since late 1997. Those forecast maps show the probabilities of the three tercile-based categories with respect to the climatological distribution: below normal, near normal, and above normal. A minor difference is that the probabilities on the forecast maps are rounded to the nearest 5%, while the probabilities on the graphs are not always rounded. The graphs shown here provide additional detail about the forecast probability distributions at individual stations, through provision of the entire probability distribution as opposed to just the probabilities of the three climatologically equally likely tercile categories. Also, the direct linkage to the precipitation amounts (cm) on the horizontal axis is a convenience not provided directly on the standard forecast maps (although other maps are available that show the tercile category boundaries). Providing the entire distribution enables users to find probabilities that precipitation will be within any self-selected limits that are of interest to them, and not just categories that are pre-defined by the forecasters.

What Each Curve Means
Each graph contains four curves. One of the four is actually a set of two curves. This section explains the curves in the graphs. An example of interpreting a particular set of graphs, for Bangkok, Thailand, is provided later.

The first curve, shown in black, shows the "normal", or climatological, probability distribution on both the probability density graph and the probability of exceedance graph. Sometimes the black line appears dotted, and coincides with the dotted red (forecast) curve, indicating that the forecast is simply the climatological distribution. The climatological distribution is derived by computing the average, and also computing a measure of the degree of year-to-year deviations from that average (called the standard deviation). The curve is therefore called a "fitted" curve, because it is defined using a formula that makes it possible to construct a smooth curve representing the 30 years of observed data. The data themselves may not be smooth and regular, but the formula uses only the average and the standard deviation to define the curve, disregarding much of the finer detail about the spacing of the observations. The advantages and disadvantages of disregarding these details will be discussed below. The center of this distribution (such as the average [the mean] or the median) is based on the observations over the 30-year base period--the period of 1972-2001 in our case. The value of the center of the distribution, or normal, is printed numerically in the top left portion of the graph. For precipitation, whose distribution is often not symmetric with respect to the size of the deviations on one side versus the other side of the average value, the value at which the cumulative (probability of exceedance) curve crosses the 50% line (the median) is considered a better representation of the typical precipitation amount than the mean. The mean is often greater than the median, and more individual cases fall below the mean than above it. The mean precipitation is higher than the typical amount because there are typically a few extremely wet cases that are farther above the mean than the driest cases are below it, and these very wet cases bring the mean to a level higher than more than half of the cases. This is a key feature of a positively skewed distribution. Because of these features, the mean is not used to represent the normal. Although the median does a better job representing the typical precipitation amount, an even more effective measure of the center of the distribution, to be called the center, will be described below.
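
The effect of a few very wet cases on the mean can be seen in a short sketch. The ten seasonal totals below are hypothetical values chosen only to show a positively skewed sample; they are not observations from any station.

```python
import statistics

# Hypothetical seasonal totals (mm) for 10 years; invented for illustration.
totals_mm = [310, 340, 365, 380, 400, 420, 450, 470, 700, 955]

mean_mm = statistics.mean(totals_mm)
median_mm = statistics.median(totals_mm)

# Count how many years fall below the mean.
below_mean = sum(1 for t in totals_mm if t < mean_mm)

print("mean:", mean_mm, "median:", median_mm, "years below mean:", below_mean)
```

The two wettest years pull the mean well above the median, so most of the sample lies below the mean, which is the behavior described above for positively skewed precipitation data.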

The second curve, shown in yellow, labeled "observed data", is a probability density curve (or a probability of exceedance curve) derived directly from the observed data without any fitting to make a smoother curve. These curves, which contain a series of right-angle steps, are sometimes called raw data representations, and show the distribution of the 30 precipitation amounts observed over the 30-year period. The raw probability density curve is a frequency histogram that shows what percentage of the years in the base period had precipitation falling within certain pre-defined amount intervals. With only 30 years in the period being described, it is likely to have very marked irregularities (gaps in some places, clusters in other places) as compared with the smoother, fitted probability density curve. When its shape very roughly resembles the shape of the fitted density curve, it may imply that with a much larger sample of years (e.g. over 300 years having the same climate) the irregularities would smooth out and the resemblance to the fitted curve would improve greatly. With only 30 years, irregularities are expected due to chance associated with inadequate sampling. When there are extremely gross differences between the raw and fitted curves, one may suspect that the fitting process may not yield a black curve that is representative of the true probability density for the station. This suspicion can be confirmed with greater certainty by looking at the raw probability of exceedance curve.

In the probability of exceedance curve, sampling irregularities tend to cancel themselves out as one proceeds along the yellow and black curves from left to right, due to the accumulation of irregularities of opposite sense (gaps versus clusters). The yellow probability of exceedance curve steps down every time an observed datum no longer exceeds the value shown on the x-axis. Because there are 30 years in the base period used for this stepped curve, each step represents a 3.33% (1/30) drop in the probability of exceedance. This curve is displayed so that the user may observe and judge how good a fit the smoothed (black) climatological curve is to the actual data. The fitted curve in both the probability density and probability of exceedance graphs is based on a Gaussian distribution (based only on the mean and standard deviation), after using a flexible power transformation to eliminate the asymmetry (skewness) of the original precipitation data. (More is said about the fitting of precipitation near the bottom.) In the probability of exceedance graph, the yellow curve based directly on the data is expected to be somewhat irregular, with gaps in some places (level portions) and clustering in others (steeply downsloping portions) due to the short sampling period. If the same number of observations were sampled from an earlier period and the underlying climate were identical, the places having gaps and clusters would be expected to change randomly. When the irregularities are changeable from one sample to another and have equal chances of appearing anywhere in the distribution, the smooth fitted climatological curve is thought to estimate the true population distribution better than the curve formed from any single sampling of the data. However, in some cases there may be a physical reason for deviations from a smooth distribution, such as proximity to sharp terrain (mountain versus valley) features or an upwind coastline. In such cases, sampling a very large number of years of data would not eliminate these deviant features. However, these features would be expected to appear somewhat more smoothly (less "noisy" or jumpy) than features caused purely by sampling variations. For example, a tendency for a plateau of shallow slope might appear near the middle of the "probability of exceedance" distribution, where the steepest slope is usually found, or a steep slope might be found off the center of the distribution. It is believed that in most cases the fitted curve is a better representation of nature than the raw data curve--that most of the irregularities in the raw data curve occur by chance alone, and would not appear if a much larger set of cases could be used.
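
The stepped (yellow) probability of exceedance curve can be sketched as follows. The 30 values are a hypothetical, evenly spaced sample used only to make the step size easy to see; real observations would be irregularly spaced, producing the gaps and clusters described above.

```python
def empirical_exceedance(observations, x):
    """Fraction of observed amounts that exceed x (the raw, unfitted curve)."""
    return sum(1 for obs in observations if obs > x) / len(observations)

# Hypothetical 30-year sample (mm): 300, 320, ..., 880. Not real station data.
sample_mm = [300 + 20 * i for i in range(30)]

# Each observation passed on the x-axis lowers the curve by 1/30, about 3.33%.
step = 1 / len(sample_mm)

for x in (299, 300, 590, 880):
    print(x, round(100 * empirical_exceedance(sample_mm, x), 2))
```

Comparing such a stepped curve with a smooth fitted curve evaluated at the same amounts is one way to judge, as the text suggests, how well the fit represents the data.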

The third curve, shown in red, labeled "final forecast", represents the probability distribution of the IRI's final forecast. Within roundoff error, this curve is consistent with the forecast probability maps with which it is associated. Thus, the probability indicated for the most favored tercile in this product should correspond to the probability anomaly shown in the probability maps. When the maps indicate climatological probabilities (a white area on the map), this product displays a final forecast curve that coincides with the normal curve (and both curves appear dotted). The final forecast curve incorporates all information leading to the forecast, including (if applicable) the ENSO state and other sea surface temperature features, and including gradual trends. When the normal and final forecast curves do not coincide, on the probability of exceedance graph the downward slope of the final forecast curve is often slightly steeper than the slope of the climatological curve, in proportion to the confidence associated with the final forecast. On the probability density graph, confidence is reflected in a narrower peak with a maximum probability value that is higher than that of the normal curve. (Several aspects of the forecast confidence are indicated in each graph, to be described below.) It is also possible for the forecast curve to peak at about the same height as the climatological curve, but to be shifted in one direction from the climatological curve. The forecast curve is taller and narrower than the climatological curve because, when a forecast is thought to be relatively skillful (as for example when there is a strong ENSO event in progress and the location is one having a known, specific ENSO-related climate effect), the range of possibilities is smaller than if no useful forecast knowledge were available. The decrease in the uncertainty shows up as a narrower range of precipitation values within which the probability of exceedance changes by a given amount; hence the steeper downward slope of the forecast curve in the probability of exceedance graph. In some cases there is a shift of the forecast curve relative to the climatology curve, but without a steeper slope in the forecast curve in the probability of exceedance graph (or equivalently, without a taller, narrower peak in the probability density graph). This would indicate some confidence in the shift away from the normal, but without a decrease in the range of possibilities in the shifted climate. This could occur, for example, when a trend, or climate change relevant to the present decade as a whole, is believed to be occurring, but the variability from one year to the next is not pinned down with much confidence.

A fourth pair of curves, shown by thin red lines, represents an "error envelope". It is drawn on either side of the main final forecast curve, closely paralleling that curve, on either of the two types of graph. These lines are an estimate of the range of possible error associated with the forecast curve. The forecast curve itself already conveys uncertainty about the forecast; this is why it is shown in a probabilistic framework. In addition to this inherent uncertainty, there is also some additional uncertainty related to other aspects of the forecast. Examples of these additional error sources are (1) errors in the forecast models' indicated probabilities and in the forecasters' perception, judgement or understanding of the current climate state, (2) small errors in the most recent observed data used to determine the forecasts, and (3) imperfections in the fit of the climatological and forecast distributions to the population of actual data, the latter being unobtainable without a sample much larger than 30 years. All of these factors contribute to some error in the positioning of the forecast curve as a whole--known as a reliability error. The first factor above is thought to be the largest contribution to the reliability error. An approximation of this error, reflected in the error envelope, is provided by examination of several years of actual forecasts and their results, and also long histories of retrospective climate forecasts using models and methods similar to those used for IRI's forecasts. In particular, the frequencies of occurrence of observations are compared with the probability forecasts for specific forecast probability values. During most seasons at most locations, the deviation from climatology of the forecast probability for the favored tercile category (i.e. its difference from 33%) would need to be 6 to 9 percentage points in order for the error envelope to exclude the fitted climatological probability distribution curve over a majority of the range of the forecast curve. This implies that probabilities of 40% are just at the borderline of being far enough from the neutral 33% to be considered a reliable deviation, or signal. Displaying the error envelope is believed prudent to underscore the need for caution and conservatism on the users' part. The error envelope is smaller in percentage points at the tails of the forecast distribution than near the middle. However, it is much larger at the tails in terms of percentage of the value of the probability density itself (or, for the probability of exceedance curve, of the difference of the forecast probability from 100% for the left tail, and from 0% for the right tail). This is a reminder that conclusions based on the tails of the forecast curve are dangerous, as for example a statement that the chance of being in the highest 1% tail for a given forecast is 4 times as much as it would be climatologically. Because observations in that part of the distribution are so rare, such conclusions are extremely shaky. The true probability of being in the highest 1% tail, rather than being 4 times the average, could easily be anywhere from 1.6 to 10 times the average, to give one possible example. The forecast's error envelope shows that quantitative statements about the probability of a truly extreme event, such as in the upper or lower 3% of the climatological distribution, are highly risky, and should be considered with great caution.

Near the bottom of the graph, below the horizontal "0%" line, the observations of the last 15 years are shown by the last two digits of the year. (For example, "98" indicates the observation for 1998.) The left-versus-right positioning of the digits indicates the amount of precipitation for the season in question, as given on the horizontal axis. The digits provide information about the climate at the given location and season during recent years. The center of the distribution of the precipitation of the 15 recent years is indicated by an asterisk on the horizontal "0%" line. The purpose of the display is to show how the most recent observations compare with the overall 30-year distribution. In some cases the climate of the recent years may tend to be different from the "normal" shown by the entire curve--perhaps mainly lower or mainly higher. Or, there may be more extremes on both sides of the distribution's center than would be expected. If they appear to be unrepresentative of the normal in any respect, the user may wonder whether the climate to occur for the forecast period will follow the tendency of the recent years. This factor is incorporated into the IRI's forecast, and thus in the forecast curve. In some cases a strong trend is believed to exist, perhaps in association with a corresponding trend in tropical sea surface temperatures, and is clearly reflected in the forecast. In other cases a difference in the climate of recent years may be considered mainly a random occurrence. Text printed on the right of the graphs describes the tendency of the climate in the recent 15 years.

Finding the Probability that Precipitation will be between Two Amounts
(or above an Amount, or below an Amount)

Suppose one wants to calculate the estimated probability of receiving precipitation between a given set of lower and upper limits. This can be done on the probability of exceedance graph by subtracting the probability of exceedance value associated with the larger precipitation amount from the value associated with the smaller precipitation amount. In other words, it is simply a matter of finding out how far the red curve drops between the lower and the upper precipitation values. For instance, using the climatological values quoted earlier for Iloilo (a 50% probability of exceeding 658 mm and a 20% probability of exceeding 868 mm), the climatological probability of a total between 658 and 868 mm is 50% - 20% = 30%. On the probability density curve, the probability of receiving precipitation between lower and upper limits is less simple to calculate than on the probability of exceedance curve. In place of subtracting a smaller probability of exceedance value from a larger one, one must calculate the area under the density curve that falls between the two limits, and determine what fraction that area is of the area under the entire curve. This would require a fair amount of time. Even using the probability of exceedance graph, this is not something that can be done very accurately by eye. Therefore, it is recommended that the automatic probability evaluation utility, described next, be used.

Automatic Probability Evaluation: In subtracting two probability of exceedance values to evaluate the probability of precipitation occurrence between a lower and upper limit, it is difficult to accurately determine the probability of exceedance visually from the graph. Therefore, the process is automated for users' convenience in a flexible user-specified probability look-up utility. The user selects a location (e.g. station), forecast start time (the month the forecast was issued), a forecast target period (the 3-month period being forecast), and the lower and upper limits within which a probability is to be evaluated. The probabilities are computed for both the climatological and the forecast distributions. A confidence interval on the forecast probability, representing the uncertainty that is depicted by the forecast error envelope, is planned as a future addition. Note that when one wants to calculate the estimated probability of precipitation being simply above a given amount, that amount can be used as the lower limit, and a very high amount can be used as the upper limit. The utility accepts an upper limit that is higher than the amounts shown on the graph. Probability for precipitation being below a given amount is computed in the same manner, using a very low amount as the lower limit.
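
The calculation performed by such a utility can be sketched in a few lines. This is not the utility's actual code: the function names, the plain Gaussian curves, and the parameter values (climatology centered at 650 mm with spread 200 mm, forecast centered at 600 mm with spread 180 mm) are all assumptions made for illustration. Open-ended ranges are represented here with infinite limits rather than a "very high" or "very low" amount.

```python
import math

def exceedance(x, center, spread):
    """Gaussian probability of exceeding x; infinite limits are handled explicitly."""
    if x == math.inf:
        return 0.0
    if x == -math.inf:
        return 1.0
    return 0.5 * math.erfc((x - center) / (spread * math.sqrt(2.0)))

def interval_probability(lower, upper, center, spread):
    """Probability of a total between lower and upper: the drop in the exceedance curve."""
    return exceedance(lower, center, spread) - exceedance(upper, center, spread)

CLIM = (650.0, 200.0)   # hypothetical climatological center and spread (mm)
FCST = (600.0, 180.0)   # hypothetical forecast, shifted toward drier conditions

def look_up(lower, upper):
    """Return (climatological probability, forecast probability) for the interval."""
    return (interval_probability(lower, upper, *CLIM),
            interval_probability(lower, upper, *FCST))

print(look_up(500.0, 700.0))        # between two amounts
print(look_up(650.0, math.inf))     # above an amount
print(look_up(-math.inf, 650.0))    # below an amount
```

A plain Gaussian assigns a small probability to negative totals, which is one reason the sketch uses negative infinity rather than zero as the lowest limit; a fitted distribution bounded at zero would not need this.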

Quantile Table (for 2 to 10 Climatologically Equal Categories): A simple product that does a job similar to that of the automatic probability evaluation, but usually less conveniently, is a quantile table. Using the quantile table, probability forecasts for a specified location and season are given for different numbers of climatologically equally likely categories: the two halves of the climatological distribution, the three terciles (as also given by the text on each of the two graphs), the four quartiles, five quintiles, six sextiles, seven septiles, eight octiles, nine noniles and ten deciles. Precipitation amounts forming the borderlines of the categories are shown along with the forecast probabilities. In some cases users may be able to use these "canned" forecast extensions beyond terciles, rather than having to supply precipitation amounts as inputs to the flexible probability look-up utility.
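
The structure of a quantile table can be sketched as follows. The two Gaussian curves and their parameters are hypothetical stand-ins for the fitted climatological and forecast distributions, and `quantile_row` is an illustrative name, not part of the product. For each number of categories, the boundaries come from the climatological curve and the probabilities from the forecast curve.

```python
from statistics import NormalDist

# Hypothetical curves (assumptions, not station data).
clim = NormalDist(mu=650.0, sigma=200.0)   # climatological distribution
fcst = NormalDist(mu=600.0, sigma=180.0)   # forecast distribution, shifted drier

def quantile_row(k):
    """Boundaries of k climatologically equal categories and the forecast
    probability of each category, ordered from driest to wettest."""
    bounds = [clim.inv_cdf(i / k) for i in range(1, k)]
    cdf_at = [0.0] + [fcst.cdf(b) for b in bounds] + [1.0]
    probs = [cdf_at[i + 1] - cdf_at[i] for i in range(k)]
    return bounds, probs

# Halves (k=2) through deciles (k=10).
for k in range(2, 11):
    bounds, probs = quantile_row(k)
    print(k, [round(b) for b in bounds], [round(100 * p, 1) for p in probs])
```

Under climatology every category in a row would have probability 1/k, so comparing each printed forecast probability with 1/k shows how far the forecast departs from normal in that part of the distribution.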

Caution Required for the Tails of the Curves: Each of the curves is constructed on the basis of historical observations, and/or the nature and strength of the impacts of the estimated current and future climate state. Near the middle of the distribution there has been plentiful data sampled, because the middle of the distribution has been observed most frequently. On the other hand, in the tails, or extremes of the distribution, there have only been a few cases. Sometimes there may have been no cases in a large portion of a tail, and just a single observation in the extreme part of that tail. Whatever the exact configuration of the observations, the tails are less certain than the middle and the shoulders of the distribution. Therefore, conclusions based on the extreme tails of the distribution are particularly dangerous, and should be made with caution. For precipitation, the probability curves are based on a flexible power-transformed Gaussian distribution. (More detail is provided about this fitting process at the end of this tutorial.) As such, the shape and length of the tails are based both on the extreme values and on the variability of the values closer to the middle of the distribution. The tails express only a rough guess of the actual extreme value probabilities, and should not be taken literally. A warning about the upper and lower tails of the curves is posted on each graph. The middle 86% of the probability distribution, ranging from the 93% to the 7% probability of exceedance values, is considered to be reasonably well sampled, in contrast with the outer 7% tails.

Numerical Information Printed on the Graphs
In the upper portion of the probability density or probability of exceedance graphs, selected summary information is printed. This information is identical  on each of the two graphs for a given station and forecast. The block of text at the upper left, in black print, shows climatological information for the station and season. The normal (or center) is the typically expected amount, based on the observations during the normal base period as discussed above. The normal would be represented by the mean for a symmetric distribution such as that for temperature at many stations, but is represented by the fitted median for precipitation. The raw (non-fitted) median is shown underneath the normal (center), or fitted median. This is the middle-ranked precipitation amount over the 30-year normal period. It is the average of the 15th and 16th highest amounts (also 15th and 16th lowest amounts). When there is an odd number of amounts, the raw (unfitted) median is simply the single middle-ranked observation. The fitted median is usually considered a better representation of the center of the precipitation distribution than the unfitted median, because the latter is affected noticeably more by sampling variations for a fairly small sample such as 30 years. Underneath the median, the skew is shown. The skew is a measure of the lack of symmetry in the climatological distribution. A skew of 0 would imply a perfectly symmetric distribution, where deviations of given amounts below the normal are observed with  equal likelihood as deviations of the same amounts above the normal. A positive skewness, which is typical for precipiation, would imply the following features: The highest amounts are farther above normal than the lowest amounts are below  normal; the mean is higher than the median; and the properly fitted distribution would have a longer tail on the upper side than the lower side of the distribution. 
A negative skew would indicate a distribution whose longer tail lies below, rather than above, the average. Negative skews are occasionally, but not commonly, found for precipitation. Regarding magnitudes of skewness, values within 0.25 of zero (-0.25 to 0.25) are considered negligible, values from 0.25 to 0.5 mild, 0.5 to 1.0 moderate, 1.0 to 2.0 large, and above 2.0 extreme. Precipitation distributions at stations that are climatologically very dry tend to have highly positive skew, since many years have very low (possibly zero) totals, while a small number of years have amounts that may be three or more times the normal. Sometimes a single year with an extreme precipitation amount may cause the skew value to be very high.
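
To make the raw median and skew statistics concrete, here is a minimal Python sketch. It assumes a simple moment-based skewness formula and treats each magnitude boundary as belonging to the lower class; neither detail is specified in this tutorial, and the seasonal totals are hypothetical values chosen to mimic a dry station with one very wet year.

```python
from statistics import mean, median

def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation / (mean squared deviation)^1.5."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def describe_skew(skew):
    """Map the skew magnitude to the qualitative labels listed in the text.
    Assumption: a value exactly on a boundary falls in the lower class."""
    size = abs(skew)
    if size <= 0.25:
        return "negligible"
    if size <= 0.5:
        return "mild"
    if size <= 1.0:
        return "moderate"
    if size <= 2.0:
        return "large"
    return "extreme"

# Hypothetical seasonal totals (cm): many low years, one very wet year.
totals = [2.0, 3.0, 1.0, 0.5, 4.0, 2.5, 30.0, 1.5, 3.5, 2.0]

raw_median = median(totals)   # average of the two middle-ranked amounts
raw_mean = mean(totals)       # pulled upward by the single wet year
skew = sample_skewness(totals)
print(raw_median, raw_mean, round(skew, 2), describe_skew(skew))
```

In this made-up sample the single wet year pulls the mean well above the median and pushes the skew into the "extreme" class, which is the situation described above for very dry stations.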

In the next block of text to the right, in blue print, information about the forecast is provided. The point forecast is the best guess of the numerical forecast. This best guess is the value that is expected to result in a minimum of the squared errors over a long period of time. The best guess is similar to a short-range weather forecast, in the sense that an actual value is given. However, for climate forecasts, due to the usually large amount of uncertainty, much larger errors are expected than would be expected for weather forecasts. Thus, while the point forecast is expressed as an exact number, the certainty that the climate will turn out to be this number is extremely low. That is the idea conveyed by the curves in the probability of exceedance graph, which always descend slowly to the right, and never suddenly. The gradual rate with which the curves decline implies a large uncertainty. In the probability density graph, uncertainty is expressed by the width of the curves. Forecasts with high certainty would be represented by a curve having a very tall, narrow peak. The point forecast is only given to represent the middle, or center, of the forecast distribution, and does NOT imply that we can make an accurate numerical forecast. By analogy, when two 6-sided dice are rolled, the midpoint of the distribution of outcomes for the total is 7. However, while an outcome of 7 is more likely than any other outcome, a 7 is expected to occur only 16.7% of the time on average, if both dice are fair. In the case of our forecasts, the probability of an outcome of exactly (to two decimal places) what our point forecast indicates is very small -- usually far less than 1%. Again, it is provided to indicate the center of a large distribution of possible outcomes, the size of which is expressed by the probability density and probability of exceedance graphs and the computations that can be done on their basis. 
This warning is very important because these forecasts are highly imperfect. The point forecast is given for users who need to use an exact value as input to their application.  Underneath the point forecast, the anomaly is shown, which is the departure of the point forecast from the normal value. Beneath the anomaly is the percentile (%ile) of the point forecast with respect to the climatological distribution. Percentile values lower than 50 imply that the point forecast is drier than the normal amount, and higher than 50 imply a wetter than normal forecast. Because of the relatively large amount of uncertainty in climate prediction, the percentile of the point forecast is usually between 30 and 70.
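
The dice analogy used above can be checked by direct enumeration. The short Python sketch below lists all 36 equally likely outcomes for two fair dice and confirms that the most likely total is still an uncommon result; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely rolls give each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
probabilities = {total: Fraction(n, 36) for total, n in counts.items()}

# The most likely total plays the role of the "point forecast".
most_likely = max(probabilities, key=probabilities.get)
print(most_likely, float(probabilities[most_likely]))
```

The most likely total is 7, with probability 6/36, or about 16.7%, so even the best single guess is wrong roughly five times out of six.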

On the upper right side of the graph, confidence intervals are given for the forecast. The 50% confidence interval gives two precipitation amounts. The lower amount corresponds to the 25%ile (75% probability of exceedance) of the forecast distribution, and the upper amount corresponds to the 75%ile (25% probability of exceedance) of the forecast distribution. The amounts that fall between these two limits form an interval believed to have a 50 percent chance of occurring. The 90% confidence interval covers a wider range of amounts, ranging from the 5%ile (95% probability of exceedance) to the 95%ile (5% probability of exceedance) of the forecast. The ranges of amounts covered by the 50% and 90% confidence intervals give an idea of the expected error associated with the point forecast. For an unskewed climatological distribution, the confidence intervals are formed by moving an equal distance on either side of the point forecast. When the distribution has a positive skew, the distance upward to the top of the confidence interval is greater than the distance downward to the lower boundary of the confidence interval. The regions below and above the confidence interval limits are considered to use up the remaining probability equally. The ranges covered by the 50% and 90% confidence intervals are typically fairly wide, in keeping with the uncertainty associated with the point forecast. Note that there is some uncertainty for the confidence intervals themselves, just as there is uncertainty for the probability density and probability of exceedance curves themselves (as indicated by the error envelope). The limits of the 90% confidence interval reach into the tails of the forecast distribution, and should therefore be considered as approximations to a greater degree than the limits of the 50% confidence interval.
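
The way the interval limits follow from percentiles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unskewed forecast is taken to be Gaussian, the skewed forecast is taken to be Gaussian after a square-root transformation, and all means and spreads are hypothetical numbers rather than values from any actual forecast.

```python
from statistics import NormalDist

def confidence_interval(dist, level):
    """Return (lower, upper) limits that leave equal probability in each tail."""
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

# Unskewed case: hypothetical Gaussian forecast distribution (cm).
symmetric = NormalDist(mu=36.0, sigma=6.0)
sym_low, sym_high = confidence_interval(symmetric, 0.50)
wide_low, wide_high = confidence_interval(symmetric, 0.90)

# Skewed case: hypothetical forecast that is Gaussian only after a
# square-root transformation, so limits are squared to return to amounts.
transformed = NormalDist(mu=6.0, sigma=1.0)
skew_center = transformed.mean ** 2
skew_low, skew_high = (x ** 2 for x in confidence_interval(transformed, 0.50))

print(sym_low, sym_high)                 # equal distances from 36.0
print(wide_low, wide_high)               # 90% interval is wider
print(skew_low, skew_center, skew_high)  # upper limit is farther from center
```

In the skewed case the upper limit sits farther from the center than the lower limit, matching the behavior described above for positively skewed distributions.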

Forecast Confidence
Directly below the forecast confidence intervals, three measures of confidence in the forecast are posted, both numerically and qualitatively. These are estimates of three aspects of the skill expected for the particular forecast. The first, the confidence in shift direction, is confidence that the climate will deviate from the normal in the direction indicated, without regard to the size of the deviation. The second, the confidence in the point forecast, indicates a narrowing of the forecast distribution in comparison with the width of the climatological distribution, and therefore is confidence related to how close the climate will be to the point forecast. The third, the integrated confidence, is confidence that the probability distribution as a whole will be different from the climatological probability distribution. This third measure is a combination of the first and second measure, since both of the first two are aspects of a difference between the forecast distribution and the climatological distribution. The qualitative descriptions of the confidence levels are, in ascending order: none, low, fair, moderate and high. Each of the three measures of confidence is described in more detail next, enabling the user to decide which measure(s) is most applicable to their needs.

Confidence in shift direction:
This is a measure of confidence that the climate will deviate from the normal in the direction specified, whether toward below or above the normal. The direction of deviation from the normal is positive when the point forecast exceeds the value of the normal, and negative when it is lower than the normal. The numerical value of this confidence measure is the ratio of the estimated probability that the climate will deviate in the forecast direction to the probability that it will deviate in the opposite direction. As such, it is an odds ratio. For example, if the confidence in the shift direction is 2.00, it indicates a belief that there is twice the probability of a deviation in that direction than in the opposite direction. If the forecast direction is below the normal, a 2.00 confidence would mean that the probability of below normal conditions is 66.7% and the probability of above normal conditions is 33.3%. If there is no confidence whatsoever regarding which side of the normal will occur, the ratio is at its lowest possible value of 1. Note that the ratio is not that of the probability of the more favored outer tercile to the opposite outer tercile, but rather a ratio of the forecast probabilities of occurrence of one half of the climatological distribution to the other half. The dividing line between the two halves is the numerical value of the normal (or center of the distribution), as posted near the upper left corner of the graph. When the climatology forecast is issued, the shift direction confidence is at its minimum of 1. It may also be 1, however, for a non-climatology forecast, when there is some confidence that the likelihood of the near normal category is higher than would be expected climatologically--and the chances for above normal and below normal are both equally reduced from the climatological chance. 
This could occur, for example, in a season and at a location having fairly high sensitivity to the state of the ENSO, in a case when the ENSO condition is expected to be very close to normal (i.e. neither an El Niño nor a La Niña tendency is expected). In such a case, although the chances of large deviations from normal are reduced compared to the case of having no predictability at all, the direction of the shift from normal is just as uncertain as it would be without any predictability. Forecasts for directional shifts may be considered somewhat useful when this confidence measure exceeds 1.5, and more clearly useful when it exceeds 1.8 or even 2 (which is uncommon). It should also be noted that a high confidence in the shift direction usually, but not always, means that the size of the shift is best-guessed to be large, as seen in the point forecast and its percentile. Exceptions may occur in cases with high confidence in the point forecast (another type of confidence, described below), where the confidence in the shift direction may be high but the predicted size of the shift is only moderate. This is possible because the shift direction refers to any amount of shift in the indicated direction, whether large or small.
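
Because the shift direction confidence is an odds ratio between the two halves of the climatological distribution, it converts directly to and from a probability. The following Python sketch shows the conversion; the function names are illustrative and not part of the product.

```python
def shift_direction_confidence(prob_favored_half):
    """Odds ratio: probability of the favored half over that of the other half."""
    if not 0.5 <= prob_favored_half < 1.0:
        raise ValueError("favored-half probability must be in [0.5, 1)")
    return prob_favored_half / (1.0 - prob_favored_half)

def favored_half_probability(confidence):
    """Invert the odds ratio to recover the probability of the favored half."""
    if confidence < 1.0:
        raise ValueError("the ratio cannot be lower than 1")
    return confidence / (1.0 + confidence)

# Convert the threshold values mentioned in the text to percentages.
for confidence in (1.0, 1.5, 1.8, 2.0):
    print(confidence, round(100 * favored_half_probability(confidence), 1))
```

The thresholds of 1.5, 1.8 and 2.0 correspond to probabilities of 60.0%, 64.3% and 66.7% for the favored half, while a ratio of 1 corresponds to an even 50%.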

Confidence in the point forecast (contraction of forecast distribution): This confidence measure indicates how narrow, or limited, the distribution of possibilities about the point forecast value is believed to be, compared with the distribution of the historical observations about the normal value. Given our current state-of-the-art in climate prediction, confidence in the point forecast is typically small. When the forecast distribution has the same, or nearly the same, width as the climatological distribution, this indicates a relative absence of forecast knowledge that would limit the range of possibilities. The numerical value of this confidence measure, specifically, is the ratio of the width of the forecast distribution to the width of the climatological distribution. Lower values imply greater confidence. When the forecast distribution is no narrower than the observed climatological distribution, the confidence is 1. Confidence values of less than 0.9 are considered somewhat helpful, and below 0.8, while rare, are still more helpful. If the forecast had perfect confidence, such that no point forecast error were expected, this measure would have a value of zero. Forecasts of the time of the next solar or lunar eclipse approach this level of  virtually zero uncertainty. This confidence measure is relatively high in locations and seasons when climate conditions are known to be related to governing forces (such as ENSO), and the status of these forces is able to be correctly anticipated with fairly good certainty for the period being forecast. An example of this would be the precipitation during the northern winter in Florida and other regions in the southern U.S., which is partly determined by the ENSO state, given that the ENSO state itself is somewhat predictable for forecasts made after the preceding summer. 
Another example would be precipitation in northeast Brazil during the March-May rainy season, which is also determined by the ENSO state as well as the sea surface temperature anomaly pattern in the tropical Atlantic Ocean. In such forecasts, the possibilities for precipitation are somewhat more limited than they would be with no knowledge of the influence of ENSO or what the ENSO state would likely be during the future period being forecast. This particular confidence measure is not necessarily related to the amount of shift of the point forecast from the normal; rather, only the width of the forecast probability distribution about its own central value (the point forecast) is relevant here. Therefore, forecasts that are close to the normal still may rate highly on this confidence measure. On the other hand, in some cases there may be a noticeable shift of the point forecast from the normal, but little or no narrowing of the distribution. This could occur, for example, when there is a gradual, long-term trend that is used in determining the forecast, but when there is little or no information about differences between the climate this year and the last few years for the same season. In that case, all recent years would be affected by the general trend approximately equally, but their large year-to-year differences related to factors besides the trend are poorly forecast.
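
To give a feel for what width ratios of 0.9 or 0.8 demand of a forecast, the Python sketch below makes a strong assumption that is not stated in this tutorial: that the forecast comes from a linear regression on a predictor having correlation r with the seasonal climate, and that width is measured by the standard deviation. Under those assumptions the forecast spread is the climatological spread multiplied by sqrt(1 - r^2).

```python
import math

def width_ratio_from_correlation(r):
    """Forecast width / climatological width for a linear regression
    forecast with predictor correlation r (an assumed forecast model)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    return math.sqrt(1.0 - r * r)

def correlation_needed(width_ratio):
    """Correlation required to reach a given width ratio under the same model."""
    if not 0.0 <= width_ratio <= 1.0:
        raise ValueError("width ratio must lie between 0 and 1")
    return math.sqrt(1.0 - width_ratio ** 2)

for ratio in (1.0, 0.9, 0.8, 0.0):
    print(ratio, round(correlation_needed(ratio), 2))
```

Under this assumed model a ratio of 0.9 already requires a correlation of about 0.44, and 0.8 requires 0.6, which helps explain why substantial narrowing is rare.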

Integrated confidence: an integrated distributional difference from climatology. This is a measure of the estimated totality of all differences between the forecast distribution and the climatological distribution. It includes both distributional shifts and narrowing as discussed in the context of the two confidence parameters described above. It would also include distributional deviations of other types that may prove to be possible to predict in the future, such as a widening of the distribution (e.g., as related to an expectation of greater than normal intraseasonal variation), or asymmetric or irregular features of the distribution as may be related to specific climate conditions in certain geographical locations (e.g. involving terrain, or land vs. water). This measure, specifically, is estimated as the total of the differences in probabilities of exceedance between the climatological distribution and the forecast distribution over the 11 points on the climatological distribution corresponding to its 0.98, 0.90, 0.80, 0.70, ..., 0.20, 0.10, and 0.02 probability of exceedance values. This sum of the differences is then scaled with respect to the result that would be attained when the forecast distribution is completely separated from the climatological distribution. In the case of complete separation, the climatological probability of exceedance remains at 1 (or 100%), or at 0 (or 0%), while the forecast distribution moves through all of its intermediate values. Complete separation, which is unattainable given today's state-of-the-art in climate prediction, would produce an integrated confidence score of 1, while a total absence of separation (as in the case of the climatology forecast) would produce a score of 0. Integrated confidence values of 0.2 are considered moderately useful by today's standards, and values of 0.3 would be clearly useful. 
In examining the integrated confidence values that accompany the graphs, it becomes clear that distributional shifts tend to account for the majority of the integrated confidence value, while distribution narrowing usually contributes to a much lesser degree. This characteristic implies that occurrences of strong climate forcing conditions, whether related to ENSO or other sea surface temperature anomalies, strong decadal trends in progress, or other factors, represent "forecasts of opportunity", and that forecast skill (and utility) are not constant from year to year for a given location, season and lead time. Of the three confidence measures discussed here, the only one that remains nearly constant from year to year is the confidence in the point forecast, reflecting the narrowness of the forecast distribution relative to the climatological distribution. From a practical standpoint, the shift of the forecast distribution from its normal position may be more important to users than its narrowness. This becomes clearer when one considers, for example, that if there is a high probability for abnormal seasonal wetness, the actual amount of observed precipitation (and its deviation from what was forecast) may be less important to a user than the fact that the precipitation amount was correctly forecast to be above the normal. A forecast of exactly normal precipitation, even with a very narrow forecast distribution, while being welcome information, might not be as critical to the managers of energy companies, or farmers, as a forecast of deviant conditions with a wider probability distribution. Our ability to forecast likely shifts from the normal is currently greater than our ability to narrow the quantitative range of possibilities. If, at a distant future time, we become able to significantly narrow the width of the forecast distribution (as currently done in 1-day weather forecasts), this would automatically improve our shift direction confidence as well. 
Fortunately, our current lack of strong point forecast confidence does not prevent us from having fairly high shift direction confidence under certain circumstances. An example of such circumstances was the great El Niño of 1997-98, when correct forecasts of above normal precipitation were issued for the period of October to December 1997 in Kenya, for below normal rainfall for the same period in Indonesia, for above normal precipitation for the 1997-98 winter in the southern U.S., and for below normal rainfall for the 1997-98 summer in southern Africa.
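
The integrated confidence calculation described earlier in this section can be sketched as follows. This is one reading of the description, not the product's actual code: it assumes absolute differences are summed (so that narrowing does not cancel out), it takes the scaling constant to be the sum obtained when the forecast lies entirely to one side of the climatology, and it uses hypothetical Gaussian distributions in standardized units.

```python
from statistics import NormalDist

# The 11 climatological probability of exceedance values named in the text.
CLIM_POE = [0.98, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.02]

def integrated_confidence(climatology, forecast):
    """Scaled sum of |climatological PoE - forecast PoE| at the 11 points."""
    total = 0.0
    for poe in CLIM_POE:
        amount = climatology.inv_cdf(1.0 - poe)    # amount exceeded with prob. poe
        forecast_poe = 1.0 - forecast.cdf(amount)  # forecast PoE at that amount
        total += abs(poe - forecast_poe)
    # Complete separation with the forecast entirely below the climatology
    # gives forecast PoE = 0 at every point, so the sum equals sum(CLIM_POE);
    # by the symmetry of the 11 points the same total arises for a forecast
    # entirely above.
    return total / sum(CLIM_POE)

climatology = NormalDist(0.0, 1.0)
no_information = integrated_confidence(climatology, climatology)
shifted = integrated_confidence(climatology, NormalDist(-0.3, 1.0))
narrowed = integrated_confidence(climatology, NormalDist(0.0, 0.9))
separated = integrated_confidence(climatology, NormalDist(50.0, 1.0))
print(no_information, shifted, narrowed, separated)
```

With these illustrative inputs a modest shift contributes more to the score than a 10% narrowing, in line with the observation above that shifts account for most of the integrated confidence.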

Probabilities of Climatologically Equally Likely Categories
Underneath the confidence values are shown tercile-based forecast probabilities. These are probabilities that the seasonal precipitation total will be in the lower, middle or upper third of the climatological distribution. The precipitation amounts that divide these three sectors of the climatological distribution are given along with the forecast probabilities. When a non-climatology forecast is shown on the IRI's forecast map, the probability of the favored tercile in the present product should agree with that given on the map. For the other two categories, slight disagreements may be found due to rounding errors on the maps versus the more exact values given in this product.

Beyond terciles, one might want to see the probabilities of equi-probable categories of the climatological distribution other than for three categories. As described above, the quantile table provides probability forecasts for the same location and season for numbers of categories ranging from the two halves of the climatological distribution, all the way up to the ten deciles of the climatological distribution. Precipitation amounts forming the borderlines of the categories are shown along with the forecast probabilities, providing these amounts more accurately than could be done by eye from the probability of exceedance graph.
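
The category probabilities in the quantile table follow from evaluating the forecast distribution at the climatological category boundaries. The Python sketch below illustrates this for any number of equi-probable categories. It assumes Gaussian distributions in standardized units with hypothetical parameters; for precipitation the same steps would be carried out on power-transformed amounts.

```python
from statistics import NormalDist

def category_probabilities(climatology, forecast, n_categories):
    """Forecast probability of each climatologically equally likely category."""
    # Boundaries are the climatological quantiles at 1/n, 2/n, ..., (n-1)/n.
    bounds = [climatology.inv_cdf(k / n_categories) for k in range(1, n_categories)]
    # Forecast probability of falling below each boundary, padded with 0 and 1.
    below = [0.0] + [forecast.cdf(b) for b in bounds] + [1.0]
    probs = [upper - lower for lower, upper in zip(below, below[1:])]
    return bounds, probs

climatology = NormalDist(0.0, 1.0)
dry_forecast = NormalDist(-0.3, 1.0)   # hypothetical shift toward dry conditions

_, clim_terciles = category_probabilities(climatology, climatology, 3)
_, dry_terciles = category_probabilities(climatology, dry_forecast, 3)
decile_bounds, dry_deciles = category_probabilities(climatology, dry_forecast, 10)
print([round(p, 3) for p in clim_terciles])
print([round(p, 3) for p in dry_terciles])
```

A climatology forecast returns equal probabilities for every category, while the shifted forecast raises the probability of the lower categories at the expense of the upper ones.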

Trend of Recent Years
Summary statistics for the precipitation observations for the most recent 15 years are given in the lowest block of printed information. Here, the "center" refers to the fitted median, which is the representative center of the precipitation distribution over the recent 15 years. The median, shown to the right, is the center-ranked value (the 8th highest and 8th lowest) out of the 15 years of raw precipitation data. The median gives an idea of the middle value, without being affected by the extremeness of the higher and lower values. The anomaly of the 15-year center can be found by subtracting the overall (30-year) normal, given at the upper left, from the center for the 15 years. Near the bottom of the graph, below the "0%" line on the vertical axis, the observations of the last 15 years are shown. These digits, indicating the last two digits of the year, are positioned horizontally so that they indicate the precipitation level of that year, as printed on the horizontal axis just beneath them. An asterisk on the "0%" line shows the center value (fitted median) of these recent 15 observations.

How These Graphs Should NOT be Used

The probability density graphs and probability of exceedance graphs convey the uncertainty inherent in the climatological distribution and in the forecasts that are conditioned on the current and expected climate state. The informed user is aware that in any individual case, the implications of the forecasts may be misleading, as for example when the direction of the shift from the normal turns out to be incorrect. Such "contrary" occurrences are expected at the frequency that the probabilities indicate. The value of the forecasts is likely to become visible with repeated use, as the frequency of "successes" will exceed the frequency of "failures" by an amount that is roughly conveyed by the forecast probabilities and/or the confidence estimates given with the graphs. This value may or may not show up clearly in a small set of cases. For a small sample of cases, the chances that the value of the forecasts will look better than it really is are about equal to the chances that it will look worse than it really is. The larger the sample of cases, the more realistic the apparent value will become. In other words, the larger the sample of cases, the smaller the role of luck can be.
To help show how this product should be used, the following are examples of how the product should NOT be used:

The Fitting of Precipitation Distributions
Precipitation is fitted here using a Gaussian distribution following a power transformation of the original (skewed) precipitation data. The original data are raised to a power that is determined by the degree of skewness and by other basic features of the distribution. When skewness of the raw precipitation (indicated in the text accompanying the graphs) exceeds about 0.5, its effects become noticeable, and the need for a transformation (or a fitting scheme suited for asymmetric distributions) becomes clear. When there is a positive skew, featuring a long positive tail of the distribution, a shorter negative tail, and a mean that is higher than the median, the power to which the original data are raised is less than 1. The power is calibrated to be that which approximately eliminates the skewness, based on a large number of empirical simulations. After the data are power-transformed, Gaussian statistics are applied. These statistics involve assigning properly fitted climatological percentile values to the power-transformed precipitation values, and applying the indicated shifting and/or narrowing of the transformed forecast distribution. Finally, the results are re-converted to their natural, skewed frame of reference by raising to the reciprocal of the power that had been used earlier. The above procedure functions similarly to use of a gamma distribution or other accommodation for skewed or otherwise non-Gaussian distribution shape. The above technique is effective for distributions that do not contain many zero amounts. The presence of a substantial number of zero amounts causes a violation of the Gaussian assumption even after the power transformation, because it represents a "floor effect", or a bunching of observations at a constant value at the lower end of the distribution. The stations selected here were ensured not to have a marked "floor effect". 
Moreover, stations with very low 3-month climatological rainfall totals are not provided with forecasts by IRI; these are indicated by a "dry mask" on the IRI's forecast maps.
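
The fitting steps above (transform, apply Gaussian statistics, shift or narrow, transform back) can be sketched in Python. This is a simplified stand-in rather than the procedure actually used: the power is chosen here by a plain grid search for the smallest absolute skewness, whereas the tutorial says the power is calibrated from empirical simulations. The seasonal totals are hypothetical and were constructed so that their square roots are exactly symmetric, and the `shift` and `width_ratio` parameters are illustrative names.

```python
from statistics import NormalDist, mean, pstdev

def skewness(values):
    """Moment-based skewness of a sample."""
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

def choose_power(values):
    """Assumed stand-in for the calibration step: grid search over powers
    0.05, 0.10, ..., 1.00 for the one that leaves the least skewness."""
    candidates = [k / 20 for k in range(1, 21)]
    return min(candidates, key=lambda p: abs(skewness([v ** p for v in values])))

def fitted_quantile(values, prob_below, power, shift=0.0, width_ratio=1.0):
    """Quantile of the fitted distribution, returned in original units.
    shift and width_ratio act in transformed space; the defaults give
    the climatological fit."""
    transformed = [v ** power for v in values]
    dist = NormalDist(mean(transformed) + shift, pstdev(transformed) * width_ratio)
    x = max(dist.inv_cdf(prob_below), 0.0)  # guard: amounts cannot be negative
    return x ** (1.0 / power)               # raise to the reciprocal power

# Hypothetical totals (cm) whose square roots are 3, 4, ..., 9.
totals = [9.0, 16.0, 25.0, 36.0, 49.0, 64.0, 81.0]
power = choose_power(totals)
fitted_median = fitted_quantile(totals, 0.5, power)
dry_median = fitted_quantile(totals, 0.5, power, shift=-0.5, width_ratio=0.9)
print(power, fitted_median, mean(totals), dry_median)
```

For this constructed sample the search selects a power of 0.5, and the fitted median lies below the mean, as expected for a positively skewed distribution.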

Example
Interpreting a Probability Density and Probability of Exceedance Graph
Here we examine an example of a probability density graph and its accompanying probability of exceedance graph. We consider the probability forecast made in mid-March 2004 for the 3-month period of April-May-June 2004 for Bangkok, Thailand. The months and the year the forecast is for, and the month and year it was made, are shown in red above the graphs. Still above the graph, but underneath the red headings, the name of the station is shown in black, followed by its latitude, longitude and elevation (m). Inside the graph a number of text entries appear. Text entries in black are related to the normal climatology for the station for the 3-month period being forecast, while entries in blue pertain to information about this particular forecast for that 3-month period. The text entries inside the graphs are identical between the two graphs.

First we examine the curves themselves. On the probability density graph the black curve, showing the climatological distribution, peaks between 40 and 45 cm for the three month period of April-May-June. The raw density histogram, shown in yellow, has marked irregularities as compared with the smooth fitted black curve. For example, there is a relative lack of observations between roughly 40 and 47 cm, very near the center of the distribution. This kind of irregularity is to be expected in view of the fact that only 30 years contribute to the climatological distribution. Although the fit is rough, there are no shape misrepresentations of a more fundamental nature, such as two general peaks in the raw curve with a broad gap in between them. A more informed evaluation of the goodness of the fit is seen in a comparison between the black and yellow curves in the probability of exceedance graph. The yellow curve steps down at the precipitation amounts corresponding to each of the observations used over the 30-year period. Here it can be seen that the gap mentioned above on the probability density graph is due to a relative lack of observations between 40 and 48 cm, with only one observation at about 43 cm. Note that the yellow curve on the probability density graph is drawn as a histogram, using about 12 precipitation ranges across the whole graph, so that the exact observed amounts within each range cannot be identified. In the probability of exceedance graph, by contrast, the amounts can be identified quite accurately (in the present example, to within 1 cm). On the probability of exceedance graph, the relative gap between 40 and 48 cm is not seen to create a major misfit between the yellow step-curve and the black fitted curve. The lack of exact fitting is considered most likely due to sampling "luck of the draw" and is not considered a serious problem. 
Discrepancies much larger than this one would be reason for a more serious consideration of the possible inappropriateness of the fitting procedure. Once in a while such gross misfits are seen, and in such cases the user should be cautious about all of the detailed forecast information provided in this product.
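
The comparison between the step curve and the fitted curve can be expressed numerically. The Python sketch below builds an empirical probability of exceedance from a set of observations and reports the largest vertical gap from a fitted curve, similar in spirit to a Kolmogorov-Smirnov distance. The observations are hypothetical (not the Bangkok record), and a plain Gaussian fit is assumed in place of the power-transformed fit used for the actual graphs.

```python
from statistics import NormalDist, mean, pstdev

def empirical_exceedance(observations, amount):
    """Fraction of observed seasons whose total exceeded `amount`."""
    return sum(1 for obs in observations if obs > amount) / len(observations)

def largest_misfit(observations):
    """Largest vertical gap between the step curve and a Gaussian fit,
    checked just before and just after each step."""
    fitted = NormalDist(mean(observations), pstdev(observations))
    n = len(observations)
    worst = 0.0
    for obs in set(observations):
        fitted_poe = 1.0 - fitted.cdf(obs)
        after_step = empirical_exceedance(observations, obs)
        before_step = after_step + observations.count(obs) / n
        worst = max(worst, abs(fitted_poe - before_step), abs(fitted_poe - after_step))
    return worst

# Hypothetical seasonal totals (cm) with a sparse stretch in the low 40s.
observed = [21, 28, 31, 33, 35, 36, 38, 39, 43, 48, 50, 52, 55, 61, 79]
print(empirical_exceedance(observed, 40), round(largest_misfit(observed), 3))
```

A small largest gap suggests the irregularities are within ordinary sampling variation, while a very large one would be a reason to treat the fitted curves with more caution.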

Now we discuss the gist of the climate forecast. It is noted that on both graphs, the red dotted curve (the forecast) lies to the left of the black curve (the climatology). This indicates a shift of the probability distribution toward lower precipitation values. Looking at the probabilities of the lowest, middle and highest thirds (terciles) of the climatological distribution, the printed text indicates values of 45%, 34% and 21%, respectively. This describes a marked shift toward the drier portion of the climatological distribution, given that the average, or climatological, probabilities are 33.3% for each. Printed information at the upper left indicates that the normal amount of precipitation is 43.77 cm, and the best guess forecast amount (also called the "point forecast") is 38.58 cm, or 5.19 cm below the normal. Recall that the best guess, while believed to be the most likely amount, is only the center of a very large distribution of probabilities. The best guess lies at the 37.6 percentile of the climatological distribution. A feature that corresponds to this is that the blue vertical line at the 33.3 percentile mark (labelled "33%") for the climatological distribution (at 36.73 cm) is located slightly to the left of (lower than) the peak of the forecast distribution in the probability density graph. In both graphs, both sides of the forecast error envelope are separated from the black (climatological) curve across most of the range of precipitation, indicating that the probability shift is sufficient to stand out against its own possible lack of reliability. (Here "reliability" refers to the confidence in the positioning of the red curves themselves--hence the interval of uncertainty formed by the error envelope.)

Shown on both graphs, the confidence measures are at the "low" level for shift direction, "low" for a narrowing (contraction) of the range of possibilities surrounding the best guess forecast, and "low" for integrated, or overall, confidence. The low confidence level for contraction is reflected in the fairly large width of the red forecast curve in the probability density graph--not much narrower than the climatological curve--and in the fact that the downward slope of the red forecast curve is only slightly steeper than the climatological curve in the probability of exceedance graph. The low confidence level for shift direction is reflected by the fact that in the probability of exceedance graph, while the upper error envelope curve is lower than the black curve over most of the range of the precipitation, it is only lower by a small amount. The forecast confidence intervals shown in the upper right part of both graphs provide the lower and upper precipitation amounts in between which the precipitation is expected to occur with 50% and 90% confidence. Due to possible skewness in the climatological precipitation distribution, the lower and upper limits are not necessarily the same distance, in cm, from the best guess forecast.

The precipitation observations for April through June of the last 15 years are shown at the bottom of the graph. The center (generally not the same as the mean) of the 15 amounts is shown by the red asterisk on the zero percent line, and shows a value very close to the 30-year normal amount (43.42 cm for the 15 years, versus 43.77 cm for the 30 years). The pattern of the observations shows a wide variation among the precipitation amounts of the last 15 years. The wettest year among the last 15 was 1994 (nearly 80 cm), and the driest year was 1992 (about 21 cm). The variability over the most recent 15 years looks generally similar to that over the whole 30 year period. (In other stations or for other seasons, there may be a more noticeable difference between the precipitation amounts, or their variability, for the most recent 15 years versus the whole 30-year period.)