A.W. Robertson, M.R. Allen, M. Ghil, & L.A. Smith
1. Introduction: We present a test of statistical significance for multichannel singular spectrum analysis (M-SSA), an empirical orthogonal function (EOF) analysis of lagged data, used to detect oscillatory behavior in vector time series (Kimoto et al. 1991, Plaut and Vautard 1994). EOF-based analyses are often hampered by the difficulty of distinguishing between "signal" and "noise" patterns. Climatic time series are inherently "red" in both space and time, so that the leading EOFs are generally characterized by large spatial scales and long decorrelation times, and can appear deceptively similar to large-scale low-frequency oscillatory signals. In the case of M-SSA, this inherent redness tends to make the leading EOFs quasi-sinusoidal low-frequency patterns of large spatial scale. The natural null hypothesis to test oscillatory M-SSA patterns against is, therefore, that the time series has been generated by a spatio-temporal AR(1) process. This extends to the multichannel case the test developed by Allen and Smith (1994) for single-channel SSA. Section 2 presents the method and Section 3 its application to both synthetic and coupled ocean-atmosphere general circulation model (GCM) generated time series.
2. Method: These tests are based on the "surrogate data" approach of Theiler et al. (1992) and Smith (1992). First, we generate a large ensemble of red-noise "surrogate data" segments that have the same length, number of channels, and lag-1 temporal and spatial autocorrelations as our sample time series. Second, we project both our sample and each of the noise surrogates onto a common basis, and compare the power in the individual basis vectors. We thereby test whether there is significantly more power in the projections of the data onto a particular basis vector (i.e. along a particular line in phase space) than we would expect to find had the data been generated by the hypothetical noise process. If so, and if that vector is dominated by a single frequency, this may indicate a genuine oscillation.
The choice of basis is fundamental. (M-)SSA provides an optimal data-adaptive basis, and so a natural choice is to use it; this basis is derived from the windowed sample via a singular value decomposition (SVD) of the augmented data matrix X. The columns of the latter consist of lagged copies of the time series, so that X contains a complete record of patterns which have occurred within the time window (Broomhead and King 1986). Windowing the data in time is used to enhance the signal-to-noise (S/N) ratio. SVD yields two orthogonal bases, the temporal principal components (T-PCs) and temporal EOFs (T-EOFs):
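In the notation defined immediately below, and writing Λ for the diagonal matrix of variances (the symbol, and the omission of any normalization constant, are assumptions made here, since conventions differ), the decomposition takes the standard SVD form:

```latex
\[
  X = P \, \Lambda^{1/2} \, Q^{T}, \qquad P^{T} P = I, \qquad Q^{T} Q = I .
\]
```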
X is rectangular with sides ML and N-M+1, where N is the length of the time series, M is the width of the time window, and L is the number of spatial channels. The T-PCs are the left-singular vectors of X, given by the columns of P; the columns of Q are the right-singular vectors, or T-EOFs; and Λ is diagonal with the associated variances as its entries. For short time series, ML > N-M+1, so that only the T-PCs form a complete basis, which we use in the surrogate data test. Hence the surrogates are of length N-M+1, and periods up to N-M+1 months can be resolved by the test.
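The construction of X and its SVD can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `augmented_matrix` and `mssa_svd`, the column ordering, the division by N-M+1 when forming variances, and the random test data are all hypothetical choices.

```python
import numpy as np

def augmented_matrix(data, M):
    """Stack M lagged copies of every channel as columns.

    data has shape (N, L); the result has shape (N - M + 1, M * L).
    """
    data = np.asarray(data, dtype=float)
    N, L = data.shape
    n_rows = N - M + 1
    cols = [data[lag:lag + n_rows, ch] for ch in range(L) for lag in range(M)]
    return np.column_stack(cols)

def mssa_svd(data, M):
    """Return T-PCs (columns of P), variances, and T-EOFs (columns of Q)."""
    X = augmented_matrix(data, M)
    P, s, Qt = np.linalg.svd(X, full_matrices=False)
    # Assumed normalization: squared singular values divided by the row count.
    variances = s**2 / X.shape[0]
    return P, variances, Qt.T

# Hypothetical example: N = 60 samples, L = 3 channels, window M = 40,
# so that M * L = 120 exceeds N - M + 1 = 21.
rng = np.random.default_rng(0)
data = rng.standard_normal((60, 3))
P, lam, Q = mssa_svd(data, M=40)
print(P.shape, lam.shape, Q.shape)
```

With ML > N-M+1, as in this example, P is square and orthogonal, so its columns span the whole (N-M+1)-dimensional space, whereas the columns of Q span only a subspace of the ML-dimensional one.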
One drawback of the "data basis" is its very optimality. Since it is tailored to our finite segment of sample data, there will be an "artificial" compression of data variance into the leading basis vectors, an effect that increases with the rank of P. The surrogate noise segments, on the other hand, enjoy none of this advantage. This introduces a bias into the test, making it likely that high-ranked data eigenvalues will appear significant even if the data set consists of pure red noise. A less biased basis is given by the noise process; for an AR(1) process the basis consists of sinusoids, separated in frequency by 1/[2(N-M+1)] (Allen 1992). Projecting onto this "null-hypothesis basis" is consistent with assuming that the time series under investigation is noise, until we can show otherwise. Both bases are used in the examples below.
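The sinusoidal character of the AR(1) basis can be checked numerically. The sketch below assumes a single-channel, unit-variance AR(1) process whose lag covariance is gamma^|i-j|, and takes the basis to be the eigenvectors of that covariance matrix; the names `ar1_basis` and `sign_changes`, and the values n = 20 and gamma = 0.7, are hypothetical.

```python
import numpy as np

def ar1_basis(n, gamma):
    """Eigen-decomposition of the n x n AR(1) lag-covariance matrix.

    Returns eigenvalues in decreasing order and the matching eigenvectors.
    """
    idx = np.arange(n)
    C = gamma ** np.abs(idx[:, None] - idx[None, :])
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

def sign_changes(vec, tol=1e-10):
    """Count sign changes along vec, ignoring entries that are numerically zero."""
    signs = np.sign(vec[np.abs(vec) > tol])
    return int(np.sum(signs[1:] != signs[:-1]))

evals, evecs = ar1_basis(20, 0.7)
for k in range(6):
    print(k, round(float(evals[k]), 3), sign_changes(evecs[:, k]))
```

A vector with k sign changes over n samples completes roughly k/2 cycles, so consecutive basis vectors differ in frequency by about 1/(2n), in line with the spacing quoted above.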
3. Examples: To illustrate the tests, we apply them to (a) synthetic data consisting of AR(1) noise with two sinusoids superposed, and (b) the equatorial Pacific sea-surface temperatures (SST) of the UCLA coupled GCM. In (b), we have 312 monthly means of SST anomalies (deviations from the mean seasonal cycle) at 29 points along the equator. Anticipating the spectrum of (b), the synthetic data have red-noise parameters approximating those of the GCM, with periodicities of 50 and 20 months in spatial wavenumber 1 added. In both cases we choose a window width of M=161 months (i.e. N-M+1=152), resolving periods up to 152 months. Selecting M involves a compromise: N-M+1 needs to be large enough to allow adequate enhancement of signal-to-noise, but small enough for statistical robustness; in the case of the data basis, it should be small enough to minimize the artificial variance-compression effect mentioned above. To make the test as stringent as possible, we have to account not only for the fact that the data are red in space and time, but also for the fact that the largest spatial scales are associated with the largest decorrelation times. This is accomplished most easily via a spatial EOF (S-EOF) decomposition of the data: we take the leading 10 spatial PCs (S-PCs: Ghil and Mo 1991) as the channels of our sample data, instead of the 29 spatial grid points. Thus, the surrogates are constructed from 10 AR(1) processes, whose variances and lag-1 autocorrelations are identical to those of the 10 input channels, so that decorrelation times are effectively spatial-scale-dependent.
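Generating surrogates from independent per-channel AR(1) processes can be sketched as below. This is an illustration under assumptions, not the authors' procedure: the simple moment estimators in `ar1_parameters` are known to be biased for short series, Gaussian innovations are assumed, and all function names and the random stand-in data are hypothetical.

```python
import numpy as np

def ar1_parameters(channel):
    """Naive estimates of the variance and lag-1 autocorrelation of one channel."""
    x = np.asarray(channel, dtype=float)
    x = x - x.mean()
    var = np.mean(x**2)
    gamma = np.sum(x[1:] * x[:-1]) / np.sum(x**2)
    return var, gamma

def ar1_surrogate(n, var, gamma, rng):
    """One AR(1) realization of length n with stationary variance var."""
    noise_std = np.sqrt(var * (1.0 - gamma**2))
    x = np.empty(n)
    x[0] = rng.normal(0.0, np.sqrt(var))  # draw the start from the stationary distribution
    for t in range(1, n):
        x[t] = gamma * x[t - 1] + rng.normal(0.0, noise_std)
    return x

def surrogate_ensemble(data, n_surrogates, rng):
    """Surrogates of shape (n_surrogates, N, L), one AR(1) process per channel."""
    data = np.asarray(data, dtype=float)
    N, L = data.shape
    params = [ar1_parameters(data[:, ch]) for ch in range(L)]
    out = np.empty((n_surrogates, N, L))
    for k in range(n_surrogates):
        for ch, (var, gamma) in enumerate(params):
            out[k, :, ch] = ar1_surrogate(N, var, gamma, rng)
    return out

# Hypothetical stand-in for the S-PC channels: 50 samples, 2 channels.
rng = np.random.default_rng(1)
sample = rng.standard_normal((50, 2))
ensemble = surrogate_ensemble(sample, n_surrogates=5, rng=rng)
print(ensemble.shape)
```

Because each channel receives its own variance and lag-1 autocorrelation, channels representing larger spatial scales can carry longer decorrelation times, which is the scale dependence described above.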
Figure 1 illustrates the data-basis test applied to the synthetic data with an average S/N ratio of 0.1. The error bars span the 5th to 95th percentiles of 1000 realizations of noise. The two periodicities associated with eigenvalue pairs pass the test, but the eigenvalues of the noise background sometimes also "pair up". This occurs when the two periodicities are suppressed as well (not shown), and highlights the dangers inherent in using pairing criteria alone to distinguish oscillations from red noise. We naively expect 5% of the sample eigenvalues (or projections) to lie above the 95th percentile and 5% below the 5th percentile, but many more actually fall below. This is a consequence of the artificial variance compression associated with the data basis.
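The comparison behind the error bars can be sketched as follows, assuming that the power along a basis vector is the summed squared projection of an augmented matrix onto it. The function names and the toy numbers are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def projected_power(basis, X):
    """Power of X along each column of basis, i.e. diag(basis.T @ X @ X.T @ basis)."""
    proj = basis.T @ X
    return np.sum(proj**2, axis=1)

def percentile_test(data_power, surrogate_power, lower=5.0, upper=95.0):
    """Compare data power with percentiles of the surrogate power.

    surrogate_power has shape (n_surrogates, n_vectors).
    """
    lo = np.percentile(surrogate_power, lower, axis=0)
    hi = np.percentile(surrogate_power, upper, axis=0)
    return lo, hi, data_power > hi, data_power < lo

# Toy example: three basis vectors, two-column "augmented matrix".
basis = np.eye(3)
X = np.array([[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]])
data_power = projected_power(basis, X)

# Toy surrogate ensemble: 101 values per basis vector.
surrogate_power = np.tile(np.arange(101.0)[:, None], (1, 3))
lo, hi, above, below = percentile_test(data_power, surrogate_power)
print(data_power, above, below)
```

For a complete orthonormal basis the projected powers sum to the total power of X, so the same routine applies to either the data basis or the null-hypothesis basis.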
Figure 2 shows the projections onto the surrogate basis. Here, the T-PCs are equally spaced in frequency and contain power that decreases monotonically with increasing frequency (compare Vautard and Ghil, 1989). This test is more conservative, but the signal frequencies are clearly picked out. Note that here excursions above the process values are necessarily compensated for by excursions below, because the expected total power of each surrogate is equal to the total power of the sample data.
Figures 3 and 4 illustrate the tests applied to our GCM's equatorial SST anomalies. This model was examined in detail by Robertson et al. (1995), who found oscillatory behavior with ENSO-like dynamics and periods of 50 and 20 months, together with a peak at 9.5 months. The data basis (Fig. 3) does pick out the 50- and 9.5-month peaks, although some other frequencies also pass the test, and again there is a noticeable artificial variance-compression effect. The surrogate basis test (Fig. 4) locates all three periodicities, although the level of significance is barely 90%. This may simply reflect the S/N ratio in this short time series, but further enhancement should be possible by optimizing M. The spatial-scale dependence of the decorrelation times of equatorial SST produces the marked concavity of the spectra in Figs. 3 and 4 at lower frequencies. This is clear by contrast with Figs. 1 and 2, where the noise component of the synthetic data was estimated ignoring that dependence. The 50- and 20-month peaks agree with those detected in analyses of COADS data using SSA (Rasmusson et al. 1990) and M-SSA (Jiang et al. 1992) with different statistical-significance tests. Confirmation of the 9.5-month peak in the GCM results by these two additional tests encourages us to look for it more carefully in the data.
Acknowledgments: This work was supported by INCOR (AWR), and a NOAA Postdoctoral Climate and Global Change Fellowship (MRA).
Figure 1: Data-basis test applied to synthetic data, consisting of AR(1) noise plus sinusoids with 50- and 20-month periods.
Figure 2: As in Fig. 1, but using the null-hypothesis basis, i.e. projecting both the data and the noise surrogates onto an AR(1) basis.
Figure 3: Data-basis test applied to SST time series simulated by the UCLA coupled GCM over the equatorial Pacific.
Figure 4: As in Fig. 3, but using the null-hypothesis basis.