Response to Lewandowsky, Part IV: Heteroscedasticity and Skew

Skewed sampling from heteroscedastic distributions

Here we consider the problems that arise when a heteroscedastic distribution (one where the variance changes along the x-axis) is analysed by a simple straight line fit. We use Monte Carlo simulations to show that skewed sampling from such a distribution will lead to statistically significant trends which are entirely artefactual. Even the minor skew that arises naturally in any random sample can be large enough to generate artefactual trends.

As an illustration, suppose a dataset is sampled evenly from within the blue area in the left-hand plot in Figure 1. The y variable is heteroscedastic with respect to x: the variance of y is higher at low values of x. Regression of y on x gives a horizontal line (dotted line), which is not affected by the uneven variance. Now consider non-random sampling of y, in which more samples are collected at high values of y (shaded area of the right-hand plot). In this case the heteroscedasticity matters: the regression line is no longer horizontal, but is pulled up at the left-hand side, where the spread of y is widest and the surplus of samples at high values of y therefore has the most effect. That is exactly what happened in the LOG13-blogs case.

Figure 1. Schematic showing heteroscedastic distribution (a) evenly and (b) unevenly sampled.

To show that this effect works in practice we generated 1000 points drawn at random from a heteroscedastic distribution (in which the variation in p is a function of q) for two cases: unevenly and evenly sampled.
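
The two cases can be sketched in a few lines of Python. This is only an illustration, not the simulation used for the figures: the variance function (standard deviation of p falling linearly from 1.0 to 0.2 as q goes from 0 to 1), the rejection rule used to skew the sample, and the names `sample_heteroscedastic` and `keep_low` are all assumptions made for the example.

```python
import random

def sample_heteroscedastic(n, keep_low=1.0, seed=0):
    """Draw n (q, p) pairs in which the spread of p shrinks as q increases.

    keep_low is the probability of keeping a point with p < 0 (an assumed
    skewing rule): 1.0 gives even sampling, smaller values over-represent
    high values of p.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        q = rng.uniform(0.0, 1.0)
        sd = 1.0 - 0.8 * q            # assumed: wide at low q, narrow at high q
        p = rng.gauss(0.0, sd)        # symmetric noise, no true trend in p
        if p >= 0.0 or rng.random() < keep_low:
            points.append((q, p))
    return points

even = sample_heteroscedastic(1000, keep_low=1.0, seed=1)
skewed = sample_heteroscedastic(1000, keep_low=0.5, seed=1)
```

With `keep_low=1.0` every point is kept, so the sample is even; with `keep_low=0.5` half of the points below zero are discarded, which skews the sample towards high p without changing the underlying distribution.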

We begin with the unevenly sampled dataset. Figure 2 shows scatter plots in both directions (p predicting q and vice versa), with summary lines drawn using a simple linear fit and using loess.

Figure 2. Uneven (skewed) sampling of a heteroscedastic distribution. Left: linear fits; right: loess fits.

In Figure 2, the simple linear fit appears to reveal a clear linear relationship (left-hand plots), but that is simply a consequence of combining skewed sampling and heteroscedasticity. As the sampling becomes more skewed the linear trend becomes stronger (not shown).
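
The strengthening of the trend with skew can be checked with a small sketch. As before, the variance function, the rejection rule and the names `sample`, `keep_low` and `ols_slope` are assumptions chosen for illustration, not the settings behind Figure 2. The code fits an ordinary least-squares line of p on q for progressively more skewed samples.

```python
import random

def sample(n, keep_low, rng):
    """n (q, p) pairs; points with p < 0 are kept with probability keep_low."""
    pts = []
    while len(pts) < n:
        q = rng.uniform(0.0, 1.0)
        p = rng.gauss(0.0, 1.0 - 0.8 * q)   # assumed: spread of p falls with q
        if p >= 0.0 or rng.random() < keep_low:
            pts.append((q, p))
    return pts

def ols_slope(pts):
    """Ordinary least-squares slope of the second coordinate on the first."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx

rng = random.Random(42)
slopes = {}
for keep_low in (1.0, 0.5, 0.2, 0.0):       # from even to fully one-sided
    slopes[keep_low] = ols_slope(sample(1000, keep_low, rng))
    print(f"keep_low={keep_low:.1f}  slope of p on q = {slopes[keep_low]:+.3f}")
```

Under these particular assumptions the expected slope is -0.8 * sqrt(2/pi) * (1 - k) / (1 + k), where k is `keep_low`: zero for even sampling and increasingly negative as the skew grows, even though p is pure symmetric noise at every value of q.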

The loess fit for p predicting q (top right-hand plot) shows a clear quadratic relationship reflecting the shape of the error distribution. Note that the loess fit for q predicting p does not show a non-linear relationship.

Next we turn to unskewed (uniform) sampling and show the same four plots for the evenly sampled dataset in Figure 3.

Figure 3. Uniform sampling of a heteroscedastic distribution.

When the same heteroscedastic distribution was evenly sampled (Figure 3), the simple linear fits often showed a marginally significant relationship between p and q, even though the underlying relationship was simply symmetric random noise. The weak trend in the ‘unskewed’ data reflects the minor non-uniformity that naturally arises in a random sample (and looks very like the trend exhibited by the LGO13-panel dataset). The direction and significance of this trend varied from run to run as it depends on the random numbers that happened to be sampled each time.
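
The run-to-run variation can be explored with a short Monte Carlo loop. This is a sketch under assumed settings (the same hypothetical variance function as above, 200 runs of 1000 points), not a reproduction of the runs behind Figure 3. It records the sign of each fitted slope and how often the nominal t-statistic exceeds 1.96; that t-statistic assumes constant variance, which these data violate by construction.

```python
import math
import random

def even_sample(n, rng):
    """n evenly sampled (q, p) pairs with spread of p falling as q rises."""
    return [(q, rng.gauss(0.0, 1.0 - 0.8 * q))
            for q in (rng.uniform(0.0, 1.0) for _ in range(n))]

def slope_and_t(pts):
    """OLS slope of p on q and its nominal (constant-variance) t-statistic."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in pts)
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, b / se

rng = random.Random(2024)
runs, n = 200, 1000
positive = negative = flagged = 0
for _ in range(runs):
    b, t = slope_and_t(even_sample(n, rng))
    if b > 0:
        positive += 1
    elif b < 0:
        negative += 1
    if abs(t) > 1.96:
        flagged += 1
print(f"positive slopes: {positive}, negative slopes: {negative}")
print(f"runs with |t| > 1.96: {flagged} of {runs}")
```

How many runs are flagged depends on the assumed variance function and the sample size, so the counts printed here should not be read as an estimate for any real dataset.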

Note that the reason for the skew in the data does not matter (it may correctly or incorrectly reflect the underlying population). Any uneven distribution of data with this type of heteroscedastic error distribution will generate spurious linear trends. The LGO13-panel data showed a (slight) negative trend because there was a slight preponderance of data on the high side of CLIM.
