
Monday, March 15, 2010

Standard deviation is not the same as range of variation


Means and standard deviations can be generated and saved through the “Save grouped descriptive statistics into a tab-delimited .txt file” option of WarpPLS. You can choose a grouping variable, number of groups, and the variables to be grouped. This option is useful if one wants to conduct a comparison of means analysis using the software, where one variable (the grouping variable) is the predictor, and one or more variables are the criteria (the variables to be grouped).
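As a rough illustration of what grouped descriptive statistics look like, here is a minimal Python sketch that computes a mean and a sample standard deviation per group. It is not how WarpPLS implements the option; the function name `grouped_descriptives`, the column names, and the six weight values are all made up for the example, and the grouping is done on a categorical label rather than on a chosen number of groups.

```python
from collections import defaultdict
from statistics import mean, stdev


def grouped_descriptives(rows, group_key, value_key):
    """Return {group: (mean, sample standard deviation)} for one variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return {g: (mean(values), stdev(values)) for g, values in groups.items()}


# Hypothetical data: "phase" is the grouping variable (the predictor),
# "weight_kg" is the variable being grouped (the criterion).
rows = [
    {"phase": "baseline", "weight_kg": 80.0},
    {"phase": "baseline", "weight_kg": 90.0},
    {"phase": "baseline", "weight_kg": 100.0},
    {"phase": "after", "weight_kg": 75.0},
    {"phase": "after", "weight_kg": 85.0},
    {"phase": "after", "weight_kg": 95.0},
]

stats = grouped_descriptives(rows, "phase", "weight_kg")
for group, (m, s) in stats.items():
    print(f"{group}: mean = {m:.1f} kg, standard deviation = {s:.1f} kg")
```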

In comparisons of means analyses, research results are normally expressed in means and standard deviations. For example, the study reviewed in this post states that the weight of participants in a 12-week weight loss study went from 87.9 plus or minus 15.4 kg (at baseline, or before the 12-week intervention) to 81.7 plus or minus 16.2 kg (after the 12-week intervention).

The 87.9 and 81.7 are the average weights (a.k.a. “mean” weights), in kilograms, before and after the 12-week intervention. However, the 15.4 and 16.2 are NOT the ranges of variation in weights around the means before and after the 12-week intervention. They are standard deviations; subtracted from and added to the means, they define the intervals that encompass approximately 68 percent of all of the values measured (see figure below, from www.electrical-res.com).


In the figure above, subtracting 15.4 or 16.2 from the corresponding mean, and adding it to that mean, gives the “mean(x) – s” and “mean(x) + s” points on the horizontal axis of histograms of weights plotted before and after the 12-week intervention. This assumes that the distributions of weights are normal, or quasi-normal (i.e., similar to a bell-shaped, or normal, curve), which is a common assumption in this type of research.
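The arithmetic behind those points can be sketched in a few lines of Python. The means and standard deviations are the ones quoted above; the helper names `one_sd_interval` and `coverage_within` are invented for this example, and the 68 percent figure holds only under the normality assumption just mentioned.

```python
from statistics import NormalDist


def one_sd_interval(mean, sd):
    """Return the (mean - s, mean + s) points on the horizontal axis."""
    return (mean - sd, mean + sd)


def coverage_within(mean, sd, k=1.0):
    """Share of a normal distribution lying within k standard deviations."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


# Values reported in the study: (mean, standard deviation) in kg.
reported = {"baseline": (87.9, 15.4), "after 12 weeks": (81.7, 16.2)}

for label, (m, s) in reported.items():
    low, high = one_sd_interval(m, s)
    share = coverage_within(m, s)
    print(f"{label}: {low:.1f} to {high:.1f} kg covers about {share:.0%}")
```

Under this assumption, roughly a third of the participants would still fall outside each interval, which is why the standard deviation should not be read as a range.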

The larger the standard deviation, the wider the variation in the measures, and the flatter the associated histogram (the bell-shaped curve). This property has a number of interesting implications, some of which will be discussed in other posts.
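One way to see the "flatter" part, assuming an idealized normal curve rather than a real histogram, is to look at the height of the curve at its peak, which equals 1 / (s × sqrt(2π)). The sketch below compares the two standard deviations from the study; `peak_height` is a name made up for this example.

```python
from math import isclose, pi, sqrt
from statistics import NormalDist


def peak_height(sd):
    """Height of a normal density at its mean: 1 / (sd * sqrt(2 * pi))."""
    return 1.0 / (sd * sqrt(2.0 * pi))


narrow = peak_height(15.4)  # baseline standard deviation, in kg
wide = peak_height(16.2)    # standard deviation after the intervention

# Cross-check the closed form against the standard library's density.
assert isclose(narrow, NormalDist(mu=0.0, sigma=15.4).pdf(0.0))

print(f"peak height with s = 15.4: {narrow:.5f}")
print(f"peak height with s = 16.2: {wide:.5f}")
```

The curve with the larger standard deviation has the lower peak, because the same total area is spread over a wider span of weights.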

Sometimes another measure of dispersion, the variance, is reported instead of the standard deviation. The variance is the standard deviation squared.
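A short Python sketch of that relationship follows. The first part squares the standard deviations quoted from the study; the second part uses a made-up three-value sample to confirm that the standard library's `variance` and `stdev` functions are related in the same way.

```python
from math import isclose, sqrt
from statistics import stdev, variance

# Standard deviations reported in the study, in kg.
reported_sds = {"baseline": 15.4, "after 12 weeks": 16.2}

# The variance is the standard deviation squared (units: kg squared).
implied_variances = {label: sd ** 2 for label, sd in reported_sds.items()}
for label, var in implied_variances.items():
    print(f"{label}: variance = {var:.2f} kg^2")

# Hypothetical sample, used only to check the relationship numerically.
sample = [80.0, 90.0, 100.0]
sample_variance = variance(sample)
sample_sd = stdev(sample)
assert isclose(sqrt(sample_variance), sample_sd)
```

Because the variance is in squared units (kg squared here), the standard deviation is usually easier to interpret next to a mean.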

The reason why standard deviations are reported instead of ranges of variation is that outliers (unusually high or low values) can dramatically widen the ranges; a single extreme value is enough to do so. The standard deviation is also affected by outliers, but much less so, because every measured value contributes to it.
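The difference in sensitivity can be checked with a small Python experiment. The weights below are invented (61 evenly spaced values from 70 to 100 kg, plus one 180 kg outlier), so the exact ratios are specific to this made-up sample; with fewer observations the gap between the two measures would be smaller.

```python
from statistics import stdev


def value_range(values):
    """Range of variation: largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical weights in kg: 70.0, 70.5, ..., 100.0 (61 values).
weights = [70 + 0.5 * i for i in range(61)]
with_outlier = weights + [180.0]  # one unusually high value

range_ratio = value_range(with_outlier) / value_range(weights)
sd_ratio = stdev(with_outlier) / stdev(weights)

print(f"range grows by a factor of {range_ratio:.2f}")
print(f"standard deviation grows by a factor of {sd_ratio:.2f}")
```

In this sample the single outlier inflates the range considerably more than it inflates the standard deviation.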
