Thursday, August 28, 2014
Minimum sample size in PLS-SEM, regression, and path analyses
Monte Carlo simulations suggest that the minimum sample size in PLS-SEM can be reliably and conservatively estimated using the inequality below:
N > ( 2.48 / Abs(bm) ) ^ 2
Extensive tests suggest that this also applies to multiple regression and path analyses. In a path analysis, only single-indicator variables are included in the model, even though the model looks a lot like an SEM model.
The inequality above is discussed in the article titled "Minimum sample size estimation in PLS‐SEM: The inverse square root and gamma‐exponential methods". It corresponds to the inverse square root method. The gamma‐exponential method, also discussed in the article, is a refinement of the inverse square root method that relies on much more complex equations.
In the inequality above, N is the required sample size, and Abs(bm) is the absolute value of the path coefficient with the minimum expected magnitude in the model. This inequality assumes that:
- One-tailed P values are used for hypothesis testing. A previous post discusses this issue in more detail.
- The threshold for P values is .05. That is, P values should be equal to or lower than .05.
- Effect sizes (ESs), as calculated by WarpPLS, are also used for hypothesis testing.
- The threshold for ESs is .02. That is, ESs should be equal to or greater than .02.
- Acceptable statistical power is equal to or greater than .8.
- The latent variables in the model are not collinear, when both lateral and vertical collinearity are considered. That is, the full collinearity VIFs calculated by WarpPLS for all latent variables are equal to or lower than 3.3.
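One way to see where the constant in the inequality may come from is to add the standard normal quantiles implied by the P value and power assumptions listed above. This reading is an assumption, as the post does not state the derivation. The function name and its parameters are hypothetical, chosen for illustration only. A minimal sketch in Python:

```python
from statistics import NormalDist


def inverse_sqrt_constant(alpha=0.05, power=0.8):
    """Sum of two standard normal quantiles.

    Assumption (not stated in the post): the constant in the inequality
    is the one-tailed critical z value for the P value threshold plus
    the z value for the desired statistical power.
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.inv_cdf(1 - alpha) + z.inv_cdf(power)


# With the thresholds listed above (P <= .05 one-tailed, power >= .8)
# the sum is about 2.486, close to the 2.48 used in the inequality.
print(round(inverse_sqrt_constant(), 3))
```

Under this reading, a different significance level or power target would change the constant, and therefore the required sample size.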
This inequality highlights the fact that path coefficient strength is a much stronger determinant of statistical power in Monte Carlo simulations than the configuration of the structural model.
The inequality is proposed as an alternative to the widely used (and discredited) "10 times rule". It yields minimum sample sizes that are consistent with Cohen's power tables for multiple regression.
For example, let us say one has a model where the path coefficient with the minimum expected magnitude is .3. Then the required sample size is:
N > ( 2.48 / .3 ) ^ 2 = 68.34
The minimum required sample size is thus:
Nm = 69
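The calculation above can be sketched in a few lines of Python. The function name and its arguments are hypothetical, and the code only evaluates the inequality; it does not check any of the assumptions listed earlier (P value and ES thresholds, power, collinearity).

```python
import math


def min_sample_size(min_abs_path_coef, constant=2.48):
    """Smallest integer N that satisfies N > (constant / Abs(bm)) ^ 2.

    min_abs_path_coef: expected magnitude of the weakest path
    coefficient in the model (its sign is ignored).
    constant: 2.48, as in the inequality discussed in this post.
    """
    bm = abs(min_abs_path_coef)
    if bm == 0:
        raise ValueError("the path coefficient must be nonzero")
    bound = (constant / bm) ** 2
    # The inequality is strict, so take the next integer above the bound.
    return math.floor(bound) + 1


print(min_sample_size(0.3))  # 69, matching the example above

# Weaker expected paths call for larger samples.
for bm in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(bm, min_sample_size(bm))
```

Because the required sample size grows with the inverse square of the coefficient, halving the minimum expected path coefficient roughly quadruples the minimum sample size.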
The above assumes a pre-analysis minimum sample size estimation, where the path coefficient with the minimum expected magnitude is set prior to the analysis.
A post-analysis minimum sample size estimation, on the other hand, would be based on the results of a full PLS-SEM analysis. Generally, pre-analysis estimation is recommended over post-analysis estimation.
The latter, post-analysis estimation, can only confirm that an appropriate sample size was used.