Thursday, August 28, 2014

Minimum sample size in PLS-SEM, regression, and path analyses

Based on Monte Carlo simulations, the minimum sample size in PLS-SEM can be reliably and conservatively estimated based on the inequality below:

N > ( 2.48 / Abs(bm) ) ^ 2

Extensive tests suggest that this also applies to multiple regression and path analyses. In a path analysis, only single-indicator variables are included in the model, even though the model looks a lot like an SEM model.

The inequality above is discussed in the article titled "Minimum sample size estimation in PLS‐SEM: The inverse square root and gamma‐exponential methods". It refers to the inverse square root method. The gamma‐exponential method, also discussed in the article, is a refinement of the inverse square root method that relies on equations that are much more complex.

In the inequality above, N is the required sample size, and Abs(bm) is the absolute value of the path coefficient with the minimum expected magnitude in the model. This inequality assumes that:

- One-tailed P values are used for hypothesis testing. A previous post discusses this issue in more detail.

- The threshold for P values is .05. That is, P values should be equal to or lower than .05.

- Effect sizes (ESs), as calculated by WarpPLS, are also used for hypothesis testing.

- The threshold for ESs is .02. That is, ESs should be equal to or greater than .02.

- Acceptable statistical power is equal to or greater than .8.

- The latent variables in the model are not collinear, when both lateral and vertical collinearity are considered. That is, the full collinearity VIFs calculated by WarpPLS for all latent variables are equal to or lower than 3.3.
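
The constant 2.48 is not derived in this post. One reading that is consistent with the one-tailed .05 threshold and the .8 power level listed above is that it approximates the sum of two standard normal quantiles, one for the significance level and one for the power level. The sketch below only checks that arithmetic; the function name and its parameters are illustrative assumptions, not part of WarpPLS.

```python
from statistics import NormalDist


def inverse_sqrt_constant(alpha=0.05, power=0.8):
    """Sum of the standard normal quantiles for a one-tailed test.

    Assumption (not stated in the post): the constant in the inequality
    is z(1 - alpha) + z(power) for a standard normal distribution.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = standard_normal.inv_cdf(1 - alpha)  # one-tailed critical value
    z_power = standard_normal.inv_cdf(power)      # quantile for desired power
    return z_alpha + z_power


# Under the assumption above, this prints a value close to the 2.48 used here.
print(round(inverse_sqrt_constant(), 3))
```

If this reading is right, changing the P value threshold or the power level would change the constant, which is why the inequality is tied to the assumptions listed above.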

This inequality highlights the fact that path coefficient strength is a much stronger determinant of statistical power in Monte Carlo simulations than the configuration of the structural model.

The inequality is proposed as an alternative to the widely used (and discredited) "10 times rule". It yields minimum sample sizes that are consistent with Cohen's power tables for multiple regression.

For example, let us say one has a model where the path coefficient with the minimum expected magnitude is .3. Then the required sample size is:

N > ( 2.48 / .3 ) ^ 2 = 68.34

The minimum required sample size is thus:

Nm = 69
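
The calculation above can be sketched in a few lines of Python. This is a minimal illustration of the inequality as written in this post, not code taken from WarpPLS; the function name `min_sample_size` and the `constant` parameter are hypothetical.

```python
import math


def min_sample_size(min_abs_path_coef, constant=2.48):
    """Smallest integer N satisfying N > (constant / Abs(bm)) ^ 2.

    min_abs_path_coef: the path coefficient with the minimum expected
    magnitude in the model (its sign is ignored).
    """
    bm = abs(min_abs_path_coef)
    if bm == 0:
        raise ValueError("The path coefficient must be nonzero.")
    bound = (constant / bm) ** 2
    # The inequality is strict, so take the next integer above the bound.
    return math.floor(bound) + 1


# Reproduces the example above: a minimum expected coefficient of .3.
print(min_sample_size(0.3))  # 69
```

Because the bound grows with the inverse square of the coefficient, halving the minimum expected coefficient roughly quadruples the required sample size.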

The above assumes a pre-analysis minimum sample size estimation, where the path coefficient with the minimum expected magnitude is set prior to the analysis.

A post-analysis minimum sample size estimation, on the other hand, would be based on the results of a full PLS-SEM analysis. Generally, pre-analysis estimation is recommended over post-analysis estimation.

The latter, post-analysis estimation, can only confirm that an appropriate sample size was used.
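
A post-analysis check of this kind might look like the sketch below. The path labels and coefficient values are made up for illustration, and the function name is hypothetical. Which coefficients to include is left to the analyst; the code simply takes the smallest absolute value among those it is given.

```python
import math


def post_analysis_check(path_coefs, actual_n, constant=2.48):
    """Compare the sample size used with the inverse square root bound.

    path_coefs: mapping of path label -> estimated path coefficient.
    Returns (minimum absolute coefficient, required N, whether actual_n suffices).
    """
    min_abs = min(abs(b) for b in path_coefs.values())
    if min_abs == 0:
        raise ValueError("A zero coefficient makes the bound undefined.")
    # Strict inequality: the required N is the next integer above the bound.
    required_n = math.floor((constant / min_abs) ** 2) + 1
    return min_abs, required_n, actual_n >= required_n


# Hypothetical results from a completed analysis with 120 cases.
estimated = {"A->C": 0.42, "B->C": -0.25, "C->D": 0.31}
print(post_analysis_check(estimated, actual_n=120))  # (0.25, 99, True)
```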


AM YAKASAI said...

How do I determine Abs(bm)?

Ned Kock said...

Hi Am. Abs(bm) is the absolute value of the path coefficient with the minimum expected magnitude in the model.

Murad Moqbel said...

Thanks Ned!

gayathri said...

Thanks for this, Ned. Can you provide some references to cite for this method of calculating the sample size?

Ned Kock said...

Hi gayathri. To reference, see note under copyright on the left part of this blog.

George M. said...

Thanks for this post and this blog. I found it really useful. I have one question concerning sample size. Is the absolute value of the path coefficient with the minimum expected magnitude the lowest significant beta coefficient we obtain in the causal model?
Thus, if the lowest beta coefficient (significant at .05) is .10, should the minimum sample size be around 615 cases?

darwin said...

Sir Ned,
I just want to ask about the minimum sample size for CB-SEM.

Ned Kock said...

Hi darwin. This is a PLS-SEM blog, focused on WarpPLS. I suggest you pose the question in a CB-SEM forum.

Anonymous said...

Hi, my study has a sample of 800. Could I use WarpPLS?

Ned Kock said...

Generally speaking, N=800 can be considered fairly large. With a sample size this large you may have a problem with false positives: very small effects, which are associated with zero effects at the population level, can turn out to be statistically significant. Therefore, with a large sample like this (N=800) you should also use effect sizes when testing hypotheses.

I hope that the materials linked below can be of use in connection with this.

Kock, N. (2014). Advanced mediating effects tests, multi-group analyses, and measurement model assessments in PLS-based SEM. International Journal of e-Collaboration, 10(3), 1-13.

(For the full text link to the above publication, see under “Publications” at:

User Manual (link to specific page):

The links above, as well as other links that may be relevant in this context, are available from:

Anonymous said...

Thank you for your prompt reply. It really helped me a lot.