## Thursday, August 28, 2014

### Minimum sample size in PLS-SEM, regression, and path analyses

Monte Carlo simulations suggest that the minimum sample size in PLS-SEM can be reliably and conservatively estimated with the inequality below:

N > ( 2.48 / Abs(bm) ) ^ 2

Extensive tests suggest that this also applies to multiple regression and path analyses. In the latter, only single-indicator variables are included in the model, even though it looks a lot like an SEM model.

The inequality above is discussed in the article titled "Minimum sample size estimation in PLS‐SEM: The inverse square root and gamma‐exponential methods". It refers to the inverse square root method. The gamma‐exponential method, also discussed in the article, is a refinement of the inverse square root method that relies on equations that are much more complex.

In the inequality above, N is the required sample size, and Abs(bm) is the absolute value of the path coefficient with the minimum expected magnitude in the model. This inequality assumes that:

- One-tailed P values are used for hypothesis testing. A previous post discusses this issue in more detail.

- The threshold for P values is .05. That is, P values should be equal to or lower than .05.

- Effect sizes (ESs), as calculated by WarpPLS, are also used for hypothesis testing.

- The threshold for ESs is .02. That is, ESs should be equal to or greater than .02.

- Acceptable statistical power is equal to or greater than .8.

- The latent variables in the model are not collinear, when both lateral and vertical collinearity are considered. That is, the full collinearity VIFs calculated by WarpPLS for all latent variables are equal to or lower than 3.3.

This inequality highlights the fact that path coefficient strength is a much stronger determinant of statistical power in Monte Carlo simulations than the configuration of the structural model.

The inequality is proposed as an alternative to the widely used (and discredited) "10 times rule". It yields minimum sample sizes that are consistent with Cohen's power tables for multiple regression.

For example, let us say one has a model where the path coefficient with the minimum expected magnitude is .3. Then the required sample size is:

N > ( 2.48 / .3 ) ^ 2 = 68.34

The minimum required sample size is thus:

Nm = 69

The above assumes a pre-analysis minimum sample size estimation, where the path coefficient with the minimum expected magnitude is set prior to the analysis.

A post-analysis minimum sample size estimation, on the other hand, would be based on the results of a full PLS-SEM analysis. Generally, pre-analysis estimation is recommended over post-analysis estimation; the latter can only confirm that an appropriate sample size was used.
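The inverse square root calculation above can be sketched in a few lines of Python. This is only an illustration of the inequality as stated in this post; the function name `min_sample_size` and its parameters are hypothetical, and it is not code from WarpPLS.

```python
import math


def min_sample_size(min_path_coefficient, constant=2.48):
    """Return the smallest integer N satisfying N > (constant / Abs(bm)) ^ 2.

    min_path_coefficient is the path coefficient with the minimum expected
    magnitude in the model (bm); its sign is ignored.
    """
    bm = abs(min_path_coefficient)
    if bm == 0:
        raise ValueError("the path coefficient must be nonzero")
    bound = (constant / bm) ** 2
    # The inequality is strict, so take the next integer above the bound.
    return math.floor(bound) + 1


print(min_sample_size(0.3))  # 69, matching Nm in the example above
```

Because the bound grows with the inverse square of the coefficient, halving the minimum expected path coefficient roughly quadruples the required sample size.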

## Monday, July 21, 2014

### One-tailed or two-tailed P values in PLS-SEM?

Should P values associated with path coefficients, as well as with other coefficients such as weights and loadings, be one-tailed or two-tailed? This question is addressed through a publication recently released as a ScriptWarp Systems research report:

Kock, N. (2014). *One-tailed or two-tailed P values in PLS-SEM?* Laredo, TX: ScriptWarp Systems.

PDF file:

http://www.scriptwarp.com/warppls/pubs/Kock_2014_OneTwoTailedPLSSEM.pdf

Abstract:

Should P values associated with path coefficients, as well as with other coefficients such as weights and loadings, be one-tailed or two-tailed? This question is answered in the context of structural equation modeling employing the partial least squares method (PLS-SEM), based on an illustrative model of the effect of e-collaboration technology use on job performance. A one-tailed test is recommended if the coefficient is assumed to have a sign (positive or negative), which should be reflected in the hypothesis that refers to the corresponding association. If no assumptions are made about coefficient sign, a two-tailed test is recommended. These recommendations apply to many other statistical methods that employ P values; including path analyses in general, with or without latent variables, plus univariate and multivariate regression analyses.
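As a rough illustration of the distinction, the sketch below converts a T ratio (a coefficient divided by its standard error) into one-tailed and two-tailed P values. It assumes a standard normal approximation for the T ratio; the function names and the `expected_sign` parameter are hypothetical, and this is not how WarpPLS itself calculates P values.

```python
import math


def one_tailed_p(t_ratio, expected_sign=1):
    """P value when the hypothesis states the coefficient's sign.

    expected_sign is +1 for a hypothesized positive coefficient and -1 for
    a hypothesized negative one (an assumption of this sketch).
    """
    z = t_ratio * expected_sign
    # Upper-tail area of the standard normal distribution beyond z.
    return 0.5 * math.erfc(z / math.sqrt(2))


def two_tailed_p(t_ratio):
    """P value when no assumption is made about the coefficient's sign."""
    # Area in both tails beyond |t_ratio|.
    return math.erfc(abs(t_ratio) / math.sqrt(2))


t = 1.8
print(one_tailed_p(t), two_tailed_p(t))
```

With a T ratio of 1.8 in the hypothesized direction, the one-tailed P value falls below .05 while the two-tailed P value, which is twice as large, does not.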

## Sunday, June 1, 2014

### PLS Applications Symposium; 30 May - 1 June 2014; Montreal, Canada

(Abstract submissions accepted: 1 August 2013 - 1 February 2014)

*** Only abstracts are needed for the submissions ***

The partial least squares (PLS) method has increasingly been used in a variety of fields of research and practice, particularly in the context of PLS-based structural equation modeling (SEM).

As an emerging method, its users often face challenges in successfully publishing PLS-based research, hence the theme of this year's Symposium: Successfully publishing PLS-based research.

The focus of this Symposium is on the application of PLS-based methods, from a multidisciplinary perspective. For types of submissions, deadlines, and other details, please visit the Symposium’s web site:

http://plsas.net

Ned Kock, Ph.D.

WarpPLS Developer

http://nedkock.com

## Thursday, May 22, 2014

### Dichotomous variables

Models with dichotomous variables can be tested with WarpPLS. Based on preliminary Monte Carlo simulations, the following combination of algorithm and P value calculation method seems to be the most advisable: PLS regression and stable.

A model with a dichotomous dependent variable can also be tested with WarpPLS; another technique that can be used is logistic regression, which is a variation of ordinary multiple regression.

Below is a model with a dichotomous dependent variable - Effe. The variable assumes two values, 0 or 1, to reflect low or high levels of "effectiveness".

The graph below shows the expected values of Effe given Effi. The latter is one of the LVs that point at Effe in the model. The values of Effe and Effi are unstandardized.

Arguably a model with a dichotomous dependent variable cannot be viably tested with ordinary multiple regression because the dependent variable is not normally distributed (as it assumes only two values).

The graph below shows a histogram with the distribution of values of Effe. This variable's skewness is -0.423 and excess kurtosis is -1.821.

This is not a problem for WarpPLS, because P values are calculated via nonparametric techniques that do not assume in their underlying design that any variables in the model meet parametric expectations, such as the expectations of univariate and multivariate unimodality and normality.
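To see why a 0/1 variable departs from normality, the sketch below computes the skewness and excess kurtosis of a dichotomous variable from the proportion of 1s, using the population formulas for a two-valued (Bernoulli) variable. The function name is hypothetical, and the proportion 0.6034 is back-solved here for illustration; it is not a figure reported in this post.

```python
import math


def dichotomous_shape(proportion_of_ones):
    """Return (skewness, excess kurtosis) of a variable coded 0 or 1."""
    p = proportion_of_ones
    if not 0 < p < 1:
        raise ValueError("the proportion must be strictly between 0 and 1")
    variance = p * (1 - p)
    skewness = (1 - 2 * p) / math.sqrt(variance)
    excess_kurtosis = (1 - 6 * variance) / variance
    return skewness, excess_kurtosis


skew, kurt = dichotomous_shape(0.6034)
print(round(skew, 3), round(kurt, 3))
```

A proportion of about 60% of cases coded as 1 gives values close to the skewness and excess kurtosis quoted above. Under these formulas the excess kurtosis of a dichotomous variable is never below -2, a value reached only when the two categories are equally frequent.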

If a dependent variable refers to a probability, and is expected to be associated with a predictor according to a logistic function, you should use the Warp3 or Warp3 basic inner model algorithms to relate the two variables.

## Friday, March 14, 2014

### How do I conduct a robust path analysis?

What if a researcher has only one measure for each latent variable, and still wants to perform a “robust” analysis where no parametric assumptions (e.g., univariate or multivariate normality) are made beforehand?

This would call for a new robust multivariate analysis approach – a robust path analysis. In it, the variables in the structural model would not be “latent”, and thus other assessments would have to be performed in place of a confirmatory factor analysis.

An article illustrating a robust path analysis with WarpPLS is available. To the best of our knowledge, this is the first published article employing this type of analysis. The full reference, link to full text PDF file, and abstract for the article are available below.

Kock, N., & Gaskins, L. (2014). The mediating role of voice and accountability in the relationship between Internet diffusion and government corruption in Latin America and Sub-Saharan Africa. *Information Technology for Development*, 20(1), 23-43.

PDF file:

http://www.scriptwarp.com/warppls/pubs/Kock_Gaskins_2014_ITD_NetCorrup.pdf

*We examine relationships among Internet diffusion, voice and accountability, and government corruption based on data from 24 Latin American and 23 sub-Saharan African countries from 2006 to 2010. Our study suggests that greater levels of Internet diffusion are associated with greater levels of voice and accountability and that greater levels of voice and accountability are associated with lower levels of government corruption. Also, there seems to be an overall relationship between Internet diffusion and government corruption, which is primarily indirect and mediated by voice and accountability. Our study builds on modernization theory, and employs the method of robust path analysis, implemented through the software WarpPLS. Policy-makers in developing countries aiming at increasing voice and accountability at the national level, and thus the degree to which their citizens participate in the country’s governance, should strongly consider initiatives that broaden Internet access in their countries.*

## Friday, February 28, 2014

### Using data labels to discover moderating effects in PLS-based structural equation modeling

How can one discover moderating effects with data labels? This question is addressed through the article below:

Kock, N. (2014). Using data labels to discover moderating effects in PLS-based structural equation modeling. International Journal of e-Collaboration, 10(4), 1-14.

http://cits.tamiu.edu/kock/pubs/journals/2014JournalIJeC2/Kock_2014_IJeC_UsingDataLabelsMod.pdf

This publication refers to a sample dataset, with data and data labels, illustrating a moderating effect. This dataset is linked below as a .xlsx file. The data was created based on a Monte Carlo simulation.

http://www.scriptwarp.com/warppls/data/Kock_2014_ECollabModStudyData.xlsx

Another approach to discover moderating effects is a full latent growth analysis.

Sometimes the actual inclusion of moderating variables and corresponding links in a model leads to problems, such as increases in collinearity levels and the emergence of instances of Simpson’s paradox. The WarpPLS menu option “Explore full latent growth”, available starting in version 6.0, allows you to completely avoid these problems and estimate the effects of a latent variable or indicator on all of the links in a model (all at once), without actually including the variable in the model. Moreover, growth in coefficients associated with links among different latent variables, and between a latent variable and its indicators, can be estimated, which allows for measurement invariance tests applied to loadings and/or weights.

Related YouTube video:

Explore Full Latent Growth in WarpPLS

http://youtu.be/x_2e8DVyRhE

## Sunday, January 19, 2014

### WarpPLS 4.0 upgraded to stable

Dear colleagues:

Version 4.0 of WarpPLS is now available as a stable version. You can download and install it for a free trial from:

http://warppls.com

This version was initially released as a beta version and was later upgraded to stable. It underwent extensive in-house testing prior to its release as a beta version, and was in the hands of users for several months prior to its upgrade to stable.

The full User Manual is also available for download from the web site above separately from the software. See this document, and the link below to a previous post, for more details about this new version.

http://bit.ly/HjSknv

Enjoy!
