Tuesday, May 18, 2010

Selecting among different WarpPLS analysis algorithms

First, a quick recap of some issues already discussed in previous posts. WarpPLS offers the following analysis algorithms: Warp3 PLS Regression, Warp2 PLS Regression, PLS Regression, and Robust Path Analysis.

Many relationships in nature, including relationships involving behavioral variables, are nonlinear and follow a pattern known as a U-curve (or inverted U-curve). In this pattern, one variable affects another in a way that leads to a maximum or minimum value, where the effect is either maximized or minimized, respectively. This type of relationship is also referred to as a J-curve pattern, a term that is more commonly used in economics and the health sciences.

The Warp2 PLS Regression algorithm tries to identify a U-curve relationship between latent variables, and, if that relationship exists, the algorithm transforms (or “warps”) the scores of the predictor latent variables so as to better reflect the U-curve relationship in the estimated path coefficients in the model. The Warp3 PLS Regression algorithm, the default algorithm used by the software, tries to identify a relationship defined by a function whose first derivative is a U-curve. This type of relationship follows a pattern that is more similar to an S-curve (or a somewhat distorted S-curve), and can be seen as a combination of two connected U-curves, one of which is inverted.
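To make the idea of warping more concrete, here is a minimal sketch in Python with NumPy. It is not the WarpPLS implementation. It assumes synthetic latent variable scores and uses a plain quadratic fit (`np.polyfit`) as a stand-in for the U-curve search, simply to show how transforming the predictor scores can change the estimated path coefficient. The variable names and the data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def standardize(v):
    """Rescale a vector to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std()

# Synthetic latent variable scores with a U-curve relationship (assumed data).
x = standardize(rng.normal(size=500))
y = standardize(x**2 + 0.3 * rng.normal(size=500))

# Linear path coefficient: with a single standardized predictor,
# this equals the correlation between predictor and criterion.
beta_linear = np.corrcoef(x, y)[0, 1]

# "Warp" the predictor: fit a quadratic (a U-curve) and use the
# standardized fitted values as the transformed predictor scores.
coeffs = np.polyfit(x, y, deg=2)
x_warped = standardize(np.polyval(coeffs, x))
beta_warped = np.corrcoef(x_warped, y)[0, 1]

print(f"linear: {beta_linear:.3f}  warped: {beta_warped:.3f}")
```

Changing `deg=2` to `deg=3` gives a rough analogue of the S-curve case, since the first derivative of a cubic is a quadratic, that is, a U-curve.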

The PLS Regression algorithm does not perform any warping of relationships. It is essentially a standard PLS regression algorithm, whereby indicators’ weights, loadings and factor scores (a.k.a. latent variable scores) are calculated based on a least squares minimization sub-algorithm, after which path coefficients are estimated using a robust path analysis algorithm. A key criterion for the calculation of the weights, observed in virtually all PLS-based algorithms, is that the regression equation expressing the relationship between the indicators and the factor scores has an error term that equals zero. In other words, the factor scores are calculated as exact linear combinations of their indicators. PLS regression is the underlying weight calculation algorithm used in both Warp3 and Warp2 PLS Regression. The warping takes place during the estimation of path coefficients, after all weights and loadings in the model have been estimated. The weights and loadings of a model with latent variables make up what is often referred to as the outer model, whereas the path coefficients among latent variables make up what is often called the inner model.
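The zero-error-term criterion can be checked numerically. The sketch below assumes three synthetic indicators and a set of hypothetical weights (in a real analysis the weights come out of the PLS estimation, which is not reproduced here). It shows that when factor scores are built as a weighted sum of the indicators, regressing the scores back on the indicators recovers the weights and leaves no residual.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three hypothetical indicators of one latent variable, standardized by column.
raw = rng.normal(size=(200, 1)) + 0.5 * rng.normal(size=(200, 3))
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0)

# Hypothetical weights, rescaled so that the factor scores have unit variance.
w = np.array([0.40, 0.35, 0.38])
w = w / (Z @ w).std()

# Factor scores as an exact linear combination of the indicators.
scores = Z @ w

# Regress the scores on the indicators: the weights are recovered
# and the error term is zero (up to floating-point rounding).
w_hat, *_ = np.linalg.lstsq(Z, scores, rcond=None)
residual = scores - Z @ w_hat

print("largest absolute residual:", np.abs(residual).max())
```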

Finally, the Robust Path Analysis algorithm is a simplified algorithm in which factor scores are calculated by averaging all of the indicators associated with a latent variable; that is, in this algorithm weights are not estimated through PLS regression. This algorithm is called “Robust” Path Analysis, because, as with most robust statistics methods, the P values are calculated through resampling. If all latent variables are measured with single indicators, the Robust Path Analysis and the PLS Regression algorithms will yield identical results.
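A rough sketch of this simplified approach follows. It assumes synthetic data, a single predictor, and one particular resampling scheme (a bootstrap standard error turned into a one-tailed P value through a normal approximation). The function names are made up for illustration, and the exact P value calculation used by the software may differ.

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def standardize(v):
    return (v - v.mean()) / v.std()

def averaged_scores(indicators):
    """Factor scores as the standardized average of the indicators (no PLS weights)."""
    return standardize(indicators.mean(axis=1))

def path_coefficient(x, y):
    """Standardized coefficient for one predictor, which equals the correlation."""
    return np.corrcoef(x, y)[0, 1]

def bootstrap_p_value(x, y, n_resamples=500):
    """One-tailed P value from estimate / bootstrap SE (normal approximation, assumed)."""
    n = len(x)
    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        estimates[i] = path_coefficient(x[idx], y[idx])
    ratio = abs(path_coefficient(x, y)) / estimates.std(ddof=1)
    return 0.5 * math.erfc(ratio / math.sqrt(2))

# Synthetic data: two latent variables, each measured by three indicators.
n = 300
lx = rng.normal(size=n)
ly = 0.5 * lx + rng.normal(size=n)
X_ind = lx[:, None] + 0.5 * rng.normal(size=(n, 3))
Y_ind = ly[:, None] + 0.5 * rng.normal(size=(n, 3))

x, y = averaged_scores(X_ind), averaged_scores(Y_ind)
beta = path_coefficient(x, y)
p = bootstrap_p_value(x, y)
print(f"beta = {beta:.3f}, P = {p:.4g}")
```

With a single indicator, `averaged_scores` simply returns the standardized indicator, which is why weighting and averaging lead to the same scores in that case.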

Okay, so what algorithm should you use?

Generally it will be one of these: Warp3 PLS Regression, Warp2 PLS Regression, or PLS Regression. Only in rare instances will the Robust Path Analysis algorithm be the best choice.

If you analyze your dataset using different algorithms (e.g., Warp3 PLS Regression, Warp2 PLS Regression, and PLS Regression), usually the “best” algorithm will be the one leading to the most stable path coefficients. The most stable path coefficients are the ones with the lowest P values, whether the P values are obtained through bootstrapping or jackknifing. The best algorithm will also be the one leading to the highest average R-squared (ARS).

Another important consideration is theory. Does the theory underlying a hypothesized relationship between latent variables support the expectation of a U-curve or S-curve relationship? If the theory supports the expectation of a U-curve relationship, but not of an S-curve relationship, then you should favor Warp2 PLS Regression over Warp3 PLS Regression, even if the latter leads to the most stable path coefficients (i.e., with the lowest P values).
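The selection logic described above can be summarized in a small helper. This is only a sketch of one way to combine the criteria: it assumes you have already run each algorithm and recorded its ARS and path coefficient P values by hand. The function name, the dictionary layout, and the choice to rank by ARS first and mean P value second are assumptions made for illustration, and the numbers are placeholders rather than output from a real analysis.

```python
def choose_algorithm(results, theory_allows):
    """Pick the algorithm with the highest ARS (ties broken by lowest mean P value),
    considering only the algorithms that the underlying theory supports."""
    candidates = {name: r for name, r in results.items() if name in theory_allows}
    if not candidates:
        raise ValueError("no algorithm is consistent with the theory")

    def rank(name):
        r = candidates[name]
        mean_p = sum(r["p_values"]) / len(r["p_values"])
        return (-r["ars"], mean_p)  # higher ARS first, then lower mean P

    return min(candidates, key=rank)

# Placeholder values for illustration only.
results = {
    "Warp3 PLS Regression": {"ars": 0.42, "p_values": [0.001, 0.004]},
    "Warp2 PLS Regression": {"ars": 0.39, "p_values": [0.002, 0.010]},
    "PLS Regression": {"ars": 0.31, "p_values": [0.020, 0.045]},
}

# Theory supports a U-curve but not an S-curve, so Warp3 is excluded.
best = choose_algorithm(results, {"Warp2 PLS Regression", "PLS Regression"})
print(best)
```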