Showing posts with label PLS regression. Show all posts

Friday, October 21, 2022

Dichotomous variables


There are three ways in which a model with an endogenous dichotomous variable can be analyzed in PLS-SEM: via the logistic regression variables technique, without any additional treatment, or via the conditional probabilistic queries technique. Below we discuss the first two options.

Logistic regression variables technique

Starting in version 8.0 of WarpPLS, the menu option “Explore logistic regression” allows you to create a logistic regression variable as a new indicator that has both unstandardized and standardized values. Logistic regression is normally used to convert an endogenous variable on a non-ratio scale (e.g., dichotomous) into a variable reflecting probabilities. You need to choose the variable to be converted, which should be an endogenous variable, and its predictors. 

The new logistic regression variable is meant to be used as a replacement for the endogenous variable on which it is based. Two algorithms are available: probit and logit. The former is recommended for dichotomous variables; the latter for non-ratio variables where the number of different values (a.k.a. “distinct observations”) is greater than 2 but still considerably smaller than the sample size, e.g., 10 different values over a sample size of 100. The unstandardized values of a logistic regression variable are probabilities, ranging from 0 to 1.
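As an illustration of what this conversion does, the minimal numpy sketch below fits a logit model and maps a predictor onto probabilities that necessarily fall between 0 and 1. This is not WarpPLS’s actual implementation, and all variable names are hypothetical:

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Fit a logit model by Newton-Raphson (IRLS); returns coefficients."""
    X = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # predicted probabilities
        W = p * (1.0 - p)                          # IRLS weights
        grad = X.T @ (y - p)
        hess = X.T @ (X * W[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x + rng.normal(size=200) > 0).astype(float)   # dichotomous criterion

beta = fit_logit(x[:, None], y)
# Unstandardized values of the new variable: probabilities in (0, 1)
probs = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))
```

The resulting `probs` vector plays the role of the logistic regression variable’s unstandardized values described above.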

Since a logistic regression variable can be severely collinear with its predictors, you can set a local full collinearity VIF cap for the logistic regression variable. Predictor-criterion collinearity, or lateral collinearity, is rarely assessed or controlled in classic logistic regression algorithms. 
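The kind of collinearity check referred to above can be sketched with an illustrative numpy implementation, assuming only the standard definition VIF = 1/(1 − R²); this is not WarpPLS’s internal code:

```python
import numpy as np

def vif(block):
    """Full collinearity VIFs: regress each column on all the others
    and return 1 / (1 - R^2) for each column."""
    block = (block - block.mean(0)) / block.std(0)
    vifs = []
    for j in range(block.shape[1]):
        y = block[:, j]
        X = np.delete(block, j, axis=1)
        X = np.column_stack([np.ones(len(y)), X])  # intercept
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)
```

Because the criterion is included in the block alongside its predictors, a severely collinear logistic regression variable would show up here as a very large VIF, which is what a local VIF cap guards against.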

For more on this topic, see the links below.

Explore Logistic Regression in WarpPLS

Using Logistic Regression in PLS-SEM with Composites and Factors

Without any additional treatment

Models with dichotomous endogenous variables can also be tested with WarpPLS without any additional treatment, although this may lead to path coefficient suppression as well as other problems. Based on preliminary Monte Carlo simulations, the following combination of algorithm and P value calculation method seems to be the most advisable, if users want to analyze models with composite-based algorithms: PLS regression and stable.

Below is a model with a dichotomous dependent variable, Effe. This variable assumes two values, 0 or 1, reflecting low or high levels of "effectiveness".



The graph below shows the expected values of Effe given Effi. The latter is one of the LVs that point at Effe in the model. The values of Effe and Effi are unstandardized.



Arguably a model with a dichotomous dependent variable cannot be viably tested with ordinary multiple regression because the dependent variable is not normally distributed (as it assumes only two values).

The graph below shows a histogram with the distribution of values of Effe. This variable's skewness is -0.423 and excess kurtosis is -1.821.
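As a quick check, figures close to those reported are reproduced by a dichotomous variable in which roughly 60 percent of the cases are coded 1 (a hypothetical proportion used only for illustration):

```python
import numpy as np

def skew_kurt(x):
    """Population skewness and excess kurtosis of a numeric vector."""
    d = x - x.mean()
    m2, m3, m4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    return m3 / m2**1.5, m4 / m2**2 - 3.0

# 0/1 variable with 60% ones (hypothetical proportion)
effe = np.array([0] * 40 + [1] * 60, dtype=float)
skew, ex_kurt = skew_kurt(effe)   # ≈ -0.408 and ≈ -1.833
```

The large negative excess kurtosis is typical of dichotomous variables, whose distributions are as far from unimodal-normal as a distribution can be.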



This is not a problem for WarpPLS, because P values are calculated via nonparametric techniques whose underlying design does not assume that any variables in the model meet parametric expectations, such as univariate and multivariate unimodality and normality.

If a dependent variable refers to a probability (as in logistic regression), and is expected to be associated with a predictor according to a logistic function, you should use the Warp3 or Warp3 basic inner model algorithms to relate the two variables.

Sunday, February 5, 2012

New PLS-based SEM email distribution list

A new email distribution list is available for those who share a common interest in partial least squares (PLS) regression and its use in structural equation modeling (SEM). To check it out click here.

Tuesday, June 28, 2011

WarpPLS’ treatment of formative latent variables: PLS regression is more conservative and stable


For a more detailed discussion of the issues addressed in this post, please see the publication below.

Kock, N., & Mayfield, M. (2015). PLS-based SEM algorithms: The good neighbor assumption, collinearity, and nonlinearity. Information Management and Business Review, 7(2), 113-130.

***

WarpPLS implements a number of composite-based and factor-based algorithms. One of the composite-based algorithms employs Wold’s original “PLS regression” algorithm to calculate indicator weights, for both formative and reflective variables. PLS regression was developed by Wold, and is slightly different from the modified versions often referred to as modes A and B, which are the ones normally used in other publicly available PLS-based structural equation modeling software. These modified versions implement an underlying algorithmic assumption that Lohmöller called the "good neighbor" assumption, whereby indicator weights are influenced by inner model links.

Generally speaking, the PLS regression algorithm generates coefficients that are more stable and robust – i.e., reliable for hypothesis testing. It also tends to minimize collinearity. On the other hand, it may lead to a higher demand for computational power in some cases, which may be the reason why the modified versions were implemented; personal computers were not that powerful in the 1980s. Lohmöller discusses multiple algorithm versions, with some characteristics placing them within broad types called “modes” – see Lohmöller (1989), the PLS "bible", for more details.

Moreover, the type of nonlinear treatment employed by WarpPLS is difficult to perform with Lohmöller’s underlying algorithm (the "good neighbor" assumption), whereby the outer model is influenced by the inner model. The problem is that with Lohmöller’s algorithm, as a model changes, the weights and loadings also change, even if the latent variables do not change. That is, with Lohmöller’s algorithm, two models with the same latent variables but different structures (i.e., links among latent variables) will have different weights and loadings.

The weights of formative latent variables will be essentially the same in WarpPLS as they would be if the variables were defined as reflective. That is, they will be obtained by an iterative algorithm that stops when two conditions are met: (a) the weights between indicators and latent variable are standardized partial regression coefficients calculated with the indicators as independent variables and the latent variable as the dependent variable; and (b) the regression equation expressing the latent variable as a combination of the indicators has an error term of zero.
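Conditions (a) and (b) can be verified directly for any score built as a linear combination of its indicators. The sketch below (plain numpy, with simulated data; not the software’s actual iterative algorithm) regresses such a score on its standardized indicators and confirms that the error term is zero:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X = (X - X.mean(0)) / X.std(0)          # standardized indicators

lv = X.mean(1)                          # score: a combination of the indicators
lv = (lv - lv.mean()) / lv.std()        # standardized latent variable score

# (a) weights = partial regression coefficients, indicators -> latent variable
w = np.linalg.lstsq(X, lv, rcond=None)[0]

# (b) the regression error term is zero: the score is an exact
#     linear combination of the indicators
err = lv - X @ w
```

At convergence of the actual algorithm, `w` holds the indicator weights and `err` is a vector of zeros up to machine precision.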

So why should the user define a latent variable as formative or reflective? The reason lies in the interpretation of the outputs generated by the software. When a latent variable is formative, both the P values for the weights and the variance inflation factors for the indicators should generally be low; ideally below 0.05 and 2.5, respectively.

True formative variables are fundamentally different from true reflective variables, although there are cases that can be seen as “in between” formative and reflective. True formative and reflective variables behave differently, whether the software treats them differently or not. For example, with true formative variables you would expect the indicators to be significantly associated with the scores of their respective latent variable, which is indicated by low P values for their weights. However, you would not normally expect the indicators to be redundant, which is indicated by low variance inflation factors for the indicators.

The way formative variables are treated in Lohmöller’s approach leads to unstable weights, with the signs of weights frequently changing in the resample set. See Temme et al. (2006) for a discussion of this phenomenon. Lohmöller’s approach also leads to “lateral” collinearity, or collinearity between predictor and criterion latent variables. This “stealth” type of collinearity often leads to inflated path coefficients for links involving formative latent variables.

Formative variables don't "become reflective", or vice-versa, if one or another algorithm is used. This is a common misconception among users of PLS-based SEM software.

References

Kock, N., & Mayfield, M. (2015). PLS-based SEM algorithms: The good neighbor assumption, collinearity, and nonlinearity. Information Management and Business Review, 7(2), 113-130.

Lohmöller, J.-B. (1989). Latent variable path modeling with partial least squares. Heidelberg, Germany: Physica-Verlag.

Temme, D., Kreis, H., & Hildebrandt, L. (2006). PLS path modeling – A software review. Berlin, Germany: Institute of Marketing, Humboldt University Berlin.

Thursday, January 28, 2010

Viewing and changing settings in WarpPLS 1.0 and 2.0


The blog post below refers to versions 1.0 and 2.0 of WarpPLS. See this YouTube video on how to view and change settings for version 3.0. For more recent versions, see the WarpPLS User Manual and YouTube videos available from warppls.com.

***

The view or change settings window (see figure below, click on it to enlarge) allows you to select an algorithm for the SEM analysis, select a resampling method, and select the number of resamples used, if the resampling method selected was bootstrapping. The analysis algorithms available are Warp3 PLS Regression, Warp2 PLS Regression, PLS Regression, and Robust Path Analysis.


Many relationships in nature, including relationships involving behavioral variables, are nonlinear and follow a pattern known as a U-curve (or inverted U-curve). In this pattern one variable affects another in a way that leads to a point at which the effect is minimized (U-curve) or maximized (inverted U-curve). This type of relationship is also referred to as a J-curve pattern, a term that is more commonly used in economics and the health sciences.

The Warp2 PLS Regression algorithm tries to identify a U-curve relationship between latent variables, and, if that relationship exists, the algorithm transforms (or “warps”) the scores of the predictor latent variables so as to better reflect the U-curve relationship in the estimated path coefficients in the model. The Warp3 PLS Regression algorithm, the default algorithm used by the software, tries to identify a relationship defined by a function whose first derivative is a U-curve. This type of relationship follows a pattern that is more similar to an S-curve (or a somewhat distorted S-curve), and can be seen as a combination of two connected U-curves, one of which is inverted.
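The idea behind warping can be illustrated with a simple quadratic (U-curve) transformation of the predictor. This is only a conceptual sketch with simulated data, not the warping functions actually used by WarpPLS:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = x**2 + 0.5 * rng.normal(size=300)      # U-curve relationship

# A linear coefficient misses most of the relationship
r_linear = np.corrcoef(x, y)[0, 1]

# "Warp" x with a fitted quadratic, then estimate the
# association on the transformed (warped) scores
coefs = np.polyfit(x, y, deg=2)
x_warped = np.polyval(coefs, x)
x_warped = (x_warped - x_warped.mean()) / x_warped.std()
r_warped = np.corrcoef(x_warped, y)[0, 1]  # much stronger association
```

The path coefficient estimated on the warped scores reflects the underlying U-curve, whereas the linear coefficient is close to zero even though the relationship is strong.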

The PLS Regression algorithm does not perform any warping of relationships. It is essentially a standard PLS regression algorithm, whereby indicators’ weights, loadings and factor scores (a.k.a. latent variable scores) are calculated based on a least squares minimization sub-algorithm, after which path coefficients are estimated using a robust path analysis algorithm. A key criterion for the calculation of the weights, observed in virtually all PLS-based algorithms, is that the regression equation expressing the relationship between the indicators and the factor scores has an error term that equals zero. In other words, the factor scores are calculated as exact linear combinations of their indicators.

PLS regression is the underlying weight calculation algorithm used in both Warp3 and Warp2 PLS Regression. The warping takes place during the estimation of path coefficients, after all weights and loadings in the model have been estimated. The weights and loadings of a model with latent variables make up what is often referred to as the outer model, whereas the path coefficients among latent variables make up what is often called the inner model.

Finally, the Robust Path Analysis algorithm is a simplified algorithm in which factor scores are calculated by averaging all of the indicators associated with a latent variable; that is, in this algorithm weights are not estimated through PLS regression. This algorithm is called “Robust” Path Analysis, because, as with most robust statistics methods, the P values are calculated through resampling. If all latent variables are measured with single indicators, the Robust Path Analysis and the PLS Regression algorithms will yield identical results.

One of two resampling methods may be selected: bootstrapping or jackknifing. Bootstrapping, the software’s default, is a resampling algorithm that creates a number of resamples (a number that can be selected by the user) by a method known as “resampling with replacement”. This means that each resample contains a random arrangement of the rows of the original dataset, where some rows may be repeated. (The commonly used analogy of a deck of cards being reshuffled, leading to many resample decks, is a good one, but not entirely correct, because in bootstrapping the same card may appear more than once in each of the resample decks.) Jackknifing, on the other hand, creates a number of resamples that equals the original sample size, and each resample has one row removed. That is, the sample size of each resample is the original sample size minus 1. Thus, the choice of number of resamples has no effect on jackknifing, and is only relevant in the context of bootstrapping.
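The two resampling schemes can be sketched as follows. This is an illustrative numpy example, with a correlation standing in for a path coefficient; the jackknife standard error uses the usual (n − 1)/n inflation factor:

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(100, 2))
data[:, 1] += 0.5 * data[:, 0]                  # correlated columns

def path_coef(d):
    """Statistic of interest: correlation between the two columns."""
    return np.corrcoef(d[:, 0], d[:, 1])[0, 1]

# Bootstrapping: resample rows WITH replacement (rows may repeat)
boot = np.array([path_coef(data[rng.integers(0, len(data), len(data))])
                 for _ in range(500)])

# Jackknifing: exactly n resamples, each with one row removed
jack = np.array([path_coef(np.delete(data, i, axis=0))
                 for i in range(len(data))])

se_boot = boot.std(ddof=1)
n = len(data)
se_jack = np.sqrt((n - 1) / n * ((jack - jack.mean())**2).sum())
```

Note that the bootstrap loop runs a user-chosen number of times (500 here), whereas the jackknife loop necessarily runs exactly n times, which is why the number-of-resamples setting only matters for bootstrapping.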