
Friday, October 21, 2022

Model with endogenous dichotomous variable


There are two main ways in which a model with an endogenous dichotomous variable can be analyzed in PLS-SEM: via the logistic regression variables technique and via the conditional probabilistic queries technique.

Logistic regression variables technique

Starting in version 8.0 of WarpPLS, the menu option “Explore logistic regression” allows you to create a logistic regression variable as a new indicator that has both unstandardized and standardized values. Logistic regression is normally used to convert an endogenous variable on a non-ratio scale (e.g., dichotomous) into a variable reflecting probabilities. You need to choose the variable to be converted, which should be an endogenous variable, and its predictors. 

The new logistic regression variable is meant to be used as a replacement for the endogenous variable on which it is based. Two algorithms are available: probit and logit. The former is recommended for dichotomous variables; the latter for non-ratio variables where the number of different values (a.k.a. “distinct observations”) is greater than 2 but still much smaller than the sample size (e.g., 10 different values over a sample size of 100). The unstandardized values of a logistic regression variable are probabilities, ranging from 0 to 1.
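As a rough illustration of what such a variable is, the sketch below (outside WarpPLS, using Python's statsmodels package, with made-up data and hypothetical variable names) fits a probit model for a simulated dichotomous outcome and recovers fitted probabilities between 0 and 1, along with their standardized counterparts:

```python
# Minimal sketch: a "logistic regression variable" as the fitted probabilities
# from a probit (or logit) regression of a dichotomous endogenous variable on
# its predictors. Data and variable names are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
projmgt = rng.normal(size=n)
jsat = rng.normal(size=n)
# Simulated dichotomous outcome driven by the two predictors
success = (0.8 * projmgt + 0.6 * jsat + rng.normal(size=n) > 0).astype(int)

X = sm.add_constant(np.column_stack([projmgt, jsat]))
probit_fit = sm.Probit(success, X).fit(disp=0)   # use sm.Logit for the logit variant

probs = probit_fit.predict(X)                            # unstandardized values: probabilities in [0, 1]
probs_std = (probs - probs.mean()) / probs.std(ddof=1)   # standardized values
print(probs[:5])
print(probs_std[:5])
```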

Since a logistic regression variable can be severely collinear with its predictors, you can set a local full collinearity VIF cap for the logistic regression variable. Predictor-criterion collinearity, or lateral collinearity, is rarely assessed or controlled in classic logistic regression algorithms. 
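To make this concern concrete, here is a rough sketch (again with made-up data, not WarpPLS's internal procedure) of how a full collinearity VIF for the logistic regression variable could be checked by hand: regress that variable on its predictors and compute 1/(1 - R^2).

```python
# Sketch: full collinearity (lateral collinearity) check for the logistic
# regression variable, as VIF = 1 / (1 - R^2) from an OLS regression of that
# variable on its predictors. Data and names are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
projmgt, jsat = rng.normal(size=n), rng.normal(size=n)
success = (0.8 * projmgt + 0.6 * jsat + rng.normal(size=n) > 0).astype(int)

X = sm.add_constant(np.column_stack([projmgt, jsat]))
lr_variable = sm.Probit(success, X).fit(disp=0).predict(X)  # the new probability variable

r2 = sm.OLS(lr_variable, X).fit().rsquared
vif = 1.0 / (1.0 - r2)
print(f"Full collinearity VIF of the logistic regression variable: {vif:.2f}")
# A VIF above the chosen cap would signal severe predictor-criterion collinearity.
```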

For more on this topic, see the links below.

Explore Logistic Regression in WarpPLS

Using Logistic Regression in PLS-SEM with Composites and Factors

Conditional probabilistic queries technique

How do we interpret the results of a model with an endogenous dichotomous variable, using the conditional probabilistic queries technique? Let us use the model below to illustrate the answer to this question. In this model we have one endogenous dichotomous variable, “Success”, which is directly and significantly caused by two predictors: “Projmgt” and “JSat”. The direct effect of a third predictor, “ECollab”, is relatively small and only borderline significant.



Let us assume that the unit of analysis is a team of people. The variable “Success” is coded as 0 or 1, meaning that a team is either successful or not. After standardization, the 0 and 1 will be converted into a negative and a positive number. The standardized version of the variable “Success” will have a mean of zero and a standard deviation of 1.
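As a small numeric illustration of this point, consider a hypothetical sample in which 40 percent of the teams are successful. After standardization, all the 1s map to a single positive value and all the 0s to a single negative value, and the variable has a mean of zero and a standard deviation of 1:

```python
# Illustration: standardizing a dichotomous (0/1) variable.
# The 40 percent success rate is made up for the example.
import numpy as np

success01 = np.array([1] * 40 + [0] * 60)
success_std = (success01 - success01.mean()) / success01.std(ddof=1)

print(np.unique(success_std))                        # one negative value (the 0s), one positive value (the 1s)
print(success_std.mean(), success_std.std(ddof=1))   # mean ~ 0 and std. dev. ~ 1
```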

One way to interpret the results is the following. The probability that a team will be successful (i.e., that “Success” > 0) is significantly affected by increases in the variables “Projmgt” and “JSat”.

WarpPLS users are able, starting in version 6.0, to calculate conditional probabilities as shown below, without having to resort to transformations based on assumed underlying functions, such as those performed by logistic regression. In this screen shot, only latent variables are used, and they are all assumed to be standardized.



In the screen shot above, we can see that the probability that a team will be successful (i.e., that “Success” > 0), if “Projmgt” > 1 and “JSat” > 1, is 52.2 percent. Stated differently, if “Projmgt” and “JSat” are high (greater than 1 standard deviation above the mean), then the probability of success is slightly greater than chance.

A probability of 52.2 percent is not that high. One reason it is not higher, in the context of the conditional probabilistic query above, is that the variable "ECollab" is not included in the mix. Even so, “Projmgt” and “JSat” being high do not appear to be sufficient conditions for success, although they may be necessary conditions.

Consider a different set of conditional probabilities. If a team is successful (i.e., if “Success” > 0), what is the probability that “Projmgt” and “JSat” are low for that team? The answer, shown in the screen shot below, is 1.3 percent. That is a very low probability, suggesting that “Projmgt” and “JSat” matter as necessary, but not sufficient, elements for success.



These are among the conditional probabilistic queries that users are able to make starting in version 6.0 of WarpPLS. Bayes’ theorem is used to produce the answers to the queries.
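For readers who want to see the mechanics, the sketch below uses simulated standardized scores (with made-up effect sizes, and taking "low" to mean below the mean, purely for illustration) to show how queries like the two above reduce to relative frequencies, and how each conditional probability ties back to joint and marginal probabilities through the identity P(A|B) = P(A and B) / P(B):

```python
# Conceptual sketch of conditional probabilistic queries on standardized scores.
# The data, effect sizes, and the "low = below the mean" threshold are assumptions
# made for illustration; WarpPLS computes its queries from the model's variables.
import numpy as np

rng = np.random.default_rng(1)
n = 10000
projmgt = rng.normal(size=n)
jsat = rng.normal(size=n)
success = 0.4 * projmgt + 0.3 * jsat + rng.normal(size=n)
success = (success - success.mean()) / success.std(ddof=1)  # standardized "Success" scores

high = (projmgt > 1) & (jsat > 1)
low = (projmgt < 0) & (jsat < 0)

# First query: P(Success > 0 | Projmgt > 1 and JSat > 1), as a relative frequency
p_success_given_high = np.mean(success[high] > 0)
# Same number via P(A|B) = P(A and B) / P(B), the Bayes-type identity behind the queries
p_success_given_high_check = np.mean((success > 0) & high) / np.mean(high)

# Second query: P(Projmgt and JSat low | Success > 0)
p_low_given_success = np.mean(low[success > 0])

print(p_success_given_high, p_success_given_high_check, p_low_given_success)
```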

3 comments:

Unknown said...

This is great, I look forward to using version 6!!!

Anonymous said...

Dear Prof Kock
I hope you are well.
I am about to submit a paper using binary variables with WarpPLS. I am tempted to report probabilistic queries, but since this is the first time I am using them, I would like to check with you first whether I am doing it right.

In short, my model looks at R&Din (Yes/No), R&Dout (Yes/No) --> Product innovation (Yes/No) and process innovation (Yes/No) --> Export intensity (percentage).

For the first path, I have the below outcome

******************************************
* Probabilistic query results *
******************************************

The absolute probability that (top expression):
* ui:PrcInno > 0
Is:
* 0.029 (2.9 percent)

The absolute probability that (bottom expression):
* ui:R&Din > 0 & ui:R&Dout > 0
Is:
* 0.058 (5.8 percent)

The absolute probability that (top expression):
* ui:PrcInno > 0
And (bottom expression):
* ui:R&Din > 0 & ui:R&Dout > 0
Is:
* 0.007 (0.7 percent)

The conditional probability that:
* ui:PrcInno > 0
If:
* ui:R&Din > 0 & ui:R&Dout > 0
Is:
0.115 (11.5 percent)

Am I correct to say that the probability of a company having process innovation will increase by 11.5% when they implement both R&Din and R&Dout?


For the second path, since export intensity is continuous, I did the following


******************************************
* Probabilistic query results *
******************************************

The absolute probability that (top expression):
* ui:Direct_Exports > 0
Is:
* 0.197 (19.7 percent)

The absolute probability that (bottom expression):
* ui:PdtInno > 0 & ui:PrcInno > 0
Is:
* 0.004 (0.4 percent)

The absolute probability that (top expression):
* ui:Direct_Exports > 0
And (bottom expression):
* ui:PdtInno > 0 & ui:PrcInno > 0
Is:
* 0.002 (0.2 percent)

The conditional probability that:
* ui:Direct_Exports > 0
If:
* ui:PdtInno > 0 & ui:PrcInno > 0
Is:
0.500 (50.0 percent)


Am I right to say that companies' probability to export (i.e., to have export sales exceeding 0%) will improve by 50% if they have both product and process innovation?

Thank you so much for your help!
Mohamed

Ned Kock said...

Hi Mohamed.

The correct interpretation would be that if ui:PdtInno > 0 & ui:PrcInno > 0, then ui:Direct_Exports > 0 with a probability of 50%.

This does not mean much, since a 50% probability is essentially a "coin toss".

You may want to use std. values, focusing on > 1 for predictors; this would be well above average. Or, use unstd. values, but focus on higher values than simply > 0.