Thursday, October 5, 2023

Using logistic regression in PLS-SEM: Dichotomous endogenous variables


The article below discusses how logistic regression with the probit approach can be used to avoid the problems associated with dichotomous endogenous variables in structural equation modeling via partial least squares (PLS-SEM).

Kock, N. (2023). Using logistic regression in PLS-SEM: Dichotomous endogenous variables. Data Analysis Perspectives Journal, 4(4), 1-6.

Link to full-text file for this and other DAPJ articles:

https://scriptwarp.com/dapj/#Published_Articles

Abstract:

A dichotomous endogenous variable would be impossible to occur at the population level, which an empirical sample is assumed to represent, because the structural error term associated with the endogenous variable is expected to be a random variable with many distinct values. Consequently, the endogenous variable is also expected to have many distinct values. This paper discusses how to address this problem, using logistic regression with the probit approach, in the context of structural equation modeling via partial least squares (PLS-SEM). Our discussion is based on an illustrative model analyzed with the software WarpPLS.
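The argument in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the path coefficient (`beta = 0.5`), the sample size, the normal distributions, and the cut-off at zero are all hypothetical choices made for illustration, and none of it comes from the article or from WarpPLS.

```python
import random

def simulate(n=1000, beta=0.5, seed=42):
    """Simulate a standardized endogenous variable y = beta * x + error.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Error standard deviation chosen so that y has unit variance
    error_sd = (1 - beta ** 2) ** 0.5
    x = [rng.gauss(0, 1) for _ in range(n)]
    # The structural error term is a continuous random variable
    y = [beta * xi + rng.gauss(0, error_sd) for xi in x]
    # A dichotomous version of y, as it might be recorded in a sample
    y_dich = [1 if yi > 0 else 0 for yi in y]
    return x, y, y_dich

x, y, y_dich = simulate()
print("Distinct values in y:", len(set(y)))
print("Distinct values in dichotomized y:", len(set(y_dich)))
```

Because the error term is continuous, the simulated endogenous variable takes many distinct values, while its dichotomized version takes only two.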

Best regards to all!

7 comments:

Zinnia said...

Dear Professor Kock,

I recently discovered WarpPLS while searching for a method to analyze binary variables in my model. Thank you for providing such a promising tool.
As this is my first time using it, I want to ensure that my model is compatible with WarpPLS.

My structural equation model is as follows:
DX (exogenous variable) → BI (mediation variable 1) → IP (dependent variable)
DX → PI (mediation variable 2)
BI → PI
PI → IP


DX: 5 measured variables - dx1, dx2, dx3, dx4, dx5 (each measured on a 5-point Likert scale)
BI: 7 measured variables - bi1, bi2, bi3, bi4, bi5, bi6, bi7 (each yes/no binary)
PI: 2 measured variables - pi1, pi2 (each yes/no binary)
IP: 3 measured variables - ip1, ip2, ip3 (ip1 is a ratio scale, and ip2 & ip3 are yes/no binary)


My questions are:
1. Is it possible to analyze the structural equation model described above using WarpPLS, given that the endogenous variable BI has 7 measured variables (each yes/no binary) and the endogenous variable PI has 2 measured variables (each yes/no binary)?
2. Can WarpPLS analyze a dependent variable that has 3 measured variables of different natures? (ip1 is a ratio scale, and ip2 and ip3 are binary variables)
3. If my model can be analyzed with WarpPLS, can I also test the mediating effects of BI and PI?

I'm currently living in South Korea. I would like to incorporate WarpPLS into my research to promote its usage in my country.
Thank you so much for your assistance!

Best regards,
Zinnia

Ned Kock said...

Hi Zinnia. Yes, I believe so. You may want to review the materials here:

warppls.com

Zinnia said...

Dear Professor Kock,

I have a question about the logistic transformation of endogenous variables with multiple dichotomous measures.

For example, consider a model like A -> B -> C :
- Measured variables B1 and B2 in B (latent variable) are both dichotomous (0, 1)
- Measured variables C1, C2, and C3 in C (latent variable) are all dichotomous (0, 1)

If I create B and C as new logistic regression variables in the above model, can I simply enter the latent variables B and C under the "Variables to be converted" section of the "Explore logistic regression" window? Or do I need to create two new variables (lr_B1, lr_B2) for B and three new variables (lr_C1, lr_C2, lr_C3) for C from the measured variables B1, B2, C1, C2, and C3, and then assign them as the new measured variables for the latent variables B and C?

Thanks, as always, for your help.

Best regards,
Zinnia

Ned Kock said...

What I recommend is that you use the original indicators for B and C to generate LVs using one of the factor-based algorithms (e.g., REG2). Next, save B and C as new LVs, and use these in a new model. Then obtain an lr_C, and use it in a third model. Here you should use logit, because C will not be dichotomous (although it will have fewer distinct values than it should). I don't recommend doing the same with an lr_B, because you would then have a model with probabilities causing probabilities - unless theory tells you, for some reason, to have such a relationship in the model.
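The remark that C will not be dichotomous, yet will have fewer distinct values than it should, can be sketched as follows. The sketch assumes, purely for illustration, that an LV score is approximated by a weighted sum of its indicators; factor-based algorithms such as REG2 do more than this, and the weights below are hypothetical.

```python
from itertools import product

def distinct_scores(weights):
    """Return the sorted distinct weighted sums over all 0/1 indicator patterns.

    Hypothetical helper: treats an LV score as a plain weighted sum of
    dichotomous indicators, which is a simplification.
    """
    patterns = product([0, 1], repeat=len(weights))
    return sorted({sum(w * v for w, v in zip(weights, p)) for p in patterns})

# Three dichotomous indicators (C1, C2, C3) with equal hypothetical weights
equal = distinct_scores((1, 1, 1))
# The same indicators with unequal hypothetical weights
unequal = distinct_scores((5, 3, 1))

print(len(equal), equal)      # 4 distinct values
print(len(unequal), unequal)  # 8 distinct values
```

Under this simplification, three dichotomous indicators yield at most 2^3 = 8 distinct scores: more than two, but far fewer than a continuous variable would have.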

Josh said...

Dear Ned,
I have been trying to understand the process of conversion from a binary variable to a logistic regression variable. I have read your documentation but would like some clarification if that is okay. It states that the predictors of the variable converted should be defined as the directly and indirectly associated latent variables (LVs) in the model. Does the probit model calculate scores for the exogenous LVs using traditional PLS-PM before creating the probability variable? Or are all indicators of the LV considered as regular predictors in the probit model?
I hope my question makes sense!
Many thanks,
Josh

Ned Kock said...

Hi Josh. The probit option creates a logistic regression variable as a new indicator that has both unstandardized and standardized values. The unstandardized values of the logistic regression variable are probabilities, ranging from 0 to 1. They are calculated based on the exogenous variables and the endogenous variable that the logistic regression variable is meant to replace. These are obtained via one of the outer model algorithms, which include factor-based PLS-SEM algorithms (and not only PLS-PM algorithms).
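As a rough illustration of the two kinds of values mentioned above, the sketch below maps predictor scores to probabilities through the standard normal cumulative distribution function (the probit link) and then standardizes them. It is not WarpPLS's implementation: the intercept and slope are assumed rather than estimated, the predictor scores are made up, and the function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def probit_probabilities(scores, intercept=0.0, slope=1.0):
    """Map predictor scores to probabilities with the probit link.

    The intercept and slope are assumed values; in a real analysis
    they would be estimated from the data.
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    return [std_normal.cdf(intercept + slope * s) for s in scores]

def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

# Made-up predictor scores, for illustration only
predictor_scores = [-2.0, -1.0, 0.0, 1.0, 2.0]

probs = probit_probabilities(predictor_scores)  # unstandardized: between 0 and 1
probs_std = standardize(probs)                  # standardized: mean 0, sd 1

print(probs)
print(probs_std)
```

The unstandardized values stay strictly between 0 and 1, while the standardized values are centered on zero and can be negative.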

Ned Kock said...

Btw, you can use indicators to create a logistic regression variable as well. That could be used as an indicator in an LV. Having said that, most applications that I've seen use exogenous LVs and one endogenous LV to create a logistic regression variable that is used as a replacement for the endogenous LV.