
Sunday, February 14, 2010

How do I control for the effects of one or more demographic variables in an SEM analysis?


As part of an SEM analysis using WarpPLS, a researcher may want to control for the effects of one or more variables. This is typically the case with what are called “demographic variables”, or variables that measure attributes of a given unit of analysis that are (usually) not expected to influence the results of the SEM analysis.

For example, let us assume that one wants to assess the effect of a technology, whose intensity of use is measured by a latent variable T, on a behavioral variable measured by B. The unit of analysis for B is the individual user; that is, each row in the dataset refers to an individual user of the technology. The researcher hypothesizes that the association between T and B is significant, so a direct link between T and B is included in the model.

If the researcher wants to control for age (A) and gender (G), which have also been collected for each individual, in relation to B, all that is needed is to include the variables A and G in the model, with direct links pointing at B. No hypotheses are made. For that to work, gender (G) has to be included in the dataset as a numeric variable. For example, the gender "male" may be replaced with 1 and "female" with 2, in which case the variable G will essentially measure the "degree of femaleness" of each individual. Sounds odd, but works.
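The recoding step described above can be sketched in a few lines of Python. This is only an illustration of preparing the dataset before it is read into WarpPLS. The field names ("age", "gender"), the sample rows, and the helper `encode_gender` are assumptions made for this example, not part of WarpPLS.

```python
# Hypothetical raw records; field names and values are illustrative assumptions.
raw_rows = [
    {"age": 34, "gender": "male"},
    {"age": 27, "gender": "Female"},
    {"age": 45, "gender": "female"},
    {"age": 52, "gender": "Male"},
]

# Coding used in the post: "male" becomes 1 and "female" becomes 2.
GENDER_CODES = {"male": 1, "female": 2}


def encode_gender(rows, codes=GENDER_CODES):
    """Return copies of the rows with the gender label replaced by its numeric code."""
    encoded = []
    for row in rows:
        label = row["gender"].strip().lower()
        if label not in codes:
            # Fail loudly rather than silently inventing a code for an unknown label.
            raise ValueError(f"unrecognized gender label: {row['gender']!r}")
        encoded.append({**row, "gender": codes[label]})
    return encoded


numeric_rows = encode_gender(raw_rows)
for row in numeric_rows:
    print(row["age"], row["gender"])
```

Raising an error on unknown labels is a design choice for this sketch; missing or unexpected values would need to be handled according to the study's own data-cleaning rules.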

After the analysis is conducted, let us assume that the path coefficient between T and B is found to be statistically significant, with the variables A and G included in the model as described above. In this case, the researcher can say that the association between T and B is significant, “regardless of A and G” or “when the effects of A and G are controlled for”.

In other words, the technology (T) affects behavior (B) in the hypothesized way regardless of age (A) and gender (G). This conclusion would remain the same whether or not the path coefficients between A and/or G and B were significant, because the focus of the analysis is on B, the main dependent variable of the model.
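To make the idea of "controlling for" A and G more concrete, here is a minimal sketch that uses ordinary least squares regression as a stand-in. It is not the WarpPLS algorithm, which is nonlinear and works with latent variables. The data are simulated so that the true coefficient of T is 0.5, and all names and values are assumptions made for illustration. The sketch compares the estimated coefficient of T with and without A and G among the predictors.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def ols(predictors, outcome):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Simulated (hypothetical) data: technology use T, age A, gender G coded 1/2.
T = [1, 2, 3, 4, 5, 6, 7, 8]
A = [25, 40, 31, 52, 28, 45, 36, 60]
G = [1, 2, 2, 1, 1, 2, 1, 2]
# B is built so that the true effect of T is 0.5 once A and G are accounted for.
B = [2 + 0.5 * t + 0.1 * a - 0.3 * g for t, a, g in zip(T, A, G)]

coef_without_controls = ols([T], B)
coef_with_controls = ols([T, A, G], B)

print("T coefficient without controls:", round(coef_without_controls[1], 3))
print("T coefficient with A and G as controls:", round(coef_with_controls[1], 3))
```

Because A happens to be correlated with T in this simulated dataset, the estimate for T without controls absorbs part of the effect of A. With A and G included as predictors of B, the estimate for T is the effect that holds "when the effects of A and G are controlled for".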

The discussion above is expanded in the publication below, which also contains a graphical representation of a model including control variables.

Kock, N. (2011). Using WarpPLS in e-collaboration studies: Mediating effects, control and second order variables, and algorithm choices. International Journal of e-Collaboration, 7(3), 1-13.

http://www.scriptwarp.com/warppls/pubs/Kock_2011_IJeC_WarpPLSEcollab3.pdf

Some special considerations and related analysis decisions usually have to be made in more complex models, with multiple endogenous variables, and also regarding the fit indices.

35 comments:

Anonymous said...

Hi Professor Kock,

Thanks for the post. I would like to ask you the following, please?

1) Even if the path coefficients between A and/or G and B were not significant, can we still conclude that "The technology (T) affects behavior (B) in the hypothesized way regardless of age (A) and gender (G)"?

2) Is it possible for you to discuss the special considerations and related analysis decisions in more complex models, with multiple endogenous variables, please?

3) Do you have some papers related to that issue, please?

Thanks so much

Ana Sánchez

Ned Kock said...

Hi Ana.

Re. 1), yes you can conclude that, even if those paths are insignificant. The important consideration is really the effect of those control paths on the other paths.

On 2), I need to prepare another post. One thing to bear in mind is that, with multiple endogenous LVs, you may want to add controls to all of them. This will, in turn, artificially reduce your APC (a model fit index), even though your ARS (another model fit index) will most certainly go up. Anyway, I need another post to go into a bit more detail about the choices you can make.

On 3), the paper below controls for several variables using this approach, but it uses PLS-Graph:

Kock, N., Chatelain-Jardón, R. and Carmona, J. (2008), An Experimental Study of Simulated Web-based Threats and Their Impact on Knowledge Communication Effectiveness, IEEE Transactions on Professional Communication, V.51, No.2, pp. 183-197.

http://cits.tamiu.edu/kock/pubs/journals/2008JournalIEEETPC3/Kock_etal_2008_IEEETPC.pdf
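The point made under 2) about the APC can be illustrated with simple arithmetic. The sketch below assumes that the APC is computed as the mean of the absolute values of all path coefficients in the model. The path labels, the second endogenous variable C, and the coefficient values are hypothetical and chosen only to show the direction of the change.

```python
def average_path_coefficient(paths):
    """Mean of the absolute path coefficients (assumed definition of the APC)."""
    return sum(abs(beta) for beta in paths.values()) / len(paths)


# Hypothetical hypothesized paths in a model with two endogenous variables, B and C.
hypothesized = {"T->B": 0.45, "T->C": 0.38}

# Hypothetical control paths, with age (A) and gender (G) pointing at both B and C.
controls = {"A->B": 0.05, "G->B": -0.03, "A->C": 0.02, "G->C": 0.04}

apc_before = average_path_coefficient(hypothesized)
apc_after = average_path_coefficient({**hypothesized, **controls})

print("APC without control paths:", round(apc_before, 3))
print("APC with control paths:", round(apc_after, 3))
```

Control paths are usually weak, so each one added pulls the average down even though the hypothesized paths are unchanged in this example.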

卡人加油 said...

Hi Professor:

The theoretical part is quite useful, but I wonder how to implement it in the SEM. Just simply draw an arrow between the latent variable (T) and the observed variable (Age)? They are two concepts (construct and item) at different level, so I worry if we could do that?

Thanks.

best,
Karen

Ned Kock said...

Hi Karen. That’s what multiple regression software tools that allow you to enter control variables do. They control for their effects by treating them as independent variables. That is usually done in a “hidden” way; not explicitly by the user.

In WarpPLS you simply do it explicitly.

Nearly all behavioral analysis techniques are special cases of structural equation modeling (SEM), and SEM is what WarpPLS implements in a nonlinear and variance-based fashion. Those techniques include path analysis, multiple regression, and ANCOVA, all of which allow for control variables to be added to a model.

Ellen Woods said...

Hi Professor Kock,

I am building an SEM model. I have three control variables. I am not sure when I should add the control variables to the model. Should I add them at the beginning, in the initial model? Or should I add them after I first identify a final model without control variables, and then see how the control variables affect the paths in the final model?

Thank you.

Ellen Woods

Ned Kock said...

Hi Ellen.

Typically you would do that in Step 4, before Step 5 (the analysis) is completed.

Anonymous said...

Hi Professor Kock

I would really appreciate if you could help me with following questions:
1-For the first time I want to test a model related to customer satisfaction using PLS. This model has many interdependent variables which ultimately affect customer satisfaction. Do I have to add control variables (socio-demographics) for all variables in my model, or should the controls be added only for customer satisfaction as the main dependent variable?
2-Is it possible to compare satisfaction among multiple groups using PLS? Or should the ANOVA test be done with other software?

Thanks

Ned Kock said...

Hi Anon.

A more strict view of inclusion of control variables is that only real confounders should be included:

http://en.wikipedia.org/wiki/Confounder

A less strict view is that you should include control variables that you want to exclude as potential confounders, even if they are not real confounders.

For example, you may want to control for age, so that you can state that your findings hold regardless of age.

Multiple group analyses can be conducted using the multi-group analysis spreadsheet under resources on warppls.com.

K said...

Hi Professor Kock,
Thanks for the post. I, too, am interested in controlling for a variable in a more complex reflective model. Do you have a post related to this?
Cheers,
Kim

Ned Kock said...

Hi K. Can you describe your scenario?

Husameddin Dawoud said...

Dear Prof. Ned, Thanks a lot for the explanation.

I tried to make a diagram based on your example for the possible scenarios. I think we have only 5 scenarios, as shown in this diagram. The interpretation is shown in the bottom boxes.
https://dl.dropboxusercontent.com/u/38977872/control.jpg

My questions are:
1-Let's say we have scenario (1). should we try to delete one of the control variables and check whether the relationship between IV and DV will change accordingly? or just go directly to the conclusion?
2- should we delete insignificant control variables in further stages of analysis? If we do so, other relationships in the model will have some changes.
Thanks in advance,
Hosam

Ned Kock said...

Hi Hosam. Regarding control variables, you may find the article linked to the following WarpPLS blog post useful:

http://warppls.blogspot.com/2011/08/using-warppls-in-e-collaboration.html

I should note that the interpretation of individual Simpson’s paradox instances can be difficult with demographic variables when these are included in the model as control variables, suggesting what may appear to be unlikely or impossible reverse directions of causality. For example, let us say that a negative path-correlation sign occurs when we include the control variable “Age” (time from birth, measured in years) into a model pointing at the variable “Job performance” (self-assessed, measured through multiple indicators on Likert-type scales). This may be interpreted as suggesting that “Job performance” causes “Age” in the sense that increased job performance causes someone to age, or causes time to pass faster.

Anonymous said...

Thank you doctor for the post.
Actually, I would like to ask you: do we do this the same way in SmartPLS? In other words, are control variables handled the same way in SmartPLS?

Thank you

Abdulkarim Kanaan Jebna said...

Adding to the previous post: assuming that control variables are added the same way in SmartPLS, I have added a control variable. I noticed that R2 has increased dramatically. In this case, can I say that the IV has an impact on the DV based on this new R2 value?

Thank you

Ned Kock said...

I restrict my answers to WarpPLS, as there are others with expertise in SmartPLS that are better positioned to answer questions regarding that software.

Those would probably be found in SmartPLS-related forums, not this one. This blog is, as the name implies, focused on WarpPLS - the name of this blog is "WarpPLS".

Anonymous said...

Dear professor Kock,

What exactly do you mean by "degree of femaleness"?

Regards Hayden

Ned Kock said...

Hi Hayden. The variation of G, going from 1 to 2.

Anonymous said...

Hi professor Kock,

From my analyses: the path coefficient from GENDER -> Attachment = 0.0143..

Hence the effect size of gender is 0.0413. But can I say that female are more attached than men then?

Thank you so much!!

Ned Kock said...

The path coefficient for the link seems too low to be significant. What is the path-correlation ratio for the link?

Anonymous said...

Hi professor Kock,

I don't have a path-correlation ratio.. But if it is below 0.2,it would be insignificant. If the case is GENDER -> Attachment = 0.3..

Would it then be possible to say that female are more attached than men then?

Thank you so much!

Anonymous said...

Hi prof Kock! How do I treat gender (a moderator ): reflective or formative?

Ned Kock said...

Gender is not usually measured through multiple indicators.

Anonymous said...

Thanks prof for the prompt response. How then should I use gender as a moderator in WarpPLS?

Ned Kock said...

Hi Anon. The following links may be useful:

http://youtu.be/d8j-OOPHMFk

http://warppls.blogspot.com/2010/02/how-do-i-control-for-effects-of-one-or.html

Anonymous said...

Hi

The new SmartPLS 3 software (http://www.smartpls.com) offers extended modeling options of moderator variables.

Do you know how the orthogonalizing approach works? Is it advantageous compared to the product-indicator and two-stage approaches?

Thanks and kind regards,
Tom

Ned Kock said...

The orthog. approach (OA) entails re-scaling scores so that the moderating variable is uncorrelated with the variables involved in the moderation. This is a bit problematic because scores are more often than not correlated, and those correlations are accounted for by multivariate adjustments.

With WarpPLS, the product-indicator and the two-stage approaches can be implemented, both of which allow for correlations and perform multivariate adjustments.

Please note that the new factor-based algorithms can make a difference in the estimation of moderating effects. See the video linked below.

http://youtu.be/PvXuD5COezU

A new feature planned for the future is a growth analysis, which should help in the identification of moderating effects.

pedro alexandre Martins said...

Dear Professor Kock,

Thank you for your explanations!

I am performing a path analysis to analyse Intention to Innovate regarding some continuous variables, but gender is also one of the dependent variables (it is not a control variable, as the theory behind it takes gender into consideration).
Is it the case of including gender as a numerical variable too?
I could not find much on this subject in the literature.

Thank you!
Best regards, Pedro

Ned Kock said...

Hi Pedro. To be included in WarpPLS analyses, categorical variables must be converted to numeric. On a different note, I wonder how gender can be an endogenous variable. Do you mean to say that, in your model, gender is caused by one or more variables? If your model is not a biological model of gender development, or something like that, it is difficult for me to see the possibility of gender being an effect.

pedro alexandre Martins said...

Dear Kock,
I meant that gender is also one of the variables, but it is indeed an independent (exogenous) variable in the model, not a dependent one, as I said earlier.
Nevertheless, it must still be converted to numeric, right? And is it enough to use the central limit theorem to justify that gender can be seen as continuous, since my sample has over 1000 observations?
Thank you!

Ned Kock said...

Hi Pedro, again. A categorical variable would have to be converted to numeric to be included in WarpPLS analyses. I am not sure I understand your other question.

pedro alexandre Martins said...

Hi again!
My question is how to justify that the conversion from categorical to numeric variable can be made.
Should the justification be made with the Central Limit Theorem?

Ned Kock said...

Usually when one converts a categorical variable to numeric, the resulting numeric variable is non-normal, sometimes severely non-normal. This can be checked through the normality test outputs provided by WarpPLS. The justification for the use of WarpPLS in these contexts is that it performs well with non-normal data, including severely non-normal data:

http://warppls.blogspot.com/2016/06/pls-sem-performance-with-non-normal-data.html
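As a rough illustration of why a two-valued numeric variable is non-normal, the sketch below computes skewness and excess kurtosis for a gender variable coded 1 and 2. Both statistics are zero for a normal distribution. This is not the normality test that WarpPLS reports; the sample values and function names are assumptions made for this example.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical samples of a gender variable coded 1 ("male") and 2 ("female").
balanced = [1] * 5 + [2] * 5
imbalanced = [1] * 9 + [2] * 1

for name, sample in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(name, round(skewness(sample), 3), round(excess_kurtosis(sample), 3))
```

Even the perfectly balanced sample has an excess kurtosis of -2 rather than 0, and the imbalanced sample is also strongly skewed, so a variable of this kind departs from normality regardless of sample size.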