Sunday, February 14, 2010

How do I control for the effects of one or more demographic variables in an SEM analysis?


As part of an SEM analysis using WarpPLS, a researcher may want to control for the effects of one or more variables. This is typically the case with what are called “demographic variables”, or variables that measure attributes of a given unit of analysis that are (usually) not expected to influence the results of the SEM analysis.

For example, let us assume that one wants to assess the effect of a technology, whose intensity of use is measured by a latent variable T, on a behavioral variable measured by B. The unit of analysis for B is the individual user; that is, each row in the dataset refers to an individual user of the technology. The researcher hypothesizes that the association between T and B is significant, so a direct link between T and B is included in the model.

If the researcher wants to control for age (A) and gender (G), which have also been collected for each individual, in relation to B, all that is needed is to include the variables A and G in the model, with direct links pointing at B. No hypotheses are made. For that to work, gender (G) has to be included in the dataset as a numeric variable. For example, the gender "male" may be replaced with 1 and "female" with 2, in which case the variable G will essentially measure the "degree of femaleness" of each individual. Sounds odd, but works.
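The recoding described above can be sketched in a few lines of Python. This is only an illustration: the sample labels and the names GENDER_CODES, encode and standardize are assumptions made for the example, not part of WarpPLS. The sketch also assumes the variable is standardized before estimation, as is common in PLS-based SEM, in which case coding the categories as 1/2 or as 0/1 yields identical scores.

```python
import statistics

# Hypothetical coding scheme, following the 1 = male, 2 = female example.
GENDER_CODES = {"male": 1, "female": 2}


def encode(labels, codes):
    """Replace text labels with their numeric codes."""
    return [codes[label.strip().lower()] for label in labels]


def standardize(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


raw = ["male", "female", "female", "male", "female"]  # made-up sample
g_12 = encode(raw, GENDER_CODES)      # coded as 1 / 2
g_01 = [value - 1 for value in g_12]  # the same data coded as 0 / 1

z_12 = standardize(g_12)
z_01 = standardize(g_01)
print(g_12)
print([round(z, 3) for z in z_12])
```

Because standardization removes the mean and the scale, the choice between 1/2 and 0/1 does not matter in this sketch; what matters is which category receives the higher code, since that sets the sign of the coefficient.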

After the analysis is conducted, let us assume that the path coefficient between T and B is found to be statistically significant, with the variables A and G included in the model as described above. In this case, the researcher can say that the association between T and B is significant, “regardless of A and G” or “when the effects of A and G are controlled for”.

In other words, the technology (T) affects behavior (B) in the hypothesized way regardless of age (A) and gender (G). This conclusion would remain the same whether or not the path coefficients between A and/or G and B were significant, because the focus of the analysis is on the link between T and B, with B being the main dependent variable of the model.
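The logic of "controlling for" A and G can be illustrated outside WarpPLS with a plain standardized multiple regression, which is a linear special case and not the WarpPLS algorithm itself. The sketch below uses simulated data; the sample size, the effect sizes and all function names are assumptions chosen for the example.

```python
import random
import statistics


def zscores(values):
    """Standardize a list to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def corr(u, v):
    """Pearson correlation as the mean product of z-scores."""
    return statistics.fmean(a * b for a, b in zip(zscores(u), zscores(v)))


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def standardized_betas(predictors, outcome):
    """Standardized regression coefficients: solve R_xx * beta = r_xy."""
    r_xx = [[corr(p, q) for q in predictors] for p in predictors]
    r_xy = [corr(p, outcome) for p in predictors]
    return solve(r_xx, r_xy)


# Simulated data (hypothetical effect sizes): B depends on T, A and G.
rng = random.Random(42)
n = 500
T = [rng.gauss(0, 1) for _ in range(n)]
A = [rng.gauss(40, 10) for _ in range(n)]
G = [rng.choice([1, 2]) for _ in range(n)]
B = [0.5 * t + 0.03 * a + 0.4 * g + rng.gauss(0, 1)
     for t, a, g in zip(T, A, G)]

beta_t_alone = standardized_betas([T], B)[0]
beta_t, beta_a, beta_g = standardized_betas([T, A, G], B)
print(f"T -> B without controls:        {beta_t_alone:.3f}")
print(f"T -> B with A and G controlled: {beta_t:.3f}")
print(f"A -> B: {beta_a:.3f}, G -> B: {beta_g:.3f}")
```

If the coefficient for T stays clearly different from zero once A and G point at B, the statement "when the effects of A and G are controlled for" is supported in this simplified setting.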

The discussion above is expanded in the publication below, which also contains a graphical representation of a model including control variables.

Kock, N. (2011). Using WarpPLS in e-collaboration studies: Mediating effects, control and second order variables, and algorithm choices. International Journal of e-Collaboration, 7(3), 1-13.

http://www.scriptwarp.com/warppls/pubs/Kock_2011_IJeC_WarpPLSEcollab3.pdf

Some special considerations and related analysis decisions usually have to be made in more complex models, with multiple endogenous variables, and also regarding the fit indices.

42 comments:

Anonymous said...

Hi Professor Kock,

Thanks for the post. I would like to ask you the following, please?

1) Even if the path coefficients between A and/or G and B were not significant, can we still conclude that "The technology (T) affects behavior (B) in the hypothesized way regardless of age (A) and gender (G)"?

2) Is it possible for you to discuss the special considerations and related analysis decisions in more complex models, with multiple endogenous variables, please?

3) Do you have some papers related to that issue, please?

Thanks so much

Ana Sánchez

Ned Kock said...

Hi Ana.

Re. 1), yes you can conclude that, even if those paths are insignificant. The important consideration is really the effect of those control paths on the other paths.

On 2), I need to prepare another post. One thing to bear in mind is that, with multiple endogenous LVs, you may want to add controls to all of them. This will, in turn, artificially reduce your APC (a model fit index), even though your ARS (another model fit index) will most certainly go up. Anyway, I need another post to go into a bit more detail about the choices you can make.

On 3), the paper below controls for several variables using this approach, but it uses PLS-Graph:

Kock, N., Chatelain-Jardón, R. and Carmona, J. (2008), An Experimental Study of Simulated Web-based Threats and Their Impact on Knowledge Communication Effectiveness, IEEE Transactions on Professional Communication, V.51, No.2, pp. 183-197.

http://cits.tamiu.edu/kock/pubs/journals/2008JournalIEEETPC3/Kock_etal_2008_IEEETPC.pdf
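The point made in 2) about APC and ARS can be illustrated with a small Python sketch. It assumes that APC is the average of the absolute path coefficients and ARS the average R-squared, and it simplifies to a single endogenous variable with linear, standardized paths. The correlation values and function names are made up for the example.

```python
def two_predictor_betas(r_x1y, r_x2y, r_x1x2):
    """Standardized coefficients for y regressed on x1 and x2."""
    denom = 1.0 - r_x1x2 ** 2
    b1 = (r_x1y - r_x1x2 * r_x2y) / denom
    b2 = (r_x2y - r_x1x2 * r_x1y) / denom
    return b1, b2


def apc(paths):
    """Average of the absolute path coefficients (assumed definition)."""
    return sum(abs(p) for p in paths) / len(paths)


def ars(r_squared_values):
    """Average R-squared across endogenous variables (assumed definition)."""
    return sum(r_squared_values) / len(r_squared_values)


# Hypothetical correlations: T is a strong predictor of B, age (A) a weak one.
r_tb, r_ab, r_ta = 0.5, 0.1, 0.1

# Model 1: T -> B only. With one predictor the path equals the correlation.
apc_without = apc([r_tb])
ars_without = ars([r_tb ** 2])

# Model 2: T -> B plus the control A -> B.
beta_t, beta_a = two_predictor_betas(r_tb, r_ab, r_ta)
r2_with = beta_t * r_tb + beta_a * r_ab
apc_with = apc([beta_t, beta_a])
ars_with = ars([r2_with])

print(f"APC: {apc_without:.3f} -> {apc_with:.3f}")
print(f"ARS: {ars_without:.4f} -> {ars_with:.4f}")
```

In this example the weak control path pulls the average path coefficient down sharply, while the R-squared rises slightly because an extra predictor cannot reduce the in-sample explained variance.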

Karen's Ivory Tower said...

Hi Professor:

The theoretical part is quite useful, but I wonder how to implement it in the SEM. Do I simply draw an arrow between the latent variable (T) and the observed variable (Age)? They are two concepts (construct and item) at different levels, so I wonder whether we can do that.

Thanks.

best,
Karen

Ned Kock said...

Hi Karen. That’s what multiple regression software tools that allow you to enter control variables do. They control for their effects by treating them as independent variables. That is usually done in a “hidden” way; not explicitly by the user.

In WarpPLS you simply do it explicitly.

Nearly all behavioral analysis techniques are special cases of structural equation modeling (SEM), which WarpPLS implements in a nonlinear, variance-based fashion. Those techniques include path analysis, multiple regression, and ANCOVA, all of which allow control variables to be added to a model.

Ellen Woods said...

Hi Professor Kock,

I am building an SEM model. I have three control variables, and I am not sure when I should add them to the model. Should I add them at the beginning, to the initial model? Or should I first identify a final model without control variables, add them afterwards, and see how they affect the paths in the final model?

Thank you.

Ellen Woods

Ned Kock said...

Hi Ellen.

Typically you would do that in Step 4, before Step 5 (the analysis) is completed.

Anonymous said...

Hi Professor Kock

I would really appreciate it if you could help me with the following questions:
1-For the first time, I want to test a model related to customer satisfaction using PLS. This model has many interdependent variables which ultimately affect customer satisfaction. Do I have to add control variables (socio-demographics) for all variables in my model, or should the controls be added only for customer satisfaction, as the main dependent variable?
2-Is it possible to compare satisfaction among multiple groups using PLS? Or should the ANOVA test be done with other software?

Thanks

Ned Kock said...

Hi Anon.

A more strict view of inclusion of control variables is that only real confounders should be included:

http://en.wikipedia.org/wiki/Confounder

A less strict view is that you should include control variables that you want to exclude as potential confounders, even if they are not real confounders.

For example, you may want to control for age, so that you can state that your findings hold regardless of age.

Multiple group analyses can be conducted using the multi-group analysis spreadsheet under resources on warppls.com.

K said...

Hi Professor Kock,
Thanks for the post. I, too, am interested in controlling for a variable in a more complex reflective model. Do you have a post related to this?
Cheers,
Kim

Ned Kock said...

Hi K. Can you describe your scenario?

Unknown said...

Dear Prof. Ned, Thanks a lot for the explanation.

I tried to make a diagram based on your example for the possible scenarios. I think we have only 5 scenarios, as shown in this diagram. The interpretations are shown in the bottom boxes.
https://dl.dropboxusercontent.com/u/38977872/control.jpg

My questions are:
1-Let's say we have scenario (1). Should we try to delete one of the control variables and check whether the relationship between IV and DV changes accordingly, or just go directly to the conclusion?
2-Should we delete insignificant control variables in later stages of the analysis? If we do so, other relationships in the model will change somewhat.
Thanks in advance,
Hosam

Ned Kock said...

Hi Hosam. Regarding control variables, you may find the article linked to the following WarpPLS blog post useful:

http://warppls.blogspot.com/2011/08/using-warppls-in-e-collaboration.html

I should note that the interpretation of individual Simpson’s paradox instances can be difficult with demographic variables when these are included in the model as control variables, suggesting what may appear to be unlikely or impossible reverse directions of causality. For example, let us say that a negative path-correlation sign occurs when we include the control variable “Age” (time from birth, measured in years) into a model pointing at the variable “Job performance” (self-assessed, measured through multiple indicators on Likert-type scales). This may be interpreted as suggesting that “Job performance” causes “Age” in the sense that increased job performance causes someone to age, or causes time to pass faster.
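A numeric illustration of such an instance, in Python. It assumes that a Simpson’s paradox instance is one where a path coefficient and the corresponding correlation have opposite signs, and it uses a linear two-predictor model with made-up correlations. "Tenure" is a hypothetical second predictor introduced only for the example.

```python
def two_predictor_betas(r_x1y, r_x2y, r_x1x2):
    """Standardized coefficients for y regressed on x1 and x2."""
    denom = 1.0 - r_x1x2 ** 2
    b1 = (r_x1y - r_x1x2 * r_x2y) / denom
    b2 = (r_x2y - r_x1x2 * r_x1y) / denom
    return b1, b2


def is_simpsons_instance(path_coefficient, correlation):
    """True when path and correlation have opposite signs (assumed criterion)."""
    return path_coefficient * correlation < 0


# Hypothetical correlations with "Job performance".
r_tenure_perf = 0.6   # Tenure <-> Job performance
r_age_perf = 0.4      # Age <-> Job performance
r_tenure_age = 0.8    # Tenure <-> Age (strongly related predictors)

beta_tenure, beta_age = two_predictor_betas(
    r_tenure_perf, r_age_perf, r_tenure_age
)

print(f"Tenure: path {beta_tenure:.3f}, correlation {r_tenure_perf:.3f}")
print(f"Age:    path {beta_age:.3f}, correlation {r_age_perf:.3f}")
print("Simpson's paradox instance for Age:",
      is_simpsons_instance(beta_age, r_age_perf))
```

Here Age correlates positively with Job performance, yet its path coefficient turns negative once Tenure is in the model, which is the kind of sign reversal that is hard to interpret causally for a demographic variable.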

Anonymous said...

Thank you doctor for the post.
Actually, I would like to ask: do we do this the same way in SmartPLS? In other words, are control variables handled the same way in SmartPLS?

Thank you

Abdulkarim Kanaan Jebna said...

Adding to the previous post: apart from whether control variables are added the same way in SmartPLS, I have added the control variable and noticed that R2 has increased dramatically. In this case, can I say, based on this new R2 value, that the IV has an impact on the DV?

Thank you

Ned Kock said...

I restrict my answers to WarpPLS, as there are others with expertise in SmartPLS that are better positioned to answer questions regarding that software.

Those would probably be found in SmartPLS-related forums, not this one. This blog is, as its name "WarpPLS" implies, focused on WarpPLS.

Anonymous said...

Dear professor Kock,

What exactly do you mean by "degree of femaleness"?

Regards Hayden

Ned Kock said...

Hi Hayden. The variation of G, going from 1 to 2.

Anonymous said...

Hi professor Kock,

From my analyses: the path coefficient from GENDER -> Attachment = 0.0143.

Hence the effect size of gender is 0.0413. But can I then say that females are more attached than males?

Thank you so much!!

Ned Kock said...

The path coefficient for the link seems too low to be significant. What is the path-correlation ratio for the link?

Anonymous said...

Hi professor Kock,

I don't have a path-correlation ratio. But if it is below 0.2, it would be insignificant. Suppose the case is GENDER -> Attachment = 0.3.

Would it then be possible to say that females are more attached than males?

Thank you so much!

Anonymous said...

Hi Prof. Kock! How do I treat gender (a moderator): as reflective or formative?

Ned Kock said...

Gender is not usually measured through multiple indicators.

Anonymous said...

Thanks prof for the prompt response. How then should I use gender as a moderator in WarpPLS?

Ned Kock said...

Hi Anon. The following links may be useful:

http://youtu.be/d8j-OOPHMFk

http://warppls.blogspot.com/2010/02/how-do-i-control-for-effects-of-one-or.html

Anonymous said...

Hi

The new SmartPLS 3 software (http://www.smartpls.com) offers extended modeling options of moderator variables.

Do you know how the orthogonalizing approach works? Is it advantageous compared to the product-indicator and two-stage approaches?

Thanks and kind regards,
Tom

Ned Kock said...

The orthog. approach (OA) entails re-scaling scores so that the moderating variable is uncorrelated with the variables involved in the moderation. This is a bit problematic because scores are more often than not correlated, and those correlations are accounted for by multivariate adjustments.
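One common formulation of the orthogonalizing idea is residual centering: the product term is regressed on its two components and the residuals are used as the interaction score, which is then uncorrelated with both. The Python sketch below shows that formulation on simulated scores. Whether any given software implements exactly this is an assumption, and all names and values are hypothetical.

```python
import math
import random
import statistics


def center(values):
    """Subtract the mean from every value."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def corr(u, v):
    """Pearson correlation from centered cross-products."""
    a, b = center(u), center(v)
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def residualize(target, x1, x2):
    """Residuals of target after OLS regression on x1 and x2 (with intercept)."""
    t, a, b = center(target), center(x1), center(x2)
    saa, sbb, sab = dot(a, a), dot(b, b), dot(a, b)
    sat, sbt = dot(a, t), dot(b, t)
    det = saa * sbb - sab ** 2
    c1 = (sat * sbb - sbt * sab) / det   # slope for x1
    c2 = (sbt * saa - sat * sab) / det   # slope for x2
    return [ti - c1 * ai - c2 * bi for ti, ai, bi in zip(t, a, b)]


# Simulated, correlated scores with non-zero means (hypothetical values).
rng = random.Random(7)
n = 300
x = [4 + rng.gauss(0, 1) for _ in range(n)]                    # predictor
m = [3 + 0.5 * (xi - 4) + rng.gauss(0, 1) for xi in x]         # moderator
product = [xi * mi for xi, mi in zip(x, m)]                    # raw interaction

orthogonal = residualize(product, x, m)

print(f"corr(product, x)    = {corr(product, x):.3f}")
print(f"corr(orthogonal, x) = {corr(orthogonal, x):.3e}")
print(f"corr(orthogonal, m) = {corr(orthogonal, m):.3e}")
```

The raw product is strongly correlated with its components, while the residualized term is uncorrelated with them by construction, which is the property the comment above describes as problematic when the underlying scores are in fact correlated.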

With WarpPLS, the product-indicator and the two-stage approaches can be implemented, both of which allow for correlations and perform multivariate adjustments.

Please note that the new factor-based algorithms can make a difference in the estimation of moderating effects. See the video linked below.

http://youtu.be/PvXuD5COezU

A new feature planned for the future is a growth analysis, which should help in the identification of moderating effects.

Unknown said...

Dear Professor Kock,

Thank you for your explanations!

I am performing a path analysis to analyse Intention to Innovate in relation to some continuous variables, but gender is also one of the dependent variables (it is not a control variable, as the theory behind it takes gender into consideration).
Should gender also be included as a numeric variable?
I could not find much on this subject in the literature.

Thank you!
Best regards, Pedro

Ned Kock said...

Hi Pedro. To be included in WarpPLS analyses, categorical variables must be converted to numeric. On a different note, I wonder how gender can be an endogenous variable. Do you mean to say that, in your model, gender is caused by one or more variables? If your model is not a biological model of gender development, or something like that, it is difficult for me to see the possibility of gender being an effect.

Unknown said...

Dear Kock,
I meant that gender is also one of the variables, but it is indeed an independent (exogenous) variable in the model, not a dependent one, as I said earlier.
Nevertheless, it must still be converted to numeric, right? And is it enough to use the central limit theorem to justify that gender can be seen as continuous, given that my sample is over 1000 observations?
Thank you!

Ned Kock said...

Hi Pedro, again. A categorical variable would have to be converted to numeric to be included in WarpPLS analyses. I am not sure I understand your other question.

Unknown said...

Hi again!
My question is how to justify that the conversion from a categorical to a numeric variable can be made.
Should the justification be based on the Central Limit Theorem?

Ned Kock said...

Usually when one converts cat. to num. the resulting num. variable is non-normal, sometimes severely non-normal. This can be checked through the normality test outputs provided by WarpPLS. The justification for the use of WarpPLS in these contexts is that it performs well with non-normal data, including severely non-normal data:

http://warppls.blogspot.com/2016/06/pls-sem-performance-with-non-normal-data.html
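To see why a converted categorical variable is typically non-normal, one can compute its skewness, excess kurtosis and the classic Jarque-Bera statistic by hand. The Python sketch below does this for a made-up gender column (80 cases coded 1, 20 coded 2). It does not reproduce the normality test outputs of WarpPLS; the function names are illustrative.

```python
import statistics


def central_moment(values, k):
    """k-th central moment using the population (divide-by-n) convention."""
    mean = statistics.fmean(values)
    return statistics.fmean((v - mean) ** k for v in values)


def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def jarque_bera(values):
    """Classic Jarque-Bera statistic: n/6 * (S^2 + K^2/4)."""
    n = len(values)
    s, k = skewness(values), excess_kurtosis(values)
    return n / 6.0 * (s ** 2 + k ** 2 / 4.0)


# Hypothetical converted variable: 80 respondents coded 1, 20 coded 2.
G = [1] * 80 + [2] * 20

print(f"skewness        = {skewness(G):.3f}")
print(f"excess kurtosis = {excess_kurtosis(G):.3f}")
print(f"Jarque-Bera     = {jarque_bera(G):.2f}")
```

A Jarque-Bera value above about 5.99 (the 5% critical value of a chi-squared distribution with 2 degrees of freedom) points to non-normality, and an unbalanced two-category variable like this one exceeds it easily.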

Sophie said...

Hello sir Kock,

I have a question about conducting a multi-group analysis. I have an independent variable that is divided into two groups: group 0 = respondents who have seen video 1, and group 1 = respondents who have seen video 2. My dependent variables are 1) affective assessment, 2) cognitive assessment and 3) intention to take action. I also have a possible moderator for these three relationships: social group involvement. Now, I would like to test all these relationships, but I would like to compare them for group 1 and group 2, because I expect a difference in the assessments of these groups. How do I do this? I already added two columns to my Excel file, where I gave all the respondents who saw video 1 a '1' and the respondents who did not a '0'. I did the same for respondents who saw video 2, in the second column. But when I did the SEM analysis (first for video 1 and then for video 2), my two models appeared to be almost exactly the same. The only difference is that some relationships have a negative number in the first analysis and a positive one in the second (or vice versa), while the effect sizes and p-values are all the same. Please help me. I cannot find a YouTube video that explains this well. Hope to hear from you soon!

Ned Kock said...

Hi Sophie. Have you considered using cat-2-num conversion?

😊 said...

Hi Professor Ned,

Nice to know about you.

I'm Jessica from Indonesia, just now I also add your LinkedIn.
I'm very enthusiastic about the articles that you share.

I'm currently writing my thesis (target to pass this year) and studying data analysis techniques using WarpPLS.

Regarding your article, what method of analysis is appropriate to use in WarpPLS 7.0 if my research is moderated by demographic variables (age, gender, education, marital status)?

or, is it better to make each of them like this?
- Age: 1 = 18-25 y.o; 2 = 26-35 y.o; 3 = 36-45 y.o; 4 = 46-55 y.o.
- Gender: 1 = Male; 2 = Female.
- Education: 1 = SeniorHighSchool; 2 = Diploma; 3 = Bachelor'sDegree; 4 = Master'sDegree.
- MaritalStatus: 1 = Single; 2 = Married.

Please advise.
Hope to hear from you soon.

Thank you.

Warm Regards,
Jessica Angela

Ned Kock said...

Hi Jessica.

Thanks for your kind words.

Is this a PhD thesis?

Do you have a committee?

😊 said...

Hi Professor Ned,

Thanks for the response, I've sent the detail via LinkedIn message 🙏

Anonymous said...

Dear Professor,

Thank you for your good work! I have been using WarpPLS for my analysis and I love it because it's user friendly. I would like to know if I have been doing it right or wrong. I just finished watching a video titled "Explore categorical-to-numeric conversion in warppls" on YouTube, but I realized my situation is a bit different. Below is an explanation of what I have done. Kindly let me know if I have done it wrongly.

So my IV has three levels: IV1, IV2 and IV3. They are different scenarios presented to respondents to read, after which behavioural questions were given to them to answer. The IVs were converted to dummies in Excel before importing into the WarpPLS software. IV1 is the base, and I basically just want to see how IV2 and IV3 compare to IV1. The IV2 column in the Excel sheet has 0 if respondents were not exposed to the IV2 scenario in the study and 1 if they were exposed to it (random assignment was done). I did the same for the IV3 dummy column. So in the WarpPLS model I specified two IVs, basically IV2 and IV3, and linked them to the mediators. Also, I have three moderators that are also dummies (the variable is consumer segment, and there are three consumer segments). I created two dummy variables for them as I did with the IV, and then linked them to the links between the IVs and the mediators. The mediators and DV were measured with Likert scales. Did I do this correctly, or did I do something wrong? Thank you!

Ned Kock said...

Dummy coding with more than 2 categories is likely to lead to biases, and often those can be significant. Have you considered entering the variables into WarpPLS as data labels and then using anchor-factorial with variation sharing cat2num conversions?