Sunday, February 5, 2012
New PLS-based SEM email distribution list
A new email distribution list is available for those who share a common interest in partial least squares (PLS) regression and its use in structural equation modeling (SEM). To check it out click here.
Monday, January 23, 2012
Version 3.0 of WarpPLS is coming soon, with several new features
Version 3.0 of WarpPLS is currently undergoing a battery of tests, and will be made available soon. Among the new features is the calculation of indirect and total effects, which are exemplified in this health data analysis post based on the China Study II dataset. Here is a comprehensive list of new features in this version:
- Addition of latent variables as indicators. Users now have the option of adding latent variable scores to the set of standardized indicators used in an SEM analysis. This option is useful in the removal of outliers, through the use of restricted ranges for latent variable scores, particularly for outliers that are clearly visible on the plots depicting associations among latent variables. This option is also useful in hierarchical analysis, where users define second-order (and higher order) latent variables, and then conduct analyses with different models including latent variables of different orders.
- Blindfolding. Users now have the option of using a third resampling algorithm, namely blindfolding, in addition to bootstrapping and jackknifing. Blindfolding is a resampling algorithm that creates a number of resamples (a number that can be selected by the user), where each resample has a certain number of rows replaced with the means of the respective columns. The number of rows modified in this way in each resample equals the sample size divided by the number of resamples. For example, if the sample size is 200 and the number of resamples selected is 100, then each resample will have 2 rows modified. If a user chooses a number of resamples that is greater than the sample size, the number of resamples is automatically set to the sample size (as with jackknifing).
- Effect sizes. Cohen’s (1988) f-squared effect size coefficients are now calculated and shown for all path coefficients. These are calculated as the absolute values of the individual contributions of the corresponding predictor latent variables to the R-squared coefficient of the criterion latent variable in each latent variable block. With these effect sizes, users can ascertain whether the effects indicated by path coefficients are small, medium, or large. The threshold values usually recommended are 0.02, 0.15, and 0.35, respectively (Cohen, 1988). Values below 0.02 suggest effects that are too weak to be considered relevant from a practical point of view, even when the corresponding P values are statistically significant, a situation that may occur with large sample sizes.
- Full collinearity VIFs. VIFs are now shown for all latent variables, separately from the VIFs calculated for predictor latent variables in individual latent variable blocks. These new VIFs are calculated based on a full collinearity test, which identifies not only vertical but also lateral collinearity, and allows for a test of collinearity involving all latent variables in a model. Vertical, or classic, collinearity is predictor-predictor latent variable collinearity in individual blocks. Lateral collinearity is a new term that refers to predictor-criterion latent variable collinearity, a type of collinearity that can lead to particularly misleading results. Full collinearity VIFs can also be used for common method bias tests (Lindell & Whitney, 2001) that are more conservative than, and arguably superior to, the traditionally used tests relying on exploratory factor analyses.
- Incremental code optimization. At several points the code was optimized for speed, which led to incremental gains even as a significant number of new features were added. Several of these new features required new and complex calculations, mostly to generate coefficients that were not available before.
- Indirect and total effects. Indirect and total effects are now calculated and shown, together with the corresponding P values, standard errors, and effect sizes. The calculation of indirect and total effects can be critical in the evaluation of downstream effects of latent variables that are mediated by other latent variables, especially in complex models with multiple mediating effects in concurrent paths. Indirect effects also allow for direct estimations, via resampling, of the P values associated with mediating effects that have traditionally relied on time-consuming and not fully automated calculations based on linear (Preacher & Hayes, 2004) and nonlinear (Hayes & Preacher, 2010) assumptions.
- P values for all weights and loadings. P values are now shown for all weights and loadings, including those associated with indicators that make up moderating variables. With these P values, users can check whether moderating latent variables satisfy validity and reliability criteria for either reflective or formative measurement. This can help users demonstrate validity and reliability in hierarchical analyses involving moderating effects, where double, triple etc. moderating effects are tested. For instance, moderating latent variables can be created, added to the model as standardized indicators, and then their effects modeled as being moderated by other latent variables; an example of double moderation.
- Predictive validity. Stone-Geisser Q-squared coefficients (Geisser, 1974; Stone, 1974) are now calculated and shown for each endogenous variable in an SEM model. The Q-squared coefficient is a nonparametric measure traditionally calculated via blindfolding. It is used for the assessment of the predictive validity (or relevance) associated with each latent variable block in the model, through the endogenous latent variable that is the criterion variable in the block. Sometimes referred to as a resampling analog of the R-squared, it is often similar in value to that measure. Unlike the R-squared coefficient, however, the Q-squared coefficient can assume negative values. Acceptable predictive validity in connection with an endogenous latent variable is suggested by a Q-squared coefficient greater than zero.
- Ranked data. Users can now select an option to conduct their analyses with only ranked data, whereby all the data is automatically ranked prior to the SEM analysis (the original data is retained in unranked format). When data is ranked, the value distances that typify outliers are typically reduced significantly, effectively eliminating outliers without any decrease in sample size. A concomitant increase in collinearity is usually observed, but not to the point of threatening the credibility of the results. This option can be very useful in assessments of whether the presence of outliers significantly affects path coefficients and respective P values, especially when outliers are not believed to be due to measurement error.
- Restricted ranges. Users can now run their analyses with subsamples defined by a range restriction variable, which may be standardized or unstandardized. This option is useful in multi-group analyses, whereby separate analyses are conducted for each subsample and the results then compared with one another. One example would be a multi-country analysis, with each country being treated as a subsample, but without separate datasets for each country having to be provided as inputs. This range restriction feature is also useful in situations where outliers are causing instability in a resample set, which can lead to abnormally high standard errors and thus inflated P values. Users can remove outliers by restricting the values assumed by a variable to a range that excludes the outliers, without having to modify and re-read a dataset.
- Standard errors for all weights and loadings. Standard errors are now shown for all loadings and weights. Among other purposes, these standard errors can be used in multi-group analyses, with the same model but different subsamples. In these cases, users may want to compare the measurement models to ascertain equivalence, using a multi-group comparison technique such as the one documented by Keil et al. (2000), and thus ensure that any observed differences in structural model coefficients are not due to measurement model differences.
- VIFs for all indicators. VIFs are now shown for all indicators, including those associated with moderating latent variables. With these VIFs, users can check whether moderating latent variables satisfy criteria for formative measurement, in case they do not satisfy validity and reliability criteria for reflective measurement. This can be particularly helpful in hierarchical analyses involving moderating effects, where formative latent variables are frequently employed, including cases where double, triple etc. moderating effects are tested. Here moderating latent variables can be created, added to the model as standardized indicators, and then their effects modeled as being moderated by other latent variables; with this process being repeated at different levels.
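The blindfolding item in the list above describes the resamples in enough detail to sketch them in a few lines of Python. The sketch below is only an illustration and not the WarpPLS code: the function name `blindfold_resamples` is hypothetical, and two details that the description leaves open are filled in by assumption, namely that the replaced rows form consecutive blocks and that integer division is used when the sample size is not an exact multiple of the number of resamples.

```python
from statistics import mean


def blindfold_resamples(data, n_resamples):
    """Build blindfolding resamples from a list of rows (lists of numbers).

    Each resample is a copy of the data in which a block of rows has been
    replaced with the column means.
    """
    n = len(data)
    # As with jackknifing, cap the number of resamples at the sample size.
    n_resamples = min(n_resamples, n)
    # Rows modified per resample = sample size / number of resamples.
    # Assumption: integer division when the split is not exact.
    rows_per_resample = n // n_resamples
    col_means = [mean(col) for col in zip(*data)]

    resamples = []
    for k in range(n_resamples):
        # Assumption: resample k replaces the k-th consecutive block of rows.
        start = k * rows_per_resample
        stop = start + rows_per_resample
        resample = [
            list(col_means) if start <= i < stop else list(row)
            for i, row in enumerate(data)
        ]
        resamples.append(resample)
    return resamples


if __name__ == "__main__":
    # Sample size 200 with 100 resamples: 2 rows modified per resample.
    sample = [[float(i), float(2 * i)] for i in range(200)]
    first = blindfold_resamples(sample, 100)[0]
    print(sum(1 for old, new in zip(sample, first) if old != new))
```

The example at the bottom mirrors the one in the list: with a sample size of 200 and 100 resamples, each resample differs from the original data in 2 rows.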
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.
Geisser, S. (1974). A predictive approach to the random effects model. Biometrika, 61(1), 101–107.
Hayes, A. F., & Preacher, K. J. (2010). Quantifying and testing indirect effects in simple mediation models when the constituent paths are nonlinear. Multivariate Behavioral Research, 45(4), 627–660.
Keil, M., Tan, B. C., Wei, K.-K., Saarinen, T., Tuunainen, V., & Wassenaar, A. (2000). A cross-cultural study on escalation of commitment behavior in software projects. MIS Quarterly, 24(2), 299–325.
Lindell, M., & Whitney, D. (2001). Accounting for common method variance in cross-sectional research designs. Journal of Applied Psychology, 86(1), 114–121.
Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4), 717–731.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B, 36(1), 111–147.
Monday, October 10, 2011
Hands-On Workshop on WarpPLS; 6-7 January 2012; San Antonio, Texas
Two-Day Hands-On Workshop on WarpPLS: SEM Fundamentals with Linear and Nonlinear Applications
*** Registration and additional details ***
http://bit.ly/oqoG5C
or
http://scriptwarp.com/warppls/prjs/2012_WarpPLSwkshp_Jan_SanAntonio
*** Instructor ***
Ned Kock, Ph.D.
WarpPLS Developer
http://nedkock.com
*** Location and dates ***
Our Lady of the Lake University
San Antonio, Texas
6-7 January 2012 (Fri-Sat), 8 am–5 pm
*** Workshop program at a glance ***
The main goal of this workshop is to give participants a practical understanding of how to use the software WarpPLS to conduct variance-based structural equation modeling (SEM). The workshop is very hands-on and covers linear and nonlinear applications.
Day 1 of workshop
• Overview of workshop and formation of teams
• Overview of web resources: Video clips, blog, publications, spreadsheets, and templates
• Overview of steps 1 to 5 of a complete SEM analysis
• Hands-on exercise: Steps 1 to 5 of a complete SEM analysis
• Resampling as shuffling multiple decks of cards
• Choosing the right resampling method
• Hands-on exercise: Changing the resampling method
• Choosing the right warping (i.e., nonlinear) algorithm
• Viewing plots of linear and nonlinear relationships
• Hands-on exercise: Changing the warping algorithm and viewing plots
• Charting non-standardized data
• Hands-on exercise: Charting non-standardized data
• Reading discussion: Kock (2011) – WarpPLS 2.0 User Manual
Day 2 of workshop
• Testing a mediating effect using the Baron & Kenny approach
• Hands-on exercise: Testing a mediating effect using the Baron & Kenny approach
• Testing a mediating effect using the Preacher & Hayes approach
• Hands-on exercise: Testing a mediating effect using the Preacher & Hayes approach
• Reading discussion: Kock et al. (2009) – Communication flow orientation article
• Testing a moderating effect
• Hands-on exercise: Testing a moderating effect
• Adding control variables into an analysis
• Conducting a multi-group analysis
• Conducting a full collinearity test
• Reading discussion: Zhang et al. (2010) – Organizing software testing article
• Hands-on exercise: Team project using participants’ own data
• Presentation of results from team project
Monday, August 29, 2011
Using WarpPLS in E-Collaboration Studies: Mediating Effects, Control and Second Order Variables, and Algorithm Choices
A new article discussing WarpPLS is available. The article is titled “Using WarpPLS in E-Collaboration Studies: Mediating Effects, Control and Second Order Variables, and Algorithm Choices”. It was recently published in the International Journal of e-Collaboration. A full text version of the article is available here as a PDF file. Below is the abstract of the article.
This is a follow-up on two previous articles on WarpPLS and e-collaboration. The first discussed the five main steps through which a variance-based nonlinear structural equation modeling analysis could be conducted with the software WarpPLS (Kock, 2010b). The second covered specific features related to grouped descriptive statistics, viewing and changing analysis algorithm and resampling settings, and viewing and saving various results (Kock, 2011). This and the previous articles use data from the same e-collaboration study as a basis for the discussion of important WarpPLS features. Unlike the previous articles, the focus here is on a brief discussion of more advanced issues, such as: testing the significance of mediating effects, including control variables in an analysis, using second order latent variables, choosing the right warping algorithm, and using bootstrapping and jackknifing in combination.
Monday, July 18, 2011
WarpPLS workshop at Fundação Getúlio Vargas in June 2011: Details and some photos
Below are some photos from the June 2011 WarpPLS workshop at Fundação Getúlio Vargas (FGV), one of the highest ranked and most prestigious universities in Brazil in its main areas of focus, which are business management and public administration. The workshop took place in Rio de Janeiro. The participants included faculty, doctoral students, and master’s students at FGV.
This was a very hands-on workshop, as the participants had taken a course in structural equation modeling prior to it. They used Amos in that course, which was great because the workshop then highlighted the power of WarpPLS vis-à-vis a well established and also very useful tool for multivariate analyses with latent variables (Amos). We had about 15 contact hours for this workshop. Activities included commentaries based on video clips, live demonstrations, discussions of selected readings, and practical assignments focusing on linear and nonlinear empirical data analyses.
About 30 percent of the workshop was set aside for “free data analyses”, building on data that the participants brought into the workshop. That is, the participants had time to analyze their own data, and solve specific problems with my help. (There are always issues that are specific to a given dataset; e.g., problems with indicator loadings and interpretation of nonlinear results.) There was also a team workshop project, where participant teams presented an independent empirical study with analyses employing WarpPLS.
Some of the participants were faculty members from other universities in Rio de Janeiro, as well as employees of a few major research and training organizations in Brazil. Among these organizations were Fundação Oswaldo Cruz (a.k.a. FIOCRUZ), and the Escola de Comando e Estado Maior do Exército (ECEME). FIOCRUZ is one of the world’s foremost public health organizations, known for its strengths in various related areas, including epidemiological research. ECEME is an education institution that prepares officers of the Brazilian Army to take up command positions at the rank of General.
Tuesday, June 28, 2011
WarpPLS’ treatment of formative latent variables: More in line with Wold than Lohmöller
WarpPLS uses what is often referred to as Wold’s original “PLS regression” algorithm to calculate indicator weights, for both formative and reflective variables. PLS regression was developed by Wold, and is slightly different from the modification of Wold’s algorithm developed by Lohmöller, which is the one normally used in other publicly available PLS-based structural equation modeling software.
Generally speaking, the PLS regression algorithm generates coefficients that are more stable and robust, i.e., more reliable for hypothesis testing. It also tends to minimize collinearity. On the other hand, it is more computationally intensive, which is probably one of the reasons why Lohmöller developed a modified version, with sub-versions called “modes”; see Lohmöller (1989) for more details. Personal computers were not that powerful in the 1980s.
Moreover, the type of nonlinear treatment employed by WarpPLS cannot be properly performed with Lohmöller’s algorithm. The problem is that with Lohmöller’s algorithm, as a model changes, the weights and loadings also change, even if the latent variables do not change. That is, with Lohmöller’s algorithm, two models with the same latent variables but different structures (i.e., links among latent variables) will have different weights and loadings.
The weights of formative latent variables will be essentially the same in WarpPLS as they would be if the variables were defined as reflective. That is, they will be obtained by an iterative algorithm that stops when two conditions are met: (a) the weights between indicators and latent variable are standardized partial regression coefficients calculated with the indicators as independent variables and the latent variable as the dependent variable; and (b) the regression equation expressing the latent variable as a combination of the indicators has an error term of zero.
So why should the user define a latent variable as formative or reflective? The reason lies in how the outputs generated by the software are interpreted. When a latent variable is formative, both the P values for the weights and the variance inflation factors for the indicators should be generally low, ideally below 0.05 and 2.5, respectively.
True formative variables are fundamentally different from true reflective variables, although there are cases that can be seen as “in between” formative and reflective. True formative and reflective variables behave differently, whether the software treats them differently or not. For example, with true formative variables you would expect indicators to be significantly associated with the scores of their respective latent variable, which is indicated by low P values for their weights. However, you would not normally expect the indicators to be redundant, and a lack of redundancy is indicated by low variance inflation factors for the indicators.
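To make the variance inflation factor (VIF) criterion concrete, here is a minimal Python sketch of the textbook definition: the VIF of an indicator is 1 / (1 - R²), where R² comes from regressing that indicator on the other indicators of the same latent variable. This equals the corresponding diagonal element of the inverse of the indicators' correlation matrix. The function names and the sample data are made up for illustration, and this is not meant to reproduce WarpPLS's internal calculations.

```python
def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity matrix on the right.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (perfectly collinear indicators)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """Return one VIF per indicator; `columns` is a list of indicator columns."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]
    inverse = invert(corr)
    return [inverse[i][i] for i in range(k)]


# Hypothetical data: two indicators that correlate at 0.8,
# so each VIF is 1 / (1 - 0.64), roughly 2.78.
indicator_1 = [1, 2, 3, 4, 5]
indicator_2 = [2, 1, 4, 3, 5]
print([round(v, 2) for v in variance_inflation_factors([indicator_1, indicator_2])])
```

In this made-up example both indicators come out slightly above the 2.5 guideline mentioned above, which would suggest some redundancy between them.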
The way formative variables are treated in Lohmöller’s approach leads to unstable weights, with the signs of weights frequently changing in the resample set. See Temme et al. (2006) for a discussion of this phenomenon. Lohmöller’s approach also leads to “lateral” collinearity, that is, collinearity between predictor and criterion latent variables. This “stealth” type of collinearity often leads to inflated path coefficients for links involving formative latent variables.
References
Lohmöller, J.-B. (1989). Latent variable path modeling with partial least squares. Heidelberg, Germany: Physica-Verlag.
Temme, D., Kreis, H., & Hildebrandt, L. (2006). PLS path modeling – A software review. Berlin, Germany: Institute of Marketing, Humboldt University Berlin.
Saturday, June 25, 2011
Dealing with country-specific number punctuation systems
WarpPLS users in countries that adopt number punctuation systems different from that adopted in the USA may have problems when using Excel to manipulate WarpPLS files.
For instance, in Brazil a comma is used to separate the integer from the fractional part of a real number (e.g., 1,431), whereas in the USA a period is used for that purpose (e.g., 1.431).
Because of that, a coefficient calculated by WarpPLS and exported into a .txt file as “1.431” may be read by a Brazilian version of Excel as one thousand four hundred and thirty-one, and not as one plus the 431/1000 fraction.
This tends to happen in certain types of analyses, such as second order latent variable analyses, where WarpPLS outputs are used as inputs after manipulation with country-specific versions of Excel.
A simple way to solve this problem is to open the file in Excel, Notepad, or another simple text editing tool and replace the offending punctuation marks (for example, all periods with commas, or vice versa) before using the file as input for other purposes.
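For those who prefer to script the replacement, below is a minimal Python sketch of the same idea. It is only an illustration: the function name and file names are hypothetical, and it assumes that the file is tab-delimited, that every number has a digit on both sides of its decimal mark (e.g., 0.25 rather than .25), and that no thousands separators are present. If the file uses commas to separate columns, converting periods to commas would corrupt it.

```python
import re


def swap_decimal_separator(text, source=".", target=","):
    """Replace `source` with `target` only where it sits between two digits."""
    pattern = re.compile(r"(?<=\d)" + re.escape(source) + r"(?=\d)")
    return pattern.sub(target, text)


def convert_file(in_path, out_path, source=".", target=","):
    """Read a text file, swap the decimal separators, and write a new file."""
    with open(in_path, encoding="utf-8") as handle:
        converted = swap_decimal_separator(handle.read(), source, target)
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(converted)


# Example with a made-up line of exported coefficients (tab-delimited).
print(swap_decimal_separator("1.431\t-0.250\t0.072"))

# Hypothetical file names; uncomment to convert an actual file.
# convert_file("warppls_output.txt", "warppls_output_comma.txt")
```

Calling the function with source="," and target="." performs the opposite conversion, for files that were saved by a comma-based version of Excel and need to be read back with periods as decimal marks.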