Hierarchical Regression without Well-established Theory
Hierarchical regression is a theory-driven method. Yet when an exploratory study could benefit from it, there is little guidance on what to do if no well-established theory can be identified to dictate the order of entry.
Hierarchical regression should be recognised as a theory-driven method, even when its purpose is exploratory, because the order of entry is of utmost importance. For instance, the Venn diagram in Figure 1 depicts the relationships among three variables: IV1, IV2, and DV. The central overlap (area C) represents the variance shared by all three variables, area A is the variance shared between IV1 and DV, and area B is the variance shared between IV2 and DV.
Figure 1: Venn Diagram for Hierarchical Regression
If IV1 is entered into the model first, the variance in areas A and C is attributed to IV1, leaving only the variance in area B to IV2 (i.e., IV2's R² change represents area B). Conversely, if IV2 is entered first, the variance in areas B and C is attributed to IV2, leaving area A to IV1 (i.e., IV1's R² change represents area A).
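This order dependence is easy to demonstrate numerically. The sketch below (using simulated data and plain NumPy least squares; all variable names are hypothetical, not from the cited studies) fits both entry orders and shows that the R² change attributed to each IV differs:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
iv1 = rng.normal(size=n)
iv2 = 0.6 * iv1 + rng.normal(size=n)          # IV2 overlaps with IV1
dv = 0.5 * iv1 + 0.5 * iv2 + rng.normal(size=n)

def r_squared(y, *predictors):
    """R-squared of an OLS regression of y on the given predictors."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_full = r_squared(dv, iv1, iv2)
delta_iv2_last = r2_full - r_squared(dv, iv1)   # IV2 entered second: area B
delta_iv1_last = r2_full - r_squared(dv, iv2)   # IV1 entered second: area A
print(delta_iv2_last, delta_iv1_last)           # the two attributions differ
```

Because the two IVs share variance (area C), each predictor's apparent contribution shrinks when it is entered after the other.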
In this respect, scholars have proposed several criteria for deciding the order of entry (e.g., the static–manipulable sequence, time precedence, or perceived importance; see Keith, 2015). Nevertheless, it is rarely discussed what should be done when a study could benefit from hierarchical regression but no well-established theory can be identified. In one recently published paper, Wei et al. (2020) proposed that, 'in the absence of well-established theories, researchers attempt all possible sequences and provide a range of effect sizes, rather than one single effect size, for each predictor'. In their study, with four predictors, a total of 24 (4 × 3 × 2 × 1) possible sequences were evaluated, from which a range of effect sizes was generated.
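The all-possible-sequences idea can be sketched in a few lines with `itertools.permutations`. The example below uses simulated data with four predictors; it is an illustrative sketch, not the authors' original code:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 4))
X[:, 1] += 0.5 * X[:, 0]                     # induce overlap between two IVs
y = X @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(size=n)

def r_squared(y, X):
    """R-squared of an OLS regression of y on the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1 - (y - A @ beta).var() / y.var()

# Collect each predictor's R² change across all 4! = 24 entry orders
changes = {j: [] for j in range(4)}
for order in itertools.permutations(range(4)):
    prev = 0.0
    for step in range(1, 5):
        cur = r_squared(y, X[:, list(order[:step])])
        changes[order[step - 1]].append(cur - prev)
        prev = cur

for j in range(4):
    print(f"IV{j + 1}: dR2 from {min(changes[j]):.3f} to {max(changes[j]):.3f}")
```

The minimum and maximum of each predictor's 24 ΔR² values give the range of effect sizes to report.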
This is an interesting and potentially useful starting point with a confidence-interval-like paradigm. This short article aims to simplify the procedure of this seemingly laborious method (consider what happens with more than four predictors). The simplest version is to report the range between the zero-order correlation and the semi-partial correlation. The zero-order correlation ordinarily represents the largest effect size between IVX and DV, because it includes all the variance IVX shares with DV; the semi-partial correlation denotes the correlation between IVX and DV with the effects of IV1, IV2, …, IVX-1 partialled out of IVX, and its square equals the R² change obtained when IVX is entered at the final step.
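The identity between the squared semi-partial correlation and the final-step R² change can be verified directly. In the sketch below (simulated data; IV2 is residualised on IV1 by hand, and all names are hypothetical), the squared semi-partial correlation matches the ΔR² from entering IV2 last:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
iv1 = rng.normal(size=n)
iv2 = 0.5 * iv1 + rng.normal(size=n)
dv = 0.4 * iv1 + 0.3 * iv2 + rng.normal(size=n)

def r_squared(y, X):
    """R-squared of an OLS regression of y on X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1 - (y - A @ beta).var() / y.var()

# Zero-order correlation between IV2 and DV
zero_order = np.corrcoef(iv2, dv)[0, 1]

# Semi-partial correlation: correlate DV with the part of IV2 not shared with IV1
slope, intercept = np.polyfit(iv1, iv2, 1)
iv2_resid = iv2 - (slope * iv1 + intercept)
semi_partial = np.corrcoef(iv2_resid, dv)[0, 1]

# R² change from entering IV2 at the final step
delta_r2 = r_squared(dv, np.column_stack([iv1, iv2])) - r_squared(dv, iv1)
print(semi_partial ** 2, delta_r2)   # the two coincide
```

Both quantities are available from standard regression output, so the range can be read off without running every ordering.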
If scholars would like an R²-based index, simply squaring the zero-order and semi-partial correlations is useful. Note, however, that the hand-squared semi-partial correlation may appear larger than the ΔR² reported by the software for the IV entered at the final step. Normally this happens when the correlation changes sign between the zero-order and semi-partial estimates. For example, the zero-order correlation may be .20 (the corresponding R² is .04) while the semi-partial correlation is -.03; the software may then report the R² change as .000 or '<.0005', whereas squaring by hand gives .001.

In this scenario, it is good practice to check the multicollinearity among your IVs, and I would also suggest considering the parsimony of the hierarchical model. Provided the assumptions and suggestions are taken into account, this is an interesting and potentially meaningful result, showing that the effect of this particular IV on DV could be explained by the combination of all the other IVs. Based on this phenomenon, you could elaborate on your own results.
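Checking multicollinearity, as suggested above, is commonly done with variance inflation factors, where VIF_j = 1 / (1 - R²_j) and R²_j comes from regressing IVj on the remaining IVs. A minimal sketch on simulated data with two nearly collinear predictors (the `vif` helper and all variable names are hypothetical):

```python
import numpy as np

def vif(X):
    """Variance inflation factors: regress each column of X on the rest."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        r2 = 1 - (X[:, j] - A @ beta).var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)    # nearly collinear with x1
x3 = rng.normal(size=n)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)                                  # the first two VIFs are inflated
```

Large VIFs for the predictors involved would support the suppression interpretation discussed above.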
References:
Keith, T. Z. (2015). Multiple regression and beyond: An introduction to multiple regression and structural equation modelling (2nd ed.). Routledge, Taylor & Francis Group.
Wei, R., Liu, H., & Wang, S. (2020). Exploring L2 grit in the Chinese EFL context. System, 93. https://doi.org/10.1016/j.system.2020.102295