# Linear Regression Models Term Paper

*Shalabh*

**shalab@iitk.ac.in**

**shalabh1@yahoo.com**

Department of Mathematics & Statistics

Indian of , - 208016 ()


Regression analysis is a family of statistical tools that can help sociologists better understand and predict the way that people act and interact. Regression analysis is used to build mathematical models to predict the value of one variable from knowledge of another. Although statistical methods of correlation offer researchers techniques to help them better understand the degree to which two variables are consistently related, such knowledge alone is typically insufficient to predict behavior. Simple linear regression allows the value of one dependent variable to be predicted from the knowledge of one independent variable. Multiple linear regression can be used to develop models to predict the value of a dependent variable from the knowledge of the value of more than one independent variable.


### Overview

Regression analysis is a family of statistical tools that can help sociologists better understand the way that people act and interact in groups and society. Regression analysis allows researchers to build mathematical models that can be used to predict the value of one variable from knowledge of another. There are a number of specific regression techniques that can be used by sociologists to model real-world behavior. These include:

* Simple linear regression analysis, which allows the modeling of two variables, one independent and one dependent

* Multiple linear regression analysis, which allows the modeling of two or more independent variables to predict one dependent variable

* Multiple curvilinear regression, where the relationship between variables is nonlinear (e.g., quadratic)

* Multivariate linear regression, which allows the simultaneous examination of several dependent variables

* Multivariate polynomial regression, which can be used to account for nonlinear relationships

The most commonly used of these techniques, simple linear regression and multiple linear regression, are discussed in the following sections.
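To make the multiple linear regression case from the list above concrete, here is a minimal sketch that estimates an intercept and two slopes by solving the least squares normal equations with only the Python standard library. The function names `solve_linear_system` and `fit_multiple` are hypothetical, and the data are made up and noise-free (generated from y = 1 + 2·x1 + 3·x2) so that the recovered coefficients can be checked by eye. It is an illustration under those assumptions, not a substitute for a statistics package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot row.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple(rows, ys):
    """Return least squares coefficients [intercept, b1, b2, ...]."""
    # A leading 1.0 in each row lets the first coefficient act as the intercept.
    design = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(design[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
outcomes = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

coefficients = fit_multiple(predictors, outcomes)
print([round(c, 6) for c in coefficients])
```

With real, noisy data the coefficients would not match any generating equation exactly, and a numerically more stable routine than the normal equations would usually be preferred.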

### Simple Linear Regression

Statistics offers sociology researchers a number of correlation techniques to help them better understand the degree to which two variables are consistently related. For example, correlation can help one understand the relationship between educational level and income level. A correlation coefficient takes a value between -1.0 and +1.0, and its absolute value, between zero and one, shows the degree of relationship between the two variables. An absolute value of 1.0 shows that the variables are perfectly linearly related, so that a change in the value of one variable signifies a corresponding change in the other, while a correlation of 0.0 shows that there is no linear relationship between the two variables, so that a straight-line model based on one variable tells us nothing about the value of the other.

In addition to signifying the degree of relationship between two variables, a correlation coefficient also shows how the two variables are related. A positive correlation means that as the value of one variable increases, so does the value of the other variable. A negative correlation, on the other hand, means that as the value of one variable increases, the value of the other variable decreases. An example of a high positive correlation would be the relationship of weight to age for healthy children: the older the child is, the more he or she will probably weigh. An example of a high negative correlation would be the relationship between temperature and the likelihood of snow: the higher the temperature is, the less likely it is to snow.
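As a rough illustration of how the sign and size of a correlation coefficient behave, the following sketch computes Pearson's correlation coefficient using only the Python standard library. The function name `pearson_r` is hypothetical, and both small data sets are invented for illustration rather than taken from any study.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: age in years and weight in kilograms for five children.
ages = [2, 4, 6, 8, 10]
weights = [12.0, 16.5, 20.0, 26.0, 31.5]
r_positive = pearson_r(ages, weights)

# Hypothetical data: temperature in degrees Celsius and number of snowy days.
temps = [-5, 0, 5, 10, 15]
snow_days = [9, 7, 4, 2, 0]
r_negative = pearson_r(temps, snow_days)

print(f"age vs. weight: r = {r_positive:.3f}")
print(f"temperature vs. snow: r = {r_negative:.3f}")
```

The first coefficient comes out close to +1 and the second close to -1, matching the positive and negative examples in the text.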

However, as helpful as it is to know the correlation between two variables, that knowledge alone does not necessarily give us sufficient information to predict behavior. For example, although we may know that people who do their grocery shopping when they are hungry are more likely to buy impulse items than those who are not, we cannot accurately predict that a particular person will purchase unneeded items at the grocery store just because he or she is hungry. Merely knowing that there is a positive correlation between these two variables is insufficient to allow us to predict whether a given person or type of person is more likely to exhibit this behavior. In situations where one needs to predict the value of one variable from knowledge of another variable based on the data, one needs to use simple linear regression.

Simple linear regression is a bivariate statistical tool that allows the value of one dependent variable to be predicted from the knowledge of one independent variable. Examples of sociological applications of simple linear regression include predicting the crime rate from population density, voting behavior in an election from voting behavior in the primary, and relative income based on gender. The pairs of data used in linear regression analysis are typically graphed on a scatter plot that shows the values of the points for two-variable numerical data. A line of best fit is superimposed on the scatter plot and used to predict the value of the dependent variable based on different values of the independent variable. A sample scatter plot with line of best fit is shown in Figure 1.

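The equation for the regression line is the statistical equivalent of the linear slope-intercept equation from basic algebra, y = mx + b. The underlying model for each observation is:

y = β0 + β1x + ε

and the regression line used for prediction drops the error term:

ŷ = β0 + β1x

where

y = the observed value of the dependent variable

x = the value of the independent variable

ŷ = the predicted value of y

β0 = the population y intercept

β1 = the population slope

ε = the error term

In practice, the population intercept and slope are unknown and are estimated from sample data.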

For example, a sociologist interested in the behavior of small groups might want to determine whether or not the efficacy of the decisions made in small groups could be predicted from the number of people in the group. Although larger group size could mean that there are more ideas, more contribution to the thinking process, and a larger potential for synergistic thinking, a larger group could also mean that more time would be required to reach a decision, the competition of ideas could lead to confusion, and coalitions could form within the group and make it harder to resolve disagreements. A predictive model for group size versus efficacy of decision making could be developed by setting up an experiment that compared the efficacy of decision making on the same problem for groups of various sizes. The slope of the line of best fit passing through the data points on the scatter plot could be mathematically calculated, using these data points to determine the equation of the simple regression line. This equation could then be used by the sociologist to recommend optimal group size for similar types of decisions or projects based on the single variable of number of group members.

The problem with drawing a line of best fit through a scatter plot, of course, is that unless all the pairs of data fall on one straight line, it is possible to draw multiple lines through a data set. The question faced by the researcher is how to determine which of these possible lines will yield the best predictions of the dependent variable from the independent variable. This can be accomplished mathematically through residual analysis.

In regression analysis, a residual is defined as the difference between an actual y value and the corresponding predicted y value, or y - ŷ. The line of best fit is the one that keeps the points on the scatter plot as close to the line as possible, which is done by minimizing the sum of the squares of the residuals. By looking at the residuals, a researcher can better understand how well the regression line fits past data in order to estimate how well it will predict future data.
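For a single independent variable, minimizing the sum of squared residuals has a well-known closed-form solution: the slope is the sum of cross-products of deviations from the means divided by the sum of squared deviations of x, and the intercept is chosen so that the line passes through the point of means. The sketch below applies that formula using only the Python standard library. The function names `fit_line`, `residuals`, and `sse` are hypothetical, and the paired observations are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Return the list of residuals: actual y minus predicted y."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sse(xs, ys, intercept, slope):
    """Return the sum of squared residuals for the given line."""
    return sum(e ** 2 for e in residuals(xs, ys, intercept, slope))


# Hypothetical paired observations (independent variable, dependent variable).
x_values = [3, 4, 5, 6, 7, 8]
y_values = [62.0, 66.0, 71.0, 69.0, 75.0, 78.0]

b0, b1 = fit_line(x_values, y_values)
best_sse = sse(x_values, y_values, b0, b1)

# Any other candidate line, here one with a steeper slope, fits worse.
other_sse = sse(x_values, y_values, b0, b1 + 0.5)

print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"SSE of fitted line = {best_sse:.3f}, SSE of steeper line = {other_sse:.3f}")
```

Comparing `best_sse` with `other_sse` shows the sense in which the fitted line is "best": among all straight lines through the scatter plot, it has the smallest sum of squared residuals.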

Standard regression analysis techniques make several assumptions, including that the model is correct and that the data are good. Unfortunately, the types of real-world data needed by sociologists tend to be messy, and these assumptions are rarely met in practice. Many factors can contribute to problems in regression analysis, including the use of an incorrect functional form for the regression function; correlation of variables; nonconstant variance; sample data with outliers; and multicollinearity among subsets of the input variables such that they exhibit nearly identical linear relations. If one or more of these problems occur, the entire analysis may be invalidated. This risk is compounded by the fact that standard statistics give few indications of when these problems have occurred. Although there are other indicators and potential remedies for these situations, they must be used...
