Monday, April 25, 2016

WEEK_11: Multiple Regression using R

Hi there!

Today we will discuss Multiple Regression using R. In the previous post, we discussed Linear Regression using R. You need to know Linear Regression to understand Multiple Regression better. If you missed my previous post, find it here.

Multiple regression is an extension of linear regression to relationships among more than two variables. In simple linear regression we have one predictor and one response variable, but in multiple regression we have more than one predictor variable and one response variable.

The general mathematical equation for multiple regression is:

y = a + b1x1 + b2x2 + ... + bnxn
 
y is the response variable.
a is the intercept, and b1, b2, ..., bn are the coefficients.
x1, x2, ..., xn are the predictor variables.

We use the same lm() function (which we used for Linear Regression) to create the regression model.
Here we call lm() with more than one predictor in the formula.
The basic syntax is:
 
lm(y ~ x1+x2+x3..., data)
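To see this syntax in action before moving to real data, here is a minimal sketch on a tiny made-up data set. The data frame toy and its columns x1, x2 and y are invented for illustration and are not part of the enrollment data. I build y exactly as 1 + 2*x1 + 3*x2, so lm() should recover those three numbers.

```r
# Toy data (hypothetical): y is built exactly from y = 1 + 2*x1 + 3*x2
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 1, 4, 3, 6, 5)
)
toy$y <- 1 + 2 * toy$x1 + 3 * toy$x2

# Fit a multiple regression with two predictors
toyModel <- lm(y ~ x1 + x2, data = toy)

# coef() returns the intercept (a) followed by the slopes (b1, b2)
toyCoefs <- coef(toyModel)
print(round(toyCoefs, 3))
```

The first value is the intercept a, and the values named x1 and x2 are the coefficients b1 and b2 from the equation above.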
 
Let us consider one example. Here we will analyze a data set which contains information used to estimate undergraduate enrollment at the University of New Mexico. Download the data set here.

#read data into variable
datavar <- read.csv("dataset_enrollmentForecast.csv")
 
#attach data variable (optional, since lm() below receives datavar through its data argument)
attach(datavar)
 
#predict the fall enrollment (ROLL) using the unemployment rate (UNEM)
#and the number of spring high school graduates (HGRAD)
twoPredictorModel <- lm(ROLL ~ UNEM + HGRAD, data = datavar)

#display model
twoPredictorModel


[Image: console output of twoPredictorModel]

From this output, we can determine that the intercept is -8255.8, the coefficient for the unemployment rate is 698.2, and the coefficient for the number of spring high school graduates is 0.9. Therefore, the complete regression equation is Fall Enrollment = -8255.8 + 698.2 * Unemployment Rate + 0.9 * Number of Spring High School Graduates. This equation tells us that the predicted fall enrollment for the University of New Mexico increases by 698.2 students for every one percentage point increase in the unemployment rate and by 0.9 students for every additional high school graduate, in each case holding the other predictor constant.
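As a quick worked example, the equation can be evaluated by hand in R. The coefficients below are the rounded values quoted above. The inputs (9 percent unemployment and 18000 graduates) are hypothetical numbers I picked for illustration and do not come from the data set.

```r
# Rounded coefficients quoted in the post for the two-predictor model
intercept <- -8255.8
bUNEM     <- 698.2
bHGRAD    <- 0.9

# Hypothetical inputs (not from the data set)
unem  <- 9      # unemployment rate, in percent
hgrad <- 18000  # number of spring high school graduates

# Fall Enrollment = intercept + bUNEM * UNEM + bHGRAD * HGRAD
predictedRoll <- intercept + bUNEM * unem + bHGRAD * hgrad
print(predictedRoll)
```

With these inputs the equation gives -8255.8 + 6283.8 + 16200 = 14228 students. Because the coefficients are rounded, this is only approximate. Calling predict(twoPredictorModel, newdata = data.frame(UNEM = 9, HGRAD = 18000)) on the fitted model would use the unrounded coefficients.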

#predict the fall enrollment (ROLL) using the unemployment rate (UNEM),
#the number of spring high school graduates (HGRAD), and per capita income (INC)
threePredictorModel <- lm(ROLL ~ UNEM + HGRAD + INC, data = datavar)

#display model
threePredictorModel



[Image: console output of threePredictorModel]

From this output, we can determine that the intercept is -9153.3, the coefficient for the unemployment rate is 450.1, the coefficient for the number of spring high school graduates is 0.4, and the coefficient for per capita income is 4.3. Therefore, the complete regression equation is Fall Enrollment = -9153.3 + 450.1 * Unemployment Rate + 0.4 * Number of Spring High School Graduates + 4.3 * Per Capita Income. This equation tells us that the predicted fall enrollment for the University of New Mexico increases by 450.1 students for every one percentage point increase in the unemployment rate, by 0.4 students for every additional high school graduate, and by 4.3 students for every additional dollar of per capita income, in each case holding the other predictors constant.
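To make the "for every one unit" reading of a coefficient concrete, the sketch below wraps the three-predictor equation in a small helper function and changes only the unemployment rate. The function name predictRoll and the input values are my own hypothetical choices, and the coefficients are the rounded ones quoted above.

```r
# Three-predictor equation with the rounded coefficients quoted in the post
predictRoll <- function(unem, hgrad, inc) {
  -9153.3 + 450.1 * unem + 0.4 * hgrad + 4.3 * inc
}

# Hypothetical baseline inputs, chosen only for illustration
base <- predictRoll(unem = 8, hgrad = 17000, inc = 3000)

# Same inputs, but unemployment is one percentage point higher
moreUnem <- predictRoll(unem = 9, hgrad = 17000, inc = 3000)

# The change in the prediction equals the UNEM coefficient
unemEffect <- moreUnem - base
print(unemEffect)
```

The difference comes out to about 450.1 students, which is exactly the UNEM coefficient. The same check works for HGRAD and INC if you change one of them and keep the others fixed.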

#generate model summaries
summary(twoPredictorModel)


[Image: summary of twoPredictorModel]


summary(threePredictorModel)

 

[Image: summary of threePredictorModel]
 
#The meaning of these output values is the same as for the Linear Regression model.
#Please refer to my previous post for more info.
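If you want to pick individual numbers out of a summary instead of reading the printed table, the summary object can be indexed directly. Here is a minimal sketch on simulated stand-in data. The data frame sim, its columns and the true coefficients are all made up for illustration, so the numbers it prints have nothing to do with the enrollment models.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated stand-in data (hypothetical), not the enrollment data set
n   <- 50
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 3 + 2 * sim$x1 - 1 * sim$x2 + rnorm(n)

simModel   <- lm(y ~ x1 + x2, data = sim)
simSummary <- summary(simModel)

# Coefficient table: estimate, standard error, t value, p-value
print(simSummary$coefficients)

# Goodness-of-fit measures stored in the summary object
rSquared    <- simSummary$r.squared
adjRSquared <- simSummary$adj.r.squared
print(c(rSquared = rSquared, adjRSquared = adjRSquared))
```

The same fields are available on summary(twoPredictorModel) and summary(threePredictorModel). Adjusted R-squared is usually the better one to look at when comparing models with different numbers of predictors.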
 
Thanks for visiting my blog.  I always love to hear constructive feedback.  Please give your feedback in the comment section below or write to me personally here.
