MR_A: Model Specification
Multiple linear regression
Multiple vs simple regression
Simple linear regression: 1 independent variable x.
Multiple linear regression: k x-variables, k > 1.
Example Hamburger Chain
Research question: To assess the effect of different price structures and different levels of advertising
expenditure on sales, the management sets different prices and spends varying amounts on
advertising in different cities. Does an increase in advertising expenditure lead to an increase in sales?
If so, is the increase in sales sufficient to justify the increased expenditure?
Random experiment: pick a random store of a chain in a random city.
- Y=Sales: monthly sales (in 1000$)
- x1=Price: ‘average’ price for products (in $)
- x2=Advert: monthly advertising expenditure (in 1000$)
Multiple linear regression
As in simple linear regression, the model
Y = β0 + β1 x1 + … + βk xk + ε
consists of two parts: a systematic part (a linear function) and a random part (the error term).
- The systematic part provides information on how a combination of x-outcomes results in an
average value for Y: μY|x.
- The random error term ε accounts for the fact that Y|x (the Y variable for some given x-values) is a
random variable. (For the example: depending on which store you pick, you get a different outcome.)
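The two parts of the model can be sketched in a few lines of code. The coefficient and σ values below are purely hypothetical, chosen for illustration; the point is that two stores with identical x-values still give different outcomes because of ε.

```python
import random

random.seed(1)

# Hypothetical coefficients (illustration only, not estimates from the notes)
b0, b1, b2 = 100.0, -5.0, 2.0   # intercept, Price slope, Advert slope
sigma = 3.0                      # standard deviation of the error term

def simulate_sales(price, advert):
    """One draw of Y|x: systematic part mu_{Y|x} plus a random error."""
    systematic = b0 + b1 * price + b2 * advert   # mu_{Y|x}
    error = random.gauss(0, sigma)               # epsilon, mean 0, st.dev sigma
    return systematic + error

# Two stores with the same Price and Advert still differ through epsilon
print(simulate_sales(5.0, 1.5))
print(simulate_sales(5.0, 1.5))
```

Averaging many simulated draws for the same x-values recovers the systematic part μY|x (here 100 − 5·5 + 2·1.5 = 78), which is exactly what A1 (ε has mean zero) says.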
Graphically
Multiple linear regression is no longer represented by a line. With two x-variables it can be
visualized as a (hyper)plane. (With more than 2 x-variables it is no longer graphically
representable.)
Example Hamburger Chain
- Y=Sales
- x1=Price
- x2=Advert
Classical multiple linear regression
The assumptions that were introduced for simple linear regression remain. In addition, assumption
A4 now contains two assumptions about the explanatory variables.
Classical assumptions for multiple linear regression
1) A1: μY|x = β0 + β1 x1 + … + βk xk (ε has mean zero for all x).
2) A2: ε has constant standard deviation σ, also called homoskedasticity.
3) A3: cov(εi, εj) = cov(Yi, Yj) = 0 → the error terms for different observations are uncorrelated.
4) A4: Variables xi are non-random (which can be relaxed to the assumption that x is not correlated with
the error term) and are not exact linear functions of the other explanatory variables. This means that
the x’s should all measure different things: no x can be computed from (a few of) the other x’s.
5) A5: (optional) ε is normally distributed. (If the sample is large enough, we can rely on the Central
Limit Theorem.)
Interpretation of the parameters
- Intercept β0: the average value of Y if all x-values are set to 0. In a linear function this is often not
relevant, because in most situations an x-value of zero has no direct interpretation. Nevertheless,
except in very special cases, we always include an intercept in the model, even if it has no direct
economic interpretation. Omitting it can lead to a model that fits the data poorly and does not
predict well.
- Coefficients βi: a slope in the xi direction; measures the effect of a change in the variable xi upon the
expected value of Y, ceteris paribus = if all other variables are held constant. (IMPORTANT TERM!!!)
➔ As such, it is linked to the partial derivative ∂E(Y)/∂xi.
Example Hamburger Chain: Y = Sales; x1 = price; x2 = advert
- β0: the expected Sales when Price = 0 and Advert = 0, but this is not realistic.
- β1: the change in monthly Sales ($1000) when the price index Price is increased by one unit ($1) and
advertising expenditure Advert is held constant.
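The ceteris-paribus reading of β1 can be checked with simple arithmetic: raise Price by one unit while holding Advert fixed, and the expected Sales change by exactly β1. The coefficient values below are hypothetical (the notes do not give fitted estimates here).

```python
# Hypothetical fitted coefficients (illustrative values only)
b0, b1, b2 = 118.9, -7.9, 1.9   # Sales = b0 + b1*Price + b2*Advert

def expected_sales(price, advert):
    """Systematic part of the model: E(Sales | Price, Advert)."""
    return b0 + b1 * price + b2 * advert

# Effect of a $1 price increase, Advert held constant (ceteris paribus):
effect = expected_sales(6.0, 1.8) - expected_sales(5.0, 1.8)
print(effect)   # equals b1, no matter which Advert level is held fixed
```

Because b0 and the b2·Advert term cancel in the subtraction, the difference is b1 regardless of the values held constant, which is the ceteris-paribus interpretation in one line.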
Model specification
It is important to carefully think about the regression model specification: what functional form
μY|x = f(x)?
- Linear function versus non-linear functions
- How to account for qualitative x-variables?
- How to account for interaction effects between x-variables?
- Choice of explanatory variables?
Non-linear models
- As in simple linear regression, non-linear relationships can be modeled using a multiple ‘linear’
regression model through the use of appropriate transformations.
- The choice of transformation should be guided by economic theory and expert knowledge, for
example by considering the desired slope properties.
- Does model provide a good fit for the data?
Example Hamburger Chain: We initially hypothesized that sales revenue is linearly related to price and
advertising expenditure: SALES = β0 + β1 PRICE + β2 ADVERT. However, economic theory suggests
that sales do not depend linearly on advertising but follow a slight hill shape, so the impact of
advertising differs between low and high investment. With a non-constant slope, a purely linear
function is not a good choice. Remember that we earlier suggested adding ADVERT² or using the
logarithm of ADVERT.
Transformations
- The logarithmic transformation is a common transformation in economic applications and can
capture many non-linear relationships. (Example Hamburger Chain: If sales depend on advertising
with a hill shape rather than linearly, a logarithmic transformation can be useful, changing the
formula to SALES = β0 + β1 PRICE + β2 ln(ADVERT).)
- Polynomial functions: When we studied these models with the simple regression model, we were
constrained by the need to have only one right-hand-side variable, such as Y = β0 + β1 x². Now, within
the framework of the multiple regression model, we can consider unconstrained polynomials with all
their terms included.
Example Hamburger Chain: The use of a polynomial function would change the formula to SALES =
β0+β1 PRICE + β2 ADVERT + β3 ADVERT².
➔ SEE EXAMPLES BELOW
Log transformations
Example Hamburger chain
Consider model with ln(Advert).
Sales = β0 + β1 Price + β2 ln(Advert) + ε.
β2 = 3.456: this indicates the effect that ln(Advert) has on Sales. When advertising expenditure
increases by 1%, sales increase on average by approximately 0.03456 × ($1000), ceteris paribus
(meaning that we keep the price of the product basket constant).
This is a linear-log model (linear in Sales; log in Advert). Meaning: a 1% change in x leads on average
to approximately a β/100 unit change in y → 3.456/100 = 0.03456.
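The β/100 rule of thumb can be verified numerically. Only β2 = 3.456 comes from the notes; the intercept and Price slope below are hypothetical placeholders (they cancel out in the comparison anyway).

```python
import math

# b2 = 3.456 as quoted in the notes; b0 and b1 are hypothetical placeholders
b0, b1, b2 = 100.0, -7.0, 3.456

def expected_sales(price, advert):
    """Linear-log model: E(Sales) = b0 + b1*Price + b2*ln(Advert)."""
    return b0 + b1 * price + b2 * math.log(advert)

# Exact effect of a 1% increase in Advert, Price held constant
advert = 2.0
exact = expected_sales(5.0, advert * 1.01) - expected_sales(5.0, advert)
approx = b2 / 100   # the rule of thumb: beta/100 per 1% change in x
print(exact, approx)
```

The exact effect is b2·ln(1.01) ≈ 0.0344, very close to the 0.03456 approximation; note it does not depend on the starting level of Advert, because ln(1.01·a) − ln(a) = ln(1.01).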
Polynomial model
Example Hamburger Chain
Consider the quadratic model:
Sales = β0 + β1 Price + β2 Advert + β3 Advert² + ε.
What sign do you expect for β2, β3?
β2 = 12.151; β3 = -2.768
Graphically, this is a parabola (y = ax² + bx + c; a > 0: happy smiley, a < 0: sad smiley). Here the
coefficient of Advert² is negative (β3 = -2.768), so the parabola opens downward (sad smiley); in a
realistic setting we only use the first half of the parabola.
When Advertising is increased by 1 unit ($1000), this does not always
have the same effect on the Sales. In this case the effect is positive, but
the effect becomes smaller as Advert increases.
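The diminishing effect follows from the derivative of the quadratic model: ∂Sales/∂Advert = β2 + 2β3·Advert. A quick sketch with the β2 and β3 values quoted above:

```python
# Coefficients quoted in the notes for the quadratic model
b2, b3 = 12.151, -2.768   # Advert and Advert^2 coefficients

def marginal_effect(advert):
    """d(Sales)/d(Advert) = b2 + 2*b3*Advert for the quadratic model."""
    return b2 + 2 * b3 * advert

print(marginal_effect(0.5))   # large positive effect at low spending
print(marginal_effect(2.0))   # much smaller effect at higher spending

# Spending level where the marginal effect hits zero (top of the 'sad smiley')
turning_point = -b2 / (2 * b3)
print(turning_point)
```

Beyond the turning point (about 2.19, i.e. $2190 of monthly advertising) the marginal effect turns negative, which is why only the first half of the parabola is used in practice.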
Dummy variables
- X variables with only 2 outcomes are called indicator variables = dummy variables.
- Usually, the 2 outcomes are coded by 1 or 0, to indicate the presence or absence of a characteristic
or to indicate whether a condition is true or false.
D = 1 if the characteristic is present
D = 0 if the characteristic is not present
- The value D = 0 defines the reference group of elements for which the characteristic is not present.
Example Price House: One of the most important factors is the location of the property, i.e. whether
it is located in a desirable neighborhood or not. D = 1 (if the property is in the desirable
neighborhood); D = 0 (else). Reference group: houses not in the desirable neighborhood.
Qualitative variables
- Dummy variables are used to account for qualitative factors in econometric models.
- Even if numbers are used to code the outcomes of qualitative factors, do NOT use these codes as
such in the regression model. Introduce dummy variables!
Example Price Houses: Y =Price; x1=SQFT=Area measured in square feet; x2= variable indicating
whether house in desirable neighborhood.
➔ Create dummy variable D (1 = desirable neighborhood; 0 = else)
➔ PRICE = β0 + β1 SQFT + β2 D + ε. This means that the price depends linearly on the SQFT of the
area but also on the dummy variable.
Interpret the coefficient linked to a dummy
Write down separate regression models for the outcomes
of the qualitative variable.
Example Price Houses: Price = β0 + β1 SQFT + β2 D + ε
β2 = 50.058 (in the table: row D, column B)
- D = 1 (desirable neighborhood): price-hat = 20.543 + 50.058 + 0.123 SQFT → the 50.058 appears
because 50.058 · 1, due to the desirable neighborhood.
- D = 0 (not desirable neighborhood): price-hat = 20.543 + 0.123 SQFT → the dummy term disappears
because 50.058 · 0 = 0, as D is not true.
A house in the desirable neighborhood will have a price that is on average 50.058 units higher than a
house with the same SQFT that is not in the desirable neighborhood, ceteris paribus (SQFT held
constant).
➔ Such conclusions always compare to the reference group.
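The parallel shift can be made concrete with the fitted values quoted above. Whatever SQFT we plug in, the predicted price gap between the two neighborhoods is always β2:

```python
# Fitted coefficients quoted in the notes
b0, b1, b2 = 20.543, 0.123, 50.058   # intercept, SQFT slope, dummy coefficient

def predicted_price(sqft, desirable):
    """price-hat = b0 + b1*SQFT + b2*D, with D = 1 for the desirable neighborhood."""
    d = 1 if desirable else 0
    return b0 + b1 * sqft + b2 * d

same_sqft = 2000
gap = predicted_price(same_sqft, True) - predicted_price(same_sqft, False)
print(gap)   # the parallel shift b2, regardless of SQFT
```

The SQFT term cancels in the subtraction, which is exactly why the two regression lines are parallel: the dummy shifts the intercept, not the slope.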
Graphically
Adding the dummy variable D to a simple regression
model causes a parallel shift in the relationship by the
amount β2.
Example price houses: Price = β0 + β1 SQFT + β2 D + ε
Effect of an extra square foot is the same in both
neighborhoods.
Qualitative r.v. with several categories
If a qualitative variable has M>2 outcomes, one must introduce M-1 dummy variables.
Example test result: Y = test score; x1 = study time in hours; x2 = highest diploma (1=ma, 2=ba, 3=hs).
x2 is qualitative with 3 outcomes, so we pick a reference group (own choice) and create dummy
variables for the other outcomes. With high school as reference group: DB = 1 if bachelor, 0 else;
DM = 1 if master, 0 else. Filling this in per outcome of x2: if x2 = 1 (master), DB = 0 and DM = 1; if
x2 = 2 (bachelor), DB = 1 and DM = 0; if x2 = 3 (high school), both dummies are 0.
➔ Because we use dummy variables, we do not add x2 itself to the equation but the dummies. This
results in Y = β0 + β1 x1 + β2 DB + β3 DM + ε.
SPSS: Transform – Recode into different variables
Interpret coefficients of dummies
In case you want to interpret the dummies, write down separate regression models for the outcomes
of the qualitative variable.
Example test result: Let Y=5.2 + 3.5X1 + 0.7DB + 1.5DM+ε
- Master diploma (DB =0, DM=1): Y=5.2 + 1.5 + 3.5X1 + ε → DM is 1, DB is 0
- Bachelor diploma (DB =1, DM=0): Y=5.2 + 0.7 + 3.5X1 + ε → DB is 1, DM is 0
- High school diploma (DB =0, DM=0): Y=5.2 + 3.5X1 + ε (REFERENCE GROUP) → DM is 0, DB is 0
➔ Conclusion: An individual with a bachelor diploma scores on average 0.7 units higher than an
individual with a high school diploma, ceteris paribus (if the number of hours is being held
constant).
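The recode-then-predict steps above can be sketched end to end: map the diploma code (1=ma, 2=ba, 3=hs) to the two dummies, then evaluate the fitted equation Y = 5.2 + 3.5X1 + 0.7DB + 1.5DM from the example.

```python
# Map the diploma code (1=ma, 2=ba, 3=hs) to the dummies; hs is the reference group
def dummies(diploma_code):
    return {"DB": 1 if diploma_code == 2 else 0,
            "DM": 1 if diploma_code == 1 else 0}

# Coefficients from the example: Y = 5.2 + 3.5*X1 + 0.7*DB + 1.5*DM
def expected_score(hours, diploma_code):
    d = dummies(diploma_code)
    return 5.2 + 3.5 * hours + 0.7 * d["DB"] + 1.5 * d["DM"]

h = 10
print(expected_score(h, 1))  # master
print(expected_score(h, 2))  # bachelor
print(expected_score(h, 3))  # high school (reference group)
print(expected_score(h, 2) - expected_score(h, 3))  # 0.7, ceteris paribus
```

For any fixed number of study hours, the bachelor-vs-high-school gap is 0.7 and the master-vs-high-school gap is 1.5, reproducing the conclusion above; this mirrors what SPSS's "Recode into different variables" does before the regression is run.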