- Important to limit the number of factors in the model for two reasons:
o Overfitting – when the number of factors is close to or larger than the number of data points, the model may fit too closely to random effects.
o Simplicity – in general, simpler models are better than complex ones. Using fewer factors means less data is required and there is a smaller chance of including insignificant factors. Interpretability is also crucial, and some factors are even illegal to use, such as race and gender, along with factors that are predictive of those attributes.
- Forward Selection: a variable selection method where we start with a model containing no factors. At each step, we find the best new factor to add to the model via iteration. When no remaining factor meets the quality threshold, or we reach a maximum number of factors, we stop iterating and arrive at the final model.
- Backward Elimination: the opposite of forward selection. We start with the full model and at each step remove the least significant variable until we arrive at a satisfactory model.
- Stepwise Regression: a combination of forward selection and backward elimination. There are two variants: backward, which starts with the full model, and forward, which starts with the null model. Either way, the procedure adds and removes variables iteratively in a hybrid fashion until it returns a satisfactory model.
- Each of these stepwise approaches is a greedy algorithm: every decision is made with consideration only for the immediate result of that step, not the global state or future steps. At each step the method takes whatever looks like the best immediate choice; future options are not considered (a sketch of forward/backward selection follows).
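- A minimal sketch of greedy forward selection (and backward elimination via direction="backward"), assuming scikit-learn is available; the file name data.csv, the target column y, and the cap of 5 factors are placeholders, not values from the notes:
```python
# Greedy variable selection with scikit-learn's SequentialFeatureSelector.
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

df = pd.read_csv("data.csv")              # hypothetical data file
X, y = df.drop(columns="y"), df["y"]      # hypothetical target column "y"

# direction="forward": start from the null model and greedily add the factor
# that most improves cross-validated R^2; direction="backward" starts from the
# full model and greedily removes factors instead.
selector = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=5,               # assumed cap on model size
    direction="forward",
    scoring="r2",
    cv=5,
)
selector.fit(X, y)
print(list(X.columns[selector.get_support()]))   # the selected factors
```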
- Lasso Approach: a more modern approach to variable selection using global optimization. We add a constraint to the standard regression formulation that sets a budget on the sum of the absolute values of the model's coefficients. This constraint limits the size of the coefficients, so the model allocates most of the coefficient budget to the most important coefficients/variables. Unimportant variables are allotted zero of the budget, which leaves them out of the selection. Because the budget is global, it is important to use scaled data so that the budget treats every variable the same; otherwise the magnitude of a variable would affect how much of the budget it receives. A sketch follows the formulation below.
o Minimize \sum_{i=1}^{n} \left( y_i - (a_0 + a_1 x_{1i} + a_2 x_{2i} + \dots + a_m x_{mi}) \right)^2
o Subject to \sum_{j=1}^{m} |a_j| \le T
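- A minimal sketch of lasso on standardized data with scikit-learn, reusing the hypothetical X and y from the selection sketch above; alpha plays the role of the budget T (larger alpha corresponds to a tighter budget) and its value here is an arbitrary assumption:
```python
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scale first so the coefficient budget treats every variable the same.
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
lasso.fit(X, y)

coefs = lasso.named_steps["lasso"].coef_
# Variables whose coefficient was driven exactly to zero are dropped.
print([name for name, c in zip(X.columns, coefs) if c != 0])
```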
- The lasso approach requires tuning the parameter T, which controls how many variables remain in the model and how strongly the coefficients are shrunk.
- Elastic Net Regression: takes the same general approach as lasso; however, instead of constraining just the absolute values of the coefficients, we constrain a weighted combination of the absolute values of the coefficients and their squares. This is a hybrid of ridge and lasso regression, which brings the advantages of both as well as the bias disadvantages of both (see the comparison sketch after the Ridge Regression formulation below).
o Minimize \sum_{i=1}^{n} \left( y_i - (a_0 + a_1 x_{1i} + a_2 x_{2i} + \dots + a_m x_{mi}) \right)^2
o Subject to \lambda \sum_{j=1}^{m} |a_j| + (1-\lambda) \sum_{j=1}^{m} a_j^2 \le T
- Ridge Regression: a special case of the Elastic Net that results from removing the absolute-value term from the Elastic Net constraint, i.e., setting \lambda = 0 in the formulation above. Ridge regression is not a variable selection approach per se, but it can be used in model selection. In ridge, the coefficients shrink toward 0 to reduce the variance of the estimates instead of being driven exactly to zero as with lasso. However, this introduces some bias, since coefficients that are very small still remain in the model.
o Minimize \sum_{i=1}^{n} \left( y_i - (a_0 + a_1 x_{1i} + a_2 x_{2i} + \dots + a_m x_{mi}) \right)^2
o Subject to \sum_{j=1}^{m} a_j^2 \le T
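- A minimal sketch comparing ridge and elastic net with scikit-learn, again reusing the hypothetical X and y; l1_ratio is scikit-learn's weighting between the |a| and a^2 penalties (the \lambda above), and both alpha values are arbitrary assumptions:
```python
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
enet = make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5))

for name, model in [("ridge", ridge), ("elastic net", enet)]:
    model.fit(X, y)
    coefs = model[-1].coef_
    # Ridge shrinks coefficients toward zero but keeps them all;
    # elastic net can drive some of them exactly to zero.
    print(name, "non-zero coefficients:", int((coefs != 0).sum()))
```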
- Greedy methods like forward selection, backward elimination, and stepwise regression are easy to implement and are good for initial data analysis, but they often don't perform well on other data. They tend to yield a set of variables fit more to random effects than is ideal, which leads to misleadingly high R-squared values; when those models are then tested on different, outside data they perform poorly.
- Lasso and Elastic Nets are usually slower and harder to compute than the stepwise methods. However, they tend to give far better results in predictive models.
- Elastic Net Advantages:
o Variable selection benefits of Lasso
o Predictive benefits of Ridge
- Elastic Net Disadvantages:
o Arbitrarily rules out some correlated variables like Lasso
o Underestimates coefficients of very predictive variables like Ridge Regression
- Note: there is no true rule of thumb for choosing between them all. If you can try one, you can probably try them all and then select the best representation (a cross-validated comparison is sketched below).
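- A minimal sketch of "try them all and pick the best" via cross-validation with scikit-learn, reusing the hypothetical X and y; the CV-tuned estimators choose their own budgets internally:
```python
from sklearn.linear_model import ElasticNetCV, LassoCV, LinearRegression, RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

candidates = {
    "ols": LinearRegression(),
    "lasso": LassoCV(cv=5),
    "ridge": RidgeCV(),
    "elastic net": ElasticNetCV(cv=5),
}
for name, est in candidates.items():
    pipe = make_pipeline(StandardScaler(), est)
    score = cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {score:.3f}")
```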
Week 9 Design of Experiments:
- Design of experiments is about dealing with data collection constraints: how to design an experiment so that data can be collected quickly and with minimal effort, yet still be large and rich enough to model on.
- It also deals with practical constraints of data collection such as surveying. If a survey is optimized to be demographically representative overall, how can we be certain that combinations of demographic categories are also represented in the data?
- All in all, there are two important concepts: comparison and control. To gain insight about some factors, they need to be compared, but on comparable terms.
- Blocking: a blocking factor can be created to account for variability coming from a category of another feature. Think of variation in car price by color, controlled for other factors: we block on the type of car (sports car, family van, sedan, etc.). The goal is to attribute more of the overall variation in the model to explained effects rather than to chance (a sketch is given below).
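- A minimal sketch of the car-price example with car type as a blocking factor, assuming statsmodels and a hypothetical DataFrame cars with columns price, color, and car_type:
```python
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Including the block (car_type) pulls its variability out of the error term,
# so the color effect is judged against less unexplained noise.
model = smf.ols("price ~ C(color) + C(car_type)", data=cars).fit()
print(anova_lm(model, typ=2))
```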
- A/B Testing: a design-of-experiments approach to choosing between 2 alternatives. Put out both alternatives on a smaller scale, test the performance of each, and determine whether either is statistically better or worse than the other. This lets us do the hypothesis testing of the alternatives in real time and halt the test as soon as the difference between the alternatives becomes statistically significant/extreme enough to determine which is better (a sketch of such a test follows the requirements below). The following things need to be true to use A/B testing:
o We need to be able to collect a lot of data quickly
o We need data that is representative of the population
o The amount of data must be small compared to the whole population
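- A minimal sketch of comparing two alternatives with a two-proportion z-test, assuming statsmodels; the conversion counts and sample sizes are made up:
```python
from statsmodels.stats.proportion import proportions_ztest

conversions = [210, 255]    # hypothetical successes for alternatives A and B
shown = [5000, 5000]        # hypothetical number of users shown each version

stat, p_value = proportions_ztest(conversions, shown)
if p_value < 0.05:
    print("Difference is statistically significant; pick the better alternative.")
else:
    print("Not significant yet; keep collecting data.")
```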
- Factorial Design: answers the question of which factors within the alternatives are important. A full factorial design would take the alternatives, break them into combinations of factors, and test them all; using ANOVA we can then determine statistically which factors are important. However, this is only practical when the number of combinations is reasonably small. Instead, we can choose a subset of combinations, which is known as a fractional (or partial) factorial design. A balanced design tests each choice of a feature the same number of times and each pair of choices the same number of times (a sketch of enumerating a full factorial design follows).
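- A minimal sketch of enumerating a full factorial design with itertools; the three factors and their levels are made up:
```python
from itertools import product

factors = {
    "price": ["low", "high"],
    "color": ["red", "blue", "black"],
    "ad_copy": ["A", "B"],
}
# Every combination of factor levels: 2 * 3 * 2 = 12 runs.
runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(runs), "combinations to test")
for run in runs[:3]:
    print(run)
```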
- Independent Factor approach: tests a subset of combinations and uses regression to estimate the effects of the features. This is only appropriate if we believe the factors act independently or can be modeled with explicit interaction terms.
- All of these approaches can be effective because they are used before modeling and even before collecting data.
- Exploration vs. Exploitation: if we are faced with a choice among numerous alternatives, there is a risk that at some point we have already found the best one but continue to test, which wastes time, samples, and resources. At what point does it become mathematically sensible to exploit a known solution, even if we are not certain it is the best, because continuing to explore would incur far more cost?
- Every time we are presented with the opportunity to show an ad (for example), we have to strike a balance between the benefit of gathering more information, the cost of doing so, and the immediate value of showing our current best ad.
- Multi-armed Bandit Approach: the general approach to the exploration vs. exploitation problem. Suppose we test K alternatives, about which we initially have no knowledge, so we assume each is equally likely to be best. We then choose and test an alternative and update our estimates of the probability that each of the K alternatives is the best. We continue testing and updating until, statistically, it is likely that we know the best alternative and can abandon testing altogether. In this way we exploit our knowledge of past information while exploring additional possibilities guided by what we have already learned, and along the way we earn more reward by picking alternatives that are more likely to be better. We can alter several parameters in the multi-armed bandit approach, such as:
o The number of tests between recalculating probabilities
o How to update the probabilities
o How to pick an alternative to test based on probabilities and/or expected values
- Overall, the multi-armed bandit approach has no simple rule, but it is largely better than traditional approaches such as running a fixed, large number of tests. The approach is worthwhile because it learns on the fly and creates more incremental value with each iteration (one concrete scheme is sketched below).
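- A minimal sketch of one concrete multi-armed bandit scheme (Thompson sampling with Beta priors), not necessarily the exact update rule intended in the notes; the arms' true click-through rates are made up for the simulation:
```python
import random

true_rates = [0.04, 0.05, 0.07]        # hypothetical click-through rates per arm
wins = [1] * len(true_rates)           # Beta(1, 1) priors: all arms start equal
losses = [1] * len(true_rates)

for _ in range(10_000):
    # Explore/exploit: sample each arm's plausible rate and play the best sample.
    samples = [random.betavariate(wins[k], losses[k]) for k in range(len(true_rates))]
    k = samples.index(max(samples))
    if random.random() < true_rates[k]:
        wins[k] += 1
    else:
        losses[k] += 1

# The best arm ends up being played far more often than the others.
print("plays per arm:", [wins[k] + losses[k] - 2 for k in range(len(true_rates))])
```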