Reducible error - correct answer ✔✔Error from f^ not being a perfect estimate of f. Can potentially
improve the accuracy of f^ by using the most appropriate statistical learning technique to estimate f.
Irreducible Error - correct answer ✔✔Error that can't be reduced by even a perfect model
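The two error cards above correspond to the decomposition E(Y - Y^)^2 = [f(X) - f^(X)]^2 + Var(eps), where the first term is reducible and the second is irreducible. The sketch below is a small simulation of that split. The true function f, the imperfect estimate f_hat, and the noise level are all made-up assumptions chosen for illustration, not part of the original notes.

```python
import random

random.seed(0)

def f(x):
    # Hypothetical true function, chosen only for illustration.
    return 2.0 * x + 1.0

def f_hat(x):
    # Hypothetical imperfect estimate: the intercept is missing,
    # so [f(x) - f_hat(x)]^2 = 1 everywhere.
    return 2.0 * x

n = 50_000
xs = [random.uniform(0, 1) for _ in range(n)]
ys = [f(x) + random.gauss(0, 1) for x in xs]  # noise with Var(eps) = 1

def mse(predict):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / n

mse_perfect = mse(f)        # only irreducible error remains, close to 1
mse_imperfect = mse(f_hat)  # irreducible plus reducible error, close to 2
print(round(mse_perfect, 2), round(mse_imperfect, 2))
```

Even when the prediction uses the true f, the mean squared error stays near Var(eps). Only the gap between the two numbers can be closed by a better estimate.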
Prediction - correct answer ✔✔Estimating the output Y for a given X using the estimated function, Y^ = f^(X). The accuracy of the prediction matters more than the exact form of f^.
Inference - correct answer ✔✔Understanding the association between Y and X-variables (predictors)
Questions to ask for inference - correct answer ✔✔Which predictors are associated with the response?
What is the relationship between the response and each predictor?
Can the relationship between Y and each predictor be adequately summarized using a linear equation or
is the relationship more complicated?
Linear Models - correct answer ✔✔Allow for relatively simple and interpretable inference, but may not
yield as accurate predictions as some other approaches.
training data - correct answer ✔✔data that is used to train a predictive model and that therefore must
have known values for the target variable of the model. Goal is to apply a statistical learning method in
order to estimate the unknown function f
Parametric - correct answer ✔✔Assumes a functional form for f (for example, that f is linear). Uses
training data to fit or train the model to estimate its parameters
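As a minimal sketch of the parametric approach, assume f is linear in a single predictor, f(x) = b0 + b1 * x, so that estimating f reduces to estimating two parameters by least squares. The function name fit_simple_linear and the training data are hypothetical, chosen only for illustration.

```python
def fit_simple_linear(xs, ys):
    """Least squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up training data lying exactly on y = 1 + 2x.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

b0, b1 = fit_simple_linear(train_x, train_y)
print(b0, b1)  # 1.0 2.0
```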
Disadvantages of parametric approach - correct answer ✔✔Model chosen will generally not match the
true form of f. If the chosen model is too far from the true f, then estimation will be poor. Can try to
address this by fitting a more flexible model, which requires estimating more parameters which can lead
to overfitting.
Non-parametric - correct answer ✔✔Do not make explicit assumptions about the functional form of f.
Seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly.
Advantages of non-parametric - correct answer ✔✔Potential to accurately fit a wider range of possible
shapes for f. Can fit more closely to the true f since no assumptions are made.
Disadvantage of non-parametric - correct answer ✔✔A very large number of observations is needed
since it does not reduce the problem of estimating f to a small number of parameters.
Why would we ever choose to use a more restrictive model instead of a very flexible approach? - correct
answer ✔✔If inference is the goal since restrictive models are much more interpretable.
Models with low flexibility and high interpretability - correct answer ✔✔Subset selection, lasso, and
least squares
Models with high flexibility and low interpretability - correct answer ✔✔Bagging and boosting
Supervised Learning - correct answer ✔✔For each observation of the predictor measurements, x, there is
an associated response measurement, y.
Examples of Supervised Learning - correct answer ✔✔Linear and logistic regression, boosting and
bagging, partial least squares, KNN, trees
Regression problems - correct answer ✔✔problems with a quantitative response
Classification problems - correct answer ✔✔problems with a qualitative response
These statistical methods can be used in the case of either quantitative or qualitative responses -
correct answer ✔✔K-nearest neighbors and boosting
Unsupervised Learning - correct answer ✔✔For every observation we observe a vector of measurement,
x, but no associated response y.
Examples of unsupervised learning - correct answer ✔✔Cluster analysis, PCA (the step PCR uses to build its components before regressing on the response)
cluster analysis - correct answer ✔✔Goal is to ascertain, on the basis of x1...xn whether the observations
fall into relatively distinct groups.
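One way to look for distinct groups is K-means clustering. The notes do not name a specific clustering method, so K-means, the function name k_means_1d, the starting centers, and the data below are all assumptions made for illustration. Note that no response y is used anywhere.

```python
def k_means_1d(values, centers, iterations=10):
    """Plain K-means on one-dimensional data from given starting centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each observation to its nearest current center.
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]  # made-up measurements
centers, groups = k_means_1d(data, centers=[0.0, 6.0])
print(groups)  # [[1.0, 1.2, 0.8], [5.0, 5.3, 4.7]]
```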
Models with moderate flexibility and moderate interpretability - correct answer ✔✔Trees
Which is less flexible, lasso or least squares linear regression - correct answer ✔✔Lasso, because it is more
restrictive in estimating the coefficients and can set a number of them exactly to 0
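How the lasso sets coefficients to 0 is easiest to see in a special case. Assuming an orthonormal design and the objective RSS + lam * sum(|b_j|), each lasso coefficient is the least squares coefficient shrunk toward 0 by lam / 2 and truncated at 0 (soft-thresholding). The sketch below covers only that special case; the coefficient values and the choice lam = 1.0 are hypothetical.

```python
def soft_threshold(b, lam):
    """Lasso coefficient from a least squares coefficient b (orthonormal case)."""
    magnitude = abs(b) - lam / 2.0
    if magnitude <= 0:
        return 0.0  # small coefficients are set exactly to zero
    return magnitude if b > 0 else -magnitude

ols_coefs = [3.0, -0.4, 0.2, -2.0]  # hypothetical least squares estimates
lasso_coefs = [soft_threshold(b, lam=1.0) for b in ols_coefs]
print(lasso_coefs)  # [2.5, 0.0, 0.0, -1.5]
```

Least squares keeps all four coefficients free, while the lasso shrinks every one and removes two entirely, which is the sense in which it is the less flexible fit.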
Examples of non-linear models - correct answer ✔✔Bagging and boosting, K-Nearest Neighbors
regression
Shape of test MSE graphically - correct answer ✔✔U-shaped as flexibility increases, reflecting the trade-off between squared bias and variance, on top of the irreducible error
Variance - correct answer ✔✔Amount by which f^ would change if we estimated it using a different
training data set.
What results by making small changes to a method with high variance? - correct answer ✔✔Large
changes in f^
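The variance cards can be checked by simulation: draw many training sets, refit, and see how much the prediction at one point moves. This sketch assumes K-nearest neighbors regression as the method, with k = 1 as the flexible fit and k = 25 as the rigid one. The data-generating function, sample sizes, and evaluation point x0 = 0.5 are made up for illustration.

```python
import random
import statistics

random.seed(2)

def simulate(n):
    # Hypothetical data-generating process: y = 3x + noise.
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [3 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

def knn_predict(x0, xs, ys, k):
    # Average the responses of the k training points closest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def variance_of_prediction(k, x0=0.5, repeats=300, n=50):
    # Refit on a fresh training set each time and record f^(x0).
    preds = []
    for _ in range(repeats):
        xs, ys = simulate(n)
        preds.append(knn_predict(x0, xs, ys, k))
    return statistics.variance(preds)

var_flexible = variance_of_prediction(k=1)
var_rigid = variance_of_prediction(k=25)
print(round(var_flexible, 3), round(var_rigid, 3))
```

The k = 1 prediction follows a single noisy observation, so it changes far more from one training set to the next than the k = 25 average does.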
Relationship between flexibility, variance, and bias - correct answer ✔✔High flexibility means high
variance and low bias; low flexibility means low variance and high bias
Bias - correct answer ✔✔Refers to the error that is introduced by approximating a real-life problem,
which may be extremely complicated, by a much simpler model.
Shape of training MSE graphically - correct answer ✔✔Decreases monotonically as flexibility increases
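The training and test MSE curves can be traced with a small simulation. This is a sketch under assumptions: K-nearest neighbors regression stands in for "a method of varying flexibility" (smaller k is more flexible), and the true function, noise level, and sample sizes are made up. At k = 1 each training point is its own nearest neighbor, so the training MSE is exactly 0 while the test MSE is not.

```python
import math
import random

random.seed(1)

def true_f(x):
    # Hypothetical true function, used only to simulate data.
    return math.sin(2 * x)

def simulate(n):
    xs = [random.uniform(0, 3) for _ in range(n)]
    ys = [true_f(x) + random.gauss(0, 0.3) for x in xs]
    return xs, ys

def knn_predict(x0, xs, ys, k):
    # Average the responses of the k training points closest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def mse(xs_eval, ys_eval, xs_train, ys_train, k):
    errors = [(y - knn_predict(x, xs_train, ys_train, k)) ** 2
              for x, y in zip(xs_eval, ys_eval)]
    return sum(errors) / len(errors)

train_x, train_y = simulate(100)
test_x, test_y = simulate(200)

results = {}  # k -> (training MSE, test MSE)
for k in [100, 30, 10, 3, 1]:  # from least to most flexible
    results[k] = (mse(train_x, train_y, train_x, train_y, k),
                  mse(test_x, test_y, train_x, train_y, k))
    print(k, round(results[k][0], 3), round(results[k][1], 3))
```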
K-Nearest Neighbor - correct answer ✔✔A supervised learning technique that classifies a new
observation by finding similarities ("nearness") between this new observation and the existing data.
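A minimal sketch of K-nearest neighbors classification, assuming Euclidean distance as the measure of nearness and a simple majority vote. The function name knn_classify, the points, and the labels are hypothetical.

```python
from collections import Counter
import math

def knn_classify(x0, train_points, train_labels, k):
    """Majority vote among the k training points nearest to x0."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], x0),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up training data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["blue", "blue", "blue", "orange", "orange", "orange"]

print(knn_classify((1.8, 1.4), points, labels, k=3))  # blue
print(knn_classify((6.2, 6.4), points, labels, k=3))  # orange
```

Replacing the majority vote with the average of the neighbors' responses gives the regression version of the same method.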