K-Means Clustering - ANSWER -randomly assign a cluster to each observation;
this serves as the initial cluster assignment
-the algorithm needs to be rerun for each value of K
-the number of clusters must be pre-specified
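A minimal sketch of the K-means steps, assuming 1-D data and plain Python. The function name k_means, the toy data, and the rule for re-seeding an empty cluster are illustrative assumptions, not part of the notes.

```python
import random

def k_means(points, k, seed=0, max_iter=100):
    """Cluster 1-D points into k groups, starting from random assignments."""
    rng = random.Random(seed)
    # Step 1: randomly assign a cluster label to every observation.
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        # Step 2a: compute each cluster's centroid (mean of its members).
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption of this sketch: an empty cluster is re-seeded
            # at a randomly chosen observation.
            centroids.append(sum(members) / len(members) if members else rng.choice(points))
        # Step 2b: reassign each observation to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels

# Made-up data with two well-separated groups.
data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
labels = k_means(data, k=2)
```

K is an argument, so trying a different number of clusters means calling k_means again from scratch.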
Hierarchical Clustering - ANSWER -the number of clusters does not need to be
pre-specified
-the algorithm only needs to be performed once for any number of clusters
-does not require random assignments
-the results of clustering depend on the choice of number of clusters, dissimilarity
measure, and linkage
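A rough sketch of why one run covers every number of clusters: agglomerative clustering merges the two closest clusters repeatedly, so each level of the tree can be stored and read off later. Complete linkage, absolute difference as the dissimilarity, and the name agglomerate are assumptions chosen for illustration.

```python
def agglomerate(points):
    """Complete-linkage agglomerative clustering on 1-D points.

    Returns a dict mapping each number of clusters to the clustering
    (lists of observation indices) found at that level of the tree.
    """
    clusters = [[i] for i in range(len(points))]
    levels = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: largest pairwise dissimilarity
                # between members of the two clusters.
                d = max(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
        levels[len(clusters)] = [sorted(c) for c in clusters]
    return levels

# Made-up data; the algorithm is run once and cut at any level afterwards.
data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
levels = agglomerate(data)
two_clusters = sorted(levels[2])
```

Changing the linkage (the max above) or the dissimilarity (the abs above) can change which clusters merge, which is why the results depend on both choices.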
Principal Component Analysis - ANSWER -provides low-dimensional linear
surfaces that are closest to the observations
-the first principal component is the line in the p-dimensional space that is the
closest to the observations
-finds a low-dimensional representation of a dataset that contains as much
variation as possible
-serves as a tool for data visualization
-uses all variables
Principal components - ANSWER -the proportion of variance explained by an
additional principal component decreases as more principal components are
added
-the cumulative proportion of variance explained increases as more principal
components are added
-the smallest number of principal components that still explains most of the
variation provides the best understanding of the data
-a scree plot provides a method for determining the number of principal
components to use
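The proportion of variance explained (PVE) and its cumulative version can be sketched as below. The component variances are made-up numbers, and variance_explained is an illustrative name. Plotting pve against the component index gives a scree plot.

```python
from itertools import accumulate

def variance_explained(component_variances):
    """Return (PVE per component, cumulative PVE) from the component variances."""
    total = sum(component_variances)
    pve = [v / total for v in component_variances]
    return pve, list(accumulate(pve))

# Hypothetical variances of four principal components, largest first.
variances = [2.5, 1.0, 0.4, 0.1]
pve, cumulative = variance_explained(variances)

for m, (p, c) in enumerate(zip(pve, cumulative), start=1):
    print(f"PC{m}: PVE={p:.3f}  cumulative={c:.3f}")
```

Each extra component adds less than the one before, while the cumulative total keeps rising toward 1.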
Which is most appropriate to model whether a person is hospitalized or not -
ANSWER -Binomial distribution
-logit link function (keeps the fitted probability in the range 0 to 1), as in binary classification
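A small sketch of the logit link and its inverse. The coefficients b0 and b1 and the age value are hypothetical, chosen only to show that any linear predictor maps back to a probability strictly between 0 and 1.

```python
import math

def logit(p):
    """Link function: map a probability in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))

def inverse_logit(eta):
    """Map a linear predictor back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients for a one-predictor hospitalization model.
b0, b1 = -3.0, 0.05
age = 70
prob_hospitalized = inverse_logit(b0 + b1 * age)
```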
Alternative fitting procedure - ANSWER -removes irrelevant variables from
the set of predictors, which leads to a simpler model
-results are easier to interpret
-prediction accuracy can improve due to the reduction in variance
Random Forest - ANSWER -if the number of predictors used at each split is
equal to the total number of available predictors, the result is the same as using
bagging
-when building a specific tree, a new random subset of predictor variables is
considered at each split
-improvement over bagging because the trees are decorrelated
Linear regression - ANSWER -considered inflexible because the possible
models are restricted to a certain form
-allows the analyst discretion regarding adding or removing variables
Lasso Regression - ANSWER -less flexible than a linear regression
-determines the subset of variables to use while linear regression allows the
analyst discretion regarding adding or removing variables
-performs variable selection, because it is possible for the coefficient estimates to
be exactly 0
-as the tuning parameter increases, flexibility decreases
-irreducible error will remain constant
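To see how lasso coefficients can be exactly 0, consider the special case of orthonormal predictors, where the lasso estimate is the least squares coefficient soft-thresholded by an amount that grows with the tuning parameter. The sketch below assumes that special case; the name soft_threshold and the coefficient values are made up.

```python
def soft_threshold(beta_ols, threshold):
    """Shrink a coefficient toward 0; set it to exactly 0 inside the threshold."""
    if beta_ols > threshold:
        return beta_ols - threshold
    if beta_ols < -threshold:
        return beta_ols + threshold
    return 0.0

# Hypothetical least squares coefficients.
ols = [3.0, -1.5, 0.4, -0.2]

# A larger threshold corresponds to a larger tuning parameter.
for threshold in (0.0, 0.5, 2.0):
    shrunk = [soft_threshold(b, threshold) for b in ols]
    print(threshold, shrunk)
```

With threshold 0 the least squares fit is returned unchanged. As the threshold grows, more coefficients are set to 0 and the model becomes less flexible.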
Bagging - ANSWER -provides additional flexibility
Flexibility & interpretability - ANSWER there is a trade-off between flexibility
and ease of interpretation
simple linear regression - ANSWER y = B0 + B1x + e
- if e = 0, the confidence interval equals the prediction interval, because the
prediction interval differs only by including the irreducible error
- the prediction interval is always at least as wide as the confidence interval
- the confidence interval quantifies the possible range for E(y | x)
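A sketch comparing the standard errors behind the two intervals at a point x0, using the usual simple linear regression formulas. Both intervals use the same critical value, so comparing standard errors is enough to compare widths. The data and the name interval_standard_errors are illustrative assumptions.

```python
import math

def interval_standard_errors(x, y, x0):
    """Standard errors behind the confidence and prediction intervals at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # The residual variance estimates the variance of the irreducible error e.
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = math.sqrt(s2 * leverage)        # for E(y | x0)
    se_pred = math.sqrt(s2 * (1 + leverage))  # for a new y at x0
    return se_mean, se_pred

x = [1.0, 2.0, 3.0, 4.0, 5.0]
noisy_y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up responses with noise
exact_y = [2.0, 4.0, 6.0, 8.0, 10.0]  # made-up responses with e = 0

se_mean, se_pred = interval_standard_errors(x, noisy_y, x0=3.5)
se_mean_exact, se_pred_exact = interval_standard_errors(x, exact_y, x0=3.5)
```

The extra 1 inside se_pred is the irreducible error term. With noise-free data the residual variance is 0 and the two standard errors coincide.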
Bias-Variance Tradeoff - ANSWER -bias refers to the error arising from the
assumptions in the statistical learning tool
-variance refers to the error arising from sensitivity to the training data set
-as model flexibility increases, squared bias decreases and variance increases
For K-nearest neighbors classifier, as K increases - ANSWER -squared bias
increases
-variance decreases
-flexibility decreases
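A toy sketch of how K controls flexibility in a K-nearest neighbors classifier. The 1-D training set and the helper names knn_predict and training_error are made up. Training error is shown only because it tracks flexibility; it is not a measure of test performance.

```python
from collections import Counter

def knn_predict(train_x, train_y, x0, k):
    """Classify x0 by majority vote among its k nearest training points (1-D)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set with one "b" sitting among the "a" points.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
train_y = ["a", "a", "b", "a", "b", "b", "b"]

def training_error(k):
    """Fraction of training points misclassified when using k neighbors."""
    preds = [knn_predict(train_x, train_y, x, k) for x in train_x]
    return sum(p != t for p, t in zip(preds, train_y)) / len(train_y)

errors = {k: training_error(k) for k in (1, 3, 7)}
print(errors)
```

With K = 1 every training point is its own nearest neighbor, so the fit follows the training data exactly. With K equal to the sample size, every prediction is the overall majority class, the least flexible fit possible.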
regression problems - ANSWER problems with a quantitative response
classification problems - ANSWER problems with a qualitative response
supervised problems - ANSWER -problems with a response
-Boosting
-K-nearest neighbors
-Regression tree
-logistic regression
-ridge regression
unsupervised problems - ANSWER -problems without a clear response
-cluster analysis
-K-means clustering
Best Model - ANSWER Model with the lowest test MSE
For a statistical learning method, as flexibility increases - ANSWER -the
interpretability decreases
-the training MSE decreases