2. What are the steps to evaluate prediction accuracy?
Step 1: Divide the data into training and testing data.
Random subsampling: randomly allocate a specific percentage of the data to each set (e.g., 75% training and 25% testing).
K-fold cross-validation: divide the data into K folds of approximately equal size, then allocate (K-1) folds for training and 1 fold for testing.
Random subsampling is computationally more expensive than K-fold CV but less subject to the random assignment of folds in the data. The larger the K, the less bias but the more variance.
Step 2: Fit the regression model to the training data.
Step 3: Evaluate prediction accuracy on the testing data using an accuracy measure.
Step 4: Apply Steps 1-3 multiple times, then average the prediction accuracy measure over all repetitions.
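The four steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `fit_line` and `k_fold_mspe`, the toy data, and the choice of mean squared prediction error as the accuracy measure are not from the notes.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def k_fold_mspe(xs, ys, k=5, seed=0):
    """Average mean squared prediction error over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)        # Step 1: random split into k folds
    folds = [idx[i::k] for i in range(k)]   # fold sizes differ by at most 1
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        # Step 2: fit on the (k-1) training folds
        b0, b1 = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Step 3: squared prediction error on the held-out fold
        errs = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test_idx]
        scores.append(sum(errs) / len(errs))
    return sum(scores) / k                  # Step 4: average over the k repetitions

# Hypothetical toy data: y = 2 + 3x plus a small alternating wiggle
xs = [float(i) for i in range(20)]
ys = [2.0 + 3.0 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
cv_error = k_fold_mspe(xs, ys, k=5)
```

Random subsampling would replace the fold loop with repeated random 75%/25% splits; the fit and scoring steps stay the same.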
3. What is regression?
A non-deterministic modeling technique that we use to analyze and estimate the values of a (response) variable by using other variables that it is correlated with.
By using a regression model, we try to explain and predict the total variability of y (response/dependent variable) using x (predictor/independent/explanatory variable).
Linear models are simple to understand and tend to work well even if they don't fully represent reality.
4. What are the two types of variables in regression?
Response (dependent) variable: the variable that we are interested in understanding or modelling, usually represented as Y. The response variable is a random variable; it varies with changes in the predictor(s) along with other random changes.
Predicting/explanatory variables: the set of variables that we think might be useful in predicting or modelling the response variable (say the price of the product, the competitor's price, etc.).
Predicting variables are fixed variables: they do not change with the response but are set before the response is measured.
ISYE 6414 Final Exam
5. What are the three objectives in regression analysis?
1. Prediction of the response variable
2. Modelling the relationship between the response variable and the explanatory variables
3. Testing hypotheses of association relationships
6. What are the 4 assumptions of linear regression?
o Linearity / mean zero assumption
o Constant variance assumption
o Independence assumption: the error terms are independent random variables
o Normality assumption: later we also assume that the errors are normally distributed
7. What are the unknown parameters in linear regression?
· The unknown parameters are the intercept, the slope, and the variance
o Unknown regardless of how much data is observed
o Estimated given the model assumptions
o Estimated based on data
8. How do we define the "best fit" linear regression line?
The line that minimizes the sum of squared errors
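As a rough illustration of "minimizes the sum of squared errors", the sketch below computes the closed-form least-squares line for one predictor and checks that nudging either coefficient raises the SSE. The function names and the five data points are made up for the example.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared errors of the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form intercept and slope that minimize the SSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical data, roughly following y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
best = sse(xs, ys, b0, b1)

# Moving either coefficient away from the fitted values increases the SSE
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
worse = [sse(xs, ys, b0 + d0, b1 + d1) for d0, d1 in nudges]
```

Every entry of `worse` exceeds `best`, which is what being the best-fit line means here.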
9. What is the variance sampling distribution for SLR?
· The sampling distribution of the variance estimator is chi-squared with n-2 degrees of freedom: SSE/sigma^2 follows a chi-squared distribution with n-2 degrees of freedom
The variance of the error terms is estimated by SSE/(n-2)
n-2: we lose two degrees of freedom because we are estimating two parameters (beta_0, beta_1)
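A small simulation can make the n-2 divisor concrete. The sketch below is an assumption-laden illustration (the function name, seed, true line, and true error standard deviation are all invented): it averages SSE/(n-2) over many simulated data sets and compares the result to the true error variance.

```python
import random

def sigma2_hat(xs, ys):
    """Estimate the error variance as SSE / (n - 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return sse / (n - 2)  # two degrees of freedom are spent on b0 and b1

# Hypothetical simulation settings (not from the notes)
rng = random.Random(1)
true_sigma = 2.0
xs = [float(i) for i in range(10)]

estimates = []
for _ in range(2000):
    ys = [1.0 + 0.5 * x + rng.gauss(0.0, true_sigma) for x in xs]
    estimates.append(sigma2_hat(xs, ys))

# Should land near true_sigma ** 2 = 4, since a chi-squared variable
# with n-2 degrees of freedom has mean n-2
mean_estimate = sum(estimates) / len(estimates)
```

Dividing by n instead of n-2 would pull the average below the true variance, which is the practical reason for the degrees-of-freedom correction.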
10. How do we interpret model parameters computed in linear regression?
beta_1 > 0: direct relationship between x and y
beta_1 < 0: inverse relationship between x and y
beta_1 ~ 0: no significant association between x and y
The estimate of beta_1 is the estimated expected change in the response variable associated with one unit of change in the predicting variable.
The estimate of beta_0 is the estimated expected value of the response variable when the predicting variable equals zero.
11. Explain the dif- · If x* is one of the observations for the predicting variable,
ference between then we use estimation. Estimated regression line for the
estimating vs. value x* is interpreted as the average estimated mean re-
predicting sponse for all settings under which the predicting variable
is equal to x*
· If x* is a new observation of the predicting variables, then
we use prediction. Predicted regression line for the value
x* is interpreted as the estimated mean response for one
setting under which the predicting variable is equal to x*
12. Explain the difference between confidence and prediction intervals
A prediction interval provides an interval estimate for a prediction of y for one member of the population with a particular value of x*.
A confidence interval provides an interval estimate for the true average value of y for all members of the population with a particular value of x*.
The confidence interval is for all members of the population; the prediction interval is for one member of the population.
The prediction interval is wider than the confidence interval at the same x* because it also includes the variability of a single new response.
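The difference in width comes from one extra term under the square root. The sketch below uses the standard simple-linear-regression formulas; the function name `intervals_at`, the data, and the practice of passing the t critical value in as an argument (here 3.182, the tabulated 97.5% quantile for 3 degrees of freedom) are choices made for this illustration.

```python
import math

def intervals_at(xs, ys, x_star, t_crit):
    """Confidence and prediction intervals at x_star for a one-predictor fit.

    t_crit is the t critical value with n-2 degrees of freedom,
    supplied by the caller (for example, read from a t table).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma_hat = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x_star
    leverage = 1.0 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = sigma_hat * math.sqrt(leverage)        # uncertainty in the mean response
    se_pred = sigma_hat * math.sqrt(1.0 + leverage)  # plus one new error term

    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return ci, pi

# Hypothetical data; n = 5, so the t distribution has 3 degrees of freedom
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ci, pi = intervals_at(xs, ys, x_star=3.0, t_crit=3.182)
```

Both intervals share the same center; the prediction interval is wider because of the extra `1.0` inside the square root.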
13. Let's assume that we run statistical inference on the intercept of a regression model and we find it is insignificant. Should we remove the intercept?
No, a model without an intercept gives worse results even if passing through the origin makes sense. It also gives us imprecise confidence and prediction intervals.
14. How do we check the linearity assumption?
Plot X vs. Y and examine for linearity
We may see a bi-modal distribution, but this could just be due to the model having qualitative predictors
Plot predicting values (X) vs. residuals (multiple linear