Study online at https://quizlet.com/_4hd1o1
1. What do descriptive questions ask?: What happened? (e.g., which customers are most alike)
2. What do predictive questions ask?: What will happen? (e.g., what will Google's stock price be?)
3. What do prescriptive questions ask?: What action(s) would be best? (e.g., where to put traffic lights)
4. What is a model?: A real-life situation expressed as math.
5. What do classifiers help you do?: Differentiate
6. What is a soft classifier and when is it used?: In some cases, there won't be a line that separates all of the labeled examples. So we use a classifier that minimizes the number of mistakes.
7. What does it mean when the classifier/decision boundary is almost parallel to the vertical axis?: The horizontal attribute is all that is needed.
8. What does it mean when the classifier/decision boundary is almost parallel to the horizontal axis?: The vertical attribute is all that is needed.
9. What is time-series data?: The same data recorded over time, often at equal intervals
10. What is quantitative data?: Numbers with a meaning: higher means more, lower means less (e.g., age, sales, temperature, income)
11. What is categorical data?: Numbers w/o meaning (e.g., zip codes), non-numeric (e.g., hair color), binary data (e.g., male/female, yes/no, on/off)
12. Which of these is time series data?
A. The average cost of a house in the United States every year since 1820
B. The height of each professional basketball player in the NBA at the start of the season: A
13. Which of these is structured data?
A. The contents of a person's Twitter feed
B. The amount of money in a person's bank account: B
14. What is structured data?: Data that can be stored in a structured way
15. What is unstructured data?: Data that is not easily described and stored (e.g., written text)
16. A survey of 25 people recorded each person's family size and type of car. Which of these is a data point?
A. The 14th person's family size and car type
B. The 14th person's family size
C. The car type of each person: A.
A data point is all the information about one observation.
17. The farther the wrongly classified point is from the line: The bigger the mistake we've made
ISYE 6501 - Midterm 1
18. The term including the margin gets larger, so the importance of a large margin outweighs avoiding mistakes and classifying known data samples: As lambda gets larger
19. That term also drops towards zero, so the importance of minimizing mistakes and classifying known data points outweighs having a large margin: As lambda drops towards zero
20. What can SVMs be used for?: To find a classifier with maximum separation (margin) between the two sets of points.
21. When to use SVM?: If it's impossible to avoid classification errors, SVM can find a classifier that trades off reducing errors and enlarging the margin.
22. Error for data point j: What does this formula describe?
23. Total error: What does this formula describe?
24. To maximize the distance between the two lines, what do we need to minimize?: The sum of the squares of the coefficients
25. m_j > 1: What value do we give for more costly errors?
26. Giving a bad loan is twice as costly as withholding a good loan: What does this mean in the context of giving a loan?
27. m_j < 1: What value do we give for less costly errors?
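The formulas pictured on cards 22 to 27 are not in this text. As a rough sketch, assuming the standard soft-margin SVM form (labels y_j of -1 or +1, classifier a_0 + sum of a_i * x_ij), the error for point j is max(0, 1 - y_j * (a . x_j + a_0)), multiplied by a cost weight m_j, and the quantity being minimized is the total error plus lambda times the sum of squared coefficients. The function names and sample numbers below are hypothetical.

```python
def point_error(a, a0, x, y, m=1.0):
    """Weighted error for one data point (assumed hinge form).

    a, a0: classifier coefficients and intercept
    x, y:  attribute values and label (-1 or +1)
    m:     cost multiplier; m > 1 makes this kind of mistake more costly
    """
    score = sum(ai * xi for ai, xi in zip(a, x)) + a0
    return m * max(0.0, 1.0 - y * score)


def svm_objective(a, a0, points, labels, lam, costs=None):
    """Total error over all points plus lambda times the sum of squared coefficients."""
    costs = costs or [1.0] * len(points)
    total_error = sum(
        point_error(a, a0, x, y, m) for x, y, m in zip(points, labels, costs)
    )
    margin_term = lam * sum(ai * ai for ai in a)
    return total_error + margin_term


# Hypothetical data: the third point is a "bad loan" whose error counts double.
a, a0 = [1.0, 0.0], 0.0
points = [[2.0, 0.0], [0.5, 1.0], [1.0, 3.0]]
labels = [1, 1, -1]
costs = [1.0, 1.0, 2.0]

errors = [point_error(a, a0, x, y, m) for x, y, m in zip(points, labels, costs)]
objective = svm_objective(a, a0, points, labels, lam=0.5, costs=costs)
print(errors)     # [0.0, 0.5, 4.0]
print(objective)  # 5.0
```

Raising lam makes the squared-coefficient term dominate (a wider margin matters more); lowering it towards zero makes the error term dominate.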
28. Why is it important to scale our data when using SVM?: We're looking to minimize the sum of the squares of the coefficients, but if our data has very different scales, a small change in one could swamp a huge change in the other.
29. What does it signify when a coefficient for a classifier is close to zero?: It means the corresponding attribute is probably not relevant
30. What do kernel methods allow for in SVMs?: Nonlinear classifiers
31. What is the common range for scaled data?: Between 0 and 1
32. What is the formula for min-max scaling?: Find the min and max for a factor, then x_scaled = (x - x_min) / (x_max - x_min)
33. What is common standardization and its formula?: Scaling to fit a normal distribution with a mean of 0 and standard deviation of 1: x_standardized = (x - mean) / (standard deviation)
34. What is the formula for general scaling between b and a?: x_scaled[b,a] = x_scaled[0,1] * (a - b) + b
35. When do you use scaling?: Data in a bounded range (e.g., neural networks, RGB values, SAT scores, batting averages)
36. When do you use standardization?: PCA or clustering
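The scaling cards above can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: the function names are hypothetical, the standard deviation used is the population one (an assumption), and a factor whose values are all equal would cause a division by zero.

```python
from statistics import mean, pstdev


def min_max_scale(values, b=0.0, a=1.0):
    """Linearly rescale so the minimum maps to b and the maximum maps to a.

    With the defaults this is plain min-max scaling to the range 0..1.
    """
    lo, hi = min(values), max(values)
    return [b + (v - lo) / (hi - lo) * (a - b) for v in values]


def standardize(values):
    """Shift and rescale to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical factor values.
factor = [10.0, 20.0, 30.0]

unit_range = min_max_scale(factor)                # [0.0, 0.5, 1.0]
custom_range = min_max_scale(factor, b=-1.0, a=1.0)  # [-1.0, 0.0, 1.0]
z_scores = standardize(factor)

print(unit_range)
print(custom_range)
print(z_scores)
```

Min-max scaling keeps every value inside a fixed range, while standardization leaves values unbounded but centered on zero.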
37. When is KNN used?: Used for solving classification problems in which there are more than two classes.
38. How do you deal with attributes that might be more important than others in KNN?: You weight each dimension's distance differently. The larger the weight, the higher the impact.
39. A small value of k will lead to: a large variance in predictions
40. Setting a large value of k will ...: lead to a large model bias.
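A minimal sketch of KNN with per-attribute distance weights, assuming a weighted Euclidean distance and a simple majority vote. The function names, the sample points, and the weights are all hypothetical.

```python
from collections import Counter
from math import sqrt


def weighted_distance(p, q, weights):
    """Euclidean distance where each attribute's squared gap is multiplied by its weight."""
    return sqrt(sum(w * (pi - qi) ** 2 for w, pi, qi in zip(weights, p, q)))


def knn_predict(train_points, train_labels, query, k, weights):
    """Label the query by majority vote among its k nearest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: weighted_distance(pair[0], query, weights),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data with three classes.
train_points = [(1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
train_labels = ["green", "red", "blue"]
query = (0.0, 0.0)

# Equal weights: the green point (distance 1) is nearest.
equal = knn_predict(train_points, train_labels, query, k=1, weights=[1.0, 1.0])

# Weighting the first attribute 10x pushes green away, so red becomes nearest.
first_heavy = knn_predict(train_points, train_labels, query, k=1, weights=[10.0, 1.0])

print(equal)        # green
print(first_heavy)  # red
```

Changing only the weights changes which neighbor is nearest, which is how a more important attribute gets a larger say in the prediction.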
41. What are real effects?: Real relationships between attributes and responses. They are the same in all data sets.
42. What are random effects?: They are random but look like real effects. They are different in all data sets.
43. Why can't we measure a model's effectiveness on data it was trained on?: The model's performance on its training data is usually too optimistic. The model is fit to both real and random patterns in the data, so it becomes overly specialized to the specific randomness in the training set, which doesn't exist in other data.
44. If we use the same data to fit a model as we do to estimate how good it is, what is likely to happen?: The model will appear to be better than it really is.
The model will be fit to both real and random patterns in the data. The model's effectiveness on this data set will include both types of patterns, but its true effectiveness on other data sets (with different random patterns) will only include the real patterns.
45. When comparing models, if we use the same data to pick the best model as we do to estimate how good the best one is, what is likely to happen?: The model will appear to be better than it really is.
The model with the highest measured performance is likely to be both good and lucky in its fit to random patterns.
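The "good and lucky" point can be illustrated with a small simulation. This is a sketch under stated assumptions: every "model" here is a hypothetical coin flipper with no real effect at all, so any accuracy above 50% is pure random effect. The best validation score among many such models typically lands well above 0.5, while the same kind of guesser scored on fresh data tends to fall back to roughly 0.5.

```python
import random


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def simulate(n_models=50, n_points=40, seed=0):
    """Pick the best of n_models random guessers, then score it on new data.

    Returns (best validation accuracy, mean validation accuracy, test accuracy).
    """
    rng = random.Random(seed)
    val_labels = [rng.choice([0, 1]) for _ in range(n_points)]
    test_labels = [rng.choice([0, 1]) for _ in range(n_points)]

    # Each model's validation predictions are coin flips: no real effects exist.
    val_scores = []
    for _ in range(n_models):
        guesses = [rng.choice([0, 1]) for _ in range(n_points)]
        val_scores.append(accuracy(guesses, val_labels))

    best_val = max(val_scores)
    mean_val = sum(val_scores) / n_models

    # On new data the chosen model's luck does not carry over: fresh coin flips.
    test_guesses = [rng.choice([0, 1]) for _ in range(n_points)]
    best_test = accuracy(test_guesses, test_labels)
    return best_val, mean_val, best_test


best_val, mean_val, best_test = simulate()
print("best validation accuracy:", best_val)
print("mean validation accuracy:", mean_val)
print("chosen model on test data:", best_test)
```

The gap between the best validation score and the test score is the selection luck that a separate test set is meant to expose.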
46. What is a training set used for?: Used to fit the models
47. What is a validation set used for?: Used to choose the best model
48. Why would we use two sets?: If the first set, the training set, had unique random effects that the classifier was designed for, we wouldn't be counting those benefits when we measure effectiveness on the validation set.
49. What effects does randomness have on training/validation performance?: Sometimes the randomness will make the performance look worse than it really is, and sometimes the randomness will make the performance look better than it really is
50. How are high-performing models affected by randomness?: They are often boosted by above-average random effects, making them look better than they really are
51. What is a test data set used for?: To estimate performance of the chosen model
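A minimal sketch of splitting one data set into the three sets described above: training (fit the models), validation (choose the best model), and test (estimate the chosen model's performance). The 60/20/20 fractions and the function name are assumptions for illustration, not values from the cards.

```python
import random


def split_data(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the rows and cut them into training, validation, and test sets.

    Whatever is left after the training and validation cuts becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


# Hypothetical data set of 100 observations, identified by index.
rows = list(range(100))
train, validation, test = split_data(rows)
print(len(train), len(validation), len(test))  # 60 20 20
```

Each observation lands in exactly one set, so the test estimate is not affected by the random effects the model was fit to or selected on.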