Full summary including an introduction of Machine Learning and algorithms, such as Decision Tree, Perceptron, Gradient Descent, Logistic Regression (classifier) and Neural Networks. This summary also includes a section about Feature Engineering. Extra context and illustrations/graphs are also given...
Seller: ambervdmeijs
Machine Learning
Lecture 1 – Introduction
One way to automate a task is to write a collection of rules that tells the program what to do. You write these
rules by hand, apply them and test them; when you notice that they do not work, you change them. The task is
automated, but the rules are still made by hand. Machine Learning takes automation a step further: we want the
machine itself to learn the rules. How would that go? For a task such as detecting SPAM, you need to collect
some information about the distribution of words or word sequences in emails. The machine then learns from
examples, which is the basis of supervised learning:
Find examples of SPAM and non-SPAM
Come up with a learning algorithm
A learning algorithm infers rules from examples
These rules can then be applied to new data (emails)
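The four steps above can be sketched in a few lines of Python. This is only a toy illustration, not the algorithm used in the lecture: the training emails, the labels "spam"/"ham" and the function names learn_word_counts and classify are all made up for the example, and the "rule" that is learned is simply a count of how often each word occurs in each class.

```python
from collections import Counter


def learn_word_counts(examples):
    """Infer 'rules' from examples: count each word per class."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def classify(text, counts):
    """Apply the learned counts to a new email."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"


# Hypothetical toy training set: (email text, label) pairs.
training_examples = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lecture notes for tomorrow", "ham"),
]

counts = learn_word_counts(training_examples)
prediction = classify("free money tomorrow", counts)
print(prediction)
```

Nobody wrote the rule "free means SPAM" by hand here; it follows from the examples, and changing the examples changes the behaviour.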
Types of learning problems
A Machine Learning problem has an input space and an output space. The nature of the output determines which
kind of learning problem we are dealing with.
Regression
Regression involves estimating or predicting a response. The response (the output variable) takes continuous
values, i.e. it is a real number.
Predict person’s age
Predict price of a stock
Predict student’s score on exam
Binary classification
The output variable is a class label, and there are only two classes: a yes/no answer, e.g.
true/false or 1/0.
Detect SPAM
Predict polarity of product review: positive or negative
Predict gender: male or female
Multiclass classification
The output is one of a finite set of more than two options. The number of labels / classes / categories can
range from a handful to thousands. Each training point belongs to exactly one of n different classes. The goal
is to construct a function which, given a new data point, will correctly predict the class to which it belongs.
Classify subject newspaper articles: politics, sports, science, technology, health, etc.
Detect species based on photo: Passer domesticus, Calidris alba, etc.
Multilabel classification
Multilabel classification is a classification problem where multiple target labels can be assigned to each
observation instead of only one. A multilabel classifier has to produce a vector of output values, with one
yes/no answer per label. You can think of it as a separate binary classification for each label.
Assign songs to one or more genres:
o {rock, pop, metal}
o {hip-hop, rap}
o {jazz, blues}
o {rock, punk}
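The "vector of yes/no answers" can be made concrete with a small sketch. The genre list and the helper name to_label_vector are assumptions for this example; the point is only that each position in the vector answers one binary question.

```python
# Hypothetical fixed list of all genres the classifier knows about.
GENRES = ["rock", "pop", "metal", "hip-hop", "rap", "jazz", "blues", "punk"]


def to_label_vector(song_genres, all_genres=GENRES):
    """Encode a set of genres as one 0/1 answer per possible genre."""
    return [1 if genre in song_genres else 0 for genre in all_genres]


vector = to_label_vector({"rock", "punk"})
print(vector)  # [1, 0, 0, 0, 0, 0, 0, 1]
```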
Ranking
Order objects according to relevance. Ranking models are used, for example, in information retrieval systems.
The training data consists of lists of items with some partial order specified between the items in each list.
Rank web pages in response to user query
Predict student’s preference for courses in a program
Sequence labelling
A type of pattern recognition task that involves the algorithmic assignment of a categorical label to each
member of a sequence of observed values (e.g. part-of-speech tagging). The input is a sequence of elements
(words) and the response is a corresponding sequence of labels.
Label words in a sentence with their syntactic category
Label frames in a speech signal with corresponding phonemes (W, ð, Ɛ, ɚ)
o Sequence labelling: N inputs, N outputs
o Sequence 2 sequence: N inputs, M outputs (N not necessarily = M)
Autonomous behaviour
The inputs are measurements from sensors (camera, microphone, radar, accelerometer, etc.) and the responses
are instructions for actuators (steering, accelerator, brake, etc.).
Supervised learning is very often combined with reinforcement learning, which learns from a sequence of actions
using positive and negative feedback. Supervised learning is not the end of the story: sometimes it is not
really applicable. Unsupervised learning has also become a very important approach.
In what situation do you use F1 score instead of accuracy?
Evaluation
How well is the algorithm learning? You can evaluate the performance by using different evaluation metrics.
Mean Absolute Error
The average absolute difference between true value and predicted value
Mean Squared Error
The average square of the difference between true value and predicted value.
Both metrics can be used to evaluate regression tasks with a numerical output, such as predicting age, with a
preference for MSE. Because the differences are squared, the MSE exaggerates outliers (large errors weigh
disproportionately), whereas the MAE does not.
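A minimal sketch of both metrics, written directly from the definitions above. The ages are made-up numbers chosen so that one prediction is far off, which shows how much more strongly the MSE reacts to a single outlier.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ages; the last prediction is off by 20 years.
true_ages = [20, 30, 40, 50]
predicted_ages = [22, 28, 41, 70]

mae = mean_absolute_error(true_ages, predicted_ages)  # (2 + 2 + 1 + 20) / 4
mse = mean_squared_error(true_ages, predicted_ages)   # (4 + 4 + 1 + 400) / 4
print(mae, mse)  # 6.25 102.25
```

The single 20-year error contributes 20 of the 25 units of absolute error, but 400 of the 409 units of squared error.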
Accuracy
Accuracy is calculated as the number of correct predictions divided by the total number of examples in the
dataset. The best accuracy is 1.0, whereas the worst is 0.0. It can also be calculated as 1 - error rate.
(TP + TN) / (P + N)
Error rate
The error rate is the proportion of mistakes. It is calculated as the number of incorrect predictions divided
by the total number of examples in the dataset. The best error rate is 0.0, whereas the worst is 1.0.
(FP + FN) / (P + N)
Predicting gender could use either accuracy or the error rate as evaluation metric. For flagging SPAM, however,
the error rate is preferred: if accuracy is 99 percent, you would probably report the error rate instead.
Is there any disadvantage? The error rate does not take into account whether a false negative is worse than a
false positive.
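The two formulas and the disadvantage just mentioned can be checked with a short sketch. It assumes that P = TP + FN and N = TN + FP, so that P + N is the size of the dataset; the two classifiers and their counts are invented for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (P + N), with P + N = TP + TN + FP + FN."""
    return (tp + tn) / (tp + tn + fp + fn)


def error_rate(tp, tn, fp, fn):
    """(FP + FN) / (P + N)."""
    return (fp + fn) / (tp + tn + fp + fn)


# Two hypothetical SPAM filters evaluated on 1000 emails, 13 of them SPAM.
# Filter A flags 10 good emails as SPAM; filter B misses 10 SPAM emails.
filter_a = dict(tp=13, tn=977, fp=10, fn=0)
filter_b = dict(tp=3, tn=987, fp=0, fn=10)

error_a = error_rate(**filter_a)
error_b = error_rate(**filter_b)
print(error_a, error_b)  # both 0.01
```

Both filters get the same error rate, even though they make opposite kinds of mistakes. Which one is worse depends on the cost of each kind of mistake, and the error rate cannot express that.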
Precision and recall
These metrics are a useful measure of prediction success when the classes are very imbalanced. In information
retrieval, precision is a measure of result relevancy, while recall is a measure of how many truly relevant
results are returned. Each metric focuses on one kind of mistake, and both are computed from the sizes of
certain sets (TP, FP and FN, defined below).
Precision
The ratio of correctly predicted positive observations to the total number of predicted positive
observations: TP / (TP + FP). (Of all passengers that were labeled as survived, how many actually
survived? / What fraction of flagged emails were real SPAM?)
Recall
The ratio of correctly predicted positive observations to all observations in the actual positive class:
TP / (TP + FN). (Of all the passengers that truly survived, how many did we label as survived? / What
fraction of real SPAM was flagged as SPAM?)
True Positives (TP) = the correctly predicted positive values
True Negatives (TN) = the correctly predicted negative values
False Positives (FP) = when actual class is no and predicted class is yes
False Negatives (FN) = when actual class is yes but predicted class is no
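Using these four counts, precision and recall can be sketched as follows. The counts are invented, and returning 0.0 when the denominator is zero is just one possible convention chosen for this example.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that were predicted as positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical SPAM filter: 10 emails flagged, 8 of them really SPAM,
# while 8 other SPAM emails were not flagged.
tp, fp, fn = 8, 2, 8

p = precision(tp, fp)  # 8 / 10
r = recall(tp, fn)     # 8 / 16
print(p, r)  # 0.8 0.5
```

Note that the true negatives do not appear in either formula, which is why these metrics stay informative when the negative class is very large.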
F-score
The harmonic mean of precision and recall: a kind of average, also known as the F-measure or F1. This score
takes both false positives and false negatives into account.
Fbeta
The parameter beta quantifies how much more we care about recall than about precision, so it lets us give
different importance to the two. F0.5 would mean that we care half as much about recall as about precision.
The beta parameter thus determines the weight of recall relative to precision in the combined score: beta < 1
lends more weight to precision, while beta > 1 favors recall. With beta = 1 we get the ordinary F-score (F1).
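A sketch of the commonly used formula, Fbeta = (1 + beta^2) * P * R / (beta^2 * P + R), which reduces to the harmonic mean 2PR / (P + R) for beta = 1. The precision and recall values are example numbers, and returning 0.0 when both are zero is a convention assumed here.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Example values: precision is higher than recall.
p, r = 0.8, 0.5

f1 = f_beta(p, r)              # plain harmonic mean
f_half = f_beta(p, r, beta=0.5)  # leans towards precision
f_two = f_beta(p, r, beta=2.0)   # leans towards recall
print(round(f_half, 3), round(f1, 3), round(f_two, 3))
```

Because precision is the higher of the two values here, the score goes up when beta moves towards precision (F0.5) and down when it moves towards recall (F2).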
What is the difference between precision/recall, F-score and Fbeta?
F1 is usually more useful than accuracy, especially if you have an uneven class distribution. Accuracy works best
if false positives and false negatives have similar cost.
Macro-average (multi-class classification)
This computes the F-score per class and then averages: the metric is calculated for each class independently,
and the unweighted mean is taken. This does not take label imbalance into account, so rare classes have the
same impact as frequent classes. This can be a good thing or a bad thing, depending on what you want.
Micro-average (multi-class classification)
This calculates metrics globally by counting, over all classes together, the total number of correct and
incorrect predictions. You count on a case-by-case basis:
Treat each correct prediction as TP
Treat each missing classification as FN
Treat each incorrect prediction as FP
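The difference between the two averages shows up clearly on an imbalanced toy dataset. The sketch below assumes a single-label multiclass setting, where each mistake counts as an FN for the true class and as an FP for the predicted class; the labels, data and function names are made up, and F1 is computed with the equivalent form 2TP / (2TP + FP + FN).

```python
def per_class_counts(y_true, y_pred, labels):
    """Count TP, FP and FN for every class, case by case."""
    counts = {label: {"tp": 0, "fp": 0, "fn": 0} for label in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t]["tp"] += 1
        else:
            counts[t]["fn"] += 1  # the true class was missed
            counts[p]["fp"] += 1  # the predicted class was wrong
    return counts


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); 0.0 if the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1(**c) for c in counts.values()) / len(counts)


def micro_f1(counts):
    """F1 computed from the counts summed over all classes."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical article topics: "politics" is frequent, the others are rare.
labels = ["politics", "sports", "science"]
y_true = ["politics"] * 8 + ["sports", "science"]
y_pred = ["politics"] * 8 + ["politics", "science"]

counts = per_class_counts(y_true, y_pred, labels)
macro = macro_f1(counts)
micro = micro_f1(counts)
print(round(macro, 3), round(micro, 3))  # 0.647 0.9
```

The one misclassified sports article drags the macro-average down sharply, because the sports class counts as much as politics, while the micro-average barely notices it.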