Machine Learning
Week 1 – Part I: Practical Matters
Introduction
 Lectures will be live (being recorded) or prerecorded.
 Lecture videos will be shared weekly as well as the accompanying slides.
 Slides are not meant to be self-contained, take notes!
 The practical sessions will be online, interaction is possible during these sessions.
Group Assignment: ML Challenge
 Work in groups of 3 people to solve a challenge problem
 30% course grade
 No resit
 Collaborative work: you will need to describe work division and contribution of each student
Final Exam
 Worth 70% course grade
 Multiple choice and/or open-ended questions
 Programming exercises

Part II: Introduction to Machine Learning
How can we automate problem solving?
Example: flagging spam in your e-mail.
- Classification task
- Can be solved with a standard machine learning method.
Some example email headers are shown on the slides.
Rules: if (A or B or C) and not D, then SPAM.
- Specify the rules manually, so the system can recognize spam (a rough sketch follows below)
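As a rough illustration (not the course's actual code), such a hand-written rule could look like this in Python, where A, B, C and D stand for whatever header checks the slides used:

```python
def is_spam(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Hand-written rule: if (A or B or C) and not D, then SPAM."""
    return (a or b or c) and not d

# Example: condition A holds, but the exception D also holds -> not flagged
print(is_spam(a=True, b=False, c=False, d=True))  # False
```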

Machine Learning
The study of computer algorithms that improve automatically through experience [1]: becoming better at some task T, based on some experience E, with respect to some performance measure P.

Learning process
 Find examples of SPAM and non-SPAM (training set)
 Come up with a learning algorithm
 A learning algorithm infers rules from examples
 These rules can then be applied to new data (emails)

Learning algorithms
 See several different learning algorithms
 Implement 2-3 simple ones from scratch in Python
 Learn about Python libraries for ML (scikit-learn)
 Learn how to apply them to real-world problems (a fit/predict sketch follows below)
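A minimal sketch of this fit-then-predict workflow with scikit-learn (the tiny feature vectors and labels are invented for illustration):

```python
from sklearn.tree import DecisionTreeClassifier  # any scikit-learn classifier works

# Invented training data: each row is a feature vector for one email,
# the label is 1 for SPAM and 0 for non-SPAM.
X_train = [[1, 0, 3], [0, 1, 0], [2, 1, 5], [0, 0, 1]]
y_train = [1, 0, 1, 0]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)   # infer rules from the examples

X_new = [[1, 1, 4]]         # a new, unseen email
print(clf.predict(X_new))   # apply the learned rules
```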
Machine Learning examples: recognize handwritten numbers and letters, recognize faces in photos, determine whether a text
expresses a positive, negative or no opinion, guess a person's age based on a sample of writing, flag suspicious credit-card
transactions, recommend books and movies to users based on their own and others' purchase history, recognize and label
mentions of people's or organizations' names in text.

Types of learning problems: Regression
Response: a (real) number
 Predict person’s age
 Predict price of a stock
 Predict student’s score on exam

Binary classification
Response: yes/no answer
 Detect SPAM
 Predict polarity of product reviews: positive vs negative

Multiclass classification
More than two possible classes
Response: one of a finite set of options
 Classify newspaper article as: politics, sports, science, technology, health, finance
 Detect species based on photo: passer domesticus, calidris alba etc.

Multilabel classification 
Response: a finite set of Yes/No answers
 Assign songs to one or more genres: rock, pop, metal, hip-hop

Ranking
Example: search engines looking for a specific source.
Order objects according to relevance
 Rank web pages in response to user query
 Predict student’s preference for courses in a program
Sequence Labeling


Relevant in speech recognition.
Input: a sequence of elements (e.g., words)
Response: a corresponding sequence of labels
 Label words in a sentence with their syntactic category (e.g., Determiner, Noun, Adverb, Verb, Preposition)
 Label frames in speech signal with corresponding phonemes.

Sequence-to-sequence modeling
Input: a sequence of elements
Response: another sequence of elements
 Possibly different length
 Possibly elements from different sets
Examples: translate between languages (My name is Penelope → Me llamo Penélope), summarize text

Autonomous behavior
Self-driving car
Input: measurements from sensors – camera, microphone, radar, accelerometer.
Response: instructions for actuators – steering, accelerator, brake, …

How well is the algorithm learning?
Evaluation
You need some standard, a performance metric!
- Predicting age
- Predicting gender
- Flagging spam
- …

Predicting age – Regression
Mean absolute error (MAE): the average absolute difference between the true value and the predicted value.
MAE = (1/N) × Σ |y_i − ŷ_i|

Mean squared error (MSE): the average squared difference between the true value and the predicted value (more sensitive to outliers).
MSE = (1/N) × Σ (y_i − ŷ_i)²
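Both metrics are available in scikit-learn; a small sketch with invented ages:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [23, 31, 45, 19]   # true ages (invented)
y_pred = [25, 30, 40, 22]   # predicted ages (invented)

print(mean_absolute_error(y_true, y_pred))  # MAE
print(mean_squared_error(y_true, y_pred))   # MSE, punishes outliers more
```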




Predicting spam
We can use the error rate: the fraction of emails that are classified incorrectly (the number of mistakes divided by the total number of predictions).
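A minimal sketch with invented labels; the error rate is simply the fraction of mismatches, i.e. 1 minus scikit-learn's accuracy_score:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 0, 1, 0]   # 1 = SPAM, 0 = not SPAM (invented)
y_pred = [1, 0, 1, 0, 0]

error_rate = 1 - accuracy_score(y_true, y_pred)
print(error_rate)          # 2 mistakes out of 5 -> 0.4
```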

Kinds of mistakes
 False positive: flagged as SPAM, but is not SPAM
 False negative: not flagged, but is SPAM
 False positives are a bigger problem: a legitimate email ends up hidden in the spam folder!

Precision and Recall
Metrics which focus on one kind of mistake.
Precision: what fraction of flagged emails were real SPAM?
P = |TP| / |F|
Recall: what fraction of real SPAM emails were flagged?
R = |TP| / |S|
F = flagged emails = true positives + false positives
S = real SPAM emails = true positives + false negatives

F-score
Harmonic mean between precision and recall, a kind of average (aka the F-measure):
F1 = 2 × (P × R) / (P + R)

The parameter β quantifies how much more we care about recall than precision:
Fβ = (1 + β²) × (P × R) / (β² × P + R)
For example, F0.5 is the metric to use if we care half as much about recall as about precision.
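All of these are available in scikit-learn; a short sketch with invented labels:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score

y_true = [1, 1, 0, 1, 0, 0]   # 1 = SPAM, 0 = not SPAM (invented)
y_pred = [1, 0, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))        # |TP| / |flagged|
print(recall_score(y_true, y_pred))           # |TP| / |real SPAM|
print(f1_score(y_true, y_pred))               # harmonic mean of P and R
print(fbeta_score(y_true, y_pred, beta=0.5))  # cares half as much about recall
```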

Are precision, recall and F-score applicable to multiclass classification?




Macro-average
Compute precision and recall per-class, and average.
Rare classes have the same impact as frequent classes.
Micro-average
Treat each correct prediction as TP
Treat each missing classification as FN
Treat each incorrect prediction as FP

Properties: in single-label classification, if we micro-average over all classes (including the null/default class), then
Precision = Recall = F-score = Accuracy
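In scikit-learn the averaging scheme is chosen with the average argument; a sketch with invented three-class labels:

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 0, 1, 1, 2, 2, 2]   # invented 3-class labels
y_pred = [0, 1, 1, 1, 2, 2, 0]

# Macro: compute P/R/F per class, then average (rare classes count as much as frequent ones)
print(precision_recall_fscore_support(y_true, y_pred, average="macro"))

# Micro: pool all TP/FP/FN counts first; in single-label classification
# this makes precision = recall = F-score = accuracy
print(precision_recall_fscore_support(y_true, y_pred, average="micro"))
```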
Multilabel classification
Each example may be labeled with any number of classes. How do micro P and R behave in this case?
Using examples: imagine you’re studying for a very competitive exam – how do you use learning material?

Disjoint sets of examples
Training set: observe patterns, infer rules
Development set: monitor performance, choose best learning options
Test set: REAL EXAM, not accessible in advance (a split sketch follows below)
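One common way to carve out the three disjoint sets is to call scikit-learn's train_test_split twice; the 60/20/20 proportions below are just an example:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(100)]      # placeholder features
y = [i % 2 for i in range(100)]    # placeholder labels

# First split off the test set (20%), then split the rest into train and dev
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_dev), len(X_test))   # 60 20 20
```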

Important considerations
Use the same evaluation metric on the development set and on the test set.
It is important for the evaluation to be close to the true (real-world) objective.

Summary
 Machine learning studies algorithms which can learn to solve problems from examples; there are several canonical problem types.
 First step: decide on evaluation metric
 Separate training, development and test examples

Week 2 – Decision Trees
Supervised machine learning
Supervised: the training data is labeled (known). A supervised learning algorithm reads the training
data and learns a function (f). The function can then label future examples.

Decision tree (DT) learning learns a function in which the decision rules are captured by a tree. In practice, this can be
more complex, for example when hyperparameter tuning is applied.
A hyperparameter is a parameter whose value is set before the learning process begins, so it is not
derived during learning (unlike the other parameters of the model, which are learned). The value of
the hyperparameter is used to control the learning process. Tuning is done to find the best possible model.
The depth of a decision tree is an example of a hyperparameter.
When hyperparameter tuning is involved, the data is split into 3 portions: training, validation
and test sets. Using the training and validation data, a good value for the maximum depth, one that
trades off between overfitting and underfitting, can be found. The resulting decision tree model
is then run on the test data to get an estimate of how well the model is likely to do in the future on unseen data (see the sketch below).
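A sketch of this procedure with scikit-learn's DecisionTreeClassifier; the synthetic data and the candidate depths are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, then a train/validation/test split
X, y = make_classification(n_samples=500, random_state=0)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Pick the max_depth that does best on the validation set
best_depth, best_score = None, -1.0
for depth in [1, 2, 3, 5, 8, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    score = tree.score(X_val, y_val)
    if score > best_score:
        best_depth, best_score = depth, score

# Retrain with the chosen depth and estimate future performance on the held-out test set
final_tree = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
print(best_depth, final_tree.score(X_test, y_test))
```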

Weakness of DTs: they are prone to overfitting. Overfitting means doing well on the training set, but not generalizing to unseen data (the test
set). On the bright side: they are very understandable.
Decision trees can be seen as a hierarchical list of tests and can be used to classify objects.
Decision tree learning is about constructing the tree.

Some real-life examples using decision trees:

Medical Diagnosis
A DT for predicting hepatitis. The tree is generated to support the diagnosis based on
the presence or absence of certain markers.

Customer Segmentation
A DT for the market segmentation of car consumers. Income is the main
identifier in people’s choices of cars. Depending on that, several other identifiers
such as profession, marital status and age are important too.

Decision trees in data mining can be used in classification tasks, where the
predicted outcome is the class. This course focuses mostly on classification trees (not
regression trees).

A decision tree consists of:
 Nodes: check the value of a feature.

