Summary Machine Learning

Course: Machine Learning (880083M6)
Institution: Tilburg University (UVT)

Summary of all the lectures of Machine Learning. It contains all the relevant material needed for the final exam.

Published: 5 December 2023
Number of pages: 38
Written in: 2023/2024
Type: Summary

Author: bascrypto

Lecture 1 – Introduction to Machine Learning

What is machine learning (ML) about?
• ML is about automating problem solving.
• It is the study of computer algorithms that improve automatically through experience.
• It involves becoming better at a task T based on some experience E, with respect to some performance measure P.
• Examples: spam detection, movie recommendation, speech recognition, credit risk analysis, autonomous driving and medical diagnosis.

What does it involve?
• ML may involve a notion of generalization: is it safe to assume that current observations can be generalized to future observations?
• ML should generalize: the model should perform well on unseen data, and the data should be representative of the real-world domain.
• Labeled data, an objective, an optimization algorithm (models), features/representations (columns) and assumptions are some of the critical components.

Different types of learning
• Supervised learning: annotated/labelled dataset (ground truth)
  o Classification: discrete target variable, e.g. spam detection
  o Regression: continuous target variable, e.g. predicting the price of a house
• Unsupervised learning: unlabeled dataset
  o Clustering, association mining, e.g. customer segmentation, recommendation
• Semi-supervised learning: only a portion of the data is labeled, e.g. text classification
• Reinforcement learning: based on rewarding desired behaviors and/or punishing undesired ones (involves a feedback loop), e.g. self-driving cars

Example: SPAM versus non-SPAM
• Binary classification problem

Learning process
• Find examples of SPAM and non-SPAM
• Come up with a learning algorithm
• A learning algorithm infers rules from examples, e.g. "If (A or B or C) and not D, then SPAM"
• These rules can then be applied to new data (emails)
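To make the rule format concrete, here is a minimal Python sketch of what such a rule looks like once it is applied to new emails. The conditions A to D, the `is_spam` name and the `@example.org` domain are hypothetical stand-ins chosen for illustration; a real learning algorithm would infer its conditions from the labeled examples instead of having them written by hand.

```python
# Sketch of a rule of the form "If (A or B or C) and not D, then SPAM".
# The concrete conditions A-D are hypothetical, chosen only for illustration.

def is_spam(email: str) -> bool:
    text = email.lower()
    a = "free money" in text        # hypothetical condition A
    b = "winner" in text            # hypothetical condition B
    c = text.count("!") >= 3        # hypothetical condition C: many exclamation marks
    d = "@example.org" in text      # hypothetical condition D: known, trusted sender
    return (a or b or c) and not d


# Applying the rule to new (unseen) emails
new_emails = [
    "You are a WINNER! Claim your prize!!!",
    "Congrats, winner of the team hackathon. hr@example.org",
    "Lunch at noon?",
]
predictions = [is_spam(e) for e in new_emails]
print(predictions)  # [True, False, False]
```

The second email matches condition B, but condition D overrides it, which is exactly what the "and not D" part of the rule expresses.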

Learning algorithms

• See several different learning algorithms
• Implement 2-3 simple ones from scratch in Python
• Learn about Python libraries for ML (scikit-learn)
• Learn how to apply them to real-world problems

Machine Learning – Examples
• Recognize handwritten numbers and letters
• Recognize faces in photos
• Determine whether a text expresses a positive, negative or no opinion
• Guess a person’s age based on a sample of their writing
• Flag suspicious credit-card transactions
• Recommend books and movies to users based on their own and others’ purchase history
• Recognize and label mentions of people’s or organizations’ names in text

Types of learning problems: Regression
• Response: a (real) number
• Examples: predict a person’s age, the price of a stock, or a student’s score on an exam
• In regression, the response variable is predicted from a set of predictors that are believed to influence it.

Types of learning problems: Binary classification
• Response: a Yes/No answer
• Detect SPAM
• Predict the polarity of a product review: positive vs. negative

Types of learning problems: Multiclass classification
Response: one of a finite set of options
• Classify a newspaper article as
  o politics, sports, science, technology, health or finance
• Detect species based on a photo
  o Passer domesticus, Calidris alba, Streptopelia decaocto, Corvus corax, …

Types of learning problems: Multilabel classification
Response: a finite set of Yes/No answers
• Assign songs to one or more genres, e.g.
  o rock, pop, metal
  o hip-hop, rap
  o jazz, blues
  o rock, punk

Types of learning problems: Autonomous behavior
• Input: measurements from sensors – camera, microphone, radar, accelerometer, ...
• Response: instructions for actuators – steering, accelerator, brake, ...

How well is the algorithm learning?
• Evaluation: choose a baseline, choose a metric, compare!

Different tasks, different metrics

Predicting age – Regression
• Mean absolute error (MAE): the average absolute difference between the true value $y_n$ (ground truth) and the predicted value $\hat{y}_n$ over $N$ examples:
  $\mathrm{MAE} = \frac{1}{N}\sum_{n=1}^{N} |y_n - \hat{y}_n|$
  It does not penalize large errors disproportionately, since every unit of error counts equally.

• Mean squared error (MSE): the average squared difference between the true value and the predicted value:
  $\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N} (y_n - \hat{y}_n)^2$
  It is more sensitive to outliers, as squaring amplifies the impact of large deviations.
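A small NumPy sketch of both metrics, using hypothetical ages as the data. The two prediction vectors have the same MAE, but the one that concentrates its error in a single large mistake gets a much higher MSE, which is the difference in how the two metrics treat large errors.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical true ages and two sets of predictions
y_true = [30, 40, 50, 60]
small_errors = [32, 38, 52, 58]   # four errors of 2 years each
one_big_error = [30, 40, 50, 68]  # a single error of 8 years

print(mean_absolute_error(y_true, small_errors), mean_absolute_error(y_true, one_big_error))  # 2.0 2.0
print(mean_squared_error(y_true, small_errors), mean_squared_error(y_true, one_big_error))    # 4.0 16.0
```

scikit-learn provides the same metrics as `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.mean_squared_error`, both taking `(y_true, y_pred)`.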

Predicting spam - Classification

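• Accuracy: the fraction of correctly classified examples, $\text{Accuracy} = \frac{\text{number of correct predictions}}{N}$
• Drawback: it does not work well on imbalanced data. If one class dominates, accuracy is high even for a model that always predicts the majority class.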
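The imbalance problem is easy to reproduce with a majority-class baseline. The sketch below assumes a hypothetical inbox with 95 non-spam and 5 spam emails, and a "classifier" that never flags anything.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical imbalanced inbox: 95 non-spam (0) and 5 spam (1) emails
y_true = [0] * 95 + [1] * 5

# A useless baseline that never flags anything as spam
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95, although not a single spam was caught
```

This is also why a baseline matters when evaluating: a model on this data has to beat 0.95 accuracy before it is doing anything useful.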

Classification
Wrong classification
• False positive (FP): flagged as SPAM, but is actually non-SPAM
• False negative (FN): not flagged, but is actually SPAM
• False positives are the bigger issue for this problem, since a legitimate email ends up in the spam folder. In the medical field it is the other way around: false negatives (missed diagnoses) are the bigger issue. Which of the two to minimize thus depends on the problem at hand.
Correct classification
• True positive (TP): spam classified as spam
• True negative (TN): non-spam classified as non-spam
Summarized in a confusion matrix (image on the right)
• The confusion matrix can be extended with more decision classes (multiclass classification).
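A from-scratch sketch of how the four cells of a binary confusion matrix are counted. The labels `"spam"` and `"non-spam"` and the six example predictions are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="spam"):
    """Count TP, FP, FN and TN, treating `positive` as the positive class."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            # Flagged: correct if it really is spam, otherwise a false alarm
            counts["TP" if t == positive else "FP"] += 1
        else:
            # Not flagged: a miss if it really is spam, otherwise correct
            counts["FN" if t == positive else "TN"] += 1
    return {k: counts[k] for k in ("TP", "FP", "FN", "TN")}


y_true = ["spam", "spam", "non-spam", "non-spam", "non-spam", "spam"]
y_pred = ["spam", "non-spam", "spam", "non-spam", "non-spam", "spam"]
print(confusion_counts(y_true, y_pred))  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```

In scikit-learn, `sklearn.metrics.confusion_matrix(y_true, y_pred, labels=["non-spam", "spam"])` returns the same counts as a 2×2 array with true classes as rows and predicted classes as columns, i.e. `[[TN, FP], [FN, TP]]` for that label order.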

Precision and Recall
• Metrics which focus on one kind of mistake. Together with the F1-score, these are better suited to imbalanced data.
• Precision (positive predictive value, PPV): what fraction of flagged emails were real SPAMs?
  $\text{Precision} = \frac{TP}{TP + FP}$
• Recall (sensitivity, hit rate, or true positive rate, TPR): what fraction of real SPAMs were flagged?
  $\text{Recall} = \frac{TP}{TP + FN}$
• Specificity (selectivity, or true negative rate, TNR), less commonly used:
  $\text{Specificity} = \frac{TN}{TN + FP}$
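The three ratios computed from confusion-matrix counts, as a minimal sketch. The counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here (scikit-learn exposes this choice through a `zero_division` parameter).

```python
def precision(tp, fp):
    """What fraction of flagged emails were real spam? TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp, fn):
    """What fraction of real spam was flagged? TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def specificity(tn, fp):
    """What fraction of real non-spam was left unflagged? TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) > 0 else 0.0


# Hypothetical counts for 100 emails
tp, fp, fn, tn = 8, 2, 4, 86

print(precision(tp, fp))    # 0.8
print(recall(tp, fn))       # 8 / 12, about 0.667
print(specificity(tn, fp))  # 86 / 88, about 0.977
```

Accuracy on these counts would be 94/100, which hides the fact that a third of the spam was missed; recall exposes it.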

Fβ-score
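• F1-score (F-measure): the harmonic mean of precision and recall, a kind of average:
  $F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
• General form:
  $F_\beta = (1 + \beta^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$
• Parameter β quantifies how much more we care about recall than precision: when it is greater than 1, recall is weighted more; when it is smaller than 1, precision is weighted more.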
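A direct translation of the $F_\beta$ formula, with hypothetical values precision = 0.8 and recall = 0.5. Because recall is the lower of the two here, raising β pulls the score toward recall (down), and lowering β pulls it toward precision (up). Returning 0.0 when both inputs are zero is an assumed convention.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more, beta < 1 weights precision more."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


p, r = 0.8, 0.5            # hypothetical precision and recall
f1 = f_beta(p, r)          # plain harmonic mean of p and r
f2 = f_beta(p, r, 2.0)     # recall weighted more: closer to r
f05 = f_beta(p, r, 0.5)    # precision weighted more: closer to p

print(round(f1, 3), round(f2, 3), round(f05, 3))  # 0.615 0.541 0.714
```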

Macro-average
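• Precision and recall are usually calculated per decision class. Micro- and macro-averaging are ways to aggregate the measures.
• Precision: true positives over predicted positives (instances the model labeled positive); recall: true positives over actual positives. In the example, (1), (2), (3), (4) and (5) represent the five data points.
• Compute precision and recall per class, and average them over the $K$ classes:
  $\text{Precision}_{\text{macro}} = \frac{1}{K}\sum_{k=1}^{K} \text{Precision}_k, \quad \text{Recall}_{\text{macro}} = \frac{1}{K}\sum_{k=1}^{K} \text{Recall}_k$
• Rare classes have the same impact as frequent classes (not ideal).
• Macro F1-score is the harmonic mean of macro-precision and macro-recall.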
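A from-scratch sketch of macro-averaging on five hypothetical data points (not the course's own example). Class C appears once and is never predicted, yet its zero precision and recall count as much as the scores of the frequent class A. Macro F1 follows the definition above (harmonic mean of macro-precision and macro-recall); note that scikit-learn's `f1_score(..., average="macro")` instead averages the per-class F1-scores, so the two can differ. Treating 0/0 as 0.0 is an assumed convention.

```python
def per_class_precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == label for p in y_pred)
    actual_pos = sum(t == label for t in y_true)
    precision = tp / predicted_pos if predicted_pos else 0.0  # assumed convention for 0/0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall


def macro_scores(y_true, y_pred):
    """Unweighted mean of per-class precision and recall, plus macro F1
    computed as the harmonic mean of the two averages."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = [per_class_precision_recall(y_true, y_pred, c) for c in labels]
    macro_p = sum(p for p, _ in per_class) / len(labels)
    macro_r = sum(r for _, r in per_class) / len(labels)
    denom = macro_p + macro_r
    macro_f1 = 2 * macro_p * macro_r / denom if denom else 0.0
    return macro_p, macro_r, macro_f1


# Hypothetical five data points with three classes; C is rare and never predicted
y_true = ["A", "A", "A", "B", "C"]
y_pred = ["A", "A", "B", "B", "A"]

macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(round(macro_p, 3), round(macro_r, 3), round(macro_f1, 3))  # 0.389 0.556 0.458
```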

Micro-average
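• Micro-averaging treats the entire dataset as one aggregate result and calculates one metric, rather than $k$ metrics (one per class) that are then averaged.
• In micro-averaging, we pool TP, FP and FN over all classes and compute one aggregate precision and one aggregate recall for the entire dataset; these micro-averaged precision and recall are then used for the micro-averaged F1:
  $\text{Precision}_{\text{micro}} = \frac{\sum_k TP_k}{\sum_k (TP_k + FP_k)}, \quad \text{Recall}_{\text{micro}} = \frac{\sum_k TP_k}{\sum_k (TP_k + FN_k)}$
• When each example has exactly one label (single-label classification), micro-averaged precision and recall are the same value, and so are the micro-averaged F1-score and the accuracy.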
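A sketch of micro-averaging on the same kind of hypothetical five-point example. Each wrong prediction is counted once as a false positive (for the predicted class) and once as a false negative (for the true class), which is why micro precision, micro recall and accuracy coincide when every example has exactly one label.

```python
def micro_precision_recall(y_true, y_pred):
    """Pool TP, FP and FN over all classes, then compute one precision and one recall."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in labels:
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1  # correct prediction of class c
            elif p == c:
                fp += 1  # predicted c, but the true class is different
            elif t == c:
                fn += 1  # true class is c, but something else was predicted
    return tp / (tp + fp), tp / (tp + fn)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical single-label data (three classes)
y_true = ["A", "A", "A", "B", "C"]
y_pred = ["A", "A", "B", "B", "A"]

micro_p, micro_r = micro_precision_recall(y_true, y_pred)
print(micro_p, micro_r, accuracy(y_true, y_pred))  # 0.6 0.6 0.6
```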

How to find $\hat{f}(x)$: a solution workflow
• Best outcome we can hope for: $\hat{f}(x) = f(x)$ for all $x$. Ideally, we would like $\hat{f}(x)$ such that a loss between $f(x)$ and $\hat{f}(x)$ is minimized, i.e., $L(\hat{f}(x), f(x))$ is small.
• The cost (average loss, plus possibly a regularization term) in the case of regression can be the MSE or MAE, computed over all values of $x$.
• Problem 1: we do not have all values of $x$ and $f(x)$ (the data might not represent the whole population).
• Problem 2: we do not know what $f(x)$ looks like (its distribution).
• Compute the loss on the data we have (empirical risk minimization), e.g. for MAE over $N$ observations $(x_n, y_n)$:
  $\frac{1}{N}\sum_{n=1}^{N} |y_n - \hat{f}(x_n)|$
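A minimal sketch of empirical risk with the absolute loss: the true $f(x)$ is unknown, so two candidate functions are compared only on a small hypothetical sample, and the one with the lower average loss is preferred. The sample values and the two candidates are made up for illustration.

```python
import numpy as np


def empirical_mae(f_hat, x, y):
    """Empirical risk with absolute loss: mean |y_n - f_hat(x_n)| over the observed data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.abs(y - f_hat(x))))


# Hypothetical observed sample; the true f(x) is unknown to us
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]


def candidate_a(x):
    return 2.0 * x        # f_hat(x) = 2x


def candidate_b(x):
    return 1.5 * x + 1.0  # f_hat(x) = 1.5x + 1


risk_a = empirical_mae(candidate_a, x, y)
risk_b = empirical_mae(candidate_b, x, y)
print(round(risk_a, 3), round(risk_b, 3))  # 0.15 0.5
```

A learning algorithm automates this comparison by searching over a whole family of candidates (for example all $\theta x + c$) instead of two hand-picked ones.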

How to find $\hat{f}(x)$
• If $\hat{f}(x) = \theta x + c$, we assume a linear relationship.

• For a more complex relationship, a polynomial function can be used instead.
• Choose a power $p$: $\hat{f}(x) = c + \theta_1 x + \theta_2 x^2 + \dots + \theta_p x^p$. A higher $p$ implies more degrees of freedom/flexibility (the model fits the training data more closely, with a risk of overfitting).
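A minimal sketch of the effect of $p$, using NumPy's `Polynomial.fit` for least-squares polynomial fitting (an assumed choice; the course may fit the model differently). The data are synthetic: noisy samples of a sine curve. Training error can only go down as $p$ grows, since a higher-degree polynomial can reproduce any lower-degree one; with $p = 9$ and 10 training points the curve passes through every point. The error on held-out points is the one to watch: for large $p$ it typically gets worse, which is overfitting.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical noisy data from a curve that the model does not know
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, size=x_train.size)
x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, 0.2, size=x_test.size)


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


train_mse = {}
test_mse = {}
for p in (1, 3, 9):
    # Least-squares fit of f_hat(x) = c + theta_1 x + ... + theta_p x^p
    f_hat = Polynomial.fit(x_train, y_train, deg=p)
    train_mse[p] = mse(y_train, f_hat(x_train))
    test_mse[p] = mse(y_test, f_hat(x_test))
    print(f"p={p}: train MSE={train_mse[p]:.4f}, test MSE={test_mse[p]:.4f}")
```

With $p = 1$ both errors stay high (underfitting); comparing the test column across degrees is what reveals the point where extra flexibility stops helping.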

Changing p: overfitting / underfitting
