Summary: All the readings summarized!

Name: All lectures summarized!
Author: gideonrouwendaal

Course
Machine Learning

Institution
Vrije Universiteit Amsterdam (VU)

This summary is extensive, but it contains everything you need to get a good exam grade! I got an 8.5 myself, so this should work out! In addition to the summary, I would also do the practice exam (for the application questions); then you are ready to go!

Uploaded on December 13, 2022
Number of pages 111
Written in 2021/2022
Type Summary

machine learning
vu university
artificial intelligence
ml

Education
Artificial Intelligence

Lecture 1
Machine learning means getting a computer to infer rules without stating them explicitly. You do
this by giving the computer a number of examples and letting it do the work.

What makes a suitable ML problem?

- We can't solve it explicitly
- Approximate solutions are fine
- Limited reliability, predictability, interpretability is fine
- Plenty of examples should be available
- Good examples: Recommending a movie, clinical decision support, predicting driving time
and recognising a user
- Bad examples: computing taxes, clinical decisions, parole decision, unlocking phone

Where do we use ML?

- Inside other software: unlock your phone with your face, search with a voice command
- In analytics, data mining, data science: find typical clusters of users, predict spikes in web
traffic
- In science/statistics: if any model can predict A from B, there must be some relation

Machine learning provides systems the ability to automatically learn and improve from experience
without being explicitly programmed.

Reinforcement learning: taking actions in a world based on delayed feedback.

Online learning: predicting and learning at the same time

Offline learning: Separate learning, predicting and acting:

- Take a fixed dataset of examples (instances)
- Train a model to learn from these examples
- Test the model to see if it works by checking its predictions
- If the model works, put it into production: use its predictions to take actions

ML problems are not solved one by one (we do not spend years searching for a chess algorithm, a
driving algorithm, etc.). Instead, problems are reduced to abstract tasks such as classification and
regression, and we look for solutions (algorithms) for these tasks, e.g. linear models and kNN.

Abstract tasks are divided into supervised and unsupervised tasks:

Supervised: explicit examples of input and output. Learn to predict output for unseen input.
Learning tasks:

- Classification: assign a class to each example
- Regression: assign a number to each example

Unsupervised: Only inputs provided. Find any pattern that explains something about the data.

AI is the general field concerned with building intelligent agents. ML is a subset of AI.

Data science is the general field concerned with data. ML is a subset of data science.

Data mining overlaps with ML (the two are very close to each other). E.g. finding common
clickstreams in web logs or finding fraud in transaction networks is more DM. Spam classification,
predicting stock prices and learning to control a robot are more ML. Data mining is more about being
given a large database and finding patterns in it. ML focuses more on (prediction) tasks.

Information retrieval: not the same but can benefit from one another.

Stats vs ML: most of the rules of stats also apply in ML. The main difference is what we want from
the model once it is fitted. Stats: the model should fit reality. ML: we are only interested in
predictions that are likely to be true.

Deep learning: a subset of ML.

An example of classification is to mark an email as spam or ham (2 classes). Data goes into a learner
and eventually there is a model. The model is a classifier.

Linear classifier: a classification algorithm that (for example with 2 features, in 2D) draws a line
through the feature space. For example, everything above the line is classified as X and everything
below it as Y. In 2D the boundary is a line, in 3D a plane, and in 4D and higher a hyperplane.
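
The rule above can be sketched in a few lines of Python. This is only an illustration: the weights `w`, the offset `b`, the boundary chosen and the function name `linear_classify` are assumptions made for the example, not something taken from the lecture.

```python
def linear_classify(point, w, b):
    """Classify a point by which side of the boundary w . x + b = 0 it falls on."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, point)) + b
    return "X" if score > 0 else "Y"


# Hypothetical boundary: the line x2 = x1, written as -1*x1 + 1*x2 + 0 = 0.
w, b = (-1.0, 1.0), 0.0

print(linear_classify((1.0, 2.0), w, b))  # above the line -> X
print(linear_classify((2.0, 1.0), w, b))  # below the line -> Y
```

The same function works unchanged in 3D or higher, where the boundary becomes a plane or hyperplane: only the length of `w` and `point` changes.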

Loss function: a function that expresses, for a particular model, how well it fits our data:

loss_data(model) = performance of the model on the data (the lower the better). For classification:
e.g. the number of misclassified examples.
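
As a small illustration, the classification loss mentioned above (the number of misclassified examples) could be computed as follows. The toy model, its 0.5 threshold and the four example emails are made up for this sketch.

```python
def misclassification_loss(model, instances, labels):
    """Count the examples for which the model's prediction differs from the label."""
    return sum(1 for x, y in zip(instances, labels) if model(x) != y)


# Hypothetical toy model: call an email spam if its single feature exceeds 0.5.
def toy_model(x):
    return "spam" if x[0] > 0.5 else "ham"


instances = [(0.9,), (0.2,), (0.7,), (0.1,)]
labels = ["spam", "ham", "ham", "ham"]

print(misclassification_loss(toy_model, instances, labels))  # 1 (the third email)
```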

Decision tree classifier: classification algorithm. The leaves of the tree are labelled with classes.

k-Nearest Neighbours classifier: classification algorithm, a “lazy classifier”. It does not do any
learning, but just remembers the dataset. Once it gets a new point, it looks at the k nearest points
in the dataset and assigns the most frequent class among these k nearest neighbours. k is a
hyperparameter (it has to be chosen by the programmer).
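
A minimal sketch of kNN classification, assuming Euclidean distance (the notes do not say which distance is used) and a made-up toy dataset:

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Return the most frequent label among the k training points nearest to query."""
    # "Lazy" learning: there is no training step, we only search the stored data.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical dataset with two well-separated classes.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(points, labels, (0.5, 0.5)))  # a
print(knn_classify(points, labels, (5.5, 5.5)))  # b
```

With two classes, an odd k avoids ties in the vote.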

Classification algorithms mostly work with numerical or categorical features (some algorithms only
work with numerical values). Binary classification: 2 classes. Multiclass classification: more than
2 classes. Multilabel classification: none, some, or all classes may be true. Class
probabilities/scores: the classifier reports a probability for each class, which is a helpful
property for a classifier to have.

Offline machine learning: the basic recipe:

- Abstract (part of) your problem to a standard task: classification, regression, clustering….
- Choose your instances and their features. For supervised learning: choose a target
- Choose your model class: linear models, decision trees, kNN
- Search for a good model: usually a model comes with its own search method. Sometimes
multiple options are available.

Classification vs regression: in classification the target is a class; in regression the target is a
number. x_i is the feature vector of instance i, y_i is the true target value for x_i, and f(x_i) is
the model's prediction for x_i. The model maps the feature space to the target space. When plotting
a regression problem, the feature is shown on one axis (e.g. the x-axis) and the target on the other
(e.g. the y-axis); in a classification plot all axes are features. The loss function most often used
for regression is the mean squared error (MSE): loss(f) = 1/n * (sum over i of (f(x_i) - y_i)**2).
The differences are squared so that positive and negative errors do not cancel each other out. There
are also regression trees and kNN regression.
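
A short sketch of the MSE computation, showing why the errors are squared. The model `f` and the three data points are invented for the example; they are chosen so that the unsquared errors sum to zero even though the model is not perfect.

```python
def mean_squared_error(f, xs, ys):
    """loss(f) = 1/n * sum over i of (f(x_i) - y_i)**2"""
    n = len(xs)
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / n


def f(x):
    # Hypothetical model: predict twice the input.
    return 2 * x


xs = [1, 2, 3]
ys = [2, 5, 5]

raw_errors = [f(x) - y for x, y in zip(xs, ys)]
print(raw_errors)       # [0, -1, 1]
print(sum(raw_errors))  # 0: without squaring, the errors cancel out
print(mean_squared_error(f, xs, ys))  # (0 + 1 + 1) / 3
```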

Unsupervised abstract tasks: clustering, density estimation and generative modelling.

Clustering is a lot like classification: the dataset or the feature space is divided into a finite
number of groups (clusters). The difference is that in this case we are not given target values:
features are given, but no classes. The learner has to decide how to separate the dataset purely by
finding patterns.

Density estimation: the dataset consists of instances represented by features, and the learner
discovers patterns of density. The task of the learner is to produce a model that outputs a number
for each instance, indicating how likely that instance is according to the distribution of the data.
If the features are categorical, this number is a probability; if the features are numerical, it is
a probability density. Example: fitting a normal distribution to a set of numbers.
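
The normal-distribution example can be sketched as below. The data values are made up, and using the population standard deviation (`statistics.pstdev`) as the estimate is a choice made for this sketch rather than something the notes specify.

```python
import math
import statistics


def fit_normal(data):
    """Estimate the mean and standard deviation of a normal distribution from data."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    return mu, sigma


def normal_density(x, mu, sigma):
    """Density of x under a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical measurements clustered around 5.
data = [4.8, 5.1, 5.0, 4.9, 5.2]
mu, sigma = fit_normal(data)

print(normal_density(5.0, mu, sigma))  # high: a typical value for this data
print(normal_density(8.0, mu, sigma))  # close to zero: a very unlikely value
```

The fitted model is the pair (mu, sigma); the number it outputs for a new instance is the density at that point.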

Generative modelling: a model that learns a probability distribution from which new instances can be
sampled. Example: thispersondoesnotexist.com

You can combine unsupervised and supervised learning: semi-supervised learning. Unlabelled data
is cheap to get (e.g. from the internet). An example of this kind of training is self-training.

Self-supervised learning: a large unlabelled dataset is used to train a model without requiring a
large amount of manual annotation.

Sensitive attributes: features or targets that are associated with instances of data that require
careful consideration. Examples: sexual orientation, race, ethnic identity, cultural identity, gender.
What makes an attribute sensitive?

- Can it be used for harm?
- Can mischaracterizing relations become offensive?
- Is it commonly used to discriminate? → explicitly, as in apartheid regimes, or implicitly
through structural inequality.

Training data bias: where do you get your data from?

Bias from technological legacy: relying on existing technology that might have unexpected biases.

Amplifying bias: gendered words. For example, Google Translate used the male form when translating
"my friend is a doctor" as "mi amigo es doctor". It may be that more doctors in general are male,
but certainly not 100%, so always choosing the male form amplifies the bias in the data. This can be
fixed by showing both results.

Are you predicting what you think you’re predicting? Results obtained from surveys can be false
(e.g. because people lie). We may think we have found a predictor for something, while in fact it
only predicts what people report in a survey.

It matters where you are predicting from! Persistence: the weather forecasting tactic of predicting
that tomorrow’s weather will be the same as today’s. Such a trivial baseline can already score well,
hence accuracy is not all that matters.
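
A small sketch of the persistence baseline and its accuracy. The weather sequence is invented for illustration, so the resulting number says nothing about real forecasts.

```python
def persistence_accuracy(weather):
    """Accuracy of predicting that each day's weather equals the previous day's."""
    predictions = weather[:-1]  # today's weather, used as the forecast
    actual = weather[1:]        # what tomorrow actually turned out to be
    correct = sum(1 for p, a in zip(predictions, actual) if p == a)
    return correct / len(actual)


# Hypothetical week of observations.
weather = ["sun", "sun", "sun", "rain", "rain", "sun", "sun"]

print(persistence_accuracy(weather))  # 4 of the 6 forecasts are correct
```

Any learned model should be compared against a baseline like this before its accuracy is taken to mean anything.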

Can predictions be offensive or hurtful? There is a difference between being able to make a guess
and choosing to do so (predicting and acting). If a behaviour is not acceptable in a social context of
people, then humans will be upset if a computer does it. E.g. a website asking about your email
before allowing you to take a look (asking personal information before speaking to you).

Should we include sensitive attributes in the data at all? To study bias, we need these attributes
to be annotated. If we remove them, they may still be inferred from other features. Directly using a
sensitive attribute (SA) may be preferable to indirectly doing so. There are valid use cases (e.g.
race and sex affect medicine), but these often require a causal link.

Should we stop using SA as targets? What is input and what is target is not always clearly separated.
Showing that sensitive attributes can be inferred, may serve as a warning to those who are
vulnerable (building a proof-of-concept in a controlled setting is sometimes the best way to warn the
world that something can be built. E.g. using an algorithm to warn people that they might be seen as
gay, where homosexuality is not accepted).

Summary: use SA with extreme care. Consider user communication over prediction. Check the
distribution. Do not imply causality or overstate what your predictions mean.

The aim of ML is to find a model that generalizes, i.e. one that works not only on the training set.
Hence, you should not overfit (fit the random noise in the data). When a model overfits, it
memorizes the data instead of generalizing. Hence: never judge your model’s performance on the
training data. The simplest safeguard against overfitting is to withhold part of your data, so that
the data is split into training data and test data. The aim is not to minimise the loss on the
training data, but to minimise the loss on the test data. You don’t get to see the test data until
you’ve chosen your model. Find the pattern in the data and discard the noise. Machine learning is an
empirical science.
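
The point about never judging a model on its training data can be illustrated with a sketch. The "memoriser" model, the 25% test fraction, the fixed seed and the toy dataset are all assumptions made for this example.

```python
import random
from collections import Counter


def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and withhold a fraction of it as test data."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_memoriser(train):
    """Deliberately overfitting model: look up seen inputs, else guess the majority class."""
    table = dict(train)
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: table.get(x, majority)


def count_errors(model, data):
    return sum(1 for x, y in data if model(x) != y)


# Hypothetical toy dataset: the label depends on whether x is at least 10.
data = [(x, "pos" if x >= 10 else "neg") for x in range(20)]

train, test = train_test_split(data)
model = fit_memoriser(train)

print(count_errors(model, train))  # 0: the training data is memorised perfectly
print(count_errors(model, test))   # errors on the withheld data
```

The memoriser looks perfect on the training data, yet it has learned nothing about the actual pattern (x at least 10), which only shows up when it is evaluated on the withheld test data.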

The problem of induction. Inductive reasoning is learning: you observe something a couple of times
and infer that it will probably happen the next time. Deductive reasoning is rule following.
Deductive reasoning: all men are mortal; Socrates is a man; so Socrates is mortal. Inductive
reasoning: the sun has risen in the east my entire life; so it will do so tomorrow.

General heuristic: all else being equal, prefer the simpler solution.

Lecture 2
Linear regression
Notation:

- Lower case, non-bold letter (x, y, z) → scalar (i.e. a single number)
