Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow
This document contains all the lecture slides and notes of the course 'Real-life Machine learning (300363-B-6)', given at Tilburg University as premaster for JADS. This document contains everything needed for the exam and is complete.
Good luck with the course!
All material needed for the exam
December 17, 2023
2023/2024
Summary
Written for
Tilburg University (UVT)
Business Economics (Bedrijfseconomie)
Real-life Machine learning (300363B6)
Seller: Dee25
Lecture 1
What are we going to learn today?
- What is machine learning?
- What are supervised and unsupervised machine learning?
- Which are the most common types of machine learning problems?
- What are the basic steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM)?
Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.
Machine learning
Assume that you repeat an exercise over and over again.
What should stay constant across the repetitions?
- Learning! Machine learning applies strategies and algorithms, combined with data and statistics.
- Improving! Machine learning applies statistical metrics to measure how closely the ML prediction matches the expected result.
When you do it, it is human learning.
When a machine does it, it is machine learning!
An example of supervised learning
Supervised learning - classification
Given a labelled dataset, the model learns to predict the labels of new, unseen examples.
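As an illustration of supervised classification, here is a minimal sketch using scikit-learn. The choice of the Iris dataset, a k-nearest-neighbours classifier, and the 75/25 train/test split are assumptions made for this example; they are not taken from the lecture.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labelled dataset: X holds the features, y holds the known labels.
X, y = load_iris(return_X_y=True)

# Hold back part of the data to act as "new examples" (assumed 25% split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Learn from the labelled training examples.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Predict labels for examples the model has not seen before.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Accuracy on unseen examples: {accuracy:.2f}")
```

The accuracy printed at the end is one example of a measure of how well the predictions match the expected labels.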
An example of unsupervised learning
Unsupervised learning - clustering, dimensionality reduction, anomaly detection and novelty detection
Given a dataset without labels, the model learns to cluster (group) similar data.
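Clustering can be sketched in a few lines with scikit-learn. The six two-dimensional points and the choice of k-means with two clusters are made up for this example; the lecture does not prescribe a particular algorithm here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled data: two visibly separate groups of points (made-up values).
points = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # group near (1, 1)
    [8.0, 8.2], [7.9, 8.1], [8.3, 7.8],   # group near (8, 8)
])

# Ask k-means for two clusters; no labels are provided.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)

# Points in the same group receive the same cluster id.
print(cluster_ids)
```

The cluster ids themselves are arbitrary (0 or 1); what matters is which points end up grouped together.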
CRISP-DM process model
Business understanding in the CRISP-DM process
Determine business objectives and success criteria
Business objectives and measures to evaluate the results have to be established
Business objectives:
● What is the customer’s primary objective?
● Increase the number of loyal customers
● Selling more of a certain product
● Have a positive marketing campaign
Business success criteria:
● Objective measure to establish success (e.g. return on investment)
Main steps in a data mining project
1. Define the goals:
Business and data mining experts together have to define the goals. For each goal a
measure must be defined to understand its success
2. Obtain the models:
Pre-process the data, apply data mining algorithms
3. Evaluate results
Use the pre-specified measures to evaluate the models
4. Deploy:
If the evaluation is successful, the model can be deployed
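The link between steps 1, 3 and 4 can be sketched as follows: the success measure and its target are fixed up front, and the deployment decision simply checks the evaluation against that target. The accuracy measure, the 0.80 threshold, and the function names are hypothetical choices for this example.

```python
# Step 1: measure and target agreed before modelling (hypothetical value).
SUCCESS_THRESHOLD = 0.80

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the expected labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def should_deploy(y_true, y_pred, threshold=SUCCESS_THRESHOLD):
    """Step 4: deploy only if the pre-specified measure meets its target."""
    return accuracy(y_true, y_pred) >= threshold

# Step 3: evaluate model output against held-out labels (made-up data).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

print(accuracy(y_true, y_pred))
print(should_deploy(y_true, y_pred))
```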
Costs & benefits
Perform a cost-benefit analysis
Compute the benefits of the project (e.g. return on investment)
Compute the costs of the project - main factors:
● Data sources
● Data mining problem to be solved
● Available tools
● Expertise of the development team
Quantify the risk that the project fails:
● Knowledge not available
● Data not available
● Missing tools
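A cost-benefit analysis can be sketched numerically. Return on investment is computed here as (benefit - cost) / cost. The risk adjustment is an assumption of this example, not a formula from the lecture: it supposes that the full cost is incurred either way and that a failed project yields no benefit. All amounts are made up.

```python
def return_on_investment(benefit, cost):
    """ROI as a fraction of the cost: (benefit - cost) / cost."""
    return (benefit - cost) / cost

def expected_net_benefit(benefit, cost, failure_probability):
    """Assumed model: cost is always paid, benefit only arrives on success."""
    return (1 - failure_probability) * benefit - cost

# Hypothetical project figures.
benefit = 150_000.0
cost = 100_000.0
failure_probability = 0.2

roi = return_on_investment(benefit, cost)
net = expected_net_benefit(benefit, cost, failure_probability)
print(f"ROI: {roi:.0%}, expected net benefit: {net:,.0f}")
```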
Quality data & feature engineering
What are we going to learn today?
- What kind of data exists?
- How to prepare data?
- What is data balancing?
- How to apply data cleaning and feature scaling?
- What is feature selection?
What kind of data exists?
- Structured data
- Unstructured data
- Semi-structured data
Structured data
Tabular data (rows and columns) with a well-defined structure.
We know which columns there are and what kind of data they contain (the format is very strict).
Such data is often stored in databases that also represent the relationships between the data. Questions about the data can be answered by using a query language.
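As a small illustration of structured data, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The `customers` and `orders` tables and their contents are invented for the example; the point is the strict column definitions, the relationship between the tables, and answering a question with a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Strict format: every column has a name and a type.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
# The foreign key represents the relationship between orders and customers.
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "amount REAL NOT NULL)"
)

conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", "Tilburg"), (2, "Bob", "Eindhoven")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0)],
)

# Question: how much has each customer spent in total?
rows = conn.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(rows)
```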
Unstructured data
The rawest form of data, which can be any type of file.
Extracting value from data in this form is hard, since you first need to extract structured features from it.
For example, you might want to extract topics from movies.
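To give a feel for what "extracting structured features" can mean, the sketch below turns a piece of free text into word counts, one of the simplest structured representations of text. The review sentence and the `word_counts` helper are invented for the example; real topic extraction would need considerably more than this.

```python
import re
from collections import Counter

def word_counts(text):
    """Lower-case the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Unstructured input: free text with no columns or schema (made-up review).
review = "A great movie. Great cast, great story!"

# Structured output: a mapping from word to frequency.
counts = word_counts(review)
print(counts.most_common(3))
```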
Semi-structured data
This format sits between structured and unstructured data.
A consistent format is defined. However, the structure is not very strict. For example, the data might not be tabular, or parts of it may be missing.
Semi-structured data are often stored as files. However, some kinds of semi-structured data can be stored in document-oriented databases. Such databases allow you to query the semi-structured data.
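JSON is a common example of a semi-structured format (an illustration chosen here, not named in the lecture). In the sketch below, both records follow the same general format, but each is missing fields the other has, and one contains a nested object. The records and the `flatten` helper are invented for the example.

```python
import json

# Two records in a consistent format, but with optional and nested fields.
raw = """
[
  {"name": "Ann", "email": "ann@example.com", "tags": ["loyal"]},
  {"name": "Bob", "address": {"city": "Tilburg"}}
]
"""
records = json.loads(raw)

def flatten(record):
    """Map a loosely structured record onto fixed columns, None if missing."""
    return {
        "name": record.get("name"),
        "email": record.get("email"),
        "city": record.get("address", {}).get("city"),
        "n_tags": len(record.get("tags", [])),
    }

table = [flatten(r) for r in records]
for row in table:
    print(row)
```

Flattening like this is one way to move from semi-structured to structured data; missing parts show up as explicit `None` values.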