Lecture 1: Fundamental Concepts, Applications, and Process of Data
Science
Objectives of this lecture:
• Explain and understand concepts central to this course such as AI, machine learning,
data mining, etc.
• Explain the importance of data for business
• Identify applications and tasks that can be solved by data analytics and support decision
making
• Explain and apply the analytics process model for developing data-driven business
solutions
Terminology and Fundamental Concepts of Data Science:
• Querying and reporting:
o You know exactly what you are looking for.
o SQL: SELECT * FROM CUSTOMERS WHERE AGE > 45
• OLAP: Online Analytical Processing:
o GUI to query large data collections in real time
o Pre-programmed dimensions of analysis (→ faster to find information than with querying)
o Summary level
⇒ For both querying and OLAP: no modeling or pattern finding.
[Figure: OLAP GUI example]
⇒ Classic Business Intelligence: you know what you are looking for → Query/OLAP
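The difference between a plain query and an OLAP-style summary can be sketched in a few lines of Python. This is only an illustration: the in-memory table, its extra columns (NAME, REGION, REVENUE) and its rows are made up, and a real OLAP tool would offer such summaries through a GUI rather than hand-written SQL.

```python
import sqlite3

# Hypothetical in-memory CUSTOMERS table; columns and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (NAME TEXT, AGE INTEGER, REGION TEXT, REVENUE REAL)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        ("Ann", 52, "North", 120.0),
        ("Bob", 31, "North", 80.0),
        ("Cem", 47, "South", 200.0),
        ("Dee", 29, "South", 50.0),
    ],
)

# Querying: you know exactly what you are looking for.
older_customers = conn.execute(
    "SELECT NAME FROM CUSTOMERS WHERE AGE > 45 ORDER BY NAME"
).fetchall()

# OLAP-style view: a summary-level figure along a predefined dimension (here REGION).
revenue_by_region = conn.execute(
    "SELECT REGION, SUM(REVENUE) FROM CUSTOMERS GROUP BY REGION ORDER BY REGION"
).fetchall()

print(older_customers)
print(revenue_by_region)
conn.close()
```

In both cases the result is a lookup or an aggregate of stored facts; no model is built and no new pattern is discovered.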
• Data Science: “A set of fundamental principles that guide extraction of knowledge from
data”
• Data Mining: “The extraction of knowledge from data, via technologies that incorporate
these principles”
• Big Data: “Data that is so large that traditional data storage and processing systems are
unable to deal with it”
⇒ You don’t know what you are looking for, or you want to find new, intricate patterns in the (big) data
→ Data Mining (to create value from unprocessed data)
Technologies:
⇒ Over the last decade, AI has come to rely more and more on ML, and ML on DL, but these terms are not
synonyms!
Examples of Applications of Data Science:
A train carriage containing controlled nuclear materials was stolen in Cincinnati today. Its
whereabouts are unknown.
The incident occurred on the downtown train line, which runs from Covington and Ashland
stations. In an email to Ohio news outlets, the U.S. Department of Energy said it is working with
the Federal Railroad Administration to find the thief. “The theft of this nuclear material will
have significant negative consequences on public and environmental health, our workforce
and the economy of our nation,” said Tom Hicks, the U.S. Energy Secretary, in a statement.
“The safety of people, the environment and the nation’s nuclear stockpile is our highest
priority,” Hicks said. “We will get to the bottom of this and make no excuses.”
⇒ What could be the relevance of this article regarding AI?
o This text was partly written by an AI: only the part in bold was written by a
human. The rest was filled in by the AI, using only the first sentence as a base.
⇒ What could be the relevance of these pictures regarding AI?
o Both people are images generated by AI.
Concerns?
• Modern ML techniques are very good at learning complex patterns in data to solve
certain types of predefined tasks
• Data science harnesses these techniques to solve commercial and business issues to
create value
Data:
• At the basis of all of this: data!
• What is data?
o Raw stream of facts
Sometimes big:
• The Large Hadron Collider (LHC at CERN) has 150 million sensors, together generating
about 40 million measurements per second
• Walmart registers more than a million customer transactions per hour
Data as a strategic asset:
• Data can lead to better decision making through data science
• Data → information/knowledge
• Data is a valuable asset
Which types of decisions to support through data science:
• Decisions for which discoveries need to be made:
o Usually high impact
o E.g., prediction of demand shocks in times of crisis
• Decisions that repeat, especially at massive scale:
o Such decisions can benefit from even small increases in accuracy gained
through data analysis.
o E.g., credit scoring
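Why small accuracy gains matter for repeated decisions can be shown with simple arithmetic. The sketch below is an illustration only: the number of decisions, the accuracies and the cost per wrong decision are made-up figures, and `expected_error_cost` is a hypothetical helper that assumes every error costs the same.

```python
def expected_error_cost(n_decisions, accuracy, cost_per_error):
    """Expected total cost of wrong decisions, assuming a fixed cost per error."""
    return n_decisions * (1 - accuracy) * cost_per_error

# Invented figures: one million credit decisions, 50 (currency units) lost per wrong decision.
baseline = expected_error_cost(1_000_000, 0.90, 50)
improved = expected_error_cost(1_000_000, 0.91, 50)

# A one-percentage-point gain in accuracy, multiplied over all decisions.
savings = baseline - improved
print(round(savings))
```

Under these assumptions a single percentage point of accuracy removes ten thousand wrong decisions, which is why repeated decisions at scale are attractive targets for data science.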
The Data Science Process:
Important technology: Machine learning
⇒ Learns from data.
o But what is learning?
Learning:
• We usually learn a function:
y = f(x)
• f: a mathematical or logical formula:
o Can be learned from data (from examples) by an algorithm
o E.g., f() is a program that identifies cats in video data
o Gets better with more examples → Remember: machine learning
OR:
o The mapping of x to y can be hardcoded into the program → the solution is thus
not “learned”
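The contrast between a hardcoded and a learned f can be sketched as follows. This is a toy illustration, not a real learning algorithm from the course: the examples are invented, and `learn_threshold` is a hypothetical helper that simply picks the cut-off with the fewest mistakes on the labeled examples.

```python
def hardcoded_f(x):
    # Rule fixed by the programmer; it never changes, whatever the data says.
    return 1 if x > 5.0 else 0

def learn_threshold(examples):
    """Return the threshold t (taken from the observed x values) for which the rule
    'predict 1 if x >= t' makes the fewest mistakes on labeled (x, y) examples."""
    best_t, best_errors = None, len(examples) + 1
    for t in sorted(x for x, _ in examples):
        errors = sum((1 if x >= t else 0) != y for x, y in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

# Invented labeled examples (x, y).
examples = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (6.0, 1)]
t = learn_threshold(examples)

def learned_f(x):
    # Rule derived from the examples above.
    return 1 if x >= t else 0

hardcoded_errors = sum(hardcoded_f(x) != y for x, y in examples)
learned_errors = sum(learned_f(x) != y for x, y in examples)
print(t, hardcoded_errors, learned_errors)
```

Feeding `learn_threshold` different examples yields a different f, while `hardcoded_f` stays the same; that dependence on data is what "learning" refers to here.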
Example:
• y = f(x) looks suspiciously like linear regression:
⇒ But often more complex!
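For the simplest case, where f is a straight line, learning f from examples amounts to ordinary least squares. The sketch below assumes a single input variable and uses invented data points; `fit_line` is a hypothetical helper, and real models are usually far more complex.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented examples that lie exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)

def f(x):
    # The learned function: its parameters come from the data, not from the programmer.
    return slope * x + intercept

print(slope, intercept, f(4.0))
```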