School, study and subject
Universiteit van Amsterdam (UvA)
Business Administration
Strategy analytics
Seller: hannah2501
Summary of the book
Chapter 1 – Introduction
Chapter 2 – Business problems & data science solutions
Chapter 3 – Introduction to predictive modeling
Chapter 4 – Fitting a model to data
Chapter 5 – Overfitting and its avoidance
Chapter 6 – Similarity, neighbors and clusters
Chapter 7 – Decision analytic thinking I
Chapter 8 – Visualizing model performance
Chapter 9 – Evidence and probabilities
Chapter 10 – Representing and mining text
Chapter 11 – Decision analytic thinking II
Chapter 12 – Other data science tasks and techniques
Chapter 13 – Data science and business strategy
Chapter 14 – Conclusion
Chapter 1 – Introduction: Data-analytic Thinking
Data science = principles, processes and techniques for understanding phenomena via the
analysis of data
Ultimate goal: improving decision making
Data-driven Decision making (DDD) = the practice of basing decisions on the analysis of
data, rather than purely on intuition
- Increases productivity (one standard deviation higher on the DDD scale is associated with
a 4-6% increase in productivity)
- Higher return on assets, return on equity, asset utilization and market value
2 types of decisions
1. Decisions for which discoveries need to be made within data
2. Decisions that repeat (especially at massive scale), so decision-making can benefit
from even small increases in decision-making accuracy
a. E.g. churn problems in big companies
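The second type of decision can be illustrated with simple arithmetic. The sketch below is only an illustration: the function name and all figures are hypothetical, not taken from the book.

```python
def extra_correct_decisions(n_decisions: int, old_accuracy: float, new_accuracy: float) -> float:
    """Expected number of additional correct decisions after an accuracy gain."""
    return n_decisions * (new_accuracy - old_accuracy)


# Hypothetical figures: one million repeated decisions (e.g. churn offers)
# and an accuracy improvement of a single percentage point.
gain = extra_correct_decisions(1_000_000, 0.70, 0.71)
print(round(gain))
```

Even a one-point improvement adds up to roughly ten thousand better decisions at this scale, which is why repeated decisions are attractive targets.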
A predictive model abstracts away most of the complexity of the world by focusing on a
particular set of indicators that correlate in some way with a quantity of interest
Data science supports data-driven decision making, but also overlaps with it, because
business decisions are increasingly being made automatically by computer systems
Data engineering & processing are critical to support data science, but are themselves more
general
Many data processing skills, systems and technologies are often mistaken for data
science
Difference data science vs. data processing
Data science = needs access to data and it often benefits from sophisticated data
engineering that data processing technologies may facilitate, but these technologies are not
data science technologies per se
Data processing = important for data-oriented business tasks that don’t involve extracting
knowledge or data-driven decision-making
Big data = datasets that are too large for traditional data processing systems and therefore
require new processing technologies
Big data technologies are sometimes used to implement data mining techniques,
but more often they are used for data processing in support of those techniques
Big data 1.0 = by analogy with Web 1.0 (when businesses busied themselves with getting basic
internet technologies in place so they could establish a web presence, build electronic
commerce capability and improve the efficiency of operations): firms are busying themselves
with building capabilities to process large data, largely in support of current operations
Big data 2.0 = once firms have become capable of processing massive data in a flexible
fashion, they begin asking: what can I do now that I couldn’t do before, or do better than I
could do before?
By analogy with Web 2.0: incorporation of social networking components and the rise of
the voice of the individual consumer
Fundamental principle of data science: data and the capability to extract useful knowledge
from data, should be regarded as key strategic assets
Too many businesses regard data analytics as pertaining mainly to realizing value
from some existing data, without checking whether they have the appropriate analytical
talent
Right talent & right data = complementary assets
If you don’t have the right data: buy it
Many firms nowadays exploit new & existing data resources for competitive advantage
Data analytic projects reach into all business units: this requires close interaction between
data scientists and business people
Chapter 2 – Business problems and data science solutions
An individual = refers to an entity about which we have data (e.g. a consumer or business)
1. Classification and class probability estimation = attempts to predict, for each individual in
a population, which of a set of classes this individual belongs to (usually the classes are
mutually exclusive)
e.g. among all customers at MegaTelCo, which are most likely to respond to a given offer?
- Data mining produces a model that, given a new individual, determines which class that
individual belongs to
- Closely related task: scoring / class probability estimation = assigns each individual a score
representing the probability that the individual belongs to each of the classes
- Requires categorical (often binary) target
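The difference between class probability estimation and classification can be sketched as follows. This is a minimal illustration, assuming a logistic scoring function with hand-picked weights; the feature names, weights and threshold are hypothetical, and in practice data mining would learn the model from historical data.

```python
import math

# Hypothetical weights, chosen by hand for illustration only.
WEIGHTS = {"months_as_customer": 0.05, "responded_before": 1.2}
BIAS = -1.0


def response_probability(customer: dict) -> float:
    """Class probability estimation: a score between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def classify(customer: dict, threshold: float = 0.5) -> str:
    """Classification: turn the score into one of two mutually exclusive classes."""
    if response_probability(customer) >= threshold:
        return "will respond"
    return "will not respond"


loyal = {"months_as_customer": 24, "responded_before": 1}
newcomer = {"months_as_customer": 1, "responded_before": 0}
print(classify(loyal), classify(newcomer))
```

The target here is binary (respond or not), in line with the requirement of a categorical target.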
2. Regression (value estimation) = attempts to estimate or predict for each individual the
numerical value of some variable for that individual
e.g. how much will a given customer use the service?
- Related to classification but different: classification predicts whether something will
happen, regression predicts how much something will happen
- Requires numeric target
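A regression model predicts a number rather than a class. The sketch below assumes a straight-line model fitted by ordinary least squares, which is only one possible technique; the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: monthly plan price (euros) vs. service usage (minutes).
prices = [10, 20, 30, 40]
minutes = [120, 220, 320, 420]

slope, intercept = fit_line(prices, minutes)
predicted_minutes = slope * 25 + intercept  # numeric prediction for a new customer
print(slope, intercept, predicted_minutes)
```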
3. Similarity matching = attempts to identify similar individuals based on data known about
them (can be used to find similar entities)
e.g. IBM is interested in finding companies similar to their business customers
- Basis for one of the most popular methods for making product recommendations
(finding people who are similar to you in terms of the products they have liked/
purchased)
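Similarity matching can be sketched by ranking candidates by their distance to a target. The example assumes Euclidean distance on two unscaled, hypothetical features; real applications would normally choose and scale features with care.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(target, candidates, k=1):
    """Return the names of the k candidates closest to the target."""
    ranked = sorted(candidates, key=lambda name: euclidean(target, candidates[name]))
    return ranked[:k]


# Hypothetical features: (employees in thousands, revenue in billions).
companies = {"A": (10.0, 2.0), "B": (50.0, 9.0), "C": (12.0, 2.5)}
best_customer = (11.0, 2.2)

print(most_similar(best_customer, companies, k=2))
```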
4. Clustering = attempts to group individuals in a population together by their similarity, but
not driven by any specific purpose
e.g. Do our customers form natural groups or segments?
- Useful in preliminary domain exploration to see which natural groups exist because
these groups in turn may suggest other data mining tasks/ approaches
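To see how natural groups might be found without a target variable, here is a minimal sketch using a one-dimensional k-means procedure. k-means is an assumption made for this example, not a method named in this chapter; the spending figures and starting centroids are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means on single numbers, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer (euros).
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = kmeans_1d(spend, centroids=[0.0, 50.0])
print(centroids, clusters)
```

No class labels are supplied; the two segments emerge from the similarity of the values alone.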
5. Co-occurrence grouping = attempts to find associations between entities based on
transactions involving them (aka frequent itemset mining, association rule discovery,
market-basket analysis)
e.g. What items are commonly purchased together? Recommendation: customers who
bought X also bought Y
- While clustering looks at similarity between objects based on objects’ attributes, co-
occurrence grouping considers similarity of objects based on their appearing
together in transactions
- Analysing the co-occurrence of products in purchases is a common type of grouping
known as market-basket analysis (ch. 12)
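The simplest form of co-occurrence grouping counts how often pairs of items appear in the same transaction. The baskets below are invented for illustration, and the sketch stops at raw counts rather than full association rules.

```python
from collections import Counter
from itertools import combinations


def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives every pair one canonical order; set() ignores repeats.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical market baskets.
baskets = [
    ["beer", "chips"],
    ["beer", "chips", "salsa"],
    ["chips", "salsa"],
    ["beer", "diapers"],
]

counts = pair_counts(baskets)
print(counts.most_common(2))
```

Note that the grouping relies only on items appearing together in transactions, not on any attributes of the items themselves.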
6. Profiling (behavior description) = attempts to characterize the typical behavior of an
individual, group or population
e.g. What is the typical cell phone usage of this customer segment?
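A very simple profile of a segment is a pair of summary statistics. The sketch assumes that the mean and standard deviation describe "typical" usage well enough, which will not hold for every distribution; the usage figures are hypothetical.

```python
from statistics import mean, stdev


def usage_profile(minutes):
    """Describe typical usage by its mean and sample standard deviation."""
    return {"mean": mean(minutes), "stdev": stdev(minutes)}


# Hypothetical monthly call minutes for customers in one segment.
segment_minutes = [200, 220, 210, 190, 205]

profile = usage_profile(segment_minutes)
print(profile)
```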