Strategy Analytics summary
Lecture 1
Chapter 1: Data Analytic Thinking
Probably the widest applications of data-mining techniques are in marketing for tasks such as:
- Targeted marketing.
- Online advertising.
- Recommendations for cross-selling.
- General customer relationship management to analyse customer
behaviour in order to manage attrition and maximize expected customer
value.
The finance industry uses data mining for credit scoring and trading, and in
operations via fraud detection and workforce management.
The retail industry uses data mining for marketing and supply chain management.
Data science = the principles, processes, and techniques for understanding
phenomena via the (automated) analysis of data. The ultimate goal of data
science is improving decision making, as this generally is of direct interest to
business. It addresses a specific question (in our case, business decisions).
Data-driven decision making (DDD) = the practice of basing decisions on the
analysis of data, rather than purely on intuition.
E.g.: a marketer could select advertisements based purely on her long
experience in the field and her eye for what will work. Or, she could base
her selection on the analysis of data regarding how consumers react to different ads. She could also use
a combination of these approaches. DDD is not an all-or-nothing practice, and different firms engage in
DDD to greater or lesser degrees.
The sorts of decisions we will be interested in in this book mainly fall into 2 types:
1. Decisions for which “discoveries” need to be made within data (non-obvious).
2. Decisions that repeat, especially at massive scale, and so decision-making can benefit from even small
increases in decision-making accuracy based on data analysis (repetitive decisions).
There is a lot to data processing that is not data science—despite the impression one might get from the media.
Data engineering and processing are critical to support data science, but they are more general.
o E.g.: these days many data processing skills, systems, and technologies often are mistakenly
cast as data science.
To understand data science and data-driven businesses, it is important to understand these differences. Data
science needs access to data, and it often benefits from sophisticated data engineering that data processing
technologies may facilitate; but these technologies are not data science technologies per se. They support data
science, as shown in the figure, but they are useful for much more.
Data processing technologies are very important for many data-oriented business tasks that do not
involve extracting knowledge or data-driven decision-making, such as efficient transaction processing,
modern web system processing, and online advertising campaign management.
Big data = datasets that are too large for traditional data processing systems and that therefore require new
processing technologies. Simply put: very large datasets, but with 3
distinct characteristics (the 3 Vs):
1. Volume = the quantity of generated and stored data.
2. Variety = the type and nature of the data.
3. Velocity = the speed at which the data is generated and
processed.
Data mining = the extraction of knowledge from data, via
technologies that incorporate these principles. Underlying the
extensive body of techniques for mining data is a much smaller
set of fundamental concepts comprising data science.
Success in today’s data-oriented business environment requires being able to think about how these
fundamental concepts apply to particular business problems, i.e. to think data-analytically.
Data analytics = the process of examining datasets in order to draw conclusions about the useful information
they may contain.
Types of data analysis:
- Descriptive analytics (BI): what has happened?
o Simple descriptive statistics, dashboard, charts, diagrams.
- Predictive analytics: what could happen?
o Segmentation, regressions.
- Prescriptive analytics: what should we do?
o Complex models for product planning and stock optimization.
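The three analytics types above can be sketched side by side on toy data. This is a minimal illustration, not a real pipeline: the sales figures, the naive trend forecast, and the stocking rule are all made-up assumptions.

```python
# Toy monthly sales data (illustrative numbers only).
monthly_sales = {"Jan": 120, "Feb": 95, "Mar": 140, "Apr": 160}

# Descriptive analytics: what has happened? (simple summary statistics)
total = sum(monthly_sales.values())
average = total / len(monthly_sales)
print(f"Total sales: {total}, average per month: {average:.1f}")

# Predictive analytics: what could happen? (naive trend extrapolation)
values = list(monthly_sales.values())
trend = values[-1] - values[-2]          # month-over-month change
forecast_may = values[-1] + trend
print(f"Naive forecast for May: {forecast_may}")

# Prescriptive analytics: what should we do? (a simple rule on the forecast)
action = "increase stock" if forecast_may > average else "hold stock"
print(f"Recommended action: {action}")
```

In practice each layer uses far richer methods (dashboards, regressions, optimization models), but the question each one answers is exactly as listed above.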
There is convincing evidence that data-driven decision-making and
big data technologies substantially improve business performance.
Data science supports data-driven decision-making (and
sometimes conducts such decision-making automatically). It
depends upon technologies for “big data” storage and engineering,
but its principles are separate.
One of the fundamental principles of data science is that data, and the
capability to extract useful knowledge from data, should be
regarded as key strategic assets (Capital One case).
Chapter 2: Business problems and data science solutions
Each data-driven business decision-making problem is unique,
comprising its own combination of goals, desires, constraints, and
even personalities. As with much engineering, though, there are sets of common tasks that underlie the business
problems. Collaborative problem-solving between business stakeholders and data scientists involves:
- Decomposing a business problem into (solvable) subtasks.
- Matching the subtasks with known tasks for which tools are available.
- Solving the remaining non-matched subtasks (by creativity!).
- Putting the subtasks together to solve the overall problem.
An individual = an entity about which we have data, such as a customer or a consumer, or it could be an
inanimate entity such as a business.
A typology of methods:
1. Classification and class probability estimation attempt to predict, for each individual in a population,
which of a (small) set of classes this individual belongs to.
2. Regression (“value estimation”) attempts to estimate or predict, for each individual, the numerical value
of some variable for that individual.
3. Similarity matching attempts to identify similar individuals based on data known about them.
4. Clustering attempts to group individuals in a population together by their similarity, but not driven by
any specific purpose.
5. Co-occurrence grouping (also known as frequent itemset mining, association rule discovery, and market-
basket analysis) attempts to find associations between entities based on transactions involving them.
6. Profiling (also known as behaviour description) attempts to characterize the typical behaviour of an
individual, group, or population.
7. Link prediction attempts to predict connections between data items, usually by suggesting that a link
should exist, and possibly also estimating the strength of the link.
8. Data reduction attempts to take a large set of data and replace it with a smaller set of data that contains
much of the important information in the larger set.
9. Causal modelling attempts to help us understand what events or actions actually influence others.
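Task 3 above (similarity matching) can be sketched in a few lines: find the individual closest to a query in feature space. The customer names, the two features (age, monthly spend), and all numbers are made up for illustration.

```python
import math

# Made-up customer data: name -> (age, monthly spend).
customers = {
    "alice": (34, 120.0),
    "bob":   (51, 80.0),
    "carol": (36, 112.0),
}

def most_similar(target, others):
    """Return the name of the individual closest to `target` by Euclidean distance."""
    return min(others, key=lambda name: math.dist(target, others[name]))

query = (35, 115.0)   # a new individual we want to match
print(most_similar(query, customers))
```

Real similarity matching adds feature scaling and domain-specific distance functions, but the core idea of ranking individuals by distance over known data is the same.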
Supervised vs. unsupervised methods. The key question: is there a specific target variable?
Consider two similar questions we might ask about a customer population.
o The first is: “Do our customers naturally fall into different groups?” Here no specific purpose or
target has been specified for the grouping. When there is no such target, the data mining
problem is referred to as unsupervised.
o Contrast this with a slightly different question: “Can we find groups of customers who have
particularly high likelihoods of cancelling their service soon after their contracts expire?” Here
there is a specific target defined: will a customer leave when her contract expires? In this case,
segmentation is being done for a specific reason: to take action based on likelihood of churn.
This is called a supervised data mining problem.
The difference between these questions is subtle but important. If a specific target can be provided, the problem
can be phrased as a supervised one. Supervised tasks require different techniques than unsupervised tasks do,
and the results often are much more useful. A supervised technique is given a specific purpose for the grouping—
predicting the target. Clustering, an unsupervised task, produces groupings based on similarities, but there is no
guarantee that these similarities are meaningful or will be useful for any particular purpose.
Technically, another condition must be met for supervised data mining: there must be data on the target. It is not
enough that the target information exist in principle; it must also exist in the data.
Classification, regression, and causal modelling are solved with supervised methods.
Similarity matching, link prediction, and data reduction could be either supervised or unsupervised.
Clustering, co-occurrence grouping, and profiling are solved with unsupervised methods.
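The unsupervised side of this distinction can be sketched with a tiny 1-D k-means: customers are grouped purely by similarity of monthly spend, with no target variable anywhere in the code. The spend figures and the two starting centers are illustrative assumptions.

```python
# Made-up monthly spend for six customers; note there is no "label" column.
spend = [10, 12, 11, 95, 100, 98]

def kmeans_1d(values, centers, iterations=10):
    """Assign each value to its nearest center, then recompute each center
    as the mean of its group; repeat for a fixed number of iterations."""
    for _ in range(iterations):
        groups = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(v - c))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) for g in groups.values() if g]
    return centers, groups

centers, groups = kmeans_1d(spend, centers=[0.0, 50.0])
print(sorted(round(c, 1) for c in centers))   # two spend-level clusters emerge
```

The groups found (low spenders around 11, high spenders around 98) may or may not be useful for any business purpose; that is exactly the caveat about clustering made above.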
Two main subclasses of supervised data mining, classification and regression, are distinguished by the type of
target.
Regression involves a numeric target while classification involves a categorical (often binary) target.
Consider these similar questions we might address with supervised data mining:
o “Will this customer purchase service S1 if given incentive I?”
This is a classification problem because it has a binary target (the customer either
purchases or does not).
o “Which service package (S1, S2, or none) will a customer likely purchase if given incentive I?”
This is also a classification problem, with a three-valued target.
o “How much will this customer use the service?”
This is a regression problem because it has a numeric target. The target variable is the
amount of usage (actual or predicted) per customer.
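The target-type distinction can be made concrete with the simplest possible baselines: for a categorical target, predict the most common class; for a numeric target, predict the mean. The customer data below are invented for illustration.

```python
from collections import Counter

# The same five customers, two different targets.
purchased = ["yes", "no", "yes", "yes", "no"]   # categorical target -> classification
usage     = [12.0, 3.5, 9.0, 14.5, 2.0]         # numeric target -> regression

# Classification baseline: predict the majority class.
majority_class = Counter(purchased).most_common(1)[0][0]

# Regression baseline: predict the mean of the numeric target.
mean_usage = sum(usage) / len(usage)

print(majority_class)
print(round(mean_usage, 1))
```

Any real classifier or regression model must at least beat these baselines, which is why the target's type (categorical vs. numeric) is the first thing to pin down.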
A vital part in the early stages of the data mining process is (i) to decide whether the line of attack will be
supervised or unsupervised, and (ii) if supervised, to produce a precise definition of a target variable. This
variable must be a specific quantity that will be the focus of the data mining (and for which we can obtain values
for some example data).
Unsupervised learning
- Training data provides “examples” – no specific “outcome”.
- The machine tries to find specific patterns in the data.
- Algorithms: clustering, anomaly detection, association discovery, and topic modelling.
- Because the model has no “outcome”, it cannot be evaluated against known answers.
Example questions and their training data:
- Are these customers similar? → customer profiles.
- Is this transaction unusual? → previous transactions.
- Are the products purchased together? → examples of previous purchases.
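The "is this transaction unusual?" question can be sketched as z-score anomaly detection over previous transactions. The transaction amounts and the 3-standard-deviation threshold are illustrative assumptions, not a recommended setting.

```python
import statistics

# Made-up amounts of previous transactions for one customer.
previous = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 18.5, 22.0]

mean = statistics.mean(previous)
stdev = statistics.stdev(previous)

def is_unusual(amount, threshold=3.0):
    """Flag a transaction whose z-score against past behaviour exceeds the threshold."""
    return abs(amount - mean) / stdev > threshold

print(is_unusual(21.0))   # a typical amount
print(is_unusual(95.0))   # far outside the usual range
```

Note that no transaction was ever labelled "fraud" here: the method only models what is normal, which is what makes it unsupervised.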
Supervised learning
- Training data has one feature that is the “outcome”.
- The goal is to build a model to predict the outcome (the machine learns to predict).
- The outcome data has a known value, model can be evaluated.
o Split the data into a training set and a test set.
o Fit a model on the training set and predict the test set.
o Compare the predictions to the known values.
- Algorithms: models/ensembles, logistic regression, and time series models.
Example questions and their training data:
- How much is this home worth? → previous home sales.
- Will this customer default on a loan? → previous loans that were paid or defaulted.
- How many customers will apply for a loan next month? → previous months of loan applications.
- Is this cancer malignant? → previous cases of benign/malignant cancer.
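The split/fit/evaluate steps listed above can be sketched end to end on the "how much is this home worth?" question, using a least-squares line fitted in pure Python. All sizes and prices are made up for illustration.

```python
# Made-up previous home sales: the price is the known "outcome".
sizes  = [50, 60, 70, 80, 90, 100]        # home size in m^2
prices = [150, 180, 205, 240, 265, 300]   # known sale prices

# 1. Split the data into a training set and a test set.
train_x, train_y = sizes[:4], prices[:4]
test_x,  test_y  = sizes[4:], prices[4:]

# 2. Fit simple linear regression on the training set (closed-form least squares).
n = len(train_x)
mx, my = sum(train_x) / n, sum(train_y) / n
slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) \
        / sum((x - mx) ** 2 for x in train_x)
intercept = my - slope * mx

# 3. Predict the test set and compare the predictions to the known values.
predictions = [slope * x + intercept for x in test_x]
mae = sum(abs(p - y) for p, y in zip(predictions, test_y)) / len(test_y)
print(f"Mean absolute error on the test set: {mae:.1f}")
```

Because the outcome is known for the held-out homes, the model can be scored (here with mean absolute error); that ability to evaluate is exactly what distinguishes supervised from unsupervised learning.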
There is another important distinction pertaining to mining data: the difference between (1) mining the data to
find patterns and build models, and (2) using the results of data mining.
The CRISP-DM (Cross-Industry Standard Process for Data Mining) data mining process comprises six iterative phases: business understanding, data understanding, data preparation, modelling, evaluation, and deployment.
Capital One case study
Since its founding as a credit card company in 1988, Capital One Financial Corp. has grown into a diversified bank
with more than 65 million customer accounts worldwide. It is not hard to see why Capital One is investing heavily
in digital technologies. It conducts over 80,000 big data experiments a year. Currently, 75% of customer
interactions with Capital One are digital, and this number is only expected to grow. In Q4 2013, Capital One was
one of the most visited websites, with 40 million unique online visitors.
Capital One has fundamentally altered traditional industry ways of working through digital technology
and has an unflinching focus on digital.