Summary Analysis of Customer Data (880655-M-6)

Course grade: 9.0. Extensive summary for the course Analysis of Customer Data. The summary contains the content of all lectures and tutorials, including additional notes and explanations. The course is taught by dr. B. Cule and dr. G. Napoles, as part of the MSc Data Science & Society at Tilburg Un...


  • October 3, 2023
  • January 19, 2024
  • 65
  • 2023/2024
  • Summary
Analysis of Customer Data
MSc Data Science & Society
Tilburg University




Module 1. Introduction and Frequent Pattern Mining

1. Introduction
“People who buy diapers also buy beer”. How did we get there? How far have we come since then?

Types of Customer Data
- Purchase data
- Click logs: records of user interactions on a website, where each click or action
taken by a person during their visit is documented.
- Trajectories (GPS data)
- Opinions (ratings, reviews)
- Bank transactions
- Demographic data

Companies can capitalize on customer data in various ways, directly or indirectly. One approach
involves the collection of customer data which is then sold to other companies. Alternatively,
companies can indirectly benefit by tracking customer data to enhance their website's
performance and consequently boost sales. For instance, analyzing customer patterns through
click logs can aid in optimizing profits and improving overall services.

What Can We Do with Customer Data?
- Classification (supervised): e.g., should the bank give a loan to a particular customer
based on historical data on similar customers?
- Clustering (unsupervised): e.g., find sub-groups of customers to organize targeted
marketing campaigns.
- Recommender Systems: e.g., what product might the customer want to buy next?
- Next Event Prediction: e.g., pre-fetching web pages in expectation of a click. This entails
analyzing sequential data such as web clicks, for example to forecast whether a customer will
make a purchase. Certain websites even customize prices based on insights derived from
customer behavior patterns.

The Focus of this Course: Pattern Mining (Building Block of Other Applications)
Patterns are interesting:
- People who buy diapers also buy beer (increase profit, complementary sales)
- People who like “Lord of the Rings” also like “Harry Potter” (recommender system)
- People read domestic news before international news

Patterns are also useful in many other applications:
- Classify/cluster customers based on common patterns in their data
- Recommend items to customers based on patterns in their purchase behavior (and
patterns in similar customers’ purchase behavior)
- Find anomalies in bank transactions (potential fraud): This process differs slightly from
classification. Instances of "abnormal" behavior (fraud) are far less frequent than
instances of "normal" behavior (no fraud), so there are few examples from which to learn
what fraud looks like. Instead, we learn the patterns that occur in "normal" behavior.
Instances in which these patterns occur can be classified as "normal"; instances in which
the expected patterns do not occur can be flagged as "abnormal".
- Place beer close to diapers on supermarket shelves.
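
The anomaly-detection idea from the list above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the patterns in `normal_patterns` and the sessions in `sessions` are invented, and the rule "flag a session that contains none of the normal patterns" is one simple reading of the idea, not a method prescribed by the course.

```python
# Hypothetical patterns, assumed to have been mined from behavior labelled "normal".
normal_patterns = [
    {"login", "view_balance"},
    {"login", "transfer", "logout"},
]

# Hypothetical new sessions to screen (each session is a set of actions).
sessions = {
    "s1": {"login", "view_balance", "logout"},
    "s2": {"login", "transfer", "confirm", "logout"},
    "s3": {"transfer", "change_address"},
}

def is_anomalous(session, patterns):
    """Flag a session that contains none of the known normal patterns."""
    return not any(pattern <= session for pattern in patterns)

flagged = [sid for sid, actions in sessions.items()
           if is_anomalous(actions, normal_patterns)]
print(flagged)
```

Here `s1` and `s2` each contain at least one normal pattern, so only `s3` is flagged. A practical system would likely score how many patterns match rather than use an all-or-nothing rule.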


2. Frequent Itemsets & Association Rules
Association Rule Mining
- Agrawal et al. introduced the model in 1993. It has since become a significant focus of
study in the database and data mining community.
- The model operates on categorical data only; there is no suitable algorithm for
numerical data. Note that the products are items, never numbers.
- Its initial application was in Market Basket Analysis, seeking relationships between items
purchased by customers.
- For instance, a rule like {Bread} → {Milk} [sup = 5%, conf = 100%] indicates that 5% of
transactions contain both bread and milk, and that every transaction containing bread also
contains milk. Note that this is a one-way relationship. This is not the same
as saying “people who buy milk, also buy bread”.
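
To see why the rule is one-way, the numbers in the example can be reproduced on a small made-up dataset. The 20 baskets below are invented purely for illustration (they are not course data); they are chosen so that {Bread} → {Milk} has sup = 5% and conf = 100%, while the reverse rule is much weaker.

```python
# Hypothetical toy data: 20 baskets, invented for illustration only.
transactions = (
    [{"bread", "milk"}]      # 1 basket with both bread and milk
    + [{"milk"}] * 9         # 9 baskets with milk but no bread
    + [{"eggs"}] * 10        # 10 baskets with neither
)

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

n = len(transactions)
both = count({"bread", "milk"}, transactions)

support = both / n                                           # 1 / 20
conf_bread_to_milk = both / count({"bread"}, transactions)   # 1 / 1
conf_milk_to_bread = both / count({"milk"}, transactions)    # 1 / 10

print(support, conf_bread_to_milk, conf_milk_to_bread)
```

Both rules share the same support, because they involve the same combined itemset, but their confidences differ: 100% for {Bread} → {Milk} versus 10% for {Milk} → {Bread}.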

The Model: Data
- 𝐼 = {𝑖₁, 𝑖₂, …, 𝑖ₘ} represents the set of items: all possible items that we encounter in the
dataset.
- A transaction 𝑡 is a set of items, where 𝑡 is a subset of the set 𝐼 (𝑡 ⊆ 𝐼).
- The transaction database 𝑇 consists of a collection of transactions 𝑇 = {𝑡₁, 𝑡₂, …, 𝑡ₙ}.

Transaction Data: Supermarket Data
- Market basket transactions are represented as 𝑡₁, 𝑡₂, ..., 𝑡ₙ, where each transaction
corresponds to a basket with a collection of items purchased.
o 𝑡₁: {bread, cheese, milk}
o 𝑡₂: {apple, eggs, salt, yoghurt}
o ...
o 𝑡ₙ: {biscuits, eggs, milk}
- Concepts:
o An item refers to an individual product or article found in a basket, such as bread,
cheese, milk, apple, eggs, salt, yoghurt, and biscuits.
o 𝑰 represents the set of all items available for sale in the store, including bread,
cheese, milk, apple, eggs, salt, yoghurt, and biscuits.
o A transaction refers to the items purchased in a basket; it may have a transaction
ID (TID)
o A transactional dataset is a set of all the transactions recorded, representing the
collective data of items purchased by customers.
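
A minimal sketch of how this data model could be held in Python, assuming each basket is stored as a `set` and the transaction ID is used as a dictionary key. Only the three baskets written out above are included; the elided baskets are left out. Deriving 𝐼 as the union of all baskets is an assumption that only works if every item for sale is bought at least once.

```python
# Transactional dataset: TID -> basket. Only the baskets shown in the notes.
T = {
    "t1": {"bread", "cheese", "milk"},
    "t2": {"apple", "eggs", "salt", "yoghurt"},
    "tn": {"biscuits", "eggs", "milk"},
}

# I: the set of all items, derived here as the union of every basket
# (an assumption; in general I is the store's full catalogue).
I = set().union(*T.values())

# Every transaction is a subset of I by construction.
assert all(t <= I for t in T.values())

print(sorted(I))
```

Using sets matches the model directly: a basket records which items were bought, not how many or in which order.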

Transaction Data: A set of Documents
The same representation used for market basket transactions also applies to text: the data
consists of a set of documents, where each document is treated as a "bag" of keywords (the
items). Typically, we would remove stop-words, such as ‘the’, ‘a’, etc. Examples:
- doc1: {Student, Teach, School}
- doc2: {Student, School}
- doc3: {Teach, School, City, Game}
- doc4: {Baseball, Basketball}
- doc5: {Basketball, Player, Spectator}
- doc6: {Baseball, Coach, Game, Team}
- doc7: {Basketball, Team, City, Game}

The Model: Rules
The model used for mining association rules is based on the concept of "itemsets" and
"association rules":
- A transaction 𝑡 contains 𝑋, a set of items (itemset) in 𝐼, if 𝑋 ⊆ 𝑡.
o {Coach, Game} is an itemset that appears in document 6.
- Association Rule: An association rule is an implication of the form X ⇒ Y, where X and Y
are subsets of the set of items (I), and X and Y have no intersection (items in common).
o 𝑋 ⇒ 𝑌, where 𝑋, 𝑌 ⊂ 𝐼, and 𝑋 ∩ 𝑌 = ∅
- Itemset: Again, an itemset is a set of items. For example, {milk, bread, cereal} is an itemset,
and a single item like {cheese} is an itemset of size 1.
- k-Itemset: A k-itemset is an itemset with k items. For instance, {milk, bread, cereal} is a
3-itemset.
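
The definitions above map directly onto Python set operations. The sketch below uses the seven example documents as transactions; the helper name `contains` is chosen here for illustration. It checks which documents contain the itemset {Coach, Game} and lists all 2-itemsets that can be formed from doc1.

```python
from itertools import combinations

docs = {
    "doc1": {"Student", "Teach", "School"},
    "doc2": {"Student", "School"},
    "doc3": {"Teach", "School", "City", "Game"},
    "doc4": {"Baseball", "Basketball"},
    "doc5": {"Basketball", "Player", "Spectator"},
    "doc6": {"Baseball", "Coach", "Game", "Team"},
    "doc7": {"Basketball", "Team", "City", "Game"},
}

def contains(t, X):
    """A transaction t contains itemset X if X is a subset of t."""
    return X <= t

X = {"Coach", "Game"}
matching = [name for name, t in docs.items() if contains(t, X)]
print(matching)

# All 2-itemsets (k = 2) that are contained in doc1.
two_itemsets = [set(c) for c in combinations(sorted(docs["doc1"]), 2)]
print(two_itemsets)
```

Only doc6 contains both Coach and Game. A transaction with 3 items contains 3 different 2-itemsets, which hints at how quickly the number of candidate itemsets grows with transaction size.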

Rule Strength Measures
The strength of association rules is measured using two metrics:
1. Support: Support measures the percentage of transactions containing both X and Y, that is,
every item of the combined itemset X ∪ Y. It can be expressed as the probability of X and Y
occurring together in a transaction. For instance, a rule with sup = 0.5 means that 50% of
transactions contain both X and Y.
o sup = Pr(X ∪ Y)
2. Confidence: Confidence indicates the percentage of transactions containing X that also
contain Y. It represents the conditional probability of Y given X. A rule
with conf = 0.8 means that 80% of transactions containing X also contain Y. In other
words, it is the probability of Y in transactions that already contain X.
o conf = Pr(Y | X)

Support and Confidence
Support count refers to the number of occurrences of an itemset X in a dataset T. In other words,
it counts how many transactions in the dataset contain the specific itemset X, and is written
X.count. Assuming the dataset T contains n transactions, the support of the rule X ⇒ Y is
calculated as the ratio of the count of transactions containing the combined itemset (X ∪ Y) to
the total number of transactions (n).

On the other hand, confidence measures the strength of an association rule X ⇒ Y. It is calculated
as the ratio of the count of transactions containing the combined itemset (X ∪ Y) to the count of
transactions containing the itemset X (X.count). This ratio represents the likelihood that
when X occurs, Y will also occur in a transaction.

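\[
\text{Support} = \frac{\text{Number of transactions containing } (X \cup Y)}{\text{Total number of transactions in the dataset } T} = \frac{(X \cup Y).\text{count}}{n}
\]

\[
\text{Confidence} = \frac{\text{Number of transactions containing } (X \cup Y)}{\text{Number of transactions containing } X} = \frac{(X \cup Y).\text{count}}{X.\text{count}}
\]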
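
The two formulas can be written as small Python functions and applied to the seven example documents from earlier in this module. The function names are chosen here for illustration, and returning 0.0 when X never occurs is an assumption made to avoid division by zero, since the formula itself is undefined in that case.

```python
docs = [
    {"Student", "Teach", "School"},         # doc1
    {"Student", "School"},                  # doc2
    {"Teach", "School", "City", "Game"},    # doc3
    {"Baseball", "Basketball"},             # doc4
    {"Basketball", "Player", "Spectator"},  # doc5
    {"Baseball", "Coach", "Game", "Team"},  # doc6
    {"Basketball", "Team", "City", "Game"}, # doc7
]

def support_count(X, T):
    """X.count: number of transactions in T that contain itemset X."""
    return sum(1 for t in T if X <= t)

def support(X, Y, T):
    """Support of X => Y: (X u Y).count / n."""
    return support_count(X | Y, T) / len(T)

def confidence(X, Y, T):
    """Confidence of X => Y: (X u Y).count / X.count."""
    x_count = support_count(X, T)
    if x_count == 0:
        return 0.0  # assumption: treat a rule whose X never occurs as confidence 0
    return support_count(X | Y, T) / x_count

X, Y = {"Student"}, {"School"}
print(support(X, Y, docs))     # 2 of 7 documents contain both
print(confidence(X, Y, docs))  # both documents with Student also contain School
print(confidence(Y, X, docs))  # 2 of the 3 documents with School contain Student
```

For {Student} ⇒ {School}, support is 2/7 and confidence is 2/2 = 1. Reversing the rule keeps the support but lowers the confidence to 2/3, because School also appears in doc3 without Student.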



