Introduction to Data Mining 2e (Global Edition), Pang-Ning Tan,
Michael Steinbach, Vipin Kumar (Test Bank, All Chapters)
1
Introduction
1. [Fall 2008]
For each data set given below, give specific examples of classification,
clustering, association rule mining, and anomaly detection tasks that
can be performed on the data. For each task, state how the data matrix
should be constructed (i.e., specify the rows and columns of the matrix).
(a) Ambulatory Medical Care data [1], which contains the demographic
and medical visit information for each patient (e.g., gender, age,
duration of visit, physician’s diagnosis, symptoms, medication, etc.).
Answer:
Classification
Task: Diagnose whether a patient has a disease.
Row: Patient
Column: Patient’s demographic and hospital visit information (e.g., symptoms), along with
a class attribute that indicates whether the patient has the disease.
Clustering
Task: Find groups of patients with similar medical conditions
Row: A patient visit
Column: List of medical conditions of each patient
Association rule mining
Task: Identify the symptoms and medical conditions that frequently co-occur
Row: A patient visit
Column: List of symptoms and diagnosed medical conditions of the patient
Anomaly detection
Task: Identify healthy-looking patients with rare medical disorders
Row: A patient visit
Column: List of demographic attributes, symptoms, and medical test results of the patient
[1] See, for example, the National Hospital Ambulatory Medical Care Survey: http://www.cdc.gov/nchs/about/major/ahcd/ahcd1.htm
(b) Stock market data, which include the prices and volumes of various
stocks on different trading days.
Answer:
Classification
Task: Predict whether the stock price will go up or down the next trading day
Row: A trading day
Column: Trading volume and closing price of the stock over the previous 5 days, plus a class
attribute that indicates whether the stock went up or down
Clustering
Task: Identify groups of stocks with similar price fluctuations
Row: A company’s stock
Column: Changes in the daily closing price of the stock over the past ten years
Association rule mining
Task: Identify stocks that frequently move together (e.g., {Google-Up, Yahoo-Up})
Row: A trading day
Column: List of all stock-up and stock-down events on the given day.
Anomaly detection
Task: Identify unusual trading days for a given stock (e.g., unusually high volume)
Row: A trading day
Column: Trading volume, change in daily stock price (daily high minus daily low), and average
price change of its competitors' stocks
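The classification matrix described for the stock data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_classification_matrix`, the toy price and volume lists, and the choice to label day t by comparing its close with the close of day t-1 are all hypothetical and not part of the original answer.

```python
def build_classification_matrix(closes, volumes, lags=5):
    """Build one row per trading day for a single stock.

    Each row for day t holds the closing prices and volumes of the
    previous `lags` days. The class label is "up" if the close on day t
    exceeds the close on day t-1, otherwise "down".
    """
    rows, labels = [], []
    # The first `lags` days have no complete history, so they are skipped.
    for t in range(lags, len(closes)):
        features = closes[t - lags:t] + volumes[t - lags:t]
        rows.append(features)
        labels.append("up" if closes[t] > closes[t - 1] else "down")
    return rows, labels


# Toy data (made up for illustration): 7 trading days of one stock.
closes = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.4]
volumes = [100, 120, 90, 150, 130, 110, 160]

rows, labels = build_classification_matrix(closes, volumes)
for row, label in zip(rows, labels):
    print(row, label)
```

With 7 days of history and 5 lagged days, only the last 2 days produce rows, each with 10 feature columns plus the class attribute.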
(c) Database of Major League Baseball (MLB).
Answer:
Classification
Task: Predict the winner of a game between two MLB teams.
Row: A game.
Column: Statistics of the home and visiting teams over their past 10 games
(e.g., average winning percentage and hitting percentage of their players), along with
a class attribute that indicates which team won
Clustering
Task: Identify groups of players with similar statistics
Row: A player
Column: Statistics of the player
Association rule mining
Task: Identify interesting player statistics (e.g., 40% of right-handed players have a batting
percentage below 20% when facing left-handed pitchers)
Row: A player
Column: Discretized statistics of the player
Anomaly detection
Task: Identify players who performed considerably better than expected in a given season
Row: A (player, season) pair, e.g., (player1, 2007)
Column: Ratio statistics of a player (e.g., ratio of average batting percentage in 2007 to
career average batting percentage)
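The ratio statistic in the anomaly detection answer can be illustrated with a short Python sketch. The player names, batting values, and the 1.25 threshold are invented for illustration, and the career average is taken here as a simple mean of the seasonal values, which is an assumption (a real career average would weight seasons by playing time).

```python
def season_to_career_ratios(batting):
    """Map each (player, season) pair to season value / career mean.

    `batting` maps player -> {season: batting percentage}.
    """
    ratios = {}
    for player, seasons in batting.items():
        career_avg = sum(seasons.values()) / len(seasons)
        for season, value in seasons.items():
            ratios[(player, season)] = value / career_avg
    return ratios


def flag_unusual(ratios, threshold=1.25):
    """Return the (player, season) pairs whose ratio exceeds the threshold."""
    return sorted(key for key, ratio in ratios.items() if ratio > threshold)


# Hypothetical data: batting percentage per season.
batting = {
    "player1": {2005: 0.250, 2006: 0.250, 2007: 0.400},
    "player2": {2005: 0.300, 2006: 0.310, 2007: 0.290},
}

ratios = season_to_career_ratios(batting)
flagged = flag_unusual(ratios)
print(flagged)
```

Each key of `ratios` corresponds to one row of the data matrix, and the ratio is one of its columns; a fixed threshold is only the simplest possible way to flag a row as unusual.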
2
Data
2.1 Types of Attributes
1. Classify the following attributes as binary, discrete, or continuous. Also
classify them as qualitative (nominal or ordinal) or quantitative (interval
or ratio). Some cases may have more than one interpretation, so briefly
indicate your reasoning if you think there may be some ambiguity.
(a) Number of courses registered by a student in a given semester.
Answer: Discrete, quantitative, ratio.
(b) Speed of a car (in miles per hour).
Answer: Continuous, quantitative, ratio.
(c) Decibel as a measure of sound intensity.
Answer: Continuous, quantitative, interval or ratio. It is actually
a log-ratio type (which is somewhere between interval and ratio).
(d) Hurricane intensity according to the Saffir-Simpson Hurricane Scale.
Answer: Discrete, qualitative, ordinal.
(e) Social security number.
Answer: Discrete, qualitative, nominal.
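For item (c), a small numeric sketch may help show why the decibel sits between interval and ratio. It uses the standard definition dB = 10 * log10(I / I0), where I0 is a reference intensity; the function name `decibels` and the sample ratios are chosen only for illustration.

```python
import math


def decibels(intensity_ratio):
    """Convert an intensity ratio I / I0 to decibels: 10 * log10(I / I0)."""
    return 10 * math.log10(intensity_ratio)


reference = decibels(1)   # intensity equal to the reference intensity
quiet = decibels(10)      # intensity 10 times the reference
loud = decibels(100)      # intensity 100 times the reference

print(reference, quiet, loud)
# The decibel values differ by a factor of 2, but the underlying
# intensities differ by a factor of 10.
print(loud / quiet, 100 / 10)
```

A value of 0 dB means the intensity equals the reference, not that there is no sound, so ratios of decibel values are not ratios of intensities. Equal differences in decibels do correspond to equal intensity ratios, which is the log-ratio behavior noted in the answer.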
2. Classify the following attributes as:
• discrete or continuous.
• qualitative or quantitative.
• nominal, ordinal, interval, or ratio.
Some cases may have more than one interpretation, so briefly indicate
your reasoning if you think there may be some ambiguity.
(a) Julian Date, which is the number of days elapsed since 12 noon
Greenwich Mean Time of January 1, 4713 BC.
Answer: Continuous, quantitative, interval
(b) Movie ratings provided by users (1-star, 2-star, 3-star, or 4-star).
Answer: Discrete, qualitative, ordinal
(c) Mood level of a blogger (cheerful, calm, relaxed, bored, sad, angry
or frustrated).
Answer: Discrete, qualitative, nominal
(d) Average number of hours a user spent on the Internet in a week.
Answer: Continuous, quantitative, ratio
(e) IP address of a machine.
Answer: Discrete, qualitative, nominal
(f) Richter scale (in terms of energy release during an earthquake).
Answer: Continuous, qualitative, ordinal
In terms of energy release, the difference between 0.0 and 1.0 is not
the same as the difference between 1.0 and 2.0. Ordinal attributes are
qualitative, yet they can be continuous.
(g) Salary above the median salary of all employees in an organization.
Answer: Continuous, quantitative, interval
(h) Undergraduate level (freshman, sophomore, junior, and senior) for
measuring years in college.
Answer: Discrete, qualitative, ordinal
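For item (f), the claim about unequal energy differences can be checked numerically. The sketch below assumes the commonly cited approximation that radiated energy grows in proportion to 10^(1.5 * M) for magnitude M; that relation and the function name `relative_energy` are assumptions introduced here, not part of the original answer.

```python
def relative_energy(magnitude):
    """Energy relative to a magnitude-0 event, assuming E ~ 10 ** (1.5 * M)."""
    return 10 ** (1.5 * magnitude)


# Energy gaps for two equal one-unit steps in magnitude.
gap_0_to_1 = relative_energy(1.0) - relative_energy(0.0)
gap_1_to_2 = relative_energy(2.0) - relative_energy(1.0)

print(gap_0_to_1, gap_1_to_2)
# Under this assumption the second gap is about 31.6 times the first.
print(gap_1_to_2 / gap_0_to_1)
```

Because equal steps in magnitude map to very different energy differences, magnitude values preserve order with respect to energy but not differences, which is consistent with the ordinal classification given in the answer.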
3. For each attribute given, classify its type as:
• discrete or continuous AND
• qualitative or quantitative AND
• nominal, ordinal, interval, or ratio
Indicate your reasoning if you think there may be some ambiguity in
some cases.
Example: Age in years.
Answer: Discrete, quantitative, ratio.