Summary

Practical 5.1: Advanced Data Analysis: full summary + explanations

65 views 10 purchases

Course
Advanced Data Analysis (2052FBDBMW)

Institution
Universiteit Antwerpen (UA)

PART 1 of Practical 5: Advanced Data Analysis: full summary + explanations, screenshots and answers to questions

[Show more]

Preview 2 out of 11 pages

View example

Uploaded on May 5, 2023
Number of pages 11
Written in 2021/2022
Type Summary

advanced data analysis
practical 5
r

Institution
Universiteit Antwerpen (UA)
Education
Biomedische Wetenschappen
Course
Advanced Data Analysis (2052FBDBMW)

Bi0med

Member since 5 year 494 documents sold

$4.88

Added

Add to cart

Add to wishlist

100% satisfaction guarantee
Immediately available after payment
Both online and in PDF
No strings attached

Supervised classification with Decision Trees
and Random Forests

1 The Breast Cancer dataset
This dataset contains features regarding images of malignant and benign
breast tumors. There are ten continuous features that describe the size and
shape of each tumor. These are a) radius (mean of distances from center to
points on the perimeter) b) texture (standard deviation of gray-scale values)
c) perimeter d) area e) smoothness (local variation in radius lengths) f)
compactness (perimeter
/ area - 1.0) g) concavity (severity of concave portions of the contour) h) con-
cave points (number of concave portions of the contour) i) symmetry j) fractal
dimension ("coastline approximation" - 1).

The goal of this dataset will be to use to provided features to predict if a
sample is malignant (M) or benign (B). The dataset has been split up into
a training data set and a test data set. We can read in the dataset with the
following command:
cancer.train<-read.csv(file="breast-cancer-
train.csv",header=TRUE,stringsAsFactors=T)
cancer.test<-read.csv(file="breast-cancer-
test.csv",header=TRUE,stringsAsFactors=T)
summary(cancer.train)

2 Decision tree
We will use the ’rpart’ R package to learn and apply our decision trees. Install
it from CRAN if you have not already done so. We can load in the library with
the standard command:
library(rpart) library(rpart.plot)
We then need to apply it to the breast cancer dataset. One of the standard
optimizations that is part of the rpart() function is to optimize the number
of branches to include in the decision tree. The more branches, the higher the
chance to overfit, but we need some branches to solve our classification problem.
It does this by running an internal cross-validation, where part of the training
data is held out to validate, to optimize this branch parameter.
tree <-
rpart(diagnosis~.,data=cancer.train,method="class")
#Overview of the optimization
printcp(tree)

1

, #CV optimization of branch
number plotcp(tree)

We needed to set our method to "class" for classification. Now that we have
an optimal decision tree for classification, we can visualize it by using the
following command:
rpart.plot(tree,main="Decision Tree for Cancer Dataset")
This gives a clear overview of the features that are being used in the decision
tree.

2

The benefits of buying summaries with Stuvia:

Guaranteed quality through customer reviews

Stuvia customers have reviewed more than 700,000 summaries. This how you know that you are buying the best documents.

Quick and easy check-out

You can quickly pay through credit card or Stuvia-credit for the summaries. There is no membership needed.

Focus on what matters

Your fellow students write the study notes themselves, which is why the documents are always reliable and up-to-date. This ensures you quickly get to the core!

Frequently asked questions

What do I get when I buy this document?

You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.

Satisfaction guarantee: how does it work?

Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller Bi0med. Stuvia facilitates payment to the seller.

Will I be stuck with a subscription?

No, you only buy these notes for $4.88. You're not tied to anything after your purchase.

Can Stuvia be trusted?

4.6 stars on Google & Trustpilot (+1000 reviews)

79223 documents were sold in the last 30 days

Founded in 2010, the go-to place to buy study notes for 14 years now

Start selling

Popular Universities in the United States

Popular books

Find notes and summaries for these qualifications

Summary

Practical 5.1: Advanced Data Analysis: full summary + explanations

Document information

Subjects

Written for

Seller

Reviews received

Content preview