ISYE 6501 Introduction to Analytic Modelling Homework 2 Georgia Institute of Technology.

  • August 28, 2024
ISYE6501 Homework 2


Clear environment

rm(list = ls())

Question 3.1
Using the same data set (credit_card_data.txt or credit_card_data-headers.txt) as in Question 2.2, use the
ksvm or kknn function to find a good classifier: (a) using cross-validation (do this for the k-nearest-neighbors
model; SVM is optional); and (b) splitting the data into training, validation, and test data sets (pick either
KNN or SVM; the other is optional).
3.1 (a) Answer - Load the kernlab and kknn libraries (kknn provides the kknn function), load MASS (which provides qda), and set the seed value.

library(kernlab)
library(kknn)
library(MASS)
set.seed(100)



Read the data from the file provided. As an optional check, print the first rows to make
sure the data was read correctly.

creditcarddata <- read.table("credit_card_data.txt", stringsAsFactors = FALSE, header = FALSE)
head(creditcarddata)


## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 1 1 30.83 0.000 1.25 1 0 1 1 202 0 1
## 2 0 58.67 4.460 3.04 1 0 6 1 43 560 1
## 3 0 24.50 0.500 1.50 1 1 0 1 280 824 1
## 4 1 27.83 1.540 3.75 1 0 5 0 100 3 1
## 5 1 20.17 5.625 1.71 1 1 0 1 120 0 1
## 6 1 32.08 4.000 2.50 1 1 0 0 360 0 1

I use k-fold cross-validation. The data set is split into k folds; in each round the model is trained on
k-1 folds (the training data) and tested on the remaining fold (the test data), which it has not seen.
K-fold cross-validation therefore estimates how well the model performs on new data. I take k = 10, a
commonly used value: the data are divided into 10 folds, and in each round 9 folds are used for training
and 1 for testing. The procedure is repeated for the other 9 combinations, with a different fold held out
each time, so every observation is used for both training and testing. This produces 10 fitted models, one
per fold. I then look at which model has the lowest error and at the mean error across all 10.
First, assign each row to one of 10 folds. cut splits the row sequence into 10 equal-width intervals, so
each fold holds about 10% of the rows and the other 90% are used for training. I use sapply rather than an
explicit loop because it is concise and returns the per-fold error rates as a vector.

K <- 10
creditcarddata_folds <- cut(seq(1, nrow(creditcarddata)), breaks = K, labels = FALSE)
head(creditcarddata_folds)


## [1] 1 1 1 1 1 1

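# For each fold i: hold out fold i as test data, fit on the other K - 1 folds,
# and return the misclassification rate on the held-out fold.
cv.model <- sapply(1:K, FUN = function(i){
  # Row indices that belong to fold i
  creditcard_vectorID <- which(creditcarddata_folds == i, arr.ind = TRUE)
  creditcard_test <- creditcarddata[creditcard_vectorID, ]
  creditcard_train <- creditcarddata[-creditcard_vectorID, ]
  # Fit quadratic discriminant analysis (qda from MASS) on the training folds
  model <- qda(V11~V1+V2+V3+V4+V5+V6+V7+V8+V9+V10, data = creditcard_train)
  model_pred <- predict(model, creditcard_test)
  # Share of held-out rows whose predicted class differs from V11
  cv.model_mean <- mean(model_pred$class != creditcard_test$V11)
  return(cv.model_mean)
})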
cv.model

## [1] 0.50000000 0.40000000 0.41538462 0.39393939 0.10769231 0.07692308
## [7] 0.01515152 0.26153846 0.30769231 0.03030303

mean(cv.model)


## [1] 0.2508625

I get one misclassification rate for each of the 10 folds. With the above code, the lowest error
(0.01515152) is in the 7th fold, while the mean error across all folds is 0.2508625.
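
One possible contributor to the wide spread in per-fold errors is that cut(seq(1, nrow(creditcarddata)), ...) assigns contiguous blocks of rows to each fold, so if the file happens to be ordered by class, individual folds can be dominated by one class. Whether the real file is ordered this way is an assumption I have not checked. The following is a minimal sketch of shuffled fold assignment, not the method used above. It runs on a small synthetic data frame that is deliberately sorted by class; toy, contiguous_folds and shuffled_folds are illustrative names.

```r
# Synthetic stand-in for the real data: 100 rows, deliberately sorted by class.
set.seed(100)
K <- 10
toy <- data.frame(x = rnorm(100), y = rep(c(1, 0), each = 50))

# Contiguous folds, as produced by cut() on the row sequence.
contiguous_folds <- cut(seq(1, nrow(toy)), breaks = K, labels = FALSE)

# Shuffled folds: the same fold sizes, but rows are assigned at random.
shuffled_folds <- sample(contiguous_folds)

# Share of class 1 in each fold under both schemes.
contiguous_share <- tapply(toy$y, contiguous_folds, mean)
shuffled_share <- tapply(toy$y, shuffled_folds, mean)

print(round(contiguous_share, 2))
print(round(shuffled_share, 2))
```

On this sorted toy data every contiguous fold contains a single class, whereas the shuffled folds mix both classes. Using shuffled_folds in place of creditcarddata_folds in the sapply call would leave the rest of the cross-validation code unchanged.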




3.1 (b) Answer -
My approach is to first divide the data into training, validation, and test sets. I put 70% of the data
in the training set and 15% each in the validation and test sets. This gives 457 observations in the
training set, 98 in the validation set, and 99 in the test set.

creditcarddata_train <- creditcarddata[sample(1:nrow(creditcarddata),as.integer(0.7*nrow(creditcardda
creditcarddata_train


## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 503 0 25.33 2.085 2.750 1 1 0 0 360 1 0
## 358 0 20.83 0.500 1.000 0 1 0 1 260 0 0
## 624 0 15.75 0.375 1.000 0 1 0 1 120 18 0
## 470 0 18.83 4.415 3.000 1 1 0 1 240 0 1
## 516 1 36.33 3.790 1.165 1 1 0 0 200 0 0
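
The 70/15/15 split described above can be sketched as follows. This is a minimal illustration on a synthetic data frame with 654 rows (the total implied by the counts 457 + 98 + 99), not a reconstruction of the truncated line above; toy, shuffled, n_train, n_valid and the *_idx names are illustrative.

```r
# Synthetic stand-in with the same number of rows as implied by the counts above.
set.seed(100)
toy <- data.frame(x = rnorm(654), y = rbinom(654, 1, 0.5))
n <- nrow(toy)

# Shuffle the row indices once, then cut the shuffled vector into three parts
# so that no row can land in more than one set.
shuffled <- sample(n)
n_train <- as.integer(0.70 * n)   # 457 when n = 654
n_valid <- as.integer(0.15 * n)   # 98 when n = 654

train_idx <- shuffled[1:n_train]
valid_idx <- shuffled[(n_train + 1):(n_train + n_valid)]
test_idx <- shuffled[(n_train + n_valid + 1):n]   # all remaining rows

toy_train <- toy[train_idx, ]
toy_valid <- toy[valid_idx, ]
toy_test <- toy[test_idx, ]

# Number of rows in each set.
split_sizes <- c(train = nrow(toy_train), valid = nrow(toy_valid), test = nrow(toy_test))
print(split_sizes)
```

With 654 rows this gives 457, 98 and 99 rows, matching the counts stated above. Drawing all three sets from a single shuffled index vector is one way to guarantee they do not overlap.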
