Data Analytics (ITNPBD6) Class Notes

Handwritten notes for the Data Analytics (ITNPBD6) course at the University of Stirling. I obtained a high 1st in the module. The notes cover all the basic Machine Learning principles and techniques. The outline is as follows: Chapter 1: Introduction to Data Analytics Chapter 2: Model ...

  • July 12, 2021
  • 74
  • 2020/2021
ITNBD6




DATA ANALYTICS

CHAPTER 1

INTRODUCTION TO DATA ANALYTICS

Objectives:

• Describe CRISP-DM and how it can be applied to real-world problems
• Recognise the differences between variable types
• Discuss the differences between continuous and discrete distributions
• Identify the need for data cleaning
• Load a dataset and use visualisations to clean the data in both Orange and Python


1.1 Data Analysis
1.1.1 Model
In data analysis, the approach is driven by learning something
about data that would have been hard or even impossible to write
computer code for by hand.




The knowledge learnt is then embedded in what is called a model,
a general framework capable of performing a particular
task. Typically, the model will take in data points and output
predictions or estimates. It has a number of parameters that
are determined as part of the learning process and is a
representation of what has been learned about a data set.




The functionality of the model is determined by the data and
not by pre-programmed rules.




• Data mining: process of learning patterns, making predictions and
  building the model.

• Hyperparameters: settings that control how the model learns
  and operates.

• Learning / training: process by which a model's parameters are
  determined.

• Inference: process of providing previously unseen data to a
  trained model and making predictions or estimates
  about them.
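
The terms above can be illustrated with a minimal Python sketch. It assumes a one-parameter model y = w * x fitted by gradient descent; the data values, the learning rate and the number of epochs are invented for illustration and are not taken from the course material.

```python
# Hypothetical toy data: y is exactly 2 * x (values invented for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Hyperparameters: chosen by us, they control how learning proceeds.
LEARNING_RATE = 0.01
EPOCHS = 500


def train(xs, ys, learning_rate, epochs):
    """Learning / training: determine the parameter w of the model y = w * x."""
    w = 0.0  # parameter: its final value comes from the data, not from us
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


def predict(w, x):
    """Inference: apply the trained model to a previously unseen input."""
    return w * x


w = train(xs, ys, LEARNING_RATE, EPOCHS)
print(f"learned parameter w = {w:.3f}")
print(f"prediction for x = 10: {predict(w, 10.0):.2f}")
```

Changing LEARNING_RATE or EPOCHS changes how the model learns, while w is the only value that is actually learned from the data.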

1.1.2 Data
Data is the raw material used for machine learning, consisting
of a set of variables. Each variable can take a range of values
known as its domain.




Example: in Water volume = (?) × minutes, water volume and minutes
are variables, and the unknown (?) is a parameter.



The data in question is a snapshot of the real world, and data
mining assumes that whatever produced the data will continue to
produce it in much the same way in the future.




We might encounter problems with this approach, as the data
we're provided with might:

• have errors
• be incorrect
• be missing parts
• be insufficient in quantity
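
Two of the problems listed above (missing parts and erroneous values) can be checked for mechanically. The sketch below is only one possible approach: the records, the variable names and the plausible ranges are all hypothetical.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "height_cm": 172.0},
    {"age": None, "height_cm": 165.5},   # missing part
    {"age": 29, "height_cm": 1720.0},    # likely a data-entry error
    {"age": 41, "height_cm": 180.2},
]

# Assumed plausible range for each variable.
valid_ranges = {"age": (0, 120), "height_cm": (50.0, 250.0)}


def find_problems(rows, valid_ranges):
    """Return a list of (row_index, variable, issue) tuples."""
    problems = []
    for i, row in enumerate(rows):
        for name, (low, high) in valid_ranges.items():
            value = row.get(name)
            if value is None:
                problems.append((i, name, "missing"))
            elif not low <= value <= high:
                problems.append((i, name, "out of range"))
    return problems


problems = find_problems(rows, valid_ranges)
for problem in problems:
    print(problem)

# Keep only the rows with no detected problem.
bad_rows = {i for i, _, _ in problems}
clean_rows = [row for i, row in enumerate(rows) if i not in bad_rows]
print(f"{len(clean_rows)} of {len(rows)} rows kept")
```

Dropping rows is the bluntest option; with little data it can make the "insufficient in quantity" problem worse, so filling in or correcting values is sometimes preferable.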



A collection of data, known as a data set, contains a set of
values for a number of variables. It is often represented in
tabular format in which one row is a single data point (or
instance) and is made up of a value for each of the variables
in the table. A column of the table corresponds to a single
variable.
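
As a small illustration of the tabular layout, the sketch below stores a hypothetical data set as a list of rows and pulls out one column. The variable names and values are invented, and the helper `column` is not part of any library.

```python
# Hypothetical data set: each dict is one row (a data point / instance).
data_set = [
    {"colour": "red", "weight_g": 150},
    {"colour": "green", "weight_g": 170},
    {"colour": "red", "weight_g": 140},
]


def column(data_set, variable):
    """Return all values of one variable, i.e. one column of the table."""
    return [row[variable] for row in data_set]


first_instance = data_set[0]
weights = column(data_set, "weight_g")
# The observed values are a subset of the variable's domain.
observed_colours = sorted(set(column(data_set, "colour")))

print(first_instance)
print(weights)
print(observed_colours)
```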




1.1.3 Supervised vs. Unsupervised Learning
In supervised learning, the data the model is trying to learn
from is marked with the correct values, and it can be used to
test the quality of a model.




It involves data that describes both the inputs and outputs
to the system and requires a mapping to be learned from
the inputs to the outputs.




In unsupervised learning there is no existing set of clusters to
compare against.

It involves only the inputs and requires the algorithm to
organize and characterize the data in some way.

1.1.4 Tasks performed with Data Mining
SUPERVISED LEARNING



Classification:

An input pattern is classified as belonging to one of a
number of possible classes. The output variable is
nominal and the inputs can be a mix of numerical
and nominal.
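
One very simple way to classify an input pattern is to copy the label of the closest training point (1-nearest neighbour). The notes do not prescribe this method; it is used here only as a sketch, with invented numeric inputs and a nominal output.

```python
import math

# Hypothetical labelled training data: (input pattern, class label).
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]


def classify(pattern, training):
    """Return the label of the training point closest to `pattern`."""
    _, nearest_label = min(
        training, key=lambda pair: math.dist(pattern, pair[0])
    )
    return nearest_label


print(classify((2.0, 1.0), training))
print(classify((7.0, 9.0), training))
```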





Prediction / Regression:

A continuous output value is calculated from an
input pattern. The learning task is to find the
relationship between the input variables and one or
more output variables. The inputs can be a mixture
of numeric and nominal variables, but the output of
a regression task is always numeric.
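
A minimal regression sketch, assuming a single numeric input and a straight-line relationship fitted by ordinary least squares. The data points are invented and lie exactly on y = 3x + 1 so the result is easy to check by hand.

```python
# Hypothetical data lying exactly on y = 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 4.0, 7.0, 10.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance of x and y, and variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)
print(slope, intercept)
# Numeric output for an input that was not in the training data.
print(slope * 5.0 + intercept)
```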




UNSUPERVISED LEARNING



Clustering:

Data points that are close to each other, by some
distance metric, are assigned to one of a number of
clusters so that members of different clusters are
far apart.

The input variables can be numeric or nominal.

Clustering is similar to classification except that the
class labels are not given by the training data, but
they are inferred from the distribution of points in the
input data.
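
As an illustration, the sketch below runs k-means, one common clustering algorithm, on invented 1-D data with absolute difference as the distance metric. The initial centres and the iteration count are assumptions chosen to keep the example deterministic.

```python
# Hypothetical unlabelled 1-D data with two visible groups.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]


def k_means_1d(points, centres, iterations=10):
    """Plain k-means on 1-D data; `centres` are the initial cluster centres."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its members.
        centres = [
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        ]
    return centres, clusters


centres, clusters = k_means_1d(points, centres=[0.0, 5.0])
print(centres)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from how the points are distributed.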





Novelty detection:

Requires the system to spot patterns of data that
have not been seen before. There is no output variable
in the training data, but the resulting system will have
a binary output that classifies each input pattern as
novel or not.
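
A simple way to sketch this is to summarise the training data by its mean and standard deviation and flag anything far away as novel. The readings and the cut-off of three standard deviations are assumptions made for illustration, not a method given in the notes.

```python
import statistics

# Hypothetical "normal" readings used for training; there is no output variable.
training = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]

# Summarise what has been seen so far.
mean = statistics.mean(training)
std = statistics.stdev(training)

THRESHOLD = 3.0  # assumed cut-off, in standard deviations


def is_novel(x):
    """Binary output: True if x lies far from everything seen in training."""
    return abs(x - mean) / std > THRESHOLD


print(is_novel(10.1))
print(is_novel(14.0))
```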





Probability distribution estimation:

Build a model that takes a single data point as input
and produces an estimate of the density of the
population data at that point.

A model is built from the data in the form of a
function from the inputs X to a probability estimate,
p(X), which is not known and must be inferred.
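
One way to build such a function is a Gaussian kernel density estimate, sketched below for 1-D data. The sample values and the bandwidth are invented, and kernel density estimation is only one of several possible techniques.

```python
import math

# Hypothetical 1-D sample drawn from the unknown population.
sample = [1.0, 1.5, 2.0, 2.5, 8.0]

BANDWIDTH = 0.5  # assumed smoothing width (a hyperparameter)


def density(x, sample=sample, bandwidth=BANDWIDTH):
    """Gaussian kernel density estimate of p(x)."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
    )


# Density is higher where the sample is concentrated.
print(density(2.0))
print(density(5.0))
```

The estimate is highest near 2.0, where most of the sample lies, and close to zero around 5.0, where no points were observed.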
