Information Systems and Data Analytics 2 (INFO2006)
Summary: Information Systems and Data Analytics 2 (INFO2006) - Data Inspection, Extraction, Loading, Transformation and Modelling
Course: Information Systems and Data Analytics 2 (INFO2006)
Institution: University of the Witwatersrand (Wits)
This summary covers this section of the INFO2006 course. Its four pages condense the lecturer's notes as well as her explanations from lectures, making it suitable for quick revision before an exam.
Seller: uvthisingh
Understand the Concepts of Data Inspection, Data Extraction, Loading, Data Transformation and Data Modelling
1. Data Inspection
To ensure that you are dealing with the right information you need a clear view of your data at every stage of the transformation process.
Data Inspection is the act of viewing data for verification and debugging purposes, before, during, or after a translation.
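As a rough illustration of inspecting data at each stage, the Python sketch below prints a small view (row count, column names, first few records) before, during and after a two-step transformation. The helper `inspect_stage`, both transformation functions and the sample records are hypothetical, made up for this example; dedicated data tools provide much richer viewers.

```python
from typing import Any

Record = dict[str, Any]


def inspect_stage(stage: str, rows: list[Record], sample: int = 3) -> dict[str, Any]:
    """Print a quick view of the data at one stage and return a small summary."""
    columns = sorted({key for row in rows for key in row})
    print(f"[{stage}] {len(rows)} rows, columns={columns}")
    for row in rows[:sample]:
        print("   ", row)
    return {"stage": stage, "rows": len(rows), "columns": columns}


def drop_blank_names(rows: list[Record]) -> list[Record]:
    """Hypothetical step 1: discard records whose name is empty or whitespace."""
    return [row for row in rows if row["name"].strip()]


def upper_case_names(rows: list[Record]) -> list[Record]:
    """Hypothetical step 2: upper-case the name field (builds new dicts)."""
    return [{**row, "name": row["name"].upper()} for row in rows]


raw = [{"id": 1, "name": "thandi"}, {"id": 2, "name": " "}, {"id": 3, "name": "sipho"}]

before = inspect_stage("before", raw)
intermediate = drop_blank_names(raw)
during = inspect_stage("during", intermediate)  # reveals that one row was dropped here
transformed = upper_case_names(intermediate)
after = inspect_stage("after", transformed)
```

Comparing the row counts across stages is what makes a silently dropped record visible when debugging a transformation.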
2. Data Cleansing
If the data is well maintained and its format is consistent, little action is required on the part of the data scientist. It is the responsibility of the data scientist to process and standardise the data as required until it is in a usable state.
When cleansing a dataset, data scientists seek to:
o Remove null or invalid results.
o Standardise data into a single relevant, usable format.
o Unify disparate data sources in a consistent format.
o Maintain the integrity of the source dataset.
In practice, datasets are often much larger and typically contain a wide variety of disparate values that require greater scrutiny.
3. Data Inspection
After the dataset is cleansed, determine its quality. At this stage, the data scientist seeks to answer the following questions:
o How much usable data do I have?
o How complete is the dataset?
o What form does it take?
o What type of data do I have?
Once these are determined, one of two things must happen:
o The dataset is judged useful, and the project proceeds as intended; or
o A request is made for additional data.
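The cleansing aims and the quality questions above can be sketched together in plain Python. This is a minimal illustration under several assumptions: the two sources and their field names, the DD/MM/YYYY date format of the second source, the rule that a record is usable only if both its date and amount parse, the completeness measure (share of records that survive cleansing) and the 80% threshold for proceeding are all hypothetical choices made for this example.

```python
import copy
from datetime import datetime
from typing import Any, Optional

Record = dict[str, Any]

# Hypothetical sources storing the same facts under different names and formats.
source_a = [
    {"id": "1", "signup": "2023-01-05", "amount": "120.50"},
    {"id": "2", "signup": None, "amount": "80"},
    {"id": "3", "signup": "2023-02-11", "amount": "not a number"},
]
source_b = [{"customer_id": 4, "signup_date": "05/03/2023", "amount": 42.0}]


def unify(a: list[Record], b: list[Record]) -> list[Record]:
    """Map both sources onto one schema: id, signup, amount."""
    rows = [{"id": int(r["id"]), "signup": r["signup"], "amount": r["amount"]} for r in a]
    rows += [{"id": r["customer_id"], "signup": r["signup_date"], "amount": r["amount"]} for r in b]
    return rows


def parse_date(value: Optional[str]) -> Optional[str]:
    """Standardise YYYY-MM-DD or (assumed) DD/MM/YYYY strings to ISO format."""
    if not value:
        return None
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(value: Any) -> Optional[float]:
    """Convert an amount to float, or None if it is missing or invalid."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def cleanse(rows: list[Record]) -> list[Record]:
    """Standardise formats, then drop records with null or invalid values."""
    cleaned = []
    for row in copy.deepcopy(rows):  # work on a copy so the input stays intact
        row["signup"] = parse_date(row["signup"])
        row["amount"] = parse_amount(row["amount"])
        if row["signup"] is not None and row["amount"] is not None:
            cleaned.append(row)
    return cleaned


def assess(raw: list[Record], cleaned: list[Record]) -> dict[str, Any]:
    """Rough answers to: how much usable data, how complete, what types."""
    return {
        "usable_rows": len(cleaned),
        "completeness": len(cleaned) / len(raw) if raw else 0.0,
        "types": {k: type(v).__name__ for k, v in cleaned[0].items()} if cleaned else {},
    }


unified = unify(source_a, source_b)
cleaned = cleanse(unified)
report = assess(unified, cleaned)
# Hypothetical rule: proceed only if at least 80% of records are usable.
decision = "proceed" if report["completeness"] >= 0.8 else "request additional data"
print(report, decision)
```

With this toy data only two of the four records survive, so the sketch ends by requesting additional data, which mirrors the second outcome listed above.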
4. Data Extraction
Data extraction is the process of obtaining data from a database so that it can be replicated to a destination.
It is the first step in a data ingestion process called ETL (extract, transform, and load). The goal of ETL is to prepare data for analysis or business intelligence (BI).
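To make the three ETL stages concrete, here is a minimal sketch using Python's built-in `sqlite3` module, with two in-memory databases standing in for the source and the destination. The table names, columns and the cents-to-decimal conversion are hypothetical; real pipelines usually rely on dedicated ETL tooling rather than hand-written functions like these.

```python
import sqlite3


def extract(source: sqlite3.Connection) -> list[tuple]:
    """Extract: read the raw rows from the source database."""
    return source.execute("SELECT id, name, amount_cents FROM sales ORDER BY id").fetchall()


def transform(rows: list[tuple]) -> list[tuple]:
    """Transform: tidy product names and convert integer cents to a decimal amount."""
    return [(row_id, name.strip().title(), cents / 100) for row_id, name, cents in rows]


def load(destination: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the prepared rows into the analysis-ready destination table."""
    destination.execute(
        "CREATE TABLE IF NOT EXISTS sales_clean (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    destination.executemany("INSERT OR REPLACE INTO sales_clean VALUES (?, ?, ?)", rows)
    destination.commit()


# Hypothetical source data; in practice this would be an operational system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, name TEXT, amount_cents INTEGER)")
source.executemany("INSERT INTO sales VALUES (?, ?, ?)", [(1, " widget ", 1999), (2, "GADGET", 500)])

destination = sqlite3.connect(":memory:")
load(destination, transform(extract(source)))

result = destination.execute("SELECT * FROM sales_clean ORDER BY id").fetchall()
print(result)  # [(1, 'Widget', 19.99), (2, 'Gadget', 5.0)]
```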
4.1 Types of Data Extraction
Extraction jobs may be scheduled, or analysts may extract data on demand based on business needs and analysis goals.
Data can be extracted in three primary ways:
Update notification
o The easiest way: the system issues a notification when a record has been changed.
o Most databases provide a mechanism for this so that they can support database replication.
Incremental extraction
o Some data sources are unable to provide notification about an update, but they can identify which records have been modified and provide an extract of those records.
o During subsequent ETL steps, the data extraction code needs to identify and propagate changes.
o One drawback is that it may not be able to detect deleted records, because there is no way to see a record that is no longer there.
Full extraction
o The first time you replicate any source, you must do a full extraction. Some data sources also have no way to identify data that has been changed, so reloading a whole table may be the only way to get data from that source.
o Because full extraction involves high data transfer volumes, it is not the best option if you can avoid it.
4.2 The Data Extraction Process
The process involves the following steps:
1. Check for changes to the structure of the data, such as the addition of new tables and columns. Changed data structures must be dealt with programmatically.
2. Retrieve the target tables and fields from the records specified by the integration's replication scheme.
3. Extract the appropriate data, if any.
It is important to understand the context of your data sources and destinations and to use the right tools.
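The sketch below, again using Python's `sqlite3` module with an in-memory database, walks through the extraction types and the three process steps. Several simplifications are assumed: the replication scheme is a hypothetical Python dictionary, an integer `updated_at` column serves as the watermark for incremental extraction, and a trigger-populated `change_log` table is a simplified stand-in for update notification (real databases typically expose replication logs for this purpose). Note how the incremental query misses the deleted row, while the change log records it.

```python
import sqlite3

# Hypothetical replication scheme: which tables and fields to replicate.
REPLICATION_SCHEME = {"orders": ["id", "status", "updated_at"]}

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER);
    -- Simplified stand-in for update notification: triggers record each change.
    CREATE TABLE change_log (order_id INTEGER, op TEXT);
    CREATE TRIGGER log_update AFTER UPDATE ON orders
        BEGIN INSERT INTO change_log VALUES (NEW.id, 'update'); END;
    CREATE TRIGGER log_delete AFTER DELETE ON orders
        BEGIN INSERT INTO change_log VALUES (OLD.id, 'delete'); END;
    INSERT INTO orders VALUES (1, 'new', 100), (2, 'new', 100), (3, 'new', 100);
""")


def check_schema(conn: sqlite3.Connection, table: str) -> list[str]:
    """Step 1: report columns that the replication scheme does not know about."""
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return [name for name in columns if name not in REPLICATION_SCHEME[table]]


def select_sql(table: str) -> str:
    """Step 2: build a query for the fields named in the scheme (trusted names only)."""
    return f"SELECT {', '.join(REPLICATION_SCHEME[table])} FROM {table}"


def full_extract(conn: sqlite3.Connection, table: str) -> list[tuple]:
    """Full extraction: read every row, as on the first replication."""
    return conn.execute(select_sql(table) + " ORDER BY id").fetchall()


def incremental_extract(conn: sqlite3.Connection, table: str, watermark: int) -> list[tuple]:
    """Incremental extraction: only rows modified after the last watermark."""
    query = select_sql(table) + " WHERE updated_at > ? ORDER BY id"
    return conn.execute(query, (watermark,)).fetchall()


initial = full_extract(db, "orders")
watermark = max(row[2] for row in initial)

# Later, the source changes: one row is updated, one is deleted, a column is added.
db.execute("UPDATE orders SET status = 'shipped', updated_at = 200 WHERE id = 1")
db.execute("DELETE FROM orders WHERE id = 2")
db.execute("ALTER TABLE orders ADD COLUMN courier TEXT")

new_columns = check_schema(db, "orders")                # ['courier']
changed = incremental_extract(db, "orders", watermark)  # step 3: [(1, 'shipped', 200)]
notifications = db.execute("SELECT order_id, op FROM change_log ORDER BY rowid").fetchall()
```

Checking the schema before extracting means an unexpected column such as `courier` is flagged for programmatic handling instead of silently breaking later steps.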