Chapter 2 Data Management and Wrangling
2.1 Data Management
Data wrangling – the process of retrieving, cleansing, integrating, transforming, and enriching data to support subsequent data analysis
o Objectives
   Improve data quality
   Reduce the time and effort required to perform analytics
   Help reveal the true intelligence in the data
o The inability to clean and organize big data is one of the primary barriers preventing organizations from taking full advantage of business analytics
Data management – the process that an organization uses to acquire, organize, store, manipulate, and distribute data
Database – a collection of data logically organized to enable easy retrieval, management, and distribution of data
Relational database – one or more logically related data files, often called tables or relations
o Each table is a two-dimensional grid with rows (records or tuples) and columns (fields or attributes)
o Columns (e.g., sex of a customer, price of a product) contain a characteristic of a physical object (products or places), event (business transactions), or person (customers, students)
o Record – a collection of related fields (column values) that together represent an object, event, or person
Database management system (DBMS) – a software application for defining, manipulating, and managing data in databases
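The table, record, and column vocabulary above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module, with SQLite standing in as the DBMS. The customer table, its column names, and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# SQLite plays the role of the DBMS; ":memory:" creates a throwaway database.
conn = sqlite3.connect(":memory:")

# Define a table (relation). Table and column names are hypothetical.
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " sex TEXT"
    ")"
)

# Each tuple below is one record (row); each position is one field (column).
records = [(1, "Ana", "F"), (2, "Ben", "M"), (3, "Chloe", "F")]
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", records)

# Read the whole grid back: column names plus all rows.
cursor = conn.execute("SELECT * FROM customer ORDER BY customer_id")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

# Retrieval: ask the DBMS only for the records that match a condition.
female_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customer WHERE sex = 'F' ORDER BY customer_id"
    )
]
```

Here `columns` holds the field names, `rows` holds one tuple per record, and `female_names` shows the kind of retrieval a DBMS makes easy once the data is logically organized.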
Data Modeling: The Entity-Relationship Diagram
Data modeling – the process of defining the structure of a database
Entity-relationship diagram (ERD) – a graphical representation used to model the structure of the data
Entity – a generalized category to represent persons, places, things, or events about which we want to store data in a database table
Instance – a single occurrence of an entity
o In most cases, represented as a record in a database table
Relationship – represents certain business facts or rules
o One-to-one (1:1)
   Less common than the other two relationship types
   Ex. Each department can have only one manager, and each manager can manage only one department
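One possible way to enforce the department and manager rule in a relational database is sketched below, again with Python's built-in sqlite3 module. The table names, column names, and sample instances are hypothetical. The design choice is an assumption, not the only option: the department table carries a single manager_id column (one manager per department) with a UNIQUE constraint (one department per manager).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Two entities, each stored as its own table (hypothetical names).
conn.execute(
    "CREATE TABLE manager (manager_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE department ("
    " department_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    # One column means one manager per department;
    # UNIQUE means a manager can appear in at most one department.
    " manager_id INTEGER NOT NULL UNIQUE REFERENCES manager(manager_id)"
    ")"
)

# Instances: each inserted row is a single occurrence of its entity.
conn.execute("INSERT INTO manager VALUES (1, 'Dana')")
conn.execute("INSERT INTO manager VALUES (2, 'Eli')")
conn.execute("INSERT INTO department VALUES (10, 'Sales', 1)")


def try_insert_department(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO department VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Manager 1 already manages Sales, so a second department is rejected.
same_manager_accepted = try_insert_department((20, "Marketing", 1))
# Manager 2 manages nothing yet, so this insert is accepted.
new_manager_accepted = try_insert_department((20, "Marketing", 2))

department_count = conn.execute("SELECT COUNT(*) FROM department").fetchone()[0]
```

The rejected insert is the 1:1 business rule at work: the DBMS refuses data that would let one manager run two departments.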