Table of Contents
Lecture 1: Database management systems, relational data models, and SQL
1.1. Database management systems
1.2. Relational data model
1.3. Single table queries using SQL
Lecture 2: Entity relationship and translating from a natural language specification
2.1. Basic concepts
2.2. Relationships, degrees & cardinalities
2.3. Generalization & specialization
Lecture 3: Transforming ERD to relational schema, and normalization
3.1. Transforming ERDs
3.2. Data normalization
Lecture 4: Evolution of data management, big data, and data intensive systems
4.1. Evolution of data management
4.2. Big data analytics
4.3. Reasons for going beyond traditional RDBMS
4.4. Storage layer
4.5. Computation layer
Lecture 5: The Spark ecosystem, RDDs, programming model, and PySpark
5.1. Lambda expressions
5.2. Apache Spark
5.3. RDDs
5.4. Programming model
Lecture 6: Data transformations with SQL, entity recognition, data cleaning tools, and more
6.1. Processing multiple tables
6.2. Views
6.3. Functions
6.4. Creating & populating
6.5. Data from websites, integration & cleaning, and entity extraction & resolution
6.6. Integration & cleaning
Lecture 1: Database management systems, relational data models, and SQL
1.1. Database management systems
Reasons for using a database management system (DBMS): it offers solutions to the following problems of file-based data storage:
• Data redundancy and consistency: multiple file formats, duplication in different files.
• Difficulty in accessing data: need to write a new program to carry out each new task.
• Data isolation: multiple files and formats.
• Integrity problems: integrity constraints (e.g., account balance > 0) become “buried” in
program code rather than being stated explicitly. Hard to add new constraints or change
existing ones.
• Atomicity of updates: transfer of funds from one account to another should either be
complete or not happen at all. Failures may leave data in an inconsistent state with partial
updates carried out.
• Concurrent access by multiple users: uncontrolled concurrent accesses can lead to
inconsistencies.
o Example: two people read the same balance (e.g., €100) and then withdraw money at the
same time (e.g., €50 for person A, €70 for person B). Without control, €120 is paid out of a
€100 account and one update overwrites the other.
• Security problems: hard to provide user access to some, but not all, data.
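The integrity, atomicity, and concurrency problems above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name `account`, the account names, and the helper functions `withdraw` and `transfer` are hypothetical and chosen only for this illustration. The two withdrawals run one after the other on a single connection, so the sketch only imitates concurrent access; a real DBMS would coordinate multiple connections with locking.

```python
import sqlite3

# Uncontrolled access, simulated in plain Python: both people read the
# balance before either one writes it back.
balance = 100
read_a = balance            # person A reads 100
read_b = balance            # person B reads 100
balance = read_a - 50       # A writes 50
balance = read_b - 70       # B overwrites it with 30; A's update is lost
naive_balance = balance     # 30 left although 120 was paid out

conn = sqlite3.connect(":memory:")
# The integrity constraint is declared in the schema, not buried in code.
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"
)
conn.execute("INSERT INTO account VALUES ('shared', 100), ('savings', 20)")
conn.commit()


def withdraw(conn, name, amount):
    """Withdraw inside a transaction; return False if the constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit: both happen or neither does."""
    try:
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


a_ok = withdraw(conn, "shared", 50)               # 100 -> 50
b_ok = withdraw(conn, "shared", 70)               # would give -20: rejected
moved = transfer(conn, "shared", "savings", 80)   # debit fails, credit is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In this sketch the second withdrawal and the oversized transfer are both rejected by the declared constraint, and the already executed credit in `transfer` is rolled back, so no partial update remains.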
Database (DB): shared collection of data with the same structure, including correlations and
relationships for a common purpose.
DBMS: a collection of programs that manages the database structure and controls access to the data
stored in the database. It offers functions and methods to build and manipulate the data. It can be
seen as a black box interacting between users/applications and the database.
Goals of a DBMS: separate data from application.
• Provide an interface that the application programmer must follow.
• Allow the system administrator to make modifications, for example improving or
reconfiguring systems, without having an impact on the user.
• Users can change their view of the data without having to worry about how it is stored.
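The separation of data from application can be sketched as follows, again with Python's sqlite3 module. The table `account`, the index `idx_account_owner`, and the view `owner_total` are hypothetical names used only for illustration. The idea is that the application's query stays the same while the administrator changes the storage side (here, adding an index) and a user defines a different view of the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account (owner, balance) VALUES (?, ?)",
    [("Ann", 100), ("Bob", 250), ("Ann", 40)],
)

# The application only knows this query, not how the rows are stored.
QUERY = "SELECT owner, balance FROM account WHERE owner = ? ORDER BY balance"

before = conn.execute(QUERY, ("Ann",)).fetchall()

# Administrator-side change: add an index to speed up lookups by owner.
conn.execute("CREATE INDEX idx_account_owner ON account (owner)")
after = conn.execute(QUERY, ("Ann",)).fetchall()

# User-side change: a view presents the same data in a different shape.
conn.execute(
    "CREATE VIEW owner_total AS "
    "SELECT owner, SUM(balance) AS total FROM account GROUP BY owner"
)
totals = dict(conn.execute("SELECT owner, total FROM owner_total"))
```

The query result is identical before and after the index is created, and the view adds a new presentation without touching the stored table.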
Layers of a DBMS (architecture):
• Internal layer: stores and structures the data and offers efficient access methods.
• Logical layer: optimizes queries, resolves conflicting accesses by multiple users, and
guarantees constant availability (even in case of failures).
• External layer: communicates with users, analyses user requests/queries, controls access and
presents the answers.
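As a rough illustration of the query optimization done by the logical layer, the sketch below asks SQLite for its query plan before and after an access structure (an index) is added. SQLite is an embedded library rather than a layered server, so it only stands in for the idea here; the table, index, and the helper `plan` are hypothetical names, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account (owner, balance) VALUES (?, ?)",
    [("Ann", 100), ("Bob", 250), ("Ann", 40)],
)

QUERY = "SELECT balance FROM account WHERE owner = ?"


def plan(conn):
    """Return the textual steps SQLite's planner chose for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Ann",)).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


plan_without_index = plan(conn)   # a full scan of the table

# Internal-layer change: provide an efficient access method.
conn.execute("CREATE INDEX idx_account_owner ON account (owner)")
plan_with_index = plan(conn)      # the planner now searches via the index
```

The user's query text never changes; the system decides on its own how to evaluate it with the access methods available.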
Development process / life cycle of a DBMS:
• Planning: develop a preliminary understanding of the business situation and how information
systems might help solve the problem. Steps include analyzing the current data processing and
general business functions and needs.
• Analysis: analyze the business situation thoroughly to determine requirements and to
structure those requirements. The output is a conceptual schema/ERD that corresponds to a
detailed, technology independent specification of the overall organizational data structure.
• Logical design: representation of the database. Transform the conceptual schema, i.e., the
outcome of the previous step, into a schema expressed in terms of the data management system.
• Physical design: the set of specifications that describe how data are stored in a computer’s
secondary memory by a specific database management system.
• Implementation: build database implementation, populate with data, install and test
applications, complete documents and training materials.
• Maintenance: monitor the operation and usefulness of the system. Repair errors in the
database and applications. Enhance the system by analyzing the database and applications to
ensure that evolving information requirements are met.