
Class notes

JADS master Data Engineering Notes


Extensive summary of the relevant notes in the data engineering course.


  • March 21, 2024
  • 65
  • 2023/2024
  • Class notes
  • Dr. indika weerashinga dewage
  • All classes
Lecture 1: Introduction to Data Engineering
Week 2



A primer to data engineering
V3: Volume, Velocity and Variety
Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information. (Amount)

Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to
maximize its value. (Speed)

Variety: Big data is any type of data → structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. (Type)


(Big) Data Structure
Structured data: RDBMSs

Semi-structured data: XML, JSON, CSV, etc.

Unstructured data: natural language, video, images, etc.
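
To make the three categories concrete, here is a minimal sketch using only the Python standard library. The sample strings are made up for illustration, and the tokenizing step is just one naive way to derive structure from free text.

```python
import csv
import io
import json

# Tabular text with a fixed header (CSV is listed as semi-structured in these notes).
csv_text = "id,amount\n1,9.50\n2,12.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: JSON records can nest and can omit fields.
json_text = '[{"id": 1, "tags": ["a", "b"]}, {"id": 2}]'
records = json.loads(json_text)

# Unstructured: free text has no schema, so structure has to be derived.
note = "Customer called about a late delivery"
tokens = note.lower().split()

print(rows[0]["amount"])       # access a field by column name
print(records[1].get("tags"))  # the second record has no "tags" field
print(len(tokens))
```

The more structure the data carries, the less work is needed before it can be queried: the CSV rows are addressable by column, the JSON records need a check for missing fields, and the free text needs a processing step first.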




Processing Big Data: Data Pipelines
A data pipeline aggregates, organizes, and moves data to a destination for storage, insights, and analysis. Modern data pipeline systems automate the ETL (extract,
transform, load) process and include data ingestion, processing, filtering, transformation, and movement across any cloud architecture and add additional layers of
resiliency against failure.
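
The extract, transform, load steps can be sketched as three small functions. This is a toy illustration, not a production pipeline: the input text, the field names, and the list standing in for a storage destination are all assumptions made for the example.

```python
import csv
import io
import json

# Made-up raw input; the second row is deliberately malformed.
RAW = "user,amount\nalice,10.5\nbob,not_a_number\ncarol,4\n"

def extract(text):
    """Ingest: read raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Filter out rows whose amount is not numeric; convert the rest to float."""
    clean = []
    for row in rows:
        try:
            clean.append({"user": row["user"], "amount": float(row["amount"])})
        except ValueError:
            continue  # drop the malformed row instead of failing the whole run
    return clean

def load(rows, destination):
    """Load: append one JSON line per row; a list stands in for real storage."""
    for row in rows:
        destination.append(json.dumps(row))
    return len(rows)

warehouse = []
loaded = load(transform(extract(RAW)), warehouse)
print(loaded, warehouse)
```

Skipping a bad row rather than aborting is one simple form of the resiliency mentioned above; a real pipeline would typically also log or quarantine the rejected rows.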




Stages in a Big Data Pipeline





Lecture 2: Virtualization and Cloud Computing
Week 3



Virtualization
Virtualization is the ability to run multiple operating systems on a single physical system and share the underlying hardware resources.

It uses software to create an abstraction layer over computer hardware that allows the hardware elements of a single computer (processors, memory, storage, and
more) to be divided into multiple virtual computers, commonly called virtual machines (VMs).

Each VM runs its own operating system (OS) and behaves like an independent computer, even though it is running on just a portion of the actual underlying computer
hardware.

Virtualization improves IT throughput and reduces costs by using physical resources as a pool from which virtual resources can be allocated.


Virtual Architecture
A virtual machine (VM) is an isolated runtime environment (guest OS and applications)

Multiple virtual systems (VMs) can run on a single physical system




Hypervisor
A hypervisor, a.k.a. a virtual machine manager/monitor (VMM), or virtualization manager, is a program that allows multiple operating systems to share a single
hardware host.

Each guest operating system appears to have the host's processor, memory, and other resources all to itself. However, the hypervisor is actually controlling the host
processor and resources, allocating what is needed to each operating system in turn and making sure that the guest operating systems (in virtual machines) cannot
disrupt each other.
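
The allocation role of the hypervisor can be illustrated with a toy model. This is a deliberately simplified sketch: the class and method names are hypothetical, and it statically partitions cores and memory, whereas real hypervisors also time-slice and may overcommit resources.

```python
class ToyHypervisor:
    """Toy model: hands out CPU cores and memory from one host's pool."""

    def __init__(self, cores, memory_gb):
        self.free_cores = cores
        self.free_memory_gb = memory_gb
        self.vms = {}

    def create_vm(self, name, cores, memory_gb):
        # Refuse a request that would exceed what the host has left,
        # so one guest cannot take resources already given to another.
        if cores > self.free_cores or memory_gb > self.free_memory_gb:
            return False
        self.free_cores -= cores
        self.free_memory_gb -= memory_gb
        self.vms[name] = (cores, memory_gb)
        return True

    def destroy_vm(self, name):
        # Return the VM's share to the pool.
        cores, memory_gb = self.vms.pop(name)
        self.free_cores += cores
        self.free_memory_gb += memory_gb


host = ToyHypervisor(cores=8, memory_gb=32)
print(host.create_vm("vm-a", 4, 16))  # fits in the pool
print(host.create_vm("vm-b", 6, 8))   # rejected: only 4 cores are free
host.destroy_vm("vm-a")
print(host.create_vm("vm-b", 6, 8))   # fits once vm-a's share is returned
```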


Benefits of virtualization
Economies of Scale: Sharing resources helps reduce costs

Isolation: Virtual machines are isolated from each other as if they are physically separated

Encapsulation: Virtual machines encapsulate a complete computing environment

Hardware Independence: Virtual machines run independently of underlying hardware

Portability: Virtual machines can be migrated between different hosts.




The Cloud
A style of computing where massively scalable (and elastic) IT-related capabilities are provided “as a service” to external customers using Internet technologies


What’s new

Acquisition Model: Based on purchasing of services

Business Model: Based on pay for use

Access Model: Over the internet to any device

Technical Model: Scalable, elastic, dynamic, multi-tenant & sharable




Cloud computing
“A consumption and on-demand delivery computing paradigm that enables convenient network access to a shared pool of configurable and often virtualized
computing resources (e.g., networks, servers, storage, middleware and applications as services) that can be rapidly provisioned and released with minimal
management effort or service provider interaction”





Cloud computing is one answer to the crisis of complexity in the data center

Clouds are primarily a new way of consuming and delivering IT services




Three aspects of cloud modelling

Self-service: A new relationship with IT, which gives the user a degree of freedom in configuring and accessing services and can dramatically reduce labor on the
delivery side

Flexible sourcing options: More choices and hybrid modes of delivery, which allow CIOs to optimize cost and quality of service per workload

Greater focus on scale: enables both new economics and new capabilities




Why cloud
Cost reduction

Lower infrastructure costs

Lower maintenance and energy costs

Elasticity / Scalability

Capacity only when you need it

Ability to handle expected or unexpected changes in load

Achieve high business agility

Speed to serve

Reduction of time to pilot and test projects

Faster availability to customers

High performance computing

Increase capacity from your current physical infrastructure

Avoid provisioning (and paying) for the peak

“Infinite” computing capacity on demand
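
The point about not provisioning for the peak can be made with a small calculation. All numbers below are made up, and the sketch assumes the same price per server-hour for both options, which is a simplification: in practice on-demand capacity is often priced differently from owned hardware.

```python
HOURS = 24

# Hypothetical hourly demand in servers: 20 quiet hours and a 4-hour peak.
demand = [2] * 20 + [10] * 4

PRICE_PER_SERVER_HOUR = 1.0  # made-up unit price, identical for both options

# Fixed provisioning: capacity sized for the peak is paid for every hour.
fixed_cost = max(demand) * HOURS * PRICE_PER_SERVER_HOUR

# Elastic provisioning: pay only for the servers each hour actually needs.
elastic_cost = sum(demand) * PRICE_PER_SERVER_HOUR

# Share of the peak-sized capacity that is actually used over the day.
utilization = sum(demand) / (max(demand) * HOURS)

print(fixed_cost, elastic_cost, round(utilization, 2))
```

Under these assumptions the peak-sized setup sits idle two thirds of the time, which is the cost that elasticity is meant to avoid.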




Cloud Service Delivery Models / Usage Models




Cloud Service Type
There are three cloud service types





IaaS

If a company needs a virtual machine, it should opt for infrastructure as a service

PaaS

If a company requires a platform for building software products, it should pick platform
as a service

SaaS

If a company doesn’t want to maintain any IT equipment, it should choose software
as a service

A customer of SaaS is called a tenant

A tenant can be an individual user or a group of users (e.g. a customer organization)




Cloud Deployment Models
Public Clouds: The cloud infrastructure is available to the general public (anyone wanting to use or purchase cloud services).

Private Clouds: The cloud infrastructure is operated solely by a single organization.

Community Clouds: The cloud infrastructure is available to members of a community. A community can be a set of organizations with similar requirements and goals (e.g., universities).

Hybrid Clouds: A combination of public and private clouds.

Multi Clouds: A combination of more than one public cloud (a private cloud can also be included).
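
The hybrid and multi cloud definitions overlap when a private cloud is involved, so it helps to spell out the decision order. The function below is a hypothetical sketch that follows the definitions in these notes; it leaves community clouds out, and treating "two public clouds plus a private cloud" as multi cloud is an interpretation of the parenthetical remark above.

```python
def deployment_model(public_clouds, private_clouds):
    """Name the deployment model for a given number of public and private clouds."""
    if public_clouds > 1:
        return "multi cloud"    # more than one public cloud, private optional
    if public_clouds == 1 and private_clouds >= 1:
        return "hybrid cloud"   # one public cloud combined with private
    if public_clouds == 1:
        return "public cloud"
    if private_clouds >= 1:
        return "private cloud"
    return "no cloud"


for setup in [(1, 0), (0, 1), (1, 1), (2, 0), (2, 1)]:
    print(setup, deployment_model(*setup))
```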



Public Clouds

Often depicted as being available to users from a third-party provider

“Public” clouds are typically made available via the internet and may be free or inexpensive to use

e.g. Amazon Web Services

Greater risks in terms of security, resiliency, transparency and performance predictability

Key benefit: tremendous elasticity


Private Clouds

Offer many of the same benefits as “public” clouds but are managed within the organization

These types of clouds are not burdened by network bandwidth and availability issues or potential security exposures that may be associated with public clouds

Can offer the provider and user greater control, security and resilience

Better cost effectiveness and agility

Move to SLA based service delivery

Lower elasticity in comparison to external clouds

Single-tenant environment: all resources are accessible to one customer only (isolated access)

Typically hosted on-premises in the customer’s data center (can be hosted on an independent cloud provider’s infrastructure)




Tenancy Models for SaaS Application
A customer of a SaaS application is called a tenant. A tenant of a SaaS can be an individual user or a group of users, such as a customer organization.

There are three main tenancy models to be used for SaaS applications

Single tenant

Mixed tenant

Multi-tenant



Single Tenant Model
3-tier Simple Example: A single dedicated instance of an application is deployed for each customer
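
A minimal sketch of the single tenant model, with a multi-tenant variant for contrast. The class names are hypothetical, and `AppInstance` is only a stand-in for a full 3-tier deployment. The multi-tenant part reflects the common general meaning of the term (one shared instance serving all tenants) rather than a definition given in these notes.

```python
class AppInstance:
    """Stand-in for one deployed application stack (web, app and database tiers)."""

    def __init__(self):
        self.data = {}


class SingleTenantHost:
    """One dedicated application instance per customer."""

    def __init__(self):
        self.instances = {}

    def instance_for(self, tenant):
        # Deploy a dedicated instance the first time a tenant is seen.
        if tenant not in self.instances:
            self.instances[tenant] = AppInstance()
        return self.instances[tenant]


class MultiTenantHost:
    """One shared instance; rows are separated by a tenant identifier."""

    def __init__(self):
        self.shared = AppInstance()

    def save(self, tenant, key, value):
        self.shared.data[(tenant, key)] = value


single = SingleTenantHost()
a = single.instance_for("acme")
b = single.instance_for("globex")
print(a is b, len(single.instances))  # separate instances, one per tenant

multi = MultiTenantHost()
multi.save("acme", "plan", "basic")
multi.save("globex", "plan", "pro")
print(len(multi.shared.data))  # both tenants live in the same instance
```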



