
Summary Data Engineering


Summary by 7 students, sample exam, summary of classes

Preview: 4 of 167 pages

  • June 9, 2020
  • 167 pages
  • 2019/2020
  • Summary
Seller: ecgef
Data Engineering
Summary




Prof. Dr. Dieter Devlaminck

Assistants: Tom Vermeire, Yanou Ramon, Kevin Milis, Stiene Praet





Table of Contents
I. Week 1
1. Welcome to data engineering
1.1 Overview
1.2 Pop quiz
1.3 Differentiating between data engineering and data science?
1.4 About this course
1.5 Exam
2. File formats
2.1 Overview
2.2 Formats
a) Human readable formats
b) Not human readable and compressed file formats
2.5 When to use what?

3. Python concepts
3.1 Overview
3.2 Programming paradigms
3.3 Python functions are first-class objects
3.4 Anonymous lambda function
3.5 Passing functions to other functions
3.6 Sorting elements of an iterable
3.6.1 Sorting (index for Python starts at 0)
3.7 Partial functions
3.8 Collections
3.8.1 defaultdict
3.8.2 Counter
3.9 Map the elements of an iterable
3.10 Itertools: zip (lets you iterate over multiple lists at the same time)
3.11 Itertools
3.11.1 combinations (finds all possible combinations of the elements in a list, depending on the parameters you pass)
3.11.2 permutations
3.12 One-line dot-product
3.13 Unicode
3.14 Dates and times

II. Week 2
1. Computer architecture and OS
1.1 Basic computer architecture and operating systems
1.1.1 At the end of this lecture
1.1.2 Why do I need to know about computer architecture?
1.1.3 The main components of a computer
1.1.4 The clock frequency (speed of CPU) and the architecture of the CPU influence the number of instructions that can be executed per second
1.1.4.1 Parallelism will have a bigger impact on your data processing
1.1.4.2 Modern CPU packages contain multiple cores improving parallelism
1.1.4.3 The speed gap between CPU and memory
1.1.4.4 Caching to optimize your data flow
1.1.5 Disks: high capacity, very slow storage devices
1.1.6 Hard Disk Drive (HDD)
1.1.6.1 HDD latency
1.1.7 Solid State Disks (=transistors)
1.1.8 Scaling: vertical vs horizontal
1.2 Operating system level
1.3 What is an operating system (OS)
1.3.1 The operating system hides some of the hardware complexity
1.4 Process management
1.4.1 Process Control Block (PCB)
1.4.2 Threads (exist within the same process) and concurrency
1.4.3 Scheduling
1.5 Memory management and virtual memory
1.6 Inter-process communication
1.7 Input/Output management
1.8 File systems as a way to organize files on (secondary) memory
1.9 Directory structure
1.10 Distributed File Systems (DFS)
1.11 Virtualization: virtual machine
1.12 Containers: lightweight virtualization

2. Regular expressions
2.1 Overview
2.2 Regular expression = regex
2.2.1 Extracting email addresses of all people registered for Data Engineering
2.2.2 Applications
2.2.3 Regular expressions are like a mini-language where certain characters have a special meaning
2.2.4 Searching for a literal
2.2.5 Match any character with .
2.2.5.1 Match a set of characters
2.2.5.2 Match a range of characters
2.2.5.3 Negate a set of characters
2.2.6 Some predefined sets of characters
2.2.7 Repeat a pattern one or zero times (make it optional)
2.2.8 Repeat a pattern one or more times
2.2.9 Repeat a pattern exactly n times
2.2.10 Repeating operators are greedy
2.2.11 Capturing and non-capturing groups
2.2.12 Lookahead
2.2.13 Lookbehind
2.2.14 Regexes gone wrong
2.2.15 Put an ad on all URLs that contain the name of the telecom operator
2.2.16 Why [^t]? It’s easy to miss edge cases, so the regex might not be perfect
2.3 Concluding remarks
2.4 Extra
3. Computer networks
3.1 Based on Computer Networking: A Top-Down Approach by Kurose and Ross
3.1.1 The internet described in terms of its hardware components
3.1.2 Hosts and the client-server model
3.1.3 Protocol
3.1.4 Packet Switching
3.1.5 The internet protocol stack
3.1.6 Layered structure and protocols: the protocol stack
3.2 Network applications
3.2.1 HTTP – HyperText Transfer Protocol
3.2.1.1 The HTTP request message
3.2.1.2 The HTTP headers
3.2.1.3 The general structure of the HTTP request message
3.2.1.4 The HTTP response message
3.2.1.5 The general structure of the HTTP response message
3.2.1.6 If using non-secure HTTP (not HTTPS), one’s login and password are sent in clear text
3.2.1.7 HTTP uses the TCP transport layer protocol to send its messages
3.2.1.8 Addressing processes
3.2.2 DNS – Domain Name System
3.2.2.1 Why distributed?
3.2.2.2 DNS client used by a web browser

III. Week 3
1. Cloud services
I.1 Overview
I.2 Defining CS
I.2.1 What is the cloud?
I.2.2 Managing your cloud account
I.2.3 The cloud stack
I.2.4 Examples of SaaS on AWS
I.2.5 Advantages of cloud computing
I.2.6 Geographic organization of AWS
I.2.7 Virtualization
I.2.8 Native and hosted virtualization

I.3 Core AWS services
I.3.1 Virtual server hosting: AWS EC2
I.3.2 EC2 price models
I.3.3 VPC: Virtual Private Cloud
I.3.4 Identity and Access Management (IAM): Permissions and roles
I.3.5 EBS: Elastic Block Store
I.3.6 Security groups (comparable to firewall)
I.3.7 Storage Infrastructure
I.3.8 Database services
I.4 Cloud architecture example
2. The Linux Operating System
2.1 Unix: the standard operating system (OS)
2.2 Linux: a Unix-like OS
2.3 Linux command line instructions (file manipulation)
2.4 jq

IV. Week 4
1. Algorithmic Complexity
1.1 Motivation


