Full summary of the course Advanced Data Analysis (theoretical lessons)


This document contains the full summary of all theoretical lessons of the course Advanced Data Analysis. For the exam, the practical lessons also have to be known; these are uploaded separately on my profile. 1st Ma Biomedical Sciences


  • 10 May 2022
  • 70 pages
  • 2021/2022
  • Summary

Bi0med
ADVANCED DATA ANALYSIS
CHAPTER 1: INTRODUCTION 18/02

1.1 BIG DATA
Data for which conventional computing techniques are no longer sufficient because of its size,
complexity, etc. Big data is a disruptive trend in computer science. It is characterised by the four V's:

1. Volume
2. Velocity
3. Variety
4. Veracity

Fourth paradigm: for thousands of years science was experimental and observational; later came
theoretical science (Newton, formulas…). Then came the rise of computer science, used to simulate
things such as the weather for forecasts. Now we are in a new era of data-driven science, in which
data is the breeding ground of the science we do: you first look at what data is already out there
and re-analyse it.

1. DATA VOLUME
An unprecedented amount of information is coming towards us. For example, genomic data is huge,
and the cost of sequencing genomes has gone down tremendously. Computing power roughly doubles
every 18 months for the same price: faster computers, bigger hard drives. We have to learn new ways
to deal with these large amounts of data.
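The doubling rule above can be turned into a quick back-of-the-envelope calculation. A minimal sketch, assuming a clean doubling every 18 months (a rough rule of thumb, not an exact law); the function name `growth_factor` is hypothetical.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """How many times more computing power the same price buys after `years`,
    assuming capacity doubles every `doubling_months` months (rule of thumb)."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings

# Over 6 years there are 72 / 18 = 4 doublings, i.e. a 16-fold increase.
print(growth_factor(6))  # 16.0
```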

2. DATA VELOCITY
The speed at which data is generated and the speed at which we need to analyse it. If we sequence a
lot of genomes, we can take our time to analyse them and publish the results, but some data (such
as sensor data) need to be processed immediately.

Transporting data is also a bottleneck: it takes too long. Often hard drives are physically shipped,
which is more efficient than transferring over the internet or fibre connections. For example, data
can be sent from China to here on hard drives instead of through the internet.
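Why shipping hard drives can beat the network comes down to simple arithmetic. A minimal sketch with purely illustrative assumptions (100 TB of data, a sustained 1 Gbit/s link, two days of courier time); these numbers and the function name `network_transfer_days` are hypothetical, not figures from the course.

```python
SECONDS_PER_DAY = 86_400

def network_transfer_days(data_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to push `data_bytes` through a link at a sustained rate.
    Ignores protocol overhead, so real transfers would be slower."""
    return data_bytes * 8 / link_bits_per_second / SECONDS_PER_DAY

data = 100e12        # 100 TB (decimal terabytes), illustrative
link = 1e9           # 1 Gbit/s sustained, illustrative
shipping_days = 2.0  # assumed courier time for a box of hard drives

net_days = network_transfer_days(data, link)
print(f"network: {net_days:.1f} days, shipping: {shipping_days:.1f} days")

# Effective bandwidth of the courier, in Gbit/s
courier_gbps = data * 8 / (shipping_days * SECONDS_PER_DAY) / 1e9
print(f"courier bandwidth: {courier_gbps:.1f} Gbit/s")
```

Under these assumptions the network needs roughly 9 days, so the box of drives wins; with a much faster link or a much smaller dataset the comparison flips.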

Dynamic molecular profiles can now also be sequenced and analysed, for example by sequencing the
immune system, which is constantly changing. This data is therefore preferably processed instantly,
to know the status of patients in real time.

3. DATA VARIETY
A lot of data in biomedical sciences is heterogeneous and unstructured. Much of it is literature you
need to read, and there is also unstructured image data (just pixels). An estimated 80% of the
world's data is unstructured, and it is also very diverse: DNA sequences, protein structures, gene
regulation, interactions, morphology, metabolism… All this data is heterogeneous, and dealing with
so much diversity is difficult.

4. DATA VERACITY
= trustworthiness of data. There is a lot of uncertainty about data points, and this uncertainty is
not consistent: you cannot attach one standard deviation to every data point in big data, because
the uncertainty varies. Some data points are highly certain and very solid, some are missing, and
there is also bias…



1.2 WHAT IS DATA?
Data is a collection of data objects and their attributes. Objects can be patients, samples,
observations, ... Attributes are properties or characteristics of an object. Data is often
represented in a tabular format, with rows for objects and columns for attributes.
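The tabular view (rows = objects, columns = attributes) maps directly onto a simple data structure. A minimal sketch in plain Python; the patients, attribute names and values below are hypothetical toy data, not data from the course.

```python
# Each row is one data object (here: a hypothetical patient);
# each key is an attribute, each stored value an attribute value.
patients = [
    {"patient_id": "P01", "eye_colour": "green", "age": 34, "height_cm": 172.5},
    {"patient_id": "P02", "eye_colour": "blue", "age": 51, "height_cm": 180.0},
    {"patient_id": "P03", "eye_colour": "brown", "age": 27, "height_cm": 165.2},
]

attributes = list(patients[0].keys())  # the columns of the table
print(" | ".join(attributes))
for row in patients:  # the rows of the table
    print(" | ".join(str(row[a]) for a in attributes))

# A column = the values of one attribute across all objects
ages = [row["age"] for row in patients]
```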

Attributes ≠ attribute values
- Attribute values = numbers/symbols assigned to an attribute
o E.g. attribute = eye colour, attribute values = green, blue, brown
- Distinction between attributes & attribute values
- The same attribute can be mapped to different attribute values
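To make the attribute / attribute-value distinction concrete: the attribute height stays the same, but its values depend on the measurement scale chosen. A minimal sketch; the factor of 2.54 cm per inch is exact by definition, while the function name `height_in_inches` and the sample height are hypothetical.

```python
CM_PER_INCH = 2.54  # exact by definition of the international inch

def height_in_inches(height_cm: float) -> float:
    """Map the same attribute (height) to a different set of attribute values."""
    return height_cm / CM_PER_INCH

h_cm = 180.0
h_in = height_in_inches(h_cm)
print(f"height = {h_cm} cm = {h_in:.2f} in")  # same attribute, different values
```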


DIFFERENT TYPES OF ATTRIBUTES
1. Nominal attributes: E.g. ID numbers, eye colour, zip codes
2. Ordinal attributes: E.g. rankings (e.g. 1-10), grades, height as tall/medium/short
3. Interval attributes: E.g. calendar dates, temperatures in Celsius or Fahrenheit
4. Ratio attributes: E.g. temperature in Kelvin, length, time, counts

The type of an attribute depends on which mathematical operations are meaningful on its values,
i.e. on which of the following properties it possesses:
- Distinctness: = ≠
o Two attribute values are either equal or not
- Order: < >
o Values can be ordered, and the order makes sense
- Addition: + -
o Values can be added or subtracted
- Multiplication: * /
o Values can be multiplied and divided

Attribute type and the properties it possesses:
- Nominal: distinctness
- Ordinal: distinctness & order
- Interval: distinctness, order & addition
- Ratio: all 4 properties
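The properties form a cumulative hierarchy: each attribute type keeps the operations of the type before it and adds one more. A minimal sketch that encodes this as a lookup; the names `ALLOWED_OPS` and `is_meaningful` are hypothetical. The temperature lines illustrate why Celsius is an interval attribute but Kelvin is a ratio attribute: 0 °C is an arbitrary zero point, whereas 0 K is a true zero.

```python
# Operations that are meaningful for each attribute type (cumulative hierarchy)
ALLOWED_OPS = {
    "nominal": {"=", "!="},
    "ordinal": {"=", "!=", "<", ">"},
    "interval": {"=", "!=", "<", ">", "+", "-"},
    "ratio": {"=", "!=", "<", ">", "+", "-", "*", "/"},
}

def is_meaningful(attribute_type: str, op: str) -> bool:
    """Return True if `op` makes sense for values of the given attribute type."""
    return op in ALLOWED_OPS[attribute_type]

print(is_meaningful("nominal", "<"))   # False: eye colours have no order
print(is_meaningful("interval", "-"))  # True: differences between dates make sense
print(is_meaningful("interval", "/"))  # False: 20 °C is not 'twice as warm' as 10 °C

# Interval vs ratio with the same two temperatures
c1, c2 = 10.0, 20.0
k1, k2 = c1 + 273.15, c2 + 273.15
print(c2 / c1)  # 2.0, but meaningless: it depends on where 0 °C was placed
print(k2 / k1)  # about 1.035: meaningful, because 0 K is a true zero
```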


DISCRETE VS. CONTINUOUS ATTRIBUTES
Discrete attribute: has only a finite or countable set of values. Discrete attributes are often
represented as integer variables, for example zip codes, counts, or the set of words in a collection
of documents.

Continuous attribute: has real numbers as attribute values. In practice, real values can only be
measured and represented using a finite number of digits; continuous attributes are typically
represented as floating-point variables. For example temperature, height, or weight.
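In code, the discrete/continuous distinction usually shows up as integers (or sets of symbols) versus floating-point numbers, and the "finite number of digits" caveat is easy to observe. A minimal sketch in Python; the variable names and values are hypothetical.

```python
import math

# Discrete attributes: countable values, stored as integers (or a set of symbols)
read_count = 1520
words = {"gene", "protein", "cell"}  # set of words in a (tiny) document collection

# Continuous attributes: real values, stored as floating-point numbers
temperature_c = 36.6
height_m = 1.72

# Floats carry only a finite number of binary digits, so real values are approximated
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare floats with a tolerance
print(type(read_count).__name__, type(temperature_c).__name__)  # int float
```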
