Resume

Practical 3: Advanced Data Analysis: full summary + explanations

Name: Practical 3: Advanced Data Analysis: full summary + explanations
SKU: doc_1724545
Rating: 1.00 (1 reviews)
Author: Bi0med

1 vérifier

97 vues 12 fois vendu

Cours
Advanced Data Analysis (2052FBDBMW)

Établissement
Universiteit Antwerpen (UA)

This document includes all solutions of practical 3 data analysis, including the used code, explanations and screenshots.

[Montrer plus]

Aperçu 3 sur 19 pages

Voir l'exemple

Publié le 10 mai 2022
Nombre de pages 19
Écrit en 2021/2022
Type Resume

1 vérifier

Par: roberto777torres • 5 mois de cela

Bi0med Membre depuis 5 année 494 documents vendus

€5,98

Ajouté

Ajouter au panier Ajouter au liste de veux

Garantie de satisfaction à 100%
Disponible immédiatement après paiement
En ligne et en PDF
Tu n'es attaché à rien

Part 1. Automation in R : loops, lists and functions

Introduction
The folder contains the 4 datasets with results of a study into the genetic causes of age-related
hearing impairment (ARHI) : connexins.txt, kcnq4.txt, oxidStress.txt and monogenic.txt.
ARHI is the gradual decline of hearing with ageing. In some persons this decline is worse
than in others, and there are reasons to believe that part of this variation between individuals
is genetic. Unlike monogenic disorders, where one single gene causes a disease phenotype,
ARHI is complex – due to multiple genes with a small effect, plus environmental risk factors.
In the ARHI study, we searched for genetic variants (single nucleotide polymorphisms, SNPs)
that were associated with the hearing phenotype. If a gene contains several SNPs showing a
strong association with the hearing phenotype, this can indicate a role of this gene in hearing
impairment.
The statistical analysis involves an association test between the phenotype and the genotype.
The phenotype is described by the Z-score. This is an age- and gender corrected score that
describes how well a person can hear, given age and gender. Good-hearing persons have a
low (negative) Z-score, whereas bad hearing persons have a high (positive) Z-score. The Z-
score is a numeric variable with an approximately normal distribution.
Each SNP has two alleles and, hence, 3 genotypes: aa, ab and bb. Association tests are
commonly performed in two ways:
1) using an ANOVA : treating the 3 genotypes as 3 completely distinct categories
(with no ordering)
2) using regression, coding the genotypes like this : aa=0 , ab=1 and bb=2. This
analysis treats the heterozygous as the intermediate between the two homozygous
genotypes. This is called testing under an additive model. :

Automation 1: Looping
Read in the dataset connexins.txt. This dataset contains genotyping results from SNPs within
several genes of the connexin family (gap junction proteins expressed in the inner ear.).

First: setwd(“/”)
inputData <- read.table("connexins.txt", header=TRUE, sep="\t",
na.strings=c("0",""),stringsAsFactors=TRUE)

The latter argument specifies that missing values are indicated by either an empty field ("") or
a zero. (Yes, R allows more than one missing value indicator).
How does the dataset look like?

str(inputData)

1

,The first column is the subject-identifier (ID). The Zscore is the numeric phenotype. The
remainder of the columns represent the genotypes of the SNPs of interest. They have been
read in as factors (=categorical variables), with 3 levels.

When we test the first SNP (Cx26_SNP1) for association with the phenotype (Zscore), we can
use the ANOVA or the regression approach. Remember that both are carried out by the lm
function in R, and the type of X-variable determines whether an ANOVA or regression is
performed:

For ANOVA, with genotype categorical :

modelANOVA<-lm(Zscore~Cx26_SNP1,data=inputData)
summary(modelANOVA)

p.anova <- anova(modelANOVA)[1,5]

2

, The last command retrieves the overall p-value. If we want the regression approach, we need
to convert the genotype to a number (1,2,3). In R, we can automatically change the categorical
genotypes aa, ab, bb to numbers using the co-ercion formula as.numeric(). The lm function
will then carry out regression:

modelRegr<-lm(Zscore~as.numeric(Cx26_SNP1),data=inputData)
summary(modelRegr)

p.regres <- anova(modelRegr)[1,5]

This gives us the association test for the first SNP. However, the study involved genotyping
several SNPs – nowadays such studies involve up to a million SNPs. We therefore need to
automate the analysis.

If we want to run the above code for all SNPs, the only argument that changes is the
independent variable (after the tilde). The Z-score and the rest of the code will always be the
same. First we create a dataframe with only the SNPs. That is, the original with the first two
columns removed.

allSNPs<-inputData[,-c(1,2)]

In the allSNPs dataframe, the ith column contains the genotype from the ith SNP. How many
SNPs are these…? The dim() function gives us the dimensions of the dataframe : the number
of rows and columns. The SNP names can be retrieved using the names() function.

n.columns<-dim(inputData)[2]
n.snps<-n.columns-2
SNPnames<-names(allSNPs)

3

Les avantages d'acheter des résumés chez Stuvia:

Qualité garantie par les avis des clients

Les clients de Stuvia ont évalués plus de 700 000 résumés. C'est comme ça que vous savez que vous achetez les meilleurs documents.

L’achat facile et rapide

Vous pouvez payer rapidement avec iDeal, carte de crédit ou Stuvia-crédit pour les résumés. Il n'y a pas d'adhésion nécessaire.

Focus sur l’essentiel

Vos camarades écrivent eux-mêmes les notes d’étude, c’est pourquoi les documents sont toujours fiables et à jour. Cela garantit que vous arrivez rapidement au coeur du matériel.

Foire aux questions

Qu'est-ce que j'obtiens en achetant ce document ?

Vous obtenez un PDF, disponible immédiatement après votre achat. Le document acheté est accessible à tout moment, n'importe où et indéfiniment via votre profil.

Garantie de remboursement : comment ça marche ?

Notre garantie de satisfaction garantit que vous trouverez toujours un document d'étude qui vous convient. Vous remplissez un formulaire et notre équipe du service client s'occupe du reste.

Auprès de qui est-ce que j'achète ce résumé ?

Stuvia est une place de marché. Alors, vous n'achetez donc pas ce document chez nous, mais auprès du vendeur Bi0med. Stuvia facilite les paiements au vendeur.

Est-ce que j'aurai un abonnement?

Non, vous n'achetez ce résumé que pour €5,98. Vous n'êtes lié à rien après votre achat.

Peut-on faire confiance à Stuvia ?

4.6 étoiles sur Google & Trustpilot (+1000 avis)

80467 résumés ont été vendus ces 30 derniers jours

Fondée en 2010, la référence pour acheter des résumés depuis déjà 14 ans

Commencez à vendre!

Populaire universiteiten

Populaire hogescholen

Populaire studieboeken voor Communicatie en Taal

Populaire studieboeken voor Economie en Bedrijf

Populaire studieboeken voor Exact en Informatica

Populaire studieboeken voor Gedrag en Maatschappij

Populaire studieboeken voor Gezondheid en Geneeskunde

Populaire studieboeken voor Recht en Bestuur

Resume

Practical 3: Advanced Data Analysis: full summary + explanations

Infos sur le Document

Sujets

École, étude et sujet

1 vérifier

Vendeur

Avis reçus

Aperçu du contenu

Les avantages d'acheter des résumés chez Stuvia:

Qualité garantie par les avis des clients

L’achat facile et rapide

Focus sur l’essentiel

Foire aux questions

Qu'est-ce que j'obtiens en achetant ce document ?

Garantie de remboursement : comment ça marche ?

Auprès de qui est-ce que j'achète ce résumé ?

Est-ce que j'aurai un abonnement?

Peut-on faire confiance à Stuvia ?