I wrote these notes for the course Big Data in Biomedical Sciences. I tried to write all the important information. Everywhere you read (EXAM QUESTION), this subject literally came back in my exam. I got an 8.1, I hope it helps others too!
BIG DATA NOTES
GENETICS/ GWAS
Issues:
1. Relative influence of genes and environment still not resolved (need more reliable estimates).
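MZ twins share 100% of their genes, 100% of the shared environment and 0% of the non-shared environment.
DZ twins share on average 50% of their genes, 100% of the shared environment and 0% of the non-shared environment. (EXAM QUESTION)
Trait 100% heritable (additive genes only)? Then Rmz = 2Rdz.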
2. Nature of genes also still under debate (additive vs non-additive).
3. Same for determination of causal mechanism (detection and interpretation).
Heritability is the proportion of trait variance attributable to genetic variance (the extent to which
observed individual differences can be traced back to genetic differences).
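The twin sharing percentages above can be turned into a rough heritability estimate. The sketch below uses Falconer's formula, h2 = 2(Rmz - Rdz), which is a standard textbook approach and not something stated in these notes. The function names and the example correlations are made up for illustration, and the formula assumes purely additive genetic effects.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate: h2 = 2 * (Rmz - Rdz), assuming additive genes."""
    return 2.0 * (r_mz - r_dz)


def shared_environment(r_mz: float, r_dz: float) -> float:
    """Shared-environment share under the same assumptions: c2 = 2 * Rdz - Rmz."""
    return 2.0 * r_dz - r_mz


# Fully heritable additive trait: Rmz = 1.0 and Rdz = 0.5, so Rmz = 2Rdz.
h2_full = falconer_heritability(1.0, 0.5)

# Hypothetical twin correlations, for illustration only.
h2_example = falconer_heritability(0.8, 0.5)
c2_example = shared_environment(0.8, 0.5)

print(f"h2 (fully heritable case): {h2_full:.2f}")
print(f"h2 (example): {h2_example:.2f}, c2 (example): {c2_example:.2f}")
```

With Rmz = 1.0 and Rdz = 0.5 the estimate comes out at 1, which matches the rule Rmz = 2Rdz for a 100% heritable trait.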
DNA facts:
- Each cell contains 23 pairs of chromosomes, carrying the heritable blueprint of life.
- Each single chromosome is a DNA molecule.
- A DNA molecule consists of sequences of nucleotides (ACGT).
- The 23 chromosomes together contain ~3 x 10^9 nucleotides, ‘completely’ sequenced (in 2003).
- A sequence of 3 bases is a codon and particular sequences serve as a recipe for an amino acid.
- Multiple codons together enclosed in transcription start and end sites are called genes and
provide blueprints for proteins.
- One chromosome consists of both non-genic (~90%) and genic regions (~10%).
- Humans have in total 22,000-24,000 genes. (EXAM QUESTION)
- Not every gene is expressed in every cell.
- The specific set of genes that is expressed in a cell determines the cell type.
Humans share 87.5% of their DNA with mice, 99% with chimpanzees and 99.9% with other humans. (EXAM QUESTION)
Genetic variations can be harmless, harmful, latent or silent. Causes (EXAM QUESTION):
- Mutation: at the level of base pairs (occurs by accident, e.g. when DNA is replicated).
o Monogenic disorders: influenced by one gene, most genetic causes already known.
o Polygenic disorders: influenced by multiple genes, causes mostly unknown, often complex (G + E).
- Recombination: at the level of parts of the chromosome (crossing over).
- Segregation: at the level of the combination of chromosomes.
Candidate gene study: focuses on a very small subset of genes.
GWAS: tests SNPs across the whole genome (~1 million SNPs). Microarrays can now contain more
than 1 million tagging SNPs covering the genome in high density.
Advantages:
- May identify several possible loci because it spans the whole genome.
- Relationships between loci may identify new biological pathways.
- Results from multiple studies can be integrated, aiding the prioritization of genes for replication
and increasing statistical confidence.
Disadvantages:
- Increased likelihood of false positives because you do multiple testing.
- Population stratification.
- Large number of samples needed.
- Vast amounts of data analysed (need cluster computers) and produced.
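To see why multiple testing inflates false positives, here is a small arithmetic sketch. It assumes one test per SNP for 1 million SNPs and a per-test significance level of 0.05. The Bonferroni correction used here is a common choice, but it is an assumption and not something these notes prescribe.

```python
n_tests = 1_000_000  # roughly one test per SNP, as in the notes
alpha = 0.05         # assumed per-test significance level

# If no SNP is truly associated, about alpha * n_tests tests still pass by chance.
expected_false_positives = n_tests * alpha

# Bonferroni correction: divide alpha by the number of tests.
bonferroni_threshold = alpha / n_tests

print(f"Expected false positives without correction: {expected_false_positives:,.0f}")
print(f"Bonferroni-corrected threshold per SNP: {bonferroni_threshold:.1e}")
```

The corrected threshold is very strict, which is one reason why such a large number of samples is needed.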
Every point above the threshold of a Manhattan plot is evidence for association. Every dot in this graph
represents the outcome of one single test for allele frequency difference of one variant per trait.
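A single dot in a Manhattan plot can be sketched as one allelic chi-square test. The example below is a minimal illustration, not the method used in the course: the allele counts are invented, the function name is hypothetical, and the genome-wide threshold of 5e-8 is a commonly used convention that is assumed here.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented counts: 1000 cases and 1000 controls, so 2000 alleles per group.
chi2, p_value = allelic_chi_square(600, 1400, 500, 1500)

# A Manhattan plot shows -log10(p) on the y-axis.
manhattan_y = -math.log10(p_value)
threshold_y = -math.log10(5e-8)  # assumed genome-wide threshold

print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
print(f"-log10(p) = {manhattan_y:.2f}, threshold = {threshold_y:.2f}")
print("above threshold" if manhattan_y > threshold_y else "below threshold")
```

In this invented example the allele frequencies differ (30% vs 25%), but the dot still stays below the threshold, which shows how strict the genome-wide cut-off is.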
The majority of human complex traits are probably caused by thousands of genes of very small effect, so huge
sample sizes are needed. GWAS have only detected a fraction of the genetic variance (<2%).
4 issues with GWAS:
- GWAS hits for polygenic traits mostly outside genes, or in non-coding genic regions, with likely
regulatory functions that are currently unknown.
- GWAS hits for polygenic traits have small effects.
- SNPs are correlated which complicates pinpointing the causal SNP.
- There are hundreds of genes involved in polygenic traits, so a single gene will not provide the whole
picture.
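The point about correlated SNPs can be illustrated with a small sketch. It computes the squared Pearson correlation between genotype vectors (0, 1 or 2 copies of an allele per person) as a rough stand-in for linkage disequilibrium. The genotypes and SNP names are invented, and proper LD measures are usually computed on haplotypes, so treat this as an approximation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented genotypes for 8 people (number of copies of the minor allele).
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [0, 1, 2, 1, 0, 2, 0, 0]  # nearly identical to snp_a
snp_c = [1, 0, 1, 2, 0, 1, 1, 2]  # unrelated to snp_a

r2_ab = pearson_r(snp_a, snp_b) ** 2
r2_ac = pearson_r(snp_a, snp_c) ** 2

print(f"r2 between SNP A and SNP B: {r2_ab:.2f}")
print(f"r2 between SNP A and SNP C: {r2_ac:.2f}")
```

If SNP A is causal, SNP B will show almost the same association signal, so the GWAS hit alone cannot tell which of the two is the causal SNP.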
Functional categories of SNPs:
- Protein coding
o SNPs in exonic regions may alter protein structure and/or function, e.g. nonsense SNPs
or missense SNPs.
- Splicing regulation
o SNPs in splice sites may disrupt splicing regulation, resulting in exon skipping or intron
retention.
o They can also interfere with alternative splicing regulation by changing exonic splicing
enhancers or silencers.
- Transcriptional regulation
o SNPs in transcription regulatory regions can alter binding sites, and thus disrupt proper
gene regulation.
- Post-translational modification
o SNPs in protein-coding regions may alter post-translational modification sites.
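The difference between nonsense and missense SNPs in the protein coding category can be sketched with a codon lookup. The code below is only an illustration: the codon table is a small part of the standard genetic code, and the function name and examples are made up. A synonymous SNP (same amino acid) is included for contrast, even though the notes do not list it.

```python
# Partial codon table (DNA codons); the full genetic code has 64 entries.
CODON_TABLE = {
    "ATG": "Met", "TGG": "Trp",
    "TTT": "Phe", "TTC": "Phe",
    "TTA": "Leu", "TTG": "Leu",
    "TAT": "Tyr", "TAC": "Tyr",
    "TGT": "Cys", "TGC": "Cys",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def classify_coding_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change inside a codon (position is 0, 1 or 2)."""
    mutated = codon[:position] + alt_base + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]    # KeyError if codon is not in the partial table
    alt_aa = CODON_TABLE[mutated]
    if ref_aa == "Stop":
        raise ValueError("reference stop codons are outside this sketch")
    if alt_aa == ref_aa:
        return "synonymous"  # same amino acid, protein unchanged
    if alt_aa == "Stop":
        return "nonsense"    # premature stop codon, truncated protein
    return "missense"        # different amino acid


print(classify_coding_snp("TAT", 2, "C"))  # TAT -> TAC, Tyr stays Tyr
print(classify_coding_snp("TAT", 2, "A"))  # TAT -> TAA, Tyr becomes Stop
print(classify_coding_snp("TGT", 1, "T"))  # TGT -> TTT, Cys becomes Phe
```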