I wrote these notes for the course Big Data in Biomedical Sciences. I tried to include all the important information. Wherever you see (EXAM QUESTION), that topic came up in my exam. I got an 8.1; I hope it helps others too!
BIG DATA NOTES
GENETICS/ GWAS
Issues:
1. The relative influence of genes and environment is still not resolved (more reliable estimates are needed).
MZ twins share 100% of their genes, 100% of the shared environment and 0% of the non-shared environment.
DZ twins share on average 50% of their genes, 100% of the shared environment and 0% of the non-shared environment. (EXAM QUESTION)
If a trait is 100% heritable (additive genetic effects only), then Rmz = 2Rdz.
2. The nature of the genetic effects is also still under debate (additive vs non-additive).
3. The same holds for determining the causal mechanism (detection and interpretation).
Heritability is the proportion of trait variance attributable to genetic variance (the extent to which
observed individual differences can be traced back to genetic differences).
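The twin sharing percentages above can be turned into a rough heritability estimate. A minimal sketch, assuming the standard additive twin model (Rmz = A + C, Rdz = 0.5A + C), often called Falconer's formula; the formula name and the function name `falconer` are not from the lecture and are used here for illustration only:

```python
def falconer(r_mz, r_dz):
    """Estimate variance components from twin correlations.

    Assumes additive genetic effects only:
      r_mz = A + C        (MZ twins share 100% of genes and shared environment)
      r_dz = 0.5 * A + C  (DZ twins share on average 50% of genes)
    Returns (A, C, E): heritability, shared and non-shared environment.
    """
    a = 2 * (r_mz - r_dz)   # heritability
    c = 2 * r_dz - r_mz     # shared environment
    e = 1 - r_mz            # non-shared environment (plus measurement error)
    return a, c, e

# Fully heritable trait: Rmz = 2 * Rdz
print(falconer(1.0, 0.5))  # (1.0, 0.0, 0.0)
```

With Rmz = 1.0 and Rdz = 0.5 everything is attributed to genes, which matches the rule Rmz = 2Rdz for a 100% heritable trait.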
DNA facts:
- Each cell contains 23 pairs of chromosomes, carrying the heritable blueprint of life.
- Each single chromosome is a DNA molecule.
- A DNA molecule consists of sequences of nucleotides (ACGT).
- The 23 chromosomes together contain ~3 x 10^9 nucleotides, ‘completely’ sequenced (in 2002).
- A sequence of 3 bases is a codon, and each codon serves as the recipe for one amino acid (or a stop signal).
- Multiple codons together enclosed in transcription start and end sites are called genes and
provide blueprints for proteins.
- One chromosome consists of both non-genic (~90%) and genic regions (~10%).
- Humans have in total 22,000-24,000 genes. (EXAM QUESTION)
- Not every gene is expressed in every cell.
- The specific set of genes that is expressed in a cell determines the cell type.
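To make the codon idea concrete, here is a minimal sketch that cuts a DNA sequence into 3-base codons and looks each one up. The table is only a five-entry excerpt of the standard genetic code (the full table has 64 codons), and the function names and the example sequence are made up for illustration:

```python
# Small excerpt of the standard genetic code (DNA codons), for illustration only.
CODON_TABLE = {
    "ATG": "Met",   # also the usual start codon
    "TTT": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "TAA": "Stop",
}

def split_codons(dna):
    """Split a sequence into consecutive 3-base codons, dropping an incomplete tail."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]

def translate(dna):
    """Translate codon by codon until a stop codon; unknown codons become '?'."""
    protein = []
    for codon in split_codons(dna):
        amino_acid = CODON_TABLE.get(codon, "?")
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

print(translate("ATGTTTGGCAAATAA"))  # ['Met', 'Phe', 'Gly', 'Lys']
```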
Humans share: 87.5% of their DNA with mice, 99% with chimps and 99.9% with other humans. (EXAM QUESTION)
Genetic variations can be: harmless, harmful, latent or silent. Causes (EXAM QUESTION):
- Mutation: at the level of base pairs (occurs by accident, e.g. when DNA is replicated).
o Monogenic disorders: influenced by one gene, most genetic causes already known.
o Polygenic disorders: influenced by multiple genes, causes mostly unknown, often
complex (G + E).
- Recombination: at the level of parts of the chromosome (crossing over).
- Segregation: at the level of the combination of chromosomes.
Candidate gene study: focuses on a very small subset of genes.
GWAS: tests for association at SNPs across the whole genome (~1 million SNPs). Microarrays can now contain more
than 1 million tagging SNPs covering the genome in high density.
Advantages:
- May identify several possible loci because it spans the whole genome.
- Relationships between loci may identify new biological pathways.
- Results from multiple studies can be integrated, aiding the prioritization of genes for replication
and increasing statistical confidence.
Disadvantages:
- Increased likelihood of false positives because you do multiple testing.
- Population stratification.
- Large number of samples needed.
- Vast amounts of data analysed (need cluster computers) and produced.
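The multiple testing problem can be shown with simple arithmetic. A minimal sketch, assuming a Bonferroni correction (a common choice, not named in the lecture) and the ~1 million SNPs mentioned above; the function names are hypothetical:

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of false positives if every test is truly null."""
    return alpha * n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the overall error rate near alpha."""
    return alpha / n_tests

n_snps = 1_000_000  # roughly one test per SNP on the array

print(expected_false_positives(0.05, n_snps))  # tens of thousands of chance hits
print(bonferroni_threshold(0.05, n_snps))      # the much stricter per-SNP threshold
```

Testing 1 million SNPs at p < 0.05 would give about 50,000 hits by chance alone, which is why the per-SNP threshold has to be around 5 x 10^-8.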
Every point above the threshold of a Manhattan plot is evidence for association. Every dot in this graph
represents the outcome of one single test for an allele frequency difference of one variant for one trait.
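One dot of a Manhattan plot can be sketched as follows. This assumes a plain 2x2 chi-square test on allele counts in cases versus controls (one of several tests used in practice) and non-zero row and column totals; the allele counts are made up, the function names are hypothetical, and the 5 x 10^-8 threshold is the usual convention rather than something stated in the lecture:

```python
import math

def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Chi-square test (1 df, no continuity correction) on a 2x2 table of allele counts.

    Rows: cases / controls. Columns: allele A / allele B.
    Returns (chi2, p_value).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def manhattan_height(p_value):
    """Height of a dot on the Manhattan plot: -log10(p)."""
    return -math.log10(p_value)

# Made-up counts: allele A is 60% in cases and 50% in controls
chi2, p = allelic_chi2(600, 400, 500, 500)
threshold = manhattan_height(5e-8)  # conventional genome-wide line, about 7.3
print(chi2, p, manhattan_height(p) > threshold)
```

In this made-up example the SNP has a small p-value but still stays below the genome-wide line, which shows why such large samples are needed.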
The majority of human complex traits are probably caused by thousands of genes of very small effect, so huge
sample sizes are needed. GWAS have so far detected only a fraction of the genetic variance (<2%).
4 issues with GWAS:
- GWAS hits for polygenic traits mostly outside genes, or in non-coding genic regions, with likely
regulatory functions that are currently unknown.
- GWAS hits for polygenic traits have small effects.
- SNPs are correlated which complicates pinpointing the causal SNP.
- There are hundreds of genes involved in polygenic traits; a single gene will not provide the whole
picture.
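The correlation between SNPs (linkage disequilibrium) mentioned in the third issue is often summarised as r^2. A minimal sketch, assuming the haplotype frequencies are known; the r^2 measure and the function name `ld_r2` are not from the lecture, and the frequencies are made up:

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1 (must be strictly between 0 and 1)
    p_b:  frequency of allele B at SNP 2 (must be strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # deviation from independence
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

print(ld_r2(0.30, 0.3, 0.3))  # alleles always occur together: r^2 close to 1
print(ld_r2(0.15, 0.3, 0.5))  # alleles independent: r^2 close to 0
```

When r^2 is close to 1, the two SNPs give almost the same association signal, so the statistics alone cannot tell which one is causal.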
Functional categories of SNPs:
- Protein coding
o SNPs in exonic regions may alter protein structure and/or function, e.g. nonsense SNPs
or missense SNPs.
- Splicing regulation
o SNPs in splice sites may disrupt splicing regulation, resulting in exon skipping or intron
retention.
o They can also interfere with alternative splicing regulation by changing exonic splicing
enhancers or silencers.
- Transcriptional regulation
o SNPs in transcription regulatory regions can alter binding sites, and thus disrupt proper
gene regulation.
- Post-translational modification
o SNPs in protein-coding regions may alter post-translational modification sites.