Molecular Genetics I

Please use these questions for revision only. If you want to use the questions on your smartphone, a file with the following fields will be uploaded:

  • Q: The question
  • A: The answer
  • C: The topic

And here is a pre-formatted file for AnyMemo for Android. The latter program is GPLed, so it should always be free for you and you can check the source code. Please note that I didn't check it much, apart from confirming that the file loads and that the first few answers seem to be OK.

File for AnyMemo

NEW File in tab format




Using the questions you should be able to revise MOST of the knowledge.


(The quiz covers by far the largest part of the factual knowledge that is examined. However, also be prepared for techniques, and think about how the topics are connected.)


Lecture 1 Introduction

How can the genome size of an organism in MB be estimated using a simple experiment?


This can be achieved via the C-value, i.e. the amount of DNA per haploid nucleus 1/47
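
As a rough sketch of the arithmetic behind this answer: a C-value measured in picograms can be converted to megabases with the commonly used approximation of about 978 Mb per pg of DNA. The function name and the example value of 3.5 pg are illustrative assumptions and are not taken from the lecture.

```python
# Approximate conversion factor: 1 pg of double-stranded DNA is about 978 Mb.
MB_PER_PG = 978.0


def c_value_to_mb(c_value_pg: float) -> float:
    """Convert a haploid DNA content (C-value, in pg) to a genome size in Mb."""
    if c_value_pg < 0:
        raise ValueError("C-value must be non-negative")
    return c_value_pg * MB_PER_PG


if __name__ == "__main__":
    # Hypothetical measurement of 3.5 pg per haploid nucleus.
    print(round(c_value_to_mb(3.5)), "Mb")
```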



Are there genomes larger than the human genome?


There are many, even in relatively "simple" organisms 1/53



If you compare our ability to sequence cheaply how do you think that will affect (bio) informatics?


Sequencing will become (is) cheaper than storage, so computer science has to be pushed to find new solutions. (An often-made suggestion is to throw the data away, but then the cost of the experiment is not taken into account.) E.g. 1/24



Lecture 2 Gene Structure

Name at least four different RNA species produced in the eukaryotic cell?


mRNA, rRNA, tRNA, snRNA, miRNA, ... 2/4



How does an RNA polymerase "know" which strand to read from?


this depends on the promoter (more detailed answer needed!) 2/9



Describe a very basal sigma70 E. coli promoter and name at least one consensus sequence?


-35 and -10 elements, located about 35 and 10 bases upstream of the transcription start site, with the consensus sequences TTGACA (-35) and TATAAT (-10). Some promoters have an additional UP element 2/10



Explain a sequence logo based on frequency?


See picture, you would rather see a figure and would have to explain it 2/15



Understand a sequence logo based on information content?


See picture, you would rather see a figure and get an idea on how to interpret it. It is enough to have a rough idea about pseudocounts 2/15
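
To get a feeling for the difference between a frequency-based and an information-based logo, the following minimal sketch computes the information content of a single alignment column as log2(4) minus the Shannon entropy, optionally with a pseudocount. The function name and the pseudocount handling are assumptions for illustration, and the small-sample correction used by some logo programs is left out.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def column_information(column: str, pseudocount: float = 0.0) -> float:
    """Information content (in bits) of one alignment column of DNA bases."""
    counts = Counter(column.upper())
    total = sum(counts[base] for base in ALPHABET) + pseudocount * len(ALPHABET)
    if total == 0:
        raise ValueError("column contains no A, C, G or T")
    entropy = 0.0
    for base in ALPHABET:
        p = (counts[base] + pseudocount) / total
        if p > 0:
            entropy -= p * math.log2(p)
    # Maximum possible entropy for four bases is log2(4) = 2 bits.
    return math.log2(len(ALPHABET)) - entropy


if __name__ == "__main__":
    for col in ("AAAA", "AACC", "ACGT"):
        print(col, round(column_information(col), 3))
```

A fully conserved column reaches 2 bits, a column with all four bases at equal frequency gives 0 bits, and a pseudocount pulls the value of a conserved column below 2 bits.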



Which fraction of the eukaryotic genes seem to contain a TATA box?


only about a third, but almost all might have a TATA or TATA like box 2/21



Depict a eukaryotic promoter with three typical upstream elements without consensus sequences?


see figure 2/24



What does TBP (as part of TFIID) stand for?


TATA binding protein 2/25



Explain the sequence for eukaryotic gene transcription including TFII general TFs?


see 2/28ff



Do you find more information about the sequence in the major or minor groove of the DNA?


in the major groove



Explain a gel retardation assay?


Fragments are incubated with and without nuclear proteins that might bind. In the sample containing a potential binding protein, a band is “retarded”, i.e. it moves to an apparently higher size 2/39



Explain a DNA footprinting assay?


DNA fragments are treated with and without DNA binding proteins; afterwards the fragments are digested with DNase. Bound proteins protect the DNA, so a footprint is visible on the gel 2/40



You have found a likely promoter fragment, how would you identify a binding protein?


Explain either 2/47 affinity purification, 2/47 clone based and/or 2/48 Y1H



How can you define the binding site of a transcription factor?


Explain either SELEX 2/41 and/or PBM 2/42



How can you find (alternative) splicing events using only in-silico public data?


ESTs can be mapped onto the genome and splice sites be found 2/58f



Describe what ESTs are?


See figure 2/59 for an overview



Draw a scheme of what a (eukaryotic) gene might look like and what we can identify?


Figure at the end of the lecture: Cis Elements, core cis Elements, Introns, alternative Exons, Exons, splice modifiers e.g. ISE



Lecture 3A Measuring Transcripts

If you want to measure the amount of RNA by means of q-RT-PCR what is an important first step?


The RNA needs to be reverse transcribed into cDNA first, e.g. using an oligo dT primer 3A/2-3



How can classical PCR be used to get an idea about DNA amount?


For each sample a “constitutive” product is monitored as well. The PCR reaction is stopped after various cycle numbers (or just at one cycle number). Alternatively, competitive PCR can be used, see question 3A/6



What is the problem with the reference gene approach for determining RNA amount?


The reference gene is used to standardize the amount of RNA, this would assume that the expression of the reference gene is constant under the assayed conditions, which might not always be the case 3A/6-7



Describe competitive PCR?


A standard which is very similar to the target (e.g. a slightly different length but using the same primers!) is added in known amounts to the reaction mixture. Where the band intensities are equal, one can determine the amount 3A/8



Describe the SYBR Green principle


SYBR Green (like ethidium bromide) intercalates in double stranded DNA, which is produced by PCR; the more dsDNA is made, the higher the fluorescence. The reaction mix is almost like normal PCR + SYBR Green 3A/10



Describe the TaqMan principle


Probes which hybridize to ssDNA are added to the PCR mix. These carry two dyes, a reporter and a quencher; the quencher suppresses the signal by resonance energy transfer as long as both dyes are close together. Once the polymerase comes by, the probe is degraded, so the dyes are far from each other => signal increases 3A/9-10



What is the CT value in real time qRT-PCR?


The number of cycles that have been run when a certain fluorescence threshold has been reached 3A/9



Explain the Delta CT method for real time qRT-PCR data evaluation?


The Ct values of the gene of interest and a "standard" are simply subtracted 3A/13
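
The subtraction can be sketched as follows. The conversion of Delta Ct into a relative abundance assumes that both amplicons amplify with the same efficiency (a factor of 2 per cycle means perfect doubling); the function names and the example Ct values are illustrative assumptions, not part of the lecture.

```python
def delta_ct(ct_gene: float, ct_reference: float) -> float:
    """Delta Ct: Ct of the gene of interest minus Ct of the reference gene."""
    return ct_gene - ct_reference


def relative_abundance(ct_gene: float, ct_reference: float,
                       efficiency: float = 2.0) -> float:
    """Abundance of the gene of interest relative to the reference.

    `efficiency` is the amplification factor per cycle (2.0 = perfect
    doubling) and is assumed to be identical for both amplicons.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    return efficiency ** (-delta_ct(ct_gene, ct_reference))


if __name__ == "__main__":
    # Hypothetical Ct values: gene of interest 24, reference gene 20.
    print(delta_ct(24.0, 20.0), relative_abundance(24.0, 20.0))
```

A gene that crosses the threshold four cycles later than the reference is, under these assumptions, present at one sixteenth of the reference level.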



Name at least 6 golden rules for real time qRT-PCR?


At least three BIOLOGICAL reps, High quality RNA, Digest RNA with DNAse I to remove genomic DNA, Use a good reverse transcriptase with no RNase H activity, Test cDNA, Design good primers, Standardize and use a master mix, Test at least 4 potential reference genes, Perform real time PCR on test and reference genes at the same time (and use melting curves), One should determine the best reference genes, Calculate transcript abundance using efficiency 3A/18



Explain the workflow for a two color microarray?


See figure 3A/26



Explain how Affymetrix microarrays are made (specifically the photolithographic process)?


The wafer is silanated (start), a linker is added, UV light deprotects the linker except where a mask shields it from the UV light, a protected nucleotide is added, then UV exposure through a mask and nucleotide addition are repeated; finally a capping reagent is added 3A/30-33



What are PM and MM probes on the Affymetrix chip?


Perfect match and mismatch probes: the PM are complementary, the MM have a mismatch at the 13th of 25 nucleotides 3A/35,36



Why is array normalization necessary?


This is necessary in order to compare arrays and thus samples, as one might have different amounts of RNA extracted, a different laser intensity, different washing conditions etc.



Which major assumption underlies most array normalization procedures?


One big assumption is that not too many genes are changing, or in other words most genes stay more or less the same 3A/44
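
One very simple way to act on this assumption is to shift each array so that its median log intensity becomes zero; if most genes do not change, the medians of different arrays should then line up. This is only a minimal sketch of median centring with made-up numbers, not the normalization procedure presented in the lecture.

```python
from statistics import median


def median_center(log_intensities):
    """Subtract the array median from every log2 intensity of one array."""
    if not log_intensities:
        raise ValueError("array has no probes")
    centre = median(log_intensities)
    return [value - centre for value in log_intensities]


if __name__ == "__main__":
    # Hypothetical log2 intensities; array_b is array_a scanned "brighter".
    array_a = [8.0, 9.0, 10.0, 12.0, 7.0]
    array_b = [value + 1.5 for value in array_a]
    print(median_center(array_a))
    print(median_center(array_b))
```

After centring, the two hypothetical arrays become identical, because they differed only by a global offset.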



How can one use the different probes on an Affymetrix array to determine RNA quality?


One knows where the probes are on a gene and can order them 5' to 3'. Synthesis often starts from the 3' poly-A tail, so one can see how the signal changes from 3' to 5' 3A/56



Lecture 3B Sequencing

Describe the principle of Sanger sequencing?


Dideoxy method: fluorescently labelled ddNTPs are used, extension stops when one of them is incorporated, and the mix is separated on a gel or by capillary electrophoresis (CE). For a proper answer see 3B/8-15



What do you see in a sequence trace?


The intensities for the different dyes (translates to TCGA) at each position, from the peaks the sequence can be read 3B/15



What is a Phred score and is it linear?


A base quality score. No, it is not linear but logarithmic: an increase of 10 units decreases the error rate 10-fold 3B/15
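
The logarithmic relationship is Q = -10 * log10(P), where P is the probability that the base call is wrong. A minimal sketch of both directions of the conversion, with illustrative function names:

```python
import math


def phred_to_error_probability(quality: float) -> float:
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability: float) -> float:
    """Phred score corresponding to a given error probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(probability)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_probability(q))
```

So Q20 corresponds to one expected error in 100 bases and Q30 to one in 1000.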



Describe the principle of Pyrosequencing?


Nucleotides are flowed in one after the other. Upon incorporation PPi is released, which is converted by sulfurylase to ATP, which in turn is used by luciferase to produce light. Apyrase degrades unused nucleotides. For a proper answer see 3B/17-25



What causes a problem for precise Pyrosequencing (i.e. where would you likely find sequencing errors)?


Homopolymers of the same Base, while the produced light gets more intense with more PPi released, it is difficult to distinguish longer stretches 3B/16-27



Which deoxynucleotides are used in Pyrosequencing and why?


dGTP, dTTP, dCTP and dATP-alpha-S. The latter is used instead of dATP because it is not recognized by luciferase but can still be used by the polymerase 3B/27



Describe the principle behind the Ion torrent sequencing?


Nucleotides are flowed in in a sequence, as they become incorporated H+ is released this is detected by sensors 3B/36



What causes a problem for precise Ion Torrent sequencing (i.e. where would you likely find sequencing errors)?


Homopolymers of the same base, while the produced H+ gets more intense with more bases incorporated in each cycle, it is difficult to distinguish longer stretches for a proper answer see 3B/35



Describe the principle behind Illumina sequencing (w.o. bridge amplification)?


Nucleotides are blocked and fluorescently labeled, all are flowed in at the same time, the cell is imaged, the block and label removed and a next cycle commences 3B/42



Name two ways to assemble a genome and describe them briefly?


Overlap/Layout/Consensus 3B/49ff and k-mer/de Bruijn graph based 3B/55ff



Show how the assembly of these three reads would proceed in a de Bruijn based assembler if the size of the pieces used (k) were 4 and 3: ATCGAG, TATAAT, GAGTAT?


Only k = 4 is shown, as this is easy to display: ATCG=>TCGA=>CGAG, TATA=>ATAA=>TAAT, GAGT=>AGTA=>GTAT. With k = 3 only one component is generated in the graph
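
The answer can be checked with a small sketch. It assumes the same convention as the display above: every k-mer is a node, and an edge links two k-mers only if they follow each other within a read. Other de Bruijn formulations (for example, linking any two k-mers that overlap by k-1 bases) may connect more nodes. All function names are illustrative.

```python
def kmers(read, k):
    """All overlapping pieces of length k, in the order they occur in the read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_graph(reads, k):
    """Nodes are k-mers; an edge links k-mers that are consecutive in a read."""
    graph = {}
    for read in reads:
        pieces = kmers(read, k)
        for piece in pieces:
            graph.setdefault(piece, set())
        for left, right in zip(pieces, pieces[1:]):
            graph[left].add(right)
    return graph


def count_components(graph):
    """Count connected components, ignoring the direction of the edges."""
    undirected = {node: set(neighbours) for node, neighbours in graph.items()}
    for node, neighbours in graph.items():
        for neighbour in neighbours:
            undirected[neighbour].add(node)
    seen = set()
    components = 0
    for start in undirected:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(undirected[node] - seen)
    return components


if __name__ == "__main__":
    reads = ["ATCGAG", "TATAAT", "GAGTAT"]
    for k in (4, 3):
        print("k =", k, "components:", count_components(build_graph(reads, k)))
```

With k = 4 no k-mer is shared between reads, so the three chains stay separate; with k = 3 the shared pieces GAG and TAT join all reads into one component.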



What is scaffolding; what is it good for and how is it often done?


Bringing contigs together, i.e. joining them using e.g. pairs of reads originating from the two ends of a stretch of DNA whose middle is unknown. The unknown part is padded with Ns. This is done to get fewer and larger pieces 3B/58ff



Lecture 3C Measuring Transcripts

What is GO and how can it be used to understand array data?


This is the Gene ontology, an ontology to classify genes. One can use it to look for groups of genes which are commonly up or down regulated 3C



What is a controlled vocabulary and why is it useful?


A controlled vocabulary provides only a certain set of "controlled" words that should be used to describe something. As these are controlled, there is no ambiguity and we know what we are speaking of. 3C



What is an ontology and how is it useful?


Basically a summary of lecture 3C



Be prepared to interpret a tree?


be mindful about the topology 3C/18



Be careful 3C doesn't bring in a lot of things but can be used to make sense of data.





How can correlation be used to learn something about genes?


If genes are correlated across a large number of arrays, they MIGHT have a similar function 3C/25
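
As an illustration of what "correlated across arrays" means, the following sketch computes the Pearson correlation between the expression profiles of two genes. The expression values are made up, and Pearson correlation is just one possible choice of measure, not necessarily the one used in the lecture.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical log expression of two genes across five arrays.
    gene_a = [5.1, 6.3, 7.8, 6.0, 9.2]
    gene_b = [4.9, 6.0, 8.1, 6.2, 9.0]
    print(round(pearson(gene_a, gene_b), 3))
```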



Lecture 4 Transposons

Be prepared to interpret Figure 4/2 showing a C0T graph?





What are transposable elements (transposons)?


E.g. Transposable elements (transposons) are fragments of DNA that can move around in the genome and insert at target sites 4/3



If you looked at Drosophila and compared it to mammals, do you think a lot of its genome is derived from transposons?


No 4/3



Do you think a lot of new mutations in Drosophila are caused by transposons?


Yes indeed 4/3



What are Class II/Class I elements?


Elements that replicate via DNA (II DNA transposons) or RNA (I, or Retrotransposons) intermediates. 4/4



Give at least 3 examples of large DNA transposon families having terminal inverted repeats (the "normal" case) ?


Mutator, Merlin, PiggyBac, P, Transib, PIF-Harbinger, hAT, TC1-Mariner, ... 4/9



Explain how Class II transposons with terminal inverted repeats propagate (the molecular biology); a scheme would be helpful.


See 4/12ff



When one looks at moving transposons of Class II, one can sometimes observe a footprint, what is that ?


After insertion of a DNA transposon the target site is duplicated; after excision this duplication remains 4/14-15



When one looks at moving transposons of Class II, what is a precise excision ?


After excision of a transposon, homology directed repair using a transposon free site restores the state before the transposon inserted, i.e. no footprint 4/14



What are helitrons ?


Helitrons are Class II transposons that seem to replicate via a rolling circle mechanism. In any case they are characterized by the RepHel protein 4/23ff



What is maverick?


An odd transposon potentially related to DNA viruses 4/30



Are transposons only bad for the host?


No, some transposons have been domesticated, where the DNA binding domain is often reused (they might also contribute to the plasticity of the genome) 4/33ff



When a transposon is domesticated which function is often used?


The DNA binding function 4/33ff



Give three examples of LTR family transposons and give the class (I or II) they belong to?


Class I: Copia, Gypsy, Bel-Pao, Retrovirus, ERV 4/43



Draw a scheme how non LTR elements might replicate?


See 4/49



Lecture 5 small RNAs

What kind of small RNAs exist in bacteria?


Riboswitches, protein-binding sRNAs, base-pairing sRNAs (cis- and trans-encoded), CRISPR RNAs



Explain the difference of cis- and trans-encoded base-pairing sRNAs.


cis-encoded: located on the antisense strand of the target RNA, perfect complementarity to target mRNA, act negatively (mRNA degradation, Transcription termination)



Draw a scheme of a CRISPR array.


See 5/9 , important features: DNA repeats, Unique spacers, CAS genes



How do CRISPR RNAs provide resistance against bacteriophages?


During phage infection, new spacers corresponding to sequences of this phage are integrated into existing CRISPR arrays.



What is the difference of miRNAs and siRNAs?


miRNAs: encoded in the genome by specific MIR genes; regulate developmental and physiological events; two Cleavage processes (Drosha, Dicer)



Draw a scheme of the generation of miRNAs.


See 5/20



What proteins are necessary for the generation of miRNA (with function)?


Drosha (cleaves pri(mary)-miRNA to pre-miRNA), Dicer (cleaves pre-miRNA to miRNA duplex), RISC complex (removal of one strand of miRNA, binding of complementary target mRNA, degradation of target mRNA)



What is the structure and function of ARGONAUTE proteins?


Structure: PAZ domain for recognition of the 3' end of the miRNA; RNase domain for cleavage of the target RNA



What are the differences between animal and plant miRNAs (4 examples)?


See 5/29



How does viral induced gene silencing work in plants?


See 5/36



Which plant-specific proteins are essential for RNA-directed DNA methylation and what is their role?


RNA-Polymerase IV (production of siRNA) and RNA-Polymerase V (Recruitment of ARGONAUTE to DNA)



What are piRNAs and what is their function?


PIWI-interacting RNAs; Transposon control in germ cells and stem cells in animals



Describe two applications of small RNA technique.


See 5/53-55



Lecture 6 Proteomics/Interactomics

Why does it make sense to study protein accumulation?


posttranscriptional regulation, posttranslational modification, different splicing and polyA isoforms 6/3



When you compare data from proteomics and transcriptomics what can you say ?


Usually these do not completely coincide (but it is not always as bad as in 6/4)



What is DIGE, how does it work and where is it used?


Difference gel electrophoresis; it is used to compare protein amounts. Proteins are labelled with a fluorescent label and run on a 2D gel 6/7



If you were to inject a protein mixture into an MS would you likely get many meaningful results out and why ?


No, the mixture would be too complex: there would be overlapping signals due to multiple charge states and overlapping signals due to natural isotope abundance 6/9
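
Why multiple charge states lead to overlapping signals can be seen from the m/z calculation for protonated ions: m/z = (M + z * m_proton) / z. The sketch below uses an approximate proton mass and hypothetical neutral masses; the function name is an illustrative assumption.

```python
PROTON_MASS = 1.007276  # approximate mass of a proton in Da


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a molecule that carries `charge` additional protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    # A hypothetical 2000 Da peptide with two charges and a hypothetical
    # 1000 Da peptide with one charge end up at the same m/z.
    print(round(mz(2000.0, 2), 4), round(mz(1000.0, 1), 4))
```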



What is MALDI?


Matrix-assisted laser desorption/ionization. The analyte is embedded in a matrix, and a laser pulse of the right wavelength ionizes matrix and analytes 6/14



Which ion(s) will you always see in MALDI?


The ions originating from the matrix 6/14



Explain ESI?


Electrospray ionization; details from 6/15 are needed, and a drawing will help



Lecture 8 Proteomics/Interactomics

Explain TOF and quadrupoles very roughly?


Just some principle needed, e.g. a quadrupole has four rods and selects based on the m/z ratio ....



Lecture 6 Proteomics/Interactomics

How would you obtain an MS/MS spectrum on a triple quadrupole machine?


Quad 1 can be used to select an ion, Quad 2 is used as a collision cell, in quad 3 resulting fragment ions are scanned for 6/23



What would you often need for an MS^n and where can this be useful?


An ion trap based instrument useful in elucidating structure especially if MS/MS is not too meaningful 6/27



Why does one often need the genome before the proteome?


Several algorithms don't sequence peptides based on the ions, but compare the spectrum obtained to simulated spectra, and simulating spectra requires known sequences 6/31ff



Explain the SILAC workflow?


see 6/41 ff



How could you directly compare two conditions in proteomics (name at least 2)?


SILAC, iTRAQ, ICAT or digestion with H2 18O



Explain a Mass Western?


For a protein, a heavy, stable-isotope-labelled peptide is ordered and used to tune the instrument; afterwards the peptide of interest is injected together with the synthetic one and quantified 6/56



Lecture 7 modern Genetics

Explain how a genetic map is constructed?


Genetic markers are tested for linkage; linkage can be expressed in cM to relate markers to each other. More detail from 7/7ff needs to be given
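
For two closely linked markers, the map distance in cM is approximately the percentage of recombinant offspring. The sketch below uses this simple approximation without a mapping function, so it underestimates larger distances; the counts are hypothetical and the function name is illustrative.

```python
def map_distance_cm(recombinants: int, total_offspring: int) -> float:
    """Approximate map distance in cM as the percentage of recombinants."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must be between 0 and total_offspring")
    return 100 * recombinants / total_offspring


if __name__ == "__main__":
    # Hypothetical cross: 12 recombinant individuals among 200 offspring.
    print(map_distance_cm(12, 200), "cM")
```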



What is an RFLP marker and how is it usually detected?


Restriction Fragment length Polymorphism, a SNP causes the creation or deletion of a restriction site in the DNA. DNA can be cut with the restriction enzyme run on a gel, blotted and interrogated with a (radioactive) probe hybridizing to that stretch of DNA 7/23



What is an SNP marker and how would you detect it?


Single nucleotide polymorphism. E.g. using Illumina Golden gate (explain the assay!) 7/28ff



What is a physical map?


A map in which markers are assigned to physical locations on the chromosome 7/31



Explain restriction mapping and be prepared to interpret the results (constructing a physical map)?


Cut with different restriction enzymes either alone or in combination, determine the “restriction map” by combinatoric analysis 7/33



How do you separate really large chunks of DNA?


pulsed field gel electrophoresis 7/37



Explain optical mapping?


DNA is stained and immobilized (kind of stretched), then restriction enzymes are added; the DNA shrinks a bit and gaps become visible, which are then imaged 7/41



Explain STS tagging?


See 7/47



What is GWAS?


Genome-wide association study (association mapping). In a population, phenotypes are correlated to markers.... 7



Where can GWAS be used?


Usually for relatively common alleles with low penetrance/effect size 7/63



What is linkage disequilibrium?


LD is the “nonrandom association of alleles at different loci.” 7/67
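
A common way to quantify this nonrandom association for two biallelic loci is the coefficient D = p_AB - p_A * p_B, which is zero when the alleles are combined at random. The sketch below estimates D from a list of two-letter haplotypes; the data and the allele labels are made up for illustration.

```python
def ld_coefficient(haplotypes, allele_a="A", allele_b="B"):
    """D = p_AB - p_A * p_B for two-letter haplotypes such as 'AB' or 'aB'."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / n
    return p_ab - p_a * p_b


if __name__ == "__main__":
    # Hypothetical populations: complete association vs. random combination.
    linked = ["AB"] * 50 + ["ab"] * 50
    random_mix = ["AB", "Ab", "aB", "ab"] * 25
    print(ld_coefficient(linked), ld_coefficient(random_mix))
```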



What can be a pitfall when you study a large panel with GWAS?


Population structure 7/72ff



When you visualize genetic marker structure from a diverse geographic region by PCA, what can you sometimes observe?


That the structure shown reflects the geographic location somewhat (TRICKY BONUS: Explain why this might be the case) 7/75



Do you think GWAS is successful?


Based on the number of publications, it probably is 7/53



Lecture 8 Metabolomics

Why is metabolomics important?


One gets information about the cellular components, potentially their physiological state, etc.



If you compare the typical ionization in GC-MS for metabolomics to LC-MS what is the major difference?


GC-MS uses "hard" ionization, LC-MS a soft one; the latter tries to keep the molecule intact.



Explain EI ionization and where is it often used?


Electron impact: the molecule is bombarded with electrons and often "falls apart". EI is often found in GC-MS systems 9/5




Basic Genetics


How do you explain a quantitative trait by simple Mendelian genetics?

Multiple loci involved

What is heritability, genetically?

proportion of total variance within a population attributable to genetic factors
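
Written as a formula, this is H^2 = V_G / V_P. The sketch below additionally assumes that the total phenotypic variance is simply the sum of a genetic and an environmental component (V_P = V_G + V_E, no interaction or covariance terms); the variance values are hypothetical.

```python
def broad_sense_heritability(genetic_variance: float,
                             environmental_variance: float) -> float:
    """H^2 = V_G / (V_G + V_E), assuming V_P = V_G + V_E."""
    if genetic_variance < 0 or environmental_variance < 0:
        raise ValueError("variances must be non-negative")
    total_variance = genetic_variance + environmental_variance
    if total_variance == 0:
        raise ValueError("total variance is zero")
    return genetic_variance / total_variance


if __name__ == "__main__":
    # Hypothetical variance components for a trait in one population.
    print(broad_sense_heritability(6.0, 4.0))
```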