BioNotes.Org

ML, vim, biology, math, and more

Notes on MIT 7.00x at edX

Course started: July 14, 2020

Different versions of Intro Bio course at MIT

There is a special version of the MIT intro bio course that was optimized for edX called 7.00x, Introduction to Biology - The Secret of Life, taught by Eric Lander. The current lectures were recorded in 2012-13, with updates circa 2015.

Separately, 7.01 Fundamentals of Biology was designed specifically for OpenCourseWare. It draws upon material developed for the three versions of MIT’s Introductory Biology classes known as 7.012, 7.013, and 7.014. All three classes cover the same core material, which includes the fundamental principles of biochemistry, genetics, molecular biology, and cell biology. Each version also has lectures on different special topics.

According to this page, these versions of 7.01 were recorded in the following years:

  • Fall, 2004 - 7.012 with Professors Eric Lander and Robert Weinberg, focuses on genomics, neurobiology, and cancer cell biology.
  • Spring, 2005 - 7.014, with Professors Graham Walker and Penny Chisholm, focuses on ecology and environment.
  • 7.013 with Professors Hazel Sive and Tyler Jacks, focuses on development and cancer cell biology has been recorded twice:

Species count as of 2014 (Sadava, p. 8)

Taxon      Currently described species   Estimated number of living species
Bacteria   10,000                        Millions
Archaea    300                           1k - 1M
Protists   80,000                        500k - 1M
Plants     270,000†                      400-500k
Fungi      100,000                       1-2M
Animals    1,300,000                     10M - 100M

The 2018 Earth Biogenome Project paper cites 391,000 plant species as of 2016 (Lewin et al., p. 4326) per this report.


Lecture 1: Introduction and the Biochemistry of Life

July 14 - 20, 2020

  • Lectures by Eric Lander were likely recorded in the 2012 - 2013 academic year.
    • Lander makes a reference to this NYT Sunday Magazine article on pain which had just come out the “last weekend”. He also discusses this drug for cystic fibrosis.
    • Most interestingly, Lander referred to a seminar he’d attended at the Broad Institute earlier that very morning. At that meeting, they discussed a paper that had been published four weeks earlier; probably the seminal 2012 CRISPR/Cas9 paper by Jinek, Chylinski, Fonfara, Hauer, Doudna, and Charpentier.
    • Of course, Feng Zhang’s lab published this 2013 Science paper applying CRISPR-Cas9 to human and mouse cells.
    • In lecture 18 on the Human Genome Project, 13:00 into the clip “The Human Genome Project: Genome assembly and analysis”, Lander says that the 10th anniversary of the final draft of the HGP was coming up soon, which means that lecture 18 was recorded sometime just before April 25, 2013.
  • Timeline
    • 4.5 billion years ago: Earth formed
    • 4.0 B ya: Earth cooled enough for life
    • 3.7 B ya: first evidence of prokaryotic life
    • 2.0 - 1.5 B ya: first eukaryotic life with nucleus
    • 1.6 - 1.4 B ya: fungi split off from other kingdoms of life. According to The Ancestor’s Tale by Dawkins and Wong, this occurred around 1.2 B ya.
    • When did metazoans emerge?
      • According to Lander’s lecture, multicellular life appeared ~500 mya just prior to Cambrian Explosion
      • Per Erwin 2015, molecular clock studies estimate multicellular animals emerged 750-800 mya
      • The same Erwin 2015 paper lists the fossil evidence, which is closer to Lander’s estimate. The Doushantuo embryos from ~600 mya are the earliest animal fossils we have. Next are the Australian Ediacaran fossils (580 mya). The earliest undisputed bilaterians are from 542 mya.
    • 5 mya: first hominids aka Australopithecina or Hominina subtribe within the Hominidae Family.
    • 100,000 ya: first Homo sapiens
    • 1861 AD: MIT founded
  • Fundamental principles and intellectual unification of biology
    • Biochemistry in ~1900: purification and analysis of unique proteins, nucleic acids, etc.
    • Genetics in ~1900: Organism minus one component, aka a mutant. This missing component is what they called a gene.
    • 1953: structure of DNA discovered by Watson & Crick, combining the two streams of biochem and genetics into the new unified science of molecular biology.
    • 1970s: Invention of recombinant DNA to isolate specific gene, read what protein it encodes. Take a protein and decide which gene codes for it.
    • 1980s: How do we map the genes that map to specific disease? Rather than studying single genes in isolation, was it possible to study all the genes for an organism at once? Yes, the beginning of genomics.

Lecture 2: Biochemistry

July 21 - 28, 2020

  • Eduard Buchner and the founding of biochemistry.
    • During a series of experiments from 1857-1877, Louis Pasteur proved that microorganisms were involved in fermentation (conversion of juice into CO2 and alcohol). This seemed to be a defense of vitalism.
    • However, in 1897, Buchner showed that if one took a mass of yeast and crushed all the living cells to death, one could extract dead “yeast juice” that could still perform fermentation. This showed that a nonliving enzyme (zymase) produced by yeast was responsible for fermentation.
  • Composition of cells; covalent bonds
    • Atomic composition of human life (by number of atoms): roughly 63% H, 24% O, 10% C, 1.4% N, 0.2% phosphorus, and less than 0.1% sulfur.
    • Molecular composition of human life by weight: 80% H2O. The remaining 20% (the dry weight) can be subdivided into:
      • 50% protein
      • 15% carbohydrate
      • 15% nucleic acids
      • 10% lipids and fats
    • Covalent bonds for the 6 most important elements: C, H, O, N, P, and S. Number of bonds formed by an octet-satisfied atom under biological conditions:
      • Carbon = 4
      • Hydrogen = 1
      • Oxygen = 2
      • Nitrogen = 3 or 4
      • Phosphorus = 5
      • Sulfur = sometimes 2 but up to 6 bonds
  • Noncovalent bonds, including hydrogen bonds (5 kcal/mol), ionic bonds, Van Der Waals forces
  • Lipids and phospholipids
    • Oil and water don’t mix b/c the polar water gets “offended” by the nonpolar interlopers interrupting the “desired” hydrogen bonding between water molecules
    • A nonpolar hydrocarbon (like hexane, C6H14) can be made polar by converting it to one of the following:
      • Alcohols have a hydroxyl -OH group attached.
      • Carboxylic Acids contain one or more -COOH groups.
    • Fatty acids are a type of carboxylic acid. If you attach 3 of them to an alcohol like glycerol, you create a “tri-acyl-glyceride”, aka triglyceride. This is a lipid.
    • Phospholipids are amphipathic, with a hydrophilic head and hydrophobic tails.
    • When placed in a polar solvent like water, amphipathic lipids spontaneously form globular structures like micelles (monolayer with hydrophilic heads out, tails in) or liposomes (bilayered). See also this page
  • ATP - adenosine triphosphate
  • Carbohydrates
    • glucose = monosaccharide. Can form disaccharides, polysaccharides, etc.
  • Remember, here are the types of forces that Lander emphasizes:
    1. Ionic Bonds
    2. Covalent Bonds (polar like O-H, N-H, C=O, vs. nonpolar like C-H, C-C, C=C, etc.)
    3. Hydrogen Bonds (O-H···O, N-H···O, etc.)
    4. Hydrophobic forces that push hydrophobic molecules together, so that the liquid H2O molecules around them can form hydrogen bonds with each other easily “as desired”.
    5. Van der Waals forces, which arise from temporary charges and can add up to a lot, e.g., geckos use these to climb walls

Lecture 3: Proteins

July 25 - 31, 2020

  • Relatively constant planes in peptide with pivot points at the alpha carbons aka the carbons that the -R sidechains hang off of. See also Alberts p. 127
  • Subcategories of the 20 amino acids
    • Nonpolar and Noncharged (hydrophobic): Alanine, Valine, Methionine (with a sulfur), Leucine, Isoleucine, Phenylalanine, Tyrosine, Tryptophan
    • Polar and Uncharged: Serine, Threonine, Asparagine, Glutamine
    • Polar, negatively charged: Aspartic acid, glutamic acid
    • Polar, positively charged: lysine, arginine, histidine

    • Special:
      • Glycine: very flexible; its side chain is just a single -H
      • Proline (side chain forms a ring that bonds with the amino group); technically an imino acid, not an amino acid.
      • Cysteine with sulfur at the end. Can form disulfide bond.
    • Secondary structure
      • Alpha-helix. 3.6 amino acids per 360-degree turn of the alpha helix.
      • Beta-sheets
      • Everything else is called “loops” or “random coils”
    • Tertiary structure and quaternary structure are determined in part by the following types of interactions between sidechain groups:
      • Disulfide bonds (between cysteine side chains)
      • Ionic bonds between side chains
      • Hydrogen bonds between side chains
      • Van der Waals forces between side chains
    • X-Ray Protein Structure supplemental lecture by x-ray crystallographer Brian, a scientist at MIT:
      • Metaphor of throwing basketballs and ping pong balls, first at a single parked car. Then, dropping many of these balls onto an entire parking lot full of regularly spaced parked cars.
      • To prepare them for scanning, proteins are first purified and then crystallized so they are in a regular repeating pattern (like in an orderly parking lot)
      • He uses the example of lysozyme, an enzyme that breaks down bacterial cell walls

Lecture 4: Enzymes - Channel proteins, β-barrels, TIM, flu burglar

August 1, 2020

  • Channel proteins
    • Design guideline: allow molecules of about 600 molecular weight, polar but no net charge (either positive or negative)
    • Solution: OmpF channel protein. (No wiki article available but see related OmpA article.)
      • “residue” = an amino acid within the chain; used here as a synonym for its side chain
      • has charged residues on the inside to repel potential passengers who are positively charged or negatively charged. The fact that these residues are polar will also keep out nonpolar passengers which we do not want to pass
      • Has hydrophobic sidechains along the exterior of the barrel so that it embeds properly with the hydrophobic C-H tails inside the phospholipid bilayer
    • Has a lot of charged residues
  • Enzyme: Triose Phosphate Isomerase aka TIM aka TPI
    • See Jeremy Knowles and W.J. Albery TIM paper (1977)
    • TIM interconverts between two isomers of triose phosphate
    • Essential to process of glycolysis in muscles for fast efficient energy production
    • See also Proteopedia article
    • Converts G3P (glyceraldehyde 3-phosphate) to DHAP (dihydroxyacetone phosphate) via an intermediate transition state (cis-enediol); without TIM, reaching it would take 26 kcal/mol of activation energy
    • TIM stabilizes the transition state and also prevents the side-reaction of G3P losing its phosphate group, which tends to happen in water.
    • TIM increases the speed of the reaction by a factor of 10^10, aka the difference in length of time between ~1 second and 300 years!
    • TIM is about 150x the size of reactant G3P and product DHAP.
    • Three strategies used by TIM/TPI:
      1. Proton (hydrogen) transfer between the oxygens attached to carbon1 and carbon2. (see drawings in JH notes 8/02/2020 to 8/03/2020). Glutamate-165 and Histidine-95 are the residues that make this happen. If the glutamate residue is replaced by the similar aspartate, the reaction performs 1000x worse. Note that the Glu and Asp sidechains are identical except that Glu has one more CH2 in its chain.
      2. There is a lid that keeps the triose phosphate reactant/intermediate/product safe from attack by H2O molecules. Without this lid, the reaction would perform 100,000x worse.
      3. Lysine at position 12 is used to create hydrogen bonds that connect to the various oxygens of triose phosphate to stabilize the molecule as the reaction proceeds. If we use arginine instead of lysine, the reaction performs 200x less efficiently.
    • “Burglar tools” used by influenza virus to escape the lysosome acidic pH trap.
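
As a quick sanity check on the "~1 second versus 300 years" comparison for TIM's 10^10 speed-up, the arithmetic can be sketched as below. The function name and the choice of a 365.25-day year are assumptions made for illustration, not something from the lecture.

```python
# Rough arithmetic for the 10^10 rate enhancement quoted in the notes.
# Assumption: one year = 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def uncatalyzed_time_years(catalyzed_seconds, enhancement):
    """Time the same reaction would need without the enzyme, in years."""
    return catalyzed_seconds * enhancement / SECONDS_PER_YEAR


years_without_tim = uncatalyzed_time_years(1.0, 1e10)
print(f"about {years_without_tim:.0f} years without the enzyme")
```

The result lands a little above 300 years, which is consistent with the round figure in the notes.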

Lecture 5: Glycolysis and pathways

August 3-6, 2020

  • Biochemical pathways in fermentation
  • How does yeast change a hexose sugar (C6H12O6) into CO2 and alcohol? Multi-step pathway
  • Energetics of a single reaction
    • G3P → DHAP. Free energy is lower after this reaction: the standard Gibbs free energy change ΔG0’ (pronounced “delta G zero prime”) is about -1.86 kcal/mol.
      • The activation barrier is very high.
      • But even if there is enough TIM, because of entropy and the relative concentration of product/reactant, there will always be some reactant left at equilibrium
      • Whether the forward or reverse reaction is favored under actual conditions depends on ΔG (distinct from ΔG0’ above).
      • See also handwritten JH notes 8/03/2020
    • Tricks to make a reaction move forward even when ΔG is positive (favoring reverse rxn)
      • Trick #1: Direct coupling
        • Couple 2 reactions together. Aka even if A → B is unfavorable, if C → D is extremely favorable, then A+C → B+D might move forward. Example, pair reaction with ATP
        • Convert ATP to ADP (adenosine diphosphate). ATP → ADP has a ΔG = -7.3 kcal/mol
      • Trick #2: Indirect coupling
        • Have a “next step” reaction, e.g., B → C which is highly favorable.
        • So again, assume A → B has positive ΔG (and is not favorable in forward direction).
        • Assume B → C has a very negative ΔG (and is very favorable in forward direction)
        • Then the overall reaction A → B → C will move forward b/c the 2nd rxn constantly removes B, lowering the concentration of [B], and thereby encouraging forward progress of A → B
  • Intro to glycolysis: the breaking of sugar. See JH handwritten notes and textbook references:
    • Alberts
      • Chapter 2: p.80-83: ATP and NAD+/NADH. See detailed 10 step version of glycolysis in Panel 2-8 on p. 120-121. See also JH handwritten notes of this 10-step process August 7-9, 2020
      • Chapter 14: Mitochondria and Chloroplasts. p. 817 Citric Acid Cycle (fka Krebs Cycle). p. 819 diagram of overall citric acid cycle, Figure 14-10
    • Sadava Chapter 9, p.165 - 182
      • p. 167: Redox reactions, diagram of methane <–> methanol <–> formaldehyde <–> Formic acid <–> carbon dioxide. For best version of Jeff’s redox diagram, see JH notes 8/06/2020
      • p. 168: summary diagram of glucose → glycolysis → pyruvate → branch between aerobic respiration and anaerobic fermentation
      • p. 177: more detailed summary of reactants and products for glucose → glycolysis → pyruvate → branch between aerobic respiration and anaerobic fermentation
      • p. 169: detailed chemical diagram from input glucose to output pyruvate
      • p. 170: diagram of citric acid cycle
  • Regulation of pathways. Negative feedback loops aka balancing loops.
    • Allosteric regulation changes the shape of the enzyme to control its activity. Thus, let A → B → C → D → E. If [E] gets too high, we want E to act as a regulator that slows down the C → D enzyme. The enzyme can have a second binding site at a different location from the active site (“allo” is the Greek root for different/other). E attaches to this binding site, changing the enzyme’s shape and reducing the activity of the C → D reaction, aka feedback inhibition
  • In contrast, feedforward activation applies if there is too much reactant and not enough product and we want the mechanism to work faster. This upregulates the activity of the relevant enzymes
  • Cellular respiration and fermentation
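
The energetics notes in this lecture (ΔG0’ versus ΔG, and the two coupling tricks) can be sketched numerically. This is a minimal illustration using the standard relations ΔG = ΔG0’ + RT ln([product]/[reactant]) and K_eq = exp(-ΔG0’/RT). The temperature of 25 °C, the hypothetical unfavorable step with ΔG0’ = +3.0 kcal/mol, and all function names are assumptions, not values from the lecture.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15   # assumed temperature (25 C)


def equilibrium_constant(dg0_prime, temp=T_KELVIN):
    """K_eq = [product]/[reactant] at equilibrium, from dG0' in kcal/mol."""
    return math.exp(-dg0_prime / (R_KCAL * temp))


def delta_g(dg0_prime, product, reactant, temp=T_KELVIN):
    """Actual dG = dG0' + RT ln([product]/[reactant])."""
    return dg0_prime + R_KCAL * temp * math.log(product / reactant)


# G3P -> DHAP, using the dG0' from the notes.
DG0_G3P_TO_DHAP = -1.86
k_eq = equilibrium_constant(DG0_G3P_TO_DHAP)
fraction_g3p_left = 1 / (1 + k_eq)   # some reactant always remains
print(f"K_eq is about {k_eq:.0f}; G3P left at equilibrium: {fraction_g3p_left:.1%}")

# Trick 1, direct coupling: the dG0' values of coupled reactions add.
DG0_ATP_TO_ADP = -7.3        # from the notes
DG0_HYPOTHETICAL_STEP = 3.0  # hypothetical unfavorable A -> B
coupled_dg0 = DG0_HYPOTHETICAL_STEP + DG0_ATP_TO_ADP

# Trick 2, indirect coupling: a fast next step keeps [B] low,
# which makes the actual dG of A -> B negative.
dg_with_low_b = delta_g(DG0_HYPOTHETICAL_STEP, product=0.001, reactant=1.0)
print(f"coupled dG0' = {coupled_dg0:.1f}, dG with [B]/[A] = 0.001: {dg_with_low_b:.2f}")
```

At the equilibrium ratio the actual ΔG is zero even though ΔG0’ is negative, which is the distinction the notes draw between the two quantities.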

Handwritten notes on 10 steps of glycolysis and NAD+ → NADH

  • Drawn from Alberts Chapter 2: p.80-83: ATP and NAD+/NADH. See detailed 10 step version of glycolysis in Panel 2-8 on p. 120-121.
  • Also see this page on glycolysis
  • For detailed drawings of NAD (Nicotinamide Adenine Dinucleotide) in both oxidized NAD+ and reduced NADH forms, refer to JH handwritten notes August 7-9, 2020.
  • Note also that pure adenine is just the nucleobase by itself. Adenosine refers to adenine covalently bonded to a ribose sugar like in ATP, NAD, or RNA.

Lecture 6: Intro Mendel and TH Morgan Fruitfly experiments

August 6-15, 2020

  • Biography of Mendel, initial pea experiments
  • Definitions of genetic terms
    • Gene: discrete factors of inheritance
    • Allele: Alternative versions of a gene, e.g., A vs. a
    • Genotype: The two alleles carried by an organism
    • Homozygote: Two of the same allele. e.g., AA or aa
    • Heterozygote: Two different alleles in a genotype, e.g., Aa
    • Phenotype: External appearance or trait.
  • Important distinction. There are no such things as dominant and recessive alleles. ONLY dominant versus recessive phenotypic traits. Especially important because the same gene can contribute to multiple traits, some of which are dominant and some of which are recessive.
  • Error in video lecture. Prof. Lander says that green is dominant to yellow color in pea plants. It’s actually the reverse; yellow pea color is dominant vs. green peas
  • Mendel’s Second Law: Independent Assortment
  • Circa 1890, cytology led to discovery of these mysterious “colored organelles” or colored bodies which were named “chromosomes”. Organization of chromosomes into bundles used during cell division aka mitosis.
  • Description of Mitosis and Meiosis
  • Coincidence that Mendel luckily only studied 7 traits in peas, and peas happen to have only 7 pairs of chromosomes???

Lecture 7: Meiosis

August 16-18, 2020

  • During meiosis, homologs
  • Review of Mendelian inheritance versus chromosomal inheritance
  • Fruit flies and linkage among chromosomes. Thomas Hunt Morgan was a great skeptic: he distrusted the way people kept inventing “inheritance factors” just to explain new data a posteriori. One should have an a priori model that is then properly tested
  • Pictures of fruitflies with various traits. “+” = “wildtype” phenotype = “wt”
  • See Handwritten notes for multiple generations of wildtype versus mutant flies for Body phenotype: (either dominant wt or recessive black body) and for Wing phenotype: (either wt or recessive vestigial wing)
  • Experimentally, TH Morgan found among the children of Parent 1 ( wt b * wt vg) crossed with Parent 2 (b b * vg vg) these children:
    • 965 identical to parent 1 (wt b * wt vg = heterozygous for both Body and Wing)
    • 944 identical to parent 2 (bb * vg vg = homozygous recessive for Body, homozygous recessive for Wing)
    • 206 new recombinant fly (wt b * vg vg = heterozygous for Body and homozygous recessive vestigial wing for Wing)
    • 185 new recombinant fly ( b b * wt vg = homozygous for recessive black Body and heterozygous for Wing)
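
The recombination frequency for this cross is the number of recombinant offspring divided by the total offspring, times 100. The sketch below uses the commonly cited counts for Morgan's black body / vestigial wing testcross (965 and 944 parental types, 206 and 185 recombinants); the function name is made up for illustration.

```python
def recombination_frequency(parental_counts, recombinant_counts):
    """Recombinants divided by total progeny, expressed as a percentage."""
    recombinants = sum(recombinant_counts)
    total = sum(parental_counts) + recombinants
    return 100 * recombinants / total


# Commonly cited counts for the black body / vestigial wing testcross.
rf_percent = recombination_frequency([965, 944], [206, 185])
print(f"recombination frequency: {rf_percent:.1f}%")
```

With these counts there are 391 recombinants out of 2300 flies, i.e. 17%.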

Linkage Maps

  • In 1911, 19-year-old Columbia undergraduate Alfred H. Sturtevant took the frequency of recombination ratio data home from the Morgan fruit fly lab and created the first genetic linkage maps.
  • “The recombination frequency equals the number of recombinants divided by the total progeny times 100. A longer distance between two genes increases the chance of crossover between two genes. A recombination frequency between 0 and 50 indicates that the genes are located on the same chromosome. Genes that are far apart on the same chromosome or lie on different chromosomes show a recombination frequency of 50%. Since the most extreme case of un-linkage is independent assortment, which gives 50% recombinant offspring, the maximum value for the recombination frequency is 50%.”
  • The black body and vestigial wing genes recombine about 17% of the time (recombination frequency ≈ 17%).
  • Sturtevant is doing “pairwise” aka 2-factor crosses for 2 different genes
  • Further, let’s consider a 3-factor cross. How many different gametes are possible when one considers three phenotypic traits (aka 3 genes)? Answer is 8 = 2^3.
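
The count of 8 gamete types for a 3-factor cross can be checked by enumerating every combination of one allele per gene. The allele labels (A/a, B/b, C/c) are hypothetical placeholders for a parent that is heterozygous at all three genes.

```python
from itertools import product

# Hypothetical allele pairs for three heterozygous genes.
genes = [("A", "a"), ("B", "b"), ("C", "c")]

# Each gamete receives exactly one allele from each gene.
gametes = ["".join(alleles) for alleles in product(*genes)]
print(len(gametes), gametes)
```

Each additional heterozygous gene doubles the number of gamete types, hence 2 x 2 x 2 = 8.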

Linkage Mapping

  • Sample drosophila mutations: Antennapedia and Double Wing aka Ultrabithorax
  • Test question on Linkage Mapping: “The maximum recombination frequency seen between any two genes on the same chromosome is about 50%. Why is this the case?”
    • Answer: Imagine a case where the two genes are located on different chromosomes, so that during meiosis the segregation of alleles of one gene is completely independent of the alleles of the other gene (this is stated in Mendel’s Second Law and is known as the law of independent assortment). On average, you would see the parental arrangement 50% of the time. When the two genes are far apart on the same chromosome, there can be 1, 2, 3, or more recombination events between the two genes. In this case, it is random whether there will be an even or odd number of recombination events, so on average, you would see the parental arrangement 50% of the time.
  • Sex-linked traits See notes on Mac bionotes directory including *.png screenshot.
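
The even-versus-odd argument in the answer above can be made concrete. A gamete looks recombinant only when an odd number of crossovers falls between the two genes. As a sketch, assume the number of crossovers is Poisson-distributed with some mean; that assumption (Haldane's mapping model) and the function names are not from the lecture. The chance of an odd count is then (1 - e^(-2m)) / 2, which approaches but never exceeds 50%.

```python
import math


def prob_odd_crossovers(mean_crossovers):
    """Closed form for P(odd number of events) under a Poisson model."""
    return 0.5 * (1 - math.exp(-2 * mean_crossovers))


def prob_odd_by_summation(mean_crossovers, max_k=100):
    """Same probability, summing the Poisson terms for k = 1, 3, 5, ..."""
    return sum(
        math.exp(-mean_crossovers) * mean_crossovers ** k / math.factorial(k)
        for k in range(1, max_k, 2)
    )


for mean in (0.05, 0.5, 2.0, 5.0):
    print(f"mean crossovers {mean}: recombinant fraction {prob_odd_crossovers(mean):.3f}")
```

For genes that are close together the recombinant fraction is roughly equal to the mean number of crossovers, and for distant genes it levels off just under 0.5.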

Meiosis review by PhD student Michelle

  • Diploid number refers to the number of total chromosomes = 2n where n refers to the number of pairs. e.g., human beings have n = 23 for 23 pairs of chromosomes. So humans diploid number = 2n = 2 * 23 = 46 chromosomes where each chromosome has a homologous “semi-duplicate” chromosome with the same genes but possibly different alleles at each gene locus.
  • See also supplemental videos about Punnett Squares, etc.
  • Also, covers meiosis and Barbara McClintock’s work on corn recombination

Lecture 8: Basics of Human Genetics

August 17-18, 2020

X-linked Recessive Inheritance.

  • All men who receive this mutation express it. Women can be carriers or only express trait if they are homozygous recessive on both X chromosomes. Examples: red-green color blindness, hemophilia. Rules include:
    • Predominantly affects males because women are only affected when they inherit allele on both of their X chromosomes.
    • Affected men do not pass this trait on to sons unless man mates with a woman who has the recessive allele. (b/c sons will always get their X chromosome from their mom.) This characteristic is what led to the idea of traits “skipping a generation”
    • A woman who is a carrier who has sons will see this trait show up in 50% of her sons.
    • A woman who is affected by this trait (homozygous recessive) will pass this trait on to 100% of her sons.
  • Population genetics. Relationship between phenotype and mutant allele. E.g., red-green color blindness occurs in 8% of the male population. This means that 8% of all X chromosomes have this allele. Allele frequency = 8%.
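
From the 8% allele frequency above, the expected frequencies in women follow if one assumes Hardy-Weinberg proportions (random mating, same allele frequency in both sexes). That assumption and the function name below are mine, not stated in the lecture.

```python
def x_linked_recessive_frequencies(q):
    """Expected genotype frequencies for an X-linked recessive allele of frequency q."""
    p = 1 - q
    return {
        "affected_males": q,          # one X, so phenotype frequency = allele frequency
        "affected_females": q * q,    # need the allele on both X chromosomes
        "carrier_females": 2 * p * q,
    }


color_blindness = x_linked_recessive_frequencies(0.08)
for group, frequency in color_blindness.items():
    print(f"{group}: {frequency:.2%}")
```

Under these assumptions fewer than 1% of women are affected, while roughly 15% are carriers.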

Autosomal dominant inheritance

  • Autosome = chromosomes that are not sex chromosomes. Humans have 22 pairs of autosomes, 1 pair of sex chromosomes.
  • Autosomal dominant inherited traits. Example: Huntington’s disease. Population frequency is quite low: ~5 in 100,000 people have Huntington’s.
  • Another example: Familial hypercholesterolemia aka FH affects 1 in 500 people. Using a Punnett Square / Hardy-Weinberg quadratic equation, nearly all affected people are heterozygotes with frequency ~2q = 1/500, so the allele frequency q is roughly 1 in 1000 chromosomes
  • Rules
    • 50% of offspring are affected
    • Affects all genders equally
    • Unaffected never transmits
    • Never “skips a generation”
    • If a parent is homozygous for this mutant dominant allele, that means that 100% of their offspring will express this trait.
  • Breast cancer for example of penetrance and sex-limited traits.
    • Full penetrance of a dominant trait: Huntington’s and FH are full penetrance. There can also be incomplete penetrance of a dominant trait. Breast cancer associated with BRCA1 (chromosome 17) and BRCA2 (chromosome 13) mutations is an example of incomplete penetrance. About 60% of the women who have either allele will develop breast cancer (not the 100% we would expect for an autosomal dominant, full-penetrance gene).
    • Also, males still exhibit some incidence of breast cancer, at 1% for male carriers of BRCA1 and 6.5% for male carriers of BRCA2. This is a sex-limited disease.
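
The FH allele-frequency estimate earlier in this section can be reproduced from the Hardy-Weinberg relation. For a fully penetrant dominant trait, everyone except the p^2 non-carriers is affected, so the affected fraction is 1 - (1 - q)^2. The sketch below solves that for q; the function name is hypothetical and Hardy-Weinberg proportions are assumed.

```python
import math


def dominant_allele_frequency(affected_fraction):
    """Solve 1 - (1 - q)**2 = affected_fraction for the allele frequency q."""
    return 1 - math.sqrt(1 - affected_fraction)


q_fh = dominant_allele_frequency(1 / 500)
print(f"FH allele frequency is about 1 in {1 / q_fh:.0f} chromosomes")
```

Because q is small, the exact answer is very close to the shortcut 2q ≈ 1/500.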

Autosomal recessive inheritance

  • Rules: (1) Trait does not appear in every generation (illusion of “skipping a generation”). (2) Parents of affected children generally are not affected.
  • Example of cystic fibrosis. About 1 in 2000 people have CF, the recessive trait produced by this mutant allele, aka q^2 = 1/2000. So the allele frequency is q = square root (1/2000), aka q = about 1/45 of chromosomes. About 1/22 of people (2q) are carriers for the CF allele on chromosome 7.
  • Predicted Mendelian ratio is that two carrier parents should have 1/4 of their offspring exhibit CF. However, the observed ratio was 1/3. The reason is that humans have small families, so many carrier couples have no affected children and are never counted; only families with at least one affected child enter the data. If humans typically had 1000 children per family, the expected ratio of 1/4 would probably be observed.
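
Both CF calculations in this section can be sketched in a few lines. The first part is the Hardy-Weinberg arithmetic from q^2 = 1/2000. The second part models the counting bias: if a family is only noticed when it has at least one affected child, the expected affected fraction among counted families of size n is (1/4) / (1 - (3/4)^n). That formula assumes every such family is counted and that all families have the same size, which is a simplification; the function name is made up.

```python
import math

# Hardy-Weinberg arithmetic for a recessive trait seen in 1 of 2000 people.
q_cf = math.sqrt(1 / 2000)            # allele frequency
carrier_fraction = 2 * q_cf * (1 - q_cf)
print(f"q is about 1/{1 / q_cf:.0f}, carriers about 1/{1 / carrier_fraction:.0f}")


def observed_affected_fraction(children_per_family, p_affected=0.25):
    """Expected affected fraction when families with zero affected children are never counted."""
    p_at_least_one = 1 - (1 - p_affected) ** children_per_family
    return p_affected / p_at_least_one


for family_size in (2, 3, 5, 10, 1000):
    print(family_size, round(observed_affected_fraction(family_size), 3))
```

Small families inflate the apparent ratio well above 1/4, and the bias fades as family size grows, which matches the explanation in the notes.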

How Real Human Genetics differs from the simple rules in intro biology textbooks

  • Example of Charles Davenport who produced a pedigree chart that claimed that the tendency of people to become sea captains conformed to Mendelian genetics. This was called thalassophilia.
  • Example of early 20th-century scientists who thought that pellagra was inherited rather than caused by poverty and malnutrition.
  • Important to be careful!

Archibald Garrod and Alkaptonuria

  • In early 20th century London. Alkaptonuria = “black urine” in infants. In 8 of the 17 cases, the affected children were the result of first cousin marriages. If only 1 grandparent has a single Alkaptonuria allele (which expresses a recessive trait), and assuming that the spouses who marry into the family in the F1 generation are 100% wildtype, and assuming that first cousins in the F2 generation intermarry, then there is a 1/16 chance that both cousins carry the alkap allele, and so a 1/64 chance (1/16 × 1/4) that a given great-grandchild born in the F3 generation will be homozygous for it and express the recessive trait.
  • Next, Dr. Garrod studied the chemical compound in the urine that turns black in air, which is homogentisic acid (HGA). The oxidation of HGA creates this black color. Its ring looks like those of the amino acids phenylalanine and tyrosine. Experimentally, when Garrod fed these babies extra protein, extra phenylalanine, and extra tyrosine, more HGA was produced.
  • He knows that in wildtype human digestion, protein is broken down into → amino acids (like Phe, Tyr) → homogentisic acid (HGA) → further broken down into something else. In 1908, he gives a lecture claiming that babies with alkaptonuria are missing the enzyme for that final step so they simply accumulate more and more HGA that must be urinated out. Wildtype babies have something (Garrod didn’t necessarily know it was an enzyme), that breaks down HGA.
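
The first-cousin calculation above can be stepped through with exact fractions. The setup follows the stated assumptions: one heterozygous grandparent, and everyone who marries into the family is wildtype. Variable names are illustrative only.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Each F1 child of the carrier grandparent inherits the allele with chance 1/2.
p_f1_carrier = HALF
# Each F2 cousin inherits it from that F1 parent with a further chance 1/2.
p_cousin_carrier = p_f1_carrier * HALF
# The two cousins descend from different F1 siblings, so multiply.
p_both_cousins_carriers = p_cousin_carrier * p_cousin_carrier
# Two carriers have a homozygous recessive child 1/4 of the time.
p_f3_child_affected = p_both_cousins_carriers * Fraction(1, 4)

print(p_cousin_carrier, p_both_cousins_carriers, p_f3_child_affected)
```

The chain gives 1/4 per cousin, 1/16 for the couple, and 1/64 for any one of their children.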

Josh Meisel and Pedigree Analysis

Lecture 9: Genetic basis of biochemistry

August 19-20, 2020

  • George Beadle and Edward Tatum work. After a lot of frustrating work with the chimeric eyes of mutant Drosophila, they switched to the model organism Neurospora, a bread mold fungus.
  • For purposes of today’s lecture, Lander will instead talk about Saccharomyces cerevisiae (S.cerevisiae) common baker’s yeast.
  • S. cerevisiae are true eukaryotes. n = 16 chromosomes in a haploid cell; 2n = 32 total chromosomes in a diploid cell (16 homologous pairs).
    • Diploid cells undergo mitosis.
    • Also, yeast diploid cells (with 2n chromosomes each) can undergo meiosis to create haploid gametes (with n chromosomes each) that can mate like humans, producing F1 cells with diploid chromosomes again.
    • Finally (unlike humans), yeast gametes can themselves undergo mitosis without need for mating etc.
  • Auxotrophs vs. prototrophs
  • Esther Lederberg makeup compact to do replica plating. Much more efficient than individual scrapings of yeast colonies with a toothpick
  • Prefer to screen potential mutants using haploid cells because they can determine a mutant allele whether it is dominant or recessive. Using diploid yeast would only pick up dominant allele.
  • Haploinsufficiency
  • Characterizing mutants test #1: Dominant or Recessive?
    • Simple cross with a wildtype haploid through mating.
  • Test #2: Find complementation groups
    • Definition of complementation: “the ability of two mutants in combination to restore a wildtype phenotype”. Meanwhile, “dominance in a heterozygote indicates the ability of a wildtype allele to complement loss-of-function alleles”.
    • Start with 50 different mutants of yeast; none of them can produce arginine. They are all called “arginine auxotrophs”.
    • Cross some of them to clarify if these mutants are based on one or multiple genes.
    • Doubly heterozygous aka “double het”.
    • Gene that makes arginine is called arg1 which may have various mutations: m1, m2, m3.
    • A second gene that contributes to the production of arginine is called arg2
    • Failure to complement means that the two mutations are in the same gene. Successful complementation means that they are in different genes, b/c each mutant then supplies a working copy of the gene that is broken in the other.
  • Test #3: Epistasis Test
    • Mutations in a sequential biochemical pathway to indicate which enzymes are affected.
    • Interesting history about how the idea of epistasis has evolved from the early days of genetics in the first decades of the 20th century.
  • Ryan, PhD video on lab techniques with S. cerevisiae
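
The complementation test (Test #2 above) amounts to sorting mutants into groups: any two recessive mutants that fail to complement each other go into the same group, and each group is taken to correspond to one gene. The sketch below does this grouping with a small union-find. The mutant names and the list of failed crosses are invented data, not results from the lecture.

```python
def complementation_groups(mutants, fails_to_complement):
    """Group mutants so that pairs which fail to complement share a group."""
    parent = {m: m for m in mutants}

    def find(m):
        # Follow parent links to the group representative, shortening the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in fails_to_complement:
        parent[find(a)] = find(b)

    groups = {}
    for m in mutants:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())


# Hypothetical arginine auxotrophs and hypothetical cross results.
mutants = ["m1", "m2", "m3", "m4", "m5"]
failed_crosses = [("m1", "m2"), ("m2", "m3"), ("m4", "m5")]
groups = complementation_groups(mutants, failed_crosses)
print(groups)
```

In this made-up data set, m1, m2, and m3 would be alleles of one gene (say arg1) and m4 and m5 alleles of a second gene (say arg2).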

Lecture 10: Intro to DNA Structure

August 20-22, 2020

  • Frederick Griffith’s search for the “S-Transforming Principle”.
  • Oswald Avery, Colin MacLeod, and Maclyn McCarty experiments
    • Through repeated purification of the S-transforming principle, they eliminated proteins, etc. All they are left with is DNA. In 1944, they publish the paper claiming that DNA is the hereditary material. People don’t believe it b/c reviewers worry that there was protein contamination, and b/c DNA seemed to be too simple to be the hereditary material.
  • Structure of DNA
    • Pentose deoxyribose is missing a hydroxyl at the 2’-carbon.
    • Bases. Purines (A, G) have 2 carbon-nitrogen rings. Pyrimidines (T, C) have 1 ring. These bases are connected to the 1’-carbon of deoxyribose
  • Al Hershey and Martha Chase experiments on E. coli bacteriophage viruses.
    • Uniquely tagged protein with radioactive isotopes of sulfur. Since DNA doesn’t have sulfur, DNA will not be radioactively tagged.
    • Conversely, DNA has phosphorus atoms, but proteins (if you choose the right ones) do not, so radioactive phosphorus tags the DNA and not the protein.
  • Now, there are two converging lines of evidence that DNA is the genetic material: Avery-MacLeod-McCarty experiments and Hershey-Chase experiments.
  • Erwin Chargaff did a lot of experiments and formulated Chargaff’s rules, which state that the amount of A = T and C = G regardless of what species is examined.
  • Linus Pauling’s proposed triple helix model for DNA structure has phosphates in the center axis which can’t work b/c they are all strongly negative and will repel each other.
  • Supplemental videos.
    • dsDNA = double-stranded aka the most common formation B-form DNA
    • Shirleen PhD student. Replication occurs at origin of replication on the DNA strand. Enzymes involved: helicase, topoisomerase, DNA polymerase, primase, RNase H, ligase. Okazaki fragments
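
Chargaff's rules noted above follow directly from base pairing in double-stranded DNA: every A on one strand faces a T on the other, and every G faces a C. A small check, using an arbitrary made-up sequence and hypothetical function names:

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """The partner strand, read in its own 5' to 3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def double_stranded_base_counts(strand):
    """Count bases over both strands of the duplex."""
    return Counter(strand + reverse_complement(strand))


counts = double_stranded_base_counts("ATGCGGATTACA")  # arbitrary example sequence
print(counts)
```

Whatever single-strand sequence is used, the duplex totals satisfy A = T and G = C, while the ratio of A+T to G+C is free to vary.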

Lecture 11: Central Dogma 1 - DNA Replication

August 22-25, 2020

  • Matthew Meselson and Franklin Stahl experiments proving the semi-conservative nature of DNA replication. They used regular nitrogen (N14) and heavy nitrogen (N15) to label the old and new strands of DNA.
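
What semi-conservative replication predicts for the Meselson-Stahl density bands can be sketched by tracking strands. The model below assumes cells start fully labeled with heavy nitrogen ("H" strands) and are then grown in light nitrogen, so every newly made strand is "L". The function name and the H/L labels are illustrative.

```python
from collections import Counter


def band_fractions(generations):
    """Fraction of duplexes that are heavy (HH), hybrid (HL), or light (LL)."""
    duplexes = [("H", "H")]  # generation 0: both strands heavy
    for _ in range(generations):
        # Each old strand becomes the template for one new light strand.
        duplexes = [(strand, "L") for duplex in duplexes for strand in duplex]
    labels = Counter("".join(sorted(duplex)) for duplex in duplexes)
    return {band: count / len(duplexes) for band, count in labels.items()}


for generation in range(4):
    print(generation, band_fractions(generation))
```

The prediction is one all-hybrid band after one generation, then a hybrid band that halves each generation as the light band grows.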

Basic Replication

  • DNA Polymerase
    • 1956, Arthur Kornberg manually adds radioactively tagged dATP, dCTP, dGTP, and dTTP. Short for deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), deoxyguanosine triphosphate (dGTP), and deoxythymidine triphosphate (dTTP). See table on this wiki page.
    • Explanation of why DNA must grow in the 5’ → 3’ direction; otherwise, the last monomer on the growing chain might hydrolyze
  • Other enzymes involved:
    • Helicase unwinds helix
    • Primase creates the RNA primer to get the replication started
    • DNA Polymerase attaches the the RNA primer and adds new monomeric nuclueotides to the growing DNA polymer strand (for both leading and lagging strands of the replication fork)
    • As replication proceeds, there is greater and greater tension ahead of the replication fork. Topoisomerase releases this tension by making strategic cuts in the parent DNA double strand.
    • Ligase “glues up” and seals the separated Okazaki fragments on the lagging strand by forming phosphodiester bonds between the 3’ OH and the 5’ phosphate group of adjacent fragments.

Methods that ensure replication fidelity

  • DNA Polymerase does proofreading.
  • Exonuclease runs in the 3’→5’ direction and takes bases off of the chain (opposite activity compared to DNA Polymerase). Improves accuracy from 1/1000 to 1/1,000,000: one error in every 10^3 bases becomes, after exonuclease, one error in every million (10^6).
  • Another method: mismatch detection and repair. When a basepair shows a mismatch between two nucleotides, how does E. coli know which one is the “correct” nucleotide and which is the new “wrong” nucleotide? Luckily, there is a methylation process that marks the original parent strand of DNA. New, daughter strands of DNA lack these methylation markers for a bit. So E. coli only repairs the unmethylated “newborn” DNA strand.
  • After combining proofreading, exonuclease, and mismatch repair, the error rate becomes about one in 10^8 to 10^9 bp (one in a hundred million to one in a billion). Given that the human genome is only about 3 billion bp, this means only about 3 errors per replication at 1 in 10^9, or about 30 at 1 in 10^8.
  • The real workhorse for replication is Pol III, not Pol I and Pol II, which were discovered earlier. Pol I is not the “real polymerase”: it polymerizes at only 10-20 nucleotides/second and runs out of steam after about 50 nucleotides. DNA Polymerase II was discovered in a mutant E. coli that could still replicate and divide even though this mutant did not produce DNA Pol I.
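
The error rates above turn into expected errors per genome copy by simple multiplication. A rough sketch, assuming a 3 billion bp genome; both 1 in 10^8 and 1 in 10^9 are shown for the final rate because the expected count differs tenfold between them. The names here are illustrative only.

```python
GENOME_BP = 3_000_000_000  # approximate human genome size (assumption: one haploid copy)

error_rates = {
    "polymerase alone": 1e-3,
    "plus 3'->5' exonuclease proofreading": 1e-6,
    "plus mismatch repair, if 1 in 10^8": 1e-8,
    "plus mismatch repair, if 1 in 10^9": 1e-9,
}

def expected_errors(rate, genome_bp=GENOME_BP):
    """Expected number of wrong bases in one full copy of the genome."""
    return rate * genome_bp

for stage, rate in error_rates.items():
    print(f"{stage}: ~{expected_errors(rate):,.0f} errors per genome copy")
```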

More details on Replicon

Lecture 12: Central Dogma 2 - DNA Transcription and Translation

September 4, 2020

  • Uracil is a pyrimidine that is exactly like thymine except it is missing the -CH3 methyl group off the 5 carbon
  • Because the DNA transcription process does not use error-correction / proofreading mechanisms like replication, accuracy for RNA creation is only 1 error in 10^4 (10,000) vs the 1 error in 10^9 (1 billion) for DNA replication.
  • Note: Because RNA polymerase builds the mRNA in the 5’→ 3’ direction, that means that RNApol is reading off of 3’→ 5’ strand of DNA. See Fig 14.4 on Sadava p. 287 for example.
  • Also, RNA polymerase is officially called DNA-directed RNA polymerase.
  • For more details:
    • Since transcription grows via RNA polymerase aka RNAP aka RNApol in the 5’→ 3’ direction, that means that it is reading in the 3’→ 5’ direction. It is reading off of what is called the template strand aka antisense strand in the 3’→ 5’ direction.
    • In contrast, the new mRNA transcript has the same pattern as the opposite strand aka the coding strand aka the sense strand.
    • See also this diagram from the wikipedia article on transcription.
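
The template vs. coding strand relationship can be sketched in a few lines. This is a minimal illustration, not a model of RNA polymerase; the 12-nt sequences are made up, and `transcribe` is a hypothetical helper.

```python
# Base-pairing rules when copying a DNA template into RNA
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5):
    """Read the template strand 3'->5' and build the mRNA 5'->3'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)

coding_5to3 = "ATGGTGCACCTG"    # coding (sense) strand, written 5'->3'
template_3to5 = "TACCACGTGGAC"  # template (antisense) strand, written 3'->5'

mrna = transcribe(template_3to5)
print(mrna)
# The mRNA matches the coding strand, with U in place of T
print(mrna == coding_5to3.replace("T", "U"))
```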

Lecture 13: Central Dogma 3 - Variations from the Dogma

September 7-9, 2020

  • How do replication, transcription, and translation work in eukaryotes, prokaryotes, and viruses?

Replication

  • Viruses (5 - 200 kb genomes) have these types of genetic material in their protein capsid:
  • Prokaryotes (700 - 6,000 kb. e.g., E. coli has 4,000 kb genome)
    • Replication in prokaryotes is pretty straightforward b/c it has 1 large chromosome that is circular. Easy replication for various bacterial plasmids as well.
  • Eukaryotic replication (12 Mb - 4,000 Mb; see PSU notes, Lecture 2)
    • However, replication in eukaryotes is more complicated because eukaryotic chromosomes are linear.
    • This is fine for leading strand replication but lagging strand synthesis will screw up unless the final Okazaki fragment lines up exactly with the length of the chromosome.
    • Instead, there are telomeres at the ends of each chromosome, consisting of repeated sequences of 5’-TTAGGG-3’ (aka “T₂AG₃”). These T₂AG₃ repeats are elongated by telomerase.
    • Note that there are different repeating sequences depending on genera. See table at wiki article here.

Transcription

  • Prokaryote transcription is pretty straightforward
  • Reminder:
    • Sense strand = coding strand = mRNA pattern.
    • Anti-sense strand = template strand = complement of the mRNA
  • Transcription in eukaryotes. The immature RNA aka primary transcript is processed into mature mRNA. Three things happen during this processing:
    • A single GTP (aka guanosine with three phosphates) is added as a “start” cap to the 5’ end of the mRNA in a process called “capping”.
    • The 3’ end of the mRNA goes through polyadenylation. In other words, a poly(A) tail (aka a repetitive series of adenines, AAAA…) is added to the 3’ end. Virtually all eukaryotic mRNAs have poly(A) tails.
    • Introns are removed by RNA splicing. Because their coding sequence is broken up by introns, most eukaryotic genes are called “split” or “interrupted”. Alternative splicing means that the same DNA gene and same immature mRNA can be spliced into different mature mRNAs (with different exons) that therefore express different proteins.
      • Intron and exon are terms coined by Walter Gilbert at Harvard; Gilbert also co-founded Biogen and Myriad Genetics and, in 1986, coined the term “RNA World” for the hypothesis of that name.
      • The existence of interrupted genes and RNA splicing was discovered independently by Richard Roberts at Cold Spring Harbor Lab and Phillip Sharp at MIT. Sharp would later take a leadership role at the MIT Biology Dept and co-found Biogen, Alnylam Pharmaceuticals, and Magen Biosciences.

Translation

  • Eukaryotes. Start codon: 5’-AUG-3’; stop codons include 5’-UAG-3’. Eukaryotes are very profligate spenders of genome space so perhaps they don’t leverage multiple reading frames over the same mRNA segment.
  • In contrast, both prokaryotes and viruses have much more compact genomes.
    • As such, they often need to compress as many messages as possible into a short strand of mRNA.
    • One strategy used by bacteria is to string along many sequential genes all activated by the same promoter region–this is called being polycistronic. Both archaea and bacteria leverage polycistronic RNA
      • Often useful if all these proteins are involved in the same biochemical process.
      • In contrast, being monocistronic is the opposite: a single promoter region drives one DNA gene → one mRNA → one protein, in keeping with a more traditional Central Dogma model. Most eukaryotic mRNAs are monocistronic.
    • Another strategy used by both bacteria and viruses is having multiple reading frames on the same mRNA. They may pack 2 or even 3 reading frames onto a single strand of mRNA. Example: HIV.
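
To see how one mRNA can carry more than one message, it helps to split the same string into codons at each of the three offsets. A small sketch with a made-up 13-nt sequence; `codons_in_frame` is a hypothetical helper, not taken from the course.

```python
def codons_in_frame(mrna, frame):
    """Split an mRNA into complete codons starting at offset `frame` (0, 1, or 2)."""
    return [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]

mrna = "AUGGCAUGCCUAA"  # hypothetical sequence, 5'->3'
for frame in range(3):
    print(frame, codons_in_frame(mrna, frame))
```

In this toy sequence frame 0 opens with AUG, frame 2 contains a second AUG, and frame 1 ends on the stop codon UAA, so the same nucleotides read completely differently depending on where the ribosome starts.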

Lecture 14: β-Galactosidase in E. coli, Estrogen in Mammals, and β-Globin in Humans

September 10-11, 2020

  • Today’s lecture is about gene control in the context of two genes: (1) β-Galactosidase in E. coli as part of the glucose-lactose catalysis switch. (2) β-Globin in humans as part of hemoglobin

β-Galactosidase in E. coli

  • β-Galactosidase (in E. coli) aka “beta-gal” or “β-gal”
  • Lac operon
  • At high lactose concentration, lactose attaches to the lac repressor protein. Since the lac repressor’s binding site is occupied by lactose, the repressor does not attach to the DNA where it usually rests on top of the lac operon promoter region. Thus, without the lac repressor molecule sterically blocking the DNA promoter region, the lac operon is expressed, increasing lactose metabolism.
  • At the same time, RNAP binds somewhat weakly to the lac operon promoter region. However, if, in addition to high lactose concentration, there is also low glucose concentration, this is an additional factor that encourages lactose metabolism.
    • The way this factor works is that if the concentration of glucose molecules is LOW, the concentration of cAMP is high, which means that cAMP attaches to CAP (Catabolite Activator Protein). When there is no cAMP activating CAP, CAP just floats in solution. But with cAMP attached, CAP becomes activated and clamps onto the DNA strand. As an activator molecule, CAP significantly improves the activity of the lac operon, overcoming the weak baseline activity of the lac promoter region when all you have is the lac repressor removed from the DNA.
  • For more details watch Sera Thornton’s lac operon regulation video and refer to Sadava notes for Chapter 16.
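
The two controls on the lac operon can be summarized as a small truth table. This is a deliberately simplified sketch (booleans instead of concentrations; the "off"/"low"/"high" labels and the function name are my own shorthand).

```python
def lac_operon_expression(lactose_high, glucose_high):
    """Qualitative expression level of the lac operon."""
    repressor_on_dna = not lactose_high  # lactose pulls the repressor off the DNA
    cap_on_dna = not glucose_high        # low glucose -> high cAMP -> CAP clamps onto DNA
    if repressor_on_dna:
        return "off"
    return "high" if cap_on_dna else "low"

for lactose_high in (False, True):
    for glucose_high in (False, True):
        level = lac_operon_expression(lactose_high, glucose_high)
        print(f"lactose_high={lactose_high}, glucose_high={glucose_high}: {level}")
```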

Estrogen receptors in humans

  • The estrogen receptor is bound to another protein, sitting in the cytoplasm of mammalian cells. When this receptor-protein complex binds an estrogen molecule, the complex moves into the cell nucleus and attaches to the nuclear DNA in various places.
  • At that point, it activates DNA transcription / expression of specific genes.
  • In other words, estrogen can cross the phospholipid bilayer cell membrane, and once it is bound to its receptor the complex acts as a DNA-binding transcription factor.

β-Globin in humans

  • Reminder that hemoglobin is a tetramer with two alpha-subunits and two beta-subunits.
  • The alpha-subunits are coded by the HBA1 gene (located on chromosome 16).
  • The beta-subunits are coded by the HBB gene (located on chromosome 11).
    • Compact gene with three small exons: 140 bp, 222 bp, 252 bp
    • The first 50 bp are not translated, called the 5’-untranslated region.
    • Thus, the start codon AUG is around 50bp into the first exon. The TAA stop codon is about 120 bp into the third exon.
    • The untranslated part at the end of the mRNA is called the 3’-untranslated region.
    • 432 bp coding region inside the overall 614 bp mature (aka post-processed) mRNA gene
    • 432 / 3 = 144 amino acids
    • At the end is a poly(A) tail. It is added in the 3’-untranslated region, downstream of the stop codon, a short distance after the polyadenylation signal (AAUAAA).
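
The HBB numbers above can be cross-checked with a little arithmetic. A sketch using only the figures in these notes; it assumes the 432 bp coding region does not include the stop codon, so the remainder is "stop codon plus 3' UTR".

```python
exons = [140, 222, 252]   # exon lengths in bp, from the notes
five_prime_utr = 50       # untranslated bp before the AUG
coding_bp = 432           # coding region length

mature_mrna_bp = sum(exons)                                   # total mature mRNA
amino_acids = coding_bp // 3                                  # codons in the coding region
coding_end = five_prime_utr + coding_bp                       # position where coding stops
stop_offset_in_exon3 = coding_end - (exons[0] + exons[1])     # how far into exon 3
after_coding_bp = mature_mrna_bp - coding_end                 # stop codon + 3' UTR

print(mature_mrna_bp, amino_acids, stop_offset_in_exon3, after_coding_bp)
```

The stop position lands 120 bp into the third exon, which agrees with the "about 120 bp" in the bullet above.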

Mutations Affecting β-Globin Production

  • GAG encodes glutamic acid (Glu). Consider the possible mutation which turns it to GAA, which also codes for Glu. Since there is no change in the amino acid, this is called a silent substitution.
  • What if GAG is mutated to GAT aka GAU? Now we are coding for aspartic acid (Asp) instead of Glu. We call this a conservative missense change; it is not a silent substitution b/c the amino acid coded is actually different. However, since Asp and Glu are both negatively charged polar residues, the function probably doesn’t change that much. Similarly, changing leucine to isoleucine is probably not a huge change.
  • Imagine GAG changed to GTG; this means Glu turns into valine (Val). Big change from a negatively charged polar (aka hydrophilic) Glu to a hydrophobic (aka nonpolar) Val residue. This is a nonconservative missense mutation.
    • The problem with this is that now the nonpolar hydrophobic Val residue attaches to a hydrophobic patch on a neighboring hemoglobin molecule. As a result, instead of staying as small, separate tetramers, the hemoglobins link up into long chains.
    • This forms long rods inside the red blood cell, turning the overall RBC into a sickle shape: causing sickle cell anemia!
  • Next consider a mutation in the eighth amino acid: AAG which codes for lysine (Lys). If we change it to TAG, that codes for a Stop codon! We call this a nonsense mutation. If you are homozygous for this mutation, it’s very deadly and it’s called beta thalassemia.
  • Consider the codons in positions 3, 4, and 5: CTG-ACT-CCT. If the G is deleted from codon 3 (CTG), then we have a frameshift mutation. Now the sequence reads CTA-CTC-CT?.
    • The original Leu-Thr-Pro now becomes Leu-Leu-Leu, and everything afterwards is scrambled.
  • Another problem is if the promoter is killed. Then the whole gene cannot be expressed.
  • What about if there are mutations in the splicing machinery?
    • Introns begin with GT and end with AG; these signals are necessary to mark the intron that is to be spliced out.
    • If these splice sites are mutated, then the immature precursor RNA won’t be properly turned into mature mRNA.
  • Poly-adenylation site mutations. If there is a mutation at the poly(A) site, then the protective tail on the mRNA won’t be there and the mRNA may be degraded, leading to problems during translation.
  • Transposons can jump right into the middle of a gene. Even though this seems unlikely (given the small % of the genome that codes for protein), it happens frequently enough that a child of two non-thalassemic parents can have thalassemia because a transposon destroyed their β-Globin gene.
  • Wholesale deletion of a gene. Murphy’s Law: this can follow exposure to x-rays, a big mistake in the DNA copying machinery, a total break in the chromosome, etc.
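
The point-mutation categories above (silent, conservative missense, nonconservative missense, nonsense) and the frameshift example can be reproduced with the standard genetic code. A minimal sketch: the codon table is the standard one, but the coarse side-chain grouping in `CLASSES` is a simplification chosen for illustration, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, DNA codons in TCAG order; "*" marks a stop codon
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Coarse side-chain classes (an assumption for illustration only)
CLASSES = {"acidic": "DE", "basic": "KRH", "polar": "STNQYC", "nonpolar": "AVLIMFWPG"}

def side_chain_class(aa):
    return next(name for name, members in CLASSES.items() if aa in members)

def classify(ref_codon, alt_codon):
    """Label a single-codon substitution."""
    ref, alt = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref == alt:
        return "silent"
    if alt == "*":
        return "nonsense"
    if ref == "*":
        return "stop lost"
    if side_chain_class(ref) == side_chain_class(alt):
        return "conservative missense"
    return "nonconservative missense"

def translate(dna):
    """Translate complete codons only; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

for ref, alt in [("GAG", "GAA"), ("GAG", "GAT"), ("GAG", "GTG"), ("AAG", "TAG")]:
    print(ref, "->", alt, ":", classify(ref, alt))

original = "CTGACTCCT"                # codons 3-5: Leu-Thr-Pro
shifted = original[:2] + original[3:] # delete the G of codon 3
print(translate(original), "->", translate(shifted))
```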

LCRs and β-Globin Regulation

β-Globin Region

  • Not just the gene, but the overall DNA before and after it.
  • The next gene over is another gene called delta-globin.
  • 98% of hemoglobin in humans is the standard alpha-alpha-beta-beta tetramer (α-α-β-β). However, 2% of the hemoglobin is alpha-alpha-delta-delta tetramer (α-α-δ-δ).
  • There are also gamma-globin (γ-globin) genes close to the beta-globin genes. Usually, these genes are not expressed; they are primarily turned on only during fetal development. Hemoglobin made with γ-globin binds oxygen more strongly than regular beta-globin hemoglobin.
  • This is one reason why some people with sickle cell anemia suffer less: they are still expressing fetal γ-globin hemoglobin.
  • Another globin called epsilon-globin (ε-globin) used in embryos.
  • These paralogs were probably caused by unequal crossing over, aka an uneven exchange during homologous recombination in meiosis, which duplicates a gene. The process is called duplication and divergence and is responsible for many homologs out there.
  • Globin superfamily includes alpha, beta, gamma, delta, epsilon, and myoglobins. (See photos - 9/11/2020)

Lecture 15: Cloning - Purifying a Gene

September 11-14, 2020

  • By the mid-1960’s, some of the most senior biologists thought that the secret of life had essentially been “solved”; perhaps it was time to move onto the human brain.
  • Typical gene is about 10-30kb long
  • How do you isolate a single gene? It’s hard to distinguish one segment of DNA from another b/c they are all so similar, chemically speaking. It’s not like separating proteins, which is comparatively easier b/c proteins are so chemically distinct.
  • Important to clarify that cloning as applied to DNA does not refer to human or sheep organismal cloning. Here we specifically mean the process of isolating a specific fragment of DNA and encouraging it to replicate (which it was built to do).
  • Steps:
    1. Isolate / Cut the desired DNA segment
    2. Paste the desired DNA segment into a vector. A vector is something a cell will automatically duplicate. (e.g., plasmid?)
    3. Transform aka insert this duplicated/cloned DNA into a target cell
    4. Select for the cells that took up the DNA. Aka have a filter that can select only the transformed cells. Selectable markers (like an ampicillin resistance gene) are critical to isolating the desired colony of transformed cells in the end. See also auxotrophs

Cutting with restriction enzymes

  • EcoRI is a restriction endonuclease that recognizes a palindromic sequence (GAATTC). Because the enzyme is a dimer, it can read the site in either direction. It was found in E. coli strain R, hence the name EcoRI. EcoRI is part of the bacterial defense/immune system that protects against bacteriophages. That’s why it’s called a “restriction enzyme”: it restricts the growth of viruses.
  • A second enzyme, the EcoRI methylase, first comes and adds a methyl group to the adenines in all the EcoRI sites (palindromic GAATTC sequences) to keep EcoRI from cutting there. This keeps the bacteria from cutting the palindromic GAATTC sequences in E. coli’s own DNA. (Dam methylase is a different enzyme: it marks GATC sites and is the one used for mismatch repair.)
    • The methylase works pretty quickly, and methylating just one strand of a site is enough to protect it before EcoRI gets there.
    • After replication, each daughter chromosome has methylation only on its parental strand. Again, the methylase works pretty quickly, so both daughter chromosomes are protected by that one methylated strand until the new strand is methylated too.
  • See photo roll on 9/14/2020 for screenshot of various other restriction endonucleases.
  • Some restriction endonucleases cut “cleanly” leaving blunt ends whereas most like EcoRI create single-stranded overhangs called sticky ends.
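
Two of the ideas above, that GAATTC is palindromic and that EcoRI cuts within it to leave sticky ends, can be shown on a toy sequence. A sketch only: it tracks just the top strand, the input sequence is made up, and the cut position (between G and AATTC) is the one well-established fact it relies on.

```python
SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cuts after the G: G^AATTC, leaving a 5'-AATT overhang

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def digest(dna, site=SITE, offset=CUT_OFFSET):
    """Return the top-strand fragments after cutting at every recognition site."""
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + offset])
        start = pos + offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments

# Palindromic: the site reads the same 5'->3' on both strands
print(reverse_complement(SITE) == SITE)

dna = "TTAGGAATTCCGCATGAATTCAA"  # hypothetical sequence with two EcoRI sites
print(digest(dna))
```

Every internal fragment starts with AATT, which is the single-stranded overhang that lets two EcoRI-cut pieces from any source pair up before ligation.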

Pasting with ligase

  • You can paste with DNA ligase, which we learned about in the context of DNA replication (on the lagging strand, where the Okazaki fragments need to be connected once the RNA primers are gone).
  • This gives recombinant DNA: ligase could actually connect a bacterial strand with a zebra strand.

Vectors

  • Plasmids have origins of replication.
  • Horizontal transfer of DNA between bacteria. If a bacterium dies, it lyses open and the plasmids and other DNA molecules can be absorbed by other living bacteria.

Transform another cell

  • Genetically modify an organism by passing a modified vector plasmid into it. Important to associate a gene for antibiotic-resistance on the vector so that after you plate out the E. coli on a petri dish you can eliminate the untransformed bacteria.
  • Story of how Salvador Luria discovered the phenomenon of methylation, which is part of what led him to discover restriction enzymes.

Michelle Mischke

  • Origin of replication (Ori) and promoters and strong versus weak binding of RNAP. Even RNAPs weakly bound to a promoter region (b/c of sequence semi-mismatch) can improve their activity if other helper proteins attach to this complex.
  • Note that b/c there are different DNA pol and RNAP used in humans vs. bacteria, the ori and promoter sequences will be different across different taxa. This has consequences for lab experiments.

Lecture 16: Finding a Specific Gene in the Library

September 14-18, 2020

Tricks of the trade

  • Library = set of bacterial colonies on a petri dish
  • See Wikipedia for a good overview of molecular cloning including the history of this technique.
  • Problem 1: what if there is an EcoRI site in the middle of the desired gene? Two possible solutions:
    1. Partial digestion aka lower concentration of restriction enzyme in solution
    2. Restriction/methylation competition by titrating how much methylase you add to protect a certain desired % of the source DNA sequence
  • Problem 2:
    • Assume you are using an antibiotic like ampicillin as the selector to filter out E. coli colonies that haven’t been transformed with the desired source DNA sequence.
    • We assume that only bacteria that have absorbed a plasmid that contains both the ampR resistance segment and the source DNA segment survive the petri dish that contains ampicillin.
    • However, it is quite likely that a plasmid accidentally closes with only the ampR resistance segment, thereby conferring resistance on a bacterium that is missing the source DNA segment.
    • How do we prevent this?
    • Answer:
      • We use the enzyme phosphatase (see for example alkaline phosphatase), which strips off the terminal phosphate from the cut plasmid. Thus, ligase can no longer close this plasmid until we are ready. In other words, phosphatase makes it such that ligase can no longer do its job and the plasmid will no longer “self-ligate”.
      • Next, when the desired source DNA insert segment comes along, it carries its own phosphates which allows the plasmid to close.
      • Note that this will still leave single-strand nicks. Those nicks will be healed in the target bacteria.
  • Shearing DNA is another trick: you can shake up DNA in water, shearing them to pieces of various lengths. Depending on how much you’ve shaken things up, you can make smaller and smaller segments of DNA. Also, you can use enzymes to fix the ends of each strand so they are blunt ends or sticky ends as desired

Different vectors for different target organisms, etc.

Vector | Target | Size (base pairs)
Bacterial plasmid | Bacteria | 100’s - 5000’s
Bacterial virus | Bacteria | ?
Yeast plasmid | Yeast | 5000
Yeast artificial chromosome | Yeast | 1M range
Mammalian virus | Mammals | ?
  • Use a bacterial virus as the vector (instead of a plasmid) to transform target bacteria
  • Use a mammalian virus as the vector to transform target mammal
  • Create an artificial chromosome from scratch. Scientists have done this in yeast already. Worked out the telomere sequence, origin of replication, a site for the desired gene, etc.
    • Can actually build a chromosome of 1M nucleotides. Used to do this a lot in the 1990s
    • No longer done because it’s a lot of work and is no longer necessary.
  • Can even create artificial human chromosomes

cDNA

  • Source DNA doesn’t have to be genomic DNA, there is another way we can create source DNA.
  • One can make a library of expressed mRNA. You do this by copying RNA into DNA using reverse transcriptase. This creates cDNA aka complementary DNA.
  • cDNA is synthesized from ssRNA into dsDNA. Often used to copy (clone) eukaryotic genes into prokaryotes.
  • Generally created off of an RNA template. If cDNA is built off of a mature, post-processed single-strand mRNA template, the resulting double-stranded cDNA will be “clean”, containing only the exons and not the extraneous introns.
  • But what do we use as a primer to start the reverse transcriptase? We can use a string of poly(T) aka 5’-TTT…TTT-3’, which pairs with the poly(A) tail that mature mRNA has and so creates a starting primer region.
    • For more detail, see this quote: “cDNA can be generated by…first extracting total RNA from cells, and then isolating the mRNA from the more predominant types—transfer RNA (tRNA) and ribosomal (rRNA). Mature eukaryotic mRNA has a poly(A) tail—a string of adenine nucleotides—added to its 3’ end, while other types of RNA do not. Therefore, a string of thymine nucleotides (oligo-dTs) can be attached to a substrate such as a column or magnetic beads, to specifically base-pair with the poly(A) tails of mRNA. While mRNA with a poly(A) tail is captured, the other types of RNA are washed away.”
  • One benefit of cDNA is that it is much more compact than genomic DNA because the introns have been removed!

Finding your gene by complementation; the ARG1 gene

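  • In yeast, ARG1 codes for an enzyme in the arginine biosynthesis pathway (argininosuccinate synthetase); the human gene named ARG1 is the one that codes for arginase.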
  • Yeast mutants that are auxotrophs require external arginine because the mutation prevents them from manufacturing arginine themselves. The mutated gene is ARG1.
  • Take all the yeast DNA and cut it to pieces, ligate the pieces into vectors, and transform them into the mutant cells.
  • Now we have thousands of colonies, all carrying different plasmids from the fragmented source DNA. A colony that can grow without added arginine carries a plasmid whose insert complements the mutation, i.e., contains a working ARG1 gene.

Finding a gene by protein expression

  • e.g., inject a rabbit to produce monoclonal antibodies against the protein, then link them to a fluorescent label

Review

  • Again the steps to cloning a gene:
    1. Decide Source DNA which is either genomic DNA or cDNA
    2. Place source DNA into Vector
    3. Use vector to Transform bacterial, yeast, mammalian or other target cells
    4. Select transformed cells using either (4a) complementation or (4b) protein expression

Lecture 17: Analyzing a Gene with Gel Electrophoresis, Sequencing, and PCR

September 18-19, 2020

Gel Electrophoresis

  • DNA is a negatively charged molecule so the electric field applied moves it away from the negative pole and towards the positive pole.

DNA Sequencing

  • Concept of DNA sequencing. ddNTP = dideoxynucleoside triphosphate (2’,3’-dideoxy NTP); it lacks the 3’ OH, so once incorporated it terminates DNA Polymerase activity.
  • Implementation of original Sanger sequencing
  • Updated post-Sanger protocol using fluorescent tagging instead of radioactive phosphorus, higher throughput of 96 batches at once, automation involving laser inspection (rather than human inspection and manual recording by pencil of A,T,C,G) etc.
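
The logic of reading a Sanger gel can be sketched without any chemistry: each of the four ddNTP reactions yields fragments that end wherever that base occurs, and sorting all fragments by length recovers the sequence. An idealized sketch (every possible termination is assumed to occur; function names are hypothetical).

```python
def sanger_fragments(synthesized):
    """For each ddNTP lane, list the lengths of chain-terminated fragments.

    `synthesized` is the newly made strand, 5'->3'.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)  # a dd-version of this base can stop the chain here
    return lanes

def read_gel(lanes):
    """Read bands from shortest to longest across the four lanes."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

lanes = sanger_fragments("GATTACA")
print(lanes)
print(read_gel(lanes))
```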

PCR

  • Primers need to run in the 5’ to 3’ direction, with the 3’ end facing inward towards the gene. You need 2 primers, one for each opposing strand, and each running in the opposite direction.
  • Extra follow-up video on gel electrophoresis by Lori
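
The primer geometry for PCR is easier to see on a concrete string. A sketch under loud assumptions: the target sequence is made up, the 6-nt primer length is far shorter than real primers and chosen only to keep the example readable, and the doubling per cycle is the idealized best case.

```python
def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def design_primers(target_top_strand, length=6):
    """Return (forward, reverse) primers, both written 5'->3'."""
    forward = target_top_strand[:length]                        # anneals to the bottom strand
    reverse = reverse_complement(target_top_strand[-length:])   # anneals to the top strand
    return forward, reverse

def copies_after(cycles, starting_copies=1):
    """Idealized PCR: the number of copies doubles every cycle."""
    return starting_copies * 2 ** cycles

target = "ATGGTGCACCTGACTCCTGAG"  # hypothetical target, top strand 5'->3'
forward, reverse = design_primers(target)
print(forward, reverse)
print(copies_after(30))
```

Both primers point their 3’ ends into the target, one from each side, which is why the region between them is the part that gets amplified.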

Lecture 18: Human Genome and Positional Cloning

September 19 - October 8, 2020

Review of recombinant DNA

  • Making libraries: given any source DNA, make vector, transform target cell, select successfully transformed cells. Review the Wikipedia article on molecular cloning.
  • Finding specific genes by
    • Function / complementation
    • Protein expression, using antibodies, stick to appropriate colonies
  • Oxford Animations. Videos 1-3 are finished as of 9/19/2020
    • Next: watch videos #4-10 on Transcription, Regulation, mRNA Splicing, and Translation

October 5, 2020 - Genetic Mapping in Humans

  • David Botstein’s 1978 experiments for human crosses. Applying fruit fly genetics techniques to understand human Mendelian genetic diseases like Huntington’s.
  • However, there were a number of challenges because humans ≠ fruit flies
    • fruit fly generation times are a lot shorter, and there are many more offspring per mating pair
    • unethical to artificially match human mates the way you can with fruit flies in a lab
    • humans do not display obvious, single-gene traits like fruit fly eye color, which, in addition to wild-type (wt aka +), can exhibit alleles for sepia (se), cinnabar (cn), and white (w) colored eyes.
  • Instead, you search for “silent” small polymorphisms. It is a co-dominant trait. Simple, small single nucleotide substitution.
  • If you try a genetic marker on any chromosome other than the chromosome that hosts the Huntington’s Disease (HD) gene, then there will be independent assortment per Mendel.
  • However, if these sites of polymorphisms are on the same chromosome as HD, then it will be linked.
  • How did people search for human polymorphisms in the 1980’s? It was a very slow, painstaking process.
    • Technique #1: Use restriction endonuclease EcoRI to scan for its recognition site 5’-GAATTC-3’ (a palindrome: the complementary strand reads 3’-CTTAAG-5’). Along a human’s 3 billion bp, some of the many instances of 5’-GAATTC-3’ will carry a Single-Nucleotide Polymorphism (SNP) that keeps EcoRI from cutting there. Different individual humans will have different 5’-GAATTC-3’ sites disrupted by an SNP. As such, when digesting the same chromosome across multiple human subjects with EcoRI, you will get fragments of different lengths. You can use this to identify the SNPs that distinguish different individuals.
    • Technique #2: There are long strings of repeated CACACA in the human genome, termed (TA/CA)n. However, DNA polymerase does not always copy the same number of CA’s, so the repeat count varies between people. e.g., Alice may have 15 CA repeats at a locus while Bob has 17 CA repeats at the same locus. By using the right primers and PCR, one can amplify this specific sequence and distinguish different individuals.
  • Nancy Wexler and Jim Gusella at MGH searched through hundreds of polymorphisms to find ones near the gene for Huntington’s disease. They published these results in 1983.
    • Even though in theory one might need to examine ~600 SNP sites to find one close to the HD gene, Gusella found it on his 12th SNP. He was very lucky! The marker showed about 1% recombination with the HD gene; aka, crossing over separated the marker from the HD gene in only about 1% of meioses, and the other 99% of the time the marker was inherited in the same pattern as the disease. A recombination frequency of 1% is one centimorgan (cM), which in humans corresponds to about 1 Mbp (though this varies widely among species and circumstances). Thus, the marker was “within 1%” of the HD gene in genetic terms, or within about 1 Mbp in physical terms.
  • In 1985, David Botstein pulled Lander into his office and said that he had located the gene for cystic fibrosis, or at least narrowed it to within 15 Mbp. The marker was within 15 cM (aka about 15 Mbp) of the presumed CF gene. Given that the human genome is about 3 billion bp, a window of 15 Mbp on either side of the marker is roughly 1% of the genome, so about 99% of the genome had been eliminated as a potential locus of the CF gene.
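
A back-of-the-envelope version of the marker-distance arithmetic. This sketch assumes the rule of thumb that 1% recombination is about 1 cM, which is about 1 Mbp in humans, and a genome of about 3,000 Mbp; both are rough averages, and the helper names are hypothetical.

```python
MBP_PER_CM = 1.0    # rough human average; varies by region and by species
GENOME_MBP = 3000   # approximate human genome size in Mbp

def search_window_mbp(recombination_percent, both_sides=True):
    """Physical window around a marker implied by its recombination frequency."""
    distance_mbp = recombination_percent * MBP_PER_CM  # 1% recombination ~ 1 cM
    return 2 * distance_mbp if both_sides else distance_mbp

def fraction_of_genome(window_mbp):
    return window_mbp / GENOME_MBP

for name, recomb in [("HD marker", 1), ("CF marker", 15)]:
    window = search_window_mbp(recomb)
    print(f"{name}: ~{window:.0f} Mbp window, {fraction_of_genome(window):.2%} of the genome")
```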

Physical mapping

  • In contrast to the genetic mapping discussed in the section above, physical mapping involves examining various plasmids you’ve plated out on a petri plate.
  • Chromosomal walking using multiple probes on either side of the HD or CF gene.
  • But for HD, it took Gusella 10 years to sequence that specific HD gene, even after being lucky on the 12th try at finding the associated gene marker!

Human Genome Project

  • Rough draft, about 90% complete, published February 2001.
  • Final draft, about 99.3% complete, published April 25, 2003, almost exactly 50 years after the Watson-Crick paper on DNA structure (April, 1953).
  • After 1st human genome sequence was completed, scientists could use it as the reference to compare all future human individuals sequenced. And from there, we collect single nucleotide deltas as a new SNP.
  • As of 2013, have collected 20 million distinct SNPs.
  • The gene chip is a glass slide with a little spot for each 25-bp long sequence of DNA. Via photolithography, we can have up to 2 million spots per glass slide, searching for millions of SNPs at once from a single blood sample from one human individual.
  • As a result of technology like the gene chip, have identified 3,500 Mendelian diseases in humans as of 2013.

Improvement in DNA Sequencing

  • Illumina sequencing is one of several Next Generation Sequencing (NGS) aka high-throughput sequencing methods. These are all advances over earlier Sanger dideoxy DNA sequencing.
  • Illumina sequencing adds one fluorescently labeled, reversibly terminated nucleotide at a time and takes a fluorescent picture after each nucleotide is added.
  • 3 billion different spots can all sit on a single glass slide. Each spot has a probe aka primer with a specific sequence that will grab DNA that washes onto it
  • Each little spot has a mini-PCR reaction module which amplifies the DNA that attaches
  • This process can read about 100 bp before falling off per DNA strand. This means 3 billion spots * 100 bp per spot = 300 billion bp all measured on a single slide. And since you can do this operation in both strand directions, you can actually do 600 billion bp on a single slide!
  • Illumina has a comparison page for NGS vs. Sanger Sequencing. See also the overall Wiki page describing the history and breadth of DNA sequencing techniques.
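
The per-slide throughput quoted above is straightforward multiplication. A sketch using only the figures from these notes; the "fold coverage" line additionally assumes a 3 billion bp human genome and perfectly even coverage, which real runs do not achieve.

```python
spots_per_slide = 3_000_000_000  # from the notes
read_length_bp = 100             # bases read per spot before the signal falls off
directions = 2                   # reading from both strand directions
genome_bp = 3_000_000_000        # approximate human genome (assumption)

bp_one_direction = spots_per_slide * read_length_bp
bp_per_slide = bp_one_direction * directions
fold_coverage = bp_per_slide / genome_bp

print(f"{bp_one_direction:,} bp in one direction")
print(f"{bp_per_slide:,} bp per slide")
print(f"~{fold_coverage:.0f}x coverage of one human genome")
```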

Lecture 19: Secrets of the Human Genome

October 9-11, 2020

  • What are the elements of the genome? Genomics tries to look at the big picture, as an integrated whole–as distinct from its forebears biochemistry and genetics.
  • Draft genome announced June 2000, published February 15, 2001.
  • 99.3% complete draft announced April 2003, officially published October 2004.

General observations

  • Highly uneven distribution of genes. Some regions are very dense with genes, other regions are sparse with genes. On average, about 7 genes every million bases.
    • Some regions have 40 genes per Mb. Other regions have 0 genes across a Mb, called a “gene desert”.
  • Less than 5% of the HG actually codes for proteins. In fact, per the cross-species comparison of coding vs. non-coding parts of the genome described below, there are only about 21,000 genes in the HG, aka only 1.5% of the genome consists of bp that code for protein.

LINE elements

  • Long Interspersed Nuclear Elements (LINE) about 7,000 bp long.
  • All they do is transcribe an mRNA which is translated into a reverse transcriptase. All the rTranscriptase does is create another LINE element that is placed back into the human genome.
  • There are about 100,000 LINE elements in the HG. And they have been a part of eukaryotic genomes for about 1 billion years.
  • Most of these 100k LINE elements are non-functional aka they do not succeed in generating a transcript.

SINE elements

  • Also the HG has Short Interspersed Nuclear Elements (SINE)
  • There are about 1 million SINE’s in the HG, and they range in size between 100-700 bp in length.
  • They do not code for a reverse transcriptase; instead, they piggyback on RTrans produced by LINE’s. In essence, they are parasitizing LINEs which are themselves parasitizing the regular nuclear genome.

Creating a molecular clock using LINE paralog family tree

  • If you sequence as many LINEs as possible, you can create a family tree of changes. The copies with the most changes are the oldest. The most recent copies have not yet had time for many mutations / copying errors to creep in.
  • Very rough estimate: over the course of 100 million years, expect about 0.2 changes per base. In other words, 1 base in every 5 nt (or 2 bases in every 10 nt) will change every 100 million years.
  • That’s the same as saying that there is roughly 1 change per 50 nt every 10 million years, or 1 change per 500 nt every 1 million years.
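The rate conversions above can be sketched in a few lines of Python. This is a simplified linear clock that assumes a constant rate and ignores repeated changes at the same site; the function names and the example LINE copy are hypothetical.

```python
# Rough molecular clock from the lecture: ~0.2 changes per base per 100 My.
RATE_PER_BASE_PER_MY = 0.2 / 100   # 0.002 changes per base per million years

def expected_changes(length_nt, million_years):
    """Expected number of changed bases in a stretch of DNA after a given time."""
    return RATE_PER_BASE_PER_MY * length_nt * million_years

def estimated_age_my(num_differences, length_nt):
    """Estimate age in My from the fraction of bases that differ from the
    ancestral sequence (assumes a constant rate and no repeated hits)."""
    return (num_differences / length_nt) / RATE_PER_BASE_PER_MY

print(expected_changes(50, 10))     # ~1 change per 50 nt per 10 My
print(expected_changes(500, 1))     # ~1 change per 500 nt per 1 My
print(estimated_age_my(700, 7000))  # hypothetical copy with 10% divergence: ~50 My
```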

Phylogenetics for two species

  • Let’s compare multiple species. This means that we need to sequence the genomes of various species, e.g., the mouse genome rough draft was published in 2002 and the final version was completed around 2004.
  • Let us use the beta-subunit of hemoglobin, β-globin, as our reference. This protein is coded by the HBB gene, which consists of three small exons: 140 bp, 222 bp, and 252 bp in the human.
  • This same gene appears in mice, but the intron sequences in between the 3 exons are smaller. In other words, in total DNA distance, the mouse version is smaller (when one includes both exons and introns).
  • Why is the human version larger? B/c transposons have hopped in to lengthen the introns, and perhaps there have been more deletions in the mouse version.

Phylogenetics for multiple species, gene order

  • Tree of human, chimp, macaque, lemur, dog.
  • For several million bases, the order of genes is the same across many different species.
  • Hierarchy from most acceptable (aka least selection pressure against) to least acceptable (aka most selected against):
    • Silent substitution aka, single bp change that still codes for the same amino acid
    • Next, a bp change that changes amino acid to a very similar residue, e.g., a mutation that turns leucine to isoleucine
    • Next, a bp change that changes amino acid to a very different one (e.g., nonpolar residue into a very polar, positive charged amino acid)
    • Nonsense mutation, aka a bp change that creates a premature stop codon
    • Frameshift mutations
  • Chart showing a single gene that is orthologous across mouse, dog, rat, and human.
    • On the left, the intragenic sequences are either 100% conserved (no mutations) or have only small, unimportant single-nt silent substitutions.
    • On the right, we see the intergenic sequences, where there are many more significant base pair changes across the four species. Natural selection doesn’t need to operate here!
  • This analysis lets us know where all the “gene areas” are of the DNA and which are intergene areas where there are no protein-coding sequences. Based on this inter-species comparison, the HGP was able to show that rather than circa 1999 estimates of ~100,000 genes in the HG, there are only 21,000 genes in the human genome.
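The hierarchy of substitutions above can be illustrated with a small classifier over single-codon changes. This is a minimal sketch: the codon table is the standard genetic code, but the grouping of amino acids into four chemical classes is a simplification chosen for illustration, and the function names are hypothetical. "Nonsense" here means a change that creates a premature stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Simplified chemical classes (an assumption for illustration only).
CLASSES = {"nonpolar": "AVLIMFWPG", "polar": "STCYNQ", "positive": "KRH", "negative": "DE"}
AA_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}

def classify_substitution(ref_codon, alt_codon):
    """Classify a change from one codon to another by its effect on the protein."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"        # creates a premature stop codon
    if ref_aa == "*":
        return "stop-loss"       # removes an existing stop codon
    if AA_CLASS[ref_aa] == AA_CLASS[alt_aa]:
        return "conservative missense"
    return "radical missense"

def is_frameshift(indel_length):
    """An insertion or deletion shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0

print(classify_substitution("CTT", "CTC"))  # silent (Leu -> Leu)
print(classify_substitution("CTT", "ATT"))  # conservative missense (Leu -> Ile)
print(classify_substitution("CTT", "CGT"))  # radical missense (Leu -> Arg)
print(classify_substitution("TAT", "TAA"))  # nonsense (Tyr -> stop)
print(is_frameshift(1), is_frameshift(3))   # True False
```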

Conservation of non-coding sequences and gene regulation

  • Use transposons as a “dead” background rate of mutation
  • About 5-6% of the genome is conserved, even though it does not code for protein. What is it? Some of it is regulatory sequences.
  • Example: the Satb1 gene that is important in early embryo development. By examining the DNA on either side of this gene, we see that a lot of the surrounding DNA is not protein-encoding but is still highly conserved. This implies that a very complex regulatory system must surround Satb1 to make sure it acts properly in helping lay down the Bauplan for an embryo.
  • A lot of the most highly conserved regions of noncoding DNA surround the 200 or so genes most involved in embryonic development.

Non-coding RNA regions

  • Another type of non-protein encoding DNA regions that are conserved (along with regulatory regions) are areas that code for RNA that is not then translated into protein.
  • Called non-coding RNAs aka ncRNA
  • From 2009-2013, researchers discovered at least 7,000 genes that code for non-coding RNAs. This is an underestimate b/c they are still discovering them. Some of these have known purposes, but for most of them we don’t know what they are for.
  • The long ones are called long ncRNA aka lncRNA; those located between protein-coding genes are abbreviated lincRNA (Long Intergenic Non-Coding RNA).

Transposons as a force for evolution

  • Transposons can be a source of new regulatory systems, esp. a single regulatory sequence that is copied around the genome to simultaneously activate/repress many genes at once.
  • The hint is when these regulatory sequences show up: if they sit in the middle of a relatively recent transposon, e.g., one from the 100-200 mya time frame when placental mammals separated from marsupials, then the regulatory sequence likely arrived with that transposon.

Mitochondrial DNA and other ancestors

  • The circular DNA in mitochondria is only about 16 kb long.
  • Can be used to build a family tree of similar/different mtDNA to relate various groups of humans from around the planet. The times when these groups separated map very closely onto other evidence of human migration. Origin in Africa, migration to Asia, Europe, Pacific Islands, Americas.
  • The common maternal ancestor is called “Mitochondrial Eve”, who probably lived about 150-200k years ago.
  • mtDNA aka mDNA is highly conserved across eukaryotes b/c mitochondria play such a critical role in cellular respiration. However, it has a higher mutation rate than nuclear DNA b/c mitochondria have less effective repair mechanisms.
  • Out of Africa hypothesis. Exit from Africa was about 70-100k years ago.
  • Markers to find other ancestors
    • Y-chromosome Adam. Probably never met Mitochondrial Eve. Coalescence when you run the family tree backward.
    • Hemoglobin Harry (is probably an early eukaryotic ancestor).
  • Relationship between modern humans and Neanderthals
    • Examining Neanderthal bones that have preserved DNA fragments. Neanderthal mtDNA shows that their split from modern humans was about 300k years ago.
    • Modern humans go back to about 150-200k. Multiple human migrations from Africa.
    • For several years, based on mtDNA evidence alone, scientists thought that there was no interbreeding between Neanderthals and modern humans
    • However, if we examine nuclear DNA, look for SNPs in a particular part of the human genome, and compare modern human individuals from Europe, Asia, and Africa along with a Neanderthal genome, we see this pattern:
      • No difference between Europeans
      • No difference between Asians
      • No difference between Europeans and Asians
      • BUT, difference between either Europeans or Asians compared to Africans
      • And that difference is somewhat closer to Neanderthal, by about 3%. This implies that during the later modern human migration out of Africa, the migrants interbred a little bit with Neanderthals and so share some SNPs with them. These SNPs are shared across Eurasians, but not with ancestral modern human Africans.

Lecture 20: Observing the Genome to Probe Function

  • October 12-17, 2020
  • Polygenic disease
  • How do we artificially insert or delete genes? Knock-in and knock-out of genes.

DNA Polymorphisms in Medicine

  • Use DNA chips aka “SNP chips” aka SNP array.

What about rare mendelian genetic diseases where you don’t have family info?

  • Need to examine the entire chromosome
  • One strategy: find patients who express the disease and scan all 3B bp for “spelling errors”.
    • In theory, if this disease is a dominant trait, then we will only find 1 spelling error around the presumed mutated gene.
    • And if the disease is a recessive trait, then we expect both copies of the gene to have a spelling error (and not necessarily the same error).
  • But there is a problem with the above strategy: there are actually many, many spelling errors in any “normal” wildtype human genome. On the order of about 1 spelling error every 1,000 bp. Or for more heavily selected areas (aka coding regions), maybe 1 spelling error every 2,000-3,000 bp. At that rate, there is a baseline on the order of millions of “spelling errors” per wildtype or afflicted individual (3B bp at 1 per 1,000 bp is about 3 million). So how can we figure out which error is a marker or mutation for the postulated disease gene?

Strategy #2 for recessive trait

  • Good news is that we’ve actually sequenced thousands of individual human genomes and have a pretty comprehensive catalog of the usual locations for SNPs, i.e., the common variant misspellings. These can be filtered out.
  • Now what about special variant genes that occur less than 0.5% of the time?
  • It turns out that the average human has about 150 genes that have some sort of rare variant (that is, a variant at a frequency below 0.5%).
  • However, if this disease is recessive, then we need both copies of the gene (one on each homologous chromosome) to have a mutation. Since roughly 150 of about 20,000 genes, i.e., on the order of 1%, carry a rare variant in a given person, the chance that both copies of a particular gene carry one is about 1% * 1% = 0.01%, aka 1 in 10,000.
    • And since there are about 20,000 genes in the genome, only about 2 genes per person are expected to have rare variants on both copies by chance. That means that even with a single afflicted individual, you can do a good job pinpointing the particular gene.
    • And if you can gather just a few more recessive individuals, you can very precisely pinpoint the desired gene.
  • So in this case, we can find the gene without a linkage map! However, this technique will not work if the disease is a dominant trait.
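The arithmetic behind this strategy can be sketched as follows. This is a rough model that uses the rounded 1% figure from the notes and assumes the two copies of a gene, the genes themselves, and the patients are all independent; the function name is hypothetical.

```python
# How many genes look like recessive candidates purely by chance?
NUM_GENES = 20_000
P_RARE_VARIANT_ONE_COPY = 0.01   # rounded: ~150 of ~20,000 genes per person

def expected_chance_candidates(num_patients):
    """Expected number of genes with rare variants on BOTH copies in EVERY
    one of num_patients unrelated patients, by chance alone."""
    p_both_copies = P_RARE_VARIANT_ONE_COPY ** 2   # 1% * 1% = 1 in 10,000
    return NUM_GENES * p_both_copies ** num_patients

print(expected_chance_candidates(1))  # ~2 genes with a single patient
print(expected_chance_candidates(2))  # ~0.0002 genes with two patients
```

With one patient, the true disease gene hides among only a couple of chance candidates; each additional patient who shares the same doubly-hit gene makes a chance match far less likely.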

Polygenic diseases

  • Classic Mendelian inheritance explains about 3,000-4,000 diseases.
  • But most diseases, esp. the ones we hear most about, do not follow classical Mendelian genetics. Instead, they are polygenic diseases, e.g., many cancers, neurological diseases, mental illnesses, cardiovascular disease, etc.
  • Example: PPAR-gamma helps regulate fat cells (adipocytes), fatty acid storage, and is involved in adult onset diabetes.
    • About 15% of the population has the SNP variant that protects against diabetes.
    • Conversely, about 85% of the population has the SNP variant that makes diabetes slightly more likely
    • Turns residue-12 from a proline to an alanine

International HapMap Project

  • Correlation map across the entire genome across all SNP markers and variant nucleotide spellings.
  • The strategy: Don’t try to sequence the entire genome for many individuals. Instead, since we know that there are about 1-2 million locations for SNPs, we build a DNA chip whose roughly 1M regions look only at those known SNP locations.
  • We can then use that DNA chip to quickly screen thousands of humans. And quickly build a comprehensive library (HapMap)
  • The definition of haplotype is simply a block of DNA in the chromosome where several closely located SNPs are inherited together. In this context, we are using the second definition described at the Wiki article for Haplotype.
  • Example: Age-Related Macular Degeneration aka ARMD. From HapMap studies, we discovered an SNP in the gene coding for complement factor H in which the tyrosine at residue 402 is replaced by a histidine. This variant is correlated with a 35% higher chance of developing ARMD.
  • Example: Schizophrenia. Initial study in 2009 examined 7000 patients and found no particular SNPs that were well-correlated with schizophrenia. However, they tested larger and larger sets of humans. As of 2013, they tested 50,000 patients and found about 93 regions of the human genome that have statistically significant contribution to schizophrenia.
    • Not clear on exact mechanism. However, 4 of those 93 potential genes all code for subunits of the same multi-protein complex that is used in a certain neuronal membrane L-type calcium channel.
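To see why larger cohorts uncover more associated SNPs, here is a minimal sketch of a case/control test for one SNP using a 2x2 table of allele counts. All counts are made up for illustration, and the significance cutoff is an assumed Bonferroni-style correction for roughly 1 million SNPs tested, not a figure from the lecture.

```python
from math import erfc, sqrt

def allele_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio, chi-square statistic (1 df), and p-value for a 2x2 allele table."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return odds_ratio, chi2, p_value

# Assumed cutoff: 0.05 divided by ~1 million SNPs tested on the chip.
GENOME_WIDE_CUTOFF = 0.05 / 1_000_000

# Hypothetical allele counts: same allele frequencies, 100x more samples.
small_study = allele_association(60, 40, 40, 60)
large_study = allele_association(6000, 4000, 4000, 6000)

print(small_study)   # odds ratio 2.25, chi2 = 8.0, p ~ 0.005: not genome-wide significant
print(large_study)   # same odds ratio, chi2 = 800: far below the cutoff
```

The odds ratio is identical in both studies, but the chi-square statistic grows in proportion to sample size, so only the larger study clears the genome-wide cutoff.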

Global view of RNA variation

  • Examine a red blood cell or a skin cell. To measure the mRNAs, use reverse transcriptase to create cDNA and then sequence.
  • Another way to measure RNAs is using RNA chips. The chip allows you to see the expression level.
    • For example, given that humans have about 21,000 genes, we design an RNA chip that has 21,000 different locations, one for each gene. There are multiple copies of the gene template (DNA) at each chip location.
    • Then, we extract and isolate all the mRNA from the cell and label it all with a fluorescent tag.
    • Then, we wash all this RNA over the RNA chip. B/c there are multiple copies of identical DNA at each chip location, the genes with higher levels of expression will have more mRNA bound.
    • This outputs a vector with 21,000 real number values representing the gene expression for all 21,000 genes for an individual.
    • Repeat this process with a second human patient, outputting another vector with 21,000 scalar values.
    • Ultimately, you have an array where each row represents one person’s whole mRNA expression over 21,000 genes, and each of the 21,000 columns represents the expression level of a particular gene across multiple individuals.
  • Example of leukemia and the subtypes of ALL versus AML. Each has a different pattern of gene expression.
    • With each sample being a point in 21,000-dimensional vector space (one dimension per gene), you can cluster the samples in that space and determine what different types of pathologies there are. So it’s much faster and less painstaking than the process Sidney Farber went through when, using classical pathology, he distinguished between the very similar looking cells that exhibit ALL vs. AML.
  • Example of different subtypes of breast cancer and which ones are best treated by certain types of chemotherapy
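The idea of clustering patients by expression profile can be sketched with toy data. This is a minimal illustration, not the method used in the lecture: the patient names and the 5-gene expression values are made up (a real chip has ~21,000 columns), and average-linkage clustering on 1 minus the Pearson correlation is one simple choice among many.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_samples(samples, k):
    """Average-linkage agglomerative clustering using 1 - correlation as distance."""
    clusters = [[name] for name in samples]

    def distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - pearson(samples[a], samples[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        # Find and merge the two closest clusters (i < j, so popping j keeps i valid).
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters.pop(j)
        clusters[i].extend(merged)
    return clusters

# Hypothetical expression matrix: one row per patient, one column per gene.
samples = {
    "patient_A": [9.0, 8.5, 1.0, 1.2, 5.0],
    "patient_B": [8.7, 9.1, 0.8, 1.5, 5.2],
    "patient_C": [1.1, 0.9, 9.2, 8.8, 5.1],
    "patient_D": [0.7, 1.3, 8.9, 9.4, 4.9],
}
groups = cluster_samples(samples, 2)
print(groups)   # [['patient_A', 'patient_B'], ['patient_C', 'patient_D']]
```

Patients A and B share one expression pattern and C and D another, so they fall into two groups, analogous to separating two leukemia subtypes by expression profile.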

Protein localization in the genome

  • Focus on nuclear estrogen receptor (not the membrane variety). These are a type of nuclear receptor which bind directly to DNA to up-regulate or down-regulate activity of those genes.
  • How do we know which DNA sequences nuclear estrogen receptors bind to?
    • Use formaldehyde or another agent to chemically fix the nuclear estrogen receptor (NER) to its preferred DNA attachment locations.
    • Develop a monoclonal antibody (mAb) in a rabbit specific to the NER protein.
    • Break up DNA and then mix with mAb so that we can isolate the specific DNA fragments that have NER.
    • Sequence those specific fragments of DNA.
  • This technique is called ChIP-seq aka Chromatin Immunoprecipitation-sequencing.

Lecture 21: Perturbing the Genome to Probe Function

  • Started October 17-18, 2020. Then completed November 2-4, 2020.

Adding a Gene

  • Up to this point, we are able to move from almost every point in the coat-of-arms to any other point.
    • We can sequence a gene to figure out the protein it codes.
    • Given a protein, we can determine its associated gene using the clone by protein expression method.
    • Given a function, we can isolate its associated protein through biochemical purification.
    • Also given a function, we can figure out its associated gene through a classic mutant hunt or by cloning the gene via: (a) complementation or (b) positional cloning.
  • However, we are not able to go from Protein to Function or from Gene to Function. This lecture shows us how to go from Gene to Function. And via cloning by expression, we get Protein to Function “for free”.
  • Opposite of the mutant hunt. We are going to do knock-in genes to create mutants.
  • Adding plasmids to mutants that are missing a gene, so that they revert to wildtype, is cloning a gene via complementation.
  • One can inject DNA into a fertilized mouse embryo and then implant that egg into a hormonally prepared mother mouse who will give birth to the transgenic offspring.
  • Another technique is to use embryonic stem cells (aka ESC).
    • At the blastocyst stage, the embryo contains an inner cell mass (ICM), also called the embryoblast or pluriblast.
    • One can extract from the ICM an embryonic stem cell.
    • ESCs are pluripotent.
    • We then transform those ESCs in a petri dish with the desired genetic insertions or deletions. The reason we have the petri dish step is that you can do a lot of experimentation at this stage and do quality control with cells alone before re-implantation in vivo.
    • And then reimplant those ESCs into the ICM of another embryo.
    • When the ICM embryo comes to term and is born, it is a chimera with some of its original cells and some transformed cells.
    • How to tell? Simplest way is to start with a black mouse ICM and inject transgenic white mouse cells. The pups which are born striped black and white have a mix of both types of cells.
    • Finally, does the genetic modification breed true? It depends on which of the chimeric cells made it into the germline of the pup. If no, then mutations stay in the pup. If yes, then the mutations are passed on to the following generation.
  • What happens in the petri dish before re-implantation?
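    • How do we target a desired gene we want to knock out? We use a combination of positive and negative selection, so that an ES cell in which the desired gene replacement did not happen fails to survive.
    • First, we construct a synthetic piece of DNA that contains: (1) a copy of the desired gene, (2) an antibiotic resistance gene placed right in the middle of the desired gene, and (3) a thymidine kinase gene placed next to, but outside of, the copy of the desired gene.
    • When we inject this synthetic DNA into the cells, the antibiotic kills every cell that did not take up the resistance gene. When the construct recombines in exactly the right spot (homologous recombination), the resistance gene interrupts and knocks out the host gene, while the flanking thymidine kinase gene is left behind. When the construct instead inserts at a random location, the thymidine kinase gene comes along with it.
    • BTW, the thymidine kinase gene is the negative-selection marker: a cell that carries it is killed by a nucleoside-analog drug such as ganciclovir. (HAT medium does the opposite and selects for cells that have thymidine kinase.) So only correctly targeted cells survive both selections.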
    • This work leveraging homologous recombination and ES cells to create knockout mice was published in the 80s and early 90s by Mario Capecchi, Oliver Smithies, and Martin Evans, who were awarded the Nobel Prize in 2007.
  • Incidentally, the knockout strategy described above only works if there is an antibiotic that kills mammalian cells in vitro. These do exist and examples are neomycin and geneticin (G418).

Subtracting a Gene

  • How do we do this? If we just use a restriction enzyme, it will randomly cut all over the place. How do we target a specific gene knockout? (12:06 out of 31:48) on November 6.

loxP and selective knockouts

  • November 4: Day after 2020 election, restarted this lecture
  • starts at 21:08 out of 31:48 in the second video segment of the lecture

RNA Interference (RISC)

TALENs

CRISPR

Lecture 22: Familial Hypercholesterolemia

  • October 18, 2020

Lecture 23: Cancer Biology

Lecture 24: Science and Society–DNA and the Law


Checklist

Unit 1: Introduction and the Biochemistry of Life

  • Lectures 1, 2, 3

Unit 2: Biochemistry: Proteins, Enzymes, Glycolysis

  • Lectures 4, 5

Unit 3: Genetics including meiosis

  • Lectures 6, 7
  • Finish problem set for Unit 3: Genetics I

Unit 4: Human and Biochemical Genetics

  • Lectures 8, 9
  • Watch lecture 8 follow-up video clip of Meisel PhD video on pedigree analysis
  • Finish problem set for Unit 4: Genetics II

Unit 5 : Molecular Biology I - DNA Replication

  • Lecture 10
  • Lecture 11

Unit 6 : Molecular Biology II - Transcription, Translation, Variations

  • Lecture 12: Transcription and Translation
  • Lecture 13: Variations between viruses, prokaryotes and eukaryotes
  • Lecture 14: A Tale of Two Genes: β-Galactosidase and β-Globin
  • Watch Sera’s video showing how lac operon is regulated by: (1) the lactose-binding repressor and (2) the ⤓glucose-⤒cAMP-bound activator
  • Cerego Memory Set

Unit 7: Recombinant DNA

  • Lecture 15: Cloning - Purifying a Gene
  • Lecture 16: Finding a Specific Gene in the Library, including Michelle’s follow-up video on Restriction Enzymes, EcoRI specific sequences in plasmids, etc.
  • Lecture 17: Analyzing a Gene with Gel Electrophoresis, Sequencing, and PCR
    • Watch Lori’s follow up lab video on gel electrophoresis
  • Finish Cerego Memory Set

Unit 8: Genomics I - Human Genome

  • Lecture 18: Human Genome and Positional Cloning
  • Lecture 19: Secrets of the Human Genome
  • Finish watching supplemental videos (see Lecture 18; video segments 16 and 18) on Sanger and Illumina Sequencing by Niall Lennon.

Unit 9: Genomics II - Observing and Perturbing the Genome to Probe Function

  • Lecture 20: Observing the Genome to Probe Function
  • Lecture 21: Perturbing the Genome to Probe Function
  • Rewatch Lec21 last half on RNAi, TALEN, CRISPR

Unit 10: Disease, Science, and Society

  • Lecture 22: Familial Hypercholesterolemia
  • Lecture 23: Cancer Biology
  • Lecture 24: Science and Society–DNA and the Law

Miscellaneous notes