BioNotes.Org

ML, vim, biology, math, and more

2023 Gen AI

  • 2/09 - Got stable diffusion working on the Mac Studio and documented all the steps.
  • 2/12 - Reviewed Goodfellow, Bengio, and Courville on the history of autoencoders
  • 2/13 - More GBC on autoencoders; it refers to the GOFAI/symbolic approach as the “knowledge base” approach. But I think KB was really an ’80s subset of the overall symbolic approach.
  • 2/21 - Read Stephen Wolfram’s and Murray Shanahan’s Feb 2023 articles on LLMs
  • 2/23 - more articles on history of BERT, transformers, MUM, etc.
    • This 18-minute YouTube video by Mean Gene Hacks compares three LLMs that all have about 175B parameters: OpenAI’s GPT-3, BigScience’s BLOOM, and Facebook’s OPT-175B
  • 2/24 - Today, FB released LLaMA (Large Language Model Meta AI) in four sizes: 7B, 13B, 33B, and 65B parameters. Yann LeCun claims that LLaMA-13B outperforms GPT-3 (even though the latter has 175B parameters), and that LLaMA-65B is competitive with the best models like Chinchilla-70B and PaLM-540B.
  • 2/26 - Mt. AI / volcanic island / infinite skyscraper. Elevators vs. Staircases
  • 3/01 - Links on BERT and AI
  • 3/03 - Adam Gopnik New Yorker article on DALL-E
  • 3/05 - Walter Benjamin 1936 essay “The Work of Art in the Age of Mechanical Reproduction” (pdf link)
  • 3/06 - Werner Schweibenz 2018 essay “The Work of Art in the Age of Digital Reproduction” in Museum International
  • 3/06 - William J. Mitchell, The Reconfigured Eye: Visual Truth in the Post-Photographic Era (1992)
  • 3/07 - Links to commentary on papers related to Episode 5. I wrote some notes and created a specific webpage for each of the papers.
  • 3/11 - Exciting! See this Hacker News thread about running FB’s latest LLaMA locally on Apple Silicon
    • This is the GitHub repo by ggerganov with the relevant code
    • Super useful notes by Simon Willison, who got the 7B version running on an M2 MacBook Pro with 64GB of RAM. From this comment in the same HN thread.
    • Applied to be allowed to download the 250GB collection of all four models.
  • 3/12 - Success! dali etc
  • 3/15 - From this HN comment in this submission about LLMs, I thought about buying Transformers for NLP: Build, train, and fine-tune deep neural network architectures with Python, PyTorch, TensorFlow, BERT and GPT-3, 2nd edition. Amazon link
    • important to get the second edition!
    • as of 3/15, the Kindle edition is $19.59 and the paperback is $37.79. Both come with a free PDF.
    • 2nd edition published March 25, 2022
  • Good guide by Sebastian Raschka to all the relevant papers on transformers, LLMs, etc. from the last nine years. Published February 2023.

    Created this page for business notes on Generative AI (March 15, 2023)

  • Yann LeCun reposted this March 2, 2023 lecture by Professor Pascale Fung on ChatGPT: What it can and cannot do
    • Watched up to 3:30, where Prof. Fung describes the history and maps Shannon’s model of communication onto speech recognition and machine translation.
    • Source → Transmitter/Encoder → Channel/SpeechRecog/MachineTranslator → Receiver/Decoder → Destination/Output
  • 3/16 - Read a bit about Noam Shazeer, co-author of the first Transformer paper, who worked on Google’s LaMDA system with project leader Daniel De Freitas, now Noam’s cofounder at Character.ai.
  • 3/17 - Runway, cofounded by Cristobal Valenzuela, has launched a video generation product named Gen-1 built on Stable Diffusion.
  • 3/20 - Created a new page on the Stanford paper “On the Opportunities and Risks of Foundation Models”.
  • 3/20 - Found two useful articles on Lilian Weng’s blog.
  • 3/20 - Found a bunch of interesting resources re: foundation models.
  • HN thread about an alpaca-tuned llama-7b chatbot; llama-30b coming soon.
  • Rodney Brooks, “What Will Transformers Transform?”
    • ‘Generative Pre-trained Transformer’ (GPT) models are now the rage and have inspired Kissinger and Noam Chomsky. That sure is some hype level.
    • References Wolfram’s excellent explainer
    • “By the way, even since the earliest days of AI, the 1955 proposal for the 1956 workshop on AI, the document in which the term AI first appears anywhere, the goal of the researchers was to produce general intelligence. That AGI is a different term than AI now is due to a bunch of researchers a dozen or so years ago deciding to launch a marketing campaign for themselves by using a new buzz acronym. ‘AGI’ is just ‘AI’ as it was known for the first 50+ years of its existence. Hype produced the term ‘AGI’ with which we are now saddled.”
    • Quotes unconfirmed reports that GPT-4 has 1 trillion parameters (compared with GPT-3’s 175 billion), but that has been specifically debunked by Sam Altman and others.
    • All successful systems need to have a person in the loop.
      • “This is true of language translation systems where a person is reading the output and, just as they do with children, the elderly, and foreigners, adapts quickly to the mistakes the person or system makes, and fill in around the edges to get the meaning, not the literal interpretation.”
      • “This is true of speech understanding systems where we talk to Alexa or Google Home, or our TV remote, or our car. We talk to each of them slightly differently, as we humans quickly learn how to adapt to their idiosyncracies and the forms they can understand and not understand.”
      • “This is true of our search engines, where we have learned how to form good queries that will get us the information we actually want, the quickest.”
      • “This is true of mobile robots in hospitals, taking the dirty sheets and dishes to be cleaned, or bringing up prescriptions from the hospital pharmacy, where there is a remote network operations center that some unseen user is waiting to take over control when the robot gets confused.”
    • Amara’s Law: we overestimate the effect of a technology in the short run and underestimate it in the long run.
      • John McCarthy’s estimate that the computers of the 1960s were powerful enough to support AGI
      • Minsky and Michie and Nilsson each believing that search algorithms were the key to intelligence,
      • Neural networks (volume 3, perceptrons) [[I wasn’t around for the first two volumes; McCulloch and Pitts in 1943, Minsky in 1953]],
      • First order logic, Resolution theorem proving, MacHack (chess 1), fuzzy logic, STRIPS,
      • Knowledge-based systems (and revolutionizing medicine),
      • Neural networks (volume 4, back propagation), the primal sketch, self driving cars (Dickmanns, 1987),
      • Reinforcement learning (rounds 2 and 3), SOAR,
      • Support vector machines, self driving cars (Kanade et al, 1997),
      • Deep Blue (chess 2), self driving cars (Thrun, 2007), Bayesian inference, Watson (Jeopardy, and revolutionizing medicine),
      • Neural networks (volume 5, deep learning), Alpha GO, reinforcement learning (round 4), generative images, and now large language models.
    • All have heralded the imminence of human level intelligence in machines. All were hyped up to the limit, but mostly in the days when very few people were even aware of AI, so very few people remember the levels of hype. I’m old. I do remember all these, but have probably forgotten quite a few…
      • “None of these things have lived up to that early hype.”

27 March 2023

Created this page dedicated to Transformers papers and tutorials

Created this page dedicated to setting up Python environments

31 March 2023

  • Vicuna-13B is an online competitor based on LLaMA-13B but with different training.

3 April 2023

  • Interesting way of explaining the uncanny confidence of LLMs by Devin Coldewey in his 4/03 TechCrunch article “The Great Pretender”

10 April 2023

  • “On Efficient Training of Large-Scale Deep Learning Models: A Literature Review” by Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao. arXiv link and HN thread
  • Out of curiosity, I began probing ChatGPT, Bing, and Google on how they describe the difference between ‘ordinary training’, pre-training, and fine-tuning. See the archive of my ChatGPT chat histories for more.

11 April 2023

13 April 2023

14 April 2023

15 April 2023

  • Found another reference to this good intermediate-level discussion of how transformers work: The Illustrated Transformer by Jay Alammar. Originally noted on 3/01. (A minimal attention sketch follows this list.)
    • See also this previous post by Jay about Attention
  • Researchers at the Allen Institute found a way to make ChatGPT up to 6x more toxic. Allen Institute blog post and TC article
  • Quote from Betaworks CEO John Borthwick: “This is the biggest change in technology in my lifetime. We’ve been building, accelerating and investing in and around machine learning for the last decade, and in the last 12 months, everything’s changed — the launch of generative visual models like [OpenAI’s] DALL-E 2 last year, the open and affordable access to these models with the availability of stability and GPT. AI has the potential to affect every sector, and every part of how we live, work, play and even die.” Part of Betaworks AI camp announcement
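
The minimal attention sketch promised above, as a companion to Alammar’s Illustrated Transformer and attention posts: scaled dot-product attention in numpy. This is only the core equation, softmax(QK^T / sqrt(d_k)) V, with no multi-head projections, masking, or batching, and the shapes are toy values I made up.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core equation from 'Attention Is All You Need':
    softmax(Q K^T / sqrt(d_k)) V. No heads, masking, or batching."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)        # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # weighted average of the value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 64)) for _ in range(3))  # 5 toy tokens, d_k = d_v = 64
print(scaled_dot_product_attention(Q, K, V).shape)      # (5, 64)
```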

16 April 2023

17 April 2023

  • Excellent compilation of various LLM resources by Sebastian Raschka. Blog post and associated HN thread with more resources
  • Review article on ChatGPT related papers by Zhang, Zhang, et al. From abstract: ‘According to Google scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.’
  • Together released RedPajama, a project to create ‘reproducible, fully-open, leading language model. RedPajama is a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute.’.
    • three components: (1) Pre-training data, (2) Base models trained at scale with data from (1), and (3) Instruction tuning data and models for usability and safety
    • Released a reproduction of the LLaMA training dataset containing 1.2 trillion tokens

22 April 2023

24 April 2023

  • New paper extends the context window to 1M tokens (vs. 32k for GPT-4) using the Recurrent Memory Transformer (RMT) architecture. HN thread. Point from this HN comment (a toy few-shot prompt follows this list):
    • ‘For LLMs there are at least three different ways of “learning”:
      1. Pre-training for text prediction, using unsupervised learning
      2. Fine-tuning e.g. to follow instructions or to reject certain queries, using supervised and/or reinforcement learning (optional)
      3. In-context learning using “few-shot prompts” as examples
    • ‘Now the last two can have similar effects. For example, you might fine-tune a foundation (only pre-trained) model to follow instructions, or you don’t and instead just modify your prompt such that it looks like a dialogue between a human and a helpful chat assistant. But neither can replace the extensive pre-training phase, which is what gives the model all its intelligence.’
    • ‘One other disanalogy between fine-tuning and in-context learning appears to be that the model can’t exactly remember the data it was fine-tuned with, while it “knows” exactly everything in its context window. That is its working memory, so to speak.’
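
To make item 3 in the quoted list concrete: in-context learning just means the “examples” live in the prompt text itself and no weights change. A toy, model-agnostic few-shot prompt (the task and reviews are invented for illustration):

```python
# In-context learning: the "training examples" are part of the prompt; the
# model's weights are untouched, it only conditions on this text.
few_shot_prompt = """Classify the sentiment of each review as positive or negative.

Review: "The battery died after two days."
Sentiment: negative

Review: "Crisp screen and great speakers."
Sentiment: positive

Review: "Setup took hours and the manual was useless."
Sentiment:"""

# This string would be sent as-is to any completion/chat model. Contrast with
# fine-tuning (item 2), where such pairs would instead update the weights.
print(few_shot_prompt)
```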

27 April 2023

28 April 2023

30 April

01 May 2023

  • Mojo, a new programming language from Chris Lattner’s new company, Modular AI. HN thread
  • Project to run LLMs on mobile devices and diverse hardware. ‘Everything runs locally with no server support and accelerated with local GPUs on your phone and laptops. Supported platforms include: iphone/ipad, Metal GPUs and Intel/ARM MacBooks, AMD and NVIDIA GPUs via Vulkan on Windows and Linux, NVIDIA GPUs via CUDA, WebGPU on browsers through WebLLM.’

03 May 2023

04 May 2023

05 May 2023

08 May 2023

15 May 2023

18 May 2023

20 May 2023

22 May 2023

  • New paper by Meta, ‘LIMA: Less is More for Alignment’, on how fine-tuning is much less important than pre-training: minimal fine-tuning is still effective, suggesting that the bulk of the work is done during pre-training. Uses the 65B version of LLaMA.

23 May 2023

26 May 2023

27 May 2023

28 May 2023

06 June 2023

  • Need to investigate running LLMs locally with ggml.ai, which works with llama.cpp, whisper.cpp, etc. Optimized for Apple Silicon. HN thread
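
A sketch of what this could look like from Python via the llama-cpp-python bindings (which wrap llama.cpp/ggml and run on Apple Silicon). The model path is a placeholder and quantized-file naming conventions change between releases, so treat the details as assumptions rather than a recipe.

```python
# Sketch only: assumes `pip install llama-cpp-python` and a locally converted,
# quantized LLaMA model; the path below is a placeholder, not a real file.
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")
out = llm(
    "Q: What is ggml optimized for? A:",   # plain completion-style prompt
    max_tokens=64,
    stop=["Q:", "\n\n"],                   # stop before it invents another question
)
print(out["choices"][0]["text"])
```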

20 June 2023

  • simpleaichat on GitHub, a new and concise Python package for easily interfacing with chat apps like GPT-4, with robust features and minimal code complexity. HN thread
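
An untested sketch of the kind of usage its README shows; the constructor arguments here (api_key, system, model) are my assumptions from a quick skim and may not match the package’s actual API.

```python
# Assumed usage of simpleaichat; argument names are guesses from the README.
from simpleaichat import AIChat

ai = AIChat(
    api_key="sk-...",                       # placeholder OpenAI key
    system="You are a terse assistant.",    # system prompt
    model="gpt-4",
)
print(ai("Summarize the transformer architecture in one sentence."))
```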

27 June 2023

05 July 2023

11 July 2023

26 August 2023

  • In the past week, Meta released two large models: one for code (Code Llama) and one for translation (SeamlessM4T).

20 September 2023

26 September 2023

09 October 2023

  • A lot of updates added to /5-slides, /71-gpt3, /72-gpt4, /74-llama2
  • Anthropic has a new paper that pushes forward the capability of interpreting ANNs. Instead of trying to understand individual artificial neurons, it finds a better unit of analysis, called features. A feature, by their definition, corresponds to a linear combination of neuron activations. (Toy illustration below.)
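
My understanding is that the paper finds these features with a sparse autoencoder (dictionary learning) over activations; the numpy toy below is not that method, just an illustration of “a feature is a direction, i.e. a linear combination, in neuron-activation space,” with made-up numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_features = 8, 16                      # overcomplete: more features than neurons
D = rng.normal(size=(n_features, n_neurons))       # each row is one feature direction
D /= np.linalg.norm(D, axis=1, keepdims=True)      # unit-norm directions

# Pretend activation vector in which only features 2 and 11 are "active".
codes = np.zeros(n_features)
codes[[2, 11]] = [1.5, -1.2]
activation = codes @ D                             # activation = linear combo of feature directions

# Crude read-out: project the activation back onto every feature direction.
scores = D @ activation
print(np.argsort(-np.abs(scores))[:3])             # features 2 and 11 should rank near the top
                                                   # (approximately; random directions overlap)
```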

13 October 2023

25 October 2023

13 November 2023

11 December 2023

2024 Gen AI

01 January 2024

  • On the HN front page today, found a reference to this 600-page book on arXiv, ‘Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory’. Published October 2023, based on a course given at ETH Zurich by Arnulf Jentzen, Benno Kuckuck, and Philippe von Wurstemberger.
  • From the same HN thread, found this draft book in progress: Understanding Deep Learning by Simon J.D. Prince. Main page here.
    • Nice because section 1.6 of the free PDF has good references (as of Dec 2023) to textbooks and other resources on various approaches to DL, including math-focused, coding-focused, computer-vision-oriented, and reinforcement-learning-focused.
    • Also dives into argmin and argmax pretty early. Handy wiki article on arg max functions. (Tiny numpy example after this list.)
  • From the same thread, and also referenced in Prince’s UDL book, began looking at Bishop (2006), Pattern Recognition and Machine Learning. Per this HN comment, Bishop has simpler notation than the 600-page ETH book by Jentzen et al. Bishop’s notation is as clear as Goodfellow (2016), but the book goes a little deeper mathematically.
    • Most importantly, Bishop has a good self-contained introduction to the relevant probability theory.
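
The tiny example promised above: the discrete argmax/argmin that UDL leans on (e.g., picking the highest-scoring class), done in numpy with made-up values.

```python
import numpy as np

f = np.array([3.0, 7.5, 1.2, 7.5])   # values of f(x) on a small grid of candidate x's
print(np.argmax(f))                   # 1: index of the first maximizer of f
print(np.argmin(f))                   # 2: index of the minimizer
```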

07 January 2024


12 January 2024

  • Multiple resources on the latest in vector databases in this HN thread
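
As a reminder to myself of what these systems actually do: the core operation is nearest-neighbor search over embedding vectors. A brute-force numpy version is below (random vectors standing in for real embeddings); real vector databases replace the full dot-product scan with approximate indexes such as HNSW.

```python
import numpy as np

rng = np.random.default_rng(1)
doc_vecs = rng.normal(size=(1000, 384))                      # stand-ins for document embeddings
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)  # normalize once up front

def top_k(query_vec, k=5):
    """Return indices of the k most cosine-similar 'documents' (brute force)."""
    q = query_vec / np.linalg.norm(query_vec)
    sims = doc_vecs @ q                                      # cosine similarity via dot products
    return np.argsort(-sims)[:k]

print(top_k(rng.normal(size=384)))
```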

15 January 2024

  • Run-through of Hyung Won Chung’s slides from OpenAI. Originally sent by Seymour in October 2023.
  • JP Morgan AI Research published DocLLM, a multi-modal LLM that can interpret invoices, receipts, purchase orders, and other visually laid-out forms. Posted to the FB group on 1/12/2024.
  • November 2023 survey of multi-modal LLMs on arXiv, posted on 1/15/2024 to the FB group AI+DL.

17 January 2024

15 February 2024

20 February 2024

21 February 2024

24 February 2024

  • Quick overview of Mamba, a state-space architecture that avoids the quadratic growth in computational demand that Transformer/attention architectures hit as sequence length increases. Mamba: The Easy Way by Jack Cook and HN thread. (A toy linear-time recurrence sketch follows this list.)
  • Feb 2024 survey of LLMs by Shervin Minaee, T. Mikolov, N. Nikzad, …, X. Amatriain, Jianfeng Gao. arXiv link.
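
The sketch promised above: the skeleton of a linear state-space recurrence, which processes a length-T sequence in O(T) instead of attention’s O(T^2) pairwise scores. Mamba itself makes the A/B/C parameters input-dependent (“selective”) and uses a hardware-aware parallel scan; none of that is shown here, and all numbers are toy values.

```python
import numpy as np

# Toy linear state-space recurrence: h_t = A h_{t-1} + B x_t,  y_t = C h_t.
rng = np.random.default_rng(0)
d_state, d_in, T = 4, 2, 1000
A = 0.9 * np.eye(d_state)                  # stable toy state-transition matrix
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(1, d_state))

x = rng.normal(size=(T, d_in))
h = np.zeros(d_state)
ys = []
for t in range(T):                         # one pass over the sequence: O(T)
    h = A @ h + B @ x[t]
    ys.append(C @ h)
print(np.array(ys).shape)                  # (1000, 1)
```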

01 April 2024

  • After a multi-year delay, Grant Sanderson posted a new YouTube video in his “What is a neural network?” series. Chapter 5: But what is GPT?

08 April 2024

  • Paper last week arguing that a large volume of undifferentiated / unstructured AI agents is all you need. Title: ‘More Agents Is All You Need’. arXiv link and HN thread
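
As I understand the paper, the core recipe is plain sampling-and-voting: query the same model several times at nonzero temperature and keep the most common answer. A toy sketch with a hypothetical query_model callable (the stand-in “model” below just guesses):

```python
import random
from collections import Counter

def majority_vote(prompt, query_model, n_agents=10):
    """Sampling-and-voting: ask the same model n_agents times and keep the
    most common answer. `query_model` is a hypothetical callable that returns
    a short answer string (e.g., a wrapper around an LLM API)."""
    answers = [query_model(prompt) for _ in range(n_agents)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_agents        # answer plus agreement ratio

# Stand-in "model" that answers correctly most of the time.
fake_model = lambda prompt: random.choice(["42", "42", "42", "41", "43"])
print(majority_vote("What is 6 * 7?", fake_model))
```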

18 November 2024

  • LoRA versus Full Fine Tuning: An Illusion of Equivalence. arXiv and HN thread
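
For context on what the paper compares: a back-of-the-envelope numpy sketch of the LoRA parameterization itself (W stays frozen; only the low-rank factors A and B train), not anything from the paper’s experiments. Dimensions are arbitrary toy values.

```python
import numpy as np

# LoRA: replace a full update of W (d_out x d_in) with a low-rank correction
# B @ A of rank r << min(d_out, d_in): effective weight = W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 512, 512, 8, 16

W = rng.normal(size=(d_out, d_in))         # frozen pretrained weight
A = 0.01 * rng.normal(size=(r, d_in))      # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init, so no change at start

x = rng.normal(size=d_in)
y = W @ x + (alpha / r) * (B @ (A @ x))    # forward pass with the adapter

# Trainable parameter count: r*(d_in + d_out) for LoRA vs d_in*d_out for full fine-tuning.
print(r * (d_in + d_out), "vs", d_in * d_out)
```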