2023 Gen AI
- 2/09 - Got stable diffusion working on the Mac Studio and documented all the steps.
- 2/12 - Reviewed Goodfellow, Bengio, and Courville (GBC) on the history of autoencoders
- 2/13 - More GBC on autoencoders; they refer to the GOFAI/symbolic approach as the “knowledge base” approach. But I think KB was really an ’80s subset of the overall symbolic approach.
- 2/21 - Read Stephen Wolfram and Murray Shanahan Feb 2023 articles on LLMs
- 2/23 - more articles on history of BERT, transformers, MUM, etc.
- This 18-minute YouTube video by Mean Gene Hacks compares 3 LLMs that all have about 175B parameters: OpenAI’s GPT-3, BigScience’s BLOOM, and Facebook’s OPT-175B
- 2/24 - Today, FB released LLaMA (Large Language Model Meta AI) in 4 sizes: 7B, 13B, 33B, and 65B parameters. Yann LeCun claims that LLaMA-13B outperforms GPT-3 (even though the latter has 175B). And LLaMA-65B is competitive with the best models like Chinchilla-70B and PaLM-540B.
- 2/26 - Mt. AI / volcanic island / infinite skyscraper. Elevators vs. Staircases
- 3/01 - Links on BERT and AI
- From Weights and Biases. An Introduction to BERT and How to Use It by Mukilan Krishnakumar. Good simple diagrams here.
- Transformers Explained, Understanding the Model Behind GPT-3, BERT, and T5 from May 2021. Gets quite detailed with the math etc.
- Towards Data Science article. A little simple but still a useful intro from July 2022: Evolution of Large Language Models–BERT, GPT3, MUM and PaLM. Unfortunately it’s a member-only Medium article. Here is the Box link.
- Good diagrams at intermediate level going pretty in-depth into how transformers work. The Illustrated Transformer by Jay Alammar.
- 3/03 - Adam Gopnik New Yorker article on DALL-E
- 3/05 - Walter Benjamin 1936 essay “The Work of Art in the Age of Mechanical Reproduction” (pdf link)
- 3/06 - Werner Schweibenz 2018 essay “The Work of Art in the Age of Digital Reproduction” in Museum International
- 3/06 William J. Mitchell The Reconfigured Eye: Visual Truth in the Post-Photographic Era 1992
- 3/07 links to commentary on papers related to Episode 5. I wrote some notes and created specific webpages for each of the following papers:
- 3/11 Exciting! See this Hacker News thread about running FB’s latest LLaMA locally on Apple Silicon
- This is the github repo with relevant stuff by ggerganov
- Super useful notes by Simon Willison who got the 7B version running on a M2 MacBook Pro with 64GB of RAM. From this comment in the same HN thread.
- Applied to be allowed to download the 250GB collection of all 4 models.
- 3/12 Success! dali etc
- 3/15 From this HN comment in this submission about LLVM, I thought about buying Transformers for NLP: Build, train, and fine-tune deep neural network architectures with Python, PyTorch, TensorFlow, BERT and GPT-3, 2nd edition. Amazon link
- important to get the second edition!
- as of 3/15, kindle edition is $19.59. Paperback is $37.79. Both come with free PDF.
- 2nd edition published March 25, 2022
- Good guide to all the relevant papers over the last 9 years on transformers, LLMs etc. by Sebastian Raschka. Published February, 2023.
- Created this page for business notes on Generative AI (March 15, 2023)
- Yann LeCun reposted this March 2, 2023 lecture by Professor Pascale Fung on ChatGPT: What it can and cannot do
- Watched up to 3:30 where Prof Fung describes history and mapping Shannon’s model of communication to Speech Recognition and Machine Translation.
- Source → Transmitter/Encoder → Channel/Speech Recognizer/Machine Translator → Receiver/Decoder → Destination/Output
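- A minimal sketch (my own, not from the lecture; the function names are placeholders) of how Shannon’s pipeline maps onto an encode → channel → decode program:

```python
# Hypothetical sketch of Shannon's communication pipeline mapped onto
# machine translation / speech recognition, as in Prof. Fung's framing.
# Function names are placeholders, not a real API.

def encode(source_text: str) -> list[int]:
    """Transmitter/Encoder: turn the source message into a signal (tokens)."""
    return [ord(c) for c in source_text]      # stand-in for a real tokenizer

def channel(signal: list[int]) -> list[int]:
    """Channel: in MT/ASR this is where a model maps one language/modality to another."""
    return signal                              # identity here; a learned model in practice

def decode(signal: list[int]) -> str:
    """Receiver/Decoder: reconstruct a message for the destination."""
    return "".join(chr(i) for i in signal)

destination_output = decode(channel(encode("bonjour le monde")))
print(destination_output)
```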
- 3/16 Read a bit about Noam Shazeer, co-author of the first Transformer paper, who worked on Google’s LaMDA system with project leader Daniel De Freitas, who is now Noam’s cofounder at Character.ai
- 3/17 Runway, cofounded by Cristobal Valenzuela, has launched a video-generation product named Gen-1 using Stable Diffusion.
- see also this Decoder article and this MIT Tech Review piece
- 3/20 Created new page on Stanford paper “On the Opportunities and Risks of Foundation Models”.
- 3/20 Found two useful articles from Lilian Weng’s blog:
- 3/20 Found a bunch of interesting resources re: Foundation models
- 2023 MAD landscape posted by Matt Turck. Web version here; 404’s occasionally. Scroll to the bottom right of the blue ML+AI section to see the box on “Closed Source Models”
- Snorkel.ai intro guide dated March 1, 2023.
- Alan Thompson’s Life Architect post Inside language models, which has been updated recently enough to include GPT-4 and to note that LLaMA has been leaked.
- HN thread about an alpaca tuned llama-7b chatbot. llama-30b coming soon.
- Rodney Brooks: What Will Transformers Transform?
- ‘Generative Pre-trained Transformer’ models (GPT) are now the rage and have inspired Kissinger and Noam Chomsky. That sure is some hype level
- References Wolfram’s excellent explainer
- “By the way, even since the earliest days of AI, the 1955 proposal for the 1956 workshop on AI, the document in which the term AI first appears anywhere, the goal of the researchers was to produce general intelligence. That AGI is a different term than AI now is due to a bunch of researchers a dozen or so years ago deciding to launch a marketing campaign for themselves by using a new buzz acronym. ‘AGI’ is just ‘AI’ as it was known for the first 50+ years of its existence. Hype produced the term ‘AGI’ with which we are now saddled.”
- Quotes unconfirmed reports that GPT-4 has 1 trillion parameters (compared with GPT-3’s 175 billion), but that has been specifically debunked by Sam Altman and others.
- All successful systems need to have a person in the loop.
- “This is true of language translation systems where a person is reading the output and, just as they do with children, the elderly, and foreigners, adapts quickly to the mistakes the person or system makes, and fill in around the edges to get the meaning, not the literal interpretation.”
- “This is true of speech understanding systems where we talk to Alexa or Google Home, or our TV remote, or our car. We talk to each of them slightly differently, as we humans quickly learn how to adapt to their idiosyncracies and the forms they can understand and not understand.
- “This is true of our search engines, where we have learned how to form good queries that will get us the information we actually want, the quickest.
- “This is true of mobile robots in hospitals, taking the dirty sheets and dishes to be cleaned, or bringing up prescriptions from the hospital pharmacy, where there is a remote network operations center that some unseen user is waiting to take over control when the robot gets confused.”
- Amara’s Law: we tend to overestimate the effect of a technology in the short run and underestimate it in the long run.
- John McCarthy’s estimate that the computers of the 1960s were powerful enough to support AGI
- Minsky and Michie and Nilsson each believing that search algorithms were the key to intelligence,
- Neural networks (volume 3, perceptrons) [[I wasn’t around for the first two volumes; McCulloch and Pitts in 1943, Minsky in 1953]],
- First order logic, Resolution theorem proving, MacHack (chess 1), fuzzy logic, STRIPS,
- Knowledge-based systems (and revolutionizing medicine),
- Neural networks (volume 4, back propagation), the primal sketch, self driving cars (Dickmanns, 1987),
- Reinforcement learning (rounds 2 and 3), SOAR,
- Support vector machines, self driving cars (Kanade et al, 1997),
- Deep Blue (chess 2), self driving cars (Thrun, 2007), Bayesian inference, Watson (Jeopardy, and revolutionizing medicine),
- Neural networks (volume 5, deep learning), Alpha GO, reinforcement learning (round 4), generative images, and now large language models.
- All have heralded the imminence of human level intelligence in machines. All were hyped up to the limit, but mostly in the days when very few people were even aware of AI, so very few people remember the levels of hype. I’m old. I do remember all these, but have probably forgotten quite a few…
- “None of these things have lived up to that early hype.”
27 March 2023
- Downloaded recent Yann LeCun slides and summer 2022 paper summarizing his proposal to approach more human/animal-like learning/intelligence for machines
- Articles by Erich Grunewald from this HN thread:
- Created this page dedicated to Transformers papers and tutorials
- Created this page dedicated to setting up Python environments
31 March 2023
- Vicuna-13B is an online competitor based on LLaMA-13B but with different training
3 April 2023
- Interesting way of explaining the uncanny confidence of LLMs by Devin Coldewey in his 4/03 TechCrunch article “The Great Pretender”
10 April 2023
- “On Efficient Training of Large-Scale Deep Learning Models: A Literature Review” by Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao. arXiv link and HN thread
- Out of curiosity, I began probing ChatGPT, Bing, and Google on how they describe the difference between ‘ordinary training’, pre-training, and fine tuning. See archive of my chat histories at ChatGPT to see more.
- see also this stackexchange answer which draws on this ML glossary – look at answers for pre-training and fine-tuning.
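- A minimal sketch of the distinction I was probing for, assuming the Hugging Face transformers library (the gpt2 checkpoint is just an example): pre-training starts from random weights and learns from raw text with a self-supervised objective, while fine-tuning continues training an already pre-trained checkpoint on a smaller task dataset:

```python
# Minimal sketch (assuming the Hugging Face transformers library; "gpt2" is
# just an example checkpoint) of pre-training vs. fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fine-tuning starts from weights someone else already pre-trained:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")     # pre-trained checkpoint

# Pre-training, by contrast, would start from a randomly initialized config:
# from transformers import AutoConfig
# model = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained("gpt2"))

# One (toy) fine-tuning step: next-token prediction loss on a bit of domain text.
batch = tokenizer("Fine-tuning adapts a pre-trained model.", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()   # in practice an optimizer such as AdamW then updates the weights
```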
11 April 2023
- Listing of decentralized and open-source LLMs from this HN comment
- Found on HN thread, this open-source orchestrator and performance evaluator of multiple LLMs PhaseLLM and related collaborative data community
13 April 2023
- From Seymour in Slack, George Ho’s survey from a few years ago (May 2020) of using Transformers in NLP
- from HN, productizing mature LLM eng pipelines
14 April 2023
- From 3/28, Cerebras-GPT is a family of open, compute-efficient LLMs ranging from 0.11B to 13B parameters, trained using the Chinchilla formula
- A nice, concise, mathematically formal description of Transformers by Mary Phuong and Marcus Hutter at DeepMind, published 7/19/2022: Formal Algorithms for Transformers. From this HN comment, which also led to this HN comment about Hopf algebras, including Adam Nemecek’s 2/03/2023 paper Coinductive guide to inductive transformer heads, the older 2012 paper Hopf algebras and Markov chains: Two examples and a theory, and a recommendation to check out this website about Geometric Algebra aka Clifford Algebra. Recommend watching this 44-minute intro YouTube video. (A minimal attention sketch follows at the end of this date’s entries.)
- Dan Fu, Michael Poli, Chris Re 3/28/2023 “From Deep to Long Learning?”
- From this HN thread, found this article by Eugene Yan at Amazon “Experimenting with LLMs to Research, Reflect, and Plan”
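- The minimal attention sketch mentioned above – scaled dot-product attention in NumPy (generic shapes and names, not copied from the Phuong & Hutter paper):

```python
# Minimal NumPy sketch of scaled dot-product attention, the operation the
# Formal Algorithms for Transformers paper pins down precisely.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v) -> (n_q, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # similarity of each query to each key
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V                         # weighted average of the values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(attention(Q, K, V).shape)                # (4, 8)
```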
15 April 2023
- Found another reference to this good intermediate level discussion how transformers work. The Illustrated Transformer by Jay Alammar. Originally noted on 3/01.
- See also this previous post by Jay about Attention
- Researchers at Allen Institute find a way to 6x the toxicity of ChatGPT. Allen Institute blog post and TC article
- Quote from Betaworks CEO John Borthwick: “This is the biggest change in technology in my lifetime. We’ve been building, accelerating and investing in and around machine learning for the last decade, and in the last 12 months, everything’s changed — the launch of generative visual models like [OpenAI’s] DALL-E 2 last year, the open and affordable access to these models with the availability of stability and GPT. AI has the potential to affect every sector, and every part of how we live, work, play and even die.” Part of Betaworks AI camp announcement
16 April 2023
- Batch computing and AI and associated HN thread
- New ‘Consistency Models’ are an upgrade over previous diffusion models for image generation and related visual tasks. Paper and TC article
17 April 2023
- Excellent compilation of various LLM resources by Sebastian Raschka. Blog post and associated HN thread with more resources
- Review article on ChatGPT related papers by Zhang, Zhang, et al. From abstract: ‘According to Google scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.’
- Together released RedPajama, a project to create ‘reproducible, fully-open, leading language model. RedPajama is a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute.’.
- three components: (1) Pre-training data, (2) Base models trained at scale with data from (1), and (3) Instruction tuning data and models for usability and safety
- Released LLaMA training dataset of 1.2 trillion tokens
22 April 2023
- New article by Sebastian Raschka with a comprehensive introduction to ‘Finetuning LLMs’. Associated HN thread
- Review again Jay’s 2020 Illustrated Transformer and his preceding article on Attention
24 April 2023
- New paper increases the context window from 32k tokens (GPT-4) to 1M tokens using the Recurrent Memory Transformer architecture (aka RMT). HN thread. Point from this HN comment:
- ‘For LLMs there are at least three different ways of “learning”:
- Pre-training for text prediction, using unsupervised learning
- Fine-tuning e.g. to follow instructions or to reject certain queries, using supervised and/or reinforcement learning (optional)
- In-context learning using “few-shot prompts” as examples
- ‘Now the last two can have similar effects. For example, you might fine-tune a foundation (only pre-trained) model to follow instructions, or you don’t and instead just modify your prompt such that it looks like a dialogue between a human and a helpful chat assistant. But neither can replace the extensive pre-training phase, which is what gives the model all its intelligence.’
- ‘One other disanalogy between fine-tuning and in-context learning appears to be that the model can’t exactly remember the data it was fine-tuned with, while it “knows” exactly everything in its context window. That is its working memory, so to speak.’
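- A sketch of the third kind of “learning” above – in-context learning – where nothing about the weights changes and the “training” lives entirely in the prompt (send_to_llm is a placeholder, not a real API):

```python
# In-context ("few-shot") learning sketch: the examples live in the prompt,
# not in the weights. send_to_llm() is a placeholder for whichever
# chat/completions client or local model you actually use.
few_shot_prompt = """Classify the sentiment of each review as positive or negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: positive

Review: It broke after two days and support never answered.
Sentiment: negative

Review: Setup took five minutes and it has worked flawlessly since.
Sentiment:"""

def send_to_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for a real API or local model call")

# answer = send_to_llm(few_shot_prompt)   # expected completion: "positive"
```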
27 April 2023
- e2eml.school’s Transformers from Scratch – nice diagrams on seq2seq etc.
- Harvard SEAS Jupyter notebook annotated version of original Transformer / attention-is-all-you-need paper
- arXiv paper from 3/22/2023: ‘Sparks of AGI: Early experiments with GPT-4’
- A Cookbook of Self-Supervised Learning paper and HN thread
28 April 2023
- Shared by Yann LeCun on FB – ‘A Practical Guide for LLMs’ with a history of taxonomic relationships between the various LLMs. Comments on FB from Yann (a small sketch of the three architecture families follows at the end of this date’s entries):
- ‘A survey of LLMs with a practical guide and evolutionary tree.
- ‘Number of LLMs from Meta = 7
- ‘Number of open source LLMs from Meta = 7
- ‘The architecture nomenclature for LLMs is somewhat confusing and unfortunate.
- ‘What’s called “encoder only” actually has an encoder and a decoder (just not an auto-regressive decoder).
- ‘What’s called “encoder-decoder” really means “encoder with auto-regressive decoder”
- ‘What’s called “decoder only” really means “auto-regressive encoder-decoder”’
- GPT4Free - allows various projects without an OpenAI API key. Very empty HN thread and project home
- arXiv paper from February 2023, ‘Hyena Hierarchy: Towards Larger Convolutional Language Models’, and fragmentary HN comment
- ‘Choose Your Weapon: Survival Strategies for Depressed AI Academics’ from March, 2023
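- The sketch of the three architecture families from Yann’s comment above, assuming the Hugging Face transformers library (BERT/T5/GPT-2 are my example picks, not from his post):

```python
# Examples (my picks, assuming Hugging Face transformers) of the three
# architecture families named in LeCun's comment above.
from transformers import AutoModel, AutoModelForSeq2SeqLM, AutoModelForCausalLM

bert = AutoModel.from_pretrained("bert-base-uncased")      # "encoder only"
t5   = AutoModelForSeq2SeqLM.from_pretrained("t5-small")   # "encoder-decoder"
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")        # "decoder only" (auto-regressive)

# Per LeCun's point, the labels really describe whether generation is
# auto-regressive, more than whether an "encoder" is literally present.
```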
30 April 2023
- A Brief History of LLaMA and HN thread
- This comment indicates that this GH repo runs unquantized 7B and 13B models on an M2 GPU, which means it’s a little slower but much, much more energy efficient.
01 May 2023
- Mojo – a new programming language from Chris Lattner’s new company, Modular AI. HN thread
- Project to run LLMs on mobile devices and diverse hardware. ‘Everything runs locally with no server support and accelerated with local GPUs on your phone and laptops. Supported platforms include: iphone/ipad, Metal GPUs and Intel/ARM MacBooks, AMD and NVIDIA GPUs via Vulkan on Windows and Linux, NVIDIA GPUs via CUDA, WebGPU on browsers through WebLLM.’
03 May 2023
- Recommended to read this nice article explaining RLHF – ‘Illustrating Reinforcement Learning from Human Feedback (RLHF)’
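- A structural sketch of the three RLHF stages the article illustrates (every function here is a placeholder, not a real library call):

```python
# Structural sketch of the RLHF pipeline described in the article above.
# All functions are placeholders; real implementations use e.g. TRL/PPO.

def supervised_finetune(base_model, demonstrations):
    """Stage 1: fine-tune the pre-trained LM on human-written demonstrations."""
    ...

def train_reward_model(model, ranked_comparisons):
    """Stage 2: fit a reward model on human rankings of candidate responses."""
    ...

def ppo_step(policy, reward_model, prompts):
    """Stage 3: generate responses, score them with the reward model, and
    update the policy with PPO (usually with a KL penalty toward the SFT model)."""
    ...

# policy = supervised_finetune(pretrained_lm, demo_data)
# reward_model = train_reward_model(policy, comparison_data)
# for batch in prompt_batches:
#     ppo_step(policy, reward_model, batch)
```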
04 May 2023
- Two good intro articles by Assembly AI:
- The Full Story of LLMs and RLHF and associated HN thread
- Intro to Generative AI
05 May 2023
- Introductory 2021 blog post by Pinecone, ‘What is a Vector Database?’, and HN thread (a toy nearest-neighbor sketch follows after these links)
- Deepgram blogpost ‘Augmenting LLMs Beyond Basic Text Completion and Transformation’ and HN thread
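- The toy nearest-neighbor sketch mentioned above – the core operation a vector database provides, minus the approximate indexes (HNSW, IVF, etc.) and metadata filtering that real systems add:

```python
# Toy sketch of vector search: store embeddings, then return the nearest
# ones to a query by cosine similarity. The vectors here are made up.
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

docs = {"doc1": np.array([0.1, 0.9, 0.0]),
        "doc2": np.array([0.8, 0.1, 0.1]),
        "doc3": np.array([0.2, 0.7, 0.1])}

query = np.array([0.15, 0.8, 0.05])
ranked = sorted(docs, key=lambda d: cosine_sim(query, docs[d]), reverse=True)
print(ranked)   # most similar document ids first
```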
08 May 2023
- Latest on which consumer hardware (including Apple Silicon) is effective for home training, in this comment thread, part of this HN post about RasaGPT, which simplifies integration of NLP, NLU, machine translation, etc., using tools like Rasa, FastAPI, LangChain, etc.
- Explanation of current landscape of diffusion models and why Reflected Diffusion Models might be the next step plus HN thread
15 May 2023
- StarCoder and StarCoderBase: the foundation model StarCoderBase and a fine-tuned tool called StarCoder. HN thread
- Ash Vardanian implemented a 1000-line C++ vector database as announced in this short HN thread. Main blogpost here and main HN thread
18 May 2023
- ACM article on Cargo Cult AI and HN thread
20 May 2023
- Dialogue between Don Knuth and Stephen Wolfram regarding Knuth’s test questions to ChatGPT. HN thread
- Rodney Brooks says to ‘calm down about GPT-4’ in IEEE and HN thread
- Rich Sutton (of Reinforcement Learning textbook fame) and his essay The Bitter Lesson reappeared on HN
22 May 2023
- New paper by Meta about how fine-tuning is much less important than pre-training: ‘LIMA: Less is More for Alignment’. Minimal finetuning is still effective, suggesting that the bulk of the work is done during pre-training. Uses the 65B version of LLaMA
23 May 2023
- RWKV: Reinventing RNNs for the Transformer Era. New paper that tries to rebuild RNNs to get the benefits of Transformer attention while scaling more efficiently. PDF
- Meta launches a multilingual model, expanding coverage from the prior SOTA of 100 languages to 1,100+ languages
- Sebastian Raschka has another post up about ‘Why the original transformer drawing is wrong and some other historical tidbits about LLMs’. HN thread
26 May 2023
- HN thread about building CLI tools to work with ChatGPT and LLMs by Simon Willison
- Stanford paper on AlpacaFarm, which has been RLHF’d to ‘beat ChatGPT-3.5’. HN thread
- Paper written by several AI/ML PhD students about how to do NLP research in the age of LLMs and HN thread
27 May 2023
- New paper ‘QLoRA: Efficient Finetuning of Quantized LLMs’ and HN thread
- Follow up August article by Databricks ‘Efficient Fine-Tuning with LoRA: A Guide to Optimal Parameter Selection for Large Language Models’
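- A minimal PyTorch sketch of the low-rank idea behind LoRA/QLoRA: keep the pre-trained weight frozen and learn a small update B·A on top of it (QLoRA does the same with the frozen base quantized to 4-bit). Layer sizes here are arbitrary:

```python
# Minimal LoRA sketch: frozen base weight plus a trainable low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze the pre-trained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # base(x) uses the frozen weights; the low-rank term is the trainable update
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # only A and B are trainable: 2 * 8 * 768 = 12,288 parameters
```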
28 May 2023
- Good PDF ‘The Little Book of Deep Learning’ and related HN thread
- FB publishes paper ‘MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers’ with effusive commentary and HN thread
- 5/27/2023 casual listing of AI chips from this HN thread ‘Ask HN: What is an AI chip and how does it work?’
- a16z collects together various deep learning resources in The AI Canon and HN thread
- Xorvoid criticism of LLMs called ‘ChatGPT: A Mental Model’ and critical back and forth at HN
06 June 2023
- Need to investigate running LLMs locally with ggml.ai, which works with llama.cpp, whisper.cpp, etc. Optimized for Apple Silicon. HN thread
20 June 2023
- SimpleAIchat on GitHub – a new and concise Python package for easily interfacing with chat apps like GPT-4, with robust features and minimal code complexity. HN thread
27 June 2023
- Lilian Weng has another detailed overview post up on her excellent blog about ‘Autonomous Agents’. HN thread
05 July 2023
- From February, 2023, ‘A human just defeated an AI in Go. Here’s why that matters’, and HN thread
- How metaphors shape our understanding of computing and AI: https://windowsontheory.org/2023/06/28/metaphors-for-ai-and-why-i-dont-like-them/
- July 2023 Bret Victor / worrydream update: HN thread. Older 2014-era Vimeo videos from Victor’s CV: the short 15-minute Seeing Spaces and the longer 55-minute Humane Representation of Thought
11 July 2023
- New personal guide to LangChain and associated HN thread
26 August 2023
- In the past week, Meta released two large models: one for code and one for translation.
20 September 2023
- Some Llama 2 links:
- 7/27 article by Lucas Pauker Some explanations on how to fine-tune a model and performance of pre-training vs fine-tuning.
- Comment by [Jeremy Howard](https://news.ycombinator.com/item?id=36900969) on the progress of what GPT and BERT did in terms of simplifying fine-tuning. From this HN thread
- 8/01 article by Ollama about how an uncensored locally running version of Llama-2 compares. HN thread with relevant top comment
- Interesting Simon Willison blog about using Claude to summarize Hacker News
- Ongoing comparison of the best GPU clouds for the Nvidia A100 (2020 vintage) vs. H100 (2022). Blog post and HN thread
26 September 2023
- Intro Oracle article on RAG from 9/19/2023 (a bare-bones RAG sketch follows at the end of this date’s entries).
- LangSmith videos from Google search on 8/28:
    - https://www.youtube.com/watch?v=tFXm5ijih98
    - https://www.youtube.com/watch?v=bE9sf9vGsrM
    - https://www.youtube.com/watch?v=odxlHNLWAk4
    - https://www.youtube.com/watch?v=Weod3-ZPaPM
    - https://www.youtube.com/watch?v=ll-Xit_Khq0
- Vector database available in GCP?
- Above links/notes copied to 2-post in 2-astro
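- The bare-bones RAG sketch referenced above: embed documents, retrieve the closest ones to the question, and prepend them to the prompt (embed and generate are placeholders for a real embedding model and LLM):

```python
# Bare-bones RAG pattern: retrieve, then generate with the retrieved context.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("placeholder for a real embedding model")

def generate(prompt: str) -> str:
    raise NotImplementedError("placeholder for a real LLM call")

def answer(question: str, documents: list[str], k: int = 3) -> str:
    q = embed(question)
    scored = sorted(documents,
                    key=lambda d: float(np.dot(embed(d), q)),
                    reverse=True)
    context = "\n\n".join(scored[:k])          # top-k retrieved chunks
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```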
09 October 2023
- A lot of updates added to /5-slides, /71-gpt3, /72-gpt4, /74-llama2
- Anthropic has a new paper that pushes forward the capability of interpreting ANNs. Instead of trying to understand a single artificial neuron, it finds a better unit of analysis, called features. A feature, by their definition, corresponds to a linear combination of neuron activations.
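- A toy illustration of that unit of analysis (just the definition, not Anthropic’s method, which learns many such directions with a sparse autoencoder/dictionary):

```python
# A "feature" as a direction in activation space: a linear combination of
# individual neuron activations, evaluated at every token position.
import numpy as np

activations = np.random.default_rng(0).normal(size=(5, 4))   # 5 tokens x 4 neurons
feature_direction = np.array([0.5, -0.2, 0.0, 0.8])           # weights over neurons

feature_activation = activations @ feature_direction          # one value per token
print(feature_activation.shape)                               # (5,)
```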
13 October 2023
- Updated version of the paper ‘Textbooks Are All You Need’ (October 2023) that includes version 1.5 of the Phi model from Microsoft Research. Phi-1.5 is an LLM that could make training AI radically cheaper and faster.
- See also this KD Nuggets article about the same Phi-1.5 model. Only has 1.3B parameters but performs well!
- September 2023 paper showing how effectively foundation models can scale to long contexts (i.e., 32k tokens): ‘Effective Long-Context Scaling of Foundation Models.’
- Good article by Jeremy Howard and author Sylvain Gugger on the history of the AdamW algorithm as a competitor to SGD.
- New article from Lightning.ai on ‘Finetuning LLMs with LoRA and QLoRA: Insights from Hundreds of Experiments’.
25 October 2023
- New Jina AI release launches open-source 8k text embedding and HN thread. Also Simon Willison has already built a Jina plugin for this and explains embeddings in general in this blog post
13 November 2023
- Maybe CNNs are not quite obsolete. Places where convolutional neural nets perform as well as transformers. ‘The convolution empire strikes back’
- Paper: ‘Evaluating Large Language Models: A Comprehensive Survey’. arXiv link. PDF.
- Important summary article on AI and Open Source in 2023 by Sebastian Raschka.
- Rust and WebAssembly for faster, lighter-weight LLMs? Article and HN thread
11 December 2023
- Funding announcement of Liquid AI, MIT startup focused on LTC aka Liquid Time Constant ANNs from research in late 2020/early 2021. Follow-up with additional resources in this LTC-SE paper applied to embedded systems in April 2023.
2024 Gen AI
01 January 2024
- On HN front page today, found a reference to the arXiv page for this 600-page book, ‘Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory’, published October 2023 from a course given at ETH Zurich by academics Arnulf Jentzen, Beno Kuckuck, and Philippe von Wurstemberger.
- From the same HN thread, found this draft book in progress. Understanding Deep Learning by Simon J.D. Prince. Main page here.
- Nice because section 1.6 of the book’s free PDF has good references (as of Dec 2023) to good textbooks and other resources on various approaches to DL, including math-focused, coding-focused, computer-vision-optimized, reinforcement-learning-focused, etc.
- Also dives into argmin and argmax pretty early. Handy wiki article on arg max functions. (A tiny argmax example follows at the end of this date’s entries.)
- From same thread, and also referenced in the Prince UDL book, began looking at Bishop (2006) Pattern Recognition and Machine Learning textbook. Per this HN comment, Bishop has a simpler notation than the 600-pg ETH book by Jentzen et al. Bishop’s notation is clear like Goodfellow (2016) but is a little mathematically deeper.
- Most importantly, Bishop has a good self-contained introduction to the relevant probability theory.
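- The tiny argmax example mentioned above: max gives the best value, argmax gives the input that achieves it:

```python
# max vs. argmax on a toy score table.
scores = {"cat": 0.2, "dog": 0.7, "bird": 0.1}
print(max(scores.values()))            # 0.7   (the maximum value)
print(max(scores, key=scores.get))     # "dog" (the argument attaining it)
```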
07 January 2024
- Good overview and history of various RAG variants for Retrieval-Augmented Generation for LLMs.
- Re-found and re-read the piece by Sebastian Mellen on ‘Summarization is the Killer Use Case for LLMs’, with an HN comment by Simon Willison and the general HN thread.
12 January 2024
- Multiple resources on the latest in vector databases in this HN thread
15 January 2024
- Run-through of Hyung Won Chung’s slides from OpenAI. Originally sent by Seymour in October 2023.
- JP Morgan AI Research published DocLLM a multi-modal LLM that can interpret invoices, receipts, purchase orders, and other visually laid out forms. Posted to FB group on 1/12/2024.
- November 2023 survey of Multi-Modal LLMs at arXiv posted on 1/15/2024 at FB Group AI+DL.
17 January 2024
- HN comment about how it takes more than just focusing on the chat interaction: one has to be thoughtful about extracting and searching to get relevant context (“you can’t just throw chunks into a vector DB”). Part of this HN thread responding to this article on ‘RAG using data & role of knowledge graphs’.
15 February 2024
- New OpenAI text-to-video model called Sora
- Blog post
- Research paper
- #1 item on the HN homepage; has 1,200 comments posted in the last 6 hours alone.
20 February 2024
- Are we at peak vector database? and HN thread.
- Sebastian Raschka article on Coding LoRA from Scratch and HN thread
- Another Raschka article Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch from 2/18/2024.
- Interesting 2016 blog post examining why debugging, not math, is the reason Why machine learning is hard and HN thread
- Training a small (~10M-parameter) transformer following Andrej Karpathy’s tutorial: Beyond Self-Attention: How a Small Language Model Predicts the Next Token and HN thread
21 February 2024
- New Andrej Karpathy YouTube video focused on tokenization, posted after he left OpenAI a few weeks ago. HN thread
24 February 2024
- Quick summary overview of Mamba – an alternative to Transformer/attention architectures that avoids the quadratic growth in computational demand as sequence length increases. Mamba: The Easy Way by Jack Cook and HN thread (a toy recurrence sketch follows at the end of this date’s entries)
- Mamba created by Albert Gu and Tri Dao. Sasha Rush aka Alexander Rush wrote Mamba: The Hard Way.
- Feb 2024 survey of LLMs by Shervin Minaee, T. Mikolov, N. Nikzad, …, X. Amatriain, Jianfeng Gao. arXiv link.
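- The toy recurrence sketch mentioned above – the (highly simplified) contrast behind Mamba: attention compares every position with every other (O(n²)), while an SSM-style model carries a fixed-size state through a linear recurrence (O(n)). This is a toy scalar recurrence, not Mamba’s actual selective scan:

```python
# Toy contrast: quadratic pairwise attention cost vs. a linear-time recurrence.
import numpy as np

def attention_cost(n):
    """Size of the pairwise score matrix self-attention builds."""
    return n * n

def ssm_scan(x, a=0.9, b=0.1):
    """h_t = a * h_{t-1} + b * x_t, computed in one pass over the sequence."""
    h, out = 0.0, []
    for x_t in x:
        h = a * h + b * x_t
        out.append(h)
    return np.array(out)

x = np.arange(8, dtype=float)
print(attention_cost(len(x)), ssm_scan(x)[-1])
```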
01 April 2024
- After a multi-year delay, Grant Sanderson posted a new YouTube video in his “What is a neural network?” series. Chapter 5: But what is a GPT?
08 April 2024
- Paper last week on how an undifferentiated/unstructured large volume of AI agents is all you need. Title: ‘More Agents Is All You Need’ – arXiv link and HN thread