2023 Gen AI
- 2/09 - Got stable diffusion working on the Mac Studio and documented all the steps.
- 2/12 - Reviewed Goodfellow, Bengio, and Courville (GBC) on the history of autoencoders
- 2/13 - More GBC on autoencoders; they refer to the GOFAI/symbolic approach as the “knowledge base” approach. But I think KB was really an ’80s subset of the overall symbolic approach.
- 2/21 - Read Stephen Wolfram and Murray Shanahan Feb 2023 articles on LLMs
- 2/23 - more articles on history of BERT, transformers, MUM, etc.
- This 18-minute YouTube video by Mean Gene Hacks compares 3 LLMs that all have about 175B parameters: OpenAI’s GPT-3, BigScience’s BLOOM, and Facebook’s OPT-175B
- 2/24 - Today, FB released LLaMA (Large Language Model Meta AI) in 4 sizes: 7B, 13B, 33B, and 65B parameters. Yann LeCun claims that LLaMA-13B outperforms GPT-3 (even though the latter has 175B). And LLaMA-65B is competitive with the best models like Chinchilla-70B and PaLM-540B.
- 2/26 - Mt. AI / volcanic island / infinite skyscraper. Elevators vs. Staircases
- 3/01 - Links on BERT and AI
- From Weights and Biases. An Introduction to BERT and How to Use It by Mukilan Krishnakumar. Good simple diagrams here.
- Transformers Explained, Understanding the Model Behind GPT-3, BERT, and T5 from May 2021. Gets quite detailed with the math etc.
- Towards Data Science article. A little simple but still a useful intro from July 2022: Evolution of Large Language Models–BERT, GPT3, MUM and PaLM. Unfortunately it’s a member-only Medium article. Here is the Box link.
- Good diagrams at intermediate level going pretty in-depth into how transformers work. The Illustrated Transformer by Jay Alammar.
- 3/03 - Adam Gopnik New Yorker article on DALL-E
- 3/05 - Walter Benjamin 1936 essay “The Work of Art in the Age of Mechanical Reproduction” (pdf link)
- 3/06 - Werner Schweibenz 2018 essay “The Work of Art in the Age of Digital Reproduction” in Museum International
- 3/06 William J. Mitchell The Reconfigured Eye: Visual Truth in the Post-Photographic Era 1992
- 3/07 links to commentary on papers related to Episode 5. I wrote some notes and created specific webpages for each of the following papers:
- 3/11 Exciting! See this Hacker News thread about running FB’s latest LLaMA locally on Apple Silicon
- This is the github repo with relevant stuff by ggerganov
- Super useful notes by Simon Willison who got the 7B version running on a M2 MacBook Pro with 64GB of RAM. From this comment in the same HN thread.
- Applied to be allowed to download the 250GB collection of all 4 models.
- 3/12 Success! dali etc
- 3/15 From this HN comment in this submission about LLVM, I thought about buying Transformers for NLP: Build, train, and fine-tune deep neural network architectures with Python, PyTorch, TensorFlow, BERT and GPT-3, 2nd edition. Amazon link
- important to get the second edition!
- as of 3/15, kindle edition is $19.59. Paperback is $37.79. Both come with free PDF.
- 2nd edition published March 25, 2022
- Good guide to all the relevant papers over the last 9 years on transformers, LLMs etc. by Sebastian Raschka. Published February, 2023.
- Created this page for business notes on Generative AI (March 15, 2023)
- Yann LeCun reposted this March 2, 2023 lecture by Professor Pascale Fung on ChatGPT: What it can and cannot do
- Watched up to 3:30 where Prof Fung describes history and mapping Shannon’s model of communication to Speech Recognition and Machine Translation.
- Source → Transmitter/Encoder → Channel/Speech Recognizer/Machine Translator → Receiver/Decoder → Destination/Output
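- A minimal sketch (my own, not from the lecture; the function names are placeholders) of how Shannon’s pipeline maps onto an encode → channel → decode program:

```python
# Hypothetical sketch of Shannon's communication pipeline mapped onto
# machine translation / speech recognition, as in Prof. Fung's framing.
# Function names are placeholders, not a real API.

def encode(source_text: str) -> list[int]:
    """Transmitter/Encoder: turn the source message into a signal (tokens)."""
    return [ord(c) for c in source_text]      # stand-in for a real tokenizer

def channel(signal: list[int]) -> list[int]:
    """Channel: in MT/ASR this is where a model maps one language/modality to another."""
    return signal                              # identity here; a learned model in practice

def decode(signal: list[int]) -> str:
    """Receiver/Decoder: reconstruct a message for the destination."""
    return "".join(chr(i) for i in signal)

destination_output = decode(channel(encode("bonjour le monde")))
print(destination_output)
```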
- 3/16 Read a bit about Noam Shazeer, co-author of the first Transformer paper, who worked on Google’s LaMDA system with project leader Daniel De Freitas, who is now Noam’s cofounder at Character.ai
- 3/17 Runway, cofounded by Cristobal Valenzuela, has launched a video-generation product named Gen-1 using Stable Diffusion.
- see also this Decoder article and this MIT Tech Review piece
- 3/20 Created new page on Stanford paper “On the Opportunities and Risks of Foundation Models”.
- 3/20 Found two useful articles from Lilian Weng’s blog:
- 3/20 Found a bunch of interesting resources re: Foundation models
- 2023 MAD landscape posted by Matt Turck. Web version here; 404’s occasionally. Scroll to the bottom right of the blue ML+AI section to see the box on “Closed Source Models”
- Snorkel.ai intro guide dated March 1, 2023.
- Alan Thompson’s Life Architect post Inside language models, which has been updated recently enough to include GPT-4 and to note that LLaMA has been leaked.
- HN thread about an alpaca tuned llama-7b chatbot. llama-30b coming soon.
- Rodney Brooks: What Will Transformers Transform?
- ‘Generative Pre-trained Transformer’ models (GPT) are now the rage and have inspired Kissinger and Noam Chomsky. That sure is some hype level
- References Wolfram’s excellent explainer
- “By the way, even since the earliest days of AI, the 1955 proposal for the 1956 workshop on AI, the document in which the term AI first appears anywhere, the goal of the researchers was to produce general intelligence. That AGI is a different term than AI now is due to a bunch of researchers a dozen or so years ago deciding to launch a marketing campaign for themselves by using a new buzz acronym. ‘AGI’ is just ‘AI’ as it was known for the first 50+ years of its existence. Hype produced the term ‘AGI’ with which we are now saddled.”
- Quotes unconfirmed reports that GPT-4 has 1 trillion parameters (compared with GPT-3’s 175 billion), but that has been specifically debunked by Sam Altman and others.
- All successful systems need to have a person in the loop.
- “This is true of language translation systems where a person is reading the output and, just as they do with children, the elderly, and foreigners, adapts quickly to the mistakes the person or system makes, and fill in around the edges to get the meaning, not the literal interpretation.”
- “This is true of speech understanding systems where we talk to Alexa or Google Home, or our TV remote, or our car. We talk to each of them slightly differently, as we humans quickly learn how to adapt to their idiosyncracies and the forms they can understand and not understand.
- “This is true of our search engines, where we have learned how to form good queries that will get us the information we actually want, the quickest.
- “This is true of mobile robots in hospitals, taking the dirty sheets and dishes to be cleaned, or bringing up prescriptions from the hospital pharmacy, where there is a remote network operations center that some unseen user is waiting to take over control when the robot gets confused.”
- Amara’s Law: we tend to overestimate the effect of a technology in the short run and underestimate it in the long run.
- John McCarthy’s estimate that the computers of the 1960s were powerful enough to support AGI
- Minsky and Michie and Nilsson each believing that search algorithms were the key to intelligence,
- Neural networks (volume 3, perceptrons) [[I wasn’t around for the first two volumes; McCulloch and Pitts in 1943, Minsky in 1953]],
- First order logic, Resolution theorem proving, MacHack (chess 1), fuzzy logic, STRIPS,
- Knowledge-based systems (and revolutionizing medicine),
- Neural networks (volume 4, back propagation), the primal sketch, self driving cars (Dickmanns, 1987),
- Reinforcement learning (rounds 2 and 3), SOAR,
- Support vector machines, self driving cars (Kanade et al, 1997),
- Deep Blue (chess 2), self driving cars (Thrun, 2007), Bayesian inference, Watson (Jeopardy, and revolutionizing medicine),
- Neural networks (volume 5, deep learning), Alpha GO, reinforcement learning (round 4), generative images, and now large language models.
- All have heralded the imminence of human level intelligence in machines. All were hyped up to the limit, but mostly in the days when very few people were even aware of AI, so very few people remember the levels of hype. I’m old. I do remember all these, but have probably forgotten quite a few…
- “None of these things have lived up to that early hype.”
27 March 2023
- Downloaded recent Yann LeCun slides and summer 2022 paper summarizing his proposal to approach more human/animal-like learning/intelligence for machines
- Articles by Erich Grunewald from this HN thread:
- Created this page dedicated to Transformers papers and tutorials
- Created this page dedicated to setting up Python environments
31 March 2023
- Vicuna-13B is an online competitor based on LLaMA-13B but with different training
3 April 2023
- Interesting way of explaining the uncanny confidence of LLMs by Devin Coldewey in his 4/03 TechCrunch article “The Great Pretender”
10 April 2023
- “On Efficient Training of Large-Scale Deep Learning Models: A Literature Review” by Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao. arXiv link and HN thread
- Out of curiosity, I began probing ChatGPT, Bing, and Google on how they describe the difference between ‘ordinary training’, pre-training, and fine tuning. See archive of my chat histories at ChatGPT to see more.
- see also this stackexchange answer which draws on this ML glossary – look at answers for pre-training and fine-tuning.
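- A minimal sketch of the distinction I was probing for, assuming the Hugging Face transformers library (the gpt2 checkpoint is just an example): pre-training starts from random weights and learns from raw text with a self-supervised objective, while fine-tuning continues training an already pre-trained checkpoint on a smaller task dataset:

```python
# Minimal sketch (assuming the Hugging Face transformers library; "gpt2" is
# just an example checkpoint) of pre-training vs. fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fine-tuning starts from weights someone else already pre-trained:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")     # pre-trained checkpoint

# Pre-training, by contrast, would start from a randomly initialized config:
# from transformers import AutoConfig
# model = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained("gpt2"))

# One (toy) fine-tuning step: next-token prediction loss on a bit of domain text.
batch = tokenizer("Fine-tuning adapts a pre-trained model.", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()   # in practice an optimizer such as AdamW then updates the weights
```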
11 April 2023
- Listing of decentralized and open-source LLMs from this HN comment
- Found on HN thread, this open-source orchestrator and performance evaluator of multiple LLMs PhaseLLM and related collaborative data community
13 April 2023
- From Seymour in Slack, George Ho’s survey from a few years ago (May 2020) of using Transformers in NLP
- from HN, productizing mature LLM eng pipelines
14 April 2023
- From 3/28, Cerebras-GPT is a family of open, compute-efficient LLMs ranging from 0.11B to 13B parameters, trained using the Chinchilla formula
- A nice, concise, mathematically formal description of Transformers by Mary Phuong and Marcus Hutter at DeepMind, published 7/19/2022: Formal Algorithms for Transformers. From this HN comment, which also led to this HN comment about Hopf algebras, including Adam Nemecek’s 2/03/2023 paper Coinductive guide to inductive transformer heads, the older 2012 paper Hopf algebras and Markov chains: Two examples and a theory, and a recommendation to check out this website about Geometric Algebra aka Clifford Algebra. Recommend watching this 44-minute intro YouTube video. (A minimal attention sketch follows at the end of this date’s entries.)
- Dan Fu, Michael Poli, Chris Re 3/28/2023 “From Deep to Long Learning?”
- From this HN thread, found this article by Eugene Yan at Amazon “Experimenting with LLMs to Research, Reflect, and Plan”
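- The minimal attention sketch mentioned above – scaled dot-product attention in NumPy (generic shapes and names, not copied from the Phuong & Hutter paper):

```python
# Minimal NumPy sketch of scaled dot-product attention, the operation the
# Formal Algorithms for Transformers paper pins down precisely.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v) -> (n_q, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # similarity of each query to each key
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V                         # weighted average of the values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(attention(Q, K, V).shape)                # (4, 8)
```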
15 April 2023
- Found another reference to this good intermediate level discussion how transformers work. The Illustrated Transformer by Jay Alammar. Originally noted on 3/01.
- See also this previous post by Jay about Attention
- Researchers at Allen Institute find a way to 6x the toxicity of ChatGPT. Allen Institute blog post and TC article
- Quote from Betaworks CEO John Borthwick: “This is the biggest change in technology in my lifetime. We’ve been building, accelerating and investing in and around machine learning for the last decade, and in the last 12 months, everything’s changed — the launch of generative visual models like [OpenAI’s] DALL-E 2 last year, the open and affordable access to these models with the availability of stability and GPT. AI has the potential to affect every sector, and every part of how we live, work, play and even die.” Part of Betaworks AI camp announcement
16 April 2023
- Batch computing and AI and associated HN thread
- New ‘Consistency Models’ are an upgrade over previous diffusion models for image generation and related visual tasks. Paper and TC article
17 April 2023
- Excellent compilation of various LLM resources by Sebastian Raschka. Blog post and associated HN thread with more resources
- Review article on ChatGPT related papers by Zhang, Zhang, et al. From abstract: ‘According to Google scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.’
- Together released RedPajama, a project to create ‘reproducible, fully-open, leading language model. RedPajama is a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute.’.
- three components: (1) Pre-training data, (2) Base models trained at scale with data from (1), and (3) Instruction tuning data and models for usability and safety
- Released LLaMA training dataset of 1.2 trillion tokens
22 April 2023
- New article by Sebastian Raschka with a comprehensive introduction to ‘Finetuning LLMs’. Associated HN thread
- Review again Jay’s 2020 Illustrated Transformer and his preceding article on Attention
24 April 2023
- New paper increases the context window from 32k tokens (GPT-4) to 1M tokens using the Recurrent Memory Transformer architecture (aka RMT). HN thread. Point from this HN comment:
- ‘For LLMs there are at least three different ways of “learning”:
- Pre-training for text prediction, using unsupervised learning
- Fine-tuning e.g. to follow instructions or to reject certain queries, using supervised and/or reinforcement learning (optional)
- In-context learning using “few-shot prompts” as examples
- ‘Now the last two can have similar effects. For example, you might fine-tune a foundation (only pre-trained) model to follow instructions, or you don’t and instead just modify your prompt such that it looks like a dialogue between a human and a helpful chat assistant. But neither can replace the extensive pre-training phase, which is what gives the model all its intelligence.’
- ‘One other disanalogy between fine-tuning and in-context learning appears to be that the model can’t exactly remember the data it was fine-tuned with, while it “knows” exactly everything in its context window. That is its working memory, so to speak.’
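- A sketch of the third kind of “learning” above – in-context learning – where nothing about the weights changes and the “training” lives entirely in the prompt (send_to_llm is a placeholder, not a real API):

```python
# In-context ("few-shot") learning sketch: the examples live in the prompt,
# not in the weights. send_to_llm() is a placeholder for whichever
# chat/completions client or local model you actually use.
few_shot_prompt = """Classify the sentiment of each review as positive or negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: positive

Review: It broke after two days and support never answered.
Sentiment: negative

Review: Setup took five minutes and it has worked flawlessly since.
Sentiment:"""

def send_to_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for a real API or local model call")

# answer = send_to_llm(few_shot_prompt)   # expected completion: "positive"
```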
27 April 2023
- e2eml.school’s Transformers from Scratch – nice diagrams on seq2seq etc.
- Harvard SEAS Jupyter notebook annotated version of original Transformer / attention-is-all-you-need paper
- arXiv paper from 3/22/2023: ‘Sparks of AGI: Early experiments with GPT-4’
- A Cookbook of Self-Supervised Learning paper and HN thread
28 April 2023
- Shared by Yann LeCun on FB – ‘A Practical Guide for LLMs’ with a history of taxonomic relationships between the various LLMs. Comments on FB from Yann (a small sketch of the three architecture families follows at the end of this date’s entries):
- ‘A survey of LLMs with a practical guide and evolutionary tree.
- ‘Number of LLMs from Meta = 7
- ‘Number of open source LLMs from Meta = 7
- ‘The architecture nomenclature for LLMs is somewhat confusing and unfortunate.
- ‘What’s called “encoder only” actually has an encoder and a decoder (just not an auto-regressive decoder).
- ‘What’s called “encoder-decoder” really means “encoder with auto-regressive decoder”
- ‘What’s called “decoder only” really means “auto-regressive encoder-decoder”’
- GPT4Free - allows various projects without an OpenAI API key. Very empty HN thread and project home
- arXiv paper from February 2023, ‘Hyena Hierarchy: Towards Larger Convolutional Language Models’, and fragmentary HN comment
- ‘Choose Your Weapon: Survival Strategies for Depressed AI Academics’ from March, 2023
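- The sketch of the three architecture families from Yann’s comment above, assuming the Hugging Face transformers library (BERT/T5/GPT-2 are my example picks, not from his post):

```python
# Examples (my picks, assuming Hugging Face transformers) of the three
# architecture families named in LeCun's comment above.
from transformers import AutoModel, AutoModelForSeq2SeqLM, AutoModelForCausalLM

bert = AutoModel.from_pretrained("bert-base-uncased")      # "encoder only"
t5   = AutoModelForSeq2SeqLM.from_pretrained("t5-small")   # "encoder-decoder"
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")        # "decoder only" (auto-regressive)

# Per LeCun's point, the labels really describe whether generation is
# auto-regressive, more than whether an "encoder" is literally present.
```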
30 April 2023
- A Brief History of LLaMA and HN thread
- This comment indicates that this GH repo runs unquantized 7B and 13B models on an M2 GPU, which means it’s a little slower but much, much more energy efficient.
01 May 2023
- Mojo – a new programming language from Chris Lattner’s new company, Modular AI. HN thread
- Project to run LLMs on mobile devices and diverse hardware. ‘Everything runs locally with no server support and accelerated with local GPUs on your phone and laptops. Supported platforms include: iphone/ipad, Metal GPUs and Intel/ARM MacBooks, AMD and NVIDIA GPUs via Vulkan on Windows and Linux, NVIDIA GPUs via CUDA, WebGPU on browsers through WebLLM.’
03 May 2023
- Recommended to read this nice article explaining RLHF – ‘Illustrating Reinforcement Learning from Human Feedback (RLHF)’
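- A structural sketch of the three RLHF stages the article illustrates (every function here is a placeholder, not a real library call):

```python
# Structural sketch of the RLHF pipeline described in the article above.
# All functions are placeholders; real implementations use e.g. TRL/PPO.

def supervised_finetune(base_model, demonstrations):
    """Stage 1: fine-tune the pre-trained LM on human-written demonstrations."""
    ...

def train_reward_model(model, ranked_comparisons):
    """Stage 2: fit a reward model on human rankings of candidate responses."""
    ...

def ppo_step(policy, reward_model, prompts):
    """Stage 3: generate responses, score them with the reward model, and
    update the policy with PPO (usually with a KL penalty toward the SFT model)."""
    ...

# policy = supervised_finetune(pretrained_lm, demo_data)
# reward_model = train_reward_model(policy, comparison_data)
# for batch in prompt_batches:
#     ppo_step(policy, reward_model, batch)
```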
04 May 2023
- Two good intro articles by Assembly AI:
- The Full Story of LLMs and RLHF and associated HN thread
- Intro to Generative AI
05 May 2023
- Introductory 2021 blog post by Pinecone, ‘What is a Vector Database?’, and HN thread (a toy nearest-neighbor sketch follows after these links)
- Deepgram blogpost ‘Augmenting LLMs Beyond Basic Text Completion and Transformation’ and HN thread
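- The toy nearest-neighbor sketch mentioned above – the core operation a vector database provides, minus the approximate indexes (HNSW, IVF, etc.) and metadata filtering that real systems add:

```python
# Toy sketch of vector search: store embeddings, then return the nearest
# ones to a query by cosine similarity. The vectors here are made up.
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

docs = {"doc1": np.array([0.1, 0.9, 0.0]),
        "doc2": np.array([0.8, 0.1, 0.1]),
        "doc3": np.array([0.2, 0.7, 0.1])}

query = np.array([0.15, 0.8, 0.05])
ranked = sorted(docs, key=lambda d: cosine_sim(query, docs[d]), reverse=True)
print(ranked)   # most similar document ids first
```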
08 May 2023
- Latest on which consumer hardware (including Apple Silicon) is effective for home training, in this comment thread, part of this HN post about RasaGPT, which simplifies integration of NLP, NLU, machine translation, etc., using tools like Rasa, FastAPI, LangChain, etc.
- Explanation of current landscape of diffusion models and why Reflected Diffusion Models might be the next step plus HN thread
15 May 2023
- StarCoder and StarCoderBase: the foundation model StarCoderBase and a fine-tuned tool called StarCoder. HN thread
- Ash Vardanian implemented a 1000-line C++ vector database as announced in this short HN thread. Main blogpost here and main HN thread
18 May 2023
- ACM article on Cargo Cult AI and HN thread
20 May 2023
- Dialogue between Don Knuth and Stephen Wolfram regarding Knuth’s test questions to ChatGPT. HN thread
- Rodney Brooks says to ‘calm down about GPT-4’ in IEEE and HN thread
- Rich Sutton (of Reinforcement Learning textbook fame) and his essay The Bitter Lesson reappeared on HN
22 May 2023
- New paper by Meta about how fine-tuning is much less important than pre-training: ‘LIMA: Less is More for Alignment’. Minimal finetuning is still effective, suggesting that the bulk of the work is done during pre-training. Uses the 65B version of LLaMA
23 May 2023
- RWKV: Reinventing RNNs for the Transformer Era. New paper that tries to rebuild RNNs to get the benefits of Transformer attention while scaling more efficiently. PDF
- Meta launches a multilingual model, expanding coverage from the prior SOTA of 100 languages to 1,100+ languages
- Sebastian Raschka has another post up about ‘Why the original transformer drawing is wrong and some other historical tidbits about LLMs’. HN thread
26 May 2023
- HN thread about building CLI tools to work with ChatGPT and LLMs by Simon Willison
- Stanford paper on AlpacaFarm, which has been RLHF’d to ‘beat ChatGPT-3.5’. HN thread
- Paper written by several AI/ML PhD students about how to do NLP research in the age of LLMs and HN thread
27 May 2023
- New paper ‘QLoRA: Efficient Finetuning of Quantized LLMs’ and HN thread
- Follow up August article by Databricks ‘Efficient Fine-Tuning with LoRA: A Guide to Optimal Parameter Selection for Large Language Models’
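- A minimal PyTorch sketch of the low-rank idea behind LoRA/QLoRA: keep the pre-trained weight frozen and learn a small update B·A on top of it (QLoRA does the same with the frozen base quantized to 4-bit). Layer sizes here are arbitrary:

```python
# Minimal LoRA sketch: frozen base weight plus a trainable low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze the pre-trained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # base(x) uses the frozen weights; the low-rank term is the trainable update
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # only A and B are trainable: 2 * 8 * 768 = 12,288 parameters
```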
28 May 2023
- Good PDF ‘The Little Book of Deep Learning’ and related HN thread
- FB publishes paper ‘MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers’ with effusive commentary and HN thread
- 5/27/2023 casual listing of AI chips from this HN thread ‘Ask HN: What is an AI chip and how does it work?’
- a16z collects together various deep learning resources in The AI Canon and HN thread
- Xorvoid criticism of LLMs called ‘ChatGPT: A Mental Model’ and critical back and forth at HN
06 June 2023
- Need to investigate running LLMs locally with ggml.ai, which works with llama.cpp, whisper.cpp, etc. Optimized for Apple Silicon. HN thread
20 June 2023
- SimpleAIchat on GitHub – a new and concise Python package for easily interfacing with chat apps like GPT-4, with robust features and minimal code complexity. HN thread
27 June 2023
- Lilian Weng has another detailed overview post up on her excellent blog about ‘Autonomous Agents’. HN thread
05 July 2023
- From February, 2023, ‘A human just defeated an AI in Go. Here’s why that matters’, and HN thread
- How metaphors shape our understanding of computing and AI: https://windowsontheory.org/2023/06/28/metaphors-for-ai-and-why-i-dont-like-them/
- July 2023 Bret Victor / worrydream update: HN thread. Older 2014-era Vimeo videos from Victor’s CV: the short 15-minute Seeing Spaces and the longer 55-minute Humane Representation of Thought
11 July 2023
- New personal guide to LangChain and associated HN thread
26 August 2023
- In the past week, Meta released two large models: one for code and one for translation.
20 September 2023
- Some Llama 2 links:
- 7/27 article by Lucas Pauker Some explanations on how to fine-tune a model and performance of pre-training vs fine-tuning.
- Comment by [Jeremy Howard](https://news.ycombinator.com/item?id=36900969) on the progress of what GPT and BERT did in terms of simplifying fine-tuning. From this HN thread
- 8/01 article by Ollama about how an uncensored locally running version of Llama-2 compares. HN thread with relevant top comment
- Interesting Simon Willison blog about using Claude to summarize Hacker News
- Ongoing comparison of the best GPU clouds for the Nvidia A100 (2020 vintage) vs. H100 (2022). Blog post and HN thread
26 September 2023
- Intro Oracle article on RAG from 9/19/2023 (a bare-bones RAG sketch follows at the end of this date’s entries).
- LangSmith videos from Google search on 8/28:
    - https://www.youtube.com/watch?v=tFXm5ijih98
    - https://www.youtube.com/watch?v=bE9sf9vGsrM
    - https://www.youtube.com/watch?v=odxlHNLWAk4
    - https://www.youtube.com/watch?v=Weod3-ZPaPM
    - https://www.youtube.com/watch?v=ll-Xit_Khq0
- Vector database available in GCP?
- Above links/notes copied to 2-post in 2-astro
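- The bare-bones RAG sketch referenced above: embed documents, retrieve the closest ones to the question, and prepend them to the prompt (embed and generate are placeholders for a real embedding model and LLM):

```python
# Bare-bones RAG pattern: retrieve, then generate with the retrieved context.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("placeholder for a real embedding model")

def generate(prompt: str) -> str:
    raise NotImplementedError("placeholder for a real LLM call")

def answer(question: str, documents: list[str], k: int = 3) -> str:
    q = embed(question)
    scored = sorted(documents,
                    key=lambda d: float(np.dot(embed(d), q)),
                    reverse=True)
    context = "\n\n".join(scored[:k])          # top-k retrieved chunks
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```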
09 October 2023
- A lot of updates added to /5-slides, /71-gpt3, /72-gpt4, /74-llama2
- Anthropic has a new paper that pushes forward the capability of interpreting ANNs. Instead of trying to understand a single artificial neuron, it finds a better unit of analysis, called features. A feature, by their definition, corresponds to a linear combination of neuron activations.
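- A toy illustration of that unit of analysis (just the definition, not Anthropic’s method, which learns many such directions with a sparse autoencoder/dictionary):

```python
# A "feature" as a direction in activation space: a linear combination of
# individual neuron activations, evaluated at every token position.
import numpy as np

activations = np.random.default_rng(0).normal(size=(5, 4))   # 5 tokens x 4 neurons
feature_direction = np.array([0.5, -0.2, 0.0, 0.8])           # weights over neurons

feature_activation = activations @ feature_direction          # one value per token
print(feature_activation.shape)                               # (5,)
```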
13 October 2023
- Updated version of the paper ‘Textbooks Are All You Need’ (October 2023) that includes version 1.5 of the Phi model from Microsoft Research. Phi-1.5 is an LLM that could make training AI radically cheaper and faster.
- See also this KD Nuggets article about the same Phi-1.5 model. Only has 1.3B parameters but performs well!
- September 2023 paper showing how effectively foundation models can scale to long contexts (i.e., 32k tokens): ‘Effective Long-Context Scaling of Foundation Models.’
- Good article by Jeremy Howard and author Sylvain Gugger on the history of the AdamW algorithm as a competitor to SGD.
- New article from Lightning.ai on ‘Finetuning LLMs with LoRA and QLoRA: Insights from Hundreds of Experiments’.
25 October 2023
- New Jina AI release launches open-source 8k text embedding and HN thread. Also Simon Willison has already built a Jina plugin for this and explains embeddings in general in this blog post
13 November 2023
- Maybe CNNs are not quite obsolete. Places where convolutional neural nets perform as well as transformers. ‘The convolution empire strikes back’
- Paper: ‘Evaluating Large Language Models: A Comprehensive Survey’. arXiv link. PDF.
- Important summary article on AI and Open Source in 2023 by Sebastian Raschka.
- Rust and WebAssembly for faster, lighter-weight LLMs? Article and HN thread
11 December 2023
- Funding announcement of Liquid AI, MIT startup focused on LTC aka Liquid Time Constant ANNs from research in late 2020/early 2021. Follow-up with additional resources in this LTC-SE paper applied to embedded systems in April 2023.
2024 Gen AI
01 January 2024
- On HN front page today, found a reference to the arXiv page for this 600-page book, ‘Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory’, published October 2023 from a course given at ETH Zurich by academics Arnulf Jentzen, Beno Kuckuck, and Philippe von Wurstemberger.
- From the same HN thread, found this draft book in progress. Understanding Deep Learning by Simon J.D. Prince. Main page here.
- Nice because section 1.6 of the book’s free PDF has good references (as of Dec 2023) to good textbooks and other resources on various approaches to DL, including math-focused, coding-focused, computer-vision-optimized, reinforcement-learning-focused, etc.
- Also dives into argmin and argmax pretty early. Handy wiki article on arg max functions. (A tiny argmax example follows at the end of this date’s entries.)
- From same thread, and also referenced in the Prince UDL book, began looking at Bishop (2006) Pattern Recognition and Machine Learning textbook. Per this HN comment, Bishop has a simpler notation than the 600-pg ETH book by Jentzen et al. Bishop’s notation is clear like Goodfellow (2016) but is a little mathematically deeper.
- Most importantly, Bishop has a good self-contained introduction to the relevant probability theory.
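- The tiny argmax example mentioned above: max gives the best value, argmax gives the input that achieves it:

```python
# max vs. argmax on a toy score table.
scores = {"cat": 0.2, "dog": 0.7, "bird": 0.1}
print(max(scores.values()))            # 0.7   (the maximum value)
print(max(scores, key=scores.get))     # "dog" (the argument attaining it)
```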
07 January 2024
- Good overview and history of various RAG variants for Retrieval-Augmented Generation for LLMs.
- Re-found and re-read the piece by Sebastian Mellen on ‘Summarization is the Killer Use Case for LLMs’, with an HN comment by Simon Willison and the general HN thread.
12 January 2024
- Multiple resources on the latest in vector databases in this HN thread
15 January 2024
- Run-through of Hyung Won Chung’s slides from OpenAI. Originally sent by Seymour in October 2023.
- JP Morgan AI Research published DocLLM a multi-modal LLM that can interpret invoices, receipts, purchase orders, and other visually laid out forms. Posted to FB group on 1/12/2024.
- November 2023 survey of Multi-Modal LLMs at arXiv posted on 1/15/2024 at FB Group AI+DL.
17 January 2024
- HN comment about how it takes more than just focusing on the chat interaction: one has to be thoughtful about extracting and searching to get relevant context (“you can’t just throw chunks into a vector DB”). Part of this HN thread responding to this article on ‘RAG using data & role of knowledge graphs’.
15 February 2024
- New OpenAI text-to-video model called Sora
- Blog post
- Research paper
- #1 item on the HN homepage; has 1,200 comments posted in the last 6 hours alone.
20 February 2024
- Are we at peak vector database? and HN thread.
- Sebastian Raschka article on Coding LoRA from Scratch and HN thread
- Another Raschka article Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch from 2/18/2024.
- Interesting 2016 blog post examining why debugging, not math, is the reason Why machine learning is hard and HN thread
- Training a small (~10M-parameter) transformer following Andrej Karpathy’s tutorial: Beyond Self-Attention: How a Small Language Model Predicts the Next Token and HN thread
21 February 2024
- New Andrej Karpathy YouTube video focused on tokenization, posted after he left OpenAI a few weeks ago. HN thread
24 February 2024
- Quick summary overview of Mamba – an alternative to Transformer/attention architectures that avoids the quadratic growth in computational demand as sequence length increases. Mamba: The Easy Way by Jack Cook and HN thread (a toy recurrence sketch follows at the end of this date’s entries)
- Mamba created by Albert Gu and Tri Dao. Sasha Rush aka Alexander Rush wrote Mamba: The Hard Way.
- Feb 2024 survey of LLMs by Shervin Minaee, T. Mikolov, N. Nikzad, …, X. Amatriain, Jianfeng Gao. arXiv link.
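- The toy recurrence sketch mentioned above – the (highly simplified) contrast behind Mamba: attention compares every position with every other (O(n²)), while an SSM-style model carries a fixed-size state through a linear recurrence (O(n)). This is a toy scalar recurrence, not Mamba’s actual selective scan:

```python
# Toy contrast: quadratic pairwise attention cost vs. a linear-time recurrence.
import numpy as np

def attention_cost(n):
    """Size of the pairwise score matrix self-attention builds."""
    return n * n

def ssm_scan(x, a=0.9, b=0.1):
    """h_t = a * h_{t-1} + b * x_t, computed in one pass over the sequence."""
    h, out = 0.0, []
    for x_t in x:
        h = a * h + b * x_t
        out.append(h)
    return np.array(out)

x = np.arange(8, dtype=float)
print(attention_cost(len(x)), ssm_scan(x)[-1])
```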
01 April 2024
- After a multi-year delay, Grant Sanderson posted a new YouTube video in his “What is a neural network?” series. Chapter 5: But what is a GPT?
08 April 2024
- Paper last week on how an undifferentiated/unstructured large volume of AI agents is all you need. Title: ‘More Agents Is All You Need’ – arXiv link and HN thread