Topic: LLMs


Short answer

This page shows the most relevant public items for LLMs, ranked by trend activity and review signal. Use weekly for fast changes, monthly for more stable patterns, and all-time for evergreen picks.

Weekly · Monthly · All time


  1. Language models can explain neurons in language models

    Paper · May 9, 2023 · OpenAI · Steven Bills, Nick Cammarata, Dan Mossing, Henk Tillman, Leo Gao, Gabriel Goh, Ilya Sutskever, Jan Leike, Jeff Wu, William Saunders

    Understanding the internal mechanisms of massive language models is a critical bottleneck for AI safety and alignment. Given the billions of parameters in modern models, manual human inspection of ...

  2. Scaling Laws for Neural Language Models

    Paper · Jan 23, 2020 · arXiv · Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei

    We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, ...

  3. Language Models are Unsupervised Multitask Learners

    Paper · Feb 14, 2019 · OpenAI · Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever

    Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific data...

  4. Language Models are Few-Shot Learners

    Paper · May 28, 2020 · arXiv · Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

    Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic i...

  5. Improving language models by retrieving from trillions of tokens

    Paper · Dec 8, 2021 · arXiv · Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre

    We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our R...

  6. Mathematical discoveries from program search with large language models

    Paper · Dec 14, 2023 · Nature · Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, Pushmeet Kohli

    Large language models (LLMs) have demonstrated impressive capabilities in code generation, but their ability to discover novel mathematical knowledge has been limited by hallucinations and lack of ...

  7. Gemma: Open Models Based on Gemini Research and Technology

    Paper · Feb 21, 2024 · arXiv · Gemma Team, Google DeepMind

    We introduce Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Gemma models are offered in two sizes: a 7 bi...

  8. Competition-level code generation with AlphaCode

    Paper · Dec 8, 2022 · Science · Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Oriol Vinyals

    Programming represents a complex problem-solving task that requires deep logic and algorithmic reasoning. We present AlphaCode, a system that writes computer programs at a competitive level. AlphaC...

  9. GPT-4 Technical Report

    Paper · Mar 15, 2023 · arXiv · OpenAI

    We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT...

  10. Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Paper · Dec 8, 2021 · arXiv · Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Eleni Elia, Danilo J. Rezende, Oriol Vinyals, Karen Simonyan

    Language modelling provides a step towards intelligent communication systems by harnessing large datasets and expressive models. We provide an analysis of Transformer-based language model architect...
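The scaling-law result in item 2 can be sketched numerically. A minimal illustration of the parameter-count power law reported by Kaplan et al. (2020); the fit constants below are the paper's published values and should be treated as approximate:

```python
# Parameter-count scaling law from Kaplan et al. (2020):
# test loss L(N) ~ (N_c / N) ** alpha_N for models trained to convergence.
# Constants are the paper's reported fits; treat them as approximate.

ALPHA_N = 0.076   # power-law exponent for model size
N_C = 8.8e13      # scale constant (non-embedding parameters)

def loss_from_params(n_params: float) -> float:
    """Predicted cross-entropy loss as a function of non-embedding parameter count."""
    return (N_C / n_params) ** ALPHA_N

# "Power law" in practice: doubling model size shrinks the predicted loss
# by a fixed multiplicative factor, 2 ** -ALPHA_N, regardless of starting size.
ratio = loss_from_params(2e9) / loss_from_params(1e9)
```

The same functional form (with different exponents) holds for dataset size and training compute, which is what makes these laws useful for planning training runs.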
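The core idea of item 4, few-shot learning, is that the model is conditioned on demonstrations placed in the prompt rather than fine-tuned with gradient updates. A minimal sketch; the model call itself is omitted, and the `Input:`/`Output:` format is one common convention, not the paper's exact template:

```python
# Few-shot prompting in the style of GPT-3: "training" is just a handful of
# input/output demonstrations in the prompt, followed by the new query.
# No weights are updated; the model completes the pattern.

def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Concatenate demonstrations, then leave the final Output: for the model to fill."""
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

# Hypothetical English-to-French task with two demonstrations.
prompt = build_few_shot_prompt(
    [("cheese", "fromage"), ("house", "maison")],
    "cat",
)
```

The prompt string would then be sent to whatever completion API you use; the paper's finding is that larger models extract the task from these demonstrations increasingly well.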
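Item 5's retrieval mechanism can be sketched as a toy nearest-neighbour lookup. The actual RETRO system retrieves chunks from a 2-trillion-token store using frozen BERT embeddings; plain bag-of-words cosine similarity stands in here purely for illustration:

```python
# Toy sketch of the RETRO idea: find the database chunks most similar to the
# current context, then condition generation on them. Bag-of-words cosine
# similarity replaces the real system's learned embeddings.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(context: str, database: list[str], k: int = 2) -> list[str]:
    """Return the k database chunks most similar to the context."""
    query = Counter(context.lower().split())
    ranked = sorted(
        database,
        key=lambda chunk: cosine(query, Counter(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]

db = [
    "the cat sat on the mat",
    "stock prices fell sharply",
    "a cat chased the mouse",
]
neighbours = retrieve("my cat is on the mat", db, k=2)
```

In the paper, the retrieved chunks are fed to the decoder through cross-attention, letting a comparatively small model lean on an external text store instead of memorising facts in its weights.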

Top Entities In This Topic

Related Topics

FAQ

What does this LLMs page rank?

It ranks public content for LLMs using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the LLMs topic page on Attendemia and is written to make sense without reading other sections of the page.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in LLMs?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.