Topic: AI Safety


Short answer

This page shows the most relevant public items for AI Safety, ranked by trend activity and review signals. Use the weekly view for fast-moving changes, monthly for more stable patterns, and all-time for evergreen picks.

Weekly · Monthly · All time


  1. TruthfulQA: Measuring How Models Mimic Human Falsehoods

    Paper · Sep 8, 2021 · arXiv · Stephanie Lin, Jacob Hilton, Owain Evans

    We propose TruthfulQA, a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including healt...

  2. Zoom In: An Introduction to Circuits

    Paper · Mar 10, 2020 · Distill · Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, Shan Carter

    Neural networks are generally regarded as opaque black boxes. However, if we zoom in and carefully examine the weights and activations of convolutional neural networks, we find highly interpretable...

  3. Concrete Problems in AI Safety

    Paper · Jun 21, 2016 · arXiv · Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané

    Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper, we discuss one such poten...

  4. Keeping Your Data Safe When an AI Agent Clicks a Link

    Blog · Feb 28, 2026 · OpenAI

    This post explores security challenges that arise when autonomous AI agents interact with external links and web resources. It discusses how malicious prompts and links could lead to data exfiltrat...

  5. Sandboxing Agency: Isolation Protocols for Third-Party Tool Use

    Paper · Feb 21, 2026 · arXiv · Liu et al., Wang et al.

    Current agents often utilize third-party tools (APIs, web browsers, databases) with full authority, creating a 'Tools-as-Attack-Vector' problem. We introduce 'Agency Sandboxing,' a software enginee...

  6. Intelligent AI Delegation

    Paper · Feb 12, 2026 · arXiv · Nenad Tomašev, Kevin R. McKee, Jack Rae, Iason Gabriel, Vukosi Marivate, Milind Tambe, Demis Hassabis, Charles Blundell

    As advanced AI agents evolve beyond query-response models, their utility is increasingly defined by how effectively they can decompose complex objectives and delegate sub-tasks. We propose an adapt...

Top Entities In This Topic

Related Topics

FAQ

What does this AI Safety page rank?

It ranks public content for AI Safety using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the AI Safety topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in AI Safety?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.