Topic: AI Safety


Short answer

This page shows the most relevant public items for AI Safety, ranked by trend activity and review signals. Use the weekly view for fast-moving changes, monthly for more stable patterns, and all-time for evergreen picks.

Weekly · Monthly · All time


  1. TruthfulQA: Measuring How Models Mimic Human Falsehoods

    Paper · Sep 8, 2021 · arXiv · Stephanie Lin, Jacob Hilton, Owain Evans

    We propose TruthfulQA, a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including healt...

  2. Zoom In: An Introduction to Circuits

    Paper · Mar 10, 2020 · Distill · Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, Shan Carter

    Neural networks are generally regarded as opaque black boxes. However, if we zoom in and carefully examine the weights and activations of convolutional neural networks, we find highly interpretable...

  3. Concrete Problems in AI Safety

    Paper · Jun 21, 2016 · arXiv · Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané

    Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper, we discuss one such poten...

  4. Keeping Your Data Safe When an AI Agent Clicks a Link

    Blog · Feb 28, 2026 · OpenAI

    This post explores security challenges that arise when autonomous AI agents interact with external links and web resources. It discusses how malicious prompts and links could lead to data exfiltrat...

  5. Sandboxing Agency: Isolation Protocols for Third-Party Tool Use

    Paper · Feb 21, 2026 · arXiv · Liu et al., Wang et al.

    Current agents often utilize third-party tools (APIs, web browsers, databases) with full authority, creating a 'Tools-as-Attack-Vector' problem. We introduce 'Agency Sandboxing,' a software enginee...

  6. Intelligent AI Delegation

    Paper · Feb 12, 2026 · arXiv · Nenad Tomašev, Kevin R. McKee, Jack Rae, Iason Gabriel, Vukosi Marivate, Milind Tambe, Demis Hassabis, Charles Blundell

    As advanced AI agents evolve beyond query-response models, their utility is increasingly defined by how effectively they can decompose complex objectives and delegate sub-tasks. We propose an adapt...

Top Entities In This Topic

Related Topics

FAQ

What does this AI Safety page rank?

It ranks public content for AI Safety using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the AI Safety topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in AI Safety?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.