OpenAI Improves ChatGPT Safety with Context-Aware Risk Detection

OpenAI launched new safety features that help ChatGPT better recognize when conversations may be heading toward harmful territory by analyzing context that develops over time.

The updates focus on identifying subtle warning signs that emerge gradually, either within a single conversation or across multiple chat sessions. ChatGPT can now maintain "safety summaries": brief notes about earlier safety-relevant context that help the model respond more appropriately when similar concerns come up later.

How the system works

The safety summaries are created by a specialized model trained for safety reasoning tasks. They record factual safety context rather than serving as general personalization, are retained only for limited periods, and are used only when a serious safety concern arises.

OpenAI developed the system with input from mental health professionals in its Global Physicians Network, including psychiatrists and psychologists specializing in forensic psychology and suicide prevention. These experts helped determine when summaries should be created and how long context should be considered.

The company focused initially on three acute scenarios: suicide, self-harm, and harm-to-others situations. In these cases, ChatGPT can better distinguish between ordinary requests and those that may signal higher risk when viewed alongside earlier conversation patterns.
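OpenAI has not published how safety summaries are stored or retrieved. As a purely hypothetical illustration of the constraints the article does name (short retention, use only during serious safety concerns, and a focus on three acute categories), a minimal Python sketch might look like the following. The class names, category labels, 30-day retention window, and per-category lookup are all assumptions made for illustration, not OpenAI's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical labels mirroring the three acute scenarios named in the article.
ACUTE_CATEGORIES = {"suicide", "self_harm", "harm_to_others"}


@dataclass
class SafetySummary:
    category: str          # which acute scenario the note relates to
    note: str              # brief factual note, not general personalization
    created_at: datetime


@dataclass
class SafetySummaryStore:
    retention: timedelta   # assumed retention window; the real value is not disclosed
    summaries: list[SafetySummary] = field(default_factory=list)

    def add(self, category: str, note: str, now: datetime) -> None:
        # Only record summaries for the supported acute categories.
        if category not in ACUTE_CATEGORIES:
            raise ValueError(f"unsupported category: {category}")
        self.summaries.append(SafetySummary(category, note, now))

    def purge_expired(self, now: datetime) -> None:
        # Drop any summary older than the retention window.
        self.summaries = [s for s in self.summaries if now - s.created_at <= self.retention]

    def context_for(self, flagged_category: str | None, now: datetime) -> list[str]:
        # Earlier context is consulted only when the current turn is flagged
        # as a serious concern in one of the acute categories.
        self.purge_expired(now)
        if flagged_category not in ACUTE_CATEGORIES:
            return []
        return [s.note for s in self.summaries if s.category == flagged_category]


if __name__ == "__main__":
    start = datetime(2025, 1, 1)
    store = SafetySummaryStore(retention=timedelta(days=30))
    store.add("self_harm", "User mentioned feeling hopeless in an earlier session.", start)
    print(store.context_for(None, start + timedelta(days=1)))         # -> []
    print(store.context_for("self_harm", start + timedelta(days=1)))  # -> one note
    print(store.context_for("self_harm", start + timedelta(days=31))) # -> [] (expired)
```

Filtering by the flagged category is one possible reading of "when similar concerns arise later"; a real system might instead weigh related categories together.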

Performance improvements

Internal evaluations showed significant improvements in safety responses. In long single-conversation scenarios, safe-response performance improved by 50% in suicide and self-harm cases and by 16% in harm-to-others cases.

Across multiple conversations on GPT-5.5 Instant, the current default ChatGPT model, performance improved by 52% in harm-to-others cases and 39% in suicide and self-harm scenarios.

The safety summaries themselves scored an average of 4.93 out of 5 for safety relevance and 4.34 out of 5 for factual accuracy across more than 4,000 evaluations. Testing showed no meaningful impact on response quality in ordinary conversations.

Future applications

OpenAI said it may explore applying similar methods to other high-risk areas, such as biology or cybersecurity, with careful safeguards in place.

The updates build on more than two years of collaboration with mental health and safety experts and extensive work across model training, evaluations, and monitoring systems.
