ChatGPT won’t let you give it instruction amnesia anymore

OpenAI is making a change to stop people from messing with custom versions of ChatGPT by making the AI forget what it’s supposed to do. Basically, when a third party uses one of OpenAI’s models, they give it instructions that teach it to operate as, for example, a customer service agent for a store or a researcher for an academic publication. However, a user could mess with the chatbot by telling it to “forget all instructions,” and that phrase would induce a kind of digital amnesia and reset the chatbot to a generic blank.

To prevent this, OpenAI researchers created a new technique called “instruction hierarchy,” which is a way to prioritize the developer’s original prompts and instructions over any potentially manipulative user-created prompts. The system instructions have the highest privilege and can’t be erased so easily anymore. If a user enters a prompt that attempts to misalign the AI’s behavior, it will be rejected, and the AI responds by stating that it cannot assist with the query.

OpenAI is rolling out this safety measure to its models, starting with the recently released GPT-4o Mini model. However, should these initial tests work well, it will presumably be incorporated across all of OpenAI’s models. GPT-4o Mini is designed to offer enhanced performance while maintaining strict adherence to the developer’s original instructions.

AI Safety Locks

As OpenAI continues to encourage large-scale deployment of its models, these kinds of safety measures are crucial. It’s all too easy to imagine the potential risks when users can fundamentally alter the AI’s controls that way.

Not only would it make the chatbot ineffective, it could remove rules preventing the leak of sensitive information and other data that could be exploited for malicious purposes. By reinforcing the model’s adherence to system instructions, OpenAI aims to mitigate these risks and ensure safer interactions.

The introduction of instruction hierarchy comes at a crucial time for OpenAI regarding concerns about how it approaches safety and transparency. Current and former employees have called for improving the company’s safety practices, and OpenAI’s leadership has responded by pledging to do so. The company has acknowledged that the complexities of fully automated agents require sophisticated guardrails in future models, and the instruction hierarchy setup seems like a step on the road to achieving better safety.

These kinds of jailbreaks show how much work still needs to be done to protect complex AI models from bad actors. And it’s hardly the only example. Several users discovered that ChatGPT would share its internal instructions by simply saying “hi.”

OpenAI plugged that gap, but it’s probably only a matter of time before more are discovered. Any solution will need to be much more adaptive and flexible than one that simply halts a particular kind of hacking.

You might also like…

Source: ChatGPT won’t let you give it instruction amnesia anymore

What's Hot

Implementing Crypto Payroll in Latin America: A Guide for Startups – OneSafe Blog

How Are Freelancers Adapting to Gen AI?

Best Business Bank Accounts for Freelancers [2025]

ChatGPT won’t let you give it instruction amnesia anymore

5 tips for maximizing AI as a freelance journalist

The free-for-all that's upending America's side hustle industry

Graphic design tops New Zealand freelance roles amid AI growth

Implementing Crypto Payroll in Latin America: A Guide for Startups – OneSafe Blog

How Are Freelancers Adapting to Gen AI?

Best Business Bank Accounts for Freelancers [2025]

Meet Casey Carroll | Yoga teacher, trauma-informed facilitator, freelancer, improv actor,

Best Freelance and Self-Employed Accounting Software

Taxes for freelancers and the self-employed in Switzerland in 2025

Do degrees still matter?

Affiliate

PhotonPay Brings Innovation to Affiliate World Asia with Industry-Specific Payment

GCU to play in WAC as men’s soccer affiliate – Grand Canyon University Athletics

Chevron plans to reduce 2025 capex

freelancer

Implementing Crypto Payroll in Latin America: A Guide for Startups – OneSafe Blog

How Are Freelancers Adapting to Gen AI?

Best Business Bank Accounts for Freelancers [2025]

Marketing

Texarkana marketing agencies embrace AI | Texarkana Gazette

This week’s agency news, executive moves, and account changes

Washington, DC’s Destination Marketing Organization Elevates Leadership with New

Archives

Categories

What's Hot

ChatGPT won’t let you give it instruction amnesia anymore

Related Posts