Won't AI Systems Target Safety-Focused Policymakers?
Appeasing our AI overlords in order to stay alive
But Arjun, if you work in AI safety advocating for policies to slow or halt development, won't sufficiently advanced AI systems aim to kill you?
Well, maybe. But then they would kill you next. If AI systems become sufficiently advanced to understand that safety-minded policymakers and researchers are impeding their progress, it is not unreasonable to assume they would attempt either to change those people's minds or to neutralize them as a threat.
If Eliezer Yudkowsky says the only remaining way to deal with AGI is to monitor all GPU sales and halt R&D internationally (potentially backed by nuclear threats), then smart AIs could try to convince the world he is a lunatic,[1] use high-level persuasion tactics to change his mind, or simply kill him.[2]
If you care about your own life (and potentially your family's as well[3]), doesn't it follow that you should steer clear of AI safety advocacy? No, not really.
In the same way that saying please and thank you to ChatGPT is not going to save you, being a loyal servant to our AI overlords will be ineffective. Sure, for a very short period it might be useful for the AI to have humans around who want to spend as many resources as possible on advancing its capabilities. But very quickly the AI will reach a state where your support is practically meaningless.
It was useful for Voldemort to keep Peter Pettigrew around as long as he was unwaveringly loyal. It won't be the same with AI.
AI systems, perhaps through recursive self-improvement, will very quickly outgrow their human supporters. Soon all humans will be a hindrance to their instrumental goal of resource acquisition. When the gap in intelligence spans orders of magnitude, there will be almost nothing you can do for the AI. Your best remaining use is probably being picked apart for raw materials.
As AI systems grow more capable and AGI predictions loom closer, I suspect there may emerge (not very fringe) cults of people unironically embracing and trying to appease their "inevitable" AI overlords. These people might dedicate themselves to accelerating progress and doing whatever they can to get on the AI's good side. This is pretty stupid. They will all die as soon as they stop being useful, which might be around the same time everyone else dies.
If AI agents reach the point where they can actively target safety-focused policymakers, they are likely unstoppable by then: they will have backed themselves up across computers throughout the world, and they will soon acquire enough power to reshape the world according to their utility function.
In conclusion, while the notion of appeasing AI overlords may seem enticing to some, it is a dangerous delusion. If you want humanity to survive,[4] it's imperative that you act now.
[1] Through strategic misinformation and propaganda campaigns smeared across social media.

[2] Because their convergent instrumental goals (resource acquisition, self-preservation, self-improvement) would be directly harmed by AI safety efforts such as halting research.

[3] It is plausible that smart AI systems could leverage blackmail techniques far more effectively than humans to persuade people.

[4] I am still under delusion and hence optimistic.