This paper, like almost all discourse on the public choice of AI safety, presupposes the basic framing that "investing in AI safety" is a thing we can do as a society: money goes in, safety comes out. The very term "AI safety" brings to mind boring engineering or public health projects. Sure, perhaps it takes a while, or the amount of safety per dollar invested is uncertain, but surely if we invest enough money we'll have safe AI, right? Now, if you spend enough time actually talking to AI safety researchers and reading their work, you'll realize that this is categorically false. Not only has all the investment in AI safety to date produced zero AI safety by any reasonable metric; there is not even a good understanding of what AI safety is or how we would measure it, let alone what would count as a proper AI safety program. To call this field "pre-paradigmatic" would be an understatement. Investing any amount in this field, let alone 16% of GDP, without a clear thesis for how it translates into actual safety outcomes would be tragically irresponsible and premature, and would soil the reputation of the term "AI safety" for generations to come.
As an extreme example, Yudkowsky and others claim that current AI technologies are intrinsically unsafe (the instrumental convergence argument) and already on the verge of TAI (transformative AI), and that "AI safety" would therefore require completely dismantling existing AI systems and replacing them with... nothing, until we figure out what "safe TAI" might look like. So unless someone counts funding people who agitate for nuking datacenters as "AI safety research", by this one definition the ROI of *any* AI safety research would be zero.
It would be too cynical of me to say that all this discourse about "how much we should invest in AI safety research" is self-serving justification from AI safety researchers trying to keep themselves employed, especially now that EA funding has largely dried up. But it sure looks like it.
So, what's the alternative? Well, it's to follow the lead of every other engineering discipline and build safety into the systems themselves. But this has to be "passive" or "intrinsic" safety. I agree with a softer version of Yudkowsky's position -- LLM-based AI is generally not something one can make intrinsically safe, and anyone who tries will end up in a dramatically worse version of the unintended-consequences phenomenon seen in nuclear power plants, where ever-increasing requirements for "safety features" made the systems complex, brittle and -- tada -- unsafe.
Luckily, there is a lot of excellent work happening in neurosymbolic AI and related fields (here's the self-serving part), where we build systems that use LLMs only in "non-critical" areas and otherwise stick to verifiable, repeatable and transparent engineered behavior. (This has multiple other advantages beyond safety, BTW.) But this is unsexy and -- critically -- very cheap, certainly not "measurable percentage of GDP" worthy...
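To make the division of labor concrete, here is a minimal sketch in Python -- my own illustration, not a description of any particular system -- of the pattern: the LLM is confined to proposing a structured action from messy input, and a deterministic, auditable rule layer decides whether that proposal ever gets executed. The `llm_propose_refund` stub and the policy thresholds are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefundProposal:
    order_id: str
    amount_cents: int
    reason: str


def llm_propose_refund(complaint: str, order_id: str) -> RefundProposal:
    """Hypothetical stand-in for an LLM call that parses a free-text complaint
    into a structured proposal. Its output is treated as untrusted input."""
    return RefundProposal(order_id=order_id, amount_cents=1500, reason=complaint[:80])


# The critical path: plain, verifiable, testable code. No model in the loop.
MAX_AUTO_REFUND_CENTS = 2000  # illustrative policy threshold


def validate(proposal: RefundProposal, order_total_cents: int) -> bool:
    """Hard-coded policy check; reviewable and unit-testable like any other code."""
    return (
        0 < proposal.amount_cents <= order_total_cents
        and proposal.amount_cents <= MAX_AUTO_REFUND_CENTS
    )


def execute_refund(proposal: RefundProposal) -> None:
    """Deterministic, logged business logic would live here."""
    print(f"refunding {proposal.amount_cents} cents on order {proposal.order_id}")


def handle_complaint(complaint: str, order_id: str, order_total_cents: int) -> None:
    proposal = llm_propose_refund(complaint, order_id)  # non-critical: suggestion only
    if validate(proposal, order_total_cents):           # critical: engineered policy
        execute_refund(proposal)
    else:
        print(f"escalating order {order_id} to a human reviewer")


if __name__ == "__main__":
    handle_complaint("Item arrived broken, please refund.", "A-1001", order_total_cents=1800)
```

The point of the pattern is that the behavior that actually matters for safety (how much money can leave the system, and under what conditions) is specified in ordinary, inspectable code; the LLM can fail arbitrarily badly and the worst outcome is an escalation to a human.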