Accelerate AI Development: An Open Letter

4/17/23

The advent of highly capable AI systems has led prominent organizations (the Future of Life Institute), governments (the EU’s AI Act), and individuals (Elon Musk) to call for a moratorium on development, or at least a high-friction approach to it, until certain safeguards are in place.

This is a fundamentally flawed concept that increases the risk of bad outcomes. I will argue this point through a series of examples and then suggest that we should focus on “atomic” AI alignment instead: a simple pledge that developers should create algorithms with love for humanity at their root.

Confidence in AI alignment is impossible because human-level judgment is not powerful enough to predict emergent behavior adequately.

This is a reality of the field: these models’ complexity and novel behaviors are not discernible at their conception. It’s frustrating and scary to create something that doesn’t do exactly what we want, but we’ve crossed the Rubicon, and it’s naïve to believe we can turn back time. Despite what people believe, humans and organizations will not gain confidence on this topic that corresponds to the true probabilities. Checklists that claim to lay out the steps toward such confidence are unrealistic. For example, having a slew of third-party experts certify the safety of models of a certain size or type is highly unlikely to achieve the desired outcome in practice.

Human alignment on AI development is an insurmountable problem because individuals, corporations, and governments have different values they wish to optimize.

In the EU, the interest may be in protecting users’ data and restricting AI usage to protect jobs. In China, it may be in protecting social cohesion. In the U.S., it may be in fighting about everything related to values.

Consider a more black-and-white issue that humanity collectively addressed recently: COVID-19. There was little alignment in approach, response, or reflection on a very knowable challenge (a pandemic).

We should have no confidence that a collective response approach will positively impact the trajectory of AI development. In fact, it is likely to be counterproductive.

Signatories to “virtuous” compacts (like Future of Life’s “Pause Giant AI Experiments: An Open Letter”) are the ones most likely to seed positive alignment in models.

Having them stop development is a negative for the space. Indeed, their willingness to participate in such a moratorium suggests they are among the most human-friendly and conscientious teams. Pressuring them into alliances committed to non-development only leaves space for less honorable organizations to push development forward along suboptimal paths.

Non-signatories will continue to develop software regardless of virtuous compacts.

While we are not yet aware of any large-scale use of AI for illegal or harmful activity, you can be sure it is happening right now, and we will learn about the results in the coming months and years. By default, people and groups that wish to cause harm operate without regard to laws or collective norms. These actors are non-signatories to virtuous compacts. The “X” factor in AI development is how harmful these models could unintentionally become. Groups will undoubtedly continue developing AI for nefarious purposes, so why is it logical to stop the development of AI that could help counteract these agents?

There is an increased risk that nation-states will come to believe advanced AI lets them escape the stability-instability paradox and rewrite rational deterrence theory (not just for nuclear conflict).

If you are a nation-state with strong convictions about the values that should shape AI alignment, and you want to be in the mix, it is of the utmost importance to keep working on these systems. Some nations will likely come to view AGI, or even moderately smart AI agents, as crucial to a military edge. Such developments, in turn, will be most easily produced by highly coordinated, top-down states that lack the beneficial inefficiencies common in democracies.

We shouldn’t stress about existential risk from an AGI.

If we are on the path toward AGI, the time a superintelligence spends at existential risk from collective human intelligence is exceptionally short. The evidence seems firmly tilted toward a moderate or fast takeoff. It’s foolish to view the relationship between humanity and an AGI through the lens of human geopolitical conflict. There is no way to stop the pace of development in the space; the only question is how fast the intelligence curve is rising. Besides, if you are into far-out theories, don’t the simulation hypothesis and your own experience reading this post give you confidence that humanity collectively had an OK outcome?

Therefore, a more realistic goal is "atomic" AI alignment. We should just shoot for one guiding principle: love for humanity.

Companies and groups should badge themselves as aligned with this atomic principle. Asking for more is unrealistic. What does this mean for the development of these algorithms? Isn’t this too nebulous? How will we track whether companies are imbuing their algorithms with love for humanity?

All unanswerable, but it’s a simple enough ask that it could have a real impact.
