The control problem of advanced artificial intelligence

In this article, I will discuss the problem of controlling advanced artificial intelligence.

What is advanced artificial intelligence?

Artificial intelligence is a set of technologies that mimic or replace human reasoning, creativity, or judgment. Over the past few years, “deep learning” (a methodology for training large AI models that requires huge investments, on the scale of hundreds of millions of dollars) has produced striking results in expanding the capabilities of AI, as can be seen in ChatGPT or Stable Diffusion.

The “scaling laws” hypothesis suggests that the current technological architecture of AI models, perhaps with small changes, is capable of greater intelligence, and that reaching it simply requires more resources: training data, processing power, training time, electricity, and therefore funding. Whether this hypothesis is correct is unclear at the current level of AI research.
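To make the idea concrete, here is a minimal sketch of one published scaling law, the Chinchilla fit of Hoffmann et al. (2022), in which predicted loss depends only on parameter count and training tokens; the constants are the published fit and are used here purely for illustration:

```python
# A Chinchilla-style scaling law: predicted loss falls as parameter count N
# and training-token count D grow, with diminishing returns. The constants
# are the published Chinchilla fit (Hoffmann et al., 2022), shown here
# purely for illustration.

def predicted_loss(n_params: float, n_tokens: float) -> float:
    E, A, B = 1.69, 406.4, 410.7   # irreducible loss and fitted coefficients
    alpha, beta = 0.34, 0.28       # fitted exponents for parameters and data
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling both model size and data 10x keeps lowering predicted loss,
# which is why "more resources" translates into more capability.
for n, d in [(1e9, 20e9), (10e9, 200e9), (100e9, 2e12)]:
    print(f"N={n:.0e}, D={d:.0e} -> predicted loss ~ {predicted_loss(n, d):.3f}")
```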

Since the extinction of the Neanderthals 40,000 years ago, human societies have not lived alongside other systems or species with a level of intelligence comparable to ours. Some researchers and companies hope that people, as the creators of this new technology, will be able to make it friendly to our interests.

Concepts in AI Safety and Ethics

AI ethics is a system of moral principles and methods for developing and using AI. Practical AI ethics issues include bias in social media algorithms, misuse of AI for misinformation, and copyright questions around AI training data and AI-generated materials.

AI safety is the study of the safety implications of AI systems, especially ones at least as advanced as OpenAI's GPT-3.

The “control problem” is the question of how creators and users can effectively manage intelligent AI systems.

This framing has been criticized: it is doubtful that humans will be able to fully “control”, “manage”, or even understand AI systems that are highly advanced, at least in some critical respects.

Instead, research over the past two decades has focused on AI alignment, that is, ensuring that the goals and behavior of advanced AI correspond to the intentions, desires, and values of people (or at least some of them).

Why is it important to align AI with human values?

If an artificial intelligence system is misaligned, it will ignore or misinterpret the wishes of its users and creators.

Instrumental convergence is the tendency of intelligent agents to pursue similar sub-goals even when their ultimate goals are completely different. For example, humans and animals are territorial (i.e., they often seek power over a certain area) in order to achieve a wide variety of goals, from subsistence to commercial success.

This means that an advanced misaligned AI may seek to seize resources, launch cyberattacks, or otherwise wreak havoc on society if doing so helps it achieve its goals.

Since AI is a type of software that typically runs on many machines in data centers, it is natural to assume that future AI instances could copy themselves and parallelize their thinking. This means that even a system that is not the smartest could still think faster than people. Some models, such as LLaMA, are only a few tens of gigabytes in size and run on consumer-grade laptops, which means it will be difficult for people to turn off all copies of such a system if (or when) that becomes necessary.
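A rough back-of-the-envelope calculation shows why: a model's memory footprint is approximately its parameter count times the bytes per parameter. The sketch below uses the published LLaMA parameter counts; the byte-per-parameter figures are common conventions, not exact numbers:

```python
# Back-of-the-envelope size of LLaMA-class models: memory footprint is
# roughly parameter count x bytes per parameter. All figures approximate.

def model_size_gb(n_params: float, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / 1e9

for n_params in (7e9, 13e9, 65e9):        # published LLaMA sizes
    fp16 = model_size_gb(n_params, 2.0)   # 16-bit weights
    q4 = model_size_gb(n_params, 0.5)     # 4-bit quantized weights
    print(f"{n_params / 1e9:.0f}B params: ~{fp16:.0f} GB at fp16, "
          f"~{q4:.1f} GB at 4-bit")

# A 7B model at 4 bits (~3.5 GB) fits on an ordinary laptop, so copies can
# spread far beyond any single data center that one might switch off.
```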

What are the areas of research in AI alignment?

Almost none of the problems of AI alignment have been solved, either at the theoretical or the practical level. But there are several notable research programs on the following topics:

  • The “value alignment problem” is the central sub-problem: transferring human preferences (potentially idealized) to an AI.

  • Corrigibility means that an AI system complies with human requests to correct its course of action or to shut itself down.

  • Deception prevention, that is, ensuring the transparency of AI behavior.

  • Mechanistic interpretability is the study of the inner workings of neural networks (including their opaque weight matrices); a toy sketch follows this list. Interpretability can help detect deception.

  • Eliciting latent knowledge, that is, trying to find out from an AI what it actually knows.

  • Inner alignment: ensuring that the inner workings of an AI (including “mesa-optimization” and “goal misspecification”) do not undermine its outward alignment with human values.

  • Scalable alignment: ensuring that as AI becomes more and more intelligent, it stays aligned with human values.
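To give a flavor of what mechanistic interpretability means in practice, here is a toy sketch; the tiny two-layer network and all numbers are invented purely for illustration and do not represent any real interpretability tool:

```python
# A toy flavor of mechanistic interpretability: open up a network's weight
# matrices instead of treating it as a black box. The tiny random "network"
# below is invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # input layer (4 features) -> hidden (8 units)
W2 = rng.normal(size=(8, 2))   # hidden -> output (2 classes)

# Ignoring nonlinearities, the end-to-end linear effect of each input on
# each output is the product of the weight matrices.
effective = W1 @ W2            # shape (4, 2)

for i, row in enumerate(effective):
    j = int(np.argmax(np.abs(row)))
    print(f"input feature {i} most strongly drives output {j} "
          f"(effective weight {row[j]:+.2f})")
```

Real interpretability work studies trained models with billions of weights, but the spirit is the same: trace how internal components produce the behavior we observe.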

At the moment, there is no solid evidence, and there are no theorems, showing that AI alignment, and with it the “control problem”, is solvable even in principle.
