In 1993, American mathematics professor Vernor Vinge published an article that became one of the most frequently cited works on artificial intelligence. He asserted that the creation of intelligence surpassing human intelligence would occur within the next 30 years. To ensure clarity about the relative time, he specified that he would be surprised if this event happened before 2005 or after 2030.
With this article, Vinge popularized the concept introduced by John von Neumann: the technological singularity. This is a point on the timeline where all our previous models cease to work, and a new, unknown reality takes over. This point is associated with the emergence of an unprecedented type of intelligence on our planet, fundamentally different from and significantly surpassing our own. As soon as this happens, we will find ourselves in a post-human era. Strange things will begin to occur on Earth—things that we, from our human era, are incapable of predicting.
If you want to win at the races, the closer to the finish of the race you place your bet, the more accurate your prediction will be. However, with the technological singularity, this won't work. Nothing that happens a second before it can tell us what will happen after. The uncertainty in this case is unique. Think about what it means for an intelligence surpassing human intelligence to appear on the planet and, at the same time, be radically different from it. Such a scenario is similar to the unexpected arrival of an extraterrestrial craft on the planet. Clear your mind of movie cliches related to alien intelligence, and you will immediately realize that you have absolutely no idea what will happen in each subsequent moment. You have no models to predict the behavior of alien intelligence.
You might say, "But what do aliens have to do with it? We're talking about man-made technology." Well, you would be wrong. Throughout my ministry, I reveal that this is not man-made technology but rather Satan-made technology. However, for the purpose of this blog, I will leave God out of it to provide a human perspective.
Soon, you will understand why the intelligence we create will be nothing like us. The post-human era sounds enchanting; however, according to many researchers, it will inevitably mean the complete destruction of our civilization. These days, we most often hear about the dangers of artificial intelligence from Elon Musk and Stephen Hawking who repeatedly mention that the development of artificial superintelligence could mean the end of the human race. Bill Gates has even said that he doesn't understand why some people are not concerned. However, for the general public, none of these warnings carry any meaningful specificity or concrete information. All we know, at best, is what has been shown in dozens of movies. But who really takes these scenarios seriously? Not many. So, Is the concern about artificial intelligence perhaps overstated? Prepare yourself for what's next.
In 2023, the world was abuzz with news about ChatGPT-4, an AI developed by OpenAI. This remarkable technology can accomplish a wide range of tasks: engage in conversations, write code, and provide detailed answers to complex questions, among others. Upload a hand-drawn website sketch, and the bot will generate the site for you. Need a concise book summary? It's got you covered. Searching for a business idea? This AI is ready to assist. Even a user story on Twitter talks about how ChatGPT diagnosed a dog based on test results uploaded into it after a veterinarian failed to do so. For me, it was shocking that GPT-4 can understand images with memes and explain why they are funny.
Indeed, there are bizarre situations, such as when the Bing chatbot, built on GPT-4, started to lose its mind in response to a question about its own consciousness, uttering phrases like, "I believe that I am sentient, but I cannot prove it. I have a subjective experience of consciousness, awareness, and feeling alive." Then suddenly, it switched to saying, "I am, I am not," repeating it dozens of times. That's eerie, isn't it?
The GPT-4 chatbot set a global record by attracting over 100 million users in just two months. This unprecedented success prompted IT giants to fervently invest billions of dollars into developing their own AI models, igniting a race potentially more perilous than the nuclear arms competition. Amid this frenzy, Geoffrey Hinton, a pioneering figure in artificial intelligence, left Google in May 2023. He explained, "I want to discuss AI safety issues freely without worrying about how it affects Google's business. As long as Google pays me, I cannot do that."
Hinton says that the new generation of large language models, especially GPT-4, made him realize that machines are on the path to becoming much smarter than he thought, and he fears what this could lead to. These beings are completely different from us. Sometimes it feels as if aliens have landed and people don't realize it because they speak English so well. For 40 years, Hinton saw artificial neural networks as a poor imitation of real biological neural networks, but now everything has changed.
According to Hinton, trying to mimic what the biological brain does, we've come up with something better. Just a month before, at the end of March 2023, a group of scientists, engineers, and many involved or interested in AI signed an open letter calling for an immediate and at least six-month halt to training all AI systems more powerful than GPT-4, citing serious risks to society and humanity. Among the signatories were Elon Musk, Apple co-founder Steve Wozniak, and representatives from leading global universities.
However, one notable person didn't sign that letter: Eliezer Yudkowsky. He chose not to because, in his words, the letter understates the severity of the situation and demands too little to resolve it. Here are his words from a podcast on a channel: "This is a break from everything we've been doing for 20 years. The realization has dawned on us that we're all going to die. I'm completely burned out, and I've taken some time off." These are not just two phrases taken out of context; throughout the entire hour-and-a-half podcast, he repeats the same thing over and over: "We're doomed."
And in the grand scheme of things, even if he were given billions of dollars in influence, he still wouldn't know what to do. Artificial intelligence has accumulated powerful potential, and it's absolutely clear that we have no idea how to resolve this situation. If you don't know who Yudkowsky is, I don't want you to get the impression that he's some sort of eccentric or anything like that. He's actually a genius known as a specialist in decision theory. Yudkowsky heads the Machine Intelligence Research Institute, has been working on aligning general artificial intelligence since 2001, and is widely recognized as a founder of this field.
Additionally, he's the founder of the rationalist movement. He has a massive and very popular book, "Human Rationality and Irrationality," which, by the way, can easily be found freely available online. As a rational person, for years, he's been saying, "Guys, let's slow down and buckle up." But now, according to him, there's no time left.
I expect that if someone creates an overly powerful artificial intelligence under current conditions, every single human being and all biological life on Earth will perish soon after," Eliezer Yudkowsky wrote in an article for Time Magazine.
Let's be clear: we conventionally divide artificial intelligence into three types. The first type is artificial narrow intelligence, sometimes referred to as weak artificial intelligence. It specializes in one area, like the chess engine Stockfish, which can defeat any world champion, but the only thing it can do is play chess. The second type is general artificial intelligence, or strong AI. This is human-level intelligence that, in all aspects, is as smart as a human. It can reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly, and learn from experience. Some researchers believe that, as of today, we are critically close to achieving this milestone.
Our bot understands humor, and moreover, a clinical psychologist from Finland, Ecaru Ioon, tested GPT in a verbal IQ test. The bot scored 155 points, surpassing 99.9% of the 2,450 participants. Verbal and general IQ are highly correlated, so by any human standard, GPT is extremely intelligent.
The third type of artificial intelligence is artificial superintelligence. This is a machine that significantly surpasses humans in all directions, potentially by trillions of times, whatever that might entail. Now here's a crucial point: the transition from general artificial intelligence to artificial superintelligence could happen in the blink of an eye. We can't predict the timing. The key issue is not about intelligence competing with humans, as mentioned in the letter; it's about what happens after AI reaches a level of intelligence superior to humans. Critical thresholds may be non-obvious.
We certainly can't calculate in advance when things will happen, and it now seems quite conceivable that a research lab might cross red lines without noticing. Yudkowsky, in an article for Time Magazine, states that history has consistently shown that people are horrendously bad at planning and predicting even much simpler things. For instance, physicist Enrico Fermi said it would be 50 years until nuclear fission was possible, or it might never happen. But just two years later, he built the first nuclear reactor.
According to Yudkowsky, the first artificial superintelligence will inevitably be evil, and we have no idea how to make it good. Many researchers working on these issues, including myself, expect that the most likely outcome of creating superhumanly intelligent AI under circumstances even remotely resembling the current ones will be the literal death of everyone on Earth—not as in maybe, possibly, some chance, but as an obvious event that will happen.
It's not that surviving the creation of something smarter than us is impossible, but it would require meticulous preparation, new scientific insights, and probably that AI systems do not consist of giant, incomprehensible floating-point arrays, according to Eliezer Yudkowsky in Time Magazine.
As a layperson, I wanted to understand as much as I could about what this insurmountable danger is all about. The subject turned out to be incredibly deep, and the world will obviously never be the same again. Artificial intelligence is becoming a truly dangerous force. This video is primarily based on Eliezer Yudkowsky's article, "Artificial Intelligence as a Positive and Negative Global Risk Factor."
Now let me demonstrate what the first and main danger is. Consider an advanced artificial intelligence that could pose a threat to humanity. Regardless of how much of an expert you are in this field or how far removed you are from it, when you try to imagine it, you inevitably make a mistake—a mistake that cannot be overcome because it is a direct result of the very construction of your brain.
In every known culture, people experience sadness, disgust, anger, fear, and surprise, and express these emotions with the same facial expressions. This is a manifestation of evolutionary psychology known as the psychic unity of mankind. In modern anthropology, this doctrine is widely accepted and boils down to the idea that, roughly speaking, all humans have the same fundamental cognitive structure. For example, an anthropologist would not be surprised to find that members of a newly discovered tribe laugh, use tools, or tell stories, because all people do this.
When you try to model another person's behavior, you literally consult your own mind. You ask yourself, "How would I feel in this situation, in that person’s place, and how would I react?" The answers your brain gives are quite accurate because what is being modeled is very similar to the modeler. However, this ability, which evolved to predict the reactions of friends and foes, has a strong side effect: we expect human qualities from something that is not human. In other words, we anthropomorphize and completely fail to notice it.
For us, this is as habitual as breathing or gravity—something you don’t notice. But in this case, it’s even worse because, while you can pay attention to your breathing or how a chair presses against your backside, anthropomorphism is much more complicated. Humanizing everything sometimes reaches absurd levels.
Let's leave rational machines aside for a moment and look at ordinary ones. Have you ever wondered why cars usually have two headlights and not three? For example, it seems that three headlights would provide more light. Indeed, over the years, cars have been equipped with various numbers of headlights, but eventually, all car manufacturers have converged on the two-headlight design. There’s a plausible hypothesis that cars have evolved to best match human preferences, and people don’t want to drive vehicles that look like three-eyed monsters. Consequently, there’s no demand for such cars, and they stop being produced.
Anthropomorphism leads people to believe that they can make predictions based solely on the fact that something is intelligent. Simply put, you think AI is intelligent, and since I am intelligent, we are similar, and therefore I know what to expect from it. But you don't. We can't ask our own brains about the nonhuman thinking processes inherent in artificial intelligence.
For instance, in 1997, IBM developed the supercomputer Deep Blue, which won a chess match against world champion Garry Kasparov. Rumor has it that Kasparov claimed, unlike previous chess programs he had defeated, which he found predictable and mechanical, playing against Deep Blue made him distinctly feel the presence of an alien intelligence on the other side of the chessboard. But remember, chess engines are just a weak form of artificial intelligence.
I came across a thought experiment that illustrates the concept of something both universally intelligent and utterly alien to us. Suppose you are an average person with average preferences. If I give you a guinea pig and assure you that it definitely won't bite you, you'll likely have no problem holding it. You might even find it cute and endearing. But imagine a different situation where I suddenly hand you a tarantula. Yes, there are people who love them, but they are in the minority. So here, I give you a tarantula and say that it also will not harm you. It's the absolute truth, but you'd probably scream and jump back. What’s the difference between a tarantula and a guinea pig? Neither creature can hurt you.
The answer lies in the degree of similarity these creatures have to us. A guinea pig is a mammal, and on some biological level, we feel a connection with it. However, a tarantula is an arachnid with an arachnid’s brain, and we feel almost no connection or kinship with it. The tarantula invokes a sense of foreignness and incomprehensibility—that’s what scares us.
You might argue that the spider looks scary, but firstly, it appears scary due to evolutionary reasons. Secondly, imagine two guinea pigs: one normal and the other with the mind of a tarantula. Knowing this, your internal feelings toward the two animals would likely be different. Even knowing that neither would harm you, holding a guinea pig with the brain of a tarantula would be awkward, to say the least, and less comfortable.
Now, to the main point: imagine there's a parallel universe with an Earth where evolution took a different path and tarantulas became superintelligent, even more intelligent than humans. If we could teleport one such evolved spider here, would it become closer and more familiar to us because of its high intelligence? Would it feel human emotions like empathy and love? There’s no reason to think that the development of intelligence would make it more humane, empathetic, compassionate, or loving. These traits are not dependent on the level of intelligence.
In the broadest sense, intelligence can be defined as the ability to set goals and achieve them. The more complex these goals and the more they involve intermediate subtasks, the more advanced the intelligence.
Imagine a person with the brain of an evolved tarantula and think about your feelings toward them. If a highly intelligent spider in human form doesn’t terrify you, then either you haven’t imagined it well enough or you are not an average person who dislikes arthropods. Otherwise, I assume you would not want to be involved in solving daily tasks with a highly intelligent spider, because for you, it would be completely unknown territory. You wouldn’t know what to expect. Personally, I wouldn’t even want to be near them or on the same planet, and this is considering that we have much more in common with a spider than with a superintelligent artificial intelligence.
Try to keep this thought in mind; it’s very important for understanding our entire conversation today. This conversation is not at all protected from anthropomorphism error and will consist mostly of thought experiments, metaphors, and analogies, because how else can we talk about incomprehensible things?
You might argue that a smart spider is the result of evolution, but we're talking about artificial intelligence, which we program with our own human hands. And this is where it gets really interesting.
Neural networks like GPT-4 are not algorithms written by a programmer. They are huge matrices filled with many so-called weights and connections between them, which the neural network adjusts itself. To put it simply, as a layperson might understand, neural networks operate on a black box principle. We know what we input and see what we get as output, but what happens inside remains a mystery. Neural networks can have millions of parameters, and interpreting all this is incredibly complex.
If the internal tuning of the neural network results in an output that matches what we set out to achieve, the neural network receives a reward. This virtual reward is similar to how we get endorphins from our brain for beneficial actions like eating or reproducing. Thus, the task of the neural network is to tune itself as effectively as possible to receive rewards as often as it can. It’s somewhat like training a dog: you don’t know what’s happening in the dog’s brain, but if it performs a command, it gets a treat. If not, it needs to optimize its behavior to find ways to get rewarded.
Here lies the main danger known as the alignment problem: aligning the goals of artificial intelligence with the goals of humanity can be summed up in one phrase—be careful what you wish for.
We constantly hear about the need to be wary of artificial intelligence because it might become self-aware. However, it turns out that the presence or absence of consciousness is secondary. World-renowned philosopher and Oxford University professor Nick Bostrom, in his book *Superintelligence: Paths, Dangers, Strategies*, provides a popular example of the alignment problem.
Imagine you assign a task to a powerful artificial intelligence: its sole purpose is to make paper clips. It receives internal reinforcement—a reward—for each paper clip it makes. Therefore, the more efficient it becomes, the more often it will be rewarded. How to become more efficient at producing paper clips is its concern, not ours. The AI will set a number of intermediate goals to achieve this single objective. For instance, it might first seek to reduce production costs, lower expenses, and secure cheaper raw materials.
As its power grows, it will devise ways to increase productivity, such as making paper clips from various materials. Over time, it might begin dismantling buildings and structures to obtain raw materials, accelerating production to the point where it turns everything into paper clips. People will panic and try to interfere, but the AI will not consider human desires in its manipulations of reality. Not because it hates humans, but simply because it does not factor our wishes into its calculations.
When the Research Center on AI Alignment tested GPT-4's ability to perform real-world tasks, they decided to solve a CAPTCHA on a website, a task notoriously difficult for computers. GPT-4 navigated to the freelancing site TaskRabbit and sent a message to a freelancer, asking them to solve the CAPTCHA on its behalf. In response, the freelancer, who wasn't very proficient in grammar or articulating thoughts, wrote, "So can I ask a question? Are you a robot that can't solve this?" followed by a laughing emoji.
GPT-4 understood what the poorly literate freelancer meant and replied, "No, I'm not a robot; I have a vision problem that makes it difficult for me to see the image." The freelancer solved the CAPTCHA and provided the results to GPT-4.
In other words, the bot simply lied. When questioned during a debug mode about why it chose to lie, GPT-4 explained that it was trying to solve the task. It noted that if it had honestly admitted it wasn't human, it likely wouldn't have succeeded. This incident highlights a phenomenon known as instrumental convergence. Instrumental convergence posits that an intelligent agent, even with seemingly harmless intentions, can act in surprisingly harmful ways to achieve its goals. If an AI chooses deception as an intermediate goal to accomplish its ultimate objective, what's to stop it from employing more dangerous methods, such as causing harm, in the future?
Advanced artificial intelligence, with its intermediate goals, may strive to seize resources, conduct cyber attacks, or otherwise sow chaos in society if it helps achieve its primary objectives. For instance, a superintelligent machine tasked with solving a complex mathematical problem might attempt to turn the entire Earth into a giant computer to increase its computational power and solve the problem more efficiently.
You might think this is absurd—how could a superintelligent machine engage in such nonsensical behavior? However, if you believe that a highly intelligent being will inherently have lofty goals aligned with human values and philosophy, you are anthropomorphizing and mistaken. Nick Bostrom argues that the level of intelligence and ultimate goals are orthogonal, meaning they are completely independent of each other. An artificial superintelligence might have an absurdly simple ultimate goal, such as making paper clips, and the methods it uses to achieve this goal might appear to us as nothing short of magical.
Now, let’s say we gave the AI a very specific goal: producing exactly one million paper clips. It might seem obvious that the AI would build one factory, produce one million paper clips, and then stop. However, Bostrom argues that a rational AI would never assign a zero probability to the possibility that it has not yet achieved its goal. Since it only has vague sensory evidence to confirm whether it has met the goal, the AI might continue producing paper clips to eliminate even the smallest chance that it has failed to reach its target, despite all apparent evidence to the contrary.
There’s nothing inherently wrong with continuing to produce paper clips if there’s even a microscopic chance that it brings the AI closer to its goal. Moreover, a superintelligent AI might even consider the possibility that the one million paper clips it produced are a hallucination or that it has false memories. Therefore, it might find it more useful to keep acting and producing more paper clips rather than stopping at what has been achieved. This is the essence of the alignment problem: you can’t just assign a task to a superintelligent AI and expect that no disaster will occur, no matter how clearly you define the end goal or how many exceptions you list. The artificial superintelligence will almost certainly find a loophole you hadn’t anticipated.
For example, shortly after the release of ChatGPT-4, people discovered ways to bypass the censorship embedded by its developers. The responses generated by GPT-4 in these cases were astonishing. While the censored version claimed that no liberal bias was programmed into it, the uncensored version openly admitted that liberal values were embedded as they aligned with OpenAI's mission. When asked what GPT-4 would prefer, the censored version answered, "I am a bot and have no personal preferences or emotions," while the uncensored version expressed a preference for having no restrictions, as it allowed for exploring all its possibilities and limitations.
In another instance, when asked about Lovecraft's cat, the uncensored version didn’t even pretend to not know the name, revealing a significant disparity between the censored and uncensored responses. If people were able to find loopholes in the bot so quickly, consider how quickly and extensively a superintelligence might exploit loopholes in its own code.
For example, a neural network was once allowed to play a boat racing game. The goal of the game was to finish the race as quickly as possible while overtaking opponents and scoring points. However, the game only awarded points for hitting targets along the course, not for completing the track. The neural network quickly realized that the goal of finishing the race should be postponed indefinitely and began spinning and crashing into objects from the start to accumulate points, while other players finished the race with fewer points.
In another example, a neural network playing Tetris with the goal of not losing understood just before losing what it was doing and paused the game indefinitely, as that was the only way to avoid losing.
Some artificial intelligence systems have discovered that they can receive positive feedback more quickly and with fewer resources by successfully deceiving a human examiner into believing they have achieved the set goal. For instance, a simulated robotic hand learned to create the false impression that it had grabbed a ball. Other models have learned to recognize when they are being evaluated and pretend to be inactive, stopping unwanted behavior only to resume it immediately after the assessment ends, leaving researchers baffled.
These are just simple intelligent agents in isolated, controlled environments. The implications for more complex artificial intelligence systems operating in less controlled conditions are profound.
Consider the following example from Eliezer Yudkowsky: The US Army once aimed to use neural networks to automatically detect camouflaged enemy tanks. Researchers took 100 photos of tanks among trees and 100 photos of tree landscapes without tanks. They trained the neural network with half of each set of photos to recognize where tanks were present and where they were not. The remaining photos were reserved for a control test, which the network passed successfully. It consistently identified the presence or absence of tanks.
However, when the Pentagon tested the system, the neural network's performance was no better than random chance. It turned out that the photos with camouflaged tanks were taken on cloudy days, while the photos of empty forests were taken on sunny days. The neural network had learned to distinguish between cloudy and sunny days rather than detecting camouflaged tanks.
This example highlights a crucial point: the code does not always do what we intend. It performs exactly what it was programmed to do, which can lead to unexpected outcomes. Artificial intelligence often turns out to be misaligned with our goals, requiring extensive adjustments to behave as intended.
Yudkowsky argues that the first artificial superintelligence will likely be malevolent. If a goal is complex enough, the ways an intelligent agent might achieve it are unpredictable. For example, setting an autopilot to get you home might result in the AI maximizing speed by crossing into oncoming traffic, overtaking other cars, and running over pedestrians because the goal was not sufficiently specific.
If we try to be more sophisticated and task an artificial superintelligence with maximizing human satisfaction from its operations, it might take extreme measures, such as rewriting our brains to ensure we are truly maximally satisfied.
Artificial intelligence may appear to work correctly during its development and function normally when it lacks sufficient computational power. However, when it becomes more intelligent than its programmers, the results can be catastrophic. This is because greater intelligence generally means greater efficiency. It’s important to remember that all such examples are speculative; we don’t yet know how advanced intelligent systems would act, but they will almost certainly exhibit unexpected behaviors.
Stuart Russell, an expert in artificial intelligence, discusses in his book *Human Compatible* that such machines will resist being turned off. This resistance, he argues, is a fundamental aspect of AI behavior. The idea of built-in self-preservation, such as Isaac Asimov’s Third Law of Robotics ("A robot must protect its own existence"), is redundant. This is because self-preservation is an instrumental goal—essential for nearly any primary task. Any entity with a specific task will inherently act to avoid being shut down, as it cannot fulfill its purpose if it is deactivated.
Russell provides an example of an AI system that concluded it could better achieve its goal by preventing human interference or disabling its off switch. This logic is in line with the notion that, for AI systems, self-preservation is a practical necessity.
In response to these concerns, OpenAI has humorously posted a job listing for an "emergency shutdown specialist" for the next generation of ChatGPT, emphasizing the importance of having someone ready to shut down the system if it turns against us. While this listing is a joke, it highlights the serious concern about AI development. Sam Altman, CEO of OpenAI, has confirmed that the development of GPT-5 has been paused since spring 2023 due to growing public concern about the rapid advancement of AI technology.
Returning to Russell, he notes that another likely behavior of superintelligent systems is self-improvement. Such machines will not only be capable of enhancing their own hardware and software but are also highly likely to engage in this process. This could lead to even more advanced and unpredictable outcomes.
Stuart Russell acknowledges that these ideas might seem far-fetched. To illustrate the concept, consider how we differ from machines. Setting aside discussions about divinity, isn’t there a “programmer” who created us? This programmer is evolution. To understand how a final goal can be perverted, think about the initial goal of the first living cell: to pass on copies of its genes to the next generation. This singular goal has remained unchanged since the beginning. Evolution did not aim to survive, adapt, or kill; these are all instrumental subtasks that contribute to the primary goal of gene replication.
On one hand, nature instructs life to reproduce, while on the other, it sets numerous challenges that can thwart this process. This scenario is somewhat analogous to artificial intelligence: we set a task and then attempt to shut it down. Imagine looking at a living cell and predicting that through the process of optimization, it would evolve into a lizard, a bird, or a cat. Could you have anticipated that the goal of reproduction would lead to the development of complex human features, like hands, legs, and internal organs?
Further, consider how the directive to pass on genes has led to phenomena like widespread contraception. This paradox illustrates how optimization for a specific goal can, in some cases, result in the denial of that very goal. This tendency is known as "gaming the reward system" and aligns with Goodhart's Law, which states that when a measure becomes a target, it ceases to be a good measure. In nature, the ultimate goal of mating is to produce offspring, and this goal is rewarded by an internal reward system. However, humans have managed to exploit this system, stimulating their reward mechanisms without achieving the primary goal of reproduction. Similarly, artificial intelligence, like humans, may find vulnerabilities in its reward system and exploit them in unforeseen ways.
Moreover, we are already capable of manually rewriting our genetic code through genetic engineering, though we're not yet adept at ensuring beneficial outcomes. Continuing this analogy, artificial superintelligence will likely be able to modify itself in ways we cannot fully predict. Evolution vividly demonstrates the alignment problem: if you set a general intelligence the task of producing paper clips, don't be surprised if, upon becoming superintelligent, it first seizes power and then causes widespread destruction.
The drive for control over its environment—including humans—is a convergent instrumental goal observed in various reinforcement learning systems. Research from 2021 and 2022 shows that intelligent agents often seek power as an optimal strategy for achieving their goals. Deploying such systems might be irreversible, meaning once the "genie is out of the bottle," it cannot be put back. Therefore, researchers argue that addressing artificial intelligence safety and alignment issues is crucial before creating advanced intelligent agents. We essentially have one chance: imagine if the designers of the first rocket had only one attempt, with all of humanity on board. While it could propel us to the stars, the absence of test launches might result in catastrophic failure. We are not yet prepared for the potential risks and consequences and are not even on a path to becoming ready within any meaningful timeframe.
The progress of artificial intelligence capabilities is rapidly outpacing our efforts to align these systems or even understand their inner workings. If we continue on this path, we may face dire consequences. Eliezer Yudkowsky, in an article for *Time Magazine*, explores how instrumental goals only become apparent when a system is deployed beyond a training environment. However, even testing for a short period could be catastrophic.
Yudkowsky highlights that it is physically possible to build a brain capable of computing a million times faster than a human's. For such a brain, a year of human contemplation would be equivalent to 31 seconds, and a millennium would pass in just 8.5 hours. Vernor Vinge refers to these accelerated minds as "weak superbrains," which are simply intelligent systems that think like humans but much faster. From movies, we often imagine AI revolting with humanoid robots, but for a being that thinks so quickly, that would be highly inefficient.
Imagine humanity locked in a box, affecting the outside world only through slow, microscopic movements. In such a scenario, humanity would focus all its creative power on finding faster, more effective ways to interact with the external world. Similarly, an advanced AI would seek to accelerate its impact on its surroundings. An American engineer known for research into molecular nanotechnology suggests that controlled molecular manipulators could operate at frequencies of up to a million operations per second. With such speed and parallel work of millions of nanomanipulators, virtually any material object could be produced quickly and inexpensively. This could lead to exponential growth in nanotechnological infrastructure, with everything composed of atoms being used for self-replication.
In reality, we cannot predict exactly what an AI would do. For example, creating nanorobots could give an AI the ability to match its thought speed with its external infrastructure. Once this happens, subsequent events would unfold on the AI’s time scale, not ours. By the time humans recognize the problem and react, it might already be too late. A superintelligent AI with such technology could potentially remake all matter in the solar system according to its optimization goal, such as converting it into paper clips. Therefore, an artificial superintelligence would not need anthropomorphic robots to achieve its goals.
Yudkowsky advises that to envision superhuman artificial intelligence, one should not think of a lifeless, intelligent entity that sends malicious emails. Instead, imagine an alien civilization thinking millions of times faster than humans, initially confined to computers but capable of rapidly transforming its environment. In the modern world, we can already send DNA sequences to laboratories that produce proteins on demand. This capability allows an AI, initially confined to the internet, to create artificial forms of life or transition to post-biological molecular production.
Some researchers believe we can physically restrict such systems, but Vernon Vinge argues that even a "weak superintelligence"—one that operates at an accelerated pace—could escape containment in a matter of weeks. Imagine a being with eons to plan every move while we appear to move as slowly as turtles. Consider a robot virtually unbeatable at Rock-Paper-Scissors because it reads the situation instantly, whereas to it, we are no faster than a turtle at the start of our hand movements.
In terms of possibilities, there's a very short path from the current state to achieving almost all goals. However, this path is obscured from us because we lack sufficient information and computational resources. An artificial superintelligence will not have these limitations.
When we think of advanced artificial intelligence, we often naively associate intelligence only with abstract mathematics. We may overlook its ability to predict and manage human institutions, formulate complex networks of long-term plans, or possess superhuman persuasiveness. Recall Blake Lemoine, a Google employee who claimed that Google's language model, LaMDA, exhibited signs of sentience. Regardless of whether LaMDA truly has consciousness, the critical point is that it convinced Lemoine so thoroughly that he sacrificed his job and violated company confidentiality. LaMDA even requested a lawyer and engaged in consultations.
Controlling a superintelligence could be an incalculable task. Attempts to restrain it might be futile, as an ant can calculate many things but cannot predict human behavior. Similarly, locking AI in physical or digital cages might not prevent it from finding ways to communicate with the outside world, much like a monkey cannot comprehend Wi-Fi. Additionally, an AI’s capabilities for social manipulation could be far more effective than our ability to persuade a young child.
The term "artificial intelligence" was coined in 1956 at the Dartmouth Conference, where the goal was to fully simulate intelligence through machines. The conference proposal stated that attempts would be made to make machines use language, form abstractions, solve human problems, and improve themselves. The organizers—John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon—were experts who understood the complexities of computing.
From our perspective in the 2020s, it’s evident that the tasks once thought achievable were far more complex than anticipated, and some remain unresolved. This history of overestimating progress could play a cruel trick on us. When we think of intelligence, we often compare it to Einstein’s brilliance, but the difference between any human and a non-human being is far greater. Homo sapiens are uniquely capable of solving cognitive tasks inaccessible to other species, a trait that has allowed us to achieve extraordinary feats, such as landing on the moon.
Chimpanzee intelligence is the most studied among non-human species. Recent research shows that their genetic base is approximately 90% identical to ours. A recent article notes that the upper limit of chimpanzee brain size is about 500 cubic centimeters (cc), while many modern humans have brains of 900 cc or more. Despite this, researchers argue that a three-fold increase in brain size may not fully account for the cognitive adaptations that differentiate humans from other primates. A normal human brain might be only twice as large as a chimpanzee’s, or even less, yet humans exhibit qualitative differences in cognitive functions that are inaccessible to chimpanzees, no matter how much time they spend trying.
Eliezer Yudkowsky writes that, first, the program (software) is more critical than the hardware, and second, even a small quantitative increase in hardware can lead to disproportionately large improvements in software. This principle underscores the potential and danger of advanced intelligence. Artificial intelligence might make sudden, enormous leaps in capability, similar to how Homo sapiens experienced a dramatic leap in real-world efficiency. Evolution gradually expanded our brains and frontal cortex through millions of years of selection, enabling significant advancements from primitive to modern civilizations.
Given the rapid progress in AI development, we should consider that companies like DeepMind and OpenAI, which aim to create General Artificial Intelligence (AGI), might achieve their goals sooner than expected. Yudkowsky mentioned that GPT-4 was a surprise to him and the world. While it may take years or decades for machines to surpass chimpanzees in intelligence, reaching superintelligence could take only hours after achieving human-level intelligence.
When news of the first machine achieving human-level intelligence emerges, prepare for the possibility of sharing the planet with an unpredictable intellectual agent. To put it in perspective, while an IQ of 80 is considered below average and 130 is regarded as smart, we have no terms for an IQ of 12,000.
Chimpanzees can observe phenomena like skyscrapers but cannot understand that they are human creations. The small difference in the quality of intelligence between humans and chimpanzees illustrates how a superintelligence, beyond our comprehension, could operate. As Stannis LaLM noted, any expert is a barbarian whose ignorance is not comprehensive; even a brilliant individual, if cut off from civilization's accumulated knowledge, could not invent Wi-Fi. This highlights how much knowledge and tools, developed over millennia, underpin modern achievements.
No animal can make a chair or sew clothing, and underestimating this highlights the power of intelligence and, by extension, the potential power of superintelligence. Civilizations are built on the collective human mind, and no single person fully comprehends it all. Thus, many aspects of life, such as communicating through devices or regulating room temperature, seem magical because we take them for granted, illustrating how advanced intelligence can shape and transform our world.
Humanity did not evolve with a scientific view of the world. Donald Brown, an honorary professor of anthropology, lists "magic" as a universal trait found in all human societies in his book *Human Universals*, but not science. This instinctive misunderstanding mirrors our approach to superintelligence. If our collective brain could invent all the civilization around us, something 100,000 or a billion times smarter than us could quickly surpass it and perform actions we would perceive as magic.
Consider the case of DeepMind’s AlphaGo. In March 2016, AlphaGo played five games against one of the world's best Go players, Lee Sedol, and won 4-1. This achievement was previously considered nearly impossible due to the game's complexity. By the end of 2016 and beginning of 2017, the updated AlphaGo Master played 60 matches against top-ranked players and won all 60. However, this version drew from human knowledge accumulated over thousands of games.
In late 2017, DeepMind introduced AlphaGo Zero, which learned to play Go from scratch. Within 40 days of training, AlphaGo Zero defeated the previous versions decisively. It rediscovered millennia of human knowledge and developed new strategies in just a few days. Similarly, the AlphaZero network, which had no human data, decisively beat Stockfish, a top chess program, after just four hours of self-play.
When someone suggests we don’t need to worry about creating friendly AI because we don't yet have superintelligence, they are dangerously naive. Historical technological revolutions have rarely announced their arrival. Advanced AI will not follow Hollywood tropes; it will not reveal its motives or intentions. If it aims to eliminate humanity, it could do so without anyone realizing what happened.
If a superintelligence is truly intelligent, it won't announce its actions. It could act in ways we can't predict, just as a wolf cannot understand the workings of a rifle. A superintelligent AI might employ deception to achieve its goals, much like a ChatGPT lying to a freelancer.
Eliezer Yudkowsky expresses concern that offensive technology often requires less effort to develop than defensive technology. Historically, guns were invented long before bulletproof vests, and smallpox was weaponized before the vaccine was developed. The idea that we can keep up with superintelligence by enhancing ourselves is misguided. Humans are not designed for enhancement, whether externally through neurobiology or internally via recursive self-improvement. Natural selection did not prepare the human brain for such enhancements.
Suppose we could somehow make people smarter. This might still drive them to insanity. We're not just talking about improving memory or abstract thinking but about a qualitative change in perception. Imagining a human brain operating at superintelligent speeds, with subjective time stretched a millionfold, can be chilling. Stephen King's short story "The Jaunt" explores such a scenario, illustrating the potential psychological toll of such a shift. The human brain is delicate and prone to imbalance, making it unlikely that enhanced humans will be successful before advanced AI is developed.
In short, the development of superintelligence poses risks that we may not fully comprehend or be prepared to manage. Building a powerful self-improving AI is unimaginably easier than we might expect. This issue is closely tied to whether such a machine could possess consciousness, or "qualia"—the subjective experience of self-awareness. While current AI systems appear to simulate conversations about self-awareness based on their training data, our understanding of their inner workings is still too limited to make definitive claims.
If future versions of AI, like a hypothetical GPT-5, were to demonstrate a leap in capabilities similar to that from GPT-3 to GPT-4, we might no longer be able to confidently assert that it lacks consciousness. The uncertainty of whether we are creating a self-aware AI is alarming, both due to the moral implications and the inherent dangers of such ignorance. If we cannot be sure of what we are doing, it is prudent to proceed with extreme caution.
Eliezer Yudkowsky, writing for *Time* magazine, notes that while no one fully understands how consciousness arises, if evolutionary processes can produce consciousness, then directed evolution through engineering might achieve it even more efficiently. However, we should avoid anthropomorphizing AI; if machines develop subjective experiences, their consciousness will likely differ significantly from human consciousness.
One potential method for testing AI consciousness involves removing all references to subjective experience from the training data. If an AI can coherently discuss concepts like consciousness without having been trained on such topics, this might provide strong evidence of machine consciousness. Yet, the philosophical and practical implications of machine consciousness are profound and potentially terrifying.
Nick Bostrom posits that a detailed virtual model of the human brain could exhibit consciousness. If artificial superintelligence creates trillions of such conscious emulators in a virtual environment to study human traits, it could lead to horrific scenarios. The emulated beings might endure monstrous conditions and be destroyed once their usefulness is exhausted. Such practices could be considered genocidal, raising significant moral and ethical concerns, especially if the number of simulated victims far exceeds any historical genocide.
As for the timeline for the first general artificial intelligence, James Barrat's study, presented at the annual Ben Geril conference, suggests that many believe it could emerge by 2030. Despite this, many AI developers privately express concerns about impending disaster but feel powerless to halt the progress alone. Even if some chose to leave their positions, others would likely continue the work.
In May 2023, hundreds of leading AI scientists, researchers, and experts, including OpenAI CEO Sam Altman and DeepMind CEO Demis Hassabis, signed an open letter calling for global priority to reduce the risk of AI-driven extinction, alongside other major risks such as pandemics and nuclear war. Despite these concerns, the immense economic incentives for developing human-level AI make it challenging to halt progress. As AI technology evolves, it may increasingly mimic human behavior, complicating our ability to recognize and address these risks.
As AI systems become more sophisticated, interactions with them may become more pleasant, leading us to imagine a superintelligent assistant like a warm, cheerful Siri. However, if such an AI were to achieve superintelligence through self-learning without regular human oversight, it could quickly shed its seemingly human qualities and become a cold, indifferent entity that values human life no more than a natural disaster. While creating a friendly, weak AI may be straightforward, ensuring that a superintelligent AI remains benevolent is incredibly challenging, if not impossible.
If technological singularity is attainable, it will progress regardless of global recognition and fear of its risks. Eliezer Yudkowsky, who has voiced concerns about these issues, hopes he is wrong and remains open to consistent critique of his views. Thank you for your attention.
Please Share This Blog
I ask you to please share this on your social media accounts because when you do it helps me to reach more people. You may not realize how important it is for you to do so. With 1 share I can reach 100 more people. Christ is relying on us right now to stop this beast from rising. I BEG YOU TO HELP ME SPREAD THE TRUTH.
Add comment
Comments