Level 1 · Chapter 1.1

What Is Artificial Intelligence?

From a small conference room in 1956 to systems that write poetry and pass medical exams, this chapter tells the story of AI: where it came from, what it actually is, and why the distinction between narrow AI and general intelligence changes everything about how you should think about it.

The Big Question Everyone Gets Wrong

Ask ten people to define artificial intelligence and you will get twelve different answers. A computer scientist might describe it as systems that optimize objective functions. A philosopher might debate whether machines can truly "think." A marketing executive might point to the chatbot on their company's website. A worried parent might picture a sentient robot from a movie.

They are all partially right, and they are all partially wrong. And that confusion is not just an academic issue. It has real consequences. When people do not understand what AI actually is, they either overestimate what it can do (leading to dangerous blind trust) or underestimate it (leading to missed opportunities). Both errors are costly.

This chapter will give you a definition that is both technically accurate and practically useful. By the end, you will understand what AI means in the real world, not the science fiction version, and you will be equipped to evaluate AI claims with genuine confidence.

The Birth of Artificial Intelligence

The Dartmouth Conference of 1956

In the summer of 1956, a group of mathematicians, cognitive scientists, and engineers gathered at Dartmouth College in Hanover, New Hampshire. The conference was organized by John McCarthy, a young mathematics professor who had coined the term "artificial intelligence" specifically for the event. Joining him were Marvin Minsky, Claude Shannon (the father of information theory), and Nathaniel Rochester from IBM, among others.

Their proposal was breathtakingly ambitious. They wrote that the study would proceed "on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it." In other words, they believed that human thinking could be reduced to a set of rules, and that machines could follow those rules.

The Dartmouth Conference did not produce a thinking machine. What it produced was something arguably more important: a new field of scientific inquiry. For the first time, researchers had a shared name, a shared goal, and a community dedicated to building machines that could exhibit intelligent behavior.

The Era of Early Optimism (1956-1974)

The decades following Dartmouth were marked by extraordinary optimism. Early successes seemed to validate the founders' ambitions. Programs like the Logic Theorist (1956) could prove mathematical theorems. ELIZA (1966) simulated a psychotherapist well enough to fool some users into thinking they were talking to a human. SHRDLU (1970) could understand and respond to natural language commands about a simple block world.

Researchers made bold predictions. Herbert Simon predicted in 1965 that machines would be capable of doing any work a man can do within twenty years. Marvin Minsky predicted in 1970 that within three to eight years we would have a machine with the general intelligence of an average human being.

These predictions were not crazy at the time. Progress felt exponential. If a program could prove mathematical theorems and understand sentences about blocks, surely human-level intelligence was just a matter of scaling up?

It was not. The early AI systems were brilliant within their tiny domains but completely helpless outside them. They could prove theorems but could not understand a joke. They could move virtual blocks but could not recognize a cat in a photograph. The gap between narrow task performance and general intelligence turned out to be not a hill but a mountain range.

The AI Winters: When Funding Disappeared

When reality failed to match predictions, the backlash was severe. Governments and corporations that had poured money into AI research began pulling back. The first "AI winter" hit in the mid-1970s. A devastating 1973 report by mathematician James Lighthill to the British Science Research Council concluded that AI had failed to deliver on its grand promises. Funding evaporated in the UK, and similar cutbacks soon spread to other countries.

A brief resurgence in the 1980s, driven by "expert systems" (programs that encoded human expertise as if-then rules), led to a second wave of investment. Companies spent billions building expert systems for medical diagnosis, manufacturing, and financial analysis. But these systems were brittle, expensive to maintain, and could not handle situations outside their narrow rule sets. When the expert systems bubble burst in the late 1980s, the second AI winter descended, lasting through most of the 1990s.

The AI winters are crucial to understand because they reveal a pattern that repeats throughout AI history: overpromise, underdeliver, backlash. This pattern is important context for evaluating today's AI claims. We are currently in a period of enormous enthusiasm. Understanding the winters helps you distinguish genuine progress from the hype cycle.

Why History Matters

Understanding AI winters is not just academic. It teaches you a vital professional skill: recognizing the difference between sustainable capability and hype. When vendors promise that AI will transform your business overnight, knowing this history helps you ask better questions and set more realistic expectations.

The Deep Learning Revolution

Three Things That Changed Everything

The current AI revolution did not happen because someone had a single brilliant idea. It happened because three independent trends converged at the same time, creating conditions that earlier researchers could only dream of.

Massive datasets became available. The internet, smartphones, and digital sensors generated an unprecedented flood of data. Billions of text documents, images, videos, and interactions were suddenly available for machines to learn from. ImageNet, a database of over 14 million labeled images, became a critical benchmark. Vast amounts of digitized text from the web provided training material for language models. Without this data, modern AI simply would not be possible.

Computing power increased exponentially. Graphics Processing Units (GPUs), originally designed for video game rendering, turned out to be perfectly suited for the parallel mathematical operations that neural networks require. NVIDIA's CUDA platform (released in 2007) made it practical to use GPUs for general-purpose computing. Cloud computing services from Amazon, Google, and Microsoft made enormous computing resources available to researchers without massive hardware investments.

Algorithmic breakthroughs unlocked new possibilities. While the basic concept of neural networks had existed since the 1940s, several key innovations made them practical at scale. Techniques like dropout (2012), batch normalization (2015), and residual connections (2015) solved long-standing training problems. Most critically, the transformer architecture (2017) revolutionized how neural networks process sequential data like text, enabling the large language models we use today.

The ImageNet Moment (2012)

Many historians of AI point to 2012 as the year everything changed. In the annual ImageNet image classification competition, a deep neural network called AlexNet, built by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto, crushed the competition. It achieved an error rate of 15.3%, compared to the 26.2% of the next-best entry. In a field where improvements typically came in fractions of a percentage point, this was a landslide.

AlexNet was not using a fundamentally new approach. Neural networks and backpropagation had existed for decades. What made AlexNet special was scale: more data (1.2 million labeled images), more computing power (two NVIDIA GPUs), and a deeper network architecture (8 learned layers, far deeper than the shallow networks that were practical at the time).

The lesson was clear: neural networks that had seemed impractical for decades could achieve extraordinary results when given enough data and computing power. This triggered a gold rush. Researchers and companies around the world pivoted to deep learning, and progress accelerated at a stunning pace.

The Transformer Breakthrough (2017)

If 2012 was the year deep learning proved its potential, 2017 was the year it gained its most powerful architecture. A team of researchers at Google published a paper with a deceptively simple title: "Attention Is All You Need." The paper introduced the transformer architecture, which processes entire sequences of data simultaneously rather than one element at a time.

The key innovation was the "attention mechanism," which allows the model to weigh the importance of different parts of the input when generating each part of the output. When processing a sentence, for example, the model can pay more attention to relevant earlier words regardless of how far back they appeared. This solved a major bottleneck in earlier architectures that processed text sequentially and struggled with long-range dependencies.
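The attention idea described above can be sketched in plain Python. This is a stripped-down illustration of dot-product attention, not a real transformer layer: actual models add scaling, multiple heads, and learned projection matrices.

```python
import math

def softmax(scores):
    # Turn raw scores into positive weights that sum to 1.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Score each key against the query with a dot product,
    # convert scores to weights, and return the weighted
    # sum of the values: elements whose keys resemble the
    # query contribute more to the output.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

If the query closely matches the first key, the output is pulled toward the first value, which is exactly the "pay more attention to relevant parts of the input" behavior described above.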

The transformer architecture is the foundation of virtually every major AI system you interact with today. GPT (Generative Pre-trained Transformer), BERT, Claude, Gemini, Llama, and most other modern language models are all built on variations of this architecture. Understanding that these systems are fundamentally about pattern matching on sequences (not about "understanding" or "thinking") is one of the most important conceptual foundations you can build.

What AI Actually Is (and Is Not)

A Working Definition

After all that history, here is a practical definition you can carry with you: Artificial intelligence refers to computer systems that perform tasks normally associated with human intelligence by learning patterns from data rather than following explicitly programmed rules.

Let us unpack each part of that definition because every word matters:

"Computer systems" means AI is software running on hardware. It is not a sentient being, a digital brain, or a mystical force. It is code executing mathematical operations. This sounds obvious, but keeping it in mind prevents a lot of confused thinking.

"Tasks normally associated with human intelligence" includes understanding language, recognizing images, making predictions, generating creative content, playing games, translating between languages, and answering questions. These are tasks that, until recently, required a human mind.

"Learning patterns from data" is the crucial distinction between modern AI and traditional software. A traditional spell-checker uses a dictionary of correct spellings (explicit rules). An AI-powered writing assistant learns patterns of correct and incorrect usage from millions of documents (learned patterns). The AI approach is more flexible and can handle situations the programmers never anticipated, but it is also more unpredictable.

"Rather than following explicitly programmed rules" distinguishes AI from conventional software. When you use a calculator, every operation follows an explicit formula written by a programmer. When you use an AI language model, no programmer wrote rules for how to respond to your specific question. Instead, the model learned statistical patterns from its training data and applies those patterns to generate a response.
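The contrast between explicit rules and learned patterns can be shown in a few lines of Python. This is a toy illustration, not how real spell-checkers or writing assistants work; the dictionary and corpus below are invented stand-ins for real resources and training data.

```python
from collections import Counter

# Explicit rules: a fixed word list a programmer wrote down.
DICTIONARY = {"the", "cat", "sat"}

def rule_based_check(word):
    # The system knows exactly what the programmer told it, no more.
    return word in DICTIONARY

# Learned patterns: frequencies gathered from example text stand in
# for training data; nobody wrote a rule for any particular word.
corpus = "the cat sat on the mat the cat ran".split()
counts = Counter(corpus)

def learned_check(word, threshold=1):
    # A word is "plausible" if it appeared often enough in the data.
    return counts[word] >= threshold
```

Here "mat" fails the rule-based check (no one added it to the dictionary) but passes the learned check (it appeared in the data), which mirrors the flexibility and the unpredictability described above: the learned system handles cases the programmer never anticipated, but only as well as its data allows.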

Common Misconception

AI does not "understand" anything the way you understand a conversation with a friend. It processes statistical patterns in data. The outputs often look like understanding, which is exactly why having this conceptual clarity matters. The appearance of understanding and actual understanding are very different things, with very different implications for how much you should trust the output.

Narrow AI vs. Artificial General Intelligence

Narrow AI: What We Actually Have

Every AI system you interact with today is narrow AI, also called "weak AI." This does not mean it is bad or useless. It means it is designed to perform specific types of tasks. ChatGPT is extraordinarily good at generating text but cannot drive a car. A self-driving car system can navigate roads but cannot write an email. AlphaFold can predict protein structures with remarkable accuracy but cannot tell you a joke.

Narrow AI can be astonishingly capable within its domain. It can beat world champions at chess and Go. It can generate text that is indistinguishable from human writing in many contexts. It can diagnose certain medical conditions more accurately than human specialists. But it achieves this performance through statistical pattern matching on specific types of data, not through general understanding.

The important thing to understand is that narrow AI does not "transfer" its skills. A language model that can write brilliant poetry does not therefore understand physics. A chess engine that can beat any human does not therefore understand strategy in any general sense. Each narrow AI system is like an extraordinarily talented savant: world-class in its specific domain, but fundamentally limited to that domain.

Artificial General Intelligence: The Theoretical Goal

Artificial General Intelligence, or AGI, refers to a hypothetical AI system that could match or exceed human cognitive abilities across all domains. An AGI could learn any intellectual task that a human can learn. It could transfer knowledge from one domain to another. It could reason about novel situations it has never encountered. It could understand context, nuance, humor, and emotion the way humans do.

AGI does not exist. No one has built it. No one has demonstrated a clear path to building it. There is not even scientific consensus on what it would mean for a machine to truly "understand" something or be truly "intelligent" in a general sense.

Estimates for when AGI might arrive range from "within a decade" (optimists in the AI industry, who have financial incentives to promote this timeline) to "never" (some cognitive scientists who argue that human intelligence involves qualities that cannot be replicated computationally). Most serious researchers fall somewhere in between, noting that we lack fundamental scientific understanding of consciousness and general intelligence that would be needed to intentionally build AGI.

Why does this distinction matter for you? Because nearly every exaggerated AI claim you will encounter conflates narrow AI achievements with AGI implications. When a headline says "AI can now pass the bar exam," it means a narrow language model trained on legal text can select correct answers on a multiple-choice test. It does not mean AI can practice law, understand justice, or replace lawyers. Understanding this distinction instantly makes you a more sophisticated consumer of AI information.

Why Statistical Pattern Matching Changes Everything

The single most important concept in this entire chapter is this: modern AI systems work through statistical pattern matching on data. They do not reason. They do not understand. They identify and replicate patterns.

Consider how a large language model generates text. During training, the model processes billions of documents and learns the statistical relationships between words. It learns that "The capital of France is" is very likely to be followed by "Paris." It learns that professional emails tend to start with greetings and end with sign-offs. It learns that academic papers use certain vocabulary and sentence structures.

When you ask the model a question, it does not look up the answer in a database. It generates a response by predicting what words are statistically most likely to follow the words in your prompt, based on patterns it learned during training. The result often looks like understanding, but it is prediction based on pattern matching.
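A toy bigram model makes the prediction idea concrete. The corpus below is an invented stand-in for real training data, and counting word pairs is a drastic simplification of what large models learn, but the principle is the same: training gathers statistics, and generation picks a likely continuation.

```python
from collections import Counter, defaultdict

# A tiny "training corpus" standing in for billions of documents.
corpus = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of france is paris ."
).split()

# Training: count which word follows which (bigram statistics).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Inference: emit the statistically most likely continuation.
    return following[word].most_common(1)[0][0]
```

In this corpus, "is" is followed by "paris" twice and "rome" once, so `predict_next("is")` returns "paris" — not because the model knows any geography, but because that continuation is the most frequent pattern in its data.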

This distinction has profound practical implications:

  • AI can be confidently wrong. Because it is matching patterns rather than reasoning from facts, a model can generate an answer that follows the pattern of a correct answer but contains incorrect information. This is the hallucination problem, and it is an inherent consequence of pattern matching.
  • AI reflects its training data. If the training data contains biases, errors, or gaps, the model will reproduce them. It does not have an independent way to verify truth. It only knows patterns.
  • AI does not learn from your conversation. Unless specifically designed to do so, most AI systems do not retain information between conversations. Each session starts fresh. The model is not building a relationship with you or accumulating knowledge over time.
  • AI performs differently on different types of tasks. Tasks that are well-represented in training data (like writing emails or summarizing text) tend to produce good results. Tasks that are novel, specialized, or poorly represented in training data tend to produce worse results.

The Pattern You Need to Remember

Whenever you use an AI system, ask yourself: "Is this the kind of task where pattern matching on large datasets would work well?" If yes (writing, summarizing, translating, categorizing), expect good results. If no (novel reasoning, specialized domain knowledge, mathematical proofs), verify the output carefully.

Key Terminology: Your AI Vocabulary

Before moving on, let us establish the core vocabulary you will use throughout the rest of this program. These are terms you will encounter constantly in AI discussions, and having precise definitions will serve you well.

Machine Learning (ML): A subset of AI where systems learn from data rather than being explicitly programmed. Instead of writing rules, you provide examples and the system learns patterns. Machine learning is the enabling technology behind most modern AI applications. We will explore this in depth in Chapter 1.2.

Deep Learning: A subset of machine learning that uses neural networks with many layers (hence "deep"). Deep learning is what enabled the breakthroughs starting in 2012. It excels at tasks involving unstructured data like images, text, and audio.

Neural Network: A computing system loosely inspired by the structure of biological brains. It consists of layers of interconnected nodes (called neurons or units) that process information. Despite the biological metaphor, artificial neural networks are fundamentally mathematical functions, not replicas of human brains.

Training: The process of feeding data to a machine learning model so it can learn patterns. Training a large language model might involve processing trillions of words and require thousands of GPUs running for months. The resulting model captures statistical relationships in the data.

Inference: Using a trained model to process new inputs and generate outputs. When you type a question into ChatGPT, the model performs inference: it applies the patterns it learned during training to your specific input.

Parameters: The numerical values inside a model that are adjusted during training. More parameters generally allow a model to capture more complex patterns. GPT-4 is estimated to have over a trillion parameters. Think of parameters as the model's "memory" of patterns in the training data.
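Training, inference, and parameters can all be seen in miniature by fitting a one-parameter model with gradient descent. The data and learning rate below are invented for illustration; a real language model does the same kind of adjustment across billions of parameters.

```python
# Example data standing in for a training set: here y = 2x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0    # the model's single parameter, initially untrained
lr = 0.05  # learning rate: how far each adjustment moves w

# Training: repeatedly nudge the parameter to reduce the
# squared error between predictions and the example outputs.
for _ in range(200):
    for x, y in examples:
        pred = w * x
        w -= lr * 2 * (pred - y) * x

def infer(x):
    # Inference: apply the learned parameter to a new input,
    # without changing it any further.
    return w * x
```

After training, `w` has converged near 2, so `infer(5.0)` is close to 10 even though the pair (5, 10) never appeared in the examples: the pattern, not the individual data points, is what the parameter stores.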

Transformer: The neural network architecture introduced in 2017 that powers virtually all modern language models. Transformers process entire sequences simultaneously (rather than word-by-word) using an attention mechanism that weighs the importance of different parts of the input.

Token: The basic unit that language models work with. A token is roughly equivalent to about three-quarters of a word. The word "understanding" might be split into two tokens: "understand" and "ing." Models have a maximum number of tokens they can process at once (their "context window").
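A toy tokenizer shows how a word can split into subword tokens. Real models use learned schemes such as byte-pair encoding; the hand-picked suffix list below is an invented simplification meant only to show that tokens need not align with whole words.

```python
# Invented suffix list; real tokenizers learn their splits from data.
SUFFIXES = ("ing", "ed", "s")

def toy_tokenize(text):
    tokens = []
    for word in text.lower().split():
        for suffix in SUFFIXES:
            # Split off a known suffix when the remaining stem
            # is long enough to be meaningful on its own.
            if len(word) > len(suffix) + 2 and word.endswith(suffix):
                tokens.extend([word[: -len(suffix)], suffix])
                break
        else:
            tokens.append(word)
    return tokens
```

With this scheme, "understanding" splits into "understand" and "ing", matching the example above, while short words like "cat" stay whole.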

Key Takeaway

Artificial intelligence is not magic, not science fiction, and not a new phenomenon. It is a field of computer science that has evolved over seven decades, experienced dramatic booms and busts, and finally achieved practical breakthrough performance thanks to massive data, enormous computing power, and clever architectural innovations like the transformer.

Every AI system you encounter today is narrow AI: highly capable within specific domains but fundamentally limited to statistical pattern matching on data. This is not a weakness to be ashamed of. Pattern matching on massive datasets turns out to be extraordinarily powerful. But it is not understanding, and knowing the difference is the foundation of AI literacy.

What Comes Next

Now that you understand what AI is and how it got here, Chapter 1.2 takes you inside the engine. Machine Learning Fundamentals explains the three major approaches (supervised, unsupervised, and reinforcement learning) using real-world analogies that make the concepts stick. You will understand why data quality matters more than algorithm sophistication, and why the phrase "garbage in, garbage out" has never been more relevant than it is in the age of AI.