On Intelligence

Book

 2021/06/23 

Complexity is a symptom of confusion, not a cause.

1 Artificial Intelligence

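The mainstream view in computer science: there is no need to study the brain.
Origin of this view: the Turing Test, i.e. what matters is getting people to believe the machine is intelligent, so producing intelligent behavior is what counts.
The Chinese Room: no intelligence arises inside the Chinese Room. The author argues that understanding cannot be measured by external behavior; it is instead an internal metric of how the brain remembers things and uses its memories to make predictions. Yet the vast majority of so-called “AI” bears no resemblance to either the Chinese Room described here or this definition.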

2 Neural Networks

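Some views that may already be outdated:

Neural networks do not account for feedback or time-varying inputs.
Cognitive scientists would like to record feedback in the brain, but current technology (fMRI) can only record where activity occurs, not how it changes continuously.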

3 The Human Brain

Mind is the creation of the cells in the brain.

The cortex is extremely flexible, and the inputs to the brain are just patterns. It doesn’t matter where the patterns come from; as long as they correlate over time in consistent ways, the brain can make sense of them.

Function Hierarchy: each functional part of the brain is organized as a hierarchy. Taking visual input as an example:
1. V1 (primary sensory areas): rawest, most basic level
2. V2, V4, IT: concerned with more specialized or more abstract aspects
3. association area: receive inputs from more than one sense
Although it is a hierarchy, as we move from the lower levels to the higher ones, information always flows in the opposite direction as well, with more projections feeding back down the hierarchy than up.
Uniformity of Cortex Parts: Mountcastle found that regions of cortex performing different functions are very similar in appearance and structure. From there, he argues that all regions of the cortex are performing the same operation. What makes the vision area visual and the motor area motoric is how the regions of cortex are connected to each other and to other parts of the central nervous system.
Plasticity of Cortex: If one part of the brain is damaged, another part can take over its original task, which supports Mountcastle's view. There is also a thought experiment: suppose the brain were not this plastic. Then some part of the brain would have to be dedicated to learning Chinese characters, but Chinese characters evolved far too quickly for biological evolution to have adapted the brain to them. (That foreigners can also learn Chinese quickly supports the same point.)
Similarity of Inputs into Brain: Whether the input is visual, auditory, or anything else, what actually enters the body is action potentials. They are all the same: just patterns. This also supports Mountcastle's view. There are spatial and temporal patterns:
- Spatial Patterns: coincident patterns in time; they are created when multiple receptors in the same sense organ are stimulated simultaneously
- Temporal Patterns: patterns entering your sensory organs are constantly changing over time
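The book gives further examples of these two points: people come to feel that a fake hand responding in sync with their own is their own hand; and a camera wired to pressure receptors on the tongue lets people see with the tongue.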

4 Memory

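Refuting the idea that the brain is faster and computationally more powerful than a computer -> the brain can outperform computers because it works on a fundamentally different principle -> which leads to the chapter's thesis: the brain doesn’t “compute” the answers to problems; it retrieves the answers from memory.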
Four attributes of neocortical memory that are fundamentally different from computer memory:
- The neocortex stores sequences of patterns -> predictions of future events
- The neocortex recalls patterns auto-associatively -> recall memories appropriate for prediction
- The neocortex stores patterns in an invariant form -> apply knowledge of past to new situations that are similar but not identical
- The neocortex stores patterns in a hierarchy.
The first three attributes are covered in detail below; the last one, hierarchy, is covered in Chapter 6.
Sequential Pattern: a story is stored in your head in a sequential fashion and can only be recalled in the same sequence. You can’t remember the entire story at once.

An interesting observation: Truly random thoughts don’t exist. Memory recall almost always follows a pathway of association.
Auto-Associativity: The memory system can recall complete patterns when given only partial or distorted inputs. This is a result of Hebbian learning: cells that fire together wire together, so when only part of a group of cells is activated, the whole group becomes active.
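Recall from a distorted cue under a Hebbian rule can be sketched with a tiny Hopfield-style network. This is a standard textbook model swapped in for illustration, not the book's own mechanism; the function names `train_hebbian` and `recall` and the example pattern are assumptions made for this sketch.

```python
def train_hebbian(patterns):
    """Build a weight matrix: units that are active together get a positive link."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, cue, max_sweeps=10):
    """Update units one at a time until the state stops changing."""
    state = list(cue)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            total = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if total >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Hypothetical stored pattern (+1 = active cell, -1 = inactive cell).
stored = [1, 1, 1, 1, -1, -1, -1, -1]
weights = train_hebbian([stored])

# Cue with two cells flipped, standing in for a partial or distorted input.
distorted = [1, -1, 1, 1, -1, -1, 1, -1]
recovered = recall(weights, distorted)
```

With a single stored pattern and only two flipped cells, the remaining cells pull the flipped ones back, so `recovered` equals `stored`.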
Invariant Representation: The brain is not a CD or a hard drive; we don’t remember or recall things with complete fidelity. Instead, the brain remembers the important relationships in the world, independent of the details.

Vision is the usual example: some set of the cells in the face recognition area remain active as long as your friend’s face is anywhere in your field of vision, regardless of its size, position, orientation, scale, and expression. This stability of cell firing is an invariant representation.
A short lead-in to the next chapter: its thesis is that the brain's main function is to make predictions using memories. But given that the cortex stores invariant information, how can it make specific predictions? It combines knowledge of the invariant structure with the most recent details.

5 A New Framework of Intelligence

Prediction is not just one of the things your brain does. It is the primary function of the neocortex, and the foundation of intelligence. The cortex is an organ of prediction.

This is the author's most fundamental claim in the book, what he calls the new framework of intelligence (the Memory-Prediction Framework of Intelligence). To explain “prediction” concretely: Your brain makes low-level sensory predictions about what it expects to see, hear, and feel at every given moment, and it does so in parallel. All regions of your neocortex are simultaneously trying to predict what their next experience will be. “Prediction” means that the neurons involved in sensing your door become active in advance of them actually receiving sensory input. When the sensory input does arrive, it is compared with what was expected. Correct predictions result in understanding. Incorrect predictions result in confusion and prompt you to pay attention. This is not limited to sensory input: motor output is a pattern in the brain just like sensory input, so the neocortex can also remember which behavior (pattern) leads to which sensory input (pattern), and the cortex can direct behavior to satisfy its predictions.
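The loop of predicting, then comparing the prediction with the arriving input, can be sketched as a toy sequence memory. This is only an illustration under strong simplifying assumptions: the class `SequenceMemory`, its methods, and the door example items are hypothetical, and the memory only records which item has followed which.

```python
from collections import defaultdict


class SequenceMemory:
    """Toy memory that records which item has followed which (illustrative only)."""

    def __init__(self):
        self.followers = defaultdict(set)

    def learn(self, sequence):
        # Store every adjacent pair: current item -> item that came next.
        for current, nxt in zip(sequence, sequence[1:]):
            self.followers[current].add(nxt)

    def predict(self, current):
        # Everything that has ever followed `current`.
        return set(self.followers.get(current, set()))

    def observe(self, previous, actual):
        # Compare the arriving input with what was expected.
        if actual in self.predict(previous):
            return "understood"
        return "surprise"  # mismatch: this is what would draw attention


memory = SequenceMemory()
memory.learn(["key", "lock", "handle", "door opens"])

expected = memory.observe("lock", "handle")
unexpected = memory.observe("lock", "doorbell")
```

Here `expected` is `"understood"` because "handle" has followed "lock" before, while `unexpected` is `"surprise"` because "doorbell" never has.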

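The author gives many examples of prediction (anticipating the melody of a piece of music, what a friend looks like, what your mom will say next...). The most interesting is probably “filling in”, the brain's auto-completion that we have met before: the eye has a blind spot but our vision does not; we automatically complete three corners into a triangle; we picture the shape of a building partly hidden by a tree; and so on. Your visual cortex is drawing on memories of similar patterns and is making a continuous stream of predictions that fill in for any missing input.

What exactly is the relationship among behavior, the cortex, and intelligence? What role has the cortex played over the course of evolution, and why did we evolve one? In the beginning, the cortex served to make more efficient use of existing behaviors, not to create entirely new behaviors. But did new behavior then emerge later in evolution?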

Reptile: Keen senses and well-developed brains endowed them with complex behavior, but relatively rigid
Mammals: Neocortex covering the old brain (reptile brain)

Now sensory patterns are simultaneously fed into the neocortex and the old brain. The recalled memory is compared with the sensory input stream. It both “fills in” the current input and predicts what will be seen next.
Humans:
- large front part of cortex for high-level planning and thought, so it could store more sophisticated types of memories and make predictions based on complex relationships
- motor cortex makes more connections with our muscles so cortex usurps motor control from other parts of the brain (old brain) and now the cortex can direct behavior to satisfy its predictions.

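This part also rebuts the view from Chapter 1, held by AI researchers, that behavior determines intelligence: as far back as the reptiles, animals had survival-instinct behavior, but only when the cortex appeared did they have intelligence. And the core function of the cortex is prediction.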

To make predictions of future events, your neocortex has to store sequences of patterns. To recall the appropriate memories, it has to retrieve patterns by their similarity to past patterns (auto-associative recall). And, finally, memories have to be stored in an invariant form so that the knowledge of past events can be applied to new situations that are similar but not identical to the past. How the physical cortex accomplishes these tasks, plus a fuller exploration of its hierarchy, is the subject of the next chapter.

6 How the Cortex Works

invariant representation:

Light receptors in the retina are concentrated in the fovea and thin out toward the periphery, so the retinal image relayed onto V1 is highly distorted. However, we don’t perceive any retinal pattern change at all. This is a result of invariant representation.

In the course of spanning four cortical stages from retina to IT: cells in retina and V2 are rapidly changing, spatially specific, tiny-feature recognition cells. When we reach the IT region, something magical happens and the cells become constantly firing, spatially nonspecific, object recognition cells. (They now fire when seeing a face, whether it is on the left or on the right.)
Integrating the Senses: So far we have discussed input of one type predicting results of the same type. In fact, association areas let us predict results of other types too: visual input can be used to predict auditory or olfactory results, and so on, and it can also guide movement.
A New View of V1: The model so far has two problems. First, invariant representation appears, as if by magic, only once we reach IT. Second, most regions of the brain receive multiple inputs, as association areas do, yet in our model V2 seems to have only V1 as input, and V4 only V2.

To answer these questions, we propose a new model: V1, V2, and V4 are not single cortical regions. Rather, each is a collection of many smaller subregions. V1 has the largest number of small cortical areas. V2 has fewer but larger subregions, each connecting to a number of V1’s subregions. The same holds for V4, and at the top there is a single IT, which has a bird’s eye view of the entire visual world. Now the job of any cortical region is to find out how its inputs are related, to memorize the sequence of correlations between them, and to use this memory to predict how the inputs will behave in the future. We can say each region of cortex forms invariant representations drawn from the input areas hierarchically below it.
A Model of the World: The author holds that in the world, every object is composed of a collection of smaller objects, and most objects are part of larger objects. In an analogous way, memories are stored in the hierarchical structure of the cortex. Time really matters, and information flowing into the brain arrives as a sequence of patterns. Each cortical region recognizes such a sequence, abstracts it into a “name” (a constant pattern of cell firing), and passes that name to the region above it. So we can also say that the brain stores sequences of sequences. By collapsing predictable sequences into “named objects” at each region in our hierarchy, we achieve more and more stability the higher we go. This creates invariant representations.
Sequences of Sequences: Two processes are at the essence of learning. Assume we are sorting out colored papers.
- bottom-up classification: deciding what color this paper is
- top-down sequence recognition: deciding which sequence we are reading in
Notice these two processes help each other. 1. If you know the most likely sequence for this series of inputs, you will use this knowledge to decide how to classify the ambiguous input. 2. recognizing any sequence would be impossible if you hadn’t first classified each piece of paper.

When we have finally recognized a color sequence, say “red red blue green”, we just pass this name to the next higher region; just like the colors to this region, the name is just a pattern to be combined with other inputs, classified, and then put into yet a higher-order sequence. The next higher up region doesn’t have to know what it means.
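How the two processes help each other, and how a recognized sequence collapses into a name for the region above, can be sketched as follows. This is a minimal illustration rather than a model of cortex: the table `KNOWN_SEQUENCES`, the names `seq_A` and `seq_B`, and the function `recognize` are all hypothetical. Each observation is a set of candidate colors, so an ambiguous piece of paper simply has more than one candidate.

```python
# Sequences this region has already learned, each mapped to the "name" it passes upward.
KNOWN_SEQUENCES = {
    ("red", "red", "blue", "green"): "seq_A",
    ("blue", "green", "green", "red"): "seq_B",
}


def recognize(observations):
    """Return (name, resolved_sequence) if exactly one known sequence fits, else None.

    observations: list of sets of candidate colors, one set per piece of paper
    (the bottom-up classification, possibly ambiguous).
    """
    matches = [
        (name, sequence)
        for sequence, name in KNOWN_SEQUENCES.items()
        if len(sequence) == len(observations)
        and all(color in candidates for color, candidates in zip(sequence, observations))
    ]
    # Top-down step: a unique matching sequence settles any ambiguous classification.
    return matches[0] if len(matches) == 1 else None


# The second paper is ambiguous: it could be red or orange.
observed = [{"red"}, {"red", "orange"}, {"blue"}, {"green"}]
result = recognize(observed)

# A series that matches no learned sequence yields no name.
unknown = recognize([{"green"}, {"green"}, {"green"}, {"green"}])
```

In this sketch `result` is `("seq_A", ("red", "red", "blue", "green"))`: knowing the likely sequence resolves the ambiguous paper to red, and only the name `seq_A` would need to be passed to the next region.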
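What a Region of Cortex Looks Like: As noted, each cortical region has six layers (from top to bottom L1, L2, …, L6; not to be confused with the visual areas V1 and V4). But we usually do not treat a layer as the brain's basic unit; instead, the columns running perpendicular to the layers are seen as the basic unit of computation in the cortex. The author regards the column as the basic unit of prediction.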

Next we discuss how cortical regions communicate with each other. There are three ways:
- Upward Flow: Converging inputs from lower regions go to the input layer of the next region through axons.
- Downward Flow: Axons in layer 1 spread over long distances, so information flowing down the hierarchy from one column has the potential to activate many columns in the regions below it.
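- Lateral Flow: L1 sends movement commands to L4 and L5. On receiving a command, L4 and L5 not only send motor signals down to the muscles but also report it to the thalamus, which passes the message back to L1 after a short delay. The thalamus receives information from many different L4 and L5 cells and returns it all to every L1, so each column learns what the columns around it are currently receiving. A column thus knows not only the sequence name (downward flow from above) but also where we are within the sequence (activity from other columns).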
How a Region of Cortex Works - The Details:
- How does a cortical region classify inputs?
  
  It is too complicated to cover here; we assume that it does.
- How does it learn sequences of patterns?
  
  Input from lower region -> layer 4 fires -> layers 2,3,5 fire -> layer 1 fires to tell the region above that some input has come. Cells that fire together wire together, so 2,3,5,1 wire together. 2,3,5 can now fire without a layer 4 input, so they learn to “anticipate” when they should fire based on the firing of layer 1. Half of the input to layer 1 comes from layer 5 in neighboring columns. This information represents what was happening moments before: the columns that were active prior to your column becoming active. The other half of the input to layer 1 comes from layer 6 cells in hierarchically higher regions. This information is more stationary. It represents the name of the sequence you are currently experiencing. Combining these two sources of information, a prediction/sequence is formed.
- How does it form a constant “name” for a sequence?
  
  A constant name means constant input to the next region during a learned sequence. This requires turning off the output of the layer 2 and layer 3 cells when a column predicts its activity or, alternatively, making these cells active when the column can’t predict its activity. Layer 2 cells represent the name of the sequence, and they stay on while we are within the sequence. Layer 3b cells don’t fire when the column successfully predicts its input but do fire when it fails to predict its activity.
- How does it make specific predictions?
  
  Suppose you expect a fifth (the prediction, an invariant representation) and hear a D (the specific input). Layer 2 fires for all intervals of a fifth. Layer 4 fires for all intervals starting with D. The intersection of the two is the specific prediction.
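The fifth-and-D example can be worked through as a set intersection. This is only an arithmetic illustration of the idea, assuming twelve-tone pitch classes and a perfect fifth of 7 semitones; the names `NOTES`, `FIFTH`, and `interval` are introduced here for the sketch.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
FIFTH = 7  # semitones in a perfect fifth (assumption: the "fifth" is a perfect fifth)


def interval(start, semitones):
    """Return the (start, end) note pair for an interval of the given size."""
    end = NOTES[(NOTES.index(start) + semitones) % len(NOTES)]
    return (start, end)


# Invariant expectation: every interval of a fifth, whatever the starting note.
all_fifths = {interval(note, FIFTH) for note in NOTES}

# Specific input: every interval that starts with D, whatever its size.
starting_with_d = {interval("D", size) for size in range(1, 12)}

# The specific prediction is what the two sets have in common.
prediction = all_fifths & starting_with_d
```

The intersection contains the single pair `("D", "A")`: given a D and the expectation of a fifth, the next note predicted is A.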
Flowing Up and Flowing Down:
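1. The upper region sends a prediction down to the lower region.
2. When the input the lower region receives does not match the prediction (it is unexpected), the pattern is passed up to the next region, and so on until some higher region can interpret it as part of its normal sequence of events.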
3. That higher region generates a new prediction and propagates it down
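Can Feedback Really Do That? Feedback synapses all sit far from the cell body, so it has been doubted whether the feedback currents can really make a difference. Newer research, however, suggests that distant synapses may have other special effects (this is not firmly established).
How the Cortex Learns: Suppose there are levels 1, 2, and 3. At first, individual characters are represented at level 3. With continued learning and practice, individual characters move down to level 2, and level 3 in turn learns phrases as its pattern. This ensures that we free up the top for learning more subtle, more complex relationships. It is also why we become more skilled.
The Hippocampus: We usually think of the hippocampus as the center that forms new memories. In the author's model, the hippocampus is the top region of the neocortex. As just described, unexpected input is passed upward, so if something gets to the top of the cortical pyramid, it is the information that can’t be understood by previous experience, the input that is truly new and unexpected. That is what is stored in the hippocampus, but it is not stored forever: it is either transferred down to the cortex (long-term memory) or eventually lost (forgetting). The observation that people in their prime and middle age remember “new things” less well is really because these “new” things have already appeared earlier in their lives; people remember a first encounter especially vividly and later, similar things less well. (Surprisingly, this ties in with the rather far-fetched section How the Cortex Learns.)
An Alternative Path up the Hierarchy: This is the path from layer 5 to the thalamus. The path can be switched on or off: it is opened either by activation from a higher region or by unexpected input from below. This path is thought to represent attention, and the two ways of opening it correspond to actively attending (paying attention) and passively attending because something odd happened (attention is caught).
Closing Thoughts: The author shares an example of designing a program's structure from scratch, writing the code, and having it actually run in the end. By contrast, if someone showed you only a pile of code structure plans, you might doubt whether the thing could run at all. By analogy with brain structure, the reason for such doubt is that our intuitive sense of the capacity of the cortex and the power of its hierarchical structure is inadequate.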

7 Consciousness and Creativity

Animals and Human Intelligence: Memory and prediction are the core of “intelligence”, and they are used by all living things. There is just a continuum of methods and sophistication in how they do it.
1. One-cell animal: They used DNA as the medium for memory. Individuals could not learn and adapt within their lifetimes. They could only pass on the DNA-based memory of the world to their offspring through their genes.
2. Modifiable Nervous System: An individual could now learn about the structure of its world and adapt its behavior accordingly within its lifetime. But an individual still could not communicate this knowledge to its offspring other than by direct observation. Neocortex was also created at this time.
3. Human Intelligence: It begins with the invention of language and the expansion of our large neocortex. Of the two, language is the more important. We humans can learn a lot of the structure of the world within our lifetimes, and we can effectively communicate this to many other humans via language.
What is Creativity? Recall that we make predictions by combining the invariant memory recall of what should happen next with the details pertaining to this moment in time. All cortical predictions are predictions by analogy. We are being creative when our memory-prediction system operates at a higher level of abstraction, when it makes uncommon predictions, using uncommon analogies. Note that GEB also argues that analogy is the core of intelligence.
What is Consciousness? Some hold that consciousness/mind lies outside the body, but in fact it is in the brain. Your thoughts, which are located in the brain, are physically separate from the body and the rest of the world. Mind is independent of body, but not of brain.

8 The Future of Intelligence

Because I have been immersed in the neuroscience and computer fields for over two decades, perhaps my brain has built a high-level model of how technological and scientific change occurs, and that model predicts rapid progress. Now is the turning point.

General Direction of Intelligent Machine: Our intelligent machine may have a set of senses that differ from a human’s. We attach to these senses a hierarchical memory system that works on the same principles as the cortex. We will then have to train the memory system much as we teach children. Over repetitive training sessions, our intelligent machine will build a model of its world as seen through its senses. The intelligent machine must learn via observation of its world. Once our intelligent machine has created a model of its world, it can then see analogies to past experiences and make predictions of future events. The machine works overall the same way as the brain, but it need not look like a brain or receive the same inputs as a brain; it only needs inputs that are structured and can be used for “prediction”. What makes it intelligent is that it can understand and interact with its world via a hierarchical memory model and can think about its world in a way analogous to how you and I think about our world.
Ethical Problems? No. The strongest applications of intelligent machines will be where the human intellect has difficulty, areas in which our senses are inadequate, or in activities we find boring. In general, these activities have little emotional content.
In the following areas, intelligent machines will exceed us humans:
- Speed: Transistors switch much faster than the human brain’s electrical signals.
- Capacity: we can add capacity to a machine’s mind in the following ways (these are also what we do in DL/ML)
  - Adding depth to the hierarchy will lead to deeper understanding: the ability to see higher-order patterns.
  - Enlarging the capacity within regions will allow the machine to remember more details, or perceive with greater acuity.
  - Adding new senses and sensory hierarchies permits the device to construct better models of the world
- Replicability: we humans learn knowledge and form our own model of the world rather slowly. However, an intelligent machine need not undergo this long learning curve, since chips and other storage can be replicated endlessly and the contents transferred easily.
- Sensory Systems: Input patterns to the machine don’t have to be analogous to animal senses, or even to derive from the real world at all. In fact, the author suspects that our inability to tackle certain problems may be related to a mismatch between the human senses and the physical phenomena we want to understand. Intelligent machines can have custom senses more sensitive than our own, or senses that are distributed, or senses for very small phenomena. They might think in three, four, or more dimensions.

Appendix: The Thousand Brains Theory

Notes from Microsoft Research - The Thousand Brains Theory by Jeff Hawkins

Local Cortical Circuit

Inside a local cortical circuit, neurons are organized in layers. Most connections run vertically across the layers; only limited connections run horizontally within a layer. A recent finding: all layers have a motor output. So the input is always sensorimotor; there is no purely sensory input.

Vernon Mountcastle: the neocortex is remarkably uniform in appearance and structure because its regions are all performing the same basic intrinsic function. A cortical column is the unit of replication. If you understand one of them, you understand the whole brain.

Layer 2,3 - object
Layer 4 - main input layer
Layer 6 - location relative to the object

L6 sends information to L4, and L4 processes this information together with its other inputs. Over time this forms a representation of what the object itself is in L2/3. On top of that, if multiple cortical columns are involved (imagine several fingers touching the cup instead of only one), we can instantly build a mental image of the cup through the connections between columns in L2/3. This is like a voting mechanism in which each finger makes a guess from what it feels and they settle what the object really is by talking to each other.
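The voting idea can be sketched as an intersection of each finger's candidate objects. This is a deliberately simplified illustration, not the mechanism from the talk: it ignores the location signal from L6 entirely, and the table `OBJECT_FEATURES` and the functions `candidates` and `vote` are hypothetical.

```python
# Hypothetical table of which features each known object has.
OBJECT_FEATURES = {
    "cup": {"curved surface", "handle", "rim"},
    "can": {"curved surface", "rim", "flat top"},
    "ball": {"curved surface"},
}


def candidates(feature):
    """One column's guess: every object consistent with the feature it senses."""
    return {name for name, features in OBJECT_FEATURES.items() if feature in features}


def vote(sensed_features):
    """Columns settle on the objects that all of their guesses agree on."""
    guesses = [candidates(feature) for feature in sensed_features]
    return set.intersection(*guesses) if guesses else set()


one_finger = vote(["curved surface"])
three_fingers = vote(["curved surface", "rim", "handle"])
```

One finger on a curved surface leaves all three objects possible, while three fingers sensing a curved surface, a rim, and a handle agree on `{"cup"}` in a single step.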

Building a Reference Map

A reference map is the sense of relative location we have as we touch the cup.

In contrast to the classical view, the vast majority of connections between cortical regions are not hierarchical at all.

Hypothesis: the grid cells found in the entorhinal cortex also exist in every cortical column of every neocortical region. There they create reference frames not for location but for the objects we interact with (the cup).

In the classical view, we have a hierarchy in our neocortex. The real structure is similar, but we do not model in a truly stepwise fashion (handle -> body -> whole cup); instead, every “level” forms its own model of the cup, and these models are not identical. This allows all models to “vote”: each one tries to guess what is going on.

Author：Yao Lirong

Link：https://yao-lirong.github.io/blog/2021-06-23-On-Intelligence/

Publish date：June 23rd 2021, 12:00:00 am

Update date：June 9th 2022, 3:33:42 am

License：This article is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
