
What is the late-night buzz for researchers? It’s the nagging question of whether AI systems can truly learn like humans simply by looking at data. Maybe self-supervised learning guru Yann LeCun has the answers.

In a recent discussion on X, LeCun weighed in on the critical role of redundancy in self-supervised learning (SSL). According to him, SSL thrives on data redundancy, which enables it to uncover structure and patterns within the input.

If the data has redundancy, which means there are repeated or similar parts, SSL can use it to learn useful structures and insights. However, “highly compressed data has no redundancy and appears random. SSL can not learn anything from random data,” said LeCun.

In other words, data stripped of all redundancy offers nothing to learn from. At the other extreme, data that is highly redundant and completely predictable lacks the novelty that SSL needs to extract useful information.

The useful middle ground is data with enough redundancy for SSL to model its structure, while still containing less predictable elements to learn from.
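The claim that compressed data looks random can be illustrated with a small experiment. The following is a minimal Python sketch, not anything LeCun published: it uses the compression ratio from the standard `zlib` module as a rough proxy for redundancy, and the toy vocabulary and helper names are hypothetical choices made for illustration.

```python
import random
import zlib

# Hypothetical toy vocabulary, used only to build a redundant text sample.
VOCAB = ["the", "cat", "sees", "a", "red", "ball", "and", "dog",
         "runs", "to", "blue", "house", "near", "tree", "with", "bird"]


def make_redundant_text(n_words: int, seed: int = 0) -> bytes:
    """Build text from a tiny vocabulary so that words repeat often."""
    rng = random.Random(seed)
    return " ".join(rng.choice(VOCAB) for _ in range(n_words)).encode("ascii")


def make_noise(n_bytes: int, seed: int = 1) -> bytes:
    """Build pseudo-random bytes with no structure a compressor can exploit."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size.

    A value well below 1.0 means the compressor found redundancy;
    a value near (or above) 1.0 means there was little left to find.
    """
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    text = make_redundant_text(20_000)
    packed = zlib.compress(text, 9)   # the "highly compressed" version
    noise = make_noise(len(packed))   # random bytes of the same length

    print(f"redundant text:     {compression_ratio(text):.2f}")
    print(f"compressed text:    {compression_ratio(packed):.2f}")
    print(f"pseudo-random data: {compression_ratio(noise):.2f}")
```

In this sketch the redundant text shrinks substantially, while the already-compressed bytes and the pseudo-random bytes barely shrink at all. To the compressor, the compressed stream is about as structureless as noise, which is the intuition behind the claim that SSL has nothing to latch onto in such data.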

LeCun has been making this argument for years. Speaking with AIM two years ago, he said that the average human can process about ten images per second, or roughly one every 100 milliseconds. By the age of five, a child has already seen about a billion frames.
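The billion-frame figure can be sanity-checked with simple arithmetic. The sketch below takes the rate of ten images per second from the article. The number of waking hours per day is an assumption added here, since the article does not state one.

```python
FRAMES_PER_SECOND = 10      # from the article: about one image every 100 ms
WAKING_HOURS_PER_DAY = 12   # assumption for illustration, not from the article


def frames_seen(years: int, waking_hours_per_day: int = WAKING_HOURS_PER_DAY) -> int:
    """Estimate how many frames a person has seen after the given number of years."""
    seconds_awake = years * 365 * waking_hours_per_day * 3600
    return seconds_awake * FRAMES_PER_SECOND


if __name__ == "__main__":
    print(f"{frames_seen(5):,}")       # 12 waking hours per day
    print(f"{frames_seen(5, 16):,}")   # 16 waking hours per day
```

With 12 waking hours a day the estimate comes to roughly 0.8 billion frames by age five, and with 16 hours it is just over a billion, so "about a billion" is the right order of magnitude under either assumption.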

What Is the Argument?

LeCun emphasized, “A child has seen 50 times more data in four years than the biggest LLMs.” Not everyone agrees, though.

LeCun, for his part, argues that inadequate data quality is not a barrier to the development of intelligent systems. He believes the main issue is not a lack of data but how well learning systems can take advantage of the data that is available. LeCun and others who agree with him have reiterated this point several times.

Francois Chollet, the creator of Keras and another deep learning guru, started a debate about how much information humans actually learn through vision. “This point comes up frequently, but it does not quite make sense,” Chollet said, sounding unimpressed.

Chollet emphasized that although redundancy is helpful, what matters is the amount of information “post-compression”, because the volume of raw data is not a reliable indicator of meaningful information. He pointed out that while a visual feed may appear to have high bandwidth, much of it is autocorrelated and redundant in time and space, which reduces its true informational value.

LeCun, who has long argued that AI should reach animal-level intelligence before aiming for human-level intelligence, emphasized that learning simply cannot happen without a degree of redundancy in the data. The more redundancy there is, the more structure SSL can harness. He backed up his argument by referring to the human visual system.

Vision Is Not All You Need?

LeCun explained that while the human eye has about 60 million photosensors, four layers of neurons reduce this raw data down to a million optic nerve fibres. This compression, LeCun argues, reduces excessive redundancy while allowing for essential features to be captured by the brain.

This underscores the vast difference in bandwidth between text, whose bandwidth LeCun calls “too low”, and visual data, which is more redundant and thus better suited to self-supervised learning. In his view, video data offers the right balance of redundancy, making it a much richer modality for training models than text.

But not everyone agrees. Chollet questioned LeCun’s estimates of the bandwidth of the human visual system, arguing that the raw optical input is far less than what LeCun suggests. According to Chollet, the true bandwidth is under 1 MB/s, significantly lower than LeCun’s estimate of 20 MB/s.

“My point is just: the claim that the information bandwidth of the human visual system is 20MB/s (based on optic nerve count) is pure nonsense,” said Chollet. To which LeCun said, “What is pure nonsense is claiming that the relevant quantity is the number of bits after compression.”
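For context, the sketch below shows one way a figure of about 20 MB/s can be reached from the optic nerve count. The photosensor and fibre counts come from the article. The two-eye multiplier and the rate of roughly 10 bytes per second per fibre are assumptions added here for illustration and are not stated in this article.

```python
PHOTOSENSORS_PER_EYE = 60_000_000       # from the article
OPTIC_NERVE_FIBRES_PER_EYE = 1_000_000  # from the article
EYES = 2                                # assumption for illustration
BYTES_PER_FIBRE_PER_SECOND = 10         # assumption for illustration


def optic_nerve_bandwidth_mb_per_s(
    fibres_per_eye: int = OPTIC_NERVE_FIBRES_PER_EYE,
    eyes: int = EYES,
    bytes_per_fibre_per_second: int = BYTES_PER_FIBRE_PER_SECOND,
) -> float:
    """Raw (pre-compression) bandwidth estimate in megabytes per second."""
    return fibres_per_eye * eyes * bytes_per_fibre_per_second / 1_000_000


def retinal_reduction_factor() -> float:
    """How many photosensors feed each optic nerve fibre, on average."""
    return PHOTOSENSORS_PER_EYE / OPTIC_NERVE_FIBRES_PER_EYE


if __name__ == "__main__":
    print(optic_nerve_bandwidth_mb_per_s())  # raw estimate in MB/s
    print(retinal_reduction_factor())        # photosensors per fibre
```

Under these assumptions the raw estimate is 20 MB/s, at least 20 times Chollet's upper bound of 1 MB/s. The disagreement is therefore less about the arithmetic than about which quantity matters: bits before compression or bits after.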

The question still stands: which contains more knowledge about the world, all the text on the internet or a four-year-old’s point-of-view video? Chollet asked, “If raw data is all that matters, then why are blind children intelligent at all?” LeCun replied that humans also pick up knowledge through touch, a highly redundant and high-bandwidth modality.

Though “redundancy” sounds like something AI models should avoid, it turns out to be helpful when teaching AI to learn the way humans do, by seeing consistent and similar examples again and again. At the same time, the argument for data quality over redundancy holds ground when it comes to building text-based models.