Driven by Compression by Jürgen Schmidhuber | Notes & Summary

Jan 08, 2019

For the original paper by Jürgen Schmidhuber, see: https://arxiv.org/abs/0812.4360

This paper proposes a framework that aims both to describe, at a high level of abstraction, how advanced intelligent agents function and to serve as a normative guide for building artificially intelligent systems. Despite its technical intentions, many useful and, interestingly (you will soon see why), beautiful implications can be drawn from it that explain our daily lives.

It highlights our drive for compression as a primary driver of intelligent agents: the propensity to generalize and categorize, thereby reducing the number of bits needed for storage without losing valuable information. We have an internal reward system that encourages learning, specifically compression.

The intuition is backed by the fact that external rewards tied to reproduction and survival are rare, often too sparse to optimize directly. “Every intelligent system interested in achieving future goals should be motivated to compress the history of raw sensory inputs in response to its actions, simply to improve its ability to plan ahead…The framework directs the agent towards a better understanding the world through active exploration, even when external reward is rare or absent, through intrinsic reward or curiosity reward for actions leading to discoveries of previously unknown regularities in the action-dependent incoming data stream.”

This drive is so universally useful that we are prewired to seek compression as a primary drive on par with external rewards.

The simplest act of compression is categorization (reminds me of Apoha theory in Indo-Tibetan Buddhism and how the abstraction from unique particulars is the basic building block of our language). “For example, the sun goes up every day. Hence it is efficient to create internal symbols such as daylight to describe this repetitive aspect of the data history by a short reusable piece of internal code, instead of storing just the raw data. In fact, predictive neural networks are often observed to create such internal (and hierarchical) codes as a by-product of minimizing their prediction error on the training data.”

Here are aspects this drive can explain:

Beauty

This refers only to pristine beauty, such as that of a piece of art, a mathematical formula, or a landscape, as opposed to sensual beauty, such as the feeling of a hot bath.

In a set of elements classified as comparable by a given observer (their raw data share similar levels of complexity; e.g. an ugly face and a pretty face are comparable in raw complexity, whereas a monster truck and a circle are not), the most pristinely beautiful one is the one requiring the fewest bits (the shortest description) to store, given the observer’s current mental categories.

“For example, to efficiently encode previously viewed human faces, a compressor such as a neural network may find it useful to generate the internal representation of a prototype face. To encode a new face, it must only encode the deviations from the prototype [67]. Thus a new face that does not deviate much from the prototype [17, 48] will be subjectively more beautiful than others. Similarly, for faces that exhibit geometric regularities such as symmetries or simple proportions [69, 88] — in principle, the compressor may exploit any regularity for reducing the number of bits required to store the data.”
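To make the prototype idea concrete, here is a minimal Python sketch. It is my own toy illustration, not code from the paper: faces are stand-in lists of 256 pixel values, the prototype is assumed to be a simple element-wise mean, and zlib stands in for the observer's compressor. The names `prototype` and `description_length` are hypothetical.

```python
import random
import zlib

def prototype(faces):
    """Element-wise mean of the faces seen so far (the assumed 'prototype face')."""
    n = len(faces)
    return [round(sum(column) / n) for column in zip(*faces)]

def description_length(face, proto):
    """Bytes needed to store `face` as deviations from `proto`.

    zlib is only a stand-in for the observer's compressor.
    """
    deviations = bytes((f - p) % 256 for f, p in zip(face, proto))
    return len(zlib.compress(deviations, 9))

rng = random.Random(0)
base = [rng.randrange(50, 200) for _ in range(256)]
# Previously viewed faces: the base pattern plus a little per-pixel noise.
seen = [[v + rng.randint(-2, 2) for v in base] for _ in range(20)]
proto = prototype(seen)

# A face that barely deviates from the prototype ...
typical_face = list(proto)
typical_face[10] += 3
typical_face[100] -= 2
# ... versus one that deviates arbitrarily at every pixel.
atypical_face = [rng.randrange(256) for _ in proto]

typical_len = description_length(typical_face, proto)
atypical_len = description_length(atypical_face, proto)
print(typical_len, atypical_len)
```

Under these assumptions, the face that stays close to the prototype needs far fewer bytes than the face that deviates everywhere, which is the sense in which it would count as subjectively more beautiful.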

Another example is why so many people think simple equations such as E = mc² are “beautiful”: one equation distills away so much complexity.

In a way, we do have a Platonic realm of Forms floating around that represents the most aesthetic. However, unlike in Plato's proposal, these mental concepts live in our heads and are not necessarily public and communal.

Interesting

Interestingness is the first derivative of beauty: quite intuitively, it measures the change in our understanding of phenomena. “A beautiful thing is interesting only as long as it is new, that is, as long as the algorithmic regularity that makes it simple has not yet been fully assimilated by the adaptive observer who is still learning to compress the data better.”

For something to be interesting, its beauty needs to increase with time (or perhaps tease future compressibility). Beauty and interestingness are not the same thing, but they aren’t mutually exclusive either; only unchanging beauty and interestingness are exclusive in the long run. This maps to our intuitions fairly well: scientific dead ends that remain as such for a long time become uninteresting to scientists.
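One way to see interestingness as the first derivative of beauty is to track how the description length of the same data changes as an observer learns. The Python sketch below is a toy model of my own, not the paper's algorithm: the observer's knowledge is assumed to be a zlib preset dictionary that grows with each exposure to a song, and interestingness is approximated as the drop in description length between consecutive exposures. The names `description_length`, `song`, and `exposures` are hypothetical.

```python
import random
import zlib

def description_length(data: bytes, knowledge: bytes) -> int:
    """Bytes needed to store `data` given what the observer already knows."""
    if knowledge:
        # A preset dictionary lets zlib reuse patterns the observer has seen.
        compressor = zlib.compressobj(level=9, zdict=knowledge)
    else:
        compressor = zlib.compressobj(level=9)
    return len(compressor.compress(data) + compressor.flush())

rng = random.Random(0)
song = bytes(rng.randrange(256) for _ in range(400))  # stand-in for a new song

# Assumption: each exposure lets the observer memorize 100 more bytes,
# until the whole song is known.
lengths = []
for exposures in range(7):
    knowledge = song[: min(100 * exposures, len(song))]
    lengths.append(description_length(song, knowledge))

# Interestingness ~ compression progress between consecutive exposures.
progress = [before - after for before, after in zip(lengths, lengths[1:])]
print(lengths)
print(progress)
```

In this toy model the early exposures yield positive progress, so the song is interesting. Once the whole song is known, progress falls to zero: the song is as compressed as it will get, and it becomes boring.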

Curiosity

Curiosity is the drive that makes the world more beautiful by pursuing the interesting.

It is bored by objects that (1) appear random (e.g. static noise) and hint at incompressibility, (2) are known and already fully compressed, or (3) are too hard, such that compression would take too many resources compared to other phenomena.

This is why the most interesting people are the ones who are weird but in a consistent way: the ones with many faces but an underlying coherent value system.

Art

All types of art, in both creation and observation, can be seen as a way to read the world in a compressible way. “Good observer-dependent art deepens the observer’s insights about this world or possible worlds, unveiling previously unknown regularities in compressible data, connecting previously disconnected patterns in an initially surprising way that makes the combination of these patterns subjectively more compressible (art as an eye-opener).”

This is why we like songs that are very similar to our tastes but just different enough to be interesting. “Not the one he just heard ten times in a row. It became too predictable in the process. But also not the new weird one with the completely unfamiliar rhythm and tonality. It seems too irregular and contains too much arbitrariness and subjective noise. He should try a song that is unfamiliar enough to contain somewhat unexpected harmonies or melodies or beats etc., but familiar enough to allow for quickly recognizing the presence of a new learnable regularity or compressibility in the sound stream. Sure, this song will get boring over time, but not yet.”

Artists are like scientists in that both “try to create new but non-random, non-arbitrary data with surprising, previously unknown regularities.” The main difference is that the essence of science is to formally nail down the nature of the compression progress, while art and artists accomplish it in a more intuitive manner. I felt this viscerally doing UI design, because it sits at this sort of intersection between aesthetic intuition and theory. It was quite enjoyable to place elements in pleasing locations and have that placement be explainable in a theoretical manner.

This explains why art goes through reactive cycles: only when one type of art form (e.g. the fugue) has been exhausted, such that all the ways to condense complexity have been exploited in some manner, is it worth taking on the heavy burdens associated with trying new forms.

It also explains why culture converges and why improvements are usually only iterative. What we think is beautiful is heavily influenced by its distance from current cultural prototypes of beauty, which exposes the culturally relative basis of beauty. This makes me appreciate certain explorative types of modern art more, because they take the hard path of trying to find new ways of compressing information instead of making iterative improvements. As a result, they might bring a whole new way to appreciate the world and add beauty and richness to culture.

Thus, we should be more open to trying new things, because the more prototypes of beauty we have, the more beauty we will see in this world. Some components of beauty are learned.

The fact that artistic taste follows the Pareto principle can be explained not only by mimetic theory but also by this compression theory. Not only do we take on other people’s desires, but our very intrinsic, in-a-vacuum notion of beauty is public in some manner.

Humor and Fun

The innate fun I find in reading philosophy and understanding concepts, when concepts just click, can also be attributed to the rewards from this drive. The incongruity-resolution theory of humor likewise emphasizes that funny phenomena are often ones that draw seemingly unrelated phenomena together in novel ways:

“Comedians also tend to combine well-known concepts in a novel way such that the observer’s subjective description of the result is shorter than the sum of the lengths of the descriptions of the parts, due to some previously unnoticed regularity shared by the parts.”

…

Curiosity makes the world more beautiful by pursuing the interesting.

This paper was interesting; now it is beautiful.
