Monday, January 17, 2005

A1 Readings Pt. 1 Lesk

Lesk, M. "How Much Information...?"

In 1997, M. Lesk estimated the volume of information in the world at several exabytes (~several billion GB). The spirit of the article is dedicated to haphazard guessing and makeshift counting methods (e.g., estimates of hard drive sales), but it is entertaining nonetheless. Less entertaining and more haphazard is the later speculation on the "volume" of "human memory"; Lesk cites Landauer's estimate of brain capacity at 200 MB, which is manifestly dubious, and then calculates total human memory to be just over one exabyte. Oops. Landauer, at least according to Lesk, figures the brain uses 1,000 to 100,000 neurons per bit of memory, based on Landauer's recall tests of human memory. There are obvious reasons why recall tests are a bad way of measuring the amount of memory (more in the computer sense than in the sentimental sense) a human brain can retain. Computers, of course, store memories as 0s and 1s, but, importantly, those 0s and 1s do not always add up to high-level, semantically rich information. Some of this memory can be very low-level. So human recall tests prima facie bias the results to be very low.

To be fair, just as we measure hard disk capacity (and just as Lesk uses hard drive *capacity* figures for his argument), we should estimate the very same way with the human brain: count the number of bits. Now, a human brain has ~10^14 nerve cells, but a neuron is hardly a single bit. Neurons have a fairly high number of ways in which they can articulate: the number and length of dendrites and axons alone are but two dimensions we may count to get an idea of the possible number of states for each neuron. We might also have a few dimensions for measuring neural cell signalling; it's unlikely that nerve cells have just one signal to pass, and it's equally unlikely that the signal is strictly digital. There may be many other dimensions with respect to neurons to count as well: various location-dependent interactions with other aspects of the brain, for example. But we'll skip that arena. We might also count possible states for astrocytes, a species of glial cell, since they may also be involved in "saving state."

10^14 nerve cells

conservative estimate of states per cell
---------------------------------------
avg number of synapses: 10^4
avg number of sodium pumps: 10^6

10^4 * 10^6 = 10^10

total number of bits based on neurons: 10^14 * 10^10 = 10^24

one exabyte is approximately 10^19 bits
10^24 bits, our current estimate for the storage capacity of a single human brain without counting astrocytes, is approximately 100,000 times larger than Lesk's estimate for the total of the world's information, and roughly 10^15 times larger than Landauer's estimate for a single brain.

By these numbers, and by the paper's current methods, the storage needed to cover the capacity of all human neurons would be on the order of 10^34 bits (6*10^9 people * 10^24 bits per brain = 6*10^33), or several hundred trillion exabytes.
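The back-of-envelope arithmetic above can be sketched in a few lines. All the input figures are this post's rough guesses, not measured values, and the byte-to-bit conversions are my own assumptions:

```python
import math

# Rough order-of-magnitude guesses from this post, not measured values.
neurons = 10**14        # nerve cells per brain (this post's figure)
synapses = 10**4        # avg synapses per neuron
sodium_pumps = 10**6    # avg sodium pumps per neuron

states_per_cell = synapses * sodium_pumps    # 10^10
bits_per_brain = neurons * states_per_cell   # 10^24

exabyte_bits = 8 * 10**18        # 1 EB = 10^18 bytes, ~10^19 bits
landauer_bits = 8 * 200 * 10**6  # Landauer's 200 MB figure, in bits

def order(x):
    """Nearest power of ten, for order-of-magnitude comparisons."""
    return round(math.log10(x))

print(order(bits_per_brain))                  # 24
print(order(bits_per_brain / exabyte_bits))   # 5: ~100,000x one exabyte
print(order(bits_per_brain / landauer_bits))  # 15: vs. Landauer's brain

world_bits = 6 * 10**9 * bits_per_brain       # ~6 * 10^33 for all brains
print(order(world_bits / exabyte_bits))       # 15: exabytes to hold everyone
```

Running it confirms the ratios used here: a single brain comes out five orders of magnitude above one exabyte, and all brains together land around 10^15 exabytes.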

It cannot be remembered for us wholesale.

To be very honest, I don't take seriously any claims to strong AI, and don't fancy the parallels routinely drawn between human experience and computer I/O. The metaphors of computing just don't work very well, except maybe to dumb down and underestimate our understanding of the myriad complexities of this lump of gray matter and its correlated but yet-to-be-verified partner, the mind. For example, there's simply *no* equivalence between an actual sentence in an ASCII file and that same sentence memorized and "inside" a human head. That sentence "inside" the human head ("inside" is in quotes, because no one can actually locate it, touch it, verify it) probably has a ton of other information attached to it, and the recall task brings with it tons more. In other words, when it comes to the human mind, a sentence is not a sentence. A sentence (e.g., in ASCII) is not a sentence (either in the mind's eye OR in the brain).

But I digress. Lesk's milder claim that we will be able to store all our media digitally seems reasonable. What this means for text mining is that we will have data, lots of data, to mine (if we gain access to it), and that the quantity of data will demand more and more mining work for information retrieval, classification, discovery, and synthesis.
