
Sunday, August 19, 2007

I Meta Meta

No, it isn't a fraternity or sorority.

It is a level of abstraction, a way of putting a description of information into the information, or on top of it. In other words, information about information. You can think of meta data as the column headings in a spreadsheet or a corporate financial statement. In that case, the meta data labels the data in the column below it.
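To make that concrete, here's a toy sketch in Python; the column names and figures are mine, invented purely for illustration:

# The header row is meta data: it describes what each value in a column means.
header = ["quarter", "revenue", "expenses"]
rows = [
    ["Q1", 120000, 95000],
    ["Q2", 135000, 99000],
]

# Pairing each value with its heading turns raw numbers into labeled facts.
for row in rows:
    print(dict(zip(header, row)))
# {'quarter': 'Q1', 'revenue': 120000, 'expenses': 95000}
# {'quarter': 'Q2', 'revenue': 135000, 'expenses': 99000}

Strip the header out and the numbers are still there, but nobody knows what they mean anymore.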

For a time-sequenced group of spreadsheets or financial statements, there is also the date, which in a sense is meta-meta data: data about the group of spreadsheets or statements.

OK, I know we all understand what it is and how it works, and those who write HTML, XML, CSS and other data-descriptive things use it all the time. An HTML tag is meta data.

So is a card catalog in the library, or a building code.

Describing data is important in cosmological and philosophical senses as well. I touched on this in a previous post, but it's important to say it again: in any closed system, there is not enough space to describe that system. Computer geeks like me know that self-reference and recursion need careful management, or you run out of resources through circular references or recursion nested too deeply.
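For the non-geeks, here's roughly what that careful management looks like, in a throwaway Python sketch of my own (not any particular system):

# A structure that refers to itself; a naive traversal would recurse forever.
record = {"name": "the universe"}
record["described_by"] = record      # circular reference

def describe(obj, seen=None):
    # Remember what we've already visited so self-reference can't eat the stack.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return "<already described>"
    seen.add(id(obj))
    if isinstance(obj, dict):
        return {key: describe(value, seen) for key, value in obj.items()}
    return obj

print(describe(record))
# {'name': 'the universe', 'described_by': '<already described>'}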

If we were able to use every atom in the universe as a storage medium, there wouldn't be enough of them to describe the system we had built, much less describe the universe. In other words, there are simply some things we can NEVER know.

One easily seen example of this is weather modeling. The most powerful computers ever built can't reliably model our planet's weather patterns for any extended period. There are too many variables, and some are linked to others in poorly understood ways. We have to build models that are incomplete because we don't yet understand all the interactions, but if we understood all the interactions, we wouldn't have a model, we would have a weather system. A perfect description is the object described. I have read fantasy fiction where knowing something's true name gave you power over it, and it's the same for objects. The only way to perfectly describe something is to point at the thing you are describing.

So instead of trying to pile descriptions on top of descriptions of data, maybe we need to use the idea of "de-metaing" the data: reducing the amount of data required to describe something, as well as minimizing what is needed to tell us what the data means. Sort of like .zip or .arc compression. A function in mathematics can describe a complex set of relationships, so we only need the function and the ability to analyze it to understand all those relationships. We don't have to store all the data about them because the function does that for us.
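A crude Python illustration of what I mean (the numbers are arbitrary):

# Storing the data: a million values, each one taking up space.
squares = [n * n for n in range(1000000)]

# Storing the description: one short function that can regenerate any of them.
def square(n):
    return n * n

# The function plus the ability to evaluate it stands in for the whole stored list.
assert squares[31337] == square(31337)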

It's time to start looking for ways to normalize all the data that we have generated. For any single piece of data, a primary record and a secure, guaranteed backup are all that's needed. Repetition is wasteful when storage resources are finite. Of course, this only applies to factual data. This will leave more room for the infinite ways in which we can use these facts for discourse and theory.
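One way that might look, sketched in Python as a content-addressed store; the details here are just my guess at a shape, not a real system:

import hashlib

store = {}    # the primary record: one copy per unique piece of data
backup = {}   # the secure, guaranteed backup

def put(fact):
    # Identical facts hash to the same key, so repeating them costs nothing extra.
    key = hashlib.sha256(fact).hexdigest()
    if key not in store:
        store[key] = fact
        backup[key] = fact
    return key    # analyses keep this short reference instead of another full copy

first = put(b"the speed of light is 299792458 m/s")
second = put(b"the speed of light is 299792458 m/s")   # a duplicate citation
assert first == second and len(store) == 1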

An example: Google "GNP 2000". There are more than two million hits. Most of the information is redundant, and that's only the copies on the World Wide Web. There are plenty of hardcopy references that may not even be listed once. Sure, many of those hits are analyses (and some are irrelevant), but the same basic information exists in each analysis. It's nice to have a book in your hands, but at some point in our future, that is going to be a huge waste of resources. Books don't have a very high information density compared to solid-state (computer) memory, which is itself wasteful of resources when compared to technologies that store data on molecules, atoms or even photons.

Also, there is the issue of errors. Today, correcting an error in data that is used in many different analyses takes a certain amount of time to propagate. During the propagation period, there exists a decreasing (with respect to time) number of references to the invalid data, which in turn may be cited and used in decision making. Those decisions are then necessarily flawed, and when implemented they may damage the very system being analyzed.

Of course, certain data are proprietary, as are many of the algorithms used for analysis. If a stockbroker uses incomplete or erroneous data, or demonstrably flawed analysis, to make recommendations for buy/sell/hold decisions, that can affect many investors. It is very important that corrections are made in the shortest possible time, and those recommendations should change in as close to real time as possible. This benefits the broker as well as the investors. This is still an argument for a new approach to information storage, the difference being only in the possession of the data, not its structure or usage.

Once we have perfected the science of storing facts, then we approach the thornier issue of what to do about all of the crazy people saying insane things just to hear themselves talking. I'll leave that as an exercise for the reader.
