Past, Present and Future of the Third Media Revolution

Your Infinite “Meta Book” and Random Textual Fingerprint

December 20th, 2009

In the New Journal of Physics (Dec 10, 2009, full PDF available) Sebastian Bernhardsson, Luis Enrique Correa da Rocha and Petter Minnhagen explore the idea that the systematic dependence of word-frequency statistics on text length can be described by a meta book concept: an abstract representation of the word-frequency structure of a text. According to this concept, the word-frequency distribution of a text of a given length written by a single author has the same characteristics as a text of the same length extracted from an imaginary, complete, infinite corpus written by the same author.

From their article “The meta book and size-dependent properties of written language”:

“The writing of a text can be described by a process where the author pulls a piece of text out of a large mother book (the meta book) and puts it down on paper. This meta book is an imaginary infinite book that gives a representation of the word-frequency characteristics of everything that a certain author could ever think of writing. This has nothing to do with semantics and the actual meaning of what is written, but rather concerns the extent of the vocabulary, the level and type of education, and the personal preferences of an author. The fact that people have such different backgrounds, together with the seemingly different behavior of the function N(M) for the different authors, opens up the speculation that everyone has a personal and unique meta book, in which case it can be seen as an author’s fingerprint.”
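In the paper's notation, as we read it, M is the total number of words in a text and N is the number of distinct words, so N(M) describes how an author's vocabulary grows as the text gets longer. A minimal Python sketch of how one might measure N(M) for a single text follows. The tokenizer (lower-case letters and apostrophes only) and the function names are our own illustrative assumptions, not the authors' preprocessing.

```python
import re

def tokenize(text):
    """Lower-case word tokens. A simplistic, assumed rule, not the paper's."""
    return re.findall(r"[a-z']+", text.lower())

def vocabulary_growth(tokens, sizes):
    """Return {M: N(M)}: distinct words among the first M tokens.

    Sizes larger than the text (or non-positive) are skipped.
    """
    seen = set()
    curve = {}
    i = 0
    for m in sorted(s for s in sizes if 0 < s <= len(tokens)):
        # Extend the prefix one token at a time up to length m.
        while i < m:
            seen.add(tokens[i])
            i += 1
        curve[m] = len(seen)
    return curve

tokens = tokenize("The cat saw the dog; the dog saw the cat.")
print(vocabulary_growth(tokens, [1, 3, 5, 10]))
```

Comparing such curves for different authors is, roughly, what the "fingerprint" speculation in the quote is about; a real comparison would need long texts, not a single sentence.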

Jaap Bloem


One Response to “Your Infinite “Meta Book” and Random Textual Fingerprint”

  1. found-your-story

    Something kept puzzling me, so I asked Sebastian the following:

    Q: Could this perhaps be called a fractal relationship? I am not an expert, which probably explains my stupid question:
    “According to this concept the word-frequency distribution of a text, with a certain length written by a single author, has the same characteristics as a text of the same length extracted from an imaginary complete infinite corpus written by the same author.”

    Thanks in advance for answering me. Regards, Jaap Bloem

    Sebastian to me:
    It is not a stupid question; however, my answer would be no. I don’t think it could be called a fractal relation, because the term fractal indicates a scale-invariant property. Roughly, it means that something should look the same at any scale, for example at any size. The sentence you quote could, in itself, apply to a fractal system, but our results for real texts suggest that properties like the average frequency or the exponents ‘alpha’ and ‘gamma’ depend on size (length). So, even though the word-frequency distribution looks the same for a short text and for a section of a long text of the same length, both of them differ from the full long text.
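One way to see the size dependence Sebastian describes is to track the average frequency ⟨k⟩ = M/N (total words divided by distinct words) for sections of increasing length cut from one long text: under strict scale invariance it would stay constant, whereas the authors report that it changes with length. The Python sketch below averages ⟨k⟩ over randomly placed sections; the function names, the number of trials and the fixed seed are our own assumptions for illustration, and the exponents ‘alpha’ and ‘gamma’ are not computed here.

```python
import random

def average_frequency(tokens):
    """<k> = M / N: mean number of occurrences per distinct word."""
    return len(tokens) / len(set(tokens))

def mean_section_frequency(tokens, m, trials=100, seed=0):
    """Average <k> over `trials` contiguous sections of length m."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        start = rng.randrange(len(tokens) - m + 1)
        total += average_frequency(tokens[start:start + m])
    return total / trials

def size_profile(tokens, sizes, trials=100, seed=0):
    """Map each section length m (up to the text length) to its mean <k>."""
    return {m: mean_section_frequency(tokens, m, trials, seed)
            for m in sizes if 0 < m <= len(tokens)}
```

Applied to a list of word tokens from a long novel, a ⟨k⟩ that keeps rising with m is the kind of length dependence that rules out a fractal (scale-invariant) description.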

    I hope my reply makes sense. Please ask me if you have more questions.

    Best regards,
    Sebastian Bernhardsson
