The Library of Babel - TTLG Forums

Qooper on 20/7/2022 at 21:51

Quote Posted by Komag

Bottom line, this is weird, messes with my brain's grasp on reality somehow

Do you remember David Braben's original Elite for the Commodore 64? There was this (relatively) large, randomly generated galaxy with maybe a couple hundred star systems. Star systems had planets, and some had star bases. You could go to every one of them, yet none of it was actually stored in memory. When you entered a star system, the seed (which was a random number generated at the start of the game) was used to generate the star system's top-level data. Then, when you went to a location in that star system, say a star base, the same seed was used again to generate more specific data for that particular location. When you left for a different star system, the previous one basically ceased to "exist" until you came back. So the only things that actually "existed" in that game were the seed, the generation rules and the game mechanics.
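The trick can be sketched in a few lines of Python. This is not Elite's actual generator: the hash-based seed derivation, `GALAXY_SEED` and the field names are hypothetical, chosen only to show that the same seed plus a location always recomputes the same data, so nothing has to be stored.

```python
import hashlib
import random

GALAXY_SEED = 12345  # hypothetical; Elite's real seeds and generator were different

def derive_rng(*parts) -> random.Random:
    """Build a PRNG whose state depends only on the given parts (seed, location, ...)."""
    key = ":".join(str(p) for p in parts).encode()
    digest = hashlib.sha256(key).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))

def generate_system(seed: int, system_index: int) -> dict:
    """Top-level data for a star system, recomputed every time it is visited."""
    rng = derive_rng(seed, "system", system_index)
    return {
        "economy": rng.choice(["agricultural", "industrial", "mixed"]),
        "tech_level": rng.randint(1, 15),
        "has_station": rng.random() < 0.5,
    }

def generate_station(seed: int, system_index: int) -> dict:
    """Finer detail, derived from the same seed only when you actually dock."""
    rng = derive_rng(seed, "station", system_index)
    return {"docking_fee": rng.randint(10, 100)}

# Nothing is stored: leaving and coming back just recomputes the same values.
assert generate_system(GALAXY_SEED, 7) == generate_system(GALAXY_SEED, 7)
```

Tagging each derivation with a location ("system", "station") keeps different kinds of detail from reusing the same random stream.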

Same with this library. You give it a seed that is then used to generate two things:
1. Noise of a particular type (random English words or random character sequences).
2. A specific text in a specific location in the middle of the aforementioned noise.

This is why you can only find an infinitesimally narrow set of texts with this "search engine": you can only generate texts bound by very strict and limited rules. If you wanted to find a specific text longer than 3200 characters (one page), at least part of it would have to come from part no. 1, the noise. You'd have a better chance of finding the works of Shakespeare by sampling the cosmic background radiation.
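A toy version of the two parts listed above, as a Python sketch: a seeded PRNG fills a 3200-character page (40 lines of 80) with random characters, and the target text is placed at a seed-chosen offset. The 29-symbol alphabet and the name `page_with_text` are assumptions for illustration, not the site's real code.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " ,."  # assumed 29-symbol alphabet
PAGE_LEN = 3200                            # 40 lines x 80 characters

def page_with_text(text: str, seed: int) -> str:
    if len(text) > PAGE_LEN:
        raise ValueError("text longer than one page cannot be placed verbatim")
    if any(ch not in ALPHABET for ch in text):
        raise ValueError("text uses symbols outside the alphabet")
    rng = random.Random(seed)
    # Part 1: seeded noise of random characters.
    page = [rng.choice(ALPHABET) for _ in range(PAGE_LEN)]
    # Part 2: the target text at a seeded position inside the noise.
    start = rng.randrange(PAGE_LEN - len(text) + 1)
    page[start:start + len(text)] = text
    return "".join(page)
```

Anything longer than `PAGE_LEN` is rejected, which mirrors the point above: beyond one page, the rest has to come from the noise.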

Cipheron on 20/7/2022 at 23:35

Quote Posted by Qooper

Do you remember David Braben's original Elite for the Commodore 64?

The only thing I take exception at here is "for the Commodore 64".

Elite was a British game developed and released on the BBC Micro. It was ported to the Commodore, but also to many other platforms. Commodore was not the dominant platform in Europe: about two-thirds of the European market was split between the Spectrum and the Amstrad*, and Amstrad ended up buying the Spectrum, so they dominated the European 8-bit market.

(*The Amstrad story is actually interesting. People think of it as an also-ran compared to the Spectrum in Europe, but Amstrad cleverly targeted a mid-range gap in the market between the budget 8-bit systems, which connected to TVs and needed wonky peripherals to get working, and the expensive IBM PC. So you could do 80x25 word processing for about one-sixth of the cost of an IBM PC, but also play games equivalent to those on the Speccy or Commodore. The Amstrad was more expensive, but it came with a dedicated, crisp monitor with RGB inputs instead of connecting to a TV with fuzzy graphics, which also meant you *couldn't watch TV*. In the end the previous market leader, the Spectrum, failed and was sold to Amstrad.)

BTW this is relevant to the discussion:

Quote:

Braben and Bell at first intended to have 2^48 galaxies, but Acornsoft insisted on a smaller universe to hide the galaxies' mathematical origins.

Originally, there were meant to be near-infinite galaxies to explore using procedural generation. However, the publisher pulled the plug on that idea for marketing reasons.

So the Elite devs originally wanted 256 × 2^48 ≈ 72 quadrillion planets, something on par with No Man's Sky, but the publisher got them to cap it at 2048.
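Checking the arithmetic in Python, assuming 256 systems per galaxy and the eight galaxies that shipped in the released game:

```python
SYSTEMS_PER_GALAXY = 256

planned = SYSTEMS_PER_GALAXY * 2**48   # the original 2^48-galaxy plan
shipped = SYSTEMS_PER_GALAXY * 8       # eight galaxies in the released game

print(f"{planned:,}")  # 72,057,594,037,927,936 (that is 2^56)
print(shipped)         # 2048
```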

Quote Posted by demagogue

I think at the end of the day the test is, if you re-ran it forever (a vastly long time), would it eventually reproduce every possible text meeting the criteria? And I think the answer to that would be yes, if it's really random; just by the pigeonhole principle alone.

If you ran the algorithm long enough, eventually all possible *finite* books would be generated, because finite-length books form a countably infinite search space. However, if you study Georg Cantor's higher infinities, you can see that the Library of Babel approach would *fail* to produce every possible set of stories.
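Why every finite book shows up eventually: finite strings over a finite alphabet can be listed one after another (shortest first, then alphabetically), so each one gets a natural-number index. A Python sketch using bijective base-k numbering; the three-letter alphabet is only for illustration, and any finite alphabet works the same way.

```python
ALPHABET = "abc"  # tiny alphabet for illustration

def index_to_book(n: int) -> str:
    """Map 0, 1, 2, ... onto every finite string, shortest first (shortlex order)."""
    k = len(ALPHABET)
    chars = []
    while n > 0:
        n -= 1
        n, r = divmod(n, k)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

def book_to_index(book: str) -> int:
    """Inverse mapping: every finite string has exactly one index."""
    k = len(ALPHABET)
    n = 0
    for ch in book:
        n = n * k + ALPHABET.index(ch) + 1
    return n

print([index_to_book(i) for i in range(6)])  # ['', 'a', 'b', 'c', 'aa', 'ab']
```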

There are countable infinities, such as 1, 2, 3, ..., and uncountable infinities, such as the number of points on the line segment from 0 to 1. No matter how long you work, you can never sequentially list all the points between 0 and 1, even in infinite time.

So that's one hard limit on any Library of Babel. You could imagine a serial work with an infinite number of chapters. All these works (plus the existing Library of Babel works) exist somewhere in the digit expansions of the points between 0 and 1. However, because of Cantor's theorem, even an infinite Library of Babel working for an infinite amount of time couldn't find you all the books in there.
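A small Python sketch of Cantor's diagonal argument, treating an infinite binary expansion as a function from position to digit. It can only check finitely many positions, of course; the point is the construction, which works against any listing you plug in. `example_listing` is a made-up enumeration for illustration.

```python
from typing import Callable

Sequence = Callable[[int], int]  # an infinite 0/1 sequence, as a function of position

def diagonal(listing: Callable[[int], Sequence]) -> Sequence:
    """Cantor's construction: flip the n-th digit of the n-th listed sequence."""
    return lambda n: 1 - listing(n)(n)

def example_listing(i: int) -> Sequence:
    """Hypothetical attempt at a complete list: the binary digits of i, then zeros."""
    return lambda n: (i >> n) & 1

d = diagonal(example_listing)

# d disagrees with the i-th listed sequence at position i, so it is not in the list.
assert all(d(i) != example_listing(i)(i) for i in range(10_000))
```

Whatever listing is supplied, the diagonal sequence differs from entry i at digit i, so no list, however long it runs, contains every infinite expansion.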

demagogue on 21/7/2022 at 04:43

Quote:

If you ran the algorithm long enough then eventually all possible *finite* books would be generated.

Yes, that's Borges's original criterion in his story, which is what I meant by adding that little "meeting the criteria" qualifier at the end. ;)

In the original story, or, I think, in Hofstadter's commentary on it in Gödel, Escher, Bach (which anyone interested in this sort of thing should definitely read, along with his "The Mind's I"), the idea was that you could combine the books to create any story of arbitrary finite length.

It's complete in the same sense that just listing the 26 letters of the alphabet, capital and lowercase, plus other marks like punctuation, is complete, since every English-language story is some combination of them. What's missing then is the extra information about which books are combined with which others, and that itself would be infinite.

But I actually have a response to that too, if we want to go full philosophy major on this. There have always been two strains in the logic of math: the formalist strain, which that Cantor argument plays to, and the constructivist strain, where at the end of the day it's parts of the brain like the ventral parietal lobe that are actually doing the computations, including little cognitive tricks for thinking about the logic of infinity without actually contemplating infinity, and that's what math actually is. Anyway, I tend to be on the constructivist side of the coin. At some point the books would be longer than any mind could read in a lifetime, or in the lifetime of the universe, after which they don't have any actual human or physical meaning. So that would be my ultimate escape hatch. You could have a set of appendices that include every possible combination of the other books, up to the amount that could be read by any entity within the lifetime of the universe. That would be a finite number, but that's okay, because anything beyond it can't physically be read by any entity, which seems like an important criterion.

Actually, now that I think about it, a stronger argument in that vein might be this: if you're actually generating the number (not even the book), or even just giving yourself the capacity to generate it, then at some point the number of particles you'd need would vastly outnumber the particles in the actual universe and would collapse into a black hole. Okay, at that point I think it's safe to say any number above it doesn't have a physical meaning, and there's your practically finite limit. XD

Tocky on 22/7/2022 at 14:19

Quote Posted by Qooper

It's not so much that there's an infinite number of different texts that this software "searches", but rather you give it a text and from that it generates a seed that generates the text.

The key here is that the generation mechanism works both ways, from seed to text as well as from text to seed. It doesn't search anything; it's a generator of two-way (text, seed) pairs. There are also a few options for where your text appears:
1. With nothing else on that particular page.
2. Surrounded by random characters.
3. Surrounded by random English words.

This was my original thought, though I lacked the proper terminology to express it. I also didn't want to quash any discussion with my blundering skepticism. It's nice to see it expressed in the proper terms so that even a non-computer person like me can learn. The time it would take to search randomly seems incomprehensible with current tech. Correct me where I'm wrong, of course.
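To put a number on "incomprehensible": a rough Python estimate of how many distinct pages a blind random search would face, versus how many pages it could try. The 32-symbol alphabet, the 3200-character page, and the billion-guesses-per-second searcher are assumptions; the work is done in log10 because the exact page count has thousands of digits.

```python
import math

SYMBOLS = 32        # assumed alphabet size (the figure quoted for libraryofbabel.app)
PAGE_CHARS = 3200   # 40 lines x 80 characters

# Number of distinct pages is SYMBOLS ** PAGE_CHARS; use log10 to keep it printable.
log10_pages = PAGE_CHARS * math.log10(SYMBOLS)

# Hypothetical brute-force searcher: a billion random pages per second
# for roughly the age of the universe (~13.8 billion years).
seconds = 13.8e9 * 365.25 * 24 * 3600
log10_tries = math.log10(1e9 * seconds)

print(f"possible pages: about 10^{log10_pages:.0f}")
print(f"pages tried:    about 10^{log10_tries:.0f}")
```

That is roughly 10^4816 possible pages against roughly 10^27 attempts, so a random search would essentially never hit one specific page.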

Azaran on 22/7/2022 at 15:23

With technology and storage capacity growing exponentially, I'm hoping a genuine Library of Babel could someday be possible, with real search functionality; it would be revolutionary.
It would have to meet certain conditions, e.g. containing every possible combination of words in any coherent language. That would solve the problem of having to wade through oceans of gibberish.

hopper on 22/7/2022 at 22:28

Except "every possible combination of words in any coherent language" is still oceans of gibberish. It'd just be legible gibberish.

The truth is, if you are searching for words that actually carry any meaning, you'll only ever be able to "find" texts that have already been written by someone, and only because you typed them into the search bar in the first place. It's like in a game where there's a map that theoretically covers the entire world, but will only show the parts you already know. The only way to "find" a new text is to write it.

demagogue on 23/7/2022 at 00:53

You could use sophisticated AI as a filter, e.g., skipping words until it came to words that it felt best fit consecutively into a nice poem or story or whatever. I think a human could do that and still come up with some good outcomes, and so could an AI if it were good enough. So it's not prewritten.

The catch is, if the algorithms were good enough to filter text for good outcomes, then I'm pretty sure the same tech could generate a pleasing text on its own, so it's not really being "derived" from the library. It still ultimately comes from the AI (or human), who is exercising authorship through selection instead of writing, although the Library text still serves as a good prompt for something more original, and more tied to it, than they'd get without it.

Anyway, whether a human or AI does it, the point is it's not prewritten, but it's still not all that impressive.

Cipheron on 27/7/2022 at 06:19

That's definitely the sticking point.

Ultimately there is no Library of Babel; there's just a very large mathematical search space, and AIs such as GPT-3 are just algorithms that explore this search space.

The rub is that the algorithms that people are suggesting to search the search space in the context of "library of babel" are just REALLY shitty algorithms. It's a philosophical curiosity, but it generates crappy books.

It's pretty much what we do with neural networks and genetic algorithms already: you seed them with random values, which is exploring a very large search space, then you cull and tweak the values until they're exploring more interesting regions of the search space reliably.

GPT-3's direct search space is its parameters, of which it has ~175 billion. Those would be at least 32-bit numerical values, so GPT-3 is exploring a search space of 5.6 trillion bits, which covers all possible variations of the stuff it could output.

Compare that to the Library of Babel, where each page looks like roughly 80x50 characters, drawn from no more than 32 symbols (which is 5 bits each). So that's 20,000 bits of data per Library of Babel page.

GPT-3's *possible* search space is equivalent to the information in all Library of Babel books up to 280 million pages long. So it's a guided search, but one in a much vaster search space than directly trawling through Library of Babel entries hoping that one is good.
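The back-of-envelope numbers above, reproduced in Python under the same assumptions as the post (32-bit parameters, an 80x50 page, 5 bits per symbol):

```python
PARAMS = 175_000_000_000   # ~175 billion GPT-3 parameters
BITS_PER_PARAM = 32        # the post's assumption of 32-bit values

PAGE_CHARS = 80 * 50       # the post's estimate of a page
BITS_PER_SYMBOL = 5        # 2**5 == 32 symbols

gpt3_bits = PARAMS * BITS_PER_PARAM          # 5.6 trillion bits
page_bits = PAGE_CHARS * BITS_PER_SYMBOL     # 20,000 bits per page
pages_equivalent = gpt3_bits // page_bits    # 280 million pages

print(f"{gpt3_bits:,} bits vs {page_bits:,} bits/page -> {pages_equivalent:,} pages")
```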

Azaran on 3/6/2024 at 19:19

There's now an alternate, even larger version of the Library (https://libraryofbabel.app/)

Quote:

The Library of Babel is a fictional library that contains every possible unique book of 1,312,000 characters. Each book has 410 pages of 40 lines of 80 characters each. 410 × 40 × 80 = 1,312,000. As each book consists of a combination of the same 32 characters, the total number of unique books is 32^1,312,000. For comparison, it is estimated that there are around 10^80 atoms in the observable universe.
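The size of that number is easier to grasp in logarithms. A quick Python check of the figures in the quote, working with log10 instead of the exact integer (which would have almost two million digits):

```python
import math

PAGES, LINES, COLS = 410, 40, 80
SYMBOLS = 32

chars_per_book = PAGES * LINES * COLS                # 1,312,000
log10_books = chars_per_book * math.log10(SYMBOLS)   # log10(32 ** 1,312,000)

print(f"{chars_per_book:,} characters per book")
print(f"about 10^{log10_books:,.0f} books, versus about 10^80 atoms")
```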

It contains every book that has ever been written, and every book that ever will be written.

How does it work?
Most importantly, the library, and its books, are not stored anywhere — more on that below. Instead, a book is generated by a mathematical function when you view it. Once you view it, it is still not stored anywhere — the same way that by computing 2 × 5, the resulting 10 is not being stored anywhere, it is simply the output of the function. The next person to visit that same book will compute the same function again, and get the same answer.

In the most simple terms:

Each book is given a numerical index. The first book on the first shelf is book 1, the first book on the second shelf is 33 and so on, until the last book in the entire library, which has the index 32^1,312,000.
This (usually very large) number is run through a mathematical function to produce another unique number of 1,312,000 digits, which we represent in base-32 (we generally count in base-10, e.g. 0 through 9. Base-32 just means we count with more numerals, in this case 0-9 and then A-V).
Each digit of the base-32 result is mapped to the character at that position in the limited 32 character alphabet (0 → a, 1 → b, ...) to produce the content of that book.
Crucially, the mathematical function is reversible, meaning that we can instead give it the contents of a book and work backwards to determine the numerical index of the book that it appears in.
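The quote doesn't say which reversible function the site uses. As a stand-in, here is a Python sketch of the steps above using an affine map modulo 32^BOOK_LEN, which is reversible but a very weak scrambler. The 32-symbol alphabet, the constants and the function names are hypothetical, indices are 0-based rather than 1-based, and the book is shrunk to one page so the demo runs quickly.

```python
import string

# Assumed 32-symbol alphabet; the site's actual character set may differ.
ALPHABET = string.ascii_lowercase + " ,.!?'"
BASE = len(ALPHABET)  # 32

BOOK_LEN = 3200          # one page, to keep the demo fast (the quote uses 1,312,000)
MOD = BASE ** BOOK_LEN   # number of distinct books of this length

# Hypothetical reversible "mathematical function": an affine map modulo MOD.
# MULT is odd, so it has an inverse modulo MOD (a power of two).
MULT = 0x9E3779B97F4A7C15
OFFSET = 0x5DEECE66D
MULT_INV = pow(MULT, -1, MOD)  # modular inverse, Python 3.8+

def book_from_index(index: int) -> str:
    """Index -> scrambled number -> base-32 digits -> characters."""
    n = (index * MULT + OFFSET) % MOD
    chars = []
    for _ in range(BOOK_LEN):
        n, digit = divmod(n, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))  # most significant digit first

def index_from_book(text: str) -> int:
    """The same steps backwards: characters -> number -> unscrambled index."""
    if len(text) != BOOK_LEN:
        raise ValueError(f"a book must be exactly {BOOK_LEN} characters")
    n = 0
    for ch in text:
        n = n * BASE + ALPHABET.index(ch)
    return ((n - OFFSET) * MULT_INV) % MOD
```

To "search" for a text, pad it to `BOOK_LEN` and call `index_from_book`; feeding the returned index to `book_from_index` regenerates exactly that text. A real implementation would likely want a stronger bijection so that neighbouring indices don't produce visibly related books.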

Once a book is generated from its index, it is split up and displayed page by page. Each page is given an identifier derived from the numerical index, in the format room.wall.shelf.book.page, e.g. 1.2.3.4.5. This is used in the URL of that page.
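The quote doesn't spell out how room.wall.shelf.book.page is derived. One plausible reading is a mixed-radix split of the book index, sketched below in Python. Only the 32 books per shelf (implied by "the first book on the second shelf is 33") and the 410 pages per book come from the quote; 5 shelves per wall, 4 walls per room and the name `page_id` are assumptions.

```python
BOOKS_PER_SHELF = 32   # implied by "the first book on the second shelf is 33"
SHELVES_PER_WALL = 5   # assumption
WALLS_PER_ROOM = 4     # assumption
PAGES_PER_BOOK = 410

def page_id(book_index: int, page: int) -> str:
    """book_index is 1-based as in the quote; page runs from 1 to 410."""
    if not 1 <= page <= PAGES_PER_BOOK:
        raise ValueError("page out of range")
    n = book_index - 1
    n, book = divmod(n, BOOKS_PER_SHELF)
    n, shelf = divmod(n, SHELVES_PER_WALL)
    room, wall = divmod(n, WALLS_PER_ROOM)
    return f"{room + 1}.{wall + 1}.{shelf + 1}.{book + 1}.{page}"
```

For example, `page_id(33, 1)` gives "1.1.2.1.1", the first book on the second shelf.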

Azaran on 9/8/2024 at 17:42

This has been turned into an online first-person game (https://archiveofbabel.com/), where you can explore an infinite library