
programming gibberish

One way to generate plausible-sounding gibberish is called Markov-chain text generation. It goes like this:

  1. Grab a bunch of text (Shakespeare's Hamlet, the lyrics to Baby Got Back, etc.)
  2. Generate probabilities of which letters will follow other letters in that text
  3. Randomly pick new letters using the probabilities you just generated

The result is something that looks like the original text, but is nonetheless random.
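Here is a minimal Python sketch of those three steps. It is not the code behind the samples in this post; the names `build_model` and `generate` are made up for illustration, and it assumes `order` is at least 1 and that the text is longer than `order` letters.

```python
import random
from collections import Counter, defaultdict


def build_model(text, order=1):
    """Step 2: count which letter follows each run of `order` letters."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model


def generate(model, order, length, seed=None):
    """Step 3: pick new letters at random, weighted by the counts."""
    rng = random.Random(seed)
    out = rng.choice(list(model))  # start from a context seen in the text
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            # Dead end (this context was never followed by anything),
            # so jump to a random context that was.
            out += rng.choice(list(model))
            continue
        letters = list(followers)
        weights = list(followers.values())
        out += rng.choices(letters, weights=weights)[0]
    return out[:length]


# Step 1: grab some text. A real run would read a whole file instead.
text = "to be, or not to be, that is the question"
model = build_model(text, order=1)
print(generate(model, order=1, length=60))
```

Raw counts are kept instead of normalized probabilities because `random.choices` accepts relative weights directly.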

How random is it? One interesting knob you can turn when running this algorithm is how many preceding letters you use to work out the probability of the next letter.
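One rough way to see what the knob does is to count how many different letters can follow each context. The sketch below is only an illustration under assumed names (`distinct_followers`, `average_choices`) and a made-up sample sentence. On most texts the average tends to drop toward 1 as the context gets longer, which means the generator has fewer real choices and ends up copying longer stretches of the original.

```python
from collections import defaultdict


def distinct_followers(text, order):
    """Map each run of `order` letters to the set of letters seen after it."""
    followers = defaultdict(set)
    for i in range(len(text) - order):
        followers[text[i:i + order]].add(text[i + order])
    return followers


def average_choices(text, order):
    """Average number of different letters that can follow a context."""
    followers = distinct_followers(text, order)
    return sum(len(s) for s in followers.values()) / len(followers)


# Hypothetical sample text; a real run would use the full source document.
sample = "the cat sat on the mat and the rat sat on the hat"
for order in (1, 3, 6):
    print(order, round(average_choices(sample, order), 2))
```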

Let's look at a little example of this using the current C++ language standard. We'll generate a new language (call it C++14!) and see how turning this knob affects the output.

Basing our probabilities on a single preceding letter gives us this version of C++, apparently sung by Sigur Rós:

A tiss cop(18) sted 1 vathecv-titeceteand ig blld: al, s) YZ fistauctoshatd, r) ce uliten 33.4 (C 24) tis ioitrome t, atingof s<clare rtinisherairTh t C 2); coruranond ubuns cowhaueme prendocatst Ty, frularelaisspate ons op().

Upping to three letters switches genres to Edgar Allan Poe:

Two set typed ways poweversions // 20
defining res perfor_type id; stdardle nevel programe shower therence at bool volving: I1, or eachin the bming& wherwise, the double>, an tempty arent_everencess but cons [b.conver defix-expres: the function over.

Six letters begin to sound frighteningly plausible:

join(c, d) in_between the hash_function or a type openmode model reflects throughout the associated from the sequently left squared memory for facets, and 
Key, respectively): — If T is defining object used with *this has that has a nested class error.

And here is the result with nine letters:

Note: This guarantee is not zero, the functions F returns the null pointer to an integral conversion specified belongs is implementation-defined native character string literal, 29
boolean literal, 29, 1165
Boolean type, 71
bound argument must be used to represents a BLAS-like slice out of the standard C library, how a well-formed.

Well, look at that -- I believe we have a new language.

