computer science goes bonk

programming gibberish

One way to generate plausible-sounding gibberish is Markov-chain text generation. It goes like this:

  1. Grab a bunch of text (Shakespeare's Hamlet, the lyrics to Baby Got Back, etc.)
  2. Calculate the probabilities of which letters follow which other letters in that text
  3. Randomly pick new letters using the probabilities you just generated

The result is something that looks like the original text, but is nonetheless random.
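Here's a minimal Python sketch of those three steps. It assumes the simplest version of the idea, where only the single previous letter decides the next one. The names letter_probabilities and generate are mine, for illustration only.

```python
import random
from collections import Counter, defaultdict


def letter_probabilities(text):
    """Step 2: for each letter, the probability of each letter that follows it."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    probs = {}
    for letter, followers in counts.items():
        total = sum(followers.values())
        probs[letter] = {nxt: n / total for nxt, n in followers.items()}
    return probs


def generate(probs, start, length, rng=random):
    """Step 3: starting from `start`, pick each next letter at random,
    weighted by the probabilities from step 2."""
    out = [start]
    while len(out) < length:
        followers = probs.get(out[-1])
        if not followers:  # e.g. a letter seen only at the very end of the text
            break
        letters = list(followers)
        weights = [followers[c] for c in letters]
        out.append(rng.choices(letters, weights=weights)[0])
    return "".join(out)


# Step 1: grab some text (a tiny stand-in here).
probs = letter_probabilities("abracadabra")
print(probs["a"])  # {'b': 0.5, 'c': 0.25, 'd': 0.25}
print(generate(probs, "a", 20, random.Random(0)))
```

In "abracadabra", "a" is followed by "b" twice and by "c" and "d" once each, hence the 0.5/0.25/0.25 split.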

How random is it? One interesting knob you can turn when running this algorithm is how many preceding letters you use to determine the probability of the next letter.
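To turn that knob, key the table on the last k letters instead of just one. Below is a sketch, again in Python with hypothetical names (build_model, babble). I'm assuming the generator starts from a random k-letter slice of the source and stops early if it lands on a context that only appears at the very end of the text.

```python
import random
from collections import Counter, defaultdict


def build_model(text, order):
    """Map each `order`-letter context to a Counter of the letters that follow it."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model


def babble(text, order, length, seed=None):
    """Generate up to `length` characters, choosing each next letter based on
    the last `order` letters generated so far."""
    if not 1 <= order < len(text):
        raise ValueError("order must be at least 1 and shorter than the text")
    rng = random.Random(seed)
    model = build_model(text, order)
    # Seed the output with a random `order`-letter slice that has a successor.
    start = rng.randrange(len(text) - order)
    out = text[start:start + order]
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:  # dead end: context only occurs at the end of the text
            break
        letters = list(followers)
        # Raw counts work as relative weights; no need to normalize first.
        out += rng.choices(letters, weights=[followers[c] for c in letters])[0]
    return out[:length]
```

With the standard's text loaded into a string (say, from a hypothetical file cpp_standard.txt), calling babble(text, k, 300) for k in 1, 3, 6 and 9 is the kind of experiment that follows. Every (k+1)-letter window of the output appears somewhere in the source, so a bigger k means longer stretches copied verbatim, which is why the high-k output reads almost like the real thing.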

Let's look at a little example of this using the current C++ language standard. We'll generate a new language (call it C++14!) and see how turning this knob affects the output.

Basing our probabilities on a single letter gives us this version of C++, apparently sung by Sigur Rós:

A tiss cop(18) sted 1 vathecv-titeceteand ig blld: al, s) YZ fistauctoshatd, r) ce uliten 33.4 (C 24) tis ioitrome t, atingof s<clare rtinisherairTh t C 2); coruranond ubuns cowhaueme prendocatst Ty, frularelaisspate ons op().

Upping to three letters switches genres to Edgar Allan Poe:

Two set typed ways poweversions // 20
defining res perfor_type id; stdardle nevel programe shower therence at bool volving: I1, or eachin the bming& wherwise, the double>, an tempty arent_everencess but cons [b.conver defix-expres: the function over.

Six letters begins to sound frighteningly plausible:

join(c, d) in_between the hash_function or a type openmode model reflects throughout the associated from the sequently left squared memory for facets, and 
Key, respectively): — If T is defining object used with *this has that has a nested class error.

And here is the result with nine letters:

Note: This guarantee is not zero, the functions F returns the null pointer to an integral conversion specified belongs is implementation-defined native character string literal, 29
boolean literal, 29, 1165
Boolean type, 71
bound argument must be used to represents a BLAS-like slice out of the standard C library, how a well-formed.

Well, look at that: I believe we have a new language.


the bar exam for language lawyers

Here's a question for you language lawyers studying for the bar exam:

Match each language standard (C, C++, C#, Go, Java, or JavaScript) to its tag cloud:

[Tag clouds 1-6, each with a hidden answer]

If you got all six right, then I'm impressed. Get thee to a law firm!


cross-compiling and popularity

Apropos of nothing, here is a visualization of programming language popularity compared to interest in cross-compiling between various languages:

[Figure: popularity vs. cross-compiling interest]

Observations

  • Unsurprisingly, C is a popular target to compile to.
  • Popularity doesn't necessarily correlate with cross-compiler interest. Look at Objective-C and JavaScript, for example: JavaScript isn't as hot as Objective-C, yet everybody wants to compile down to JavaScript.
  • Nobody wants to compile to Bash.

About the visualization

Popularity is represented by color, with data from good ol' TIOBE 2013.

Interest in cross-compiling between languages is represented by node and edge size, based on the number of search engine results for queries like "compile c++ to ada" or "c++ to ada compiler".
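The exact recipe isn't spelled out above, but here's one plausible way to turn those search counts into edge sizes, as a Python sketch. Assumptions: hit_count is a stand-in you'd supply yourself (nothing here calls a real search API), the two phrasings are simply summed for each directed language pair, and node sizing is left out entirely because its formula isn't given.

```python
from itertools import permutations


def queries(src, dst):
    """The two query phrasings quoted above, e.g. "compile c++ to ada"."""
    return [f"compile {src} to {dst}", f"{src} to {dst} compiler"]


def edge_weights(languages, hit_count):
    """Weight each directed edge src -> dst by the summed result counts of its
    queries. `hit_count(query) -> int` is a hypothetical, caller-supplied lookup."""
    return {
        (src, dst): sum(hit_count(q) for q in queries(src.lower(), dst.lower()))
        for src, dst in permutations(languages, 2)
    }


# Usage with made-up counts (not real search data):
fake_counts = {"compile c++ to ada": 5, "c++ to ada compiler": 3}
print(edge_weights(["C++", "Ada"], lambda q: fake_counts.get(q, 0)))
# {('C++', 'Ada'): 8, ('Ada', 'C++'): 0}
```

Keeping the edges directed matters here: "compile C to Javascript" and "compile Javascript to C" are very different questions, which is what makes a lopsided target like Javascript (or an ignored one like Bash) stand out.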