inky has a blog

March 25, 2009

The ultimate perl program

inky @ 10:56 pm

The other day somebody asked if it’s possible to write a non-trivial perl program that uses no alphabetical characters whatsoever. People make jokes about this but I can’t find any actual treatment of the subject, so here’s a brief discussion.

First off, there’s the header. My first thought was that either the first line has to be something like #!/usr/bin/perl or you run the program from the command line with perl -e 'code here', so you have to allow letters in the “extraneous bits” that just invoke the interpreter. That said, I think there are ways on various OSes (even Linux) to associate files with an interpreter based only on their extension, so it’s probably possible to do it with no letters at all. Alternately, on Linux you could do, say, ln -s /usr/bin/perl /\# (as root, so the link lives at /#), and then make the header #!/#.

But having said that, I don’t see how to do it without passing any options to the interpreter. To take a step back, the main things you can do easily with no alphabetical characters are:

  • Numeric constants
  • Basic math (+ - * /) and assignment (=)
  • Conditions (||, &&, ==, !=, >) and conditional expressions (?:)
  • Variables (in perl, variable names are of the form $[A-Za-z_][A-Za-z_0-9]*, so you can make a variable called, say, $_0)
  • Built-in variables (perl has a bunch of special variables that get set for you; for example, $/ holds the input record separator, a newline by default, and $. is the line number of the current/last-read file)
  • Regular expression matching ($foo =~ /a+/)
  • Strings (you can use octal escapes to define the strings, like "\150\145\154\154\157")*
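To make the list concrete, here is a small sketch that uses only those pieces. The variable names ($_0 through $_4) are arbitrary choices. The final print needs letters and is there only so you can see the results. There is deliberately no use strict, because declaring variables with my would need letters.

```perl
use warnings;

# --- letter-free section: no alphabetical characters between these markers ---
$_0 = 6 * 7;                                   # numeric constants and basic math
$_1 = $_0 > 40 ? "\171\145\163" : "\156\157";  # ?: choosing "yes" or "no" (octal escapes)
$_2 = "\150\145\154\154\157";                  # "hello", spelled with octal escapes
$_3 = "\154+";                                 # the pattern "l+", also octal-escaped
$_4 = $_2 =~ /$_3/ ? 1 : 0;                    # regex match against the pattern string
# --- end of letter-free section ---

# Printing needs letters, so this line is only here to show the results.
print "$_0 $_1 $_2 $_4\n";    # prints "42 yes hello 1"
```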

Things that are notably absent from this list:

  • Printing
  • Any control flow beyond conditionals (no loops, no goto)
  • Regular expression search-and-replace, or any flags to the regular expression (notably, you can’t pass the /g option to make a match happen multiple times, which could be a loop substitute)

OK, so I can’t see anything awesome to do with just what I’ve listed here; without printing you don’t have much of a program. So, let’s relax our standards! The most obvious relaxation is to let you pass some alphabetical command-line options to the interpreter, since you can technically claim they’re not part of the program. As you probably know, perl has various handy command-line options that make it easier to write one-liners. For instance, you’ve probably seen something like perl -p -i -e 's/foo/bar/g' file, which loops over the file and substitutes bar for foo everywhere. To break this down a little: -e means “run the code given as the next argument”, -i means “modify the file in place”, and -p means “take the code and wrap it in while (<>) { ...; print $_; }” (strictly speaking, the print lives in a continue block after the loop body, but the effect is the same). That’s where the more explicit “loop over every line, run the code, then print the line (back to the file, in place)” behavior comes from. So, hey, now we have print!
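Here’s a sketch of roughly what -p builds around your code, written out by hand. To keep it self-contained, it reads from and writes to in-memory strings (a made-up sample input) rather than using a real file with -i. The continue block follows the wrapper described in perl’s perlrun documentation.

```perl
use strict;
use warnings;

# Made-up sample input standing in for "file".
my $input  = "foo one\nfoo foo two\nthree\n";
my $output = '';
open my $in,  '<', \$input  or die "open input: $!";
open my $out, '>', \$output or die "open output: $!";

# Roughly the loop that perl -p -e 's/foo/bar/g' builds around your code.
LINE: while (defined($_ = readline $in)) {
    s/foo/bar/g;    # the code passed with -e
} continue {
    # -p's automatic print; with -i it would go back into the file
    print {$out} $_ or die "-p destination: $!\n";
}

close $out;
print $output;    # "bar one", "bar bar two", "three" on separate lines
```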

A semi-well-known hack that works with this is perl -lpe '}{$_=$.' file. If you substitute this code for the ... in the wrapper above, you see that it does while (<>) {}; { $_ = $.; print $_; }. As you recall, $. is the line number of the current/last-read file, so this prints the number of lines in the given file (the -l chomps input lines and makes print add a newline). This is pretty good; unfortunately, I can’t work out a way to fully replicate wc and get a character or word count (there’s no way to do a split or a multiple-match regexp on the line, which seem like the only ways to count anything at a finer granularity than lines).
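Spelled out by hand, the }{ trick looks like the sketch below. As before, the input is a made-up in-memory string rather than a file. The chomp and $\ lines stand in for what -l adds, so this is an approximation of the expansion, not perl’s literal generated code.

```perl
use strict;
use warnings;

# Made-up three-line input standing in for "file".
my $input  = "alpha\nbeta\ngamma\n";
my $output = '';
open my $in,  '<', \$input  or die "open input: $!";
open my $out, '>', \$output or die "open output: $!";

$\ = "\n";    # what -l does to output: print appends a newline

# '}{$_=$.' spliced into the -p wrapper: an empty loop that just reads every
# line, then a bare block (which runs once) whose continue block prints.
while (defined($_ = readline $in)) {
    chomp;    # what -l does to input
}{
    $_ = $.;  # $. still holds the number of the last line read
} continue {
    print {$out} $_;
}

$\ = undef;   # back to normal before showing the result
close $out;
print $output;    # prints "3" and a newline
```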

But if you’re a real perl aficionado, you’re probably demanding something hackier. So, let’s talk DWIM (“do what I mean”). One of the famous things about perl is that it tries to be helpful in understanding your code; if you do a hash lookup with $hash{foo} instead of $hash{"foo"}, it says “oh, you probably meant to quote that” and does the right thing. A more obscure example of this helpfulness involves function references. If you have a function reference $foo = sub { print "hello\n"; }, you can call it with $foo->(); that’s normal. What’s weirdly helpful is that if you have a string that contains the name of a function, you can call it with the same syntax (as long as use strict isn’t turned on, since this is a symbolic reference): sub bar { print "hello\n"; } "bar"->(); Nice! (Strings, as you recall, are A-OK with us because we can define them in the “real” program with octal escapes, but I’ll give them in unescaped form so you can read them.)
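Here are both kinds of call in one place. The @called array is bookkeeping added only for this sketch. The no strict 'refs' line is needed only because the example otherwise runs under use strict, which forbids calling a sub through a string.

```perl
use strict;
use warnings;

our @called;    # bookkeeping for this sketch: which subs ran, in order

my $foo = sub { push @called, 'anon'; print "hello\n" };
sub bar { push @called, 'bar'; print "hello\n" }

$foo->();           # an ordinary code-reference call

{
    no strict 'refs';    # a sub called through a string is a symbolic reference
    "bar"->();           # perl looks up &main::bar by name
    my $name = "bar";    # the name can just as well live in a variable
    $name->();
}
```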

You might think we’re set: can’t we just do "print"->("stuff to print")? And if you’re more perl-knowledgeable, you know about eval, which takes a string containing perl code and executes it; if we could call that without any alphabetical characters, we could write any program we want. Unfortunately, no such luck. print, eval, shift and so on are all perl built-ins. They’re not “real” functions, even though they act like them in many situations. You can’t take references to them, and you can’t use this syntax to call them. On the other hand, you can use this technique to call user-defined functions… so what about modules?
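A quick way to see the difference between user subs and built-ins is to try both through a string. The sketch below uses a made-up user sub called shout. The built-in call fails at run time with an “Undefined subroutine &main::print” error, because perl looks for a sub by that name and doesn’t find one.

```perl
use strict;
use warnings;

# shout is a made-up user sub for this sketch.
sub shout { return uc($_[0]) . "!" }

no strict 'refs';    # calling through a string is a symbolic reference

# A user-defined sub can be reached through its name...
my $ok = "shout"->("hello");

# ...but print is a built-in, not a sub defined in any package, so perl looks
# for a sub called &main::print, finds none, and dies.
my $result = eval { "print"->("hello\n"); 1 };
my $error  = $@;

print "user sub: $ok\n";    # user sub: HELLO!
print "built-in: ", ($result ? "worked\n" : "failed: $error");
```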

Another helpful perl command-line option is -M, which loads a module. For instance, here’s a “long way” to print in perl: perl -MIO::Handle -e 'STDOUT->print("hello\n")'. (This isn’t obviously using IO::Handle, but perl’s filehandles are objects whose class inherits from IO::Handle, and ->print is one of IO::Handle’s methods.) This is cool, but of course it has a bunch of alphabetical characters, so we need to find a way to express them as strings. Step one is to use the alternate syntax for calling an object method: as in Python, the code above is equivalent to IO::Handle::print(STDOUT, "hello\n"), or, for our purposes, "IO::Handle::print"->(STDOUT, "hello\n"). (I do have to fully qualify the method now, since it’s not being called through an object.) This is almost right, but there’s still the STDOUT part.
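Here are the three spellings side by side. So that the result is easy to check, the sketch prints into an in-memory string through a bareword handle called OUT, which is an arbitrary name chosen for this example. The glob *OUT plays the role that *{"STDOUT"} plays in the real one-liner.

```perl
use strict;
use warnings;
use IO::Handle;

# Print into a string instead of the terminal so the result can be checked.
# OUT is an arbitrary bareword handle chosen for this sketch.
my $buffer = '';
open(OUT, '>', \$buffer) or die "open: $!";

OUT->print("one\n");                          # method-call syntax
IO::Handle::print(*OUT, "two\n");             # the same method as a plain function
{
    no strict 'refs';                         # needed to call a sub by its name
    "IO::Handle::print"->(*OUT, "three\n");   # the same function, looked up by name
}
close(OUT);

print $buffer;    # one, two, three on separate lines
```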

If I were a perl wizard, I would now explain typeglobs and the symbol table, but I’m just going to say that in this case STDOUT is equivalent to *{"STDOUT"} and leave it at that. So, check it out, arbitrary printing using nothing but strings: perl -MIO::Handle -e '"IO::Handle::print"->(*{"STDOUT"}, "hello\n")', or, if you prefer, perl -MIO::Handle -e '"\111\117::\110\141\156\144\154\145::\160\162\151\156\164"->(*{"\123\124\104\117\125\124"}, "\150\145\154\154\157\012")'
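If you want to double-check the octal spellings, this sketch compares each escaped string from that command line with its readable form and prints the character codes the escaped string contains. There’s nothing IO::Handle-specific here; it’s just string comparison.

```perl
use strict;
use warnings;

# Each escaped string from the one-liner, paired with its readable spelling.
my @pairs = (
    [ "\111\117::\110\141\156\144\154\145::\160\162\151\156\164", "IO::Handle::print" ],
    [ "\123\124\104\117\125\124",                                 "STDOUT"            ],
    [ "\150\145\154\154\157\012",                                 "hello\n"           ],
);

my $all_match = 1;
for my $pair (@pairs) {
    my ($escaped, $plain) = @$pair;
    my $same = $escaped eq $plain;
    $all_match &&= $same;
    # Show the character codes (in octal) that the escaped string contains.
    my $codes = join ' ', map { sprintf('%03o', ord($_)) } split //, $escaped;
    print "$codes => ", ($same ? "matches" : "differs"), "\n";
}
print $all_match ? "all strings decode as expected\n" : "mismatch\n";
```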

Unfortunately, I’m not sure exactly where to push it from here. Obviously if you can call modules, you could write a module containing whatever code you wanted and then call it, but I think we can consider it cheating to use any custom libraries for this. If there’s a standard library that has an eval-esque method we could call, that’d be great, but I don’t know of one.

…But there is one! After some investigation I have come up with the Safe module, which apparently has been in perl for quite a while, and is designed to create a safe environment to evaluate arbitrary code in. This probably won’t let you do everything, but it’s a whole heck of a lot. The basic setup is like perl -MSafe -e 'my $s = Safe::new("Safe"); Safe::reval($s, "print qq(hello\n)")'
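Here’s a slightly larger sketch of the function-call style with Safe. Like eval, reval returns the value of the last statement run in the compartment and reports failures through $@. The code strings are made up for illustration. They deliberately avoid print, because depending on the Opcode version, the compartment’s default operator mask may not allow I/O.

```perl
use strict;
use warnings;
use Safe;

# Safe::new takes the class name as its first argument when called as a
# plain function, exactly as in the one-liner above.
my $s = Safe::new("Safe");

# reval compiles and runs the string inside the compartment and returns the
# value of the last statement.
my $sum = Safe::reval($s, 'my $t = 0; $t += $_ for 1 .. 10; $t');
die "reval failed: $@" if $@;

# Errors come back the way eval reports them: undef plus a message in $@.
my $bad = Safe::reval($s, '1 +');
my $err = $@;

print "sum: $sum\n";
print "syntax error caught: ", ($err ? "yes" : "no"), "\n";
```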

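And so, in conclusion, here is a non-trivial perl program that contains no alphabetical characters outside the -MSafe -e switches. It creates a Safe compartment and calls deny_only on it with an empty list, which lifts the compartment’s default operator restrictions (depending on your perl version, those can block print). It then uses reval to run FizzBuzz inside the compartment: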

perl -MSafe -e '$_0 = "\123\141\146\145::\156\145\167"->("\123\141\146\145"); "\123\141\146\145::\144\145\156\171_\157\156\154\171"->($_0); "\123\141\146\145::\162\145\166\141\154"->($_0, "\146\157\162 (\155\171 \$\151 = \61; \$\151 <= \61\60\60; \$\151++) { \151\146 (\$\151 % \61\65 == \60) { \160\162\151\156\164 \161\161(\106\151\172\172\102\165\172\172); } \145\154\163\151\146 (\$\151 % \63 == \60) { \160\162\151\156\164 \161\161(\106\151\172\172); } \145\154\163\151\146 (\$\151 % \65 == \60) { \160\162\151\156\164 \161\161(\102\165\172\172); } \145\154\163\145 { \160\162\151\156\164 \$\151; } \160\162\151\156\164 \161\161(\\\156) }")'
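For reference, here is the same program with the escapes decoded. The sketch keeps the escaped reval string verbatim and checks that it decodes to the readable FizzBuzz loop. It then runs the readable version through the same three Safe calls; the letters are there only for legibility.

```perl
use strict;
use warnings;
use Safe;

# The reval argument from the one-liner, copied verbatim (octal escapes and all).
my $escaped = "\146\157\162 (\155\171 \$\151 = \61; \$\151 <= \61\60\60; \$\151++) { \151\146 (\$\151 % \61\65 == \60) { \160\162\151\156\164 \161\161(\106\151\172\172\102\165\172\172); } \145\154\163\151\146 (\$\151 % \63 == \60) { \160\162\151\156\164 \161\161(\106\151\172\172); } \145\154\163\151\146 (\$\151 % \65 == \60) { \160\162\151\156\164 \161\161(\102\165\172\172); } \145\154\163\145 { \160\162\151\156\164 \$\151; } \160\162\151\156\164 \161\161(\\\156) }";

# The same code typed out readably (single quotes keep \n as two characters).
my $readable = 'for (my $i = 1; $i <= 100; $i++) { if ($i % 15 == 0) { print qq(FizzBuzz); } elsif ($i % 3 == 0) { print qq(Fizz); } elsif ($i % 5 == 0) { print qq(Buzz); } else { print $i; } print qq(\n) }';

my $decoded_ok = $escaped eq $readable;
die "decoding mismatch" unless $decoded_ok;

my $s = Safe::new("Safe");    # "\123\141\146\145::\156\145\167"
Safe::deny_only($s);          # empty list: deny nothing, so print is permitted
Safe::reval($s, $readable);   # prints FizzBuzz for 1 to 100
die "reval failed: $@" if $@;
```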

Yay perl!


* One-liner to print out the escaped version of a string:
perl -lpe 's/\\/\\\\/g; s/\$/\\\$/g; $_ = qq(") . join("", map { /[\x20\x21\x23-\x2f\x3a-\x3f\x5b-\x60\x7b-\x7e]/ ? $_ : sprintf("\\%o", ord($_)) } split //) . qq(")'

You’ll note that it has to produce a double-quoted string. So it escapes backslashes and dollar signs with a backslash, and it turns double quotes and @ signs into octal escapes, since otherwise they could end the string or trigger array interpolation. To avoid a bad translation of a3 (you don’t want that to become \1413), it also escapes digits.
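The same escaping logic is easier to test as a subroutine. escape_letter_free is a made-up name for this sketch, which assumes ASCII input: characters above \377 would need more than three octal digits. The round trip at the end evaluates the escaped literal to confirm that it gives back the original string.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the footnote one-liner: build a double-quoted
# Perl literal whose letters and digits appear only inside octal escapes.
sub escape_letter_free {
    my ($text) = @_;
    $text =~ s/\\/\\\\/g;    # double every backslash first
    $text =~ s/\$/\\\$/g;    # then stop $ from interpolating

    # Spaces and ASCII punctuation other than " and @ pass through; everything
    # else (letters, digits, control characters, " and @) becomes \NNN octal.
    my $body = join '', map {
        /[\x20\x21\x23-\x2f\x3a-\x3f\x5b-\x60\x7b-\x7e]/ ? $_ : sprintf('\\%o', ord)
    } split //, $text;

    return qq(") . $body . qq(");
}

print escape_letter_free("hello\n"), "\n";    # prints "\150\145\154\154\157\12"

# Round trip: evaluating the escaped literal gives back the original string.
my $sample = 'a3 "q" @{x} $y \z';
my $round  = eval escape_letter_free($sample);
print $round eq $sample ? "round trip ok\n" : "round trip FAILED\n";
```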
