Obscure Linux Commands: Cheating At Word Games

Date/Time Permalink: 09/24/07 07:25:30 pm
Category: HOWTOs and Guides

Now, granted, cheating at word games won't make you very good at word games. However, practical problems from word games are an excellent opportunity to brush up on your Command-Line-Foo! There are few better exercises for mastering regexps.

One thing you've probably found is that most Linux distros ship a file called "words" in /usr/share/dict/. True to Unix tradition, it's nothing but a plain text file of words, one per line, and it's what spell-checking tools on the system traditionally use.

So, you're bumbling along doing a crossword puzzle, and you need to find a seven-letter word that begins with 'm', ends in 's', and has a 't' in the middle. Go get it, grep!

$ grep "^m..t..s$" /usr/share/dict/words
mantels
mantles
martyrs
masters
matters
mentors
misters
mittens
mortals
mortars
mottoes
mouthes
mutters
mystics

Blow by blow:

  • ^ - The caret (shift-6) says "this is the beginning of the line". Without it, grep would also match words like "fundamentals".
  • $ - The dollar sign is the same thing, only for the end of the line. Without it, you'd also get words like "mattresses".
  • . - The period means "any character here". Exactly one character will match here.
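
To see what each anchor buys you, here is a minimal sketch that runs the same pattern with and without them. The sample_words helper and its four-word list are made up for illustration, so the example behaves the same whatever your system's word list contains.

```shell
# Hypothetical stand-in for /usr/share/dict/words (four hand-picked words).
sample_words() {
  printf '%s\n' masters mattresses fundamentals mittens
}

# Both anchors: only the seven-letter words (masters, mittens).
sample_words | grep '^m..t..s$'

# No trailing $: "mattresses" gets in, because it merely starts with the pattern.
sample_words | grep '^m..t..s'

# No leading ^: "fundamentals" gets in, because it merely ends with the pattern.
sample_words | grep 'm..t..s$'
```

Swap sample_words for a cat of the real word list once the pattern does what you expect.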

But suppose you're dealing with a game besides a crossword puzzle, like Scrabble for instance, and you're limited by more constraints than in a crossword. You might want to 'hook' (Scrabble lingo for 'add letters to the beginning or end of a word to form more words'). So, let's see how many words end in "are".

$ grep "are$" /usr/share/dict/words | wc -l
43

Well, those are good odds. But we hit the edge of the board with some of them (I peeked). So, we need words of seven letters or fewer which end in "are". "^....are$" would get all of the seven-letter words, but not the shorter ones. The solution is rather cryptic this time:

$ grep "^.\{1,4\}are$" /usr/share/dict/words
airfare
aware
bare
beware
blare
care
compare
dare
declare
ensnare
fanfare
fare
flare
glare
hare
mare
pare
prepare
rare
scare
share
snare
spare
square
stare
unaware
ware
warfare
welfare

...but we've met the caret, dollar sign, and period before, so really the new part is the \{1,4\}. This says "match as few as one, and as many as four, repetitions of the previous character". The activator for the number range is the curly braces, which then have to be escaped with backslashes (does anybody know why, class?). And since the previous character is a period, which matches any letter, we've found all the words of four to seven letters which end in "are".
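
For anyone who wants the answer to the pop quiz: plain grep uses "basic" regular expressions, where a bare curly brace is just an ordinary character, so the backslashes are what switch the counting behavior on. With grep -E ("extended" regular expressions) the braces are special as written. A small sketch of the two spellings, again using a made-up five-word list rather than the real dictionary:

```shell
# Hypothetical sample list; "are" and "nightmare" are there to show what gets excluded.
sample_words() {
  printf '%s\n' are bare aware compare nightmare
}

# Basic regular expressions: the braces need backslashes.
sample_words | grep '^.\{1,4\}are$'

# Extended regular expressions: same match, no backslashes.
sample_words | grep -E '^.{1,4}are$'
```

Both commands print bare, aware, and compare. "are" is left out because the range demands at least one letter in front, and "nightmare" is left out because it has six.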

This is all well and good, but we only have so many letters to work with in Scrabble at one time. Say that our current rack has the letters "C F T W A B M". Can we limit it to only words which use those letters?

$ grep "^[cftwabm]\{1,4\}are$" /usr/share/dict/words
aware
bare
care
fare
mare
ware

Ah, now we're getting somewhere! The [] square brackets give the set of acceptable characters. Another way to use them is to express a range (e.g. [0-9]), but that's hardly the usual case in word games.
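
If you find yourself retyping that pattern every turn, it can be wrapped in a small shell function. This is only a sketch: rack_hooks and its three arguments are names invented here, not anything standard. One caveat worth knowing is that a bracket set doesn't count tiles, so it will happily accept a word that uses the same rack letter twice.

```shell
# rack_hooks RACK SUFFIX MAXLEN  (hypothetical helper)
# Reads a word list on stdin and prints the words made of 1..MAXLEN
# letters drawn from RACK, followed by SUFFIX.
rack_hooks() {
  rack=$1
  suffix=$2
  maxlen=$3
  grep "^[${rack}]\{1,${maxlen}\}${suffix}\$"
}

# Small made-up list; "dare" fails because there is no D on the rack.
printf '%s\n' aware bare care dare | rack_hooks cftwabm are 4
```

Against the real thing it would be: rack_hooks cftwabm are 4 < /usr/share/dict/words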

Here's another kind of word game you can play online. It's called simply "Crossword", but it's not at all like a regular crossword. Instead, you have to fill letters into a partially-completed grid, using only the letters provided. Say you have a four-letter space with "_w_y" and the 'A' available. You click on one of the empty slots, which lights up every space on the board that uses the same letter, then click on A, and it puts the letter A everywhere in the puzzle that's marked for the same letter. And so on to complete the puzzle.

I introduce this fun little diversion because it brings up still other kinds of problems to solve. As you progress to higher levels in this game, you eventually come to a problem like "All I know about this word is that it has seven letters and the first, fourth, and fifth letters are the same. What is it?" I'll bet even the regexp gurus are scratching their heads at this point....

I'll give you a minute if you want to solve it yourself.

Hint: you ain't gonna do it with the regexp tricks we've covered so far.

OK, here's the spoiler:

$ grep "^.......$" /usr/share/dict/words | \
> awk 'BEGIN {FS=""} {if ($1==$4 && $4==$5) print $0}'
blabbed
blubber
esteems
exceeds
scissor
twitter

That's right, we had to use some real programming. If any Perl ninjas out there have a solution in fewer keystrokes, you're welcome to post it. I know you're dying to.

What we did is simply grep for all the seven-letter words. Then the awk part (a) sets the field separator (the built-in FS variable) to the empty string in a BEGIN block, before any input is read, so that each character becomes its own field (gawk and most other modern awks behave this way, though POSIX leaves an empty FS unspecified), then (b) treats $1, $4, and $5 as the first, fourth, and fifth letters of the word. So the if statement tests the equality of those three letters, and only prints the whole word (still available as $0) if the test passes.
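
For the record, grep can also handle this one on its own if you allow back-references, which go beyond textbook regular expressions: \(.\) remembers whatever the first letter was, and each \1 demands that same letter again. A hedged sketch on a made-up word list:

```shell
# Hypothetical sample list; "letters" and "pattern" are decoys that should not match.
sample_words() {
  printf '%s\n' blabbed blubber scissor twitter letters pattern
}

# Seven letters where positions 1, 4, and 5 hold the same character.
sample_words | grep '^\(.\)..\1\1..$'
```

The awk route is still the more flexible of the two once the test gets more complicated than "these positions are equal".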

Last, here's a bonus program: an. If you don't have an (it's rare to find it included in a base system) you can pick it up here in the Debian archive. If you have a non-Debian system, look for the source tarball in the right-hand column of that page. It compiled on my Slackware in nothing flat.

When installed, an takes your quoted string and returns all the possible anagrams for that string.

$ an "Richard Stallman"
marshal card lint
marshal clan dirt
thrills card an am
thrills cram ad an
thrills can drama
... etc.

And it will only return dictionary words! It's pretty fast, too. Just the thing for finding "bingos" in Scrabble, as well as a basis for nutty conspiracy theories.
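
If you can't get an installed, single-word anagrams are within reach of the standard tools. The sketch below is not how an works internally and it won't produce multi-word phrases. It just sorts the letters of each candidate and compares them to the sorted letters of the target. The function names letter_key and anagrams are invented for this example.

```shell
# letter_key WORD: lowercase the word and sort its letters ("listen" -> "eilnst").
letter_key() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | fold -w 1 | sort | tr -d '\n'
}

# anagrams WORD: read a word list on stdin and print every line
# that uses exactly the same letters as WORD.
anagrams() {
  target=$(letter_key "$1")
  while IFS= read -r word; do
    if [ "$(letter_key "$word")" = "$target" ]; then
      printf '%s\n' "$word"
    fi
  done
}

# Made-up list; "master" is the odd one out.
printf '%s\n' silent listen tinsel master enlist | anagrams "listen"
```

It starts a handful of processes per word, so on the full dictionary it pays to narrow things down by length first, for example with grep "^......$" for a six-letter target.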

I found it amusing that Richard Stallman's name breaks down to "chill drama rants", amongst others... the first hit for "George Bush" is "buggers hoe" and another one is "here go bugs". Endless hilarity potential.
