Issue 58 * October 10 2008

Faster Anagram Finder
Optimising an Anagram Search for Longer Strings

by Larry Watts

We are a new game company making physical games. Our first produced game is a unique deck of cards, Platypotapus, using letters of the alphabet. There are over 200 games we have created that can be played with the one deck of cards. Quickdraw is one of the Platypotapus games.

A few months ago we decided to start programming the games for computer play. I enlisted the help of a friend with some programming experience. My programming background consisted of writing a few simple apps in basic many years ago.   We spent several months researching and testing various development software apps.   Because I wanted to be involved in the design of our app, we wanted to find a 4GL development tool. We thought we had found the right tool with Visual Basic, but then found out that VB requires the end user to install Net Framework and that was unacceptable to us.

Feeling quite near the end of my rope, I spent an evening meditating and getting myself calm and telling myself that there must be a tool out there that is the right tool for us. In the wee hours of the morning I got back online and within a short time stumbled across Revolution - a miracle from my point of view!

Within an hour or so of browsing the site, I knew I'd found what I was looking for. Despite its few quirks, Rev is a delight to learn and use and from the first day even a newbie like me was not only laying out the appearance of stacks, but also writing some code.   Now, only two months later, I can write quite a bit of code.

I'm not the type of person who first goes through all of the tutorials and docs before getting started.   I just jump in and start playing around with software until I run into a question - then I go look up the solution.   Luckily I just happened to come across the tutorial Intermediate: Arrays. And this is what is so great about the Rev community - a spirit of sharing how to do things.

Our game uses an anagram finder to look up a list of possible words. My programming friend and I both learned a lot from going through the tutorial - especially understanding that arrays are much faster for doing searches than using fields.

Trying out the anagram finder in the tutorial we discovered that it was blazing fast for searches up to eight letters.   But if we input nine letters it slowed a bit and when we input ten letters it became too slow for our application. (In our first game, Quickdraw, we need to search for all possible anagrams within ten letters.)

So building on what we learned from the tutorial, we decided to do two main things:   1) Separate the word list into groups of words of the same length, and 2) optimize the words by creating an alphabetized key at the front of each word.

To make it a fair comparison, we used the same word list, SOWPODS, that was used in the tutorial for running speed tests.   (In our actual game, we use our own word list combined from various sources.)

We wrote a preliminary app to sort the word list into groups of same length words. Then I just used Microsoft Word to put each word into all CAPS.

Now we needed to optimize the word list with the alphabetical keys. We created another app for doing that.   Here is a screen shot of the stack where we add the keys to the words and maintain our word lists within the main app:

Word Lists

It appears a little fuzzy here, but you can see there is a field for importing a word list and then a field for adding the lower case key. We also have a field for each length of word.   Of course the fields are necessary to store the data because arrays only reside in memory.

Here is a screen shot of the actual anagram finder:

Anagram Finder

We decided to add a timer and show the number of words found.   The button at the bottom takes the user to the stack in the previous screenshot.

When the app first loads, we put the words from the fields into arrays. As previously mentioned, searching in arrays is much faster than searching fields.

Here are the comments from the card script for this procedure:

For each length of word (2 thru 10), we find each key/word in the field and put the first half (key) of the line from the field into the array variable name.   Then we put the word (the second half of the line) as a value into the array name.   This way, if two lines from the word field have the same lower case key, the two words are put into the same array variable.   Example: abAB and abBA are two lines in the field of 2 letter words.   We find the first, abAB, and put the first half, ab, into wKey. "ab" becomes the name of the array variable.   We put the second half, AB, as the value in the array variable "ab".   Then we find the second line in the word field, abBA, which has the same first half value, ab.   Since there is already an array name of "ab" it does not create a new variable name. It then appends the second half, BA, into ab with a comma before it.   The result is an array variable name, ab, with the value AB,BA in it.   Now when we do our searching, we only have to find the ab key one time and it will show us all the words that can be made with that key.

Here is the code for putting the 2-letter words from the field into the array:

repeat for each line wCounter in field wordList02 of stack \
   wordList
   put char 1 to 2 of wCounter into wKey
   put char 3 to 4 of wCounter & ", " after wgList02[wKey]
end repeat

When we do a search, we alphabetize the letters that the user input and then create all the possible combinations (keys) for those letters.   For example, if the user input the letters S T O P, it would first alphabetize the letters as opst and produce the following keys:

op, os, ot, ps, pt, st

ops, opt, ost, pst

opst

Using the code above will give us an array name of "opst" and the value of that array name will be OPTS,POST,POTS,SPOT,STOP,TOPS.   We put the value into a new array, foundWords.   If the user had entered more than 4 letters, such as S T O P N, we would add the other 4-letter words found to the array foundWords[4] and the result would be: OPTS,POST,POTS,SPOT,STOP,TOPS,SNOT,ONST,PONS,PONT,etc.

Because we know that each word is 4 letters, we can extract the words using the length function:

repeat with counter = 10 to 2 step –1  -- the array potentially \
   has words of 10 to 2 length
   put foundWords[counter] into allDisplay  -- this will \
   include all of the found words
   put "" into char (the length of allDisplay - 1) of \
   allDisplay   -- erase the trailing comma and space
   put " " before allDisplay  -- needed before sorting - \
   words have a leading space for word wrap
   sort items of allDisplay  -- put each group of words in \
   alphabetical order, a nice touch!
   put return into returns
   if allDisplay = " " then 
      put " (none)" into allDisplay  -- no words of that \
      length were found
   else
      put return after returns   -- so that the final \
      display is neatly organized
   end if
   put counter & " Letter Words:" before allDisplay    -- \
   “counter” is the length of this group of words
   put returns & allDisplay after field possibleWordsLists
end repeat

The result of a search looks like this:

All Possible Words

Now for the speed trials:

As I said, for the trials we used the same word list as the anagram finder in the tutorial Intermediate: Arrays.   I added the same timer we use to the Rev file in the tutorial. (The timer uses the long seconds function.) I compiled both programs and closed all my other apps that were running before conducting the speed tests.   My machine is a 1.8ghz Pentium 4 with 384MB of RAM and running XP Pro - Service Pack 2.   I input each of the following words 5 times and took the average time.

Letters entered for searching Tutorial Anagram Finder Our Anagram Finder
EAT 0.003 0.220
POST 0.003 0.221
MERIT 0.005 0.253
MERITS 0.013 0.278
CRAVING 0.060 0.359
CRAVINGS 0.448 0.584
PLENTIFUL 4.588 1.059
HNYLBTEGMO 152.107 3.172
AEIOURSTLN 151.665 3.156
ENCHANTERS 151.373 1.930
RAMONABEAN 157.479 1.484

As you can see, we get slaughtered when searching on input lengths of three to seven letters. We almost catch up at eight letters, start pulling away at nine letters and when ten letters are input, it is an important difference in how long it takes to generate the list of found words.

Without the excellent tutorial, Intermediate: Arrays, we would not as easily have found the solution for our needs.  Of course we learned a great deal from other tutorials and using the other Rev community resources. This is just one example.

We are not sure exactly why there is such a discrepancy in the speed times of our anagram finder when ten letters are input, but how many words found seems to be a factor.   In any case, about 3.2 seconds is the longest time we have noticed for any ten-letter combination.   Our game has the challenge of adding a U if a Q is among the ten letters and so that slows it down about a second per search. But we are also doing some extra optimizing of the word lists and a few tricks with the arrays that more than makes up for the lost speed of the Q logic.

In the first rendition of our game, we are using a separate application to generate the list of words while the user is playing the game because the focus can only be on one thing at a time. Because we are still new to Revolution we haven't yet figured out all the nuances of using a database to pass information between the two applications, so we just use .txt files until we can use a database.   Once we are able to implement the database, the results will always appear instantly to the user!

As I mentioned at the beginning, there are over 200 games that can be played with our Platypotapus card deck. Now that we have our first game at a point of playability, we are anxious to use Revolution to refine it and then start programming the hundreds of other games that can be played with the card deck.

In addition to being so easy to develop with, Revolution also gives us the tremendous bonus of compiling at once for the three different platforms. We look forward to a long and happy relationship with Revolution and the Revolution community.

The Revolution file for this project is anagram.rev and can be downloaded at:

www.wattsongames.com/revdocs.htm

Main Menu What's New