Issue 59 * October 22 2008

The Power of Sort
Arranging data becomes as easy as ABC

by Jeanne DeVoto

A few months ago, Hugh Senior demonstrated how to do a custom sort based on a fixed list of values. (For example, you can use a function to sort a list of dates by the day name: Monday, Tuesday, Wednesday, and so on.) The technique of using a custom function with the sort command is very powerful; sorting by a value list is one of its most useful manifestations, but only one. This article shows a few more ways to use the technique:

  • for checking data integrity on the fly, to prevent mis-entered data from messing up your sort;
  • to allow data to be stored and displayed in a more readable format, while still allowing sorting;
  • and for sorting data that's not in very sortable form.

The Sort Key

When you're sorting something, the sort key is the part you're sorting by. If you're sorting a list of lines, the sort key might be the entire line:

sort myVariable 

Or it might be just part of the line:

sort myVariable numeric by item 2 of each -- "each" means each line

When you're sorting cards, the sort key is usually a field:

sort cards dateTime by field "Modification Date"

But it can also be part of a field, or a combination of fields:

sort cards by word 2 of field "Name"
sort cards by field "City" && field "State" 

A sort key, in fact, can be any valid expression at all...

sort myVariable by 1*1 -- doesn't do anything, but it's valid
sort myVariable by the length of each -- sorts by length of line
sort myVariable by myCustomFunction(each) -- sorts by what the function returns

...including a custom function, since functions are valid expressions. Since you write the custom function, it can transform the data any way you want to obtain the desired sort results.
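
As a minimal sketch of the idea, here is a function in the spirit of the day-name sort mentioned at the start of this article. The names dayRank and sortByDay, the sample lines, and the fallback value 8 are assumptions made for illustration; they are not taken from the earlier article:

```livecode
function dayRank theDay
   -- hypothetical helper: returns the position of a day name in a fixed list
   put "Monday,Tuesday,Wednesday,Thursday,Friday,Saturday,Sunday" into theDays
   set the wholeMatches to true -- match whole items only, not fragments such as "day"
   put itemOffset(theDay, theDays) into theRank
   if theRank = 0 then return 8 -- unrecognized names sort after Sunday
   return theRank
end dayRank

on sortByDay
   -- sample data, assumed to start each line with a day name
   put "Wednesday lunch" & return & "Monday meeting" & return & "Friday review" into myVariable
   sort lines of myVariable numeric by dayRank(word 1 of each)
   put myVariable -- shows the sorted lines in the message box
end sortByDay
```

The sort command never sees the day names themselves, only the numbers the function returns, which is why the sort is numeric.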

Now for some examples.

Checking Data Integrity

One reason to use a custom function is if your data may not be in the correct form. For example, suppose you have a stack with an "Address" field, and you want to sort the cards by postal code:

sort cards numeric by last word of field "Address" 

If each address is in a simple form, this works. However, what if some of the addresses don't have a postal code? What if some are in a format that doesn't put the postal code last, or have non-numeric postal codes? The sort above will fail to work properly in that case.

What we really want is a way to say "sort cards by last word of field 'Address', but only if the last word is a five-digit number". In situations like this, we can write a function that checks the data to see whether it's in the correct form:

function verifiedPostalCode theData
   -- verifies the data is a US-style 5-digit zip code
   if the length of theData is 5 and theData is an integer and \
   theData > zero
   then return theData
   else return "100000"
end verifiedPostalCode 

If you pass this function a 5-digit zip code, it returns the zip code. Otherwise, if the postal code isn't numeric or doesn't have five digits, it returns the number "100000". (We use this number because it's larger than any five-digit zip code, so if we're doing a numeric sort, it will be sorted to the last position.) Then you use the function like this:

sort cards numeric by verifiedPostalCode(last word of field \
"Address")

This sorts the cards in the stack by their zip code, and places all cards that don't have a valid zip code at the back of the stack.

Of course, we could write a different function for the postal code format used by another country, or a more sophisticated function that would accommodate multiple postal-code formats and sort by country as well as postal code. We also might want to accommodate US 5+4 zip codes as well as the 5-digit kind. All this can be done by expanding the same verifiedPostalCode function.
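
One way the 5+4 case might be handled is sketched below. The function name verifiedZipPlus4 is hypothetical, and the choice to return the two parts joined by a decimal point (so the key stays numeric) is an assumption of this sketch rather than an established convention:

```livecode
function verifiedZipPlus4 theData
   -- sketch: accepts "12345" or "12345-6789"; anything else sorts last
   set the itemDelimiter to "-"
   put item 1 of theData into theZip
   put item 2 of theData into thePlus4
   if the length of theZip is 5 and theZip is an integer and theZip > zero then
      if thePlus4 is empty then return theZip
      if the length of thePlus4 is 4 and thePlus4 is an integer then
         return theZip & "." & thePlus4 -- for example 12345.6789
      end if
   end if
   return "100000" -- larger than any valid key, so it sorts to the back
end verifiedZipPlus4
```

It would be used the same way as the original function, for example: sort cards numeric by verifiedZipPlus4(last word of field "Address"). With this key, a plain 5-digit code sorts just ahead of the 5+4 codes that share its first five digits.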

Adjusting Data for Alphabetization

Suppose you have a list of words and phrases. This might be a list of book titles, terms in an index, or song names. Some of these names will start with "A", "An", or "The". Customarily, when you alphabetize such a list you ignore these words; otherwise, half your song titles might be listed under "A". However, the sort command doesn't ignore them.

Libraries and book indexes often change the format of list items to match the desired sort order. For example, "A Walk in the Park" might be listed as "Walk in the Park, A". However, this isn't the most readable form and we might not want to require that titles be entered this way in the list. For this situation, we need a custom function to let the sort command ignore articles like "a" or "the" when they're at the start of a line:

function noArticles theTitle
   -- strips a leading "a", "an", or "the" from the sort key
   if word 1 of theTitle is among the words of "a an the" then
      delete word 1 of theTitle
   end if
   return theTitle
end noArticles

When you use this function, it removes articles at the start of the sort key. The effect is that the list is sorted as though the leading article weren't there, so "A Walk in the Park" is alphabetized under "W" instead of "A".
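
A call to the function might look like the sketch below. The handler name sortTitles and the field name "Titles" are assumptions for illustration; the field is taken to hold one title per line:

```livecode
on sortTitles
   -- "Titles" is a hypothetical field with one title per line
   sort lines of field "Titles" by noArticles(each)
end sortTitles
```

Only the order of the lines changes. The function works on a copy of each line, so the titles in the field keep their leading articles.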

As you can see from these few examples, there's almost no limit to what you can do with sorting by your own custom function - all without modifying the underlying data or changing the way information is displayed.
