The following article was printed in April 1981 of the magazine „Microcomputing".

The assembler source of the listing illustrated here will also be found in SIG/M1 volume 75. [Here the original 8080-source and here the adapted Z80-source for the M80 assembler]. Anywhere in the largeness of the use-net I found a similar utility of a Shell-Metzner sort; [here as 8080-assembler routine and here as Z80-assembler routine]. Find another sample in SIG/M volume 78.

An interesting article concerning sort of data.

No list of records is too short or long for this assembly-language Shell-Metzner sort.

Sorta Super Fast

By Albert J. Marino

This program will let you sort 32K in one minute. This means you can sort 2000 names in one minute and, in fact, can sort 10,000 names in a reasonable time. The routine can easily be used with any BASIC or other language as long as it supports external (user) routines and has a POKE or similar function.

I originally wrote a machine language program that worked on the algorithm shown in Fig. 1.

Fig. 1.

This routine was short and simple, required no scratch space and was about four times faster than a modified bubble sort. I have never seen this algorithm implemented by anyone else and it might be unique; it certainly is good enough for most applications.

However, when the files reached 2000 names, I found that this routine was taking two to three minutes. Since the time for the routine is approximately a function of N², this meant that 10,000 names would take over an hour. I decided I needed a better routine.

The one I settled on was the Shell-Metzner sort shown in Fig. 2.

Fig. 2.

Listing 1 shows the 8080 assembly-language program for this sort. Of course, the program will work for any 8080- or Z-80-based computer. The listing happens to be assembled at location AB00H but can be reassembled for any starting address. Note that the routine takes 203 bytes, including the 16-byte scratch area at the beginning.

Using the Routine

Once the program is in memory, it can be used by any other program, as long as it is not overwritten. To use it follow these steps:

Put the records to be sorted into memory starting at any location and sequentially up in memory. The records must be the same length.
Put the number of records to be sorted in location n1 (AB00H in our listing) and also in location m1 (AB02H). Note that the numbers are 16-bit (two-byte) numbers and are stored in standard reversed format with low byte first and high byte second. For example, if the number of records is 4000 (=0FA0H) then 160 (A0) is put into location AB00H and 15 (0F) is put into location AB01H.
In like manner, put the length of each record into location k1 (AB04,5) and the starting address in location j1 (AB06,7).
Transfer control to the program (call) at location ORG + 16 (AB10H). The program will then sort the records in ascending order and return.
To sort in descending order, change the byte at location ORG + 6FH (AB6FH) from D2 to DA before calling the program.

Test Results

To give times for comparison, I filled my memory with a random pattern of 0s and 1s. This minimum number of values for each location results in the maximum times for sorting. Test results are shown in Table 1a. The results for some of the same tests using the original algorithm are shown in Table 1b.

AMOUNT OF	#	#	TIME TO SORT
MEMORY	RECORDS	BYTES/REC	SECONDS
16K	4K	4	22
16K	2K	8	22
16K	1K	16	15
32K	8K	4	65
32K	4K	8	75
32K	2K	16	54
32K	1K	32	34
Table 1a.
16K	4K	4	775
16K	2K	8	285
16K	1K	16	110
Table 1b.

The improvement is significant especially for large numbers since the new algorithm time is approximately a function of N log(N).

An Example

In general, a good technique to use is:

For each record, load the field to be sorted into memory with a pointer. Fig. 3 shows a typical example.
Sort using the routine given here.
Use the now sorted pointers to access each record, in order.

Fig. 3.

As an example, suppose you want to do an alphabetical sort of names stored in a file in ASCII format (this is the normal string and file format of most languages). Getting the correct results is not difficult, but it does require some care. Consider the following correctly sorted names:

de la Fleur, Conrad
Diebold, Tom
DiGeorge, Olaf
DiGeorgeo, Dick
di Georgeo, Len
Di Georgeo, Pete
Dillingham, Sam

If you had sorted these names by just loading last name first name, into memory, you would have gotten:

Di Georgeo, Pete
DiGeorge, Olaf
DiGeorgeo, Dick
Diebold, Tom
Dillingham, Sam
de la Fleur, Conrad
di Georgeo, Len

To get around this problem, make the following changes as the names are loaded into memory for sorting:

Convert all characters to lowercase (or uppercase).
Remove all spaces and punctuation in the last name.
Insert a delimiter between last name and first name with ASCII value lower than letters (I used a comma).
Truncate this string plus the pointer to 16 bytes or pad with blanks if shorter.

For example, de la Fleur, Conrad, becomes "delafleur,conxx" and Jones, Sam, becomes "jones,sam00000xx," where xx is the pointer and the 0's are blanks. This allows 14 letters to sort on. With over 6000 names, I have not had a mistake caused by this length restriction; with less names you might get by with even less.

After you've done the above and sorted using this routine, you can use the now-sorted pointers to access the original records in alphabetical order.

Large Files

If the file is too large to be loaded into memory, even using the above method of loading only the sort field then you need an additional technique. One method, if you don't know anything about the distribution of the field you're sorting on, is:

Read into memory as above, stopping when you have filled memory.
Sort this block 1 and save on disk.
Read more fields in, sort and save as block 2, 3, etc.
Merge the blocks by reading each block item by item and saving the appropriate entry on disk in a separate file. This works since each block is individually sorted.

This method works for files of any size, even ones that span multiple disks.

Conclusion

The sorting routine given here should let you sort your files on any field within a reasonable time. For those who wish, this program is available on a CP/M eight-inch disk or TRS-80 five-inch disk for $15 from Compleat Systems, 9551 Casaba Ave., Chatsworth, CA 91311 (213-993-1479).

The disk also contains a Z-80 version that takes 25 percent less space and runs 25 percent faster, and two additional utilities (search and move).

Address correspondence to Albert J. Marino, Compleat Systems, 9551 Casaba Ave., Chatsworth, CA 91311

1.	SIG/M (Special Interest Group/Microcomputers), a part of the Amateur Computer Club of New Jersey, used to compile public-domain software into disk volumes for regular release. The SIG/M disk set was started in 1980 and volumes 000 up to 310 may be found here.