The following article was printed in the May 1980 issue of the magazine "Microcomputing".

An article with a BASIC sample hashing program.

Hashing Revisited

Turn your names into numbers and store and retrieve them with ease.

Raymond T. Vizzone
416 Stinson Ave.
Vacaville CA 95688

It had been four months since I last argued with my partner about the merits of hashing. In fact, I had introduced this technique to him, but was unable to convince him that it was the best way to store and retrieve data on disk or in arrays. He didn't like the disadvantages of rehashing—collision, wasted space and overhead. Neither did I! I couldn't blame him for ignoring me as he went back to his sorts and binary searches. I temporarily gave up.

While reading a book called Compiler Construction for Digital Computers by David Gries, I came across hashing again. At the same time, I discovered similarities between the hashing methods used by Gries and by Donald Fitchhorn ("DOCUFORM," Kilobaud, August 1978, p. 22). With a little more research, I was ready to approach my partner with confidence.

Being subtle, I told my partner that hashing was the only way to retrieve data quickly. That was an absolute statement. He ignored me. I continued, nonetheless, and claimed hashing involves no sorting or binary or sequential searches. He had heard that song-and-dance before and was still waiting for more explanation. I emphasized that data is stored in memory sequentially as it is entered, never needing to be moved. Data is always available for quick retrieval with no need for rehashing. That did it! I had him now.

Implementing HASH-IT

In explaining the hashing method shared by both Gries and Fitchhorn, I used pictorials and illustrative examples. In the beginning, I proposed a function called HASH-IT, which would change names (KEYS) into numeric values called HASH-VALUES. The process is shown in Fig. 1.

The function HASH-IT can be any method of converting an alphanumeric KEY to an integer numeric.

Fig. 1. HASH-IT function.

In using HASH-IT, each HASH-VALUE produced points into an array called the HASH TABLE (HT). For example, a HASH-VALUE of 3 is shown in Fig. 2.

Fig. 2.

In turn, the contents of each element of the HASH TABLE point to an element of the STORAGE ARRAY (SA$), which holds the data and their respective KEYS (see Fig. 3).

I needed to implement two other concepts to fully explain the madness of this method. I wanted a pointer to show me where the next empty location of the STORAGE ARRAY was. For convenience, I called this pointer EMPTY. I made the STORAGE ARRAY two-dimensional to facilitate a linked-list affair. The use of this linked list will become clear in later examples. Fig. 4 shows a general pictorial of everything I needed to explain this hashing method.


Fig. 3.	Fig. 4.

To demonstrate this hashing method using Fig. 4, I used six names to "hash-in" to the STORAGE ARRAY. The first name was JOHNSON, which hashed to a numeric value of 3. The numbered location pointed to by EMPTY was then stored in the HASH TABLE at location 3. JOHNSON was stored in the STORAGE ARRAY at the location that EMPTY pointed to before being incremented (see Fig. 5). The names SMITH and JONES hashed to values of 6 and 1, respectively. Figs. 6 and 7 show how they were handled.
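The first three insertions can be sketched in a few lines of Python (used here instead of the Applesoft of the listing). The hash values 3, 6 and 1 come from the text and are supplied by hand rather than computed; the seven-slot table size and the names ht, sa, empty and store are assumptions made only for this illustration.

```python
# Sketch of collision-free "hash-in". Hash values are supplied by hand
# (3, 6 and 1, as in the text) rather than computed by HASH-IT.
HT_SIZE = 7                      # assumed table size, for illustration only
ht = [-1] * HT_SIZE              # HASH TABLE: -1 marks an unused slot
sa = []                          # STORAGE ARRAY: [key, link] pairs
empty = 0                        # EMPTY: next free STORAGE ARRAY location


def store(key, hash_value):
    """Store key at EMPTY and record that location in the HASH TABLE."""
    global empty
    ht[hash_value] = empty       # HASH TABLE slot points at the new record
    sa.append([key, -1])         # second column (the link) is unused so far
    empty += 1                   # EMPTY moves to the next free location


for name, hv in [("JOHNSON", 3), ("SMITH", 6), ("JONES", 1)]:
    store(name, hv)

print(ht)     # [-1, 2, -1, 0, -1, -1, 1]
print(sa)     # [['JOHNSON', -1], ['SMITH', -1], ['JONES', -1]]
print(empty)  # 3
```

The names sit in the STORAGE ARRAY in the order they were entered; only the HASH TABLE reflects their hash values.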

Rehash

The demonstration was going well, and my partner was still with me. All three names had hashed to different numeric values. So far, so good. I used the name MURPHY next and found it hashed to a value of 3. So had JOHNSON! This was a collision, and I had promised I wouldn't rehash or recalculate the HASH VALUE in order to find an empty element in the HASH TABLE. So now I had to use that second row of the STORAGE ARRAY.

MURPHY was stored at the next location in the STORAGE ARRAY pointed to by EMPTY. This location was 3, the sequential element after JONES. Normally, this value of 3 would be stored in the HASH TABLE at the location pointed to by the HASH VALUE. However, in this case, there was already a number there: the pointer to JOHNSON. I couldn't disrupt this arrangement because I would not be able to find JOHNSON again! So I stored the location pointed to by EMPTY in the second row of the STORAGE ARRAY element containing JOHNSON.
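A minimal Python sketch of this collision case follows. It starts from the state reached after JOHNSON, SMITH and JONES, with the hash values given in the text; the seven-slot table and the function name hash_in are assumptions. Unlike HASH-IN in the listing, this sketch does not check for a key that is already stored.

```python
# State after JOHNSON, SMITH and JONES (hash values 3, 6, 1 from the text).
ht = [-1, 2, -1, 0, -1, -1, 1]           # assumed 7-slot HASH TABLE
sa = [["JOHNSON", -1], ["SMITH", -1], ["JONES", -1]]  # [key, link]


def hash_in(key, hash_value):
    """Add key without rehashing; collisions are chained through the links."""
    empty = len(sa)                      # EMPTY: next free location
    first = ht[hash_value]
    if first == -1:                      # no collision: use the HASH TABLE
        ht[hash_value] = empty
    else:                                # collision: find the end of the chain
        last = first
        while sa[last][1] != -1:
            last = sa[last][1]
        sa[last][1] = empty              # link the last record to the new one
    sa.append([key, -1])
    return empty


where = hash_in("MURPHY", 3)
print(where)   # 3
print(ht[3])   # 0, still pointing at JOHNSON
print(sa[0])   # ['JOHNSON', 3]
```

The HASH TABLE entry is left alone; only JOHNSON's link changes, so nothing already stored has to move.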

To find MURPHY later would first require hashing it to a value of 3 (see Fig. 8).


Fig. 5.	Fig. 6.

Fig. 7.	Fig. 8.

A look into the HASH TABLE at location 3 would find a pointer value of zero. The contents of the "zeroth" element of the STORAGE ARRAY would produce JOHNSON. JOHNSON would not be MURPHY by any means, so a further look would find that the link attached to JOHNSON pointed to the third element of the STORAGE ARRAY. Further investigation would find that MURPHY resided at this third element of the STORAGE ARRAY, and the search would be done.
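The search just described can be sketched in Python as follows. The table contents are the state after MURPHY has been added; the seven-slot size, the function name find and the unstored name KELLY are assumptions used only for illustration.

```python
# State after MURPHY has been added; the 7-slot size is an assumption.
ht = [-1, 2, -1, 0, -1, -1, 1]
sa = [["JOHNSON", 3], ["SMITH", -1], ["JONES", -1], ["MURPHY", -1]]


def find(key, hash_value):
    """Return (location, probes), or (-1, probes) if the key is not stored."""
    probes = 0
    loc = ht[hash_value]                 # start at the HASH TABLE entry
    while loc != -1:
        probes += 1
        if sa[loc][0] == key:            # key matches: search is done
            return loc, probes
        loc = sa[loc][1]                 # otherwise follow the link
    return -1, probes


print(find("MURPHY", 3))   # (3, 2)
print(find("JOHNSON", 3))  # (0, 1)
print(find("KELLY", 3))    # (-1, 2)
```

A key that was never stored is rejected as soon as its chain runs out, without looking at any other record.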

The name DOE was next and gave me no trouble. It quietly hashed in and was stored as shown in Fig. 9.

JAMES was not so easy. It, too, hashed to a value of 3, right in there with JOHNSON and MURPHY. As handled before, the links were changed, and JAMES joined the group (see Fig. 10).


Fig. 9.	Fig. 10.
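For reference, the state after all six names can be written out and its chains listed, as in the Python sketch below. The text does not give DOE's hash value, so the value 4 used here is an assumption, as are the seven-slot table and the function name chain.

```python
# Final state after all six names. DOE's hash value is not given in the
# text; 4 is an assumption, as is the seven-slot table.
ht = [-1, 2, -1, 0, 4, -1, 1]
sa = [["JOHNSON", 3], ["SMITH", -1], ["JONES", -1],
      ["MURPHY", 5], ["DOE", -1], ["JAMES", -1]]


def chain(hash_value):
    """List the keys reached from one HASH TABLE slot, in search order."""
    keys = []
    loc = ht[hash_value]
    while loc != -1:
        keys.append(sa[loc][0])
        loc = sa[loc][1]
    return keys


print(chain(3))  # ['JOHNSON', 'MURPHY', 'JAMES']
print(chain(6))  # ['SMITH']
print(chain(0))  # []
```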

The Program

The program listed demonstrates the preceding pictorials and procedures. Its design is to provide a set of utility procedures to be used in other programs. The sections called HASH-IT and HASH-IN correspond to the figures above. I wrote them to simulate the ability to pass and receive parameters between procedures. Both procedures can be extracted and used in other programs.

The only change to make before HASH-IN can be truly considered general is to delete lines 520 through 524 and lines 527 and 529. These were written to facilitate the deletion ability of the program. HASH-IT is general and, as listed, will return a HASH VALUE (HASHV) for a given N$ not equal to a null string (""). The length of the HASH TABLE (LHT), which the listing sets in line 125, must be defined before line 2120 uses it.

HASH-IT was developed by my partner, Mike Smith, to produce hash values from keys of any length. His method first calculates the length of the key N$. Then, in lines 2050-2100, the ASCII values of the characters in N$ are accumulated into two sums. As listed, the characters in odd positions (the first, third and so on) are summed in variable A(1), and the characters in even positions (the second, fourth and so on) are summed in variable A(0).

This forms two separate numbers that are divided by 256, and their remainders are saved. The remainder from A(1) is multiplied by 256 and added to the remainder from A(0). The HASH VALUE is then made equal to the remainder of this number divided by the variable LHT, the prime-number size of the HASH TABLE. This all takes place in lines 2110-2120.
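The arithmetic of lines 2050-2120 can be restated in Python as a rough sketch. It follows the listing as printed, where line 2070 sends odd character positions to A(1) and even positions to A(0), and it uses the % operator in place of the INT and SGN expressions. The function name hash_it and the default table size of 101 (the LHT of line 125) are the only names carried over; everything else is illustrative.

```python
def hash_it(key, table_size=101):
    """Python rendering of the HASH-IT arithmetic (lines 2050-2120)."""
    a = [0, 0]
    for i, ch in enumerate(key, start=1):   # i is 1-based, as in the listing
        a[i % 2] += ord(ch)                 # odd positions -> a[1], even -> a[0]
    combined = a[0] % 256 + 256 * (a[1] % 256)
    return combined % table_size            # remainder, so 0 <= result < table_size


print(hash_it("JOHNSON"))  # 61
```

With a table of 101 entries, JOHNSON therefore lands at 61; the value of 3 used in the figures belongs to the small table drawn for the illustrations.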

Uses

The program accepts and stores keys called SUBJECTS. Along with each SUBJECT, a 256-character miscellaneous field can be stored for later retrieval. One use of this program is to store names and phone numbers. The quick retrieval process used makes it a lot faster than your telephone book. Another idea is to provide an information network affair for clubs or other organizations. When members want to store subjects with information, they can use the ADD command. Then when members need to know further information about any subject, a query into the computer may find the desired information.

Quick retrieval and large storage capabilities make this sort of information mill a good application for a microcomputer. For a large data base, a data-save procedure can be added. A disk file can also be substituted for the STORAGE ARRAY.
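One possible shape for such a data-save procedure is sketched below in Python. This is only an assumption about how it might be done: the JSON format, the function names save_table and load_table, the file name and the placeholder info strings are all invented for the example and do not appear in the listing.

```python
import json
import os
import tempfile


def save_table(path, ht, sa):
    """Write the HASH TABLE and STORAGE ARRAY to a JSON file."""
    with open(path, "w") as f:
        json.dump({"ht": ht, "sa": sa}, f)


def load_table(path):
    """Read both arrays back; no rehashing or sorting is needed."""
    with open(path) as f:
        data = json.load(f)
    return data["ht"], data["sa"]


# Records are [key, link, info]; the info strings are made-up placeholders.
ht = [-1, 2, -1, 0, -1, -1, 1]
sa = [["JOHNSON", -1, "info A"], ["SMITH", -1, "info B"], ["JONES", -1, "info C"]]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "mill.json")   # hypothetical file name
    save_table(path, ht, sa)
    ht2, sa2 = load_table(path)

print(ht2 == ht and sa2 == sa)  # True
```

Because the pointers are plain array locations, both arrays can be written out and read back unchanged.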

Conclusion

This program can be used as is, or modified for more complex applications. The concept of hashing itself is in use with compilers and in DOCUFORM. Its explanation is intended to add another method of data storage and retrieval to a programmer's bag of tricks.

0 CLEAR : REM                 INFORMATION MILL
1 REM :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
2 REM : WRITTEN BY R. T. VIZZONE    1979 CREATIVE CONSULTING.   :
3 REM : FOR THE APPLE II COMPUTER (REQUIRES APPLESOFT).         :
4 REM :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
5 REM
6 REM ::::::::::::::::::::::::
7 REM ::::  MAIN PROGRAM  ::::
8 REM ::::::::::::::::::::::::
9 REM
10 GOSUB 100: REM                 INITIALIZE
20 GOSUB 800: REM                 MENU
30 ON SEL GOSUB 300,900,1100,3000,1000,700
40 IF EXIT THEN 60
50 GOTO 20
60 END
70 REM
90 POKE 32,8: POKE 33,20: POKE 34,7: POKE 35,24: PRINT : RETURN
100 REM ::::::::::::::::::::::
110 REM ::::  INITIALIZE  ::::
120 REM ::::::::::::::::::::::
125 LHT = 101:LSA = 100: REM   DEFINE LENGTHS OF HASH TABLE AND STORAGE ARRAY
130 DIM HT(LHT),SA$(LSA,2)
148 FOR I = 0 TO LSA
150 ::SA$(I,0) = "-1"
160 ::SA$(I,1) = "-1"
170 NEXT I
180 FOR I = 0 TO LHT
190 :HT(I) = -1
200 NEXT I
205 NLOC = - 1:TR = - 1
220 DEF FN FOUND(A) = A + 1
230 RETURN
240 REM
300 REM ::::::::::::::::::::::::
310 REM :::::::  ADD  ::::::::::
320 REM ::::::::::::::::::::::::
330 HOME
332 REM                         IS SA$ FILLED?
335 IF (TR < LSA) THEN 340: REM NO IT'S NOT - CONTINUE
336 REM                         YES IT IS! QUIT
337 VTAB (10): PRINT "NOT ENOUGH ROOM FOR AN ADD."
338 VTAB (12): PRINT "PRESS ANY KEY TO CONTINUE";: GET A$
339 GOTO 370
340 VTAB (5): INPUT "SUBJECT: ";N$
345 IF N$ = "" THEN 370: REM     IF USER QUIT THEN RETURN
347 VTAB (7): PRINT "INFO: "
348 PRINT
350 GOSUB 90: VTAB (7): HTAB (1): INPUT " ";MISC$: TEXT
355 ADD = 1: REM                 HASH-IN(N$,1,RN)
360 GOSUB 400
365 TEXT : HOME : VTAB (12): PRINT "THANK YOU"
367 FOR I = 1 TO 500 : NEXT I: REM WAIT TO READ THANK YOU NOTE
368 TR = TR + 1: REM              INCREMENT TOTAL RECORDS VARIABLE (TR)
370 RETURN
380 REM
400 REM :::::::::::::::::::::::
410 REM : HASH-IN(N$,ADD,RN)  :
420 REM : GIVEN N$=NAME &     :
430 REM :       ADD=0 OR 1    :
440 REM : RETURN RN=RECORD #  :
450 REM :::::::::::::::::::::::
455 REM
457 GOSUB 2000: REM                    HASH-IT(N$,HASHV)
460 COLLISIEN = 0: REM                 COLLISION FLAG
470 P1 = HASHV: REM                    P1 NOW POINTS INTO THE HASH TABLE
480 P2 = HT(P1): REM                   LOOK IN HASH TABLE
485 REM                                ARE LOCATION CONTENTS EMPTY?
490 IF P2 < > - 1 THEN 600: REM        NO - GO CHECK SA$ POINTED TO BY P2
500 IF ADD THEN 520: REM               YES - DO AN ADD.
510 RN = - 1: GOTO 650: REM            YES, BUT THIS WAS A RETRIEVE. RN=-1 FOR A RECORD NOT FOUND. RETURN.
515 REM                                CHECK TO SEE IF THERE ARE ANY EMPTY LOCATIONS IN SA$ MADE BY A PREVIOUS DELETE
520 IF NOT DFLAG THEN 525
521 FOR I = 0 TO NLOC
522 IF SA$(I,0) = "-1" THEN 529: REM   THERE IS ONE - USE IT.
524 NEXT I
525 NLOC = NLOC + 1: REM               THERE WASN'T ONE - MOVE THE EMPTY POINTER DOWN.
526 EMPTY = NLOC
527 GOTO 530
529 EMPTY = I:DFLAG = DFLAG - 1: REM   USING A PREVIOUSLY DELETED LOCATION DECREMENT TOTAL # OF DELETED LOCATIONS.
530 P2 = EMPTY: REM                    AT LOCATION POINTED TO BY EMPTY
540 SA$(P2,0) = N$: REM                ...STORE KEY.
545 SA$(P2,2) = MISC$: REM             ATTACH ITS MISC INFO
560 IF COLLISIEN THEN SA$(P1,1) = STR$(EMPTY): REM  THIS IS A COLLISION-ADD - CHANGE POINTERS.
570 IF NOT COLLISIEN THEN HT(P1) = EMPTY: REM  THIS IS A NO-COLLISION ADD.
580 GOTO 640: REM                      ADD DONE - SET RN (P2 = EMPTY HERE) AND RETURN
599 REM                                LOOK IN SA$ FOR A MATCH
600 IF SA$(P2,0) = N$ THEN 640: REM    A MATCH - RETURN.
610 P1 = P2:COLLISIEN = 1: REM         NO MATCH - SET COLLISION FLAG AND
620 P2 =  VAL (SA$(P1,1)): REM            CHECK THE 2ND COLUMN OF SA$ FOR NEXT LOCATION TO SEARCH
630 GOTO 485: REM                      SEARCH AGAIN.
640 RN = P2: REM                       RN NOW EQUALS THE LOCATION IN SA$ OF KEY JUST ADDED OR FOUND
650 ADD = 0
660 RETURN
700 REM ::::::::::::::::::::::
710 REM ::::::: EXIT  ::::::::
720 REM ::::::::::::::::::::::
730 REM
740 EXIT = 1
750 RETURN
800 REM ::::::::::::::::::::::
810 REM :::::: MENU   ::::::::
820 REM ::::::::::::::::::::::
830 HOME
840 VTAB (5): PRINT "SELECTION MENU"
850 VTAB (10): PRINT "1.  ADD A SUBJECT"
860 VTAB (12): PRINT "2.  FIND A SUBJECT"
870 VTAB (14): PRINT "3.  DELETE A SUBJECT"
875 VTAB (16): PRINT "4.  LIST ALL SUBJECTS"
879 VTAB (18): PRINT "5.  UPDATE A SUBJECT"
880 VTAB (20): PRINT "6.  EXIT PROGRAM": PRINT : PRINT
890 PRINT "INPUT SELECTION NUMBER PLEASE: ";
892 GET SEL$
894 SEL = VAL (SEL$)
896 IF SEL < 1 OR SEL > 6 THEN 830
898 RETURN
900 REM ::::::::::::::::::::::
910 REM ::::::  RETRIEVE :::::
920 REM ::::::::::::::::::::::
930 REM
940 HOME
950 VTAB (5): INPUT "SUBJECT: ";R$
955 IF R$ = "" THEN 999: REM             USER QUITS
960 ADD = 0: N$ = R$: GOSUB 400: REM     HASH - IN(R$,0,RN)
980 IF FN FOUND(RN) THEN 990
982 VTAB (12): PRINT "I CAN'T FIND IT.   PRESS ANY"
983 PRINT "KEY TO GO ON. ";: GET A$
984 GOTO 999: REM                        RETURN
990 VTAB (7): PRINT "INFO: "
993 GOSUB 90: VTAB (7): HTAB (2): PRINT SA$(RN,2): TEXT
994 VTAB (24): PRINT "PRESS ANY KEY TO GO ON. ";:
995 GET A$
996 VTAB (24): HTAB (1): CALL - 868
999 RETURN
1000 REM :::::::::::::::::::::
1010 REM ::::  UPDATE  :::::::
1020 REM :::::::::::::::::::::
1030 REM
1040 GOSUB 900: IF NOT FN FOUND(RN) OR R$ = "" THEN 1070: REM  IF USER QUIT OR NOT FOUND THEN RETURN
1041 VTAB (23): HTAB (1): CALL - 868
1042 GOSUB 90
1047 VTAB (7): INPUT MISC$: REM      DISPLAY MISC INFO UPDATE WITH CURSOR.
1060 SA$(RN,2) = MISC$: REM        STORE UPDATED INFO IN THE RECORD JUST FOUND (RN)
1070 TEXT : RETURN
1100 REM :::::::::::::::::::::
1110 REM ::::  DELETE  :::::::
1120 REM :::::::::::::::::::::
1130 REM
1140 GOSUB 900: REM                  FIND RECORD REQUESTED
1145 IF R$ = "" THEN 1200: REM       IF USER QUIT OR
1147 IF NOT FN FOUND(RN) THEN 1200: REM ...RECORD NOT FOUND THEN RETURN.
1150 VTAB (24): HTAB (1): PRINT "DO YOU WISH TO DELETE THIS INFORMATION ";: GET A$
1160 IF A$ = "N" THEN 1200
1170 SA$(RN,0) = "-1": REM           ERASE DELETED ENTRY.
1180 FOR I = 0 TO LHT: REM           CHECK DELETED ENTRY TO SEE IF...
1185 REM                             ...IT'S 1ST IN A SERIES OF COLLISIONS. IF SO...
1190 IF HT(I) < > RN THEN 1194: REM  ... REPLACE ITS POINTER IN HT WITH ITS COLLISION POINTER FROM ITS 2ND COLUMN
1192 HT(I) = VAL (SA$(RN,1))
1193 SA$(RN,1) = "-1": GOTO 1195: REM CONSEQUENTLY ERASE ITS 2ND COLUMN.
1194 NEXT I
1195 DFLAG = DFLAG + 1: REM          INCREMENT DFLAG TO INDICATE ANOTHER EMPTY LOCATION EXISTS.
1197 TR = TR - 1: IF TR < 0 THEN TR = - 1: REM DECREMENT TOTAL RECORDS, NOT TO BE <-1
1200 RETURN
2000 REM :::::::::::::::::::::::
2010 REM : HASH-IT(N$,HASHV)   :
2020 REM : GIVEN N$, RETURN    :
2030 REM : HASHV               :
2040 REM :::::::::::::::::::::::
2042 REM                              DESCRIBED IN ACCOMPANYING ARTICLE
2045 HASHV = 0
2047 N = LEN (N$)
2050 A(0) = 0:A(1) = 0
2060 FOR I = 1 TO N
2070 ::J = INT ((I / 2 - INT (I / 2)) * 2 + .05) * SGN (I / 2)
2080 ::A = ASC (MID$ (N$, I))
2090 :A(J) = A(J) + A
2100 NEXT I
2110 HASHV = ( INT ((A(0) / 256 - INT (A(0) / 256)) * 256 + .05) * SGN (A(0) / 256)) + 256 * ( INT ((A(1) / 256 - INT (A(1) / 256)) * 256 + .05) * SGN (A(1) / 256))
2120 HASHV = INT ((HASHV / LHT - INT (HASHV / LHT)) * LHT + .05) * SGN (HASHV / LHT)
2140 RETURN
3000 REM :::::::::::::::::::::::
3010 REM :::: LIST ALL    ::::::
3020 REM :::::::::::::::::::::::
3030 REM                              PRINT ALL ELEMENTS OF SA$ THAT ARE NOT EMPTY (NOT -1)
3035 HOME
3040 FOR I = 0 TO NLOC
3045 IF SA$(I,0) = "-1" THEN 3090
3050 HTAB (5): PRINT SA$(I,0)
3060 CV = PEEK (37): REM              IS SCREEN FULL(I.E. IS CURSOR AT BOTTOM OF SCREEN)?
3070 IF CV < 21 THEN 3090: REM        NO - CONTINUE.
3075 PRINT : REM                      YES - CHECK WITH USER TO CONTINUE
3080 PRINT "PRESS ANY KEY TO CONTINUE OR 'Q' TO QUIT";: GET A$: IF A$ = "Q" THEN 3110
3085 HOME
3090 NEXT I
3095 PRINT
3100 PRINT "PRESS ANY KEY TO GO ON ";: GET A$
3110 RETURN

Program listing.

The original listing indicates that the program is written for an Apple II computer, as shown by instructions such as CALL, PEEK and POKE. For additional information take a look at the APPLE CALL, PEEK, POKE LIST.