CP/M Encryption Prescription

The following article was printed in November 1980 of the magazine „Microcomputing".

The article describes a method to encrypt files.

CP/M Encryption
Prescription

A software cipher will ensure privacy for your files–and peace of mind for you.

Alan Sclawy
1119 Avenue 1
Brooklyn, NY 11230

You hear a lot about data encryption these days – ways of scrambling confidential files so they can't be read by intruders. There is now a data encryption standard (DES); a chip implementing this standard; and lots of proposals for alternative systems, including some very attractive "public-key" systems.

But many of us can't use these sophisticated techniques. In my case, for example, I work in an academic time-sharing environment; the computer center would take a dim view if I tried to install a DES chip as a peripheral. Still, schools are notorious for characters who regard the computer, its users and their files as fair game. It was important to be able to edit, format and store sen-sitive materials, like examinations and class records. So I devised this simple way to frustrate snoopers.

The program is easy to implement and convenient to use, yet it is based on sound and well-understood cryptographic techniques. Although files encrypted with this program would probably yield their secrets to a determined professional in a few hours, they will be secure enough to thwart the amateur, which is all most of us usually need.

The version printed here is written in BASIC for use under CP/M, but I have written other versions in Fortran and in C and found them equally satisfactory.

Before describing the program, I want to define a few terms. Encryption is any kind of transformation of a text to make it unreadable except by authorized individuals. A cipher is an encryption method that transforms the text letter by letter, and to enci- pher a text is to apply a cipher to it. (That's what my program does.) Plaintext is the message to be encrypted (or enciphered); ciphertext is the enciphered version of the message. To decrypt a message is to transform it back so it can be read, and to decipher is to decrypt an enciphered message. Cryptanalysis is analyzing and cracking an encryption technique by studying the material transmitted. It's usually assumed that an authorized recipient deciphers a message, while an unauthorized individual resorts to cryptanalysis.

Plaintext:              H        e        l        l        o
Plaintext (ASCII):  01001000 01100101 01101100 01101100 01101111
Key:                10010101 11000100 00110101 11011010 01010110
Cipher text :       11011101 10100001 01011001 10110110 00111001

Fig. 1. Example of text enciphered by Vernam's method.

         p   q  |  p XOR q
        --------|---------
         0   0  |     0
         0   1  |     1
         1   0  |     1
         1   1  |     0

Fig. 2. Table of the exclusive-OR (XOR) function.

         _                         _
        |B| O Y B O Y B O Y B O Y |B| O Y B
   (A)  | |                       | |
        |G| I R L G I R L G I R L |G| I R L
         -                         -

         _                         _
        |P| A M E L A P A M E L A |P| A M E
   (B)  | |                       | |
        |J| O H N J O H N J O H N |J| O H N
         -                         -

Fig. 3. Getting long key periods from short keywords. (a) Key lengths have no common factors; period = (3*4) = 12. (b) Key lengths have a common factor of 2; period shorter than maximum.

The Method

The encryption method used here was devised by Gilbert S. Vernam, an engineer at AT&T, in 1917. That was a long time before the computer era, but his techniques will look very familiar to anyone who has been around computers. Vernam's algorithm has been described in great detail by David Kahn in his classic book, The Codebreakers. Briefly, it consists of the following steps:

Encode the message as a bit string.
Generate a random bit string of the same length as the message. (This is the key.)
Execute an exclusive-OR (XOR) between corresponding bits of the message and the key. The output bit string is the ciphertext.

To decipher the message, you apply the same key to the ciphertext and execute another bitwise exclusive-OR; the output will be the plaintext.

To see how this works, let's walk through a simple example. In the computer, step 1 is done for us already. The only way the computer can handle alphabetics is by representing them as bit strings. Fig. 1 shows the encryption of the message, "Hello," by Vernam's method. The first bit string is the ASCII representation of the message; the second bit string is the key – a random sequence of 1's and 0's.

To see how we arrive at the third bit string, you must look briefly at the exclusive-OR operation, shown in the table in Fig. 2. The usual way of describing this is to say that p XOR q has a value of 1 if either p or q is 1, but not both.

But there is another, more meaningful, way of looking at this. Look at Fig. 2 again and notice that if q = 1, the operation complements the p bit, while if q = 0, the p bit is left unchanged. XOR can therefore be viewed as a sort of controllable bit-flipper, and this is the way it functions in Vernam's algorithm.

Returning to Fig. 1, you will see that at every place where the key bit is 0, the message bit is copied into the cipherbit unchanged, and where the key bit is 1, the cipherbit is the complement of the message bit. Vernam's method thus flips bits in the message at random. To decipher the message, we apply the same key; this time it flips these same bits back again and restores the plaintext.

This system is a good deal more powerful than that description might lead you to expect. The key to the system, and to all its strengths and weaknesses, lies in the word "random." What makes a key random, anyway? Without going into all of the philosophical ramifications of probability theory, we can say that for our purposes a bit string is "random" if it is unpredictable and "almost random" if it is extremely difficult to predict.

An almost-random key is usually generated by some controllable process, and the whole trick of cracking a cipher consists of unearthing the controllable process and thus making the key predictable. A truly random key is generated by a naturally unpredictable process, like the tossing of a coin or the random emissions of radioactive decay. A Vernam cipher using a truly random key, and using it only once, is absolutely uncrackable; the reasons for this are elegantly explained in Kahn's book. (Stealing a copy of the key will enable you to read the message, but theft is not ordinarily considered a cryptanalytic technique.)

The key in my program is almost random. The commonest way of getting an almost-random key is by deriving it from a keyword. The main problem with this is that any keyword will be too short. That's because any almost-random key will eventually have to repeat, and discovering and analyzing these repetitions is the primary method of attack in cryptanalysis. The longer the period of the key, the harder it will be to mount such an attack.

The easiest way to generate a long-period key was discovered by a colleague of Vernam's. He realized that if you used two keys of different lengths, the pattern of the combined key would not repeat until the two words got back in step again.

For example, in Fig. 3 I use the keys BOY and GIRL. You will see that, for example, B and G occur together the first time, and then get out of step, until after four repetitions of BOY and three repetitions of GIRL they are back in step again. This illustrates the general rule: if keywords are used whose lengths have no common factors, the period of the composite will be equal to the product of the lengths of the keywords.

My program uses three keywords (and could quite easily be extended to more), and is capable of quite long periods. For example, the keys PHILANTHROPIC, BITTERSWEET and HENDECASYLLABIC when combined will yield a composite key with a period of (13*11*15) = 2145 characters.

The program is given in Listing 1. Lines 390 to 420 identify and open the file being encrypted and the file that receives the encrypted version. Lines 450-530 accept keywords from the user. The program recognizes the difference between upper and lowercase keywords.

Listing 1. File encryption program in BASIC.

100 REM
110 REM ************* FILE ENCRYPTION PROGRAM *************
120 REM
130 REM    USES VERNAM'S ALGORITHM WITH THREE KEYWORDS
140 REM
150 REM    ALAN SCLAWY, JULY, 1980
160 REM
170 REM ***************************************************
180 REM
190 REM VARIABLES:
200 REM
210 REM   F$            FILE NAME
220 REM   K1$,K2$,K3$   KEYWORDS
230 REM   K1, K2, K3    POINTERS TO CHARACTERS IN KEYWORDS
240 REM   L1, L2, L3    LENGTHS OF KEYWORDS
250 REM   IN$           TEXT LINE CURRENTLY BEING ENCRYPTED
260 REM   EN            LENGTH OF IN$
270 REM   Z$            CHARACTER FROM IN$ CURRENTLY BEING
280 REM                   ENCIPHERED
290 REM   P             HOLDS NUMERICAL EQUIVALENT OF Z$
300 REM   Y$            ENCRYPTED VERSION OF Z$
310 REM   OT$           ENCRYPTED VERSION OF IN$
320 REM
330 REM ***************************************************
340 REM
350 REM ------- MAKE ROOM FOR STRINGS 
360 CLEAR 400
370 REM
380 REM ------- GET FILE NAMES & OPEN FILES 
390 INPUT "Input file"; F$
400 OPEN "I", 1, F$
410 INPUT "Output file"; F$
420 OPEN "O", 2, F$
430 REM
440 REM ------- GET KEYWORDS
450 INPUT "Key"; K1$
460 K1 = 1
470 L1 = LEN(K1$)
480 INPUT "Key"; K2$
490 K2 = 1
500 L2 = LEN(K2$)
510 INPUT "Key"; K3$
520 K3 = 1
530 L3 = LEN(K3$)
540 REM 
550 REM --------- LOOP ON RECORDS -------------------------
560 FOR I = 1 TO 32767
570 IF (EOF(1) ) THEN 920 
580 LINE INPUT #1, IN$
590 EN = LEN(IN$)
600 REM PRINT "EN = "; EN
610 REM PRINT IN$
620 OT$ = ""
630 REM
640 REM --------- LOOP ON CHARACTRES WITHIN RECORD --------
645 IF (EN=0) THEN 880 
650 FOR J = 1 TO EN
660 Z$ = MID$(IN$, J, 1)
670 U$ = MID$(K1$, K1, 1)
680 P = ASC(Z$) XOR ASC(U$)
690 K1 = K1 + 1
700 IF (K1 > L1) THEN K1 = 1
710 V$ = MID$ (K2$, K2, 1)
720 P = P XOR ASC(V$)
730 K2 = K2 + 1
740 IF (K2 > L2) THEN K2 = 1
750 W$ = MID$(K3$, K3, 1)
760 P = P XOR ASC(W$)
770 K3 = K3 + 1
780 IF (K3 > L3) THEN K3 = 1
790 Y$ = CHR$ (P) 
800 REM --- PREVENT INADVERTENT NULL, CR, LF, OR EOF 
810 IF P <> 0 AND P <> 10 AND P <> 13 AND P <> 26 THEN 830
820 Y$ = Z$
830 OT$ = OT$ + Y$
840 REM PRINT J; ASC(Z$), U$; V$; W$; ASC(Y$)
850 NEXT J
860 REM --------- END OF LOOP ON CHARACTERS
870 REM ------- WRITE OUTPUT RECORD
880 PRINT #2, OT$
900 NEXT I
910 REM --------- END OF LOOP ON RECORDS
920 CLOSE 1, 2
930 END

Program listing.

(a)   Encryption

     RUN
     Input file? PLNTXT
     Output file? CYPTXT
     Key? ALPHA
     Key? BETA
     Key? EPSILON
     OK


(b)   Decryption 

     RUB
     Input file? CYPTXT
     Output file? CLRTXT
     Key? ALPHA
     Key? BETA
     Key? EPSILON
     OK

(c)   Input message

        This is an example of a paragraph of text encryp- 
ted by means of Vernam's algorithm.  In encryption, bits in 
the ASCII codes for the characters are flipped at random in 
a pattern set by the key, transforming the message into 
gibberish.  In decryption, the same bits are flipped back 
and the original text is restored.  If the key is per- 
fectly random, the cipher is uncrackable; if the key has a 
long period, the cipher is crackable only with difficulty. 

(d)   Output ciphertext 

OT?)

Fig. 4. Sample program run with input and output.

Lines 560-900 are a big loop for reading, enciphering and writing lines from the source file. Input is done by a line input statement in line 580. We need to use line input because an ordinary input statement will read a character string up to the first comma and then think its job is done. You want to handle bigger chunks of data than that. The line input statement will read characters until it finds a carriage return (CR). Ideally we would like to read a whole sector from the disk each time, but that requires I/O capabilities that are not provided in some dialects of BASIC; using line input is a compromise.

Lines 650-850 are a loop for enciphering one line of text. Each character is selected from the line, using a MID$ operation, in line 660. Vernam's XOR operations can be seen in lines 680, 720 and 760. (BASIC will not let us do XOR operations between characters, so you have to convert them to numbers by means of the ASC function and then convert the result back again by means of the CHR$ function at the end.) First you XOR the plaintext with key #1 in line 680; then you XOR the result with key #2 in line 720; and finally you XOR that result with key #3 in line 760. Key letters are selected each time by MID$ operations, and after each XOR the key's index is advanced and wrapped back around to the beginning of the key if necessary.

The I/O conventions of CP/M and BASIC pose some special problems for us. First, a control-Z character is used by CP/M to mark the end of a file. Therefore, if you should accidentally hit upon a combination of message and key characters that together produce a control-Z, this will be written onto the ciphertext file. When you decipher, the program will find this control-Z and think that the file ends there, and all the rest of the file will be irrecoverable. Similarly, an accidentally produced CR will be treated as a delimiter by the line input operation.

I also found out, the hard way, that BASIC will not insert a null into a character stream, and as a precaution I thought it wise not to allow line feeds. Thus whenever any of these characters results from the encryption process, you give up and retain the plaintext character. I feel a little uncomfortable about retaining plaintext characters, but since these situations arise infrequently and at random, it is probably safe to do so. Lines 810 and 820 test for these cases and take care of them. Each new ciphertext character is appended to the output string in line 830, and in line 870, after the line is complete, the output string is written into the output file.

The program includes a few added print statements in lines 600, 610, 840 and 890 for test and verification purposes. For actual use, these statements are disabled by turning them into remarks, but I have left them in for your convenience. The program generally is written a little less tightly than it might have been; I did this for ease of development and left it that way for clarity.

Fig. 4 shows a typical encryption run followed by a typical decryption run, together with the plaintext used and ciphertext obtained. (The correspondence between the two texts is not exact, since some of the ciphertext characters are non-printing.) Using a 2 MHz CPU and Microsoft's Version 4.3 BASIC under CP/M, it takes just about a minute to encipher the sample paragraph shown in Fig. 4. Execution can be speeded up by deleting the remarks and combining some of the statements. For real speed, however, you would be better off using a compiler-type BASIC, or hand-compiling the program in assembler language.

A word about keyword selection. Ideally a keyword should be a completely random bunch of symbols, and it should be easily memorized so it does not have to be written down. Of course, these are contradictory requirements. The important points to remember are these:

First, keep the keywords long, so the composite key period will be as long as possible. Ideally, the period of the key should be at least as long as the file you're encrypting.

Second, don't use keywords that have anything to do either with yourself or with the matter that you are encrypting. Using your own name or your spouse's name is out! The temptation to draw keywords from the file itself is nearly irresistible, but you must resist it. When encrypting a file containing notes for a patent application, don't key it with words like "invention," "concept" or "disclosure." Use crazy words like "gesundheit," "anamorphosis" or "3nB%r7*Q9Xm<."

Third, remember that the lengths of the three keywords must have no common factors if the composite period is to be as long as possible. For example, Fig. 3b shows what happens if the keywords both have a factor of 2. The individual keys have lengths of 4 and 6, but you will notice that the composite repeats after 12 characters instead of 24.

Choosing three long keywords of mutually prime lengths all unrelated to the material you are protecting may seem tedious and fussy, but armed with such a key, this simple encryption program will give even a professional codebreaker a fair amount of trouble.