Writing programs

The first stage in the development of a new program consists of analysing the problem that the program must solve. Unfortunately, there is no known method or methodology which will solve every kind of problem. However, a particularly good book on problem solving was written by George Pólya (see the Bibliography), and although the book is geared towards mathematical problems, it will help you solve most technical problems.

Problem analysis is not usually taught to beginners at computer programming because, so far as we know, it is mainly an intuitive activity (it is a branch of Heuristics). Learning to analyse a problem with the intention of writing a computer program is largely accomplished by writing simple programs followed by programs of increasing sophistication--this is sometimes called “learning by doing”. When we start analysing actual programs later in the chapter, each such analysis will be preceded by a problem analysis. You will be able to see how the program, as presented, accords with that analysis.

Nevertheless, even though no definitive method can be given, there are guidelines which help you to appreciate and analyse problems suitable for computer solution. In the field of systems analysis, you will find various methodologies (such as SSADM). These are usually geared towards large-scale systems and are designed to prevent systems designers from forgetting details. In the context of program design, knowing the data to be used by the program and the data to be produced by the program is the principal guide to knowing what manipulations the program must perform. Data knowledge specifies the books accessed by the program and usually constitutes a substantial part of the program's documentation.

Once you know the data your program operates on, you can determine the actual manipulations, or calculations, required. At this stage, you should be able to determine which data structures are suitable for the solution of your problem. The data structures in turn lead you to the mode declarations. The kind of data structure also helps to determine the kind of procedures required. Some examples: if your data structures include a queue, then queue procedures will be needed; or, if you are using multiples (repeated data), then you will almost invariably be using loops. Similarly, if an input book contains structured data, such as an item which is repeated many times, then your program will contain a processing loop. The Jackson programming methodology is a useful way of specifying procedures given the data structures to be manipulated (see the Bibliography).
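As a minimal sketch of how a data structure leads first to a mode declaration and then to its procedures, here is a fixed-size queue of integers. The names QUEUE, queue size, init queue, enqueue and dequeue are invented for this illustration, and the capacity of 100 is arbitrary; a real program would choose a mode to suit its own data.

```algol68
INT queue size = 100;

# The items are held in a multiple which is used circularly #
MODE QUEUE = STRUCT([queue size]INT item, INT head, count);

PROC init queue = (REF QUEUE q)VOID:
   (head OF q := 1; count OF q := 0);

# Yields FALSE, leaving the queue unchanged, if the queue is full #
PROC enqueue = (REF QUEUE q, INT value)BOOL:
   IF count OF q = queue size
   THEN
      FALSE
   ELSE
      INT tail = (head OF q + count OF q - 1) MOD queue size + 1;
      (item OF q)[tail] := value;
      count OF q +:= 1;
      TRUE
   FI;

# Yields FALSE, leaving value unchanged, if the queue is empty #
PROC dequeue = (REF QUEUE q, REF INT value)BOOL:
   IF count OF q = 0
   THEN
      FALSE
   ELSE
      value := (item OF q)[head OF q];
      head OF q := head OF q MOD queue size + 1;
      count OF q -:= 1;
      TRUE
   FI;

# A possible use of the procedures #
QUEUE q;
init queue(q);
enqueue(q, 42);
INT front;
IF dequeue(q, front) THEN print((front, newline)) FI
```

Both procedures yield a BOOL so that the caller can decide what to do about a full or an empty queue, rather than having the decision buried in the queue procedures.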

Top-down analysis

After you have determined suitable modes and procedures, you need to analyse the problem in a top-down manner. Basically, top-down analysis consists of determining the principal actions needed to perform a given action, then analysing each of the principal actions in the same way. For example, suppose we wished to write a program to copy a book whose identifier is given on the command line. The topmost statement of the problem could be

   copy an identified book

The next stage could be

   get the book identifier
   open the book
   establish the output copy book
   copy the input book to output
   close both books

At this stage, the process “copy the input book to output” will depend on the structure of the input book. If it is text, with lines of differing length, you could use a name of mode REF STRING. If the book contains similar groupings of data, called records, then it would be more appropriate to declare a structured mode and write appropriate input and output procedures:

   DO
      get record from input book
      put record to output book
   OD

The analysis is continued until each action can be directly coded.

Program layout

Before you start coding the program (writing the actual Algol 68 source program), you should be aware of various programming strategies besides the different means of manipulating data structures. The first to address is the matter of source program layout.

In the examples given in this book, code has been indented to reflect program structure, but even in this matter, there are choices. For example, some people indent the THEN and ELSE clauses of an IF clause:

   IF ...
     THEN ...
     ELSE ...
   FI

instead of

   IF   ...
   THEN ...
   ELSE ...
   FI

Others regard the parts of the IF clause as some kind of bracketing:

   IF
      ...
   THEN
      ...
   ELSE
      ...
   FI

Some people write a procedure as:

   PROC ...
      BEGIN
      ...
      END

Others never use BEGIN and END, but only use parentheses.

Another point is whether to put more than one phrase on the same line. Blank lines are a further choice; they usually improve a program's legibility. Whatever you decide, keep to your decision throughout the program (or most of the program), otherwise the format of the code may prove confusing. Of course, you will learn by your mistakes and usually you will change your programming style over the years.

Declarations

Another matter is whether to group declarations. Unlike many programming languages, Algol 68 allows you to place declarations wherever you wish. This does not mean that you should therefore sprinkle declarations throughout your program, although there is something to be said for declarations being as local as possible. There are also advantages in grouping all your global declarations so that they can be found easily. Generally speaking, it is a good idea to group all global names together (those in the outermost range) and within that grouping, to declare together all names which use the same base mode (for example, group declarations of modes CHAR, []CHAR and STRING). Some of the exercises in this book only declare names when they are immediately followed by related procedures. If your program needs many global names, it makes sense to declare them near the beginning of the program, after mode declarations, so that if subsequent changes are required, you know that all the global name declarations are together and therefore you are unlikely to miss any.
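The grouping described above might look like the following fragment, which would sit at the start of the outermost range. The mode RECORD and every name declared here are hypothetical, chosen only to show the layout.

```algol68
# Mode declarations first #
MODE RECORD = STRUCT(STRING name, INT quantity);

# Global names whose base mode is CHAR #
CHAR separator := ",";
[80]CHAR heading;
STRING line;

# Global names whose base mode is INT #
INT line count := 0,
    record count := 0;

# Global names of the declared mode #
RECORD current record;
```

If a later change needs another global counter, there is only one place to look for the existing ones.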

Procedures

The next consideration is breaking your code into procedures. As you analyse the problem, you will find that some of the processing can be specified in a single line which must be analysed further before it can be directly coded. Such a line is a good indication that the process should be written as a procedure. Even a procedure which is used only once is worth writing if the internal logic is more than a couple of conditional clauses, or even more than one conditional clause.

You also have to decide between repeating a procedure in a loop, or placing the loop in the procedure. Deciding the level at which logic should be put in a procedure is largely the product of experience--yours and other people's--another reason for maintaining existing programs.

When you have decided where to use procedures, you should then consider the interface between the procedure and the code that calls it. What parameters should it have? What should it yield? Should you use a united mode for the yield? Try to have as few parameters as possible, but preferably use parameters rather than assign to names global to the procedure. The design of individual procedures is similar to the design of a complete program.
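A small sketch of this advice: the loop is placed inside the procedure, the data arrives as a single parameter and the result leaves as the yield, so no global name is assigned. The procedure row total and its parameter are invented for illustration.

```algol68
PROC row total = ([]INT values)INT:
BEGIN
   INT total := 0;   # local to the procedure, not a global name #
   FOR i FROM LWB values TO UPB values
   DO
      total +:= values[i]
   OD;
   total
END;

# A possible call, using a row-display as the actual parameter #
print((whole(row total((3, 4, 5)), 0), newline))
```

Because the procedure depends only on its parameter, it can be tested on its own and reused with any multiple of integers.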

When you are coding a procedure, be especially careful with compound Boolean formulæ. From experience, this is where most mistakes arise. If you are writing a procedure which manipulates a linked list, draw a diagram of what you are trying to do. That is much easier than trying to picture the structures in your head.

Monetary values

Problems can arise when dealing with money in computer programs because the value stored must be exact. For this reason, it is usually argued that only integers should be used. In fact, real numbers can be used provided that the precision of the mantissa is not exceeded. Real numbers are stored in two parts: the mantissa, which contains the significant digits of the value, and the exponent, which multiplies that value by a power of 2. By analogy, in decimal arithmetic, the number 3⋅14159×10⁻⁴³ has 3⋅14159 as its mantissa and -43 as its exponent. Because real numbers are stored in binary (radix 2), the mantissa is stored as a value in the range 1 ≤ value < 2 with the exponent adjusted appropriately.

There are a number of identifiers declared in the standard prelude, known as environment enquiries, which serve to determine the range and precision of real numbers. The real precision is the number of bits used to store the mantissa, while the value max exp real is the maximum exponent which can be stored for a binary mantissa (not the number of bits, although it is a guide to that number). The real width and exp width say how many decimal digits can be written for the mantissa and the exponent. The values max real and min real are the maximum and minimum real numbers which can be stored in the computer. All these values are specified by the IEEE 754-1985 standard on “Binary Floating-Point Arithmetic” which is implemented by most microprocessors today.

The value of real width is 15, meaning that 15 decimal digits can be stored accurately. Leaving a margin of safety, we can say that an integer with 14 digits can be stored accurately, so that the maximum amount is

   99,999,999,999,999

units. If the unit of currency is divided into smaller units, such as the sterling pound into pence, or the dollar into cents, then the monetary value should be stored in the smaller unit unless it is known that the smaller unit is not required. Thus the greatest sterling amount that can be handled would appear to be £999,999,999,999.99.
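The following fragment is one way of checking that claim on your own system. It assumes that REAL has the 53-bit mantissa of an IEEE 754 double; the names limit and next are invented for the illustration.

```algol68
REAL limit = 99999999999999.0;   # 14 digits: the largest amount, in pence #
REAL next  = limit + 1.0;        # one penny more #

# If whole pence are still held exactly, the difference is exactly 1 #
IF next - limit = 1.0
THEN print(("exact", newline))
ELSE print(("rounded", newline))
FI
```

The test only makes sense because the amounts are whole numbers of the smaller unit. A fraction such as 0.10 of a pound has no exact binary representation, which is the reason for storing pence rather than pounds.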

However, Algol 68 allows arithmetic values to be stored to a lesser or greater precision. The modes INT, REAL, COMPL and BITS can be preceded by any number of SHORTs or LONGs (but not both). Thus

   LONG LONG LONG REAL r;

is a valid declaration for a name which can refer to an exceptionally precise real. When declaring identifiers of other precisions, denotations of the required precision can be obtained by using a cast with the standard denotation of the value as in

   LONG REAL lr = LONG REAL(1);

One alternative is to use LONG with the denotation:

   LONG REAL lr = LONG 1.0;

Another is to use the LENG operator, which converts a value of mode INT or REAL to a value of the next longer precision, as in

   LONG REAL lr = LENG 1.0;

SHORTEN goes the other way.

   SHORT SHORT INT ssi = SHORTEN SHORTEN 3;

All the arithmetic operators are valid for all the LONG and SHORT modes. Although you can write as many LONGs or SHORTs as you like, any implementation of Algol 68 will provide only a limited number. The number of different precisions available is given by further environment enquiries in the standard prelude: int lengths and int shorths for integers, real lengths and real shorths for reals, and bits lengths and bits shorths for BITS.

The values for complex numbers are the same as those for reals. For integers, where int lengths is greater than 1, long max int and so on are also declared, and similarly for short max int. If int lengths is 1, then only the mode INT is available.

For the a68toc compiler

   int lengths=2
   int shorths=3

Thus it is meaningful to write

   LONG INT long int:=long max int;
   INT int:=max int;
   SHORT INT sh int:=short max int;
   SHORT SHORT INT sh sh int:=
            short short max int;

The same applies to the mode BITS. Try writing a program which prints out the values of the environment enquiries mentioned in this section. The transput procedures get, put, get bin and put bin all handle the available LONG and SHORT modes.
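One possible shape for such a program is sketched below. It prints only the enquiries named in this section and relies on print handling the LONG and SHORT modes directly. The PROGRAM ... FINISH wrapper and the program name enquiries assume the a68toc conventions; with another compiler the BEGIN ... END clause alone may be what is needed.

```algol68
PROGRAM enquiries CONTEXT VOID
USE standard
BEGIN
   # How many lengths of integer are available #
   print(("int lengths   = ", int lengths, newline,
          "int shorths   = ", int shorths, newline));

   # The largest integer of each available length #
   print(("long max int  = ", long max int, newline,
          "max int       = ", max int, newline,
          "short max int = ", short max int, newline));

   # Decimal digits for the mantissa and exponent of a REAL #
   print(("real width    = ", real width, newline,
          "exp width     = ", exp width, newline));

   # The extremes of the mode REAL #
   print(("max real      = ", max real, newline,
          "min real      = ", min real, newline))
END
FINISH
```

The lines for long max int and short max int are only meaningful where int lengths and int shorths are greater than 1, as they are for a68toc.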

Although you can still write

   LONG LONG INT lli=LONG LONG 3;

the actual value created may not differ from LONG INT depending on the value of int lengths. Note that you cannot transput a value which is not covered by the available lengths/shorths. Use LENG or SHORTEN before trying to transput.

For monetary values, LONG INT is available with the value of long max int being

   9,223,372,036,854,775,807

which should be big enough for most amounts.
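As a hedged sketch of working in the smaller unit, the procedure below turns an amount held as a LONG INT number of pence into a string of pounds and pence. The name sterling is invented, the amount is assumed to be non-negative, and whole is assumed to accept a LONG INT as it does in the standard prelude of the Revised Report.

```algol68
PROC sterling = (LONG INT pence)STRING:
BEGIN
   LONG INT pounds = pence % LONG 100,     # integer division #
            change = pence MOD LONG 100;   # the odd pence, 0 to 99 #
   STRING digits := whole(change, 0);
   # Always show two digits of pence #
   IF change < LONG 10 THEN digits := "0" + digits FI;
   whole(pounds, 0) + "." + digits
END;

# 1205 pence should appear as 12.05 #
print((sterling(LONG 1205), newline))
```

All the arithmetic is done on integers, so no rounding can occur; the decimal point exists only in the printed form.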

Optimisation

There are two well-known rules about optimisation:

  1. Don't do it.
  2. Don't do it now.

However, often there is a great temptation to optimise code, particularly if two procedures are very similar. Using identity declarations is a good form of optimisation because not only do they save some writing, they also lead to more efficient code. However, you should avoid procedure optimisation like the plague because it usually leads to more complicated or obscure code. A good indicator of bad optimisation is the necessity of extra conditional clauses. In general, optimisation is never a primary consideration: you might save a few milliseconds of computer time at the expense of a few hours of programmer time.
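The kind of identity declaration meant here can be sketched as follows. The names table, row, col and cell, and the limit of 5, are invented for the illustration.

```algol68
[1:10, 1:10]INT table;
FOR i TO 10 DO FOR j TO 10 DO table[i, j] := 0 OD OD;

INT row = 3, col = 7;

# Without an identity declaration the element is subscripted three times #
table[row, col] +:= 1;
IF table[row, col] > 5 THEN table[row, col] := 0 FI;

# With one, the element is located once and then referred to by name #
REF INT cell = table[row, col];
cell +:= 1;
IF cell > 5 THEN cell := 0 FI
```

The second version is shorter to write and read, and the subscripting is elaborated once only, which is where the gain in efficiency comes from.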

Testing and debugging

When writing a program, there is a strong tendency to write hundreds of lines of code and then test it all at once. Resist it. The actual writing of a program rarely occupies more than 30% of the whole development time. If you write your overall logic, test it and it works, you will progress much faster than if you had written the whole program. Once your overall logic works, you can code constituent procedures, gradually refining your test data (see below) so that you are sure your program works at each stage. By the time you complete the writing of your program, most of it should already be working. You can then test it thoroughly. The added advantage of step-wise testing is that you can be sure of exercising more of your code. Your test data will also be simpler.

The idea behind devising test data is not just giving your program correct data to see whether it will produce the desired results. Almost every program is designed to deal with exception conditions. For example, the lf program has to be able to cope with blank lines (usually, zero-length lines), so the test data should contain not only a single blank line, but also two consecutive blank lines. It also has to be able to cope with extra-long lines, so the test data should contain at least one of those. Programs which check input data for validity need to be tested extensively with erroneous data.
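Test data of this kind is easily produced by a few lines of program. The fragment below writes to standard output, which can be redirected to a book; the wording of the lines and the length of 300 characters are arbitrary choices for the illustration, not values taken from the lf program.

```algol68
# A single blank line, then two consecutive blank lines #
print(("first line", newline,
       newline,
       "after one blank line", newline,
       newline, newline,
       "after two blank lines", newline));

# One extra-long line #
TO 300 DO print("x") OD;
print(newline)
```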

It is particularly important that you test your programs with data designed to exercise boundary conditions. For example, suppose the creation of an output book fails due to a full hard disk. Have you tested it, and does your program terminate sensibly with a meaningful error message? You could try testing your program with the output book being created on a floppy disk which is full.

Sometimes a program will fault with a run-time error such as

   Run time fault (aborting):
   Subscript out of bounds

or errors associated with slicing or trimming multiples. A good way of discovering what has gone wrong is to write a monitor procedure on the lines of

   PROC monitor=(INT a,
                 []UNION(SIMPLOUT,
                         PROC(REF FILE)VOID)r
                 )VOID:
   BEGIN
      print(("*** ",whole(a,0)));
      print(r)
   END

and then call monitor with an identifying number and string at various points in the program. For example, if you think a multiple subscript is suspect, you could write

   monitor(20,("Subscript=",whole(subscript,0)))
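The fragment below sketches monitors in use around a suspect subscript. The monitor procedure is repeated so that the fragment stands alone; as a small variation on the version above, it separates the number from the data with a space and ends each monitor with a newline. The names squares, subscript and total are invented for the illustration.

```algol68
PROC monitor = (INT a,
                []UNION(SIMPLOUT,
                        PROC(REF FILE)VOID)r
                )VOID:
BEGIN
   print(("*** ", whole(a, 0), " "));
   print(r);
   print(newline)
END;

[5]INT squares;
FOR i TO 5 DO squares[i] := i * i OD;

INT subscript := 0, total := 0;
WHILE subscript < 5
DO
   subscript +:= 1;
   # Report the subscript before it is used #
   monitor(20, ("Subscript=", whole(subscript, 0)));
   total +:= squares[subscript]
OD;
monitor(30, ("Total=", whole(total, 0)))
```

If the program faults, the last monitor printed shows which subscript was about to be used, and the identifying number shows which monitor produced it.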

By placing monitors at judicious points, you can follow the action of your program. This can be particularly useful for a program that loops unexpectedly: monitors will tell you what has gone wrong. If you need to collect a large amount of monitor output, it is best to send the output to a book. The disadvantage of this is that the operating system does not register a book as having a size until it has been closed after creation. This means that if your program creates a monitoring book, writes a large amount of data to it and fails before the book is closed, you will not be able to read any of the contents because, according to most operating systems, there will not be any contents. A way round this problem is to open the book whenever you want to write to it, move the writing position to the end of the book, write your data and then close the book. This ensures that the book contains all the executed monitors (unless, of course, it is a monitor which has caused the program to fail!). The procedure debug given in section 9.9 will do this.

An alternative method of tracing the action of a program at run-time is to use a source-level debugger. The DDD program can help you debug the C source program produced by the a68toc compiler, but unless you understand the C programming language and the output of the a68toc compiler, you will not find it useful. Monitors, although an old-fashioned solution to program debugging, are still the best means of gathering data about program execution.

Another proven method of debugging (the process of removing bugs) is dry-running. This involves acting as though you are the computer and executing a small portion of program accordingly. An example will be given in the analysis of the lf program later.

Sometimes, no matter what you do, it just seems impossible to find out what has gone wrong. There are three ploys you can try. The first, and easiest, is to imagine that you are explaining your program to a friend. The second is to actually explain it to a friend! This finds most errors. Finally, if all else fails, contact the author.

Compilation errors

You can trust the compiler to find grammatical errors in your program if any are there. The compiler will not display an error message for some weird, but legal, construction. If your program is syntactically correct (that is, it is legal according to the rules of the language), then it will parse correctly.

When compiling a program of more than a hundred lines, say, you can use the parsing option (-check) which will more than double the speed of compilation. When your program parses without error, then it is worth doing a straight compilation (see the online documentation for program mm in the a68toc compilation system).

A definitive list of error messages can be found in the file

   algol68toc-1.12/src/message.a68

You will find that most of the messages are easy to understand. Occasionally, you will get a message which seems to make no sense at all. This is usually because the actual error occurs much earlier in your program. By the time the compiler has discovered something wrong, it may well have compiled (or tried to compile) several hundred lines of code. A typical error of this sort is starting a comment and not finishing it, especially if you start the comment with an opening brace ({), which gives rise to the following error message:

   ERROR (112) end of file inside comment or pragmat

If you start a comment with a sharp (#) and forget to finish it likewise, the next time a sharp appears at the beginning of another comment, the compiler will announce all sorts of weird errors.

Another kind of troublesome error is to insert an extra closing parenthesis or END. This can produce lots of spurious errors. For example:

   ERROR (118) FI expected here
                  (at character 48)
   ERROR (203) ELSE not expected here
                  (at character 4)
   ERROR (140) BOOL, INT or UNION required here,
                  not VOID
   ERROR (116) brackets mismatch
                  (at character 2)
   ERROR (159) elements of in-parts
                  must be units
   ERROR (117) FINISH expected here
                  (at character 3)

Omitting a semicolon, or inadvertently inserting one will also cause the appearance of curious error messages. Messages about UNIONs usually mean that you should use a cast to ensure that the compiler knows which mode you mean. If, for example, you have a procedure which expects a multiple of mode

   []UNION(STRING,[]INT)

and you present a parameter like

   ((1,2),(4,2),(0,4))

then the compiler will not know whether the display is a row-display or a structure-display. Either you should precede it with a suitable mode, or modify your procedure to take a single []INT and loop through it in twos. Having to modify your program because the compiler does not like what you have written is rare, however.
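One reading of "precede it with a suitable mode" is to give each inner display a cast, so that its mode is known before it is united. The sketch below follows the rules of the Revised Report; the procedure show and its body are invented for the illustration, and a particular compiler may be more or less strict.

```algol68
PROC show = ([]UNION(STRING,[]INT) items)VOID:
   FOR i FROM LWB items TO UPB items
   DO
      CASE items[i] IN
         (STRING s): print((s, newline)),
         ([]INT row):
            BEGIN
               FOR j FROM LWB row TO UPB row
               DO print((whole(row[j], 0), " ")) OD;
               print(newline)
            END
      ESAC
   OD;

# Each pair is cast to []INT, so there is no doubt about its mode #
show(([]INT(1,2), []INT(4,2), []INT(0,4)))
```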

Arithmetic overflow

Sometimes your program will fail at the time of elaboration or “run-time” due to arithmetic overflow. If, during a calculation, an intermediate result exceeds the capacity of an INT, no indication will be given other than erroneous results.
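Since integer overflow is silent, one defence is to test before the operation. The following sketch guards a single addition; the procedure add ok and its parameters are invented for the illustration, and the test is slightly conservative at the negative end of the range.

```algol68
# Yields TRUE and assigns a+b to sum if the addition cannot overflow #
PROC add ok = (INT a, b, REF INT sum)BOOL:
   IF (b > 0 AND a > max int - b) OR
      (b < 0 AND a < (-max int) - b)
   THEN
      FALSE
   ELSE
      sum := a + b;
      TRUE
   FI;

# max int + 1 would overflow, so the ELSE part is taken #
INT result;
IF add ok(max int, 1, result)
THEN print((result, newline))
ELSE print(("overflow avoided", newline))
FI
```

Neither max int - b nor (-max int) - b can itself overflow in the branch where it is evaluated, which is why the comparison is written this way round rather than as a + b > max int.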

Overflow of REAL numbers can be detected by the floating-point unit. The standard prelude contains the value fpu cw algol 68 round of mode SHORT BITS and the procedure

   PROC set fpu cw = (SHORT BITS cw)VOID:

The small test program testov (to be found with the a68toc compilation system documentation) illustrates testing for overflow both with integers and real numbers.

Documentation

The most tedious aspect of writing a program is documenting it. Even if you describe what the program is going to do before you write it, but after you have designed it, documentation is not usually a vitally interesting task. Large programming teams often have the services of a technical writer whose job it is to ensure that all program documentation is completed.

Existing programs are usually documented and there is no doubt that the best way of learning to document a program is to see how others have done it. There are several documentation standards in use, although most large companies have their own. Generally speaking, the documentation for a program should contain at least the following

but not necessarily in the order given above. The aim of program documentation is to make it easy to amend the program, or to use it for a subsequent rewrite.

Lastly, it is worthwhile saying “don't be rigid in program design”. If, as you reach the more detailed stages of designing your program, you discover that you have made a mistake in the high-level design, be willing to backtrack and revise it. Design faults are usually attributable to faulty analysis of the problem.

Sian Mountbatten 2012-01-19