Datareign

Comments and Inline Documentation

This page is aimed at working programmers in large organisations, but much of it applies to anyone producing code that will be read and maintained by other programmers.

There is no excuse for poor documentation in any programme. All modern languages provide a mechanism for commenting, and it should be used. What little extra effort it takes is amply repaid by the time saved in maintenance. One can go further and say that any programmer who fails to provide adequate documentation has not met their obligation to deliver a finished, usable product.

There are three types of comment that can be identified: production headers, unit descriptions and code explanations. We'll go through each in turn. First though, let's consider general style.

Before we start

Comments are no good if they're not visible and clearly delineated from the code they describe. For this reason, I recommend placing all comments between a top and bottom line, as in…

 // ------------------------------------------------
 // This comment describes the code which follows...
 // ------------------------------------------------

Line length matters because legibility is paramount. Restricting comments to 80 columns means they will print correctly on A4 paper in portrait mode (short side at the top) at a typical font size. Using 132 columns gives more space but requires listings to be printed in landscape mode (long side at the top). Even today, programmers are often issued with relatively small screens, so a shorter line length also makes code easier to read on screen.
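Where a team wants to enforce the limit rather than rely on good intentions, a simple check is easy to write. The following C sketch is one way to do it, not a standard tool: the function `CountLongLines` and the constant `MAX_COMMENT_COLUMNS` are hypothetical names, and every byte (tabs included) is counted as one column, which a real checker would refine by expanding tabs.

```c
#include <stddef.h>

// Hypothetical limit, matching the 80-column recommendation above.
#define MAX_COMMENT_COLUMNS 80

// Counts the lines in text that are wider than maxColumns.
// Simplification: each byte, including a tab, counts as one column.
size_t CountLongLines(const char *text, size_t maxColumns)
{
    size_t longLineCount = 0;
    size_t currentColumn = 0;

    for (const char *thisChar = text; *thisChar != '\0'; thisChar++) {
        if (*thisChar == '\n') {
            if (currentColumn > maxColumns)
                longLineCount++;
            currentColumn = 0;
        } else {
            currentColumn++;
        }
    }
    // The final line may not end with a newline.
    if (currentColumn > maxColumns)
        longLineCount++;
    return longLineCount;
}
```

Run over a source file's contents, a non-zero result with `MAX_COMMENT_COLUMNS` tells you some lines will wrap or be cut off when printed.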

Production headers

These should be present in every file and provide enough information to enable a competent programmer to understand the purpose of the file at a glance. They should also contain a clear history of changes so that the code can be wound back to any given point. For example…

 // ============================================================================
 // Filename: CustValSupLib.c
 //   Author: Harvey Platter (hp@blah.com)
 //    Owner: James Smith (js@blah.com)
 //  Updated: 23-Oct-2004
 // Function: All customer validation routines for the sales tracking system.
 //    Notes: Any other project using this file should contact the owner for
 //           inclusion on the change notification list.
 // ----------------------------------------------------------------------------
 // History...
 // ----------------------------------------------------------------------------
 // DATE   WHO CHANGE
 // ----------------------------------------------------------------------------
 // 041023 TLP Added support for HTTP transfers as per CRST/34/03
 // 040605 JMS Changed functionality of GetCustomerID to reflect updated
 //            customer cross ref. CRST/29/01
 // 031211 HJP Amended all references to customer representative to 60 characters
 //            as per CRST/17/01
 // ============================================================================

The first thing to note about this example is that it gives just enough information and no more. It is for a (fictitious) library file, so a one-line summary in the 'Function' entry is enough; a detailed description of functionality at this level would add little, since each routine carries its own unit description. If this header belonged to a programme file, there would be a strong argument for the 'Function' entry providing a reasonably detailed description of what the programme does and why.

We list both an author and an owner. In many cases the two will not be the same. The author is the person who originally produced the code while the owner is the person who is currently responsible for it.

The Notes area is for things which apply to the whole file, as in the example shown. It should not be used for information which properly belongs with a particular piece of code.

The history is in reverse chronological order so that the latest change is immediately visible. In any properly run shop, no change is ever made without authorisation, and the authorising document (CRST/34/03, for instance) should be quoted as part of the entry, tying the two together.
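Because each date is written as YYMMDD, history entries sort correctly as plain text. As a minimal sketch, assuming the history were also held in a program as hypothetical `HistoryEntry` records, C's standard `qsort` can put them in newest-first order. Two-digit years only compare correctly while every entry falls within the same century, so an entry dated 991231 would wrongly sort above 041023.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical record for one line of a file's change history.
typedef struct {
    char date[7];        // YYMMDD, e.g. "041023"
    char initials[4];    // e.g. "TLP"
    const char *change;  // description, quoting the authorising document
} HistoryEntry;

// qsort comparator: newer dates first. YYMMDD strings compare correctly
// as text only while all entries fall in the same century.
static int CompareNewestFirst(const void *left, const void *right)
{
    const HistoryEntry *leftEntry = left;
    const HistoryEntry *rightEntry = right;
    return strcmp(rightEntry->date, leftEntry->date);
}

void SortHistoryNewestFirst(HistoryEntry *entries, size_t entryCount)
{
    qsort(entries, entryCount, sizeof entries[0], CompareNewestFirst);
}
```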

The last thing to note is that the header is enclosed in double lines (equals signs) to show that this is an especially important group of comments. We'll see in a moment that the same technique is used for unit descriptions.

Unit descriptions

These identify sections of code such as procedures or functions. Their most important purpose is to describe what the code is supposed to do, which is not necessarily the same as what it actually does, at least before testing.

 # ==============================================================================
 # Removes a sales order if the customer has been placed on hold. Will NOT remove
 # the order if the customer is still active, in which case it will write an
 # exception to the log.
 # ------------------------------------------------------------------------------
 # $_[0] the customer ID
 # $_[1] the sales order number
 # $_[2] the operator's work code
 # $_[3] the authorisation code for the deletion
 # ------------------------------------------------------------------------------
 # Returns the new ID of the deleted sales order.
 # ------------------------------------------------------------------------------
 # Note that the sales order is not actually deleted from the work file but the
 # ID is given a 'DEL' prefix. Downstream functionality is expected to cleanse 
 # the work file.
 # ==============================================================================

This example is based on a format used for Perl programmes. The four sections shown are appropriate for many languages but are not the only way of providing clear documentation.

The first section is the functional description referred to earlier. This needs to be as concise as possible while including everything which a newcomer might not already know.

The second section describes the parameters for this function. In traditional Perl, a subroutine's parameters are not named; they arrive in the array @_, so for clarity's sake each one is identified by its element of that array. Once more, we're not leaving anything to chance.

The third section tells us what this function will return. Again, being Perl, we don't have to state an explicit data type, but in many other languages such a definition would be entirely appropriate. If the function returns nothing, that should be stated here explicitly.

The last section is for notes which are not appropriate in any of the previous sections. Deciding what goes here has to be done on a case-by-case basis. In some, perhaps many, cases it will be blank.
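The four sections carry over directly to languages with typed parameters. The following is a hypothetical C counterpart of the Perl routine above, sketched for illustration rather than taken from any real system: the `Customer` type, the `RemoveHeldOrder` name, the buffer-size parameter and the use of stderr as the log are all assumptions.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DELETED_ORDER_PREFIX "DEL"

// Hypothetical customer record, standing in for the real customer lookup.
typedef struct {
    int id;
    bool isOnHold;
} Customer;

// ==========================================================================
// Removes a sales order if the customer has been placed on hold. Will NOT
// remove the order if the customer is still active, in which case it writes
// an exception to the log.
// --------------------------------------------------------------------------
// customer     the customer who placed the order
// orderID      the sales order number; rewritten in place when removed
// orderIDSize  the size of the orderID buffer, in bytes
// workCode     the operator's work code
// authCode     the authorisation code for the deletion
// --------------------------------------------------------------------------
// Returns the new ID of the deleted sales order, or NULL if the order was
// not removed (customer still active, or no room for the prefix).
// --------------------------------------------------------------------------
// Note that the order is not actually deleted from the work file; its ID is
// given a 'DEL' prefix. Downstream functionality is expected to cleanse the
// work file. In this sketch the "log" is simply stderr.
// ==========================================================================
const char *RemoveHeldOrder(const Customer *customer, char *orderID,
                            size_t orderIDSize, const char *workCode,
                            const char *authCode)
{
    if (!customer->isOnHold) {
        fprintf(stderr,
                "EXCEPTION: customer %d is active; order %s kept "
                "(work code %s, auth code %s)\n",
                customer->id, orderID, workCode, authCode);
        return NULL;
    }

    size_t idLength = strlen(orderID);
    size_t prefixLength = strlen(DELETED_ORDER_PREFIX);
    if (idLength + prefixLength + 1 > orderIDSize)
        return NULL;

    // Shift the existing ID (and its terminator) right, then write the
    // prefix into the space freed at the front.
    memmove(orderID + prefixLength, orderID, idLength + 1);
    memcpy(orderID, DELETED_ORDER_PREFIX, prefixLength);
    return orderID;
}
```

Apart from the typed parameters, the main difference from the Perl version is that the return section now states what comes back when nothing is removed, as recommended for the third section.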

Note that we don't have a history for each procedure. This is a stylistic choice and some sites may take the view that history should be included for each procedure or function.

If the programmer has difficulty writing a clear description, there's absolutely nothing wrong with getting a more literate colleague to write it, or even with cutting and pasting from the functional specification where appropriate. The important thing is that the description is clear and accurate, so that anyone coming fresh to the project can swiftly pick up the reins and carry on.

Code explanations

These are what most programmers immediately think of when the word 'comment' is uttered. They may be as short as two or three words or several paragraphs long. Their purpose is to clarify the function of the code.

Now here's a thing: the better written the code, the fewer explanations it needs. If you've used good names for all the elements in your programme (variables, constants, procedures, etc.) and laid out the code in a clear and readable manner, you may need few or no code explanations at all, because the code documents itself. If you feel you need a lot of explanations, it may be your code that is at fault.

This isn't an infallible rule. Take the following snippet…

 # ---------------------------------------------------------
 # This must be a data line; read it into the local array...
 # ---------------------------------------------------------
 for( $V_THIS_ELEMENT = 1;
      $V_THIS_ELEMENT <= $G_TOTAL_DEFS;
      $V_THIS_ELEMENT++ )
 {
 (etc...)

Without the comment, all you'd know is that the loop steps through a list. The comment adds what the code alone cannot say: that each element is a data line and that it is being read into the local array.
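The reverse case, where good names make a comment unnecessary, is just as easy to demonstrate. The two C functions below contain identical logic: the first needs an explanation before anyone can tell what it does, while the second explains itself. The function names and the status codes are hypothetical.

```c
// Unclear: what is a, what is n, and why compare with 2?
int DoIt(const int *a, int n)
{
    int c = 0;
    for (int i = 0; i < n; i++)
        if (a[i] == 2)
            c++;
    return c;
}

// The same logic with descriptive names needs no further explanation.
enum CustomerStatus { STATUS_ACTIVE = 1, STATUS_ON_HOLD = 2 };

int CountCustomersOnHold(const enum CustomerStatus customerStatuses[],
                         int customerCount)
{
    int onHoldCount = 0;
    for (int thisCustomer = 0; thisCustomer < customerCount; thisCustomer++)
        if (customerStatuses[thisCustomer] == STATUS_ON_HOLD)
            onHoldCount++;
    return onHoldCount;
}
```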

There's an art to documenting a programme file properly, but it's an art worth learning. It makes the finished product much more useful, because other people can more readily take on the code and amend it as required.

Good documentation is one of the signs of a professional coder and poor documentation speaks of carelessness and a lack of respect for colleagues.

Programme Naming Conventions

The first rule of naming is that names should add to the reader's understanding rather than obscure it. For example, a function that returns the next customer from an array of customers should be called something like GetNextCustomer() rather than GetNextArrayElement().
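A minimal C sketch of the idea, assuming a hypothetical `Customer` record held in an array behind an equally hypothetical `CustomerList` cursor. The function is named for what the caller gets back, not for the data structure behind it, so the array could later become a linked list or a database cursor without the name becoming misleading.

```c
#include <stddef.h>

// Hypothetical customer record and list cursor, for illustration only.
typedef struct {
    int id;
    const char *name;
} Customer;

typedef struct {
    const Customer *customers;
    size_t customerCount;
    size_t nextPosition;
} CustomerList;

// Returns the next customer in the list, or NULL when none remain.
const Customer *GetNextCustomer(CustomerList *customerList)
{
    if (customerList->nextPosition >= customerList->customerCount)
        return NULL;
    return &customerList->customers[customerList->nextPosition++];
}
```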

General points about naming

  • If you have difficulty in finding an appropriate name for a function, procedure or object then you may need to further analyse or define the purpose of that item.
  • Names should be long enough to be meaningful but no longer.
  • Remember that names have no meaning to the compiler/interpreter other than to distinguish one item from another.

Rules you may wish to follow

  • Use mixed case to aid readability. Capitalise the first letter of each word in function and procedure names, and do the same for variable names except that the very first letter is lower case. Thus a procedure might be called SortNewCustomers while a variable might be called customerName.
  • Use all capitals with words separated by underscores for constants, as in TOTAL_SHOPS or SALES_BAND_A.
  • Don't use names that are ambiguous, such as AnalyzeThis() for a routine, or A47 for a variable.
  • Don't include class names in the names you give to class properties, such as Customer.CustomerID. Instead, use Customer.ID.
  • If a routine performs an operation on a given object, include the object name in the routine's name, as in ValidateCustomerID().
  • In languages that permit function overloading, all overloads should perform a similar function.
  • For those languages that do not permit function overloading, establish a naming standard that groups similar functions.
  • Never overload a function in such a way that one overload performs a totally different process from the others.
  • Where a variable holds the result of a computation, add the type of computation to the end of the variable name, as in salesAvg, customerSum, temperatureMin, temperatureMax, customerIndex and so on.
  • When producing complementary pairs of variables, name them appropriately, e.g. min/max, begin/end or open/close.
  • If a variable is Boolean, i.e. it holds only Yes/No or True/False values, use a name which highlights this, such as fileIsFound or customerIsValid.
  • Avoid the lazy use of terms like 'Flag' when naming status variables. Instead of documentFlag, use a more descriptive name such as documentFound.
  • Always avoid single-character variable names, in loops and elsewhere. “for thisShop = 1 to TOTAL_SHOPS” is far more informative than “for i = 1 to 23”.
  • Never use literals instead of constants. If your language does not support constants, use variables named in upper case as suggested above.
  • Always assign values to constants in a single place at the top of the programme, for the good and sufficient reason that constants need changing surprisingly often!
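Several of these rules are shown together in the following C sketch. `TOTAL_SHOPS` and `SALES_BAND_A` are borrowed from the examples above, but their values, the idea of a sales threshold and all the function names are assumptions made for illustration.

```c
#include <stdbool.h>

// Constants: upper case with underscores, assigned once at the top of the
// file. Values are hypothetical.
#define TOTAL_SHOPS 23
#define SALES_BAND_A 10000.0

// Result-holding variables carry the computation as a suffix (salesSum),
// and the loop variable says what it counts (thisShop).
double CalculateSalesAvg(const double shopSales[], int shopCount)
{
    double salesSum = 0.0;
    for (int thisShop = 0; thisShop < shopCount; thisShop++)
        salesSum += shopSales[thisShop];
    return shopCount > 0 ? salesSum / shopCount : 0.0;
}

// A Boolean result is named as a statement that is either true or false.
bool ShopIsInBandA(double shopSales)
{
    return shopSales >= SALES_BAND_A;
}

// Complementary pair of results: salesMin / salesMax.
// Assumes shopCount >= 1.
void FindSalesRange(const double shopSales[], int shopCount,
                    double *salesMin, double *salesMax)
{
    *salesMin = shopSales[0];
    *salesMax = shopSales[0];
    for (int thisShop = 1; thisShop < shopCount; thisShop++) {
        if (shopSales[thisShop] < *salesMin)
            *salesMin = shopSales[thisShop];
        if (shopSales[thisShop] > *salesMax)
            *salesMax = shopSales[thisShop];
    }
}
```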

Points that apply particularly to database handling

  • When naming tables, express the name in the singular form. For example, use Employee instead of Employees.
  • When naming columns of tables do not repeat the table name; for example, avoid a field called EmployeeLastName in a table called Employee.
  • Do not incorporate the data type in the name of a column. Leaving it out reduces the amount of work should it become necessary to change the data type later.
  • Do not prefix stored procedures with sp_, which is generally a prefix reserved for identifying system stored procedures.
  • Do not prefix user-defined functions with fn_, which is generally a prefix reserved for identifying built-in functions.
  • Do not prefix extended stored procedures with xp_, which is generally a prefix reserved for identifying system extended stored procedures.

Other points you may wish to consider

  • Don't abbreviate unless you must. If you do, use your abbreviations consistently: each abbreviation should have only one meaning and, likewise, each abbreviated word should have only one abbreviation. For example, if you use min to abbreviate minimum, do so everywhere and do not also use min to abbreviate minute.
  • When naming functions, include a description of the value being returned, such as GetCustomerName().
  • File and path names should also accurately describe their purpose, as in SalesTransferFile or SalesFileDirectory.
  • Don't use easily confused names for multiple items. Especially avoid confusions such as a routine called ProcessSales() and a variable loaded from it called iProcessSales.
  • Try to avoid words that sound the same (homophones), such as 'write' and 'right'. They can lead to all sorts of confusion when several people work on a project.
  • Use the appropriate spelling for your country, such as color/colour or check/cheque.
  • It's no longer considered good practice to use type-declaration characters in names, such as the $ (string) and % (integer) suffixes of older BASIC dialects; some languages, Perl among them, use these symbols for other purposes.
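A short C sketch of some of these points, using a hypothetical `SalesOrder` record and hypothetical function names: each function's name describes the value it returns, and because Min is used throughout to mean minimum, the delivery time field spells out Minutes in full.

```c
#include <stddef.h>

// Hypothetical sales order record, for illustration only.
typedef struct {
    const char *customerName;
    int deliveryMinutes;    // "Minutes" spelt out: "Min" means minimum
} SalesOrder;

// The name describes the value being returned.
const char *GetCustomerName(const SalesOrder *salesOrder)
{
    return salesOrder->customerName;
}

// Returns the shortest delivery time, in minutes. Assumes orderCount >= 1.
int GetShortestDeliveryMinutes(const SalesOrder salesOrders[],
                               size_t orderCount)
{
    int deliveryMinutesMin = salesOrders[0].deliveryMinutes;
    for (size_t thisOrder = 1; thisOrder < orderCount; thisOrder++)
        if (salesOrders[thisOrder].deliveryMinutes < deliveryMinutesMin)
            deliveryMinutesMin = salesOrders[thisOrder].deliveryMinutes;
    return deliveryMinutesMin;
}
```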
Last modified: 2009/01/18 22:35