Datareign

Sort components have a parameter called 'max-core', which controls how much memory is allocated to the sort. There is a certain amount of confusion about how to set this parameter, but it is, in essence, subject to some simple rules.

Real Memory

If you are fairly sure how many records the sort will handle, and the resulting data volume is significantly smaller than available real memory, then you can set max-core according to the simple formula: number of bytes in key * number of records. This will give you the fastest performance, because the sort will be carried out entirely in memory. Run time will then be governed only by the machine's hardware performance and by other programmes running concurrently with your graph.
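As a rough illustration of the formula above, the arithmetic can be done in the shell. This is only a sketch: the variable names KEY_BYTES, RECORD_COUNT and MAX_CORE_BYTES and the figures assigned to them are assumptions made for the example, not values taken from any real graph.

```shell
# Hypothetical figures for illustration; substitute your own estimates.
KEY_BYTES=16            # size of the sort key in bytes (assumed)
RECORD_COUNT=5000000    # expected number of records (assumed)

# The rule of thumb from the text: key size multiplied by record count.
MAX_CORE_BYTES=$((KEY_BYTES * RECORD_COUNT))

echo "max-core estimate: ${MAX_CORE_BYTES} bytes"
```

The result is only useful if it is comfortably below the real memory actually available to the graph; otherwise the considerations in the next section apply.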

Swapping

If the number of records and the size of the keys are such that the sort cannot be handled in memory, you have to decide which approach to take. Do you use the maximum amount of memory, so that the sort works in the largest 'chunks', or do you use a smaller memory allocation, so that the overflow is handled by the Co>Processor's file reads and writes rather than by system swapping?

Once again, if the graph is going to be the main or only non-system programme running on the machine, it is normal to go for the maximum amount of real memory. This will get the sort done in the shortest time.

On the other hand, if the graph is going to be run on a machine that is also running other production work simultaneously, then a smaller value of max-core is usually the better choice. Ab Initio will read and write to disk more frequently, but by preventing excessive page swapping this gives the 'least bad' overall performance. As a general rule, forcing the system to 'hit' the swap file hard will degrade overall performance more quickly and more unpleasantly than allowing increased normal disk access.

In practice

It is often necessary to test the graph with realistic volumes of data, on a machine which simulates the expected system load in production.

The normal approach is to set up a series of shell variables in the graph's Start Script or in its parameters, which hold the maximum value you can give to max-core and a series of fractions thereof. These are usually called $MAX_CORE, $HALF_MAX_CORE, $QUARTER_MAX_CORE, etc. Then the appropriate value is used to set max-core in the sort's parameters.
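A minimal sketch of such a set of variables, as they might appear in a Start Script, is shown here. The figure of 268435456 bytes (256 MB) is an assumption chosen purely for illustration; the right ceiling depends on your machine and its workload.

```shell
# MAX_CORE is the largest value you are prepared to give any sort, in bytes.
# 268435456 (256 MB) is an assumed figure for illustration only.
MAX_CORE=268435456

# Fractions of the ceiling, derived rather than hard-coded, so that
# changing MAX_CORE rescales every sort that uses them.
HALF_MAX_CORE=$((MAX_CORE / 2))
QUARTER_MAX_CORE=$((MAX_CORE / 4))

export MAX_CORE HALF_MAX_CORE QUARTER_MAX_CORE
```

Each sort's max-core parameter is then set to whichever variable suits it, for example $HALF_MAX_CORE, instead of a literal number.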

Where a graph contains several sort components, this scheme lets you assign an appropriate value to each sort and still make quick global changes without altering the individual components.

Last modified: 2009/01/21 17:57