prolog-mpi: a distributed prolog environment
These benchmarks crudely illustrate the overhead of prolog-mpi's distribution functionality (see note). The benchmark/profile utilities are distributed with the system and may be run on any host.
The time-stamping algorithm is roughly as follows:
These benchmarks were produced by generating traces from the test routines and post-processing them:
$ mpirun -np 10 pl-mpi-test -tur 500
$ pl-mpi-gdtime -qf output.png dump-*
Note: these are not appropriate measurements for estimating run-time. The testing environment is a symmetric multiprocessor; all internode communication occurs over the loop-back interface.
Due to the extremely fast (sub-second) cycle of distribution, compute completion, and acknowledgement, these values represent an upper bound on the rate of distribution, a figure somewhat inflated by running on a local symmetric multiprocessor (the control node must absorb bursts of responses).
At this time, data transfer between nodes is a rather heavyweight operation; future versions of prolog-mpi will have optimised transfer sequences. The primary bottleneck is the control node's sequential event handling: once a completion event is received from a compute node, the control node blocks until all of that node's data has been received. This bottleneck may be removed by maintaining a set of state machines, one per compute node, tracking each node's response sequence. The removal will come at the cost of algorithmic simplicity; analysis is underway as to whether this provides a significant benefit to prolog-mpi or merely complicates the scheduling mechanisms.