prolog-mpi: a distributed prolog environment

Documentation (0.2.x)

Please note that a bug in the documentation is considered a bug in the system overall. Documentation bugs include missing or incomplete data; please contact Kristaps with any errata.


Manuals

There are several utilities bundled with the prolog-mpi system. Each comes with a Unix manual page installed alongside it. Be sure to read these manuals carefully before operating any component.


Installation

Most of the complexity of installing prolog-mpi is in the MPI installation itself. Beyond that, the system follows standard compilation and installation conventions. This document assumes that you have an MPI deployment installed and working, and, further, that the file-system in which the pl-mpi(1) and pl-mpi-test(1) binaries reside is accessible to all participating nodes. Future versions of prolog-mpi will be considerably more friendly in this regard.

Tested MPI implementations (please notify us of additions):

Tested operating systems (please notify us of additions):

To compile the binary, you'll need make (BSD or GNU), an MPI C compiler (hcc or mpicc), and a working installation of Prolog (SWI-Prolog is required) with a usable plld linker. If the defaults are not correct, edit the Makefile in the top-level directory of the distribution with the locations and/or names of these files. Compile the system with make, and (optionally) install it with make install. At this point, you're ready to run the system.
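A typical build, assuming the Makefile defaults suit your system, is simply:

$ make
$ make install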

Note that the system should compile without warnings, although on i386 you may get the following warning (it doesn't cause execution problems):

src/master.c: In function `pl_adist':
src/master.c:487: warning: cast from pointer to integer of different size

The Makefile may be edited in several ways; you'll probably want to modify it to suit your MPI installation. If you wish to compile in Prolog scripts, edit the PFILES variable. This is equivalent to interactively running mpiload/1 for each linked file.
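For example, to compile in two Prolog scripts (a sketch; the file names here are hypothetical):

PFILES = test.pl helpers.pl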


Operation

A sample session may be the most instructive. Note that extraneous output has been snipped.

$ cat test.pl
testp(A, B) :-
    B is A + 2.
$
$ mpirun -np 20 pl-mpi
1 ?- mpiload('test.pl').
2 ?- mpidist(testp, [1, 2], RES, YESN, INDEX).
RES = 3
YESN = 1
INDEX = 0 ;

RES = 4
YESN = 1
INDEX = 1 ;

No
3 ?- halt.
$

The mpirun utility will execute pl-mpi on 20 nodes. Note that one must execute on at least three nodes: one control, one master, and one slave. Node location is not important; note also that the master and the control take minimal processing power, and may safely piggy-back on physical slave nodes. For systems with a very high rate of communication (small, constant jobs), a dedicated I/O signal line (a separate physical machine, or a distinct interface) may be appropriate, but I have no benchmarks on this.
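The minimal invocation therefore runs on three nodes:

$ mpirun -np 3 pl-mpi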

You'll probably want to read the manual page. If you ran make install, it should be installed under pl-mpi(1).


Interface

  • mpiload/1: mpiload(+String)

    This is a distributed version of consult/1. The term String must name a file available on all hosts at the same location. Without this predicate, only built-in predicates are available to mpidist/5. It returns "Yes" on success and "No" on failure.

    The mpiload/1 predicate only returns when all participating nodes (including the master node) indicate success. If the operation fails (one or more participating nodes failed to load the file), the system should be exited and re-started. This usually happens due to file-sharing errors in, for example, NFS. Example:

    1 ?- mpiload('test.pl').
    
  • mpidist/5: mpidist(+Pred, +List, -Result, -Yesno, -Index)

    This synchronously distributes a predicate request and waits for completed sections. Pred is the predicate to call on all nodes (probably defined via mpiload/1). List is the input list (see Types for available types; a list may contain a mixture of types, so long as the called predicate can handle them). Elements of this list are fed to Pred on each processing node. Result is unified with the result of Pred applied to the element of List at position Index (see Types for available types). One should first check Yesno, however, which indicates whether the predicate succeeded or failed to return a result at all.

    The mpidist/5 predicate operates through backtracking. Results are returned as soon as they are received; to see more results, one must backtrack. The number of results equals the length of List; once all possible results have been exhausted, mpidist/5 fails (see the collection sketch after this list). Example:

    1 ?- mpidist(test, ['hello', 'world', 1, 2, 3, 15.3], RES, YESNO, INDEX).
    
  • ampidist/3: ampidist(+Pred, +List, -Qid)

    This begins an asynchronous distribution sequence. See notes on scheduling, below. Pred and List follow the specifications set forth in mpidist/5. Qid is an identifier for this particular asynchronous query. See Types for available types; a combined session appears after this list. Example:

    1 ?- ampidist(test, ['hello', 'world', 1, 2, 3, 15.3], QID).
    
  • ampiquery/2: ampiquery(+Qid, -Results)

    Checks if results are available for a query begun with ampidist/3. The number of waiting results (which may be zero) is stored in Results. Note that a "waiting result" is one that may be fetched with ampiget/4 without blocking. See Types for available types. Example:

    1 ?- ampiquery(QID, RES).
    
  • ampiget/4: ampiget(+Qid, -Result, -Yesno, -Index)

    Fetches the next available result. See mpidist/5 for a description of Result, Yesno, and Index. Blocks if no results are available. When all results are exhausted, ampiget/4 fails. See Types for a list of available types. Example:

    1 ?- ampiget(QID, RES, YESNO, INDEX).
    
  • ampiclose/1: ampiclose(+Qid)

    Closes out a query begun with ampidist/3. No more results are available after calling this predicate, and the behaviour of further ampi predicates on the Qid is undefined. See Types for a list of available types. Example:

    1 ?- ampiclose(QID).
    
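Because mpidist/5 delivers results through backtracking, the standard Prolog all-solutions predicates can collect them. The following is a minimal sketch using the testp predicate from the sample session above; since results are returned as they are received, the index-result pairs may not arrive in list order:

1 ?- findall(INDEX-RES, mpidist(testp, [1, 2], RES, _, INDEX), PAIRS).

The asynchronous predicates are meant to be used together. The following single query sketches the full life-cycle, again assuming testp (the calls are conjoined in one query so that the QID binding is shared, unlike across separate top-level queries):

1 ?- ampidist(testp, [1, 2, 3], QID),
     ampiquery(QID, N),
     ampiget(QID, RES, YESNO, INDEX),
     ampiclose(QID).

Here ampiquery/2 polls for the number of waiting results, ampiget/4 fetches the first available result, and ampiclose/1 closes out the query; any unfetched results are no longer available after the close.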

Caveats: signals. Do not use signals during operation. While sending an interrupt (control-C) to a stand-alone Prolog system merely triggers a trace, under pl-mpi it will usually kill the MPI group (this behaviour differs between MPI implementations). This has the added ill effect of polluting the MPI channels with crufty data.


Scheduling

The control node in a pl-mpi run-time maintains a scheduling queue for all pending jobs. As jobs are entered into the system via mpidist/5 or ampidist/3, they are enqueued according to a distribution scheduling algorithm.

When a job is enqueued for distribution, it is interleaved with existing jobs in a round-robin fashion. In other words, the schedule starts as an empty set S = {}. When a particular job X is added, its elements x0 ... xn are added to S, yielding {x0, x1, ..., xn}. If a concurrent job Y is added, its elements are fairly interleaved into S, yielding {x0, y0, x1, y1, ..., xn, yn}. Thus, if several jobs are added with ampidist/3 and a synchronous job is then submitted with mpidist/5, the synchronous job will have to wait for the competing asynchronous jobs to complete as well.
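The interleaving itself is a simple round-robin merge. The following predicate is a minimal sketch of the idea in plain Prolog, not part of the prolog-mpi API:

interleave([], Ys, Ys).
interleave([X|Xs], Ys, [X|Zs]) :-
    interleave(Ys, Xs, Zs).

so that interleave([x0, x1], [y0, y1], S) binds S = [x0, y0, x1, y1].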


Types

Data passed through prolog-mpi is limited to certain data types. Please note that there is a difference between Prolog's stated type system ("the single data type is the term") and the internal representation. In short, data passed through the prolog-mpi system must be assigned a type. At this time, the atomic data types (float, integer, atom) are supported; term types are not. Supported:

  • 1, -2, ... (integers, or "int")
  • -32.1221, +1.332E100, ... (floats, or "double")
  • 'hello', 'world', ... (atoms, or "char *")

Not supported:

  • predicate1('hello'), predicate2(235, 523), ... (terms)
  • [1, 2, 3], ["hello", "world"], ... (lists)
  • "hello", "world", ... (strings (really terms))
  • a, b, ... (bound variables)
  • A, B, ... (un-bound variables)

If these types are passed into (or returned from) prolog-mpi predicates, the system will abort and dump core immediately. Type detection occurs dynamically at the point of entry or when results are first received.
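For example, the first of the following input lists mixes only supported atomic types, while the second would abort the system (a sketch; the predicate name work is hypothetical):

1 ?- mpidist(work, [1, -2, 15.3, 'hello'], RES, YESNO, INDEX).

1 ?- mpidist(work, [f(1), [2, 3], "hello"], RES, YESNO, INDEX).

The first list carries integers, a float, and an atom; the second includes a term, a list, and a string, none of which may be passed through the system.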


Architecture

This section is being fully re-written.


$Id: docs.html,v 1.14 2007-01-24 12:16:08 kristaps Exp $