pbs_mom

NAME
SYNOPSIS
DESCRIPTION
OPTIONS
CONFIGURATION FILE
FILES
SIGNAL HANDLING
EXIT STATUS
SEE ALSO

NAME

pbs_mom - start a pbs batch execution mini-server

SYNOPSIS

pbs_mom [-a alarm] [-C chkdirectory] [-c config] [-d directory] [-L logfile] [-M MOMport] [-R RPPport] [-p|-r] [-x]

DESCRIPTION

The pbs_mom command starts the operation of a batch Machine Oriented Mini-server, MOM, on the local host. Typically, this command will be in a local boot file such as /etc/rc.local. To ensure that the pbs_mom command is not runnable by the general user community, pbs_mom will only execute if its real and effective uid are both zero.
One function of pbs_mom is to place jobs into execution as directed by the server, establish resource usage limits, monitor the job's usage, and notify the server when the job completes. If they exist, pbs_mom will execute a prologue script before executing a job and an epilogue script after executing the job.
Another function of pbs_mom is to respond to resource monitor requests. This was done by a separate process in previous versions of PBS but has now been combined into one process. The resource monitor function is provided mainly for the PBS scheduler. It provides information such as the status of running jobs and the amount of memory available.
A further function of pbs_mom is to respond to task manager requests. This involves communicating with running tasks over a TCP socket as well as communicating with other MOMs within a job (also known as a "sisterhood").
Pbs_mom will record a diagnostic message in a log file for any error occurrence. The log files are maintained in the mom_logs directory below the home directory of the server. If the log file cannot be opened, the diagnostic message is written to the system console.

OPTIONS

-C chkdirectory
Specifies the path of the directory used to hold checkpoint files. [Currently this is only valid on Cray systems.] The default directory is PBS_HOME/spool/checkpoint, see the -d option. The directory specified with the -C option must be owned by root and accessible (rwx) only by root to protect the security of the checkpoint files.
-c config
Specifies an alternative configuration file, see the description below. If this is a relative file name, it will be relative to PBS_HOME/mom_priv, see the -d option. If the specified file cannot be opened, pbs_mom will abort. If the -c option is not supplied, pbs_mom will attempt to open the default configuration file "config" in PBS_HOME/mom_priv. If this file is not present, pbs_mom will log the fact and continue.
-d directory
Specifies the path of the directory which is the home of the server's working files, PBS_HOME. This option is typically used along with -M when debugging MOM. The default directory is given by $PBS_SERVER_HOME, which is typically /usr/spool/PBS.
-L logfile
Specify an absolute path name for use as the log file. If not specified, MOM will open a file named for the current date in the PBS_HOME/mom_logs directory, see the -d option.
-M port
Specifies the port number on which the mini-server (MOM) will listen for batch requests.
-R port
Specifies the port number on which the mini-server (MOM) will listen for resource monitor requests, task manager requests and inter-MOM messages. Both a UDP and a TCP port of this number will be used.
-p
Specifies the impact on jobs which were in execution when the mini-server shut down. On any restart of MOM, the new mini-server will not be the parent of any running jobs; MOM has lost control of her offspring (not a new situation for a mother). With the -p option, MOM will allow the jobs to continue to run and monitor them indirectly via polling. The -p option is mutually exclusive with the -r option.
-r
Specifies the impact on jobs which were in execution when the mini-server shut down. With the -r option, MOM will kill any processes belonging to jobs, mark the jobs as terminated, and notify the batch server which owns the job. The -r option is mutually exclusive with the -p option.
Normally the mini-server is started from the system boot file without the -p or the -r option. The mini-server will make no attempt to signal the former session of any job which may have been running when the mini-server terminated. It is assumed that on reboot, all processes have been killed.
If the -r option is used following a reboot, process IDs (pids) may be reused and MOM may kill a process that is not a batch session.
-a alarm
Used to specify the alarm timeout in seconds for computing a resource. Every time a resource request is processed, an alarm is set for the given amount of time. If the request has not completed before the given time, an alarm signal is generated. The default is 5 seconds.
-x
Disables the check for privileged port resource monitor connections. This is used mainly for testing since the privileged port is the only mechanism used to prevent any ordinary user from connecting.
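As an illustration only, the option set described above can be modelled with Python's argparse module. This is a sketch, not pbs_mom's own argument parser: the program name pbs_mom_sketch, the destination names, and the restart labels "leave", "poll" and "kill" are hypothetical. It shows the 5 second default for -a and the mutual exclusion of -p and -r.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented pbs_mom options."""
    parser = argparse.ArgumentParser(prog="pbs_mom_sketch")
    parser.add_argument("-C", dest="chkdirectory")
    parser.add_argument("-c", dest="config")
    parser.add_argument("-d", dest="directory")
    parser.add_argument("-L", dest="logfile")
    parser.add_argument("-M", dest="mom_port", type=int)
    parser.add_argument("-R", dest="rm_port", type=int)
    # The alarm timeout defaults to 5 seconds, as documented for -a.
    parser.add_argument("-a", dest="alarm", type=int, default=5)
    parser.add_argument("-x", dest="skip_port_check", action="store_true")

    # -p and -r may not be given together.
    restart = parser.add_mutually_exclusive_group()
    restart.add_argument("-p", dest="restart", action="store_const", const="poll")
    restart.add_argument("-r", dest="restart", action="store_const", const="kill")
    # With neither flag, former job sessions are left alone (boot-time start).
    parser.set_defaults(restart="leave")
    return parser


args = build_parser().parse_args(["-d", "/tmp/pbs_test", "-M", "15002", "-p"])
print(args.restart, args.alarm)
```

Passing both -p and -r to this sketch makes argparse report an error and exit, which mirrors the documented restriction.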

CONFIGURATION FILE

The configuration file may be specified on the command line at program start with the -c flag. The use of this file is to provide several types of run time information to pbs_mom: static resource names and values, external resources provided by a program to be run on request via a shell escape, and values to pass to internal set up functions at initialization (and re-initialization).
Each item type is on a single line with the component parts separated by white space. If the line starts with a hash mark (pound sign, #), the line is considered to be a comment and is skipped.
Static Resources
For static resource names and values, the configuration file contains a list of resource name/value pairs, one pair per line, with the name and value separated by white space. An example of static resource names and values could be the number of tape drives of different types, specified by:
tape3480 4
tape3420 2
tapedat 1
tape8mm 1
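The line format for static resources can be sketched in Python as follows. This is a minimal illustration, not MOM's own parser: the function name parse_static_resources is hypothetical, values are kept as strings, and the sketch handles only static pairs and comment lines.

```python
def parse_static_resources(text: str) -> dict:
    """Return a {name: value} mapping from static resource lines."""
    resources = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and lines starting with a hash mark are skipped.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of white space
        if len(parts) != 2:
            raise ValueError("expected 'name value', got: %r" % raw_line)
        name, value = parts
        resources[name] = value.strip()
    return resources


example = """\
# tape drives available on this host
tape3480 4
tape3420 2
tapedat 1
tape8mm 1
"""
print(parse_static_resources(example))
```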
Shell Commands
If the first character of the value is an exclamation mark (!), the entire rest of the line is saved to be executed through the services of the system(3) standard library routine.
The shell escape provides a means for the resource monitor to yield arbitrary information to the scheduler. Parameter substitution is done such that the value of any qualifier sent with the query, as explained below, replaces a token with a percent sign (%) followed by the name of the qualifier. For example, here is a configuration file line which gives a resource name of "escape":
escape !echo %xxx %yyy
If a query for "escape" is sent with no qualifiers, the command executed would be "echo %xxx %yyy". If one qualifier is sent, "escape[xxx=hi there]", the command executed would be "echo hi there %yyy". If two qualifiers are sent, "escape[xxx=hi][yyy=there]", the command executed would be "echo hi there". If a qualifier is sent with no matching token in the command line, "escape[zzz=snafu]", an error is reported.
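The substitution rules described above can be sketched in Python. This is an illustration under stated assumptions, not MOM's implementation: the function names parse_query and expand_escape are hypothetical, and qualifier names are assumed to consist of letters, digits and underscores.

```python
import re

QUALIFIER = re.compile(r"\[(\w+)=([^\]]*)\]")  # matches [name=value]
TOKEN = re.compile(r"%(\w+)")                  # matches %name


def parse_query(query: str):
    """Split 'escape[xxx=hi][yyy=there]' into ('escape', {...})."""
    name = query.split("[", 1)[0]
    return name, dict(QUALIFIER.findall(query))


def expand_escape(command: str, qualifiers: dict) -> str:
    """Replace each %name token with the value of the qualifier 'name'."""
    tokens = set(TOKEN.findall(command))
    unmatched = [q for q in qualifiers if q not in tokens]
    if unmatched:
        # A qualifier with no matching token is reported as an error.
        raise ValueError("no token for qualifier(s): %s" % ", ".join(unmatched))
    # Tokens without a supplied qualifier are left in place unchanged.
    return TOKEN.sub(lambda m: qualifiers.get(m.group(1), m.group(0)), command)


command = "echo %xxx %yyy"
for query in ("escape", "escape[xxx=hi there]", "escape[xxx=hi][yyy=there]"):
    _, quals = parse_query(query)
    print(expand_escape(command, quals))
```

Substitution is done in a single pass so that a value which itself contains a percent sign is not expanded a second time.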
Initialization Value
An initialization value directive has a name which starts with a dollar sign ($) and must be known to MOM via an internal table. The entries in this table now are:
clienthost
which causes a host name to be added to the list of hosts which will be allowed to connect to MOM as long as they are using a privileged port. For example, here are two configuration file lines which will allow the hosts "fred" and "wilma" to connect:
$clienthost fred
$clienthost wilma
Two host names are always allowed to connect to pbs_mom: "localhost" and the name returned to pbs_mom by the system call gethostname(). These names need not be specified in the configuration file. The hosts listed as "clienthosts" comprise a "sisterhood" of machines. Any one of the sisterhood will accept connections from a server from within the sisterhood. They will also accept Resource Monitor (RM) requests and Internal MOM (IM) messages from within the sisterhood. For the members of a sisterhood to be able to communicate IM messages to each other, they must all share the same RM port.
restricted
which causes a host name to be added to the list of hosts which will be allowed to connect to MOM without needing to use a privileged port. These names allow for wildcard matching. For example, here is a configuration file line which will allow queries from any host in the domain "ibm.com":
$restricted *.ibm.com
The restriction which applies to these connections is that only internal queries may be made. No resources from a config file will be found. This is to prevent any shell commands from being run by a non-root process.
logevent
which sets the mask that determines which event types are logged by pbs_mom. For example:
$logevent 0x1ff
$logevent 255
The first example would set the log event mask to 0x1ff (511) which enables logging of all events including debug events. The second example would set the mask to 0x0ff (255) which enables all events except debug events.
cputmult
which sets a factor used to adjust cpu time used by a job. This is provided to allow adjustment of time charged and limits enforced where the job might run on systems with different cpu performance. If MOM's system is faster than the reference system, set cputmult to a decimal value greater than 1.0. If MOM's system is slower, set cputmult to a value between 0.0 and 1.0. For example:
$cputmult 1.5
$cputmult 0.75
wallmult
which sets a factor used to adjust wall time used by a job to a common reference system. The factor is used for walltime calculations and limits in the same way that cputmult is used for cpu time.
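The directives listed above can be modelled with a small Python sketch. It is illustrative only: the MomSettings class, its default values, and the function names are hypothetical, and applying cputmult or wallmult by simple multiplication is an assumption about how the factor is used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MomSettings:
    """Hypothetical container for values set by $ directives."""
    clienthosts: List[str] = field(default_factory=list)
    restricted: List[str] = field(default_factory=list)
    logevent: int = 0       # placeholder; the real default is not stated here
    cputmult: float = 1.0   # placeholder meaning "no adjustment"
    wallmult: float = 1.0   # placeholder meaning "no adjustment"


def apply_directive(settings: MomSettings, line: str) -> None:
    """Apply one '$name value' line; names outside the table are rejected."""
    parts = line.split(None, 1)
    if len(parts) != 2 or not parts[0].startswith("$"):
        raise ValueError("not a directive line: %r" % line)
    name, value = parts[0][1:], parts[1].strip()
    if name == "clienthost":
        settings.clienthosts.append(value)
    elif name == "restricted":
        settings.restricted.append(value)
    elif name == "logevent":
        # Base 0 accepts both the hexadecimal (0x1ff) and decimal (255) forms.
        settings.logevent = int(value, 0)
    elif name in ("cputmult", "wallmult"):
        setattr(settings, name, float(value))
    else:
        raise ValueError("unknown directive: $%s" % name)


def adjusted_seconds(raw_seconds: float, factor: float) -> float:
    """Assumption: the factor scales the measured time by multiplication."""
    return raw_seconds * factor


settings = MomSettings()
for text in ("$clienthost fred", "$clienthost wilma", "$restricted *.ibm.com",
             "$logevent 0x1ff", "$cputmult 1.5"):
    apply_directive(settings, text)
print(settings)
```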
The configuration file must be "secure". It must be owned by a user id and group id less than 10 and must not be world writable.
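The "secure" rule can be expressed as a short Python check. This is a sketch of the stated rule only, with hypothetical function names; pbs_mom may apply further checks that are not described here.

```python
import os
import stat


def is_secure(st_uid: int, st_gid: int, st_mode: int) -> bool:
    """True when owner uid and gid are below 10 and 'other' cannot write."""
    world_writable = bool(st_mode & stat.S_IWOTH)
    return st_uid < 10 and st_gid < 10 and not world_writable


def config_is_secure(path: str) -> bool:
    """Apply is_secure() to the file at the given path."""
    st = os.stat(path)
    return is_secure(st.st_uid, st.st_gid, st.st_mode)


# A root-owned file with mode 0644 passes; mode 0666 does not.
print(is_secure(0, 0, 0o644), is_secure(0, 0, 0o666))
```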

FILES

$PBS_SERVER_HOME/mom_priv
the default directory for configuration files, typically (/usr/spool/pbs)/mom_priv.
$PBS_SERVER_HOME/mom_logs
directory for log files recorded by the server.
$PBS_SERVER_HOME/mom_priv/prologue
the administrative script to be run before job execution.
$PBS_SERVER_HOME/mom_priv/epilogue
the administrative script to be run after job execution.

SIGNAL HANDLING

Pbs_mom handles the following signals:
SIGHUP
causes pbs_mom to re-read its configuration file, close and reopen the log file, and reinitialize resource structures.
SIGALRM
results in a log file entry. The signal is used to limit the time taken by certain child processes, such as the prologue and epilogue.
SIGINT and SIGTERM
result in pbs_mom terminating all running children and exiting. The same action is taken for the following signals: SIGXCPU, SIGXFSZ, SIGCPULIM, and SIGSHUTDN.
SIGPIPE, SIGUSR1, SIGUSR2, SIGINFO
are ignored.
All other signals have their default behavior installed.
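The dispositions listed above can be summarized as a lookup table. The Python sketch below is illustrative only: the action labels and function names are hypothetical, and because not every platform defines every signal name, names that Python's signal module does not provide are simply left out of the platform mapping.

```python
import signal

# Signal name -> action label, taken from the list above.
ACTIONS = {
    "SIGHUP": "reinitialize",   # re-read config, reopen log, reset resources
    "SIGALRM": "log",           # record a log file entry
    "SIGINT": "terminate",      # kill running children and exit
    "SIGTERM": "terminate",
    "SIGXCPU": "terminate",
    "SIGXFSZ": "terminate",
    "SIGCPULIM": "terminate",
    "SIGSHUTDN": "terminate",
    "SIGPIPE": "ignore",
    "SIGUSR1": "ignore",
    "SIGUSR2": "ignore",
    "SIGINFO": "ignore",
}


def disposition(name: str) -> str:
    """Return the action label for a signal name, or 'default' if unlisted."""
    return ACTIONS.get(name, "default")


def platform_dispositions() -> dict:
    """Map the signals that exist on this platform to their action labels."""
    return {getattr(signal, name): action
            for name, action in ACTIONS.items()
            if hasattr(signal, name)}


for sig, action in platform_dispositions().items():
    print(sig.name, action)
```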

EXIT STATUS

If the mini-server command fails to begin operation, the server exits with a value greater than zero.

SEE ALSO

pbs_server(8B), pbs_scheduler_basl(8B), pbs_scheduler_tcl(8B), the PBS External Reference Specification, and the PBS Administrator's Guide.