pbs_scheduler_basl
NAME
pbs_sched_basl - PBS BASL scheduler
SYNOPSIS
pbs_sched [-d home] [-L logfile] [-p print_file] [-a alarm]
[-S port] [-c configfile]
DESCRIPTION
The pbs_sched command starts a batch scheduler on the local
host. The scheduler runs in conjunction with the PBS server:
it queries the server about the state of PBS and communicates
with pbs_mom to obtain information such as the status of
running jobs and the memory available. It then decides which
jobs to run.

Typically, this command is placed in a local boot file such as
/etc/rc.local.

pbs_sched must be executed with root permission.
OPTIONS
-d home
Specifies the name of the PBS home directory, PBS_HOME. If
not specified, the value of $PBS_SERVER_HOME as defined at
compile time is used. Also see the -L option.

-L logfile
Specifies the absolute path name of the file to use as the
log file. If not specified, the scheduler will open a file
named for the current date in the PBS_HOME/sched_logs
directory. See the -d option.

-p print_file
Specifies the "print" file. Any output from the scheduler
code that is written to standard output or standard error is
written to this file. If this option is not given, the file
used is $PBS_HOME/sched_priv/sched_out. See the -d option.

-a alarm
Specifies the time in seconds to wait for a scheduling run
to finish. If a scheduling iteration takes too long to
finish, an alarm signal is sent and the scheduler is
restarted. If a core file does not exist in the current
directory, abort() is called and a core file is generated.
The default for alarm is 180 seconds.

-S port
Specifies a port on which to talk to the server. This option
is not required; it merely overrides the default PBS
scheduler port.

-c configfile
Specifies a configuration file; see CONFIGURATION FILE
below. If this is a relative file name, it is taken relative
to PBS_HOME/sched_priv (see the -d option). If the -c option
is not supplied, pbs_sched will not attempt to open a
configuration file. With BASL, the configuration file is
almost always needed because it is where the administrator
specifies the list of servers, nodes, and host resource
queries.
USAGE
This version of the scheduler requires knowledge of the BASL
language. The site must first write a function called
sched_main(), and all functions supporting it, using BASL
constructs, and then translate the functions into C using the
BASL compiler basl2c, which also attaches a main program to
the resulting code.

This main program performs general initialization and
housekeeping chores: setting up a local socket to communicate
with the server running on the same machine, changing to the
priv directory, opening log files, opening the configuration
file (if any), setting up locks, forking the child to become a
daemon, initializing a scheduling cycle (i.e. getting node
attributes that are static in nature), setting up the signal
handlers, and executing the global initialization assignment
statements specified by the scheduler writer. It then sits in
a loop waiting for a scheduling command from the server.

When the server sends the scheduler an appropriate scheduling
command {SCH_SCHEDULE_NEW, SCH_SCHEDULE_TERM,
SCH_SCHEDULE_TIME, SCH_SCHEDULE_RECYC, SCH_SCHEDULE_CMD,
SCH_SCHEDULE_FIRST}, information about the server(s), jobs,
queues, and execution host(s) is obtained, and then
sched_main() is called.
SCHEDULING LANGUAGE
The BAtch Scheduling Language (BASL) is a C-like procedural
language. It provides a number of constructs and predefined
functions that facilitate dealing with scheduling issues.
Information about a PBS server, the queues that it owns, the
jobs residing in each queue, and the computational nodes
where jobs can be run is accessed via the BASL data types
Server, Que, Job, CNode, Set Server, Set Que, Set Job, and
Set CNode.

The following simple sched_main() will cause the server to
run all queued jobs on the local server:

sched_main()
{
    Server s;
    Que q;
    Job j;
    Set Que queues;
    Set Job jobs;

    s = AllServersLocalHostGet(); // get local server
    queues = ServerQueuesGet(s);
    foreach( q in queues ) {
        jobs = QueJobsGet(q);
        foreach( j in jobs ) {
            JobAction(j, SYNCRUN, NULLSTR);
        }
    }
}

For a more complete discussion of the Batch Scheduling
Language, see basl2c(1B).
CONFIGURATION FILE
A configuration file may be specified with the -c option.
This file is used to specify (1) the hosts that are allowed
to connect to pbs_sched, (2) the list of server hosts that
the scheduler writer wishes the system to check periodically
for status, queue, and job information, (3) the list of
execution hosts that the scheduler writer wants the system to
check periodically for information such as state and
properties, and (4) the various queries to send to each
execution host.

(1) specifying client hosts:

The hosts allowed to connect to pbs_sched are specified in
the configuration file in a manner identical to that used in
pbs_mom. There is one line per host, using the syntax:

$clienthost hostname

where $clienthost and hostname are separated by white space.
Two host names are always allowed to connect to pbs_sched:
"localhost" and the name returned to pbs_sched by the system
call gethostname(). These names need not be specified in the
configuration file.

(2) specifying the list of servers:

The list of servers is specified one host per line, using
the syntax:

$serverhost hostname port_number

where $serverhost, hostname, and port_number are separated
by white space.

If port_number is 0, then the default PBS server port will
be used.

Regardless of what has been specified in the file, the list
of servers always includes the local server, the one running
on the same host as the scheduler.

Within BASL code, the list of servers is accessed by calling
AllServersGet(), or AllServersLocalHostGet(), which returns
the local server on the list.
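
As an illustrative sketch only, the sched_main() shown under SCHEDULING LANGUAGE could be extended to walk every server on the list rather than just the local one. It reuses only functions and constructs that appear on this page, and it assumes that AllServersGet() returns a Set Server that can be traversed with foreach; check that assumption against basl2c(1B) before relying on it.

```
sched_main()
{
    Server s;
    Que q;
    Job j;
    Set Server servers;
    Set Que queues;
    Set Job jobs;

    servers = AllServersGet();    // assumed to return a Set Server
    foreach( s in servers ) {
        queues = ServerQueuesGet(s);
        foreach( q in queues ) {
            jobs = QueJobsGet(q);
            foreach( j in jobs ) {
                JobAction(j, SYNCRUN, NULLSTR);
            }
        }
    }
}
```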

(3) specifying the list of execution hosts:

The list of execution hosts (nodes) whose MOMs are to be
queried by the scheduler is specified one host per line,
using the syntax:

$momhost hostname port_number

where $momhost, hostname, and port_number are separated by
white space.

If port_number is 0, then the default PBS MOM port will be
used.

The BASL functions AllNodesGet() and
ServerNodesGet(AllServersLocalHostGet()) are available for
getting the list of nodes known to the local system.
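
Putting the three kinds of host lines together, a configuration file fragment might look like the sketch below. All host names are hypothetical placeholders, and every port_number is given as 0 so that the default PBS ports are used.

```
$clienthost admin1.example.com
$serverhost server2.example.com 0
$momhost node01.example.com 0
$momhost node02.example.com 0
```

With such a file, admin1.example.com would be allowed to connect in addition to "localhost" and the scheduler's own host name, the server list would contain server2.example.com plus the local server, and the MOMs on node01 and node02 would be queried.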

(4) specifying the list of host resources:

The host resource queries to send to each execution host's
MOM are specified using the syntax:

$node node_name CNode..Get host_resource

node_name should be the same hostname string that was
specified in a $momhost line. A node_name value of "*"
(wildcard) matches any node.

Please consult section 9 of the PBS ERS (Resource
Monitor/Resources) for a list of possible values of the
host_resource parameter.

CNode..Get refers to the actual function name that is called
from the scheduler code to obtain the return values of host
resource queries. The CNode..Get function names that can
appear in the configuration file are:
STATIC:
================================
CNodePropertiesGet
CNodeVendorGet
CNodeNumCpusGet
CNodeOsGet
CNodeMemTotalGet[type]
CNodeNetworkBwGet[type]
CNodeSwapSpaceTotalGet[name]
CNodeDiskSpaceTotalGet[name]
CNodeDiskInBwGet[name]
CNodeDiskOutBwGet[name]
CNodeTapeSpaceTotalGet[name]
CNodeTapeInBwGet[name]
CNodeTapeOutBwGet[name]
CNodeSrfsSpaceTotalGet[name]
CNodeSrfsInBwGet[name]
CNodeSrfsOutBwGet[name]
DYNAMIC:
================================
CNodeIdletimeGet
CNodeLoadAveGet
CNodeMemAvailGet[type]
CNodeSwapSpaceAvailGet[name]
CNodeSwapInBwGet[name]
CNodeSwapOutBwGet[name]
CNodeDiskSpaceReservedGet[name]
CNodeDiskSpaceAvailGet[name]
CNodeTapeSpaceAvailGet[name]
CNodeSrfsSpaceReservedGet[name]
CNodeSrfsSpaceAvailGet[name]
CNodeCpuPercentIdleGet
CNodeCpuPercentSysGet
CNodeCpuPercentUserGet
CNodeCpuPercentGuestGet

STATIC function names return values that are obtained only
during the first scheduling cycle, or when the scheduler is
instructed to reconfigure, whereas DYNAMIC function names
return attribute values that are taken at every subsequent
scheduling cycle.

name and type are arbitrarily defined. For example, you can
choose to have name defined as "$FASTDIR" for the CNodeSrfs*
calls, and a sample configuration file entry would look like:

$node unicos8 CNodeSrfsSpaceAvailGet[$FASTDIR] quota[type=ares_avail,dir=$FASTDIR]

So in BASL code, a call to CNodeSrfsSpaceAvailGet(node,
"$FASTDIR") returns the value of the query
"quota[type=ares_avail,dir=$FASTDIR]" (the 3rd parameter) as
sent to the node's MOM.

By default, the scheduler has already internally defined the
following mappings, which can be overridden in the
configuration file:

keyword  node_name  CNode..Get        host_resource
=======  =========  ================  =============
$node    *          CNodeOsGet        arch
$node    *          CNodeLoadAveGet   loadave
$node    *          CNodeIdletimeGet  idletime

The above means that for all declared nodes (via $momhost),
the host queries arch, loadave, and idletime will be sent to
each node's MOM. The value of arch is obtained internally by
the system during the first scheduling cycle because it falls
under the STATIC category, while the values of loadave and
idletime are taken at every scheduling iteration because they
fall under the DYNAMIC category. The return values are
accessed by calling CNodeOsGet(node), CNodeLoadAveGet(node),
and CNodeIdletimeGet(node), respectively.

The following are some sample $node arguments that you may
put in the configuration file:

node_name            CNode..Get                        host_resource
===================  ================================  ===================================
<sunos4_nodename>    CNodeIdletimeGet                  idletime
<sunos4_nodename>    CNodeLoadAveGet                   loadave
<sunos4_nodename>    CNodeMemTotalGet[real]            physmem
<sunos4_nodename>    CNodeMemTotalGet[virtual]         totmem
<sunos4_nodename>    CNodeMemAvailGet[virtual]         availmem
<irix5_nodename>     CNodeNumCpusGet                   ncpus
<irix5_nodename>     CNodeMemTotalGet[real]            physmem
<irix5_nodename>     CNodeMemTotalGet[virtual]         totmem
<irix5_nodename>     CNodeIdletimeGet                  idletime
<irix5_nodename>     CNodeLoadAveGet                   loadave
<irix5_nodename>     CNodeMemAvailGet[virtual]         availmem
<linux_nodename>     CNodeNumCpusGet                   ncpus
<linux_nodename>     CNodeMemTotalGet[real]            physmem
<linux_nodename>     CNodeMemTotalGet[virtual]         totmem
<linux_nodename>     CNodeIdletimeGet                  idletime
<linux_nodename>     CNodeLoadAveGet                   loadave
<linux_nodename>     CNodeMemAvailGet[virtual]         availmem
<solaris5_nodename>  CNodeIdletimeGet                  idletime
<solaris5_nodename>  CNodeLoadAveGet                   loadave
<solaris5_nodename>  CNodeNumCpusGet                   ncpus
<solaris5_nodename>  CNodeMemTotalGet[real]            physmem
<aix4_nodename>      CNodeIdletimeGet                  idletime
<aix4_nodename>      CNodeLoadAveGet                   loadave
<aix4_nodename>      CNodeMemTotalGet[virtual]         totmem
<aix4_nodename>      CNodeMemAvailGet[virtual]         availmem
<unicos8_nodename>   CNodeIdletimeGet                  idletime
<unicos8_nodename>   CNodeLoadAveGet                   loadave
<unicos8_nodename>   CNodeNumCpusGet                   ncpus
<unicos8_nodename>   CNodeMemTotalGet[real]            physmem
<unicos8_nodename>   CNodeMemAvailGet[virtual]         availmem
<unicos8_nodename>   CNodeSwapSpaceTotalGet[primary]   swaptotal
<unicos8_nodename>   CNodeSwapSpaceAvailGet[primary]   swapavail
<unicos8_nodename>   CNodeSwapInBwGet[primary]         swapinrate
<unicos8_nodename>   CNodeSwapOutBwGet[primary]        swapoutrate
<unicos8_nodename>   CNodeCpuPercentIdleGet            cpuidle
<unicos8_nodename>   CNodeCpuPercentSysGet             cpuunix
<unicos8_nodename>   CNodeCpuPercentGuestGet           cpuguest
<unicos8_nodename>   CNodeCpuPercentUserGet            cpuuser
<unicos8_nodename>   CNodeSrfsSpaceAvailGet[$FASTDIR]  quota[type=ares_avail,dir=$FASTDIR]
<unicos8_nodename>   CNodeSrfsSpaceAvailGet[$BIGDIR]   quota[type=ares_avail,dir=$BIGDIR]
<unicos8_nodename>   CNodeSrfsSpaceAvailGet[$WRKDIR]   quota[type=ares_avail,dir=$WRKDIR]
<sp2_nodename>       CNodeLoadAveGet                   loadave

Suppose you have an execution host of the irix5 OS type; the
<irix5_nodename> entries will then be consulted by the
scheduler. The initial scheduling cycle involves sending the
STATIC queries ncpus, physmem, and totmem to the execution
host's MOM, and the return values of the queries are accessed
via CNodeNumCpusGet(node), CNodeMemTotalGet(node, "real"),
and CNodeMemTotalGet(node, "virtual"), respectively, where
node is the CNode representation of the execution host.
Subsequent scheduling cycles send only the DYNAMIC queries
idletime, loadave, and availmem, and the return values of the
queries are accessed via CNodeIdletimeGet(node),
CNodeLoadAveGet(node), and CNodeMemAvailGet(node, "virtual"),
respectively.
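
The access pattern described above might look as follows in BASL. This is a sketch under assumptions, not a definitive implementation: the result types Int, Size, and Float are assumed rather than taken from this page and should be checked against basl2c(1B), the variable names are illustrative, and the configuration file is assumed to contain $node entries that map these functions to ncpus, physmem, and loadave for the nodes in question.

```
sched_main()
{
    Set CNode nodes;
    CNode node;
    Int ncpus;      // assumed result type
    Size physmem;   // assumed result type
    Float load;     // assumed result type

    nodes = AllNodesGet();
    foreach( node in nodes ) {
        // STATIC values, gathered during the first scheduling cycle
        ncpus = CNodeNumCpusGet(node);
        physmem = CNodeMemTotalGet(node, "real");

        // DYNAMIC value, refreshed at every scheduling cycle
        load = CNodeLoadAveGet(node);
    }
}
```

A real scheduler would go on to use these values when deciding where to run a job; the sketch stops at retrieving them.
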
"Later" entries in the config file take
precedence.

The configuration file must be "secure". It must be owned by
a user id and group id less than 10 and not be world
writable.

On receipt of a SIGHUP signal, the scheduler will close and
reopen its log file and reread its configuration file (if
any).
FILES
$PBS_SERVER_HOME/sched_priv
the default directory for configuration files, typically
/usr/spool/pbs/sched_priv.
Signal Handling
A C based scheduler will handle the following signals:

SIGHUP
The scheduler will close and reopen its log file and reread
the config file if one exists.

SIGALRM
If the site supplied scheduling module exceeds the time
limit, the alarm will cause the scheduler to attempt to core
dump and restart itself.

SIGINT and SIGTERM
Will result in an orderly shutdown of the scheduler.

All other signals have the default action installed.
EXIT STATUS
Upon normal termination, an exit status of zero is returned.
SEE ALSO