Farmer's Guide to D0 Monte Carlo Production
V1.0 Ransom Stephens, May 8, 1996.
Introduction and Contents
The Farmer's Guide covers the essentials of participating in the D0 Monte Carlo
Production Collective. Much of this document is general information.
However, the section on FATMEN file naming and tape labeling is of primary
importance as a reference.
- The Monte Carlo Production Collective
- D0Geant Requests and How They're Handled
- Communication Among the Farmers and Your WWW Status Page
- File and Tape Handling: FATMEN
- Software Details
Here's a link to the
D0 Monte Carlo Production page.
The collective was assembled to provide a coherent, uniform method for people
to request Monte Carlo jobs, so that the jobs can be produced efficiently and
make the most of the CPU potential of the entire collaboration. The charter
members of the collective are UTA, FSU, UMd, and FNAL. The idea is to utilize
every available CPU cycle throughout the collaboration. The idea is not
for collaborators at a given institution to sacrifice CPU cycles
for another institution's analysis.
Participation in the Collective is a service task. (Don't tell anyone,
but it's a very easy service task which maximizes institutional recognition
and minimizes actual work, just your style, eh?) As a service task, it allows
institutions to play a more direct role in a much wider set of analyses than
is usually possible.
The system operates under the direction of the Offline Computing Priorities
Board (OCPB):
- Physicists performing analyses arrange to have their
requests made by their physics group OCPB representative. The OCPB
representative then uses the WWW to fill out a request form. The URL of the
DØGeant request form is
http://www-hep.uta.edu/stephens/d0/geant_mcform.html
The form is fully annotated.
- The form is then mailed to the Coordinator, who copies it to a directory
on the D0FS cluster:
D0FS::USR$ROOT21:[D0GEANT.REQUEST.OCPB]
If there is a backlog of requests, they are reviewed by the
OCPB which sets priorities.
- Given these priorities, the Coordinator then requests specific
CPU farms to take on a given job. The farms then
interact with the person who made the request to arrange the details of
file/tape delivery.
- To keep tabs on requests, there are four different directories on
D0FS:
D0FS::USR$ROOT21:[D0GEANT.REQUEST.ocpb/todo/inpr/done]
Each request has its own file, referenced by the "order number"
(which is assigned when the request is made) and the nine character
file descriptor. A file first appears in the OCPB directory. After OCPB
review it moves to the TODO directory; once a farm has accepted the
request it moves to the INPR directory for the duration of production;
and when production has completed it is put in the DONE area. The D0FS
directories can be accessed through the MC Production web page.
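The bookkeeping described above amounts to moving one file through four stages. Here is a minimal Python sketch of that idea, assuming the four D0FS directories are mirrored as ordinary subdirectories of a local folder. The function name `advance_request` and the local layout are hypothetical illustrations, not part of the actual D0FS setup, where the Coordinator moves the files.

```python
from pathlib import Path
import shutil

# Stage order taken from the text: OCPB -> TODO -> INPR -> DONE.
STAGES = ["OCPB", "TODO", "INPR", "DONE"]


def advance_request(root, filename):
    """Move a request file to the next stage directory.

    `root` is a local directory holding one subdirectory per stage
    (an assumption made for this sketch). Returns the new stage name.
    """
    root = Path(root)
    for i, stage in enumerate(STAGES):
        src = root / stage / filename
        if src.exists():
            if i == len(STAGES) - 1:
                raise ValueError(f"{filename} is already in DONE")
            next_stage = STAGES[i + 1]
            (root / next_stage).mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(root / next_stage / filename))
            return next_stage
    raise FileNotFoundError(f"{filename} not found in any stage directory")
```

Calling `advance_request` once per transition (after OCPB review, on farm acceptance, on completion) keeps each request in exactly one directory at a time, which is what makes the directory listing usable as a status board.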
The production jobs should be reasonable in the sense that the product of
a request should fit on one tape. This usually means a scale of 15,000 events.
Of course, the CPU days required to process such a request depend on the farm's
resources.
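For a rough feel of how long a one-tape request ties up a farm, the arithmetic can be sketched as below. The per-event CPU time and the number of CPUs are inputs you have to measure on your own farm; nothing in this guide specifies them, and the sketch assumes the events split evenly across identical CPUs with no overhead.

```python
SECONDS_PER_DAY = 86400.0


def wall_clock_days(n_events, cpu_seconds_per_event, n_cpus):
    """Estimate the days a farm needs for a request.

    Assumes perfect parallelism over `n_cpus` identical processors,
    which is an idealization made for this sketch.
    """
    if n_events < 0 or cpu_seconds_per_event <= 0 or n_cpus <= 0:
        raise ValueError("inputs must be positive")
    total_cpu_seconds = n_events * cpu_seconds_per_event
    return total_cpu_seconds / (n_cpus * SECONDS_PER_DAY)
```

For example, `wall_clock_days(15000, t, n)` gives the estimate for a typical one-tape request once you plug in your farm's measured seconds per event `t` and CPU count `n`.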
Communication among the people who run production Monte Carlo is the critical
part of this system. Use of a combination of the telephone and electronic mail
has proven nearly useless. Since we are all very busy with many different
things, and since the Coordinator needs to know the immediate status of a given
job on a given farm, we have agreed to have dynamically updated web
status pages.
These pages need not be at all complicated. At UTA, when a file is processed,
the date, time, and output file name are written to the top of a
simple text file. This way anyone can see the progress of a given job. It is
nearly trivial to implement such a dynamic web page, and the utility of
the page is enormous. Anyone can access this file from the
D0 Monte Carlo Production page.
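To show how little is needed, here is a minimal Python sketch of the kind of status file described above: each processed file adds one line with date, time, and output file name at the top of a plain text file that the web server can serve directly. This is not the UTA implementation; the function name `log_processed_file` and the line layout are assumptions for illustration.

```python
from datetime import datetime
from pathlib import Path


def log_processed_file(status_path, output_name, when=None):
    """Prepend 'date time  output_name' to a plain text status file.

    The newest entry ends up on the first line, so a reader sees the
    most recent progress without scrolling. Returns the line written.
    """
    when = when or datetime.now()
    entry = f"{when:%Y-%m-%d %H:%M:%S}  {output_name}\n"
    path = Path(status_path)
    previous = path.read_text() if path.exists() else ""
    path.write_text(entry + previous)
    return entry
```

Calling this at the end of each production job step is enough; because the file is rewritten whole, it suits the short status files discussed here rather than very large logs.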
Let me reiterate. The web status page is extremely important to both the
Requestor and the Coordinator. The Coordinator needs to know when a job is finished
so that s/he can shuffle the request form to the [...done]
area on D0FS and so s/he will know that the given farm may be available
for another production job.
It would also be nice to have a link from the D0 Monte Carlo Production page
to a more formal farm page in addition to the status page.
To maximize production efficiency and to decrease the probability of
redundant production jobs, we archive the
request forms and insist that the files produced be installed in the FATMEN
catalog. As long as the guidelines for file naming and tape labeling are
followed, this can be reasonably automatic. The
request form includes instructions for the requestor on how input files
should be named. Here I give more complete instructions including naming
of output files.
There's more information in D0Notes 1744 and 2403.
4.1 FATMEN: File Naming
Input files, provided by the requestor, should have the naming format:
GRP_EVENTTYPE_SIMUL_A_nn.EXT
and, more importantly, files written to tape MUST have the format:
GRP_EVENTTYPE_SIMUL_DETCSIM_A_nn.EXT
- GRP = three characters for the physics group:
- EVENTTYPE = nine character Requestor assigned description of events;
- SIMUL = five character description of event generator including version
number, e.g. IS714 = isajet 7.14;
- DETCSIM = Geant version and showering parameter, e.g. G315TS5 for D0GEANT
version 3.15 and SHWG 5, or G314SS0 for version 3.14 and SHWG 0;
- A = one alphanumeric descriptor the requestor can choose for
organizational purposes - called the tape descriptor;
- nn = two digit file number;
- EXT = X_RAW01 ==> eXchange format, raw data output.
For example, an input file
NEW_S200G300_IS714_A_01.ISA
could become the output file
NEW_S200G300_IS714_G314SS5_A_01.X_RAW01
for Geant V3.14, SHWG 5, eXchange format, RAW Geant output.
Another example, for version 3.15:
TOP_TTLL170XS_HW057_A_20.ISA
could become
TOP_TTLL170XS_HW057_G315TS0_A_20.X_RAW01
for Geant V3.15, SHWG 0, eXchange format, RAW Geant output.
I don't know why Geant V3.15 gets G315TS and Geant V3.14 gets G314SS.
This is pretty straightforward, but it gets quite a bit more involved when we
start running D0RECO on the Monte Carlo output.
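Since the output name is a mechanical rewrite of the input name, it is worth scripting. Below is a minimal Python sketch that turns an input file name into the tape file name following the format above. The prefixes G314SS and G315TS are the ones given in this guide; the function name `output_name`, the regular expression, and the choice to accept event type fields of up to nine characters (the S200G300 example has eight) are my assumptions.

```python
import re

# GRP_EVENTTYPE_SIMUL_A_nn.EXT, with field widths as described in the text.
INPUT_RE = re.compile(
    r"^(?P<grp>[A-Z0-9]{3})_(?P<evt>[A-Z0-9]{1,9})_(?P<sim>[A-Z0-9]{5})_"
    r"(?P<tape>[A-Z0-9])_(?P<num>\d{2})\.(?P<ext>\w+)$"
)

# DETCSIM prefixes quoted in this guide for the two D0GEANT versions.
GEANT_PREFIX = {"3.14": "G314SS", "3.15": "G315TS"}


def output_name(input_name, geant_version, shwg, ext="X_RAW01"):
    """Build GRP_EVENTTYPE_SIMUL_DETCSIM_A_nn.EXT from an input file name."""
    m = INPUT_RE.match(input_name)
    if m is None:
        raise ValueError(f"not a valid input file name: {input_name}")
    if geant_version not in GEANT_PREFIX:
        raise ValueError(f"unknown D0GEANT version: {geant_version}")
    if not 0 <= shwg <= 9:
        raise ValueError("SHWG must be a single digit")
    detcsim = f"{GEANT_PREFIX[geant_version]}{shwg}"
    return f"{m['grp']}_{m['evt']}_{m['sim']}_{detcsim}_{m['tape']}_{m['num']}.{ext}"
```

With the examples above, `output_name("NEW_S200G300_IS714_A_01.ISA", "3.14", 5)` gives `NEW_S200G300_IS714_G314SS5_A_01.X_RAW01`. A name that does not match the input format raises an error instead of producing a file that FATMEN cannot catalog.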
4.2 FATMEN: Tape Writing and Labeling
You can write any set of files to a tape in any order provided that
the file names are unique. However, it's nice to write them in a logical
sequence.
Two tapes should be written, one sent to the requestor and the other to
the FATMEN at D0. If only one tape is written, it should be sent to
the FATMEN at D0.
We have been assigned tape label seeds. Here's the list; be sure to label your
output tapes appropriately:
- UUCnnn - University of Maryland
- UUDnnn - University of Texas at Arlington
- UUEnnn - Florida State University
- UUFnnn - Northern Illinois University
The nnn should be a number unique for each tape. It is also nice to print out a
list of the files on a tape, fold it up and stick it in the tape box.
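The labeling and the listing for the tape box are both easy to automate. Here is a minimal Python sketch, using the seeds listed above. The site keys (including the abbreviation NIU), the zero-padding of nnn to three digits, and the listing layout are assumptions for illustration; the duplicate check simply enforces the rule that file names on a tape must be unique.

```python
# Tape label seeds from the list above, keyed by a short site name.
SEEDS = {"UMd": "UUC", "UTA": "UUD", "FSU": "UUE", "NIU": "UUF"}


def tape_label(site, number):
    """Return a label such as UUD007 for the given site and tape number."""
    if site not in SEEDS:
        raise KeyError(f"no tape label seed assigned to {site}")
    if not 0 <= number <= 999:
        raise ValueError("tape number must fit in three digits")
    return f"{SEEDS[site]}{number:03d}"


def tape_listing(label, filenames):
    """Format a printable list of the files on a tape, in tape order."""
    if len(set(filenames)) != len(filenames):
        raise ValueError("file names on a tape must be unique")
    lines = [f"Tape {label}: {len(filenames)} files"]
    lines += [f"{i:3d}  {name}" for i, name in enumerate(filenames, 1)]
    return "\n".join(lines)
```

Printing the result of `tape_listing` gives you the sheet to fold up and put in the tape box, and the same file list is a natural input for the catalog step that follows.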
4.3 FATMEN: RCP Catalog File Creation
This is where things get ugly. Basically we each need to have scripts which can
automatically create an RCP file
that includes all kinds of horrible information for each file on a tape. This
file should be copied to a FNALD0 disk where it can be accessed by the
FATMEN at D0.
When I understand what's involved here, I'll fill this section in and provide
both a perl script (from the RECO farm) and a VMS command file (from the UTA
farm) which can make the RCP files from a listing of the files on the tape
and (hopefully) just a few other inputs.
Of course, you need to know which version of D0GEANT your farm is running. You
can figure this out by looking near the top of the log file. You need this to
get the file names right (as described above).
As a producer of D0GEANT you should also have some familiarity with the
options available through the Geant FFREAD cards. The quickest way to get
this familiarity is to check out the
Geant card description, which is also accessible from the
request form.
Ransom W. Stephens