Farmer's Guide to D0 Monte Carlo Production

V1.0 Ransom Stephens, May 8, 1996.

Introduction and Contents

The Farmer's Guide covers the essentials of participating in the D0 Monte Carlo Production Collective. Much of this document is general information. However, the sections on FATMEN file naming and tape labeling are of primary importance as a reference.

  1. The Monte Carlo Production Collective
  2. D0Geant Requests and How They're Handled
  3. Communication Among the Farmers and Your WWW Status Page
  4. File and Tape Handling: FATMEN
  5. Software Details

Here's a link to the D0 Monte Carlo Production page.

1. The Monte Carlo Production Collective

The collective was assembled to provide a coherent, uniform method for people to request Monte Carlo jobs, so that the jobs may be produced efficiently, maximizing the CPU potential of the entire collaboration. The charter members of the collective are UTA, FSU, UMd, and FNAL. The idea is to utilize every available CPU cycle throughout the collaboration. The idea is not for collaborators at a given institution to sacrifice CPU cycles for another institution's analysis.

Participation in the Collective is a service task. (Don't tell anyone, but it's a very easy service task which maximizes institutional recognition and minimizes actual work, just your style, eh?) As a service task it allows institutions to play a more direct role in a much wider set of analyses than usually possible.

2. D0Geant Requests and How They're Handled

The system operates under the direction of the Offline Computing Priorities Board (OCPB):
  1. Physicists performing analyses arrange to have their requests made by their physics group OCPB representative. The OCPB representative then uses the WWW to fill out a request form. The URL of the DØGeant request form is
    http://www-hep.uta.edu/stephens/d0/geant_mcform.html
    The form is fully annotated.
  2. The form is then mailed to the Coordinator, who copies it to a directory on the D0FS cluster:
    D0FS::USR$ROOT21:[D0GEANT.REQUEST.OCPB]
    If there is a backlog of requests, they are reviewed by the OCPB which sets priorities.
  3. Given these priorities, the Coordinator then requests specific CPU farms to take on a given job. The farms then interact with the person who made the request to arrange the details of file/tape delivery.
  4. To keep tabs on requests, there are four different directories on D0FS:
    D0FS::USR$ROOT21:[D0GEANT.REQUEST.ocpb/todo/inpr/done]
    Each request has its own file, referenced by the "order number" (which is assigned when the request is made) and the nine character file descriptor. A file first appears in the OCPB directory and moves to the TODO directory after OCPB review. When a farm has accepted the request, the file is moved to the INPR directory for the duration of production, and finally it is put in the DONE directory after production has completed. The D0FS directories can be accessed through the MC Production web page.
The production jobs should be reasonable in the sense that the product of a request should fit on one tape. This usually means a scale of 15,000 events. Of course, the number of CPU days required to process such a request depends on the farm's resources.
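
The life cycle of a request file described above can be sketched in a few lines of Python. This is only an illustration: the stage names mirror the four D0FS directories, but the function names and the example file name are hypothetical, and nothing here touches D0FS itself.

```python
# Hypothetical sketch of the request life cycle. The stage names mirror
# the four D0FS directories; the helper functions are illustrative only.
STAGES = ["OCPB", "TODO", "INPR", "DONE"]


def next_stage(current):
    """Return the directory a request file moves to after `current`."""
    i = STAGES.index(current.upper())  # raises ValueError for unknown stages
    if i == len(STAGES) - 1:
        raise ValueError("request is already in DONE")
    return STAGES[i + 1]


def request_path(stage, filename):
    """Build the D0FS file specification for a request file in `stage`."""
    return "D0FS::USR$ROOT21:[D0GEANT.REQUEST.%s]%s" % (stage.upper(), filename)


if __name__ == "__main__":
    # "0042_S200G300.TXT" is a made-up request file name.
    stage = "OCPB"
    while stage != "DONE":
        print(request_path(stage, "0042_S200G300.TXT"))
        stage = next_stage(stage)
    print(request_path(stage, "0042_S200G300.TXT"))
```

A farm could use something like this to work out where a request form should be at any point, but the actual moves are made by the Coordinator.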

3. Communication Among the Farmers and Your WWW Status Page

Communication among the people who run production Monte Carlo is the critical part of this system. Use of a combination of the telephone and electronic mail has proven nearly useless. Since we are all very busy with many different things, and since the Coordinator needs to know the immediate status of a given job on a given farm, we have agreed to have dynamically updated web status pages. These pages need not be at all complicated. At UTA, when a file is processed, the date, time, and output file name are written to the top of a simple text file. This way anyone can see the progress of a given job. It is nearly trivial to implement such a dynamic web page and the utility of the page is enormous. Anyone can access this file from the D0 Monte Carlo Production page.
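
To show how little is needed, here is a minimal Python sketch of a status file of the kind described above, with the newest entry at the top. It is not the UTA implementation; the function name, the timestamp format, and the file name status.txt are all assumptions made for illustration.

```python
import os
import time


def log_processed_file(status_path, output_name, when=None):
    """Write date, time, and output file name to the top of the status file.

    `when` is seconds since the epoch; None means "now".
    The timestamp layout is an arbitrary choice for this sketch.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when))
    entry = "%s  %s\n" % (stamp, output_name)
    old = ""
    if os.path.exists(status_path):
        with open(status_path) as f:
            old = f.read()
    # Rewrite the whole file so that the newest entry comes first.
    with open(status_path, "w") as f:
        f.write(entry + old)
    return entry


if __name__ == "__main__":
    log_processed_file("status.txt", "NEW_S200G300_IS714_G314SS5_A_01.X_RAW01")
```

If the text file sits in a directory served by the farm's web server, calling this once per finished file is enough to keep the status page current.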

Let me reiterate. The web status page is extremely important to both the Requestor and the Coordinator. The Coordinator needs to know when a job is finished so that s/he can shuffle the request form to the [...done] area on D0FS and so s/he will know that the given farm may be available for another production job.

It would also be nice to have a link from the D0 Monte Carlo Production page to a more formal farm page in addition to the status page.

4. File and Tape Handling: FATMEN

To maximize production efficiency and to decrease the probability of redundant production jobs, we archive the request forms and insist that the files produced be installed in the FATMEN catalog. As long as the guidelines of file naming and tape labeling are followed this can be reasonably automatic. The request form includes instructions for the requestor on how input files should be named. Here I give more complete instructions including naming of output files.

There's more information in D0Notes 1744 and 2403.

4.1 FATMEN: File Naming

Input files, provided by the requestor, should have the naming format:
GRP_EVENTTYPE_SIMUL_A_nn.EXT
and, more importantly, files written to tape MUST have the format
GRP_EVENTTYPE_SIMUL_DETCSIM_A_nn.EXT
  1. GRP = three characters for the physics group, e.g. NEW or TOP as in the examples below;
  2. EVENTTYPE = nine character Requestor assigned description of events;
  3. SIMUL = five character description of event generator including version number, e.g. IS714 = isajet 7.14;
  4. DETCSIM = Geant version and showering parameter, e.g. G315TS5 for D0GEANT version 3.15 and SHWG 5, or G314SS0 for version 3.14 and SHWG 0;
  5. A = one alphanumeric descriptor the requestor can choose for organizational purposes - called the tape descriptor;
  6. nn = two digit file number;
  7. EXT = X_RAW01 ==> eXchange format, raw data output.
For example, an input file
NEW_S200G300_IS714_A_01.ISA
could become the output file
NEW_S200G300_IS714_G314SS5_A_01.X_RAW01
for Geant V3.14, SHWG 5, eXchange format, RAW geant output.
Another example, for version 3.15:
TOP_TTLL170XS_HW057_A_20.ISA
could become
TOP_TTLL170XS_HW057_G315TS0_A_20.X_RAW01
for Geant V3.15, SHWG 0, eXchange format, RAW geant output. I don't know why Geant V3.15 gets G315TS and Geant V3.14 gets G314SS.

This is pretty straightforward, but it gets quite a bit more involved when we start running D0RECO on the Monte Carlo output.
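
The naming rule in this section is mechanical enough to script. The Python sketch below derives an output file name from an input file name, a Geant version, and a SHWG value. The function names are hypothetical, and the only DETCSIM prefixes it knows are the two quoted above (G314SS and G315TS); it also assumes the event description contains no underscores.

```python
# Only the two prefixes quoted in this guide are known; any other
# Geant version is rejected rather than guessed.
DETCSIM_PREFIX = {"3.14": "G314SS", "3.15": "G315TS"}


def detcsim_tag(geant_version, shwg):
    """Build the DETCSIM field, e.g. ("3.14", 5) gives "G314SS5"."""
    if geant_version not in DETCSIM_PREFIX:
        raise ValueError("no known tag for Geant version %s" % geant_version)
    return "%s%d" % (DETCSIM_PREFIX[geant_version], shwg)


def output_name(input_name, geant_version, shwg, ext="X_RAW01"):
    """Turn GRP_EVENTTYPE_SIMUL_A_nn.EXT into the name written to tape."""
    stem = input_name.rsplit(".", 1)[0]
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError("expected GRP_EVENTTYPE_SIMUL_A_nn: %s" % input_name)
    grp, evtype, simul, tape, num = parts
    # Field widths follow the list above. The event description is only
    # checked to be at most nine characters, since one of the examples
    # uses eight.
    ok = (len(grp) == 3 and 0 < len(evtype) <= 9 and len(simul) == 5
          and len(tape) == 1 and len(num) == 2 and num.isdigit())
    if not ok:
        raise ValueError("field widths do not match the format: %s" % input_name)
    tag = detcsim_tag(geant_version, shwg)
    return "_".join([grp, evtype, simul, tag, tape, num]) + "." + ext


if __name__ == "__main__":
    print(output_name("NEW_S200G300_IS714_A_01.ISA", "3.14", 5))
    # prints NEW_S200G300_IS714_G314SS5_A_01.X_RAW01
```

Rejecting malformed input names up front is deliberate: a bad name caught before production is much cheaper than one discovered on tape.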

4.2 FATMEN: Tape Writing and Labeling

You can write any set of files to a tape in any order provided that the file names are unique. However, it's nice to write them in a logical sequence. Two tapes should be written, one sent to the requestor and the other to the FATMEN at D0. If one tape is written then it should be sent to the FATMEN at D0. We have been assigned tape label seeds. Here's the list, be sure to label your output tapes appropriately: The nnn should be a number unique for each tape. It is also nice to print out a list of the files on a tape, fold it up and stick it in the tape box.
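
The uniqueness check and the printed file list can both come from one small script. The Python sketch below is only an illustration: the function name is hypothetical, the label XXX001 is a placeholder rather than an assigned seed, and comparing names without regard to case is an assumption.

```python
def tape_listing(label, filenames):
    """Check that file names are unique and return a printable listing.

    Names are compared case-insensitively (an assumption) and listed in
    sorted order, which is one reasonable "logical sequence" for writing.
    """
    seen = set()
    for name in filenames:
        key = name.upper()
        if key in seen:
            raise ValueError("duplicate file name on tape %s: %s" % (label, name))
        seen.add(key)
    lines = ["Tape %s: %d files" % (label, len(filenames))]
    for i, name in enumerate(sorted(filenames), 1):
        lines.append("%3d  %s" % (i, name))
    return "\n".join(lines)


if __name__ == "__main__":
    # "XXX001" is a placeholder label, not a real tape label seed.
    files = ["NEW_S200G300_IS714_G314SS5_A_02.X_RAW01",
             "NEW_S200G300_IS714_G314SS5_A_01.X_RAW01"]
    print(tape_listing("XXX001", files))
```

The returned text is what you would print, fold up, and stick in the tape box.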

4.3 FATMEN: RCP Catalog File Creation

This is where things get ugly. Basically we each need to have scripts which can automatically create an RCP file that includes all kinds of horrible information for each file on a tape. This file should be copied to a FNALD0 disk where it can be accessed by the FATMEN at D0. When I understand what's involved here, I'll fill this section in and provide both a perl script (from the RECO farm) and a VMS command file (from the UTA farm) which can make the RCP files from a listing of the files on the tape and (hopefully) just a few other inputs.

5. Software Details

Of course you need to know which version of D0GEANT your farm is running. You can figure this out by looking near the top of the log file. You need this to get the file names right (as described above). As a producer of D0GEANT output you should also have some familiarity with the options available through the Geant FFREAD cards. The quickest way to get this familiarity is to check out the Geant card description, which is also accessible from the request form.

Ransom W. Stephens