SCHOOL OF ENGINEERING AND COMPUTER SCIENCE

Technical Note - Grid Computing: ECS Grid

IMPORTANT

If you follow the instructions here and something does not work, please let the programming staff know.

READ THIS 2011

There were some significant changes to the ECS/SGE Grid at the start of the 2011 academic year. See the section below.

Summary

There didn't seem to be any ECS-focused documentation aimed at users wishing to run jobs on the ECS Grid, so here are the basics of job submission, the area in which things differ most from the provider's documentation: http://dlc.sun.com/pdf/820-0699/820-0699.pdf. Other aspects of job control are covered in that documentation.

Note: After Oracle bought Sun, they stopped providing the documentation at that link.

Details

General

ECS administers two "Grids" known, in administration circles, as the ECS Grid and the SCS Grid. You may also find these referred to as the SGE Grid and the Condor Grid respectively. The two are separate.

The ECS Grid runs under the control of a Sun Grid Engine (SGE) and exists to make use of the computing power of the School's NetBSD machines at times when they are unused, ie when they should have no one logged in at the console, eg overnight.

Jobs, usually shell scripts wrapping a number of tasks, are submitted from any ECS workstation into a simple queuing system where they remain, in turn, until an "unused" machine is able to start them. (Note that being able to start them is not the same as being able to run them to completion).

At present, users at the console of a machine have priority over Grid jobs running on the same machine: a Grid job will be suspended on a machine where there is console activity. Users submitting Grid jobs should therefore be aware that there is no guaranteed run time for any given task. Basically, it'll finish when it finishes.

Setting up the environment

A single SGE instance can control a number of "Grids". In order to provide the SGE utilities with information about which Grid the user wishes to run their job within, a couple of environmental variables need to be set up. This is achieved using the standard package system's environment-modifying command, need pkgname.

We'll be using the SGE Grid (ECS also maintains the SCS Grid not accessible from here), so a simple

need sgegrid

suffices to set up the environment for job submission.
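As a quick sanity check before submitting, something like the following sketch could confirm that the environment looks set up. SGE_ROOT and SGE_CELL are the variable names a standard SGE installation uses; that need sgegrid sets exactly these two is an assumption here, and check_sge_env is just an illustrative helper name.

```shell
#!/bin/sh
# Hypothetical helper: report whether the SGE environment appears to be set.
# ASSUMPTION: "need sgegrid" exports SGE_ROOT and SGE_CELL (standard SGE names).
check_sge_env() {
    missing=0
    for var in SGE_ROOT SGE_CELL; do
        # Look up the value of the variable whose name is held in $var
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set: did you run 'need sgegrid'?" >&2
            missing=1
        fi
    done
    return $missing
}

if check_sge_env; then
    echo "SGE environment looks OK"
else
    echo "SGE environment incomplete"
fi
```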

Do I have a home on the Grid?

This is slightly quirky and not initially intuitive.

Staff will not be able to access their home directories

Students will

A user of an SGE-controlled Grid might expect to find that their jobs start to execute from a home directory within the overall system, that home directory being accessible to all machines within that system and, for the case of a grid utilising their everyday machines, being their normal working directory after logging in to any of those machines interactively.

A simple qsub submission_script_name would then be enough to start the job off.

With the ECS Grid, however, this is only the case for student accounts, even though all staff and students have a home directory no matter which ECS machine they log in at.

Because the machines comprising the Grid system can be any of the School workstations, both individuals' office machines and public access lab machines, staff will not see their home directories accessible from a remote machine when running a Grid job and so will have to explicitly set an initial working directory elsewhere.

This can be achieved on the command line at submission time by use of the -wd path option to qsub, though perhaps a better option for staff is to always place the equivalent SGE directive, naming a known path, at the top of the submission script.

#$ -wd path

Of course, non-staff users may also find this mechanism useful.

At the time of this revision of this document, there is a small area of accessible filestore that could be used to create a standard area from which staff could reliably start SGE jobs,

/vol/grid/sgeusers/username (Now /vol/grid-solar/sgeusers/username see below)

but until that becomes a standard that staff can rely upon, staff should follow the guidelines pertaining to the location of input and output files for Grid jobs.

You need to ask for this area to be created for you.

May 2010: New filesystem for grid job data

As of May 2010, all of the machines participating in the ECS Grid will have access to an area of filestore that is much larger in size than either of the two existing filesystems

/vol/scratch and /vol/grid/sgeusers/username

The new area should be used in preference to either of the above.

The new area is a mount of some filestore supplied by ITS and users of the ECS grid will find that they have a personal directory within that filesystem, accessible as:

/vol/grid-solar/sgeusers/username

If you have not used the grid in a while, you may not have a directory there. Just ask.

Where will the input and output files be?

Because this is a distributed batch processing environment, there's usually no clear indication as to which machine(s) your job(s) will end up running on.

You thus need to give a little more thought to the location of input and output files than if you were simply running a job on your own workstation where everything is local to the machine (though, who knows, your Grid job might end up running on your workstation).

Once the job is running on the remote machine it will have access to many of the NFS shared filesystems that the user would expect to see from their own workstation during an interactive session.

This can be useful when large data sets need to be accessible for reading and the overhead of copying the data to each machine upon which the job is running is large, because they can be placed at known paths.

The NFS-shared filesystems are less of an advantage for writing out data.

  1. Writing over NFS is often slower than reading
  2. There is the potential for bottlenecks to occur where a user has each job writing to the same directory (or even file, if they get things wrong) over NFS, or where many users each have jobs writing to the same NFS partition.

It is thus advisable to arrange for any output from the program to be written to a directory local to the machine upon which the program is running and then to copy any output to filestore to which the user will have general access, at the end of the job.

The area of filestore provided for this purpose upon every ECS Grid machine is the directory

/local/tmp

Note that this directory may well be being used by the user who normally sits at the console, and will almost certainly be used by Grid jobs that came before your current one and those that come after yours, and so there is no guarantee that a path or file name that you wish to create does not exist already.

To avoid any clashes, as a courtesy to other users, and to simplify the process of cleaning up afterwards, it is advisable to write files from the currently running job into a directory below the path /local/tmp that follows the convention

/local/tmp/[username]/$JOB_ID

where $JOB_ID is an environmental variable maintained by the SGE for the duration of the job which will thus be available to your submission script and to any programs able to read the environment.

This directory can be created from within your job submission script by use of the command

mkdir -p /local/tmp/[username]/$JOB_ID
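A slightly more defensive version of that step might look like the sketch below. The helper name make_job_dir and the SCRATCH_ROOT override are illustrative assumptions, as are the fallbacks used when JOB_ID or USER are not set (for instance when trying the snippet outside a Grid job).

```shell
#!/bin/sh
# Hypothetical helper: create and print a per-job directory below a scratch root.
# $1: scratch root (on a Grid machine this would be /local/tmp)
make_job_dir() {
    # SGE sets JOB_ID inside a job; fall back to this shell's PID otherwise.
    # USER is normally set at login; fall back to "id -un" when it is not.
    dir="$1/${USER:-$(id -un)}/${JOB_ID:-manual.$$}"
    mkdir -p "$dir" || return 1
    echo "$dir"
}

# SCRATCH_ROOT is an assumed override so the sketch can be tried elsewhere
SCRATCH_ROOT="${SCRATCH_ROOT:-/local/tmp}"
if JOB_DIR=$(make_job_dir "$SCRATCH_ROOT" 2>/dev/null); then
    echo "Working in $JOB_DIR"
else
    echo "Could not create a job directory under $SCRATCH_ROOT" >&2
fi
```

Checking the result of mkdir, rather than assuming it worked, avoids a job writing its output into whatever directory it happened to start in.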

Preserving results after execution

Once your job has run and the submission script has terminated, any output written to /local/tmp/[username]/$JOB_ID will only be accessible on the machine upon which the job ran.

In order to get your output back to somewhere more useful to you, there are a number of options:

  1. If you have direct access to your home directory path, you can copy directly to that. Staff can't do this.
  2. If you have write access to a shared filesystem, you can copy directly to that and then move files into your own filestore from your own machine.

Staff will need to exercise Option 2 and will find they have access to create directories below /vol/scratch

Directory creation within /vol/scratch should follow the guidelines given for /local/tmp above.

There is a second area of shared filestore that is probably too small to be of much use and which requires explicit creation of a directory to read from or write into anyway, but which is mentioned here for completeness.

/vol/grid/sgeusers/
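The copy-back step could be wrapped up along the following lines. This is only a sketch: stage_out is a hypothetical helper name, and the choice of one sub-directory per job id on the shared filestore simply mirrors the /local/tmp convention described earlier.

```shell
#!/bin/sh
# Hypothetical helper: copy everything from the job's local directory into a
# per-job directory on shared filestore, and print where it went.
# $1: local job directory
# $2: shared area, eg /vol/scratch/fred or /vol/grid-solar/sgeusers/fred
stage_out() {
    src="$1"
    # Fall back to this shell's PID when not running under SGE
    dest="$2/${JOB_ID:-manual.$$}"
    mkdir -p "$dest" || return 1
    # "$src"/. copies the contents of the directory, including dot files
    cp -R "$src"/. "$dest"/ || return 1
    echo "$dest"
}
```

Within a submission script this might be called as stage_out /local/tmp/fred/$JOB_ID /vol/scratch/fred once the real work has finished.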

Cleaning up

If you have done the decent thing and created a directory specific to the current job, then after you have finished the job, including copying data to a more permanent storage area, it would be a benefit to the Grid facility as a whole if your job script removed any files created during the run on the remote machine.

This can be achieved, assuming you have followed the guidelines above by placing this command at the end of your job submission script

rm -fr /local/tmp/[username]/$JOB_ID

DANGER, Will Robinson! Cleaning up can be dangerous: you might be running on your own machine where you have permissions to do lots of stuff.

As an example, it might be tempting to remove everything below the path starting from your username, viz:

rm -fr /local/tmp/[username]

however, if you end up having your job run on your own workstation, whilst you were logged off from it, and were in the habit of using /local/tmp as a place for temporary files, you may find that you do not simply delete files from the current run.

Similarly, were two of your jobs to run on the same machine at the same time, as could well be the case for multi-processor machines, you may end up removing files from a job still running.

To reiterate the point then, placing all the files related to your current job below a directory designated for that job is likely to be of benefit to you and not just others.
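One way to guard against the rm reaching further than intended is sketched below. The helper name cleanup_job_dir is hypothetical; the point it illustrates is that an empty job id would turn the job-specific path into the per-user path, so the sketch refuses to run in that case.

```shell
#!/bin/sh
# Hypothetical helper: remove only this job's directory.
# $1: scratch root, $2: username, $3: job id
cleanup_job_dir() {
    # An empty component would widen the rm to a parent directory, so refuse
    if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
        echo "Refusing to clean up: incomplete path" >&2
        return 1
    fi
    rm -fr "$1/$2/$3"
}

# In a submission script this could be registered to run when the script exits:
#   trap 'cleanup_job_dir /local/tmp fred "$JOB_ID"' EXIT
```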

Where do stdin, stdout and stderr appear

When one runs programs locally, program output and error messages will often appear on the console, or in the terminal emulator, and one can usually perform command line redirection for input.

When you are running a non-interactive job on a remote machine however, it is likely that you aren't going to see any console output during the execution of the program.

The SGE therefore redirects the stdout and stderr channels to file so that they may be inspected after the job has finished.

Typically the default stdout and stderr naming conventions try to create files called scriptname.o$JOB_ID and scriptname.e$JOB_ID respectively, in the working directory of the task when it starts. (See the note about working directories for staff)

The default location of these files can be altered by use of qsub command-line options or via the corresponding SGE directives being specified in the job submission script.
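As an illustration, the directives below use qsub's -o and -e options to send the two streams to a chosen directory. The logs directory is an assumption (it would need to exist already), and the behaviour relied on here, that naming a directory keeps the default file names but places them in that directory, is as described in the qsub manual page; check it against the local installation.

```shell
#!/bin/sh
#$ -S /bin/sh
#
# ASSUMPTION: /vol/grid-solar/sgeusers/fred/logs already exists
#
#$ -o /vol/grid-solar/sgeusers/fred/logs/
#$ -e /vol/grid-solar/sgeusers/fred/logs/
#
echo "this line goes to the stdout file"
echo "this line goes to the stderr file" >&2
```

The same options can be given on the qsub command line instead, and adding -j y should merge stderr into the stdout file if a single log per job is preferred.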

Job submission script example

Basic jobs just need to run on some machine within the Grid. If you know that you want a certain type of processor or need a minimum amount of memory or disk space, then you probably know enough to create the relevant submission script, or at least be capable of reading the Sun documentation for more information.

The job submission script freds_test.sh, used as an example here, is therefore effectively just a simple test of the system (though running a simple test to check the basics, eg that directories exist, before you submit 5000 jobs that write to them is a GOOD thing).

There are, however, a couple of advancements over a simple fire and forget activity.

We've taken the view that you'll want to know when your job starts and finishes and so the example job submission script will tell the Grid Engine to email you when it does.

We've taken the view that you'll want to script using the Bourne Shell so we'll force the SGE to run your job submission script within that shell, because the initial line's #!/bin/sh may not be honoured.

We've taken the view that you'll be doing something more than just adding a couple of integers together in a loop (and, no, adding a couple of thousand integers together still doesn't count) so we'll try and access some areas of the filestore you will have access to when you run your jobs and move things around.

Finally, we've written the example for an ECS user with the username fred and the mail address Fred.Bloggs@ecs.vuw.ac.nz, so you might need to change a few things in the script if you are not Fred Bloggs and/or not in ECS.

(HINT: Search for the strings fred, Fred.Bloggs, FRED and ecs).

BIGGER HINT ADDED

after someone tried to mail Fred.Bloggs@ecs.vuw.ac.nz and Fred was not happy getting all the emails.

Search for the strings fred, Fred.Bloggs, FRED and ecs and change to match your username, email address and school

And of course, this is just an example. Once you have modified the script to suit your needs and run a few tests to check things work as you expect, you will probably want to remove some, if not all, of the recording of the environment and so on - but that's up to you.

Sometimes, having the info can be useful to a debugging exercise, eg when you are trying to invoke something not on the PATH because you are effectively logging into a non-interactive environment: sometimes, all the extra clutter makes it hard to see what's happening.

That "extra clutter" should also include the individual job emails you will get, so it is usually worth electing not to be informed when running large numbers of jobs.

A basic job submission script

#!/bin/sh
#
# Force Bourne Shell if not Sun Grid Engine default shell (you never know!)
#
#$ -S /bin/sh
#
# I know I have a directory here so I'll use it as my initial working directory
#
#$ -wd /vol/grid-solar/sgeusers/fred 
#
# Mail me at the b(eginning) and e(nd) of the job
#
#$ -M Fred.Bloggs@ecs.vuw.ac.nz
#$ -m be
#
# End of the setup directives
#
# Stdout from programs and shell echos will go into the file scriptname.o$JOB_ID
#  so we'll put a few things in there to help us see what went on
#
echo ==UNAME==
uname -n
echo ==WHO AM I and GROUPS==
id
groups
echo ==SGE_O_WORKDIR==
echo $SGE_O_WORKDIR
echo ==/VOL/SCRATCH==
ls -ltr /vol/scratch/
echo ==/LOCAL/TMP==
ls -ltr /local/tmp/
echo ==/VOL/GRID==
ls -l /vol/grid-solar/sgeusers/
#
# OK, where are we starting from and what's the environment we're in
#
echo ==RUN HOME==
pwd
ls
echo ==ENV==
env
echo ==SET==
set
#
echo == WHATS IN LOCAL/TMP ON THE MACHINE WE ARE RUNNING ON ==
ls -ltra /local/tmp | tail
#
# Now let's do something useful, but first create a directory specific to this job
#  and copy something we already know exists into it
#
mkdir -p /local/tmp/fred/$JOB_ID
#
# Check we have somewhere to work now and if we don't, exit nicely.
#  We could do more to try and run here but this is just a test
#
if [ -d /local/tmp/fred/$JOB_ID ]; then
        cd /local/tmp/fred/$JOB_ID
else
        echo "There's no job directory to change into "
        echo "Here's LOCAL TMP "
        ls -la /local/tmp
        echo "AND LOCAL TMP FRED "
        ls -la /local/tmp/fred
        echo "Exiting"
        exit 1
fi
#
# Now we are in the job-specific directory so
#
echo == WHATS IN LOCAL TMP FRED JOB_ID AT THE START==
ls -la 
#
# Copy the input file to the local directory
#
cp /vol/grid-solar/sgeusers/fred/krb_tkt_flow.JPG .
echo ==WHATS THERE HAVING COPIED STUFF OVER AS INPUT==
ls -la 
# 
# Note that we need the full path to this utility, as it is not on the PATH
#
/usr/pkg/bin/convert krb_tkt_flow.JPG krb_tkt_flow.png
#
echo ==AND NOW, HAVING DONE SOMETHING USEFUL AND CREATED SOME OUTPUT==
ls -la
#
# Now we move the output to a place to pick it up from later and clean up
#  (really should check that directory exists too, but this is just a test)
#
mkdir -p /vol/grid-solar/sgeusers/fred/$JOB_ID
cp krb_tkt_flow.png  /vol/grid-solar/sgeusers/fred/$JOB_ID
#
# Do the cleaning up from our starting directory
#
echo ==CLEANING UP==
cd /vol/grid-solar/sgeusers/fred
pwd
rm -fr /local/tmp/fred/$JOB_ID
echo ==WHATS LEFT IN MY LOCAL TMP==
ls -ltra /local/tmp/fred | tail
#
echo "Ran through OK"

Emailed output

Fred will see an email message like this when the job starts:

Subject:      Job 341642 (freds_test.sh) Started

Job 341642 (freds_test.sh) Started
 User       = fred
 Queue      = GX755
 Host       = lumiere.ecs.vuw.ac.nz
 Start Time = 03/18/2009 16:20:54

and one like this when it ends:

Subject:      Job 341642 (freds_test.sh) Complete

Job 341642 (freds_test.sh) Complete
 User             = fred
 Queue            = GX755@lumiere.ecs.vuw.ac.nz
 Host             = lumiere.ecs.vuw.ac.nz
 Start Time       = 03/18/2009 16:20:54
 End Time         = 03/18/2009 16:20:55
 User Time        = 00:00:00
 System Time      = 00:00:00
 Wallclock Time   = 00:00:01
 CPU              = NA
 Max vmem         = NA
 Exit Status      = 0

Specialised job submission

As previously detailed, basic jobs just need to run on some machine within the Grid.

There may, of course, be classes of job where ensuring that all tasks run on the same architecture is desirable, an example within ECS being a desire to ensure run timings were not influenced by differences in the model of machine that individual tasks were executed on.

Similarly, temporary resource partitioning requests that ensure students in a lab tutorial can target the machines in the lab that has been booked for them, require a handle through which the user can access a subset of the full SGE Grid.

A number of the SGE utilities, including qsub, allow a resource request list to be defined by use of the option

 -l resource=value

where the resources are maintained within the SGE Complex.

Currently, the local additions (some of which may not always be populated) to the SGE Complex are

ecs_df_local 
ecs_model
ecs_netgroup
ecs_room

so a user wishing to target only those machines which are the model GX745 would need to add

 -l ecs_model=GX745

to their SGE command.
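The same request can also be carried inside the submission script as a directive, so that it does not have to be remembered at submission time. A minimal sketch, using the ecs_model value from the example above:

```shell
#!/bin/sh
#$ -S /bin/sh
#
# Only run on machines whose ecs_model resource is GX745
#
#$ -l ecs_model=GX745
#
echo "Running on $(uname -n)"
```

The standard SGE command qconf -sc lists the resources defined in the Complex, which may help in confirming that the local additions are present, though whether ordinary users may run it on the ECS Grid is not something this note can confirm.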

Update 2011: Changes to the ECS/SGE Grid

Following the introduction of this year's new hardware, some of the ECS/MSOR machines will be running a GNU/Linux variant operating system (OS) as opposed to the previous NetBSD OS.

The primary effect of this for the ECS Grid is that programs you've compiled (or obtained) for the NetBSD OS are unlikely to run should your grid job end up being scheduled to a GNU/Linux machine, and vice versa, once you start compiling on the new platform.

However, platform neutral stuff, eg school-wide packages you run in batch mode, or Java programs, should be less affected, if at all.

Two of the better approaches for handling the change are:

  1. specifically requesting the OS you want when submitting the job
  2. creating multiple copies of your "compiled" codes and testing for the OS so as to invoke the correct binary for the machine your grid job runs on.

The first approach can be achieved by using qsub's -l argument.

A qsub command targeting NetBSD machines might now thus be:

  qsub -l arch=nbsd-i386 your_script.sh

or to target the ArchLinux machines,

 qsub -l arch=lx26-x86 your_script.sh

With the second approach, you could check for the OS your job ends up running on, in the submission script, and then branch so as to run the appropriate binary.

You could differentiate the binaries by using directories or filename extensions.

The SGE will actually set an environmental variable for you to test against, eg

SGE_ARCH=nbsd-i386

You should also have access to the utility that SGE uses to provide its own view of things, on all the machines in the grid, as

 /usr/pkg/sge/util/arch

an invocation of which will return nbsd-i386 or lx26-x86

So, for example, you might have, if you choose to use the value that SGE would return to differentiate, directories containing OS-specific binaries with the same name:

  /vol/grid-solar/username/mycodes/bin/nbsd-i386/prog1
  /vol/grid-solar/username/mycodes/bin/lx26-x86/prog1

or these programs, where the names differentiate them:

  /vol/grid-solar/username/mycodes/bin/prog1.nbsd-i386
  /vol/grid-solar/username/mycodes/bin/prog1.lx26-x86

Here is a template (in Bourne shell syntax) that will allow you to branch your submission script. Your commands would go where the "I could run" echo statements appear:

if [ -z "$SGE_ARCH" ]; then
    echo "Can't determine SGE ARCH"
else
    if [ "$SGE_ARCH" = "nbsd-i386" ]; then
        echo "I could run a NetBSD binary"
    fi
    if [ "$SGE_ARCH" = "lx26-x86" ]; then
        echo "I could run a Linux x86 binary"
    fi
fi

and a similar version for the C shell syntax (though it is a good idea to write your job submission scripts in Bourne shell syntax)

if ( $?SGE_ARCH == 0 ) then
    echo "Can't determine SGE ARCH"
else
    if ( $SGE_ARCH == "nbsd-i386" ) then
        echo "I could run a NetBSD binary"
    endif
    if ( $SGE_ARCH == "lx26-x86" ) then
        echo "I could run a Linux x86 binary"
    endif
endif
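Where the binaries are kept in per-architecture directories as described above, the branch can be reduced to building a path from the architecture string. The following is a sketch only: select_binary is a hypothetical helper name, and falling back to the /usr/pkg/sge/util/arch utility when SGE_ARCH is unset is a convenience added for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the binary matching this machine.
# $1: directory holding one sub-directory per architecture
# $2: program name
select_binary() {
    arch="${SGE_ARCH:-}"
    # Outside a running job SGE_ARCH may be unset, so ask SGE's own utility
    if [ -z "$arch" ] && [ -x /usr/pkg/sge/util/arch ]; then
        arch=$(/usr/pkg/sge/util/arch)
    fi
    case "$arch" in
        nbsd-i386|lx26-x86)
            echo "$1/$arch/$2"
            ;;
        *)
            echo "Unsupported or unknown architecture: '$arch'" >&2
            return 1
            ;;
    esac
}
```

In a submission script this might be used as prog=$(select_binary /vol/grid-solar/username/mycodes/bin prog1) && "$prog", so that nothing is run when the architecture cannot be determined.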

Compilation for the mixed environment

One potential "gotcha", brought to light by Aaron Scoble, is that, following the upgrade, there is no lab access to a NetBSD machine that allows you to compile new codes that let you target NetBSD machines in the ECS Grid.

You'll thus need to login to a server, eg, greta-pt.

Similarly, if you are someone without access to the ECS lab machines, you'll not have access to an ArchLinux machine on which to compile code targeting those grid resources.

In this case, however, you should find that binaries compiled for i386 (so not x86_64) on other GNU/Linux machines you have access to may work.

If you experience other problems in this area, please get in touch with us.

Running Java programs on the ECS/SGE Grid

Running Java programs on the ECS/SGE Grid has always been complicated by the fact that the need facility is not available from within the default Grid environment, so you cannot set up the full Java environment you would get when running interactively with a need javaXYZ command.

With the introduction into the School of the ArchLinux machines, currently (May 2011) running alongside the older NetBSD boxes, it seems worth providing some basic guidelines that should allow many Java programs to operate across as many machines as possible.

With a bit of guinea-pig'ing by Roman Klapaukh, it would appear that the following stanza should allow one to submit Java programs to the ECS Grid, without worrying about the OS your job ends up running against.

Note that this solution uses the mechanism outlined above for determining the OS and thus, should you need to branch any other operations on that test, you could combine them.

Note also that you're probably not the user fred so YOU WILL NEED TO EDIT THE SCRIPT

#!/bin/sh
#
# Force Bourne Shell if not Sun Grid Engine default shell (you never know!)
#
#$ -S /bin/sh
#
# I know I have a directory here so I'll use it as my initial working directory
#
#$ -wd /vol/grid-solar/sgeusers/fred 
#
if [ -z "$SGE_ARCH" ]; then
   echo "Can't determine SGE ARCH"
else
   if [ "$SGE_ARCH" = "nbsd-i386" ]; then
       JAVA_HOME="/usr/pkg/java/jdk-1.6.0"
   fi
   if [ "$SGE_ARCH" = "lx26-x86" ]; then
       JAVA_HOME="/usr/pkg/java/sun-6"
   fi
fi

if [ -z "$JAVA_HOME" ]; then
   echo "Can't define a JAVA_HOME"
else
   export JAVA_HOME
   PATH="/usr/pkg/java/bin:${JAVA_HOME}/bin:${PATH}"; export PATH

   java Hello
fi

Using DRMAA with the ECS/SGE Grid

This section provides some information and typical commands required to compile and run codes making use of DRMAA.

Background

Some simple source code examples of using DRMAA via the C and Java bindings have been placed below:

/vol/grid-solar/sgeusers/admin/DRMAA

as an introduction to users wishing to experiment with DRMAA codes.

The C example sources originally come from a Dr Dobbs Journal article (2004, Frederic Pariente)

http://www.ddj.com/184405932

which can seemingly still be found at:

http://www.drdobbs.com/184405932

though an archived version, less the Flash adverts, of the original article is provided locally:

/vol/grid-solar/sgeusers/admin/DRMAA/DrDobbs-2004-Article/

The Java example sources come from the SGE source code distribution, although a small change is required in order to have the codes work as expected.

C Bindings

The C bindings make use of the header file

/usr/pkg/sge/include/drmaa.h

and the default SGE shared library

/usr/pkg/sge/lib/nbsd-i386/libdrmaa.so.1.0

Java Bindings

The Java bindings make use of a locally-compiled JAR-file and dynamic library

/vol/grid-solar/sgeusers/admin/drmaa.jar

/vol/grid-solar/sgeusers/admin/libdrmaa.so

which required a rebuild from the SGE sources.

Simple, proof of concept example: C Binding

Create a directory ~/DRMAA, change into that directory and copy over the example codes provided,

% cp /vol/grid-solar/sgeusers/admin/DRMAA/DrDobbs-Code/* .

Compile and link the proof of concept source

Note that the linking options differ across the two OS-s in use within ECS

*NetBSD*

% gcc -c -I/usr/pkg/sge/include/ ListingOne.c
% gcc -o  ListingOne \
      -L/usr/pkg/sge/lib/nbsd-i386/ \
      -R/usr/pkg/sge/lib/nbsd-i386/ -ldrmaa ListingOne.o

*ArchLinux*

% gcc -c -I/usr/pkg/sge/include/ ListingOne.c
% gcc -o  ListingOne \
      -L/usr/pkg/sge/lib/lx26-x86/ \
      -Wl,-R/usr/pkg/sge/lib/lx26-x86/ -ldrmaa ListingOne.o

You did remember to

% need sgegrid

Now we can test that things work

% ./ListingOne 
Successfully started the DRMAA library
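Since only the run-time library path option differs between the two platforms, the choice of link flags could be scripted, for example in a small build script. The sketch below is an assumption-laden illustration: drmaa_ldflags is a hypothetical helper, and it assumes the GNU linker on the ArchLinux machines takes the run-time path as -Wl,-R followed by the directory.

```shell
#!/bin/sh
# Hypothetical helper: print the DRMAA link flags for an SGE architecture.
# $1: architecture string as reported by SGE, eg nbsd-i386 or lx26-x86
drmaa_ldflags() {
    libdir="/usr/pkg/sge/lib/$1"
    case "$1" in
        nbsd-i386)
            echo "-L$libdir -R$libdir -ldrmaa"
            ;;
        lx26-x86)
            # ASSUMPTION: the run-time path is passed through to GNU ld
            echo "-L$libdir -Wl,-R$libdir -ldrmaa"
            ;;
        *)
            echo "No link flags known for '$1'" >&2
            return 1
            ;;
    esac
}
```

The link step might then be written as gcc -o ListingOne ListingOne.o $(drmaa_ldflags "$(/usr/pkg/sge/util/arch)"), leaving the command substitution unquoted so that the flags are split into separate arguments.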

Spawning an actual job into the SGE: C Binding

The file drdobbs-shell.c is a slightly modified version of ListingTwo.c, which allows one to specify the script to be executed as a command line argument and sets an SGE-native option required to tell SGE to "do the right thing".

Compile the source

Note that the linking options differ across the two OS-s in use within ECS

*NetBSD*

% gcc -c -I/usr/pkg/sge/include/ drdobbs-shell.c
% gcc -o drdobbs-shell \
  -L/usr/pkg/sge/lib/nbsd-i386/ \
  -R/usr/pkg/sge/lib/nbsd-i386/ -ldrmaa \
 drdobbs-shell.o

*ArchLinux*

% gcc -c -I/usr/pkg/sge/include/ drdobbs-shell.c
% gcc -o drdobbs-shell \
      -L/usr/pkg/sge/lib/lx26-x86/ \
      -Wl,-R/usr/pkg/sge/lib/lx26-x86/ -ldrmaa \
 drdobbs-shell.o

Edit, or otherwise replace, the placeholder username "fred" used within the job submission script to match your username

% mv i_am_alive.sh i_am_alive.sh.orig
% sed -e "s/fred/yourusername/g" i_am_alive.sh.orig > i_am_alive.sh
% chmod u+x i_am_alive.sh

As written, the DRMAA code that spawns the job pays no attention to the directory you are in when you use the DRMAA executable to spawn your job script into the SGE, so we run as follows:

% ~/DRMAA/drdobbs-shell ~/DRMAA/i_am_alive.sh
Your job "/u/students/fred/DRMAA/i_am_alive.sh"has been submitted with id 000000
%

after which you should find the log files from the running of your script in your home directory

% ls -ltr ~
...
drwx------  2 fred  students    512 Oct  6 12:02 DRMAA
-rw-r--r--  1 fred  students      0 Oct  6 12:20 i_am_alive.sh.e000000
-rw-r--r--  1 fred  students  29753 Oct  6 12:20 i_am_alive.sh.o000000
%

Note that, as written, the directive at the top of the job submission script which requests an initial working directory

#$ -wd /vol/grid-solar/sgeusers/fred

has been ignored and inspection of the output file confirms this:

% cat ~/i_am_alive.sh.o000000
==UNAME==
breaker.msor.vuw.ac.nz
==WHO AM I and GROUPS==
uid=0000(fred) gid=25(students) groups=25(students),1500(c302t1)
students c302t1
==SGE_O_WORKDIR==
/home/rialto1/fred/DRMAA
==/VOL/SCRATCH==
...
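Given that the -wd directive is not honoured in this case, one workaround is for the job script to change directory itself before doing any work. This is a sketch of that idea rather than anything taken from the example codes, and enter_workdir is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: change to the intended working directory explicitly,
# rather than relying on the "#$ -wd" directive being honoured.
# $1: directory to work in, eg /vol/grid-solar/sgeusers/fred
enter_workdir() {
    if [ -d "$1" ] && cd "$1"; then
        echo "Working directory is now $(pwd)"
    else
        echo "Cannot change to $1" >&2
        return 1
    fi
}
```

A job script might start with enter_workdir /vol/grid-solar/sgeusers/fred || exit 1. Note that this only affects where the script's commands run; the stdout and stderr files are opened by SGE before the script starts and so would still appear in the home directory.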

Spawning an actual job into the SGE: Java Binding

*Has been rewritten for use with the ArchLinux machines*

The modified version of the SGE-provided Howto2.java adds the setting of an SGE-native option required to tell SGE to "do the right thing" and removes the package qualification from the code.

Create a directory DRMAA, change into that directory and copy over the example codes provided,

cp /vol/grid-solar/sgeusers/admin/DRMAA/SGE-Code/* .

Compile the Java source against the locally-built DRMAA JAR-file

% javac -cp /vol/grid-solar/sgeusers/admin/DRMAA/lx26-x86/drmaa.jar:. Howto2.java

As written, the DRMAA code will look for the hard-coded script it is going to launch (sleeper.sh) as the SGE job, in your home directory so we copy it there and ensure it is executable, before running the DRMAA code.

You did remember to

% need sgegrid

as well though?

Notice that we need to tell Java to use the locally built dynamic library as well, by defining the search path within the Java environment

% cp sleeper.sh ~
% chmod u+x ~/sleeper.sh 
% java -Djava.library.path=/vol/grid-solar/sgeusers/admin/DRMAA/lx26-x86 \
   -cp /vol/grid-solar/sgeusers/admin/DRMAA/lx26-x86/drmaa.jar:. Howto2
Your job has been submitted with id 000000
%

after which you should find the log files from the running of your script in your home directory

% ls -ltr ~
...
drwx------  2 fred  students    512 Oct  6 12:02 DRMAA
-rw-r--r--  1 fred  students      0 Oct  6 12:43 Sleeper.e000000
-rw-r--r--  1 fred  students     97 Oct  6 12:43 Sleeper.o000000
% cat ~/Sleeper.o000000
Here I am. Sleeping now at: Wed Oct 6 12:43:19 NZDT 2010
Now it is: Wed Oct 6 12:43:24 NZDT 2010
%

Note that the job submission has taken notice of the script directive requesting that our job have the name "Sleeper"

#$ -N Sleeper

and that this is reflected in the names of the logfiles.

Caveats

Environmental variables now have an SGE_ prefix not GE_

As noticed by many but formally pointed out by Kourosh Neshatian (and, belatedly, Lloyd Parkes):

The Sun documentation and man pages for the Sun Grid Engine (SGE) mention environmental variables of the form GE_SOME_THING.

The docs are out of date with respect to current SGE implementations and users should be using environmental variables of the form SGE_SOME_THING
