Technical Note - Grid Computing: ECS Grid
IMPORTANT
If you follow the instructions here and something does not work, please let the programming staff know.
READ THIS 2011
There were some significant changes to the ECS/SGE Grid at the start of the 2011 academic
year. See the section "Update 2011: Changes to the ECS/SGE Grid" below.
Summary
There didn't seem to be any ECS-focused documentation aimed at users wishing to run jobs
on the ECS Grid, so here are the basics of job submission, this being the area in which things differ
most from the provider's documentation:
http://dlc.sun.com/pdf/820-0699/820-0699.pdf. Other aspects of job control are covered within that
documentation.
Note: After Oracle bought Sun, they stopped providing the documentation at that link.
Details
General
ECS administers two "Grids" known, in administration circles, as the ECS Grid and the SCS Grid.
You may also find these referred to as the SGE Grid and the Condor Grid respectively.
The two are separate.
The ECS Grid runs under the control of a Sun Grid Engine (SGE) and exists to make use of the computing
power of the School's NetBSD machines at times when they are unused, ie when they should have no one
logged in at the console, eg overnight.
Jobs, usually shell scripts wrapping a number of tasks, are submitted from any ECS workstation into a
simple queuing system where they remain, in turn, until an "unused" machine is able to start them.
(Note that being
able to start them is not the same as being able to run them to completion).
At present, users at the console of a machine have priority over Grid jobs running on the same machine to
the extent that a Grid job will be suspended upon a machine where there is console activity, so users submitting
Grid jobs should be aware that there is no guaranteed run time for any given task.
Basically, it'll finish when it finishes.
Setting up the environment
A single SGE instance can control a number of "Grids". In order to provide the SGE utilities with information
about which Grid the user wishes to run their job within, a couple of environmental variables need to be set
up. This is achieved using the standard package system's need pkgname environment-modifying process.
We'll be using the SGE Grid (ECS also maintains the SCS Grid, which is not accessible from here), so a simple
need sgegrid
suffices to set up the environment for job submission.
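Before submitting anything, it can be worth confirming that the environment really has been set up. The following is a minimal sketch, not part of the ECS tooling. It assumes that need sgegrid puts qsub on the PATH and sets the usual SGE variables SGE_ROOT and SGE_CELL; this note does not name the variables involved, so treat those two names, and the helper name sge_env_ok, as assumptions.

```sh
#!/bin/sh
# Hypothetical helper: report whether the SGE environment looks usable.
# Assumption: "need sgegrid" sets SGE_ROOT and SGE_CELL and adds qsub to PATH.
sge_env_ok() {
    if [ -z "$SGE_ROOT" ] || [ -z "$SGE_CELL" ]; then
        echo "SGE_ROOT or SGE_CELL is not set: run 'need sgegrid' first" >&2
        return 1
    fi
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub is not on the PATH: run 'need sgegrid' first" >&2
        return 1
    fi
    return 0
}
```

A typical use would be sge_env_ok && qsub your_script.sh, so that a missing environment is reported before qsub is even attempted.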
Do I have a home on the Grid?
This is slightly quirky and not initially intuitive.
Staff will not be able to access their home directories
Students will
A user of an SGE-controlled Grid might expect to find that their jobs start to execute from a
home directory within the overall system, that home directory being accessible to all machines
within that system and, for the case of a grid utilising their everyday machines, being their
normal
working directory after logging in to any of those machines interactively.
A simple
qsub submission_script_name would then be enough to start the job off.
With the ECS Grid, however, although all staff and students have a home directory no matter which
ECS machine they might log in at, that home directory is only accessible to Grid jobs for student accounts.
Because the machines comprising the Grid system can be any of the School workstations,
both individuals' office machines and public access lab machines, staff will not see their home
directories accessible from a remote machine when running a Grid job and so will have to
explicitly set an initial working directory elsewhere.
This can be achieved on the command line at submission time, by use of the
-wd path option to
qsub
though perhaps a better option for staff is to always place the equivalent SGE directive at the top of the submission
script to a known path.
#$ -wd path
Of course, non-staff users may also find this mechanism useful.
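Since a job whose working directory does not exist is of little use, one option is to choose the directory at submission time from a list of candidates. The sketch below is only an illustration: the helper name pick_workdir is made up, and it assumes the candidate paths are visible from the machine you submit from.

```sh
#!/bin/sh
# Hypothetical helper: print the first candidate directory that exists and
# is writable, for use as the initial working directory of a Grid job.
pick_workdir() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "No usable working directory among: $*" >&2
    return 1
}

# Example use at submission time (paths are the ones named in this note):
#   wd=$(pick_workdir "/vol/grid-solar/sgeusers/$USER" "/vol/grid/sgeusers/$USER") &&
#       qsub -wd "$wd" freds_test.sh
```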
At the time of this revision of this document, there is a small area of accessible filestore that
could be used to create a standard area from which staff could reliably start SGE jobs,
/vol/grid/sgeusers/username (Now
/vol/grid-solar/sgeusers/username see below)
but until that becomes a standard that staff can rely upon, staff should follow the guidelines
pertaining to the location of input and output files for Grid jobs.
You need to ask for this area to be created for you.
May 2010: New filesystem for grid job data
As of May 2010, all of the machines participating in the ECS Grid will have access to an area of filestore
that is much larger in size than either of the two existing filesystems
/vol/scratch and
/vol/grid/sgeusers/username
The new area should be used in preference to either of the above.
The new area is a mount of some filestore supplied by ITS and users of the ECS grid will find that they
have a personal directory within that filesystem, accessible as:
/vol/grid-solar/sgeusers/username
If you have not used the grid in a while, you may not have a directory there. Just ask.
Where will the input and output files be?
Because this is a distributed batch processing environment, there's usually no clear indication as to
which machine(s) your job(s) will end up running on.
You thus need to give a little more thought to the location of input and output files than if you were
simply running a job on your own workstation where everything is local to the machine (though, who
knows, your Grid job might end up running on your workstation).
Once the job is running on the remote machine it will have access to many of the NFS shared
filesystems that the user would expect to see from their own workstation during an
interactive session.
This can be useful when large data sets need to be accessible for reading and the overhead
of copying the data to each machine upon which the job is running is large, because they
can be placed at known paths.
The NFS-shared filesystems are less of an advantage for writing out data.
- Writing over NFS is often slower than reading
- There is the potential for bottlenecks to occur where a user has each job writing to the same directory (or even file, if they get things wrong) over NFS, or where many users each have jobs writing to the same NFS partition.
It is thus advisable to arrange for any output from the program to be written to a directory local
to the machine upon which the program is running and then to copy any output to filestore to
which the user will have general access, at the end of the job.
The area of filestore provided for this purpose upon every ECS Grid machine is the directory
/local/tmp
Note that this directory may well be being used by the user who normally sits at the console, and will
almost certainly be used by Grid jobs that came before your current one and those that come after yours,
and so there is no guarantee that a path or file name that you wish to create does not exist already.
To avoid any clashes, as a courtesy to other users, and to simplify the process of cleaning up
afterwards, it is advisable to write files from the currently running job into a directory below the
path
/local/tmp that follows the convention
/local/tmp/[username]/$JOB_ID
where
$JOB_ID is an environmental variable maintained by the SGE for the duration of
the job which will thus be available to your submission script and to any programs able to read
the environment.
This directory can be created from within your job submission script by use of the command
mkdir -p /local/tmp/[username]/$JOB_ID
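The convention can be wrapped in a small function so that the rest of the script only ever deals with one path. This is a sketch under stated assumptions: the name make_job_dir is hypothetical, and the fallback value used when JOB_ID is not set exists only so that the function can be tried outside the Grid.

```sh
#!/bin/sh
# Hypothetical helper: create and print a per-job scratch directory.
# $1 is the base directory (default /local/tmp, as recommended above).
# JOB_ID is set by SGE inside a job; the fallback "manual.$$" is an
# assumption made here so the function can be tried outside the Grid.
make_job_dir() {
    base=${1:-/local/tmp}
    user=${USER:-$(id -un)}
    job_dir="$base/$user/${JOB_ID:-manual.$$}"
    mkdir -p "$job_dir" || return 1
    echo "$job_dir"
}
```

Inside a submission script this might be used as work=$(make_job_dir) && cd "$work", which stops the job from carrying on in the wrong place if the directory could not be created.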
Preserving results after execution
Once your job has run and the submission script has terminated, any output written to
/local/tmp/[username]/$JOB_ID
will only be accessible on the machine upon which the job ran.
In order to get your output back to somewhere more useful to you, there are a number of options:
- If you have direct access to your home directory path, you can copy directly to that. Staff can't do this.
- If you have write access to a shared filesystem, you can copy directly to that and then move files into your own filestore from your own machine.
Staff will need to use the second option and will find they have access to create directories below
/vol/scratch
Directory creation within
/vol/scratch should follow the guidelines given for
/local/tmp above.
There is a second area of shared filestore that is probably too small to be of much use and which
requires explicit creation of a directory to /read from/write into/ anyway, but which is mentioned
here for completeness.
/vol/grid/sgeusers/
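Copying results back can be sketched as a function that places them under a sub-directory named after the job, mirroring the convention used for /local/tmp. The name save_results and the fallback used when JOB_ID is unset are assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical helper: copy everything in a job's local directory to a
# shared area, under a sub-directory named after the job.
# $1 = local job directory, $2 = shared base directory.
save_results() {
    src=$1
    dest="$2/${JOB_ID:-manual.$$}"
    mkdir -p "$dest" || return 1
    # "$src/." copies the directory's contents, including dot files
    cp -R "$src/." "$dest/" || return 1
    echo "$dest"
}
```

Because the function returns non-zero when the copy fails, a script can refuse to clean up the local directory unless the results are known to be safe, for example save_results "$work" /vol/scratch/fred && rm -fr "$work".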
Cleaning up
If you have done the decent thing and created a directory specific to the current job, then after
you have finished the job, including copying data to a more permanent storage area, it would
be a benefit to the Grid facility as a whole if your job script removed any files created during
the run on the remote machine.
This can be achieved, assuming you have followed the guidelines above by placing this command
at the end of your job submission script
rm -fr /local/tmp/[username]/$JOB_ID
DANGER, Will Robinson! Cleaning up can be dangerous: you might be running on your own machine,
where you have permissions to do lots of stuff.
As an example, it might be tempting to remove everything below the path starting from your username, viz:
rm -fr /local/tmp/[username]
however, if you end up having your job run on your own workstation, whilst you were logged off from it, and
were in the habit of using
/local/tmp as a place for temporary files, you may find that you do not simply
delete files from the current run.
Similarly, were two of your jobs to run on the same machine at the same time, as could well be the case for
multi-processor machines, you may end up removing files from a job still running.
To reiterate the point then, placing all the files related to your current job below a directory designated for
that job is likely to be of benefit to you and not just others.
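One further hazard is an empty JOB_ID: if the variable is unset, rm -fr /local/tmp/[username]/$JOB_ID expands to the per-user directory itself. A defensive sketch follows; the function name clean_job_dir is hypothetical, and the check it applies (the last path component must equal a non-empty JOB_ID) is one possible safeguard rather than an ECS requirement.

```sh
#!/bin/sh
# Hypothetical helper: remove a job directory only when it is clearly
# job-specific, i.e. its last path component equals a non-empty JOB_ID.
clean_job_dir() {
    dir=${1%/}                      # drop any trailing slash
    if [ -z "$JOB_ID" ]; then
        echo "JOB_ID is empty; refusing to remove $dir" >&2
        return 1
    fi
    if [ "${dir##*/}" != "$JOB_ID" ]; then
        echo "$dir does not end in /$JOB_ID; refusing to remove it" >&2
        return 1
    fi
    rm -fr "$dir"
}
```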
Where do stdin, stdout and stderr appear
When one runs programs locally, program output and error messages will often appear
on the console, or in the terminal emulator, and one can usually perform command line
redirection for input.
When you are running a non-interactive job on a remote machine however, it is likely that
you aren't going to see any console output during the execution of the program.
The SGE therefore redirects the stdout and stderr channels to file so that they may be
inspected after the job has finished.
Typically the default stdout and stderr naming conventions try to create files called
scriptname.o$JOB_ID and scriptname.e$JOB_ID respectively, in the working directory
of the task when it starts. (See the note about working directories for staff.)
The default location of these files can be altered by use of
qsub command-line options or via
the corresponding SGE directives being specified in the job submission script.
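As an illustration of the directive form, the sketch below uses the standard qsub options -o and -e (and mentions -j) to move the two files out of the working directory. The logs directory is a placeholder that is assumed to exist already, and the function say_hello is only there to produce one line on each channel.

```sh
#!/bin/sh
# Sketch only: send stdout and stderr to a chosen directory instead of the
# job's working directory. The log directory is a placeholder and is
# assumed to exist already.
#$ -S /bin/sh
#$ -o /vol/grid-solar/sgeusers/fred/logs
#$ -e /vol/grid-solar/sgeusers/fred/logs
# Alternatively, "#$ -j y" asks SGE to merge stderr into the stdout file.

# Hypothetical helper that writes one line to each channel.
say_hello() {
    echo "this line ends up in the stdout file"
    echo "this line ends up in the stderr file" >&2
}
say_hello
```

The same effect should be obtainable on the command line with qsub -o path -e path script_name.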
Job submission script example
Basic jobs just need to run on some machine within the Grid. If you know that you want a certain type
of processor or need a minimum amount of memory or disk space, then you probably know enough to
create the relevant submission script, or at least be capable of reading the Sun documentation for
more information.
The job submission script freds_test.sh, used as an example here, is then effectively just a simple
test of the system
(though running a simple test to check the basics, eg that directories exist, before you submit 5000 jobs that write to them is a GOOD thing).
There are, however, a couple of advancements over a simple fire and forget activity.
We've taken the view that you'll want to know when your job starts and finishes and so the example
job submission script will tell the Grid Engine to email you when it does.
We've taken the view that you'll want to script using the Bourne Shell so we'll force the SGE to run your
job submission script within that shell, because the initial line's
#!/bin/sh may not be honoured.
We've taken the view that you'll be doing something more than just adding a couple of integers together
in a loop (and, no, adding a couple of thousand integers together still doesn't count) so we'll try and access
some areas of the filestore you will have access to when you run your jobs and move things around.
Finally, we've written the example for an ECS user with the username fred and the mail address
Fred.Bloggs@ecs.vuw.ac.nz, so you might need to change a few things in the script if you are
not Fred Bloggs and/or not in ECS.
(HINT: Search for the strings fred, Fred.Bloggs, FRED and ecs.)
BIGGER HINT ADDED after someone tried to mail Fred.Bloggs@ecs.vuw.ac.nz and Fred was not happy getting all the
emails: search for the strings fred, Fred.Bloggs, FRED and ecs and change them to match your username,
email address and school.
And of course, this is just an example. Once you have modified the script to suit your needs
and run a few tests to check things work as you expect, you will probably want to remove
some, if not all, of the recording of the environment and so on - but that's up to you.
Sometimes, having the info can be useful to a debugging exercise, eg when you are trying
to invoke something not on the PATH because you are effectively logging into a non-interactive
environment: sometimes, all the extra clutter makes it hard to see what's happening.
That "extra clutter" should also include the individual job emails you will get, so it is usually
worth electing not to be informed when running large numbers of jobs.
A basic job submission script
#!/bin/sh
#
# Force Bourne Shell if not Sun Grid Engine default shell (you never know!)
#
#$ -S /bin/sh
#
# I know I have a directory here so I'll use it as my initial working directory
#
#$ -wd /vol/grid-solar/sgeusers/fred
#
# Mail me at the b(eginning) and e(nd) of the job
#
#$ -M Fred.Bloggs@ecs.vuw.ac.nz
#$ -m be
#
# End of the setup directives
#
# Stdout from programs and shell echos will go into the file scriptname.o$JOB_ID
# so we'll put a few things in there to help us see what went on
#
echo ==UNAME==
uname -n
echo ==WHO AM I and GROUPS==
id
groups
echo ==SGE_O_WORKDIR==
echo $SGE_O_WORKDIR
echo ==/VOL/SCRATCH==
ls -ltr /vol/scratch/
echo ==/LOCAL/TMP==
ls -ltr /local/tmp/
echo ==/VOL/GRID==
ls -l /vol/grid-solar/sgeusers/
#
# OK, where are we starting from and what's the environment we're in
#
echo ==RUN HOME==
pwd
ls
echo ==ENV==
env
echo ==SET==
set
#
echo == WHATS IN LOCAL/TMP ON THE MACHINE WE ARE RUNNING ON ==
ls -ltra /local/tmp | tail
#
# Now let's do something useful, but first create a directory specific to this job
# and copy something we already know exists into it
#
mkdir -p /local/tmp/fred/$JOB_ID
#
# Check we have somewhere to work now and if we don't, exit nicely.
# We could do more to try and run here but this is just a test
#
if [ -d /local/tmp/fred/$JOB_ID ]; then
cd /local/tmp/fred/$JOB_ID
else
echo "There's no job directory to change into "
echo "Here's LOCAL TMP "
ls -la /local/tmp
echo "AND LOCAL TMP FRED "
ls -la /local/tmp/fred
echo "Exiting"
exit 1
fi
#
# Now we are in the job-specific directory so
#
echo == WHATS IN LOCAL TMP FRED JOB_ID AT THE START==
ls -la
#
# Copy the input file to the local directory
#
cp /vol/grid-solar/sgeusers/fred/krb_tkt_flow.JPG .
echo ==WHATS THERE HAVING COPIED STUFF OVER AS INPUT==
ls -la
#
# Note that we need the full path to this utility, as it is not on the PATH
#
/usr/pkg/bin/convert krb_tkt_flow.JPG krb_tkt_flow.png
#
echo ==AND NOW, HAVING DONE SOMETHING USEFUL AND CREATED SOME OUTPUT==
ls -la
#
# Now we move the output to a place to pick it up from later and clean up
# (really should check that directory exists too, but this is just a test)
#
mkdir -p /vol/grid-solar/sgeusers/fred/$JOB_ID
cp krb_tkt_flow.png /vol/grid-solar/sgeusers/fred/$JOB_ID
#
# Do the cleaning up from our starting directory
#
echo ==CLEANING UP==
cd /vol/grid-solar/sgeusers/fred
pwd
rm -fr /local/tmp/fred/$JOB_ID
echo ==WHATS LEFT IN MY LOCAL TMP==
ls -ltra /local/tmp/fred | tail
#
echo "Ran through OK"
Emailed output
Fred will see an email message like this when the job starts:
Subject: Job 341642 (freds_test.sh) Started
Job 341642 (freds_test.sh) Started
User = fred
Queue = GX755
Host = lumiere.ecs.vuw.ac.nz
Start Time = 03/18/2009 16:20:54
and one like this when it ends:
Subject: Job 341642 (freds_test.sh) Complete
Job 341642 (freds_test.sh) Complete
User = fred
Queue = GX755@lumiere.ecs.vuw.ac.nz
Host = lumiere.ecs.vuw.ac.nz
Start Time = 03/18/2009 16:20:54
End Time = 03/18/2009 16:20:55
User Time = 00:00:00
System Time = 00:00:00
Wallclock Time = 00:00:01
CPU = NA
Max vmem = NA
Exit Status = 0
Specialised job submission
As previously detailed, basic jobs just need to run on some machine within the Grid.
There may, of course, be classes of job where ensuring that all tasks run on the same architecture is
desirable, an example within ECS being a desire to ensure run timings were not influenced by differences
in the model of machine that individual tasks were executed on.
Similarly, temporary resource partitioning requests that ensure students in a lab tutorial can target the machines
in the lab that has been booked for them, require a
handle through which the user can access a subset
of the full SGE Grid.
A number of the SGE utilities, including qsub, allow for a resource request list to be defined by use of the
-l resource=value
option, where the resources are maintained within the SGE Complex.
Currently, the local additions (some of which may not always be populated) to the
SGE Complex are
ecs_df_local
ecs_model
ecs_netgroup
ecs_room
so a user wishing to target only those machines which are the model
GX745 would need to add
-l ecs_model=GX745
to their
SGE command.
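When more than one resource is wanted, qsub accepts a comma-separated list after a single -l. The helper below is a sketch that builds such an argument; the name resource_request is made up, and whether a given combination of resources matches any machine depends on how the SGE Complex is populated at the time.

```sh
#!/bin/sh
# Hypothetical helper: join resource=value pairs into one "-l" argument.
resource_request() {
    list=""
    for pair in "$@"; do
        list="${list:+$list,}$pair"
    done
    if [ -n "$list" ]; then
        echo "-l $list"
    fi
}

# Example use (left unquoted on purpose so that "-l" and the list are
# passed to qsub as two separate words):
#   qsub $(resource_request ecs_model=GX745 ecs_room=some_room) your_script.sh
```

The value some_room is a placeholder; this note does not list the values that ecs_room can take.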
Update 2011: Changes to the ECS/SGE Grid
Following the introduction of this year's new hardware, some of the ECS/MSOR machines
will be running a GNU/Linux variant operating system (OS) as opposed to the previous
NetBSD OS.
The primary effect of this for the ECS Grid is that programs you've
compiled (or obtained) for the NetBSD OS are unlikely to run
should your grid job end up being scheduled to a GNU/Linux machine,
and vice versa, once you start compiling on the new platform.
However, platform neutral stuff, eg school-wide packages you run in
batch mode, or Java programs, should be less affected, if at all.
Two of the better approaches for handling the change are:
- specifically requesting the OS you want when submitting the job
- creating multiple copies of your "compiled" codes and testing for the OS so as to invoke the correct binary for the machine your grid job runs on.
The first approach can be achieved by using qsub's -l argument.
A qsub command targeting NetBSD machines might now thus be:
qsub -l arch=nbsd-i386 your_script.sh
or to target the ArchLinux machines,
qsub -l arch=lx26-x86 your_script.sh
With the second approach, you could check for the OS your job ends up
running on, in the submission script, and then branch so as to run
the appropriate binary.
You could differentiate the binaries by using directories or
filename extensions.
The SGE will actually set an environmental variable for you to test against, eg
SGE_ARCH=nbsd-i386
You should also have access to the utility that SGE uses to provide
its own view of things, on all the machines in the grid, as
/usr/pkg/sge/util/arch
an invocation of which will return
nbsd-i386 or
lx26-x86
So, for example, if you choose to use the value that SGE would return to differentiate,
you might have directories containing OS-specific binaries with the same name:
/vol/grid-solar/username/mycodes/bin/nbsd-i386/prog1
/vol/grid-solar/username/mycodes/bin/lx26-x86/prog1
or these programs, where the names differentiate them:
/vol/grid-solar/username/mycodes/bin/prog1.nbsd-i386
/vol/grid-solar/username/mycodes/bin/prog1.lx26-x86
Here is a template (in Bourne shell syntax) that will allow you
to branch your submission script; your commands would go where
the "I could run" echo statements appear:
if [ -z "$SGE_ARCH" ]; then
echo "Can't determine SGE ARCH"
else
if [ "$SGE_ARCH" = "nbsd-i386" ]; then
echo "I could run a NetBSD binary"
fi
if [ "$SGE_ARCH" = "lx26-x86" ]; then
echo "I could run a Linux x86 binary"
fi
fi
and a similar version for the C shell syntax (though it is a good idea to
write your job submission scripts in Bourne shell syntax)
if ( $?SGE_ARCH == 0 ) then
echo "Can't determine SGE ARCH"
else
if ( $SGE_ARCH == "nbsd-i386" ) then
echo "I could run a NetBSD binary"
endif
if ( $SGE_ARCH == "lx26-x86" ) then
echo "I could run a Linux x86 binary"
endif
endif
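Putting the pieces together for the directory-per-architecture layout, the following Bourne shell sketch works out the architecture (from SGE_ARCH, falling back to the arch utility at the path given above) and prints the path of the matching binary. The function names sge_arch and arch_binary are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print the SGE architecture string for this machine.
# Prefers SGE_ARCH (set by SGE inside a job) and falls back to the arch
# utility at the path given in this note.
sge_arch() {
    if [ -n "$SGE_ARCH" ]; then
        echo "$SGE_ARCH"
    elif [ -x /usr/pkg/sge/util/arch ]; then
        /usr/pkg/sge/util/arch
    else
        return 1
    fi
}

# Hypothetical helper: print the path of the binary built for this machine.
# $1 = directory holding one sub-directory per architecture, $2 = program.
arch_binary() {
    arch=$(sge_arch) || { echo "Can't determine SGE ARCH" >&2; return 1; }
    case $arch in
        nbsd-i386|lx26-x86) echo "$1/$arch/$2" ;;
        *) echo "No binary for architecture $arch" >&2; return 1 ;;
    esac
}
```

A job script could then run prog=$(arch_binary /vol/grid-solar/username/mycodes/bin prog1) && "$prog", which fails cleanly on an architecture for which nothing has been compiled.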
Compilation for the mixed environment
One potential "gotcha", brought to light by Aaron Scoble, is that, following the upgrade, there is no lab
access to a NetBSD machine that allows you to compile new codes that let you target NetBSD machines
in the ECS Grid.
You'll thus need to login to a server, eg,
greta-pt.
Similarly, if you are someone without access to the ECS lab machines, you'll not have access to
an ArchLinux machine on which to compile code targeting those grid resources.
In this case, however, you should find that binaries compiled for
i386 (so not
x86_64) on
other GNU/Linux machines you have access to may work.
If you experience other problems in this area, please get in touch with us.
Running Java programs on the ECS/SGE Grid
Running Java programs on the ECS/SGE Grid has always been complicated by the fact that
the full Java environment you would get when running interactively is set up using a
need javaXYZ
command, and you do not have access to the
need facility from within the default Grid environment.
With the introduction into the School of the
ArchLinux machines, currently (May 2011) running alongside
the older
NetBSD boxes, it seems worth providing some basic guidelines that should allow many Java
programs to operate across as many machines as possible.
With a bit of guinea-pig'ing by
Roman Klapaukh, it would appear that the following stanza should allow
one to submit Java programs to the ECS Grid, without worrying about the OS your job ends up running
against.
Note that this solution uses the mechanism outlined above for determining the OS and thus, should
you need to branch any other operations on that test, you could combine them.
Note also that you're probably not the user fred, so YOU WILL NEED TO EDIT THE SCRIPT.
#!/bin/sh
#
# Force Bourne Shell if not Sun Grid Engine default shell (you never know!)
#
#$ -S /bin/sh
#
# I know I have a directory here so I'll use it as my initial working directory
#
#$ -wd /vol/grid-solar/sgeusers/fred
#
if [ -z "$SGE_ARCH" ]; then
echo "Can't determine SGE ARCH"
else
if [ "$SGE_ARCH" = "nbsd-i386" ]; then
JAVA_HOME="/usr/pkg/java/jdk-1.6.0"
fi
if [ "$SGE_ARCH" = "lx26-x86" ]; then
JAVA_HOME="/usr/pkg/java/sun-6"
fi
fi
if [ -z "$JAVA_HOME" ]; then
echo "Can't define a JAVA_HOME"
else
export JAVA_HOME
PATH="/usr/pkg/java/bin:${JAVA_HOME}/bin:${PATH}"; export PATH
java Hello
fi
Using DRMAA with the ECS/SGE Grid
This section provides some information and typical commands required
to compile and run codes making use of DRMAA.
Background
Some simple source code examples of using DRMAA via the C and Java
bindings have been placed below:
/vol/grid-solar/sgeusers/admin/DRMAA
as an introduction to users wishing to experiment with DRMAA codes.
The C example sources originally come from a Dr Dobbs Journal article
(2004, Frederic Pariente)
http://www.ddj.com/184405932
which can seemingly still be found at:
http://www.drdobbs.com/184405932
though an archived version, less the Flash adverts, of the original
article is provided locally:
/vol/grid-solar/sgeusers/admin/DRMAA/DrDobbs-2004-Article/
The Java example sources come from the SGE source code distribution,
although a small change is required in order to have the codes work
as expected.
C Bindings
The C bindings make use of the header file
/usr/pkg/sge/include/drmaa.h
and the default SGE shared library
/usr/pkg/sge/lib/nbsd-i386/libdrmaa.so.1.0
Java Bindings
The Java bindings make use of a locally-compiled JAR-file and dynamic
library
/vol/grid-solar/sgeusers/admin/drmaa.jar
/vol/grid-solar/sgeusers/admin/libdrmaa.so
which required a rebuild from the SGE sources.
Simple, proof of concept example: C Binding
Create a directory ~/DRMAA, change into that directory and copy
the example codes provided into it:
% cp /vol/grid-solar/sgeusers/admin/DRMAA/DrDobbs-Code/* .
Compile and link the proof of concept source
Note that the linking options differ across the two operating systems in use within ECS
*NetBSD*
% gcc -c -I/usr/pkg/sge/include/ ListingOne.c
% gcc -o ListingOne \
-L/usr/pkg/sge/lib/nbsd-i386/ \
-R/usr/pkg/sge/lib/nbsd-i386/ -ldrmaa ListingOne.o
*ArchLinux*
% gcc -c -I/usr/pkg/sge/include/ ListingOne.c
% gcc -o ListingOne \
-L/usr/pkg/sge/lib/lx26-x86/ \
-Wl,-R/usr/pkg/sge/lib/lx26-x86/ -ldrmaa ListingOne.o
You did remember to
% need sgegrid
Now we can test that things work
% ./ListingOne
Successfully started the DRMAA library
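If you build on both platforms, the difference in link options can be captured in one place. This is a sketch only: the function name drmaa_link_flags is hypothetical, and it assumes that on GNU/Linux the run-time library path is handed to the linker as -Wl,-R followed by the directory.

```sh
#!/bin/sh
# Hypothetical helper: print the gcc link options for the DRMAA library on
# the given SGE architecture, following the commands shown above.
# Assumption: on GNU/Linux the run-time path is passed through to the
# linker as -Wl,-R<dir>.
drmaa_link_flags() {
    libdir="/usr/pkg/sge/lib/$1"
    case $1 in
        nbsd-i386) echo "-L$libdir -R$libdir -ldrmaa" ;;
        lx26-x86)  echo "-L$libdir -Wl,-R$libdir -ldrmaa" ;;
        *) echo "Unknown architecture: $1" >&2; return 1 ;;
    esac
}

# Example use:
#   gcc -o ListingOne ListingOne.o $(drmaa_link_flags "$(/usr/pkg/sge/util/arch)")
```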
Spawning an actual job into the SGE: C Binding
The file drdobbs-shell.c is a slightly modified version of ListingTwo.c,
which allows one to specify the script to be executed as a command line
argument and sets an SGE-native option required to tell SGE to "do the right
thing".
Compile the source
Note that the linking options differ across the two operating systems in use within ECS
*NetBSD*
% gcc -c -I/usr/pkg/sge/include/ drdobbs-shell.c
% gcc -o drdobbs-shell \
-L/usr/pkg/sge/lib/nbsd-i386/ \
-R/usr/pkg/sge/lib/nbsd-i386/ -ldrmaa \
drdobbs-shell.o
*ArchLinux*
% gcc -c -I/usr/pkg/sge/include/ drdobbs-shell.c
% gcc -o drdobbs-shell \
-L/usr/pkg/sge/lib/lx26-x86/ \
-Wl,-R/usr/pkg/sge/lib/lx26-x86/ -ldrmaa \
drdobbs-shell.o
Edit, or otherwise replace, the placeholder username
"fred" used within
the job submission script to match your username
% mv i_am_alive.sh i_am_alive.sh.orig
% sed -e "s/fred/yourusername/g" i_am_alive.sh.orig > i_am_alive.sh
% chmod u+x i_am_alive.sh
As written the DRMAA code that will spawn the job will not pay attention
to the directory you are in when you use the DRMAA executable to spawn
your job script into the SGE, so we run as follows:
% ~/DRMAA/drdobbs-shell ~/DRMAA/i_am_alive.sh
Your job "/u/students/fred/DRMAA/i_am_alive.sh"has been submitted with id 000000
%
after which you should find the log files from the running of your script
in your home directory
% ls -ltr ~
...
drwx------ 2 fred students 512 Oct 6 12:02 DRMAA
-rw-r--r-- 1 fred students 0 Oct 6 12:20 i_am_alive.sh.e000000
-rw-r--r-- 1 fred students 29753 Oct 6 12:20 i_am_alive.sh.o000000
%
Note that, as written, the directive at the top of the job submission
script which requests an initial working directory
#$ -wd /vol/grid-solar/sgeusers/fred
has been ignored and inspection of the output file confirms this:
% cat ~/i_am_alive.sh.o000000
==UNAME==
breaker.msor.vuw.ac.nz
==WHO AM I and GROUPS==
uid=0000(fred) gid=25(students) groups=25(students),1500(c302t1)
students c302t1
==SGE_O_WORKDIR==
/home/rialto1/fred/DRMAA
==/VOL/SCRATCH==
...
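Given that the -wd directive is not honoured when the job is spawned this way, one workaround is for the script itself to change into the directory it wants before doing any work. The sketch below assumes nothing beyond standard Bourne shell; the function name enter_workdir is hypothetical.

```sh
#!/bin/sh
# Sketch: do not rely on the "#$ -wd" directive being honoured; change to
# the intended directory explicitly and stop if that is not possible.
# $1 = the directory the job should run in.
enter_workdir() {
    if cd "$1" 2>/dev/null; then
        return 0
    fi
    echo "Cannot change into $1; still in $(pwd)" >&2
    return 1
}

# Example use near the top of a job script:
#   enter_workdir /vol/grid-solar/sgeusers/fred || exit 1
```

Note that this only affects where the script's own commands run. The .o and .e log files will presumably still appear wherever SGE decided the job started, as in the listing above.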
Spawning an actual job into the SGE: Java Binding
*Has been rewritten for use with the ArchLinux machines*
The modified version of the SGE-provided
Howto2.java adds the setting of
an SGE-native option required to tell SGE to "do the right thing" and
removes the
package qualification from the code.
Create a directory DRMAA, change into that directory and copy
the example codes provided into it:
% cp /vol/grid-solar/sgeusers/admin/DRMAA/SGE-Code/* .
Compile the Java source against the locally-built DRMAA JAR-file
% javac -cp /vol/grid-solar/sgeusers/admin/DRMAA/lx26-x86/drmaa.jar:. Howto2.java
As written, the DRMAA code will look for the hard-coded script it is going
to launch as the SGE job (sleeper.sh) in your home directory, so we copy
it there and ensure it is executable before running the DRMAA code.
You did remember to
% need sgegrid
as well though?
Notice that we need to tell Java to use the locally built dynamic library
as well, by defining the search path within the Java environment
% cp sleeper.sh ~
% chmod u+x ~/sleeper.sh
% java -Djava.library.path=/vol/grid-solar/sgeusers/admin/DRMAA/lx26-x86 \
-cp /vol/grid-solar/sgeusers/admin/DRMAA/lx26-x86/drmaa.jar:. Howto2
Your job has been submitted with id 000000
%
after which you should find the log files from the running of your script
in your home directory
% ls -ltr ~
...
drwx------ 2 fred students 512 Oct 6 12:02 DRMAA
-rw-r--r-- 1 fred students 0 Oct 6 12:43 Sleeper.e000000
-rw-r--r-- 1 fred students 97 Oct 6 12:43 Sleeper.o000000
% cat ~/Sleeper.o000000
Here I am. Sleeping now at: Wed Oct 6 12:43:19 NZDT 2010
Now it is: Wed Oct 6 12:43:24 NZDT 2010
%
Note that the job submission has taken notice of the script directive
requesting that our job have the name
"Sleeper"
#$ -N Sleeper
and that this is reflected in the names of the logfiles.
Caveats
Environmental variables now have an SGE_ prefix not GE_
As noticed by many but formally pointed out by Kourosh Neshatian (and, belatedly, Lloyd Parkes):
The Sun documentation and man pages for the Sun Grid Engine (SGE) mention environmental
variables of the form GE_SOME_THING.
The docs are out of date with respect to current
SGE implementations and users should be using
environmental variables of the form
SGE_SOME_THING
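A quick way to see which names your job actually receives is to list them from inside a test job. This is a minimal sketch using only standard utilities; the function name list_sge_vars is hypothetical.

```sh
#!/bin/sh
# Sketch: list the SGE_-prefixed variables visible to the current process.
# Prints nothing when no such variables are set.
list_sge_vars() {
    env | grep '^SGE_' | sort
}
```

Calling list_sge_vars from a submission script writes the list to the job's stdout file, where names such as SGE_O_WORKDIR and SGE_ARCH, both used earlier in this note, should appear.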