[glue-wg] Software publishing - TeraGrid use cases
JP Navarro
navarro at mcs.anl.gov
Mon Feb 18 17:48:18 CST 2008

On a GLUE2 call last week or the week before I agreed to review the
currently proposed schema for publishing software information and
comment on whether it addresses TeraGrid requirements and use cases.

The TeraGrid has been publishing software information in information
services since the summer of 2007. Our goals were to support the
following use cases:
  1) users can discover which compute resources offer a specific
     software package
  2) users can discover the version, or versions, of available software
     packages
  3) users can discover if a package is in the default login/execution
     environment (users or applications do not need to do anything to
     access this software)
  4) if a package is not in the default login/execution environment,
     users can discover how a user or job can access the software
     package

We support the above use cases by publishing the following attributes
about the software available on each compute resource:
  - TeraGrid standard software name
  - TeraGrid standard software version
  - In default environment (yes/no)
  - How to access the software:
      - access technique (the TeraGrid currently only uses the
        "softenv" technique)
      - access key (a key/string understood by a technique handler)
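
The attributes above could be held in a record like the following. This
is only a minimal sketch in Python, not the TeraGrid implementation: the
class name, the field names, the "resource" field, and the example
values (version "1.0", the key string) are all placeholders chosen for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareRecord:
    """One software package as published for one compute resource."""
    resource: str          # hypothetical: identifier of the compute resource
    name: str              # TeraGrid standard software name
    version: str           # TeraGrid standard software version
    in_default_env: bool   # True if no user action is needed to use it
    access_technique: str  # e.g. "softenv"
    access_key: str        # key/string understood by the technique handler


# Placeholder values; not real TeraGrid data.
example = SoftwareRecord(
    resource="site-a",
    name="mpich-gm",
    version="1.0",
    in_default_env=False,
    access_technique="softenv",
    access_key="an-mpich-gm-key",
)
```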

Some notes:

Even though we currently only use the "softenv" technique, we
abstracted the schema a little so we could eventually support other
techniques (e.g. "modules", "path", ...).

A software component may include multiple binaries, have man page
directories, have various library directories, have multiple binary
directories (bin/, sbin/, etc.), be parallel/non-parallel,
threaded/non-threaded, scripted/compiled, and have other arbitrary but
relevant user information. We chose not to design or implement a schema
complex enough to communicate all this information, but instead expect
users will discover such details through other methods. What we offer
is a way for users/jobs to request that a specific piece of software be
available in their login or execution environment; the handler for the
access technique then sets all the appropriate environment variables to
make that software component available (using the access technique and
key listed above).

Example: a user wants to use "mpich-gm" and must know ahead of time how
to compile (mpicc, mpicxx, mpif77, mpif90, etc.) and run (mpirun).
  1) A single info service query can return the compute resources that
     offer "mpich-gm", which version(s), the access technique and key
     for each available "mpich-gm", and the endpoint of the execution
     service where each "mpich-gm" is available.
  2) The user then submits jobs to the endpoint with the information:
        request software "softenv:<an-mpich-gm-key>"
        mpirun <myprogram> <arguments>
     The softenv handler will make sure the libraries, paths, and
     anything else needed by the application are configured correctly.
     The user doesn't need to know the mpirun binary path because the
     correct mpirun will be first in their PATH (and LD_LIBRARY_PATH
     and other variables will be set also).
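
The two steps of this example can be sketched as follows. This is an
illustration in Python under stated assumptions, not the actual info
service or job submission interface: the in-memory CATALOG stands in for
the info service, and the Offer type, the endpoints, versions, and keys
are made-up placeholders. Only the "request software" line format is
taken from the example above.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    resource: str   # compute resource offering the software
    endpoint: str   # execution service endpoint (placeholder URLs below)
    name: str
    version: str
    technique: str  # access technique, e.g. "softenv"
    key: str        # access key for that technique


# Stand-in for the info service contents; all values are invented.
CATALOG = [
    Offer("site-a", "https://site-a.example.org/exec",
          "mpich-gm", "1.0", "softenv", "mpich-gm-key-a"),
    Offer("site-b", "https://site-b.example.org/exec",
          "mpich-gm", "2.0", "softenv", "mpich-gm-key-b"),
    Offer("site-b", "https://site-b.example.org/exec",
          "fftw", "3.0", "softenv", "fftw-key-b"),
]


def find_offers(catalog, name):
    """Step 1: one query returns every resource offering the package."""
    return [offer for offer in catalog if offer.name == name]


def job_request(offer, command):
    """Step 2: the lines a user would submit to the offer's endpoint."""
    return [f'request software "{offer.technique}:{offer.key}"', command]


offers = find_offers(CATALOG, "mpich-gm")
lines = job_request(offers[0], "mpirun ./myprogram arg1")
```

The user picks one of the returned offers and sends the request lines to
that offer's endpoint; nothing in the job names an mpirun path.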

For comparison, the proposed GLUE2 ApplicationEnvironment entity has
the following attributes:
    ID
    Name
    Version
    State
    License
    LifeTime
    InstalledRoot
    EnvironmentSetup
    Description

The attributes that would map directly to TeraGrid attributes, or be
generated automatically, include:
    ID
    Name
    Version
    Description

For the TeraGrid the "EnvironmentSetup" attribute is really a tuple: a
setup method and a setup key. This could be represented as a compound
value inside "EnvironmentSetup" (e.g. softenv:<key>), but it would be
better if the schema had the two attributes separately. We propose
"EnvironmentSetupMethod" and "EnvironmentSetupKey".

We currently have no need for License, LifeTime, or InstalledRoot, so
would suggest that these attributes all be OPTIONAL.

Lastly, the TeraGrid's "in default environment" attribute doesn't map
to any proposed GLUE2 attributes. It could be added as an OPTIONAL
attribute, or the TeraGrid could add it as a local extension in our
implementation of GLUE2.
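
To make the proposed mapping concrete, here is a rough sketch in Python.
It is not part of any GLUE2 rendering: the ID scheme, the generated
Description text, and the "Extension" / "InDefaultEnvironment" names are
hypothetical, and the input values are placeholders. It also shows the
parsing convention that a compound "method:key" value would force on
every consumer, which separate attributes would avoid.

```python
def split_environment_setup(value):
    """Split a compound 'method:key' value at the first colon."""
    method, sep, key = value.partition(":")
    if not (sep and method and key):
        raise ValueError(f"expected 'method:key', got {value!r}")
    return method, key


def to_application_environment(resource, name, version, setup,
                               in_default_env):
    """Build a GLUE2-style attribute map from TeraGrid attributes."""
    method, key = split_environment_setup(setup)
    return {
        "ID": f"{resource}/{name}/{version}",   # hypothetical ID scheme
        "Name": name,
        "Version": version,
        "Description": f"{name} {version} on {resource}",  # generated
        "EnvironmentSetupMethod": method,       # proposed attribute
        "EnvironmentSetupKey": key,             # proposed attribute
        # License, LifeTime, InstalledRoot are left out (OPTIONAL).
        # Hypothetical local extension for "in default environment":
        "Extension": {"InDefaultEnvironment": in_default_env},
    }


entry = to_application_environment(
    "site-a", "mpich-gm", "1.0", "softenv:an-mpich-gm-key", False)
```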
Regards,
JP