[glue-wg] Software publishing - TeraGrid use cases

Mon Feb 18 17:48:18 CST 2008

On a GLUE2 call last week or the week before I agreed to review the currently
proposed schema for publishing software information and comment on whether it
addresses TeraGrid requirements and use cases.

The TeraGrid has been publishing software information in information services
since the summer of 2007.  Our goals were to support the following use cases:

1) users can discover which compute resources offer a specific software package
2) users can discover the version, or versions, of available software packages
3) users can discover if a package is in the default login/execution environment
    (users or applications do not need to do anything to access this software)
4) if a package is not in the default login/execution environment,
    users can discover how a user or job can access the software package

We support the above use cases by publishing the following attributes about the
software available on each compute resource:
- TeraGrid standard software name
- TeraGrid standard software version
- In default environment (yes/no)
- How to access the software:
    - access technique (the TeraGrid currently only uses the "softenv" technique)
    - access key (a key/string understood by a technique handler)
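
As a rough illustration only (this is not our actual implementation), the
attributes above could be modeled as one record per package per resource.
The class name, field names, and sample values below are all made up for
the sketch; only the list of attributes comes from what we publish.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SoftwareRecord:
    """One software package published for one compute resource (hypothetical names)."""
    resource: str                           # compute resource offering the package
    name: str                               # TeraGrid standard software name
    version: str                            # TeraGrid standard software version
    in_default_env: bool                    # True if no setup is needed to use it
    access_technique: Optional[str] = None  # e.g. "softenv"
    access_key: Optional[str] = None        # string understood by the technique handler

    def access_request(self) -> Optional[str]:
        """Return "technique:key", or None when no setup is needed or none is published."""
        if self.in_default_env:
            return None
        if not (self.access_technique and self.access_key):
            return None
        return f"{self.access_technique}:{self.access_key}"


# Sample values are invented for illustration.
example = SoftwareRecord("site-a", "mpich-gm", "1.2.6", False, "softenv", "+mpich-gm-a")
print(example.access_request())
```

A package in the default environment carries no access request in this
sketch, which is an assumption about how use cases 3 and 4 relate.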

Some notes:

Even though we currently only use the "softenv" technique, we abstracted the
schema a little so we could eventually support other techniques (e.g.
"modules", "path", ...).

A software component may include multiple binaries, have man page directories,
have various library directories, have multiple binary directories (bin/,
sbin/, etc.), be parallel/non-parallel, threaded/non-threaded,
scripted/compiled, and have other arbitrary but relevant user information. We
chose not to design or implement a schema complex enough to communicate all
this information, but instead expect users will discover such details through
other methods. What we offer is a way for users/jobs to request that a
specific piece of software be available in their login or execution
environment; the access technique handler then sets all the appropriate
environment variables to make that software component available (using the
access technique and key listed above).

Example: User wants to use "mpich-gm" and must know ahead of time how to compile
          (mpicc, mpicxx, mpif77, mpif90, etc.) and run (mpirun)

          1) a single info service query can return the compute resources
          that offer "mpich-gm", which version(s), the access technique and
          key for each available "mpich-gm", and the endpoint of the
          execution service where each "mpich-gm" is available

          2) the user then submits jobs to the endpoint with the information:
            request software "softenv:<an-mpich-gm-key>"
            mpirun <myprogram> <arguments>

          The softenv handler will make sure the libraries, paths, and
          anything else needed by the application are configured correctly.
          The user doesn't need to know the mpirun binary path because the
          correct mpirun will be first in their PATH (and LD_LIBRARY_PATH and
          other variables will also be set).
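
The two steps in this example might look roughly like the following Python
sketch. It filters an in-memory list rather than calling a real information
service, and the record layout, endpoint URLs, softenv keys, and function
names are all invented for illustration.

```python
# Hypothetical records, shaped as an information service query might return them.
RECORDS = [
    {"resource": "site-a", "endpoint": "https://site-a.example.org/exec",
     "name": "mpich-gm", "version": "1.2.6",
     "technique": "softenv", "key": "+mpich-gm-a"},
    {"resource": "site-b", "endpoint": "https://site-b.example.org/exec",
     "name": "mpich-gm", "version": "1.2.7",
     "technique": "softenv", "key": "+mpich-gm-b"},
    {"resource": "site-b", "endpoint": "https://site-b.example.org/exec",
     "name": "hdf5", "version": "1.6.5",
     "technique": "softenv", "key": "+hdf5"},
]


def find_software(records, name):
    """Step 1: return every record that offers the named package."""
    return [r for r in records if r["name"] == name]


def job_lines(record, program, arguments):
    """Step 2: build the software request and the mpirun line for a job."""
    setup = record["technique"] + ":" + record["key"]
    request = f'request software "{setup}"'
    run = " ".join(["mpirun", program, *arguments])
    return [request, run]


for hit in find_software(RECORDS, "mpich-gm"):
    print(hit["resource"], hit["version"], hit["endpoint"])
    for line in job_lines(hit, "./myprogram", ["input.dat"]):
        print("   ", line)
```

Note that the job never names an mpirun path; it relies on the handler
having put the right one first in PATH.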

Comparing the proposed GLUE2 ApplicationEnvironment entity attributes:
    ID
    Name
    Version
    State
    License
    LifeTime
    InstalledRoot
    EnvironmentSetup
    Description

The attributes that would directly map to TeraGrid attributes, or be generated
automatically, include:
    ID
    Name
    Version
    Description

For the TeraGrid, the "EnvironmentSetup" attribute is a tuple of
"EnvironmentSetupMethod" and "EnvironmentSetupKey".  This could be represented
as a compound value inside "EnvironmentSetup" (e.g. softenv:<key>), but it
would be better if the schema had the attributes separately. We propose
"EnvironmentSetupMethod" and "EnvironmentSetupKey".

We currently have no need for License, LifeTime, or InstalledRoot, so we would
suggest that these attributes all be OPTIONAL.
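
To make the mapping concrete, here is a small Python sketch of how a TeraGrid
record might be turned into ApplicationEnvironment attributes, with the two
proposed setup attributes kept separate and the unneeded ones omitted. The
function names and the ID scheme are assumptions for illustration, and State
is not addressed here. The second function shows the extra parsing convention
a consumer would need if the compound "method:key" form were used instead.

```python
def to_application_environment(resource, name, version, method, key, description=""):
    """Build a GLUE2-style attribute dict from TeraGrid-style values (sketch)."""
    return {
        "ID": f"{resource}/{name}/{version}",  # hypothetical auto-generated ID
        "Name": name,
        "Version": version,
        "Description": description,
        "EnvironmentSetupMethod": method,      # proposed attribute
        "EnvironmentSetupKey": key,            # proposed attribute
        # License, LifeTime and InstalledRoot are omitted (suggested OPTIONAL).
    }


def split_environment_setup(value):
    """Split a compound "method:key" value at the first colon only."""
    method, sep, key = value.partition(":")
    if not sep or not method or not key:
        raise ValueError(f"not a method:key value: {value!r}")
    return method, key


env = to_application_environment("site-a", "mpich-gm", "1.2.6", "softenv", "+mpich-gm-a")
print(env["EnvironmentSetupMethod"], env["EnvironmentSetupKey"])
print(split_environment_setup("softenv:+mpich-gm-a"))
```

With separate attributes no such splitting rule is needed, which matters for
keys that may themselves contain a colon (a "path" technique, for instance).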

Lastly, the TeraGrid's "in default environment" attribute doesn't map to any
proposed GLUE2 attributes. It could be added as an OPTIONAL attribute, or the
TeraGrid could add it as a local extension in our implementation of GLUE2.

Regards,

JP