[ogsa-wg] GT 4.0 GRAM docs for input to BES discussions

Thu Mar 10 01:06:29 CST 2005

Here is a compact summary of the GT 4.0 WS-GRAM interface and links to
further documentation.  Please start by reading the documentation
unless you are already an expert on GRAM, XSD, WSDL, and WSRF concepts.

---------------------------------------------------------

GT 4.0 WS-GRAM documentation

Note: these documents are in draft form.

1) GRAM Key Concepts
http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/execution/key/

2) WS-GRAM Approach
http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/execution/key/WS_GRAM_Approach.html

3) Semantics and syntax of WSDL
http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/execution/wsgram/WS_GRAM_Public_Interfaces.html#wsdl

4) Job Description Language
http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/execution/wsgram/schemas/mjs_job_description.html

5) Links to more WS-GRAM docs than you can shake a stick at
http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/execution/wsgram/

Because this is already a lot of documentation, the following is a
terse overview of the WSDL rendering of the core WS-GRAM job
discovery, submission, monitoring, and cancellation interface.

---------------------------------------------------------

PORTTYPE ManagedJobFactoryPortType

The Managed Job Factory Resource (MJFR) represents one localized job
scheduler or compute element.  (In GT 4.0, there is a separate MJFR
for each deployed local scheduler adapter on a host.)

It is a WSRF-style resource with a resource properties document to
represent the overall status and capabilities of the local compute
element:

<managedJobFactoryResourceProperties>
   <localResourceManager>xsd:string</localResourceManager>
   <globusLocation>xsd:string</globusLocation>
   <hostCPUType>xsd:string</hostCPUType>?
   <hostManufacturer>xsd:string</hostManufacturer>?
   <hostOSName>xsd:string</hostOSName>?
   <hostOSVersion>xsd:string</hostOSVersion>?
   <scratchBaseDirectory>xsd:string</scratchBaseDirectory>?
   <delegationFactoryEndpoint>wsa:EndpointReferenceType</delegationFactoryEndpoint>
   <stagingDelegationFactoryEndpoint>wsa:EndpointReferenceType</stagingDelegationFactoryEndpoint>?
   <condorArchitecture>xsd:string</condorArchitecture>?
   <condorOS>xsd:string</condorOS>?
   <gluece:GLUECE>
      <gluece:Cluster Name=xsd:string UniqueID=xsd:string InformationServiceURL=xsd:anyURI>
         <SubCluster/>*
         xsd:any##other*
      </gluece:Cluster>*
      <ComputingElement>
         <Info/>?
         <State/>?
         <Policy/>?
         <Job/>*
         <AccessControlBase/>?
         xsd:any##other*
      </ComputingElement>*
      xsd:any##other*
   </gluece:GLUECE>?
   <gluece:GLUECESummary/>?
   <ServiceMetaDataInfo>
      <startTime>xsd:dateTime</startTime>
      <version>xsd:string</version>
   </ServiceMetaDataInfo>
</managedJobFactoryResourceProperties>

Most of these properties allow inspection of the underlying compute
platform, some using an XSD rendering of the GLUE schema (please see
the GLUE schema documentation for more information):

   gluece:GLUECE, gluece:GLUECESummary

and others using ad-hoc GRAM properties:

   hostCPUType, hostManufacturer, hostOSName, hostOSVersion

while a few provide introspection on the GRAM deployment itself:

   localResourceManager, globusLocation, scratchBaseDirectory,
   ServiceMetaDataInfo

The two EPRs are used by a client to discover where to delegate
credentials that will be referenced by future job submissions:

   delegationFactoryEndpoint, stagingDelegationFactoryEndpoint.
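
To make the resource properties document concrete, here is a minimal
Python sketch (standard library only) that pulls the scheduler name, OS
name, and the two delegation endpoint addresses out of a factory
properties document.  The sample document, its values, and the
urn:example:wsa namespace URI are made up for illustration; tags are
matched by local name so the sketch does not depend on the real
namespace URIs.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop the '{namespace-uri}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]

def first_child(parent, name):
    """First direct child whose local name matches, or None."""
    for child in parent:
        if local_name(child.tag) == name:
            return child
    return None

def child_text(parent, name):
    """Stripped text of the first matching child, or None if it is absent."""
    child = first_child(parent, name)
    return None if child is None else (child.text or "").strip()

def summarize_factory(rp_xml):
    """Pick out a few MJFR properties; absent optional ones come back as None."""
    root = ET.fromstring(rp_xml)
    summary = {
        "scheduler": child_text(root, "localResourceManager"),
        "os": child_text(root, "hostOSName"),
    }
    # Each delegation endpoint is an EPR; only its Address is extracted here.
    for key in ("delegationFactoryEndpoint", "stagingDelegationFactoryEndpoint"):
        epr = first_child(root, key)
        summary[key] = None if epr is None else child_text(epr, "Address")
    return summary

# Hypothetical sample document: values and the wsa namespace URI are made up.
SAMPLE_RP = """<managedJobFactoryResourceProperties xmlns:wsa="urn:example:wsa">
  <localResourceManager>PBS</localResourceManager>
  <globusLocation>/opt/globus</globusLocation>
  <hostOSName>Linux</hostOSName>
  <delegationFactoryEndpoint>
    <wsa:Address>https://host.example.org/delegation</wsa:Address>
  </delegationFactoryEndpoint>
</managedJobFactoryResourceProperties>"""

summary = summarize_factory(SAMPLE_RP)
```

Optional properties that the factory does not publish, such as
stagingDelegationFactoryEndpoint in the sample, simply come back as None.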

----------

OPERATION job:createManagedJob

Request creation of a Managed Executable Job Resource whose EPR will
be returned in the response.

INPUT

message: createManagedJobInputMessage has one part:

   <createManagedJob>
      <InitialTerminationTime>xsd:dateTime</InitialTerminationTime>?
      <JobID>wsa:AttributedURI</JobID>?
      <wsnt:Subscribe></wsnt:Subscribe>?
      <desc:job> ... </desc:job>
   </createManagedJob>

The optional JobID element is used to request idempotent invocation
semantics in a binding-independent manner.  The optional
wsnt:Subscribe element is used to request automatic subscription to
the newly created Managed Job.

This call can also create a Managed Multi-Job Resource, i.e. a
co-allocated job spread across multiple WS-GRAM hosts, because the job
element is actually in an XSD choice with a multijob element.
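
A minimal sketch of assembling the createManagedJob input with Python's
ElementTree, assuming placeholder namespace URIs (urn:example:gram,
urn:example:job-description) in place of the real ones.  The point it
illustrates is the JobID: the client picks the identifier once and
reuses it on every retry, which is what lets the idempotent invocation
semantics described above apply.  Using a urn:uuid: value is our
choice here, not a documented GRAM requirement, and wsnt:Subscribe is
omitted.

```python
import uuid
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; the real WS-GRAM and job description
# namespaces are not reproduced here.
GRAM_NS = "urn:example:gram"
DESC_NS = "urn:example:job-description"

def new_job_id():
    """A fresh, globally unique URI for JobID (the urn:uuid form is our choice)."""
    return "urn:uuid:" + str(uuid.uuid4())

def build_create_request(job, job_id=None, initial_termination_time=None):
    """Assemble <createManagedJob> with children in the schema order.

    The optional wsnt:Subscribe element is left out of this sketch.
    """
    req = ET.Element(f"{{{GRAM_NS}}}createManagedJob")
    if initial_termination_time is not None:
        ET.SubElement(req, f"{{{GRAM_NS}}}InitialTerminationTime").text = initial_termination_time
    if job_id is not None:
        ET.SubElement(req, f"{{{GRAM_NS}}}JobID").text = job_id
    req.append(job)
    return req

job = ET.Element(f"{{{DESC_NS}}}job")
ET.SubElement(job, f"{{{DESC_NS}}}executable").text = "/bin/date"

# Generate the JobID once, before the first attempt, and reuse it on every
# retry so a resubmission carries the same identifier as the original.
job_id = new_job_id()
first_attempt = build_create_request(job, job_id=job_id)
retry_attempt = build_create_request(job, job_id=job_id)
```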

OUTPUT

message: createManagedJobOutputMessage has one part:

   <createManagedJobResponse>
      <NewTerminationTime>xsd:dateTime</NewTerminationTime>
      <CurrentTime>xsd:dateTime</CurrentTime>
      <managedJobEndpoint>wsa:EndpointReferenceType</managedJobEndpoint>
      <subscriptionEndpoint>wsa:EndpointReferenceType</subscriptionEndpoint>?
   </createManagedJobResponse>

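The optional subscriptionEndpoint is returned if the request included a
wsnt:Subscribe element, i.e. if automatic subscription to the newly
created Managed Job was requested.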
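
A corresponding sketch for reading the response, again with a made-up
sample (the address, namespace URI, and ResourceID reference property
are hypothetical).  It keeps the whole managedJobEndpoint EPR rather
than just its address, since in a WSRF setting the reference
properties inside the EPR typically identify the specific job resource.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop the '{namespace-uri}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]

def parse_create_response(xml_text):
    """Return (job EPR element, subscription EPR element or None, NewTerminationTime)."""
    root = ET.fromstring(xml_text)
    children = {local_name(child.tag): child for child in root}
    job_epr = children["managedJobEndpoint"]          # always present
    sub_epr = children.get("subscriptionEndpoint")    # only if wsnt:Subscribe was sent
    new_term = children["NewTerminationTime"].text
    return job_epr, sub_epr, new_term

# Hypothetical response to a request that did not ask for a subscription.
SAMPLE_RESPONSE = """<createManagedJobResponse xmlns:wsa="urn:example:wsa">
  <NewTerminationTime>2005-03-11T07:06:29Z</NewTerminationTime>
  <CurrentTime>2005-03-10T07:06:29Z</CurrentTime>
  <managedJobEndpoint>
    <wsa:Address>https://host.example.org/ManagedExecutableJobService</wsa:Address>
    <wsa:ReferenceProperties>
      <ResourceID>job-1234</ResourceID>
    </wsa:ReferenceProperties>
  </managedJobEndpoint>
</createManagedJobResponse>"""

job_epr, sub_epr, new_term = parse_create_response(SAMPLE_RESPONSE)
```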

-----------

Other operations, composed from the WSRF service environment, are:

[From WS-ResourceProperties] -- get access to factory status information
      GetResourceProperty
      QueryResourceProperties
      GetMultipleResourceProperties

---------------------------------------------------------

PORTTYPE ManagedExecutableJobPortType

A Managed Executable Job Resource (MEJR) represents one job that has
already been submitted by a client.

It is a WSRF-style resource with a resource properties document to
represent the status of the job:

<managedExecutableJobResourceProperties>
   <stdoutURL>xsd:anyURI</stdoutURL>?
   <stderrURL>xsd:anyURI</stderrURL>?
   <credentialPath>xsd:string</credentialPath>?
   <exitCode>xsd:int</exitCode>?

   <serviceLevelAgreement>
      <desc:job> ... </desc:job>
   </serviceLevelAgreement>
   <Capacity>xsd:int</Capacity>
   <userSubject>xsd:string</userSubject>
   <fault/>

   <TopicExpressionDialects>xsd:anyURI</TopicExpressionDialects>
   <Topic Dialect=xsd:anyURI>
      xsd:any?
   </Topic>+

   <TerminationTime>xsd:dateTime</TerminationTime>
   <localUserId>xsd:string</localUserId>
   <CurrentTime>xsd:dateTime</CurrentTime>
   <holding>xsd:boolean</holding>
   <RegistrantData>xsd:base64Binary</RegistrantData>
   <RendezvousCompleted>xsd:boolean</RendezvousCompleted>
   <FixedTopicSet>xsd:boolean</FixedTopicSet>
   <state>Unsubmitted|StageIn|Pending|Active|Suspended|StageOut|Cleanup|Done|Failed</state>
</managedExecutableJobResourceProperties>

These properties relate to job output file management:

  stdoutURL, stderrURL

delegated credential management:

  credentialPath

parallel task rendezvous (for MPICH-G2):

  Capacity, RegistrantData, RendezvousCompleted

job status:

  exitCode, fault, holding, state

job introspection:

  credentialPath, serviceLevelAgreement, localUserId

for WSRF introspection:

  TopicExpressionDialects, Topic, FixedTopicSet,
  TerminationTime, CurrentTime.
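
For example, a client watching a job might reduce the job status
properties to a small summary like the one below.  This is a sketch,
assuming that Done and Failed are the only final states; the sample
document is heavily trimmed and hypothetical, and a real client would
also look at the fault property when a job fails.

```python
import xml.etree.ElementTree as ET

STATES = {"Unsubmitted", "StageIn", "Pending", "Active", "Suspended",
          "StageOut", "Cleanup", "Done", "Failed"}
# Assumption: Done and Failed are the only final states.
FINAL_STATES = {"Done", "Failed"}

def local_name(tag):
    """Drop the '{namespace-uri}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]

def job_status(rp_xml):
    """Summarize the job status properties of an MEJR properties document."""
    root = ET.fromstring(rp_xml)
    props = {local_name(c.tag): (c.text or "").strip() for c in root}
    state = props["state"]
    if state not in STATES:
        raise ValueError(f"unexpected state {state!r}")
    exit_code = props.get("exitCode")
    return {
        "state": state,
        "final": state in FINAL_STATES,
        # xsd:boolean allows both "true"/"false" and "1"/"0".
        "holding": props.get("holding") in ("true", "1"),
        "exit_code": int(exit_code) if exit_code else None,
    }

# Hypothetical, heavily trimmed properties document.
DONE_DOC = """<managedExecutableJobResourceProperties>
  <exitCode>0</exitCode>
  <holding>false</holding>
  <state>Done</state>
</managedExecutableJobResourceProperties>"""
```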

----------

OPERATION exec:release

Releases the job from the hold state.  The hold state is an optional
behavior, selected in the job description, that prevents
post-execution file deletions (clean-up) from occurring while a remote
client is still attempting to access the files.  The release operation
permits the normal clean-up to occur.

INPUT

message: releaseInputMessage has one (empty) part:

   <release/>

OUTPUT

message: releaseOutputMessage has one (empty) part:

   <releaseResponse/>
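
The hold/release interplay can be sketched from the client side as
follows.  The polled (state, holding) pairs are invented, the
urn:example:exec namespace is a placeholder, and plan_client_actions
is a hypothetical helper, not part of any GRAM client API: it simply
fetches output files while the job is held and then releases it.

```python
import xml.etree.ElementTree as ET

EXEC_NS = "urn:example:exec"  # placeholder for the real exec: namespace URI

def release_request():
    """The release input message body: a single empty <release/> element."""
    return ET.Element(f"{{{EXEC_NS}}}release")

def plan_client_actions(snapshots):
    """Given polled (state, holding) pairs, list what a client would do.

    The client fetches its output files while the job is held, then
    releases it so the normal clean-up can run.
    """
    actions = []
    released = False
    for state, holding in snapshots:
        if holding and not released:
            actions += ["fetch outputs", "release"]
            released = True
        if state in ("Done", "Failed"):
            actions.append("stop polling")
            break
    return actions

# Made-up poll results; which state a held job reports depends on the
# holdState chosen in the job description.
polls = [("Pending", False), ("Active", False), ("StageOut", True),
         ("Cleanup", False), ("Done", False)]
```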

-----------

The job resource also acts as a rendezvous, which provides one more operation:

[From GT4 rendezvous manager type] -- support for bootstrapping MPICH-G2 etc.
      register 
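
Since the rendezvous semantics are only implied by the property names
(Capacity, RegistrantData, RendezvousCompleted), the following is a
toy, in-memory Python model rather than a description of the GT4
implementation.  It assumes each task registers its bytes once, that
the rendezvous completes when Capacity registrations have arrived, and
that RegistrantData then exposes the concatenated registrations as
base64.

```python
import base64

class ToyRendezvous:
    """Hypothetical in-memory model of the rendezvous properties.

    Assumptions (not taken from the GT4 sources): each task calls register()
    once with its own bytes, the rendezvous completes when Capacity
    registrations have arrived, and RegistrantData is then the registrations
    concatenated in arrival order.
    """

    def __init__(self, capacity):
        self.capacity = capacity  # the Capacity property
        self._registrations = []

    def register(self, data: bytes):
        if self.rendezvous_completed:
            raise RuntimeError("rendezvous already completed")
        self._registrations.append(data)

    @property
    def rendezvous_completed(self):  # the RendezvousCompleted property
        return len(self._registrations) >= self.capacity

    @property
    def registrant_data(self):  # the RegistrantData property (xsd:base64Binary)
        if not self.rendezvous_completed:
            return None
        return base64.b64encode(b"".join(self._registrations)).decode("ascii")

r = ToyRendezvous(capacity=2)
r.register(b"rank0:host-a;")
r.register(b"rank1:host-b;")
```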

Other operations, composed from the WSRF service environment, are:

[From WS-ResourceLifetime] -- schedule termination of a job
      SetTerminationTime
      Destroy

[From WS-ResourceProperties] -- get access to job status information
      GetResourceProperty
      QueryResourceProperties
      GetMultipleResourceProperties

[From WS-BaseNotification] -- subscribe for job status notifications
      Subscribe
      GetCurrentMessage
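
One common use of the lifetime operations is lease renewal.  The
sketch below, a hypothetical helper rather than a GT4 client API,
decides whether to call SetTerminationTime by comparing the job's
TerminationTime with its CurrentTime property, so the decision is based
on the service's clock instead of the client's.  The one-hour lease
and ten-minute margin are arbitrary illustrative values.

```python
from datetime import datetime, timedelta, timezone

def parse_xsd_datetime(text):
    """Parse an xsd:dateTime such as 2005-03-10T07:06:29Z (timezone required here)."""
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11 on.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))

def renewal_time(current_time, termination_time,
                 lease=timedelta(hours=1), margin=timedelta(minutes=10)):
    """Return a new TerminationTime to request with SetTerminationTime, or None.

    Both inputs are the job's CurrentTime and TerminationTime properties, so
    the comparison uses the service's clock rather than the client's.
    """
    now = parse_xsd_datetime(current_time)
    expires = parse_xsd_datetime(termination_time)
    if expires - now > margin:
        return None  # plenty of time left; no renewal yet
    new_time = (now + lease).astimezone(timezone.utc)
    return new_time.strftime("%Y-%m-%dT%H:%M:%SZ")
```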

---------------------------------------------------------

Job description document syntax for use in creating a Managed
Executable Job Resource:

<job>

   <factoryEndpoint>wsa:EndpointReferenceType</factoryEndpoint>?

   <jobCredentialEndpoint>wsa:EndpointReferenceType</jobCredentialEndpoint>?

   <stagingCredentialEndpoint>wsa:EndpointReferenceType</stagingCredentialEndpoint>?

   <localUserId> ... </localUserId>?

   <holdState> ... </holdState>?

   <executable>xsd:string</executable>?

   <directory>xsd:string</directory>?

   <argument>xsd:string</argument>*

   <environment>
      <name>xsd:string</name>
      <value>xsd:string</value>
   </environment>*

   <stdin>xsd:string</stdin>?

   <stdout>xsd:string</stdout>?

   <stderr>xsd:string</stderr>?

   <count>xsd:positiveInteger</count>?

   <libraryPath>xsd:string</libraryPath>*

   <hostCount>xsd:positiveInteger</hostCount>?

   <project>xsd:string</project>?

   <queue>xsd:string</queue>?

   <maxTime>xsd:long</maxTime>?

   <maxWallTime>xsd:long</maxWallTime>?

   <maxCpuTime>xsd:long</maxCpuTime>?

   <maxMemory>xsd:nonNegativeInteger</maxMemory>?

   <minMemory>xsd:nonNegativeInteger</minMemory>?

   <jobType>mpi|single|multiple|condor</jobType>?

   <fileStageIn>rft:TransferRequestType</fileStageIn>?

   <fileStageOut>rft:TransferRequestType</fileStageOut>?

   <fileCleanUp>rft:DeleteRequestType</fileCleanUp>?

   <extensions>xsd:any##other</extensions>?

</job>

An extended form of this syntax consists of an array of the above
descriptions to define a "multi-job".
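
A minimal Python sketch of producing such a description, assuming
placeholder namespace URIs and that the children must appear in the
order listed above (as they would in an xsd:sequence).  The multi-job
part is more speculative: the multiJob element name and wrapping each
subjob's job element directly inside it are assumptions, with a
per-subjob factoryEndpoint selecting the host for each part.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, not the real ones.
DESC_NS = "urn:example:job-description"
WSA_NS = "urn:example:wsa"

def q(name):
    """Qualify a job description element name with the placeholder namespace."""
    return f"{{{DESC_NS}}}{name}"

def build_job(executable, arguments=(), environment=None, directory=None,
              stdout=None, stderr=None, count=None, job_type=None):
    """Build a <job> element, emitting children in the schema order."""
    job = ET.Element(q("job"))
    ET.SubElement(job, q("executable")).text = executable
    if directory is not None:
        ET.SubElement(job, q("directory")).text = directory
    for arg in arguments:
        ET.SubElement(job, q("argument")).text = arg
    for name, value in (environment or {}).items():
        env = ET.SubElement(job, q("environment"))
        ET.SubElement(env, q("name")).text = name
        ET.SubElement(env, q("value")).text = value
    if stdout is not None:
        ET.SubElement(job, q("stdout")).text = stdout
    if stderr is not None:
        ET.SubElement(job, q("stderr")).text = stderr
    if count is not None:
        ET.SubElement(job, q("count")).text = str(count)
    if job_type is not None:
        if job_type not in ("mpi", "single", "multiple", "condor"):
            raise ValueError(f"unknown jobType {job_type!r}")
        ET.SubElement(job, q("jobType")).text = job_type
    return job

def build_multi_job(parts):
    """Hypothetical multi-job wrapper around (factory address, job) pairs.

    The 'multiJob' element name and this nesting are assumptions. Each
    subjob gets its own factoryEndpoint so it can target a different host;
    the job elements passed in are modified in place.
    """
    multi = ET.Element(q("multiJob"))
    for factory_address, job in parts:
        endpoint = ET.Element(q("factoryEndpoint"))
        ET.SubElement(endpoint, f"{{{WSA_NS}}}Address").text = factory_address
        job.insert(0, endpoint)  # factoryEndpoint is the first child
        multi.append(job)
    return multi

single = build_job("/bin/echo", arguments=["hello", "world"],
                   environment={"LANG": "C"}, count=2, job_type="multiple")
```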

-- 
Karl Czajkowski
karlcz at univa.com