[jsdl-wg] DataStaging concerns

Donal K. Fellows donal.k.fellows at manchester.ac.uk
Fri May 20 05:26:39 CDT 2005


Peter G.Lane wrote:
> Forgive me if I'm reiterating on a topic.  I've only been reading up on 
> JSDL since yesterday.  I have a few concerns about the DataStaging 
> section.  Primarily, I'm wondering if it really makes sense to have it 
> as part of the core schema.  I think it would be better to have 
> extensions like POSIXApplication for more specific DRM configurations.  

We suspect that there's going to be quite a bit of extension in that
area, and welcome feedback for post-1.0 (we're very very unlikely to
change anything for JSDL 1.0 now; it doesn't do everything, but it does
a useful fraction and too many people need something - anything! - now).

> 1) There's still controversy over whether staging should or should not 
> be integrated into a DRM.  As far as I can tell, for example, the BES 
> doesn't have any plans to implement staging.  DRMAA makes this 
> optional.  If BES ends up using JSDL, wouldn't this be a violation of 
> the spec which requires each element to be supported in some way?

"Supported" has a very particular meaning within a JSDL context, and the
effective meaning could include a definite response "I don't know how to
do data staging, man!" We discussed data staging quite a few times
(around a year ago IIRC) and what we came up with is a minimum to allow
processing of jobs on a number of different systems including domains
like cross-cluster deployment where everything has to be shipped in first.

If we'd had a proper workflow language too, we'd have done data staging
differently. But there wasn't something suitable already existing (BPEL
does something else) and if we'd had to develop our own, we'd still
be arguing about it now.

> 3) I don't particularly like that the DataStaging sections include an 
> option to remove the file at the end of the job.  If I'm staging out 
> data then this doesn't make a whole lot of sense.  I'd much prefer a 
> separate section which explicitly lists all the files that are to be 
> removed from the submission machine after the job has been completed.  
> This would also cover the case of data that is created rather than 
> staged in but still needs to be removed after job completion.

The "remove data at the end of the job" option applies after the data
has been staged out (it'd be a bit silly otherwise). You can also list a
file in the data staging section without having it staged in or out, so
that it is just deleted at end-of-job.
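
As a rough sketch of that last case (modulo namespaces again, and
assuming the deletion option is spelled jsdl:DeleteOnTermination, so
check the name against the draft you are reading; the file name is
made up):

   <jsdl:DataStaging>
      <!-- Hypothetical scratch file created by the job itself -->
      <jsdl:FileName>scratch.tmp</jsdl:FileName>
      <jsdl:CreationFlag>jsdl:overwrite</jsdl:CreationFlag>
      <!-- No Source and no Target, so nothing is staged in or out -->
      <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
   </jsdl:DataStaging>

That should cover files that are created by the job rather than staged
in, without needing a separate "files to remove" section.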

> 4) Based on the current GRAM incarnation, it would be nice to let RFT's 
> transfer request description extend a base staging schema and then use 
> that in the JSDL document rather than adding a bunch of extensions to 
> DataStaging.  This is similar to how I'd want to go about using 
> POSIXApplication.

I don't fully grasp what you mean here. The following is legal (modulo
namespaces) according to draft-18:

   <jsdl:DataStaging>
      <jsdl:FileName>example</jsdl:FileName>
      <!-- I think we agreed to default this next element -->
      <jsdl:CreationFlag>jsdl:overwrite</jsdl:CreationFlag>
      <jsdl:Source>
        <!-- I've no idea what your RFT schema might look like -->
        <rft:SynchFileWithSomewhere>...</rft:SynchFileWithSomewhere>
      </jsdl:Source>
   </jsdl:DataStaging>

(Hmm, the non-normative examples in the data-staging section of d18 seem
out of step. Bother.)

Given that the above is legal, what's the problem?

Donal.




