[jsdl-wg] Question on file stage-in (fwd)

Andrew Grimshaw grimshaw at cs.virginia.edu
Thu Jun 9 08:27:42 CDT 2005


Donal,
See my embedded comments <AG> ... </AG>

-----Original Message-----
From: Donal K. Fellows [mailto:donal.k.fellows at manchester.ac.uk] 
Sent: Wednesday, June 08, 2005 11:04 AM
To: Andrew Grimshaw
Cc: JSDL WG
Subject: Re: [jsdl-wg] Question on file stage-in (fwd)

Andrew Grimshaw wrote:
> In the data staging elements there is a creation flag that indicates
> whether to over-write or append.
> 
> Often, one only wants to overwrite if the target and the source are
> different, for example if I am using a large data set that changes
> infrequently or several jobs on the same resource will use the "same"
> input file.
> 
> Is there a way to do conditional stage in?

My understanding is that the implementation is free to use some
mechanism outside the JSDL scope to optimize downloads. From the JSDL
perspective (not that this is the only valid perspective, of course) it
doesn't matter all that much. Looking at the latest version of the spec
though, I see that the value "jsdl:dontOverwrite" could be interpreted
to mean "transfer only if not pre-existing". Of course, at that point
you then have to worry about whether the two ends refer to the same
data, but if you're piling the data into some job-specific dir that's
IMO not a huge issue.

<AG>
Exactly, if I specify "don't overwrite" AND the base data has in fact
"changed" ... then I may WANT to overwrite.
</AG>
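
To make the distinction in this exchange concrete, here is a minimal sketch
of the three behaviours being discussed. It is not taken from the JSDL spec
or from any implementation. The function names stage_in and file_digest are
hypothetical, and the flag value "overwriteIfDifferent" is invented here
purely for illustration; only "overwrite" and "dontOverwrite" correspond to
creation flag values mentioned in the thread. The sketch also assumes both
files are local, so a digest comparison is cheap; a real stage-in from a
remote source would need some other way to learn the source's checksum or
timestamp.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def stage_in(source: Path, target: Path, flag: str) -> bool:
    """Copy source to target according to flag; return True if data was copied.

    Hypothetical helper, not a JSDL API.
    """
    if flag == "overwrite":
        # Always transfer, replacing whatever is already there.
        copy = True
    elif flag == "dontOverwrite":
        # "Transfer only if not pre-existing": a stale target is left alone.
        copy = not target.exists()
    elif flag == "overwriteIfDifferent":
        # Hypothetical flag, NOT part of JSDL: transfer when the target is
        # missing or its content differs from the source.
        copy = not target.exists() or file_digest(source) != file_digest(target)
    else:
        raise ValueError(f"unknown creation flag: {flag!r}")

    if copy:
        shutil.copyfile(source, target)
    return copy
```

With "dontOverwrite", a second job on the same resource skips the transfer
even after the source data has changed, which is the case raised in the
<AG> comment above. The hypothetical third flag skips the transfer only
while the two copies are identical.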



> I know the document says
> 
> "More complex file transfers, for example, conditional transfers based on
> job termination status are out of scope. "
> 
> But we're talking about a major performance optimization.

Well, perhaps. But if we add it, we have to worry about systems that
don't have a mechanism for doing this sort of thing in the first place.
Indeed, the JSDL 1.0[*] spec leaves out many things that could be major
performance wins (e.g. compressed data transfers) either because they're
complicated in their own right, or because we decided to get a spec
going *this* decade as opposed to the next one. :^) None of which means
that we will not go back and revisit these issues once there is some
more data and experience reports available on actual deployed
implementations. In particular, as it is a spec put together by mainly
compute guys, we know that the data-related part is in need of fleshing
out in the future.

We've not reached the end of the story. Maybe just the end of the
chapter instead. ;^)

<AG>
I'm not suggesting changing anything at this late date ... though it will
likely come up later.
</AG>

Donal.
[*] As a side note, it should be done just about now as I write this. :)




