[dmis-bof] Comments on the charter?

Wed Dec 14 09:46:48 CST 2005

> -----Original Message-----
> From: Peter Kunszt [mailto:Peter.Kunszt at cern.ch] 
> Sent: Wednesday, December 14, 2005 8:30 AM
> To: allcock at mcs.anl.gov; Hiro Kishimoto
> Cc: dmis-bof at ggf.org; Ian Foster
> Subject: RE: [dmis-bof] Comments on the charter?
> 
> 
> hi bill
> 
> here it comes  - my dreaded comments ;-) 
> 
> >  - I changed the name to OGSA-DMI
> 
> i don't like that. having my EGEE hat on, please consider 
> that we cannot
> adapt and use OGSA in any of the near future (next year) as everything
> is set up for production pretty much now. there is no time 
> for us to migrate
> to OGSA quickly and to deploy and use it. there are lots of 
> question marks
> concerning especially the security infrastructure and how 
> that will interoperate
> with classical transport-level security.
> 
> so to recap: we would very much appreciate a pure WS-I 
> interface standardization
> first, where we can talk cleanly about interfaces and 
> semantics and we would not
> clog our discussions about which semantics of WSRF should be 
> applied to what. we
> could focus on the transfer specs. then someone can take the 
> spec and 'ogsafy' it.
> so please consider to keep this just as DMIS-WG. (sorry hiro 
> - i have to be pragmatic here)

As I said in the BOF, I suspected this was going to be the major issue.  You
have FTS, which is pure WS-I, and we have RFT, which is WSRF.  Hiro suggested
that we do a functional spec, then have two (or more) operational specs, one
for WS-I, one for WSRF.  I believe this is what SRM has done, also.  However,
that doesn't really solve the interoperability problem.  I think we could
probably agree on the submission format, but if we don't implement all the
operations to poll for state that you want, your clients won't work with our
service, and if you don't implement the RPs that we want, then our clients
won't work with your services.

> 
> >  - we need to address naming.  What will we accept as source 
> > and destination names? URLs? URIs? any string? EPRs?
> 
> URL seems like a good choice, this would probably suit all use cases.
> this is what we use in the GSM-WG. we have storage URLs.
> an EPR is nothing more than a decorated URL so a simple URL would
> suit that too.

URLs work fine for files, and I *suspect* it will be what we use for V1.0,
but if we take our file blinders off and try to think about sources other
than files, i.e. a service that is virtualizing some non-file data source,
we might want EPRs.  This is related to the next issue.

> 
> >  - There is a general issue which will affect a lot of this 
> > and that is just extensible WSDL.  How do we allow parameters 
> > to change when options in the WSDL change.
> 
> the pragmatic approach is to increase the minor version number
> for compatible changes and the major one for incompatible changes.
> however, there are alternatives that we have investigated also in
> the GSM-WG: it is possible to leave an extensibility hook in the WSDL,
> so that new functionality can be tried on existing services, before
> it is migrated into the mainstream WSDL. this is a good idea also
> for interoperating between versions.

Are you suggesting that we simply keep growing the WSDL with optional
blocks, one for every variant of any option?  Here is my concern.  We want
this to be transport agnostic.  That means it should be able to support
GridFTP, HTTP, bbftp, bbcp, scp, etc., and GridFTP, at least, could run over
TCP or UDT, and we hope to add others.  If I understand what you are
suggesting then the WSDL would have a different, optional block for every
one of those options.  I guess that is an option, but the WSDL will get huge
and I wonder what kind of problems might be caused by this thing growing out
of control.

I would be interested in more of a description of the extensibility hook you
mentioned.  I don't know if WSDL can support this or not, but what I would
like to see would be some method of querying the service for options it
supports.  You might get back something that looks like this:

<Transports_Supported>
  <Transport>GridFTP-TCP</Transport>
  <Transport>GridFTP-UDT</Transport>
  <Transport>HTTP-TCP</Transport>
  <Transport>bbftp</Transport>
  <Transport>bbcp</Transport>
  <Transport>scp</Transport>
</Transports_Supported>

Then in the WSDL there is a <Transport_Chosen> tag, which would contain one
of the above choices, and based on that, it could selectively load and
validate against the appropriate schema.  The nice thing about this is, a
site can deploy its own custom baked transport (or anything else) without
causing the "standard WSDL" to fail validation.
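
To make the query-then-choose idea concrete, here is a minimal sketch of the
client side in Python.  It assumes the service returns a document shaped like
the example above; the function names and the preference list are
hypothetical, and nothing here is taken from an existing FTS or RFT interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical reply to a "which transports do you support?" query,
# shaped like the example in the text.
SUPPORTED_XML = """
<Transports_Supported>
  <Transport>GridFTP-TCP</Transport>
  <Transport>GridFTP-UDT</Transport>
  <Transport>HTTP-TCP</Transport>
</Transports_Supported>
"""


def supported_transports(xml_text):
    """Return the transport names advertised by the service, in order."""
    root = ET.fromstring(xml_text.strip())
    return [t.text.strip() for t in root.findall("Transport")]


def choose_transport(client_preferences, xml_text):
    """Pick the first client preference that the service also advertises."""
    offered = set(supported_transports(xml_text))
    for name in client_preferences:
        if name in offered:
            return name
    return None  # no common transport, so the client should not submit


# A client that prefers scp but can fall back to GridFTP over UDT.
print(choose_transport(["scp", "GridFTP-UDT"], SUPPORTED_XML))
```

The value returned by choose_transport is what would go into the
<Transport_Chosen> element, so a client never sends a block the site does
not support.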

I guess now that I think about it, your way works too.  As long as the query
mechanism is there, and as long as you don't send a block the site doesn't
support, you won't fail validation... I don't like it, but unless we discover
a bigger problem than that, I guess it is an option :-).

> >  - I think we all agree that this needs to be transport 
> > agnostic, but we need to figure out how best to implement 
> > that (related to the WSDL question)
> 
> unless i'm mistaken, doc/literal should just do the trick..?

Yes, if we support the "WSDL with every possible option in it" approach.

> 
> >  - what statement do we want to make (and does the OGSA data 
> > architecture
> > need) about delivery semantics
> 
> i think we need to make as detailed statements as possible/reasonable!
> we need to define what states the transfer service exposes to 
> the client
> and what failure modes the client has to expect.

I agree, I phrased the question wrong.  It should have been "what delivery
semantics do we want to support".  I.e., do we want at least once, exactly
once, at most once, non-repudiation, etc.
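
For readers less familiar with the terms, the difference between the first
three can be sketched in a few lines of Python.  This is only an
illustration under simplifying assumptions: send() is a hypothetical callable
that returns True on success, "at least once" is approximated by bounded
retries, and "exactly once" is approximated by a receiver that discards
repeated transfer IDs.

```python
def deliver(send, semantics, max_attempts=3):
    """Call send() per the requested semantics; return (attempts, succeeded)."""
    if semantics == "at-most-once":
        attempts = 1              # never retry, so never duplicate
    elif semantics == "at-least-once":
        attempts = max_attempts   # retry on failure, duplicates are possible
    else:
        raise ValueError("unknown semantics: " + semantics)
    for n in range(1, attempts + 1):
        if send():
            return n, True
    return attempts, False


class DedupReceiver:
    """Ignores repeated transfer IDs, so retries have an exactly-once effect."""

    def __init__(self):
        self.seen = set()
        self.applied = []

    def receive(self, transfer_id, payload):
        if transfer_id not in self.seen:
            self.seen.add(transfer_id)
            self.applied.append(payload)
        return True  # acknowledge even when the payload was a duplicate
```

Which of these a transfer service promises determines what failure states
it has to expose to the client.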

> 
> >  - what about scheduling / planning aspects?  Do we want to 
> > include elements in this WSDL that specify rate (bandwidth), 
> > quantity (file size), and timing (START BY, FINISH BY, etc)
> 
> our experience: individually for each transfer job: no. that 
> is very complicated and of questionable use.
> on the scale of the service itself as a service 
> parameter/configuration: yes.

I half agree with you.  Clearly the service may want to have limits that
can be set.  In that regard, it is acting as a resource manager and
protecting the resource for which it is responsible, much like a compute
scheduler.  Btw, I would argue that this is outside the scope of this
working group, since an external entity has no need to be involved with
that.  At most we might want to agree on some state that such a service MAY
(in the SPEC sense of the word) expose.

However, it is not clear to me that requests per job are of questionable
use.  Clearly they should be optional elements, but if the submission has no
such information, on what basis does the service make decisions when it has
a resource constraint?  I would also add priority to the list.  If I have
multiple jobs submitted, I may want to make sure that one of them gets
service over the others.
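
As a rough sketch of how optional per-job hints could drive ordering when
the resource is constrained, consider the following Python.  The field names
(priority, finish_by) and the rule that jobs without hints sort last are
assumptions made for illustration, not a proposal for the spec.

```python
import heapq
import itertools


class TransferQueue:
    """Orders jobs by optional priority, then optional deadline, then arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that keeps FIFO order

    def submit(self, job_id, priority=None, finish_by=None):
        # Lower numbers are served first; a missing hint sorts after any value.
        key = (
            priority if priority is not None else float("inf"),
            finish_by if finish_by is not None else float("inf"),
            next(self._arrival),
        )
        heapq.heappush(self._heap, (key, job_id))

    def next_job(self):
        """Return the next job ID to serve, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[1]
```

A submission with no hints is still accepted; it simply waits behind jobs
that stated a priority or a deadline.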

> 
> >  - everybody's favorite: security.  I hope we can basically 
> > punt on this and say we will do whatever OGSA does
> 
> well.. please see 
> http://www.globus.org/toolkit/docs/4.0/security/GT4-GSI-Overview.pdf
> 
> and the discussion here on transport and message-level security.
> GT4 currently supports both in parallel.
> 
> together with my comment on top, i think for this essential service
> we need to at least task the OGSA security group to give us a clear
> path of how both can interoperate and what are the migration paths.
> this is highly nontrivial and until it is not given, it will not be
> practical for us to talk about message-level security at all. we don't
> have the effort available to do what GT4 did and to run both 
> everywhere.

Ok.  I am guilty here of apparently not knowing all the implications of the
statement I made.  I agree with you that if the OGSA security model does not
allow for transport security, then that is a problem.  The performance hit
of message-level security was just too much to take.  I guess a better way of saying what I meant was
I hope we can make most or all of the security pluggable via some reasonable
interfaces (the authz interface in GridFTP has been very successful in this
respect).  This will allow communities to plug in whatever is appropriate
for them.
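
To illustrate what "pluggable" could mean here, the Python below shows a
service that delegates every authorization decision to whatever plug-in the
site installs.  All class and method names are hypothetical; this is not the
GridFTP authz interface, only a sketch of the same pattern.

```python
from abc import ABC, abstractmethod


class AuthzPlugin(ABC):
    """Hypothetical plug-in interface: one yes/no decision per request."""

    @abstractmethod
    def is_authorized(self, subject, action, url):
        ...


class AllowListAuthz(AuthzPlugin):
    """Example plug-in: a fixed list of subjects may do anything."""

    def __init__(self, allowed_subjects):
        self.allowed = set(allowed_subjects)

    def is_authorized(self, subject, action, url):
        return subject in self.allowed


class TransferService:
    """Knows nothing about policy; it only asks the configured plug-in."""

    def __init__(self, authz):
        self.authz = authz

    def submit(self, subject, source, destination):
        may_read = self.authz.is_authorized(subject, "read", source)
        may_write = self.authz.is_authorized(subject, "write", destination)
        if not (may_read and may_write):
            raise PermissionError("transfer not authorized for " + subject)
        return "accepted"
```

A community with different requirements would supply its own AuthzPlugin
subclass without any change to the service or to the interface spec.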

> 
> >  - A sort of pet project of mine is monitoring / 
> > troubleshooting.  I would like to potentially include 
> > elements in the WSDL or state that is exposed that would 
> > enable better/easier monitoring and troubleshooting.  For 
> > instance, some sort of unique job identifier that can be 
> > passed down to children, so that you can trace the chain back 
> > when you have a failure.
> 
> yes, statistics gathering is another one. very important to have -
> i already sent the wsdl's we have today to this list..

Yes, and we also have a set.  Part of what this working group will need to
do is compare these (and others) and come to an agreement on them.
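
On the unique job identifier quoted above, a minimal Python sketch of the
idea follows.  It assumes each child job keeps a reference to its parent so
that a failure at the bottom can be traced back up; the Job class and its
fields are hypothetical and do not come from either set of WSDLs.

```python
import uuid


class Job:
    """A transfer job that remembers which job spawned it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.job_id = str(uuid.uuid4())  # unique identifier for this job
        self.parent = parent

    def spawn(self, name):
        """Create a child job that carries a link back to this one."""
        return Job(name, parent=self)

    def trace(self):
        """Return job IDs from this job back up to the original request."""
        chain, job = [], self
        while job is not None:
            chain.append(job.job_id)
            job = job.parent
        return chain


request = Job("client request")
transfer = request.spawn("transfer service job")
leaf = transfer.spawn("single file transfer")
print(leaf.trace())  # three IDs, leaf first, original request last
```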

> 
> >  - groups that we need to be aware of to one extent or 
> > another include OGSA, OGSA-D, info-d, gsm, byte-io, naming, 
> > grid file systems, authz (other security groups), 
> > ws-agreement (GRAAP?).  are there others? how should we 
> > liaise with those groups?  some will require more work than others.
> 
> :-) the best thing to have is if there are individuals participating
> in both who can act as liaisons. i am happy to take gsm and parts of
> ogsa-d, together with you of course.

Yes, having participants from those groups would be ideal.  I guess we need
to start a recruiting campaign :-).  Btw, if you are on this list and
actively participate in any of those groups, I would appreciate it if you
would identify yourself.  The same goes if you are a member of any other
group in GGF (or any other standards body, for that matter) that you think
should liaise with us.

Bill
> 
> > 
> > Let me know what you think.
> 
> keep'em coming ;-)
> 
> cheers
> peter
>