[Nsi-wg] Post-Rio dissemination of all things NSI

Wed Sep 14 08:58:55 CDT 2011

Hi everyone

(long email, get coffee)

With the demo in Rio done, it is time to reflect a bit on the current protocol and its state before moving on.

I've collected a list of issues over the last month or so, but have intentionally not communicated them as it was more important to have things ready for Rio.

Protocol:

The schedule has a start time, end time and duration. 
- AFAICT we do not need duration. Can anyone explain what we need it for?
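
To illustrate the point, here is a minimal Python sketch. It assumes duration is simply meant to be end time minus start time (if it is meant as something else, that is exactly the explanation I am asking for). The Schedule class and its field names are made up for the example and are not taken from the spec.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Schedule:
    """Hypothetical schedule carrying only a start time and an end time."""
    start_time: datetime
    end_time: datetime

    def __post_init__(self):
        if self.end_time <= self.start_time:
            raise ValueError("end_time must be after start_time")

    @property
    def duration(self) -> timedelta:
        # Derived on demand, so it can never disagree with the two timestamps.
        return self.end_time - self.start_time


schedule = Schedule(
    start_time=datetime(2011, 9, 14, 12, 0, tzinfo=timezone.utc),
    end_time=datetime(2011, 9, 14, 14, 30, tzinfo=timezone.utc),
)
print(schedule.duration)  # 2:30:00
```

With three fields on the wire, a receiver also has to decide what to do when they disagree; with two, that case cannot occur.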

Is there a rationale for the minimum and maximum bandwidth?
(I entered the NSI community a bit late, so humor me).

The callback model makes it far from obvious how to handle failures.
- In general, dealing with network problems has not been thought through.
- This became fairly clear when we had problems establishing connections. There is nothing specified on how to continue from there (i.e., who carries the responsibility for propagating state updates).

The TechnologySpecificAttributes do not seem to have any use except "future compatibility", and there are no examples or use cases for them.
- I suggest removal, and then adding specific fields if we need them later.

The reservationConfirmed message includes all the reservation details. Is this really necessary? Couldn't it just be a simple acknowledgement?

I would consider adding a state to indicate that the connection failed for some reason. The terminated state is a bit broad in what it describes.

Having two ways of querying seems like one too many. Remember that both have to be implemented. Is there a reason we cannot just have one?

The term "NsiExceptionType" seems to have gotten into the spec. It should be called serviceException.
The messageId in the serviceException should either have a defined set of values or not be there.
- It should probably be called errorId as well (it doesn't identify a message).
- A possibility could be to adopt the HTTP error codes.
The text and variables solution in serviceException seems like overkill. Why not just have the text there?
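
As a rough sketch of what I have in mind, here it is in Python rather than XML Schema. The class names, the selection of codes and the is_requester_fault helper are all assumptions for illustration; only the idea of an errorId with HTTP-style codes plus a plain text field comes from the points above.

```python
from dataclasses import dataclass
from enum import IntEnum


class ErrorId(IntEnum):
    """Hypothetical error ids, reusing HTTP status code numbers."""
    BAD_REQUEST = 400
    FORBIDDEN = 403
    NOT_FOUND = 404
    CONFLICT = 409
    INTERNAL_ERROR = 500
    SERVICE_UNAVAILABLE = 503


@dataclass(frozen=True)
class ServiceException:
    """Hypothetical serviceException: a fixed error id and plain text."""
    error_id: ErrorId
    text: str

    def is_requester_fault(self) -> bool:
        # Following the HTTP convention: 4xx is the requester's problem,
        # 5xx is the provider's.
        return 400 <= self.error_id < 500


exc = ServiceException(ErrorId.NOT_FOUND, "No such connection")
print(int(exc.error_id), exc.text)  # 404 No such connection
```

The error id is what a program acts on, and the text is for the human reading the log, so there is no need for a template-and-variables scheme.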

Quite often it is not clear what fields are required in a message and which are not.

WSDL:

XML Schema has a value "Terminateing" for ConnectionState (presumably meant to be "Terminating").

xsd:dateTime allows values without a timezone, which is problematic.
- I suggest that the protocol dictate that all protocol timestamps be in Zulu time (which is really the only sensible thing to send over a wire IMHO).
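
A minimal Python sketch of how an implementation could enforce this at the edges. The function names are made up for the example, and the parser only handles the common subset of xsd:dateTime that datetime.fromisoformat() understands.

```python
from datetime import datetime, timezone


def parse_protocol_timestamp(value: str) -> datetime:
    """Parse a timestamp, rejecting values that carry no timezone."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so it is rewritten as an explicit offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no timezone: " + value)
    # Normalise to UTC so everything internal uses one time base.
    return parsed.astimezone(timezone.utc)


def format_protocol_timestamp(value: datetime) -> str:
    """Format a timezone-aware datetime as a Zulu-time string."""
    if value.tzinfo is None:
        raise ValueError("refusing to send a timestamp without a timezone")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


received = parse_protocol_timestamp("2011-09-14T08:58:55-05:00")
print(format_protocol_timestamp(received))  # 2011-09-14T13:58:55Z
```

A value such as "2011-09-14T08:58:55" is valid xsd:dateTime, but the sketch rejects it, since the receiver has no way of knowing which timezone the sender meant.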

WSDL specifies a reservation.reservation, which is somewhat unfortunate. I suggest reservation.reservationInfo.

connectionId is enforced as a UUID, which is not in tune with the protocol spec, which specifies that the connectionId only has to be unique within the requester NSA scope.
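
For a provider, uniqueness within the requester NSA scope just means keying reservations on the pair (requester NSA, connectionId). A small Python sketch follows; the ReservationStore class and the NSA names are made up for the example.

```python
from typing import Dict, Tuple


class ReservationStore:
    """Hypothetical provider-side store keyed on (requester NSA, connectionId)."""

    def __init__(self) -> None:
        self._reservations: Dict[Tuple[str, str], dict] = {}

    def add(self, requester_nsa: str, connection_id: str, details: dict) -> None:
        key = (requester_nsa, connection_id)
        if key in self._reservations:
            # Only a clash within the same requester's scope is an error.
            raise KeyError("connectionId already in use by this requester")
        self._reservations[key] = details

    def get(self, requester_nsa: str, connection_id: str) -> dict:
        return self._reservations[(requester_nsa, connection_id)]


store = ReservationStore()
# Two different requesters may pick the same connectionId without conflict.
store.add("nsa-a", "conn-1", {"bandwidth": 100})
store.add("nsa-b", "conn-1", {"bandwidth": 200})
print(store.get("nsa-a", "conn-1"))  # {'bandwidth': 100}
```

Nothing here needs the connectionId to be a UUID; any string the requester keeps unique on its own side will do.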

ServiceException should probably be called "serviceException" to follow the naming convention.

I've also compiled a list of issues which have been confusing people. The purpose of this list is simply to have a spec which is easy(er) to implement, which IMHO is a very important quality of a standard (and one we are far from).

- The URN prefixes
- Requester / provider role fields
- replyTo / addressing in general
- Distributed development
- Reordering of messages from "logical order"
- Bad error messages from SOAP/WSDL stacks (and probably other things as well)
- XML/WSDL namespaces
- Security

I have some comments/suggestions as well:

The URN prefixes are just prefixes. They do not add any value, and have been a source of confusion. I suggest we remove them.

I'm not sure the requester/provider role fields are really necessary. It should be clear from the security context (I'll get back to that) whom one is communicating with.

The replyTo fields seem to me to be a surrogate for proper addressing, which we lack because topology is not done. On the other hand, they also make it a lot easier to write clients, as a client does not have to exist in the topology to be able to communicate with an NSA. I suggest we think a bit about how we want this to work, and how we want to support potentially short-lived clients creating connections (because something needs to initiate connection creation).

The distributed collaboration on developing NSI agents was initially a bit fuzzy and hindered by some barriers. The Skype room improved this a lot by bringing down the latency in developer communication.

Reordering of messages from "logical order". I still think the protocol design is a bit clumsy, especially when combined with the lack of guidance on how to handle network errors (unavailable hosts, etc.) and short-lived clients. I've been thinking a bit about it, but haven't really come up with anything substantial.

Some people (me included, if not especially) have struggled quite a bit with their SOAP/WSDL stacks, and the lack of checks and the bugs in them. However, several people have also been puzzled by the error messages from them, e.g., getting an integer parsing error when an element was simply missing.

Somewhat similarly, I had an issue where my SOAP stack used the wrong namespace on an element. The issue here was not so much the bug in the SOAP stack (it happens), but that I could not figure out what was right and wrong by looking at the WSDLs.

Security. Let me be very clear here. HTTP basic and some SAML attributes have nothing to do with security. They do not provide integrity, confidentiality, or assurance. I am also puzzled by the choice of SAML, as SAML is intended for communication between identity providers and services, but there are no identity providers in NSI. Also, I don't think that anyone in this group actually understands SAML (I don't). My suggestion for security is to use TLS with certificates (from a recognized CA) at each end, and nothing more. It is not the most trivial thing in the world (but isn't really difficult either), but it is fairly well understood and has widespread support.
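
To give an idea of how little is involved, here is a sketch of the two TLS contexts using Python's standard ssl module. The function names and the idea of passing file paths are assumptions for the example; each NSA would supply its own certificate, key and CA bundle.

```python
import ssl
from typing import Optional


def make_server_context(certfile: Optional[str] = None,
                        keyfile: Optional[str] = None,
                        cafile: Optional[str] = None) -> ssl.SSLContext:
    """Context for the side accepting connections (hypothetical helper)."""
    # Purpose.CLIENT_AUTH builds a context suited to the server side.
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=cafile)
    # Require the peer to present a certificate signed by a trusted CA.
    context.verify_mode = ssl.CERT_REQUIRED
    if certfile is not None:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def make_client_context(certfile: Optional[str] = None,
                        keyfile: Optional[str] = None,
                        cafile: Optional[str] = None) -> ssl.SSLContext:
    """Context for the side initiating connections (hypothetical helper)."""
    # Purpose.SERVER_AUTH verifies the server certificate and hostname by default.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    if certfile is not None:
        # Our own certificate, presented when the server asks for one.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


server_context = make_server_context()
client_context = make_client_context()
print(server_context.verify_mode == ssl.CERT_REQUIRED)  # True
```

With both ends verifying certificates, the identity of the peer comes from the certificate subject, which is what I mean by the security context above.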

Lastly, I think the group is suffering heavily from having thought too much and having constructed too little. This has somewhat changed after Rio, but I fear that we are now so far into the process of writing the standard that it is difficult to have any big changes done, as people do not want to change what has already been made. The project group also still operates in a very ad-hoc fashion. This can work great to a certain extent, but I think that limit was crossed some time ago. We need to get better organized, but this is not really my area of expertise, so I won't suggest anything.

I'm on vacation until Monday, so take your time replying :-)

     Best regards, Henrik

  Henrik Thostrup Jensen <htj at ndgf.org>
  NORDUnet / Nordic Data Grid Facility.