[Nsi-wg] Post-Rio dissemination of all things NSI

Wed Sep 14 15:53:50 CDT 2011

Comments in line.

Before I start, I do want to make a general comment about the complexity of the NSI protocol.  Going through the implementation over the last couple of weeks really opened my eyes.  I had been grumbling about it while writing the reference WSDL, but I would like to have a discussion on the weekly NSI call (or maybe dial into OGF next week) so we can make sure that the requirements driving the complexity are real requirements and not just nice-to-haves.  I am not talking about namespaces or topology; I am referring to the requirements that have forced us into long-duration operations, and to my current least favourite, the provision/release operations.

> 
> Protocol:
> 
> The schedule has a start time, end time and duration. 
> - AFAICT we do not need duration. Can anyone explain what we need it for?

We had it in Fenius and many NRMs support it as well.  Programmatically, the same thing can be achieved with endTime, unless we decide that duration has a specific behaviour for reservations that start now, when "now" may take a while to set up.  This would be extremely complicated to coordinate, so I think we should probably take it out.
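To illustrate why duration is redundant, here is a rough Python sketch, assuming a schedule that carries only startTime and endTime.  The `Schedule` class and its `from_duration` helper are hypothetical names, not part of the NSI WSDL, and the sketch deliberately does not model the "starts now" case, where setup time would make a duration behave differently from a fixed endTime.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Schedule:
    """Hypothetical schedule carrying only startTime and endTime."""

    start_time: datetime
    end_time: datetime

    @classmethod
    def from_duration(cls, start_time: datetime, duration: timedelta) -> "Schedule":
        # A requester that thinks in durations derives endTime locally,
        # so the message itself needs no separate duration field.
        return cls(start_time, start_time + duration)

    @property
    def duration(self) -> timedelta:
        # Duration is always recoverable from the two timestamps.
        return self.end_time - self.start_time


start = datetime(2011, 9, 14, 20, 0, tzinfo=timezone.utc)
schedule = Schedule.from_duration(start, timedelta(hours=2))
print(schedule.end_time.isoformat())  # 2011-09-14T22:00:00+00:00
```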

> Is there a rationale for the minimum and maximum bandwidth?
> (I entered the NSI community a bit late, so humor me).

Some services offered by networks have flexible bandwidth capabilities, such as bursting to higher bandwidth when other circuits are idle.  Providing a minimum puts a committed floor on the reservation bandwidth, while the maximum bounds how far it may burst.
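A rough sketch of how the two bandwidth fields could be interpreted, assuming values in Mbps and a provider that knows how much idle capacity is available for bursting.  `BandwidthRequest` and `allocate` are hypothetical names, not NSI types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandwidthRequest:
    """Hypothetical bandwidth fields, in Mbps."""

    minimum: int  # committed floor the provider must always guarantee
    maximum: int  # ceiling the circuit may burst to when capacity is idle

    def __post_init__(self) -> None:
        if self.minimum < 0 or self.maximum < self.minimum:
            raise ValueError("require 0 <= minimum <= maximum")


def allocate(request: BandwidthRequest, idle_capacity: int) -> int:
    """Bandwidth granted right now: the committed floor plus whatever
    burst headroom is idle, capped at the requested maximum."""
    burst = max(0, min(idle_capacity, request.maximum - request.minimum))
    return request.minimum + burst
```

With a 100/1000 request, `allocate` returns 100 when nothing is idle and never more than 1000, however much capacity is free.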

> The callback model makes it very non obvious to handle failures.
> - In general, dealing with network problems has not been thought through.
> - This became fairly clear when we had problems establishing connections. There is nothing specified on how to continue from there (i.e., who carries the responsibility for propagating state updates).

Yes, the callback model is much more complex, but it was required by the input requirements.  We have discussed a number of strategies to handle these failure scenarios.  First off, we need retries when sending request, confirmed, and failed messages.  Obviously, if we are having authentication failures, no number of retries will fix the problem.  If your retry timers run out, then you have no option but to toss the message.  In this situation the requesterNSA may reissue the request, or do a query operation to determine the state of the connection's state machine on the providerNSA.  Similarly, the providerNSA may query the requesterNSA to see if their states match for the connection in question.
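A rough sketch of the sender-side retry strategy, assuming exponential backoff (one reasonable choice, not something the spec mandates) and hypothetical `TransientError` and `AuthenticationError` types standing in for whatever the SOAP stack actually reports.  The `sleep` parameter is only there so the delays can be observed.

```python
import time
from typing import Callable


class AuthenticationError(Exception):
    """Hypothetical permanent failure: retrying will not help."""


class TransientError(Exception):
    """Hypothetical temporary failure, e.g. the peer NSA is unreachable."""


def send_with_retries(send: Callable[[], None], attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to deliver one request, confirmed, or failed message.

    Returns True once delivered, or False when the retry budget is used up.
    Authentication failures are re-raised at once instead of retried.
    """
    for attempt in range(attempts):
        try:
            send()
            return True
        except AuthenticationError:
            raise  # no number of retries will fix this
        except TransientError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return False
```

When `send_with_retries` returns False the message is tossed, and the requesterNSA falls back to reissuing the request or querying the providerNSA for its state.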

> The TechnologySpecificAttributes does not seem to have any usage, except "future compatibility", but there are no examples or use cases for them.
> - I suggest removal, and then adding specific fields if we need them later.

They are there specifically to let service providers add service-specific attributes, such as frame size, QoS, and SRLG parameters, into a request.  The protocol would not need to change, but the underlying implementation would need to change to support the parameters.  We did not want to revise the protocol every time a new service parameter was defined.

> 
> The reservationConfirmed message includes all the reservation details. Is this really necessary? Couldn't it just be a simple acknowledgement.

The reason we included the reservation details in the message is that the original reservation request describes a "space" that the providerNSA can more fully qualify when satisfying the request.  For example, if I specify a number of mandatory and desired parameters in the reservation request, the reservationConfirmed will hold the parameters actually satisfied by the reservation.  It is more work, but it is needed.

> 
> I would consider adding a state to indicate that the connection failed for some reason. The terminated state is a bit broad in what it describes.

Yes, I had originally proposed that as well.  I would also like to formalize the minimum length of time a reservation will remain in the NSA once it is in the terminated state, so that it can still be queried.
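A rough sketch of what a minimum retention period might look like on the NSA side, assuming a hypothetical in-memory `ReservationStore` and an arbitrary 24-hour window chosen only for illustration.  Terminated reservations stay queryable until the window passes; other states are never purged.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


class ReservationStore:
    """Hypothetical store that keeps terminated reservations queryable
    for a minimum retention period before purging them."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        # connectionId -> (state, time it entered Terminated, or None)
        self._records: Dict[str, Tuple[str, Optional[datetime]]] = {}

    def set_state(self, connection_id: str, state: str, now: datetime) -> None:
        terminated_at = now if state == "Terminated" else None
        self._records[connection_id] = (state, terminated_at)

    def query(self, connection_id: str) -> Optional[str]:
        record = self._records.get(connection_id)
        return record[0] if record else None

    def purge(self, now: datetime) -> None:
        # Drop only terminated records older than the retention window.
        self._records = {
            cid: (state, t) for cid, (state, t) in self._records.items()
            if t is None or now - t < self.retention
        }


store = ReservationStore(retention=timedelta(hours=24))
```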

> 
> Having two ways of querying seems like one too many. Remember that both have to be implemented. Is there a reason we cannot just have one?

There is only a single query message; however, it can go from requester to provider, or from provider to requester.  This was needed for some of the error-handling cases.  Is this what you mean?

> 
> The term "NsiExceptionType" seems to have gotten into the spec. It should be called serviceException.

Service is a really overloaded term.  Do you mean an NSI protocol exception or a service reservation exception?

> The messageId in the serviceException should either have a number of options or not be there.

The agreement was that we would enumerate the values through the implementation, but not force a set into the XSD, so that we keep flexibility.

> - It should probably be called errorId as well (it doesn't identify a message).

Good suggestion.

> - A possibility could be to adopt the HTTP error codes.

That is a possibility.

> The text and variables solution in serviceException seems like overkill. Why not just have the text there?

Here is an example that shows the additional flexibility: one generic "Invalid or missing parameter" error message, with the parameters causing the issue listed in the variables.

    <messageId>SVC0001</messageId>
    <text>Invalid or missing parameter</text>
    <variables>
        <Attribute Name="replyTo" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:basic">
            <AttributeValue xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                >&lt;null&gt;</AttributeValue>
        </Attribute>
    </variables>
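For comparison, a rough sketch that builds the same structure with Python's `xml.etree.ElementTree`, assuming a hypothetical `serviceException` root element and leaving out the NSI and SAML namespaces the real schema puts on these elements.  Each offending parameter becomes one `Attribute`, so a single generic message can report any number of bad fields.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
BASIC = "urn:oasis:names:tc:SAML:2.0:attrname-format:basic"

# Pin the conventional "xsi" prefix used for xsi:type.
ET.register_namespace("xsi", XSI)


def service_exception(message_id: str, text: str, bad_params: dict) -> ET.Element:
    """Build a serviceException-style element: one generic message plus
    one Attribute under variables for each offending parameter."""
    root = ET.Element("serviceException")  # hypothetical element name
    ET.SubElement(root, "messageId").text = message_id
    ET.SubElement(root, "text").text = text
    variables = ET.SubElement(root, "variables")
    for name, value in bad_params.items():
        attr = ET.SubElement(variables, "Attribute", {"Name": name, "NameFormat": BASIC})
        val = ET.SubElement(attr, "AttributeValue", {f"{{{XSI}}}type": "xs:string"})
        # The "xs:" prefix inside the xsi:type value must be declared by hand,
        # because ElementTree does not look inside attribute values.
        val.set("xmlns:xs", XS)
        val.text = value  # "<null>" is escaped to &lt;null&gt; on output
    return root


err = service_exception("SVC0001", "Invalid or missing parameter", {"replyTo": "<null>"})
print(ET.tostring(err, encoding="unicode"))
```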

> 
> Quite often it is not clear what fields are required in a message and which are not.

Yes, we definitely need more documentation around these.

> 
> WSDL:
> 
> XML Schema has a value "Terminateing" for ConnectionState.

Thank you - I will fix that.

> 
> xsd:dateTime allows value without a timezone, which is problematic.
> - I suggest that the protocol dictates that all protocol timestamps should be in zulu time (which is really the only sensible thing to send over a wire IMHO)

I did look up the dateTime specification and am sorry I missed this.  I always use the Java XMLGregorianCalendarImpl class, which puts the timezone on by default.  We will put this on the list of items to resolve.
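A rough sketch of the Zulu-time rule, assuming timestamps are sent with second precision; `to_wire` and `from_wire` are hypothetical helper names.  Naive timestamps are rejected in both directions rather than guessed at.

```python
from datetime import datetime, timedelta, timezone


def to_wire(ts: datetime) -> str:
    """Serialize a timestamp as an xsd:dateTime in Zulu time."""
    if ts.tzinfo is None or ts.utcoffset() is None:
        raise ValueError("timestamp must carry a timezone")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def from_wire(value: str) -> datetime:
    """Parse an xsd:dateTime and insist that it carries a timezone."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so normalize it to an explicit +00:00 offset first.
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        raise ValueError(f"dateTime without timezone: {value!r}")
    return ts.astimezone(timezone.utc)


# The timestamp of this email, 15:53:50 CDT (UTC-5), in Zulu time:
cdt = timezone(timedelta(hours=-5))
print(to_wire(datetime(2011, 9, 14, 15, 53, 50, tzinfo=cdt)))  # 2011-09-14T20:53:50Z
```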

> 
> WSDL specifies a reservation.reservation, which is somewhat unfortunate. I suggest reservation.reservationInfo

Yes.  This will be done.

> 
> connectionId is enforced as a UUID, which is not in tune with the protocol spec, which specifies that the connectionId only has to be unique within the requester NSA scope.

Yes, we will change this.  Local uniqueness only causes implementation issues with zero upside value: you then need to maintain (requester NSA, connectionId) tuples for uniqueness.  NSI is complicated enough without this added complexity.
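A rough sketch contrasting the two keying schemes, using hypothetical NSA URNs.  With local uniqueness the provider must key its state on the (requesterNSA, connectionId) tuple, while a UUID stands on its own.

```python
import uuid
from typing import Dict, Tuple


def new_connection_id() -> str:
    """Globally unique connectionId: no other context is needed to use it."""
    return str(uuid.uuid4())


def is_uuid(value: str) -> bool:
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


# Local uniqueness: two requesters may both choose "conn-1", so a provider
# must key its state on the (requesterNSA, connectionId) tuple.
local_keys: Dict[Tuple[str, str], str] = {
    ("urn:ogf:network:nsa:requester-a", "conn-1"): "Reserved",
    ("urn:ogf:network:nsa:requester-b", "conn-1"): "Provisioned",
}

# Global uniqueness: the connectionId alone is the key, and it can be logged,
# queried, and correlated across NSAs as-is.
global_keys: Dict[str, str] = {new_connection_id(): "Reserved"}
```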

> 
> ServiceException should probably be called "serviceException" to follow the naming convention.

I bounced back and forth on this during definition.  I need to remember why.

> 
> I've also compiled a list of issues which have been confusing people. The purpose of this list is simply to have a spec which is easy(er) to implement, which IMHO is a very important quality of a standard (and one we are far from).
> 
> - The URN prefixes

We need to use namespaces to allow for flexibility when other namespaces need to be used.  We must remember that a good protocol can be used flexibly.

> - Requester / provider role fields

I was concerned with these originally as well.  If we remove the replyTo and place it in topology definition as the csRequesterEndpoint then we will at least need the requesterNSA attribute.

Based on the current spec we should rename these fields if they do not hold an NSA URN.

> - replyTo / addressing in general

As above.  I think we need to maintain the flexibility to support both one and two endpoints.
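One way to keep both options, sketched under the assumption of a hypothetical in-memory topology table and a made-up NSA URN and endpoint: an explicit replyTo wins, otherwise the requester's csRequesterEndpoint is looked up by requesterNSA, which is why that attribute has to stay in the message either way.

```python
from typing import Dict, Optional

# Hypothetical topology: NSA identifier -> csRequesterEndpoint.
TOPOLOGY: Dict[str, str] = {
    "urn:ogf:network:nsa:example-requester": "https://requester.example.net/nsi/requester",
}


def reply_endpoint(requester_nsa: str, reply_to: Optional[str] = None) -> str:
    """Pick where to send callbacks for a request.

    An explicit replyTo wins, which suits short-lived clients that are not
    in topology; otherwise fall back to the requester's csRequesterEndpoint.
    """
    if reply_to:
        return reply_to
    try:
        return TOPOLOGY[requester_nsa]
    except KeyError:
        raise LookupError(f"no csRequesterEndpoint for {requester_nsa}") from None
```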

> - Distributed development
> - Reordering of messages from "logical order"

Does "logical order" refer to protocol order, or to the order in which operations were issued?  In any distributed, highly parallel system, message ordering is hard to maintain when multithreaded processing is involved.  There are queuing strategies to handle some of this, but the best mechanism is the requester serializing :-)
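A rough sketch of requester-side serialization, assuming a hypothetical `send` callback and an `on_reply` hook invoked when the confirmed or failed message for the outstanding operation arrives.  Keeping one operation in flight per connection means the provider never sees, say, a provision before its reserve has completed, while different connections still proceed in parallel.

```python
from collections import deque
from typing import Callable, Deque, Dict


class SerializingRequester:
    """Hypothetical helper keeping at most one operation outstanding
    per connection."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self._send = send  # transmits (connection_id, operation)
        self._pending: Dict[str, Deque[str]] = {}

    def submit(self, connection_id: str, operation: str) -> None:
        queue = self._pending.setdefault(connection_id, deque())
        queue.append(operation)
        if len(queue) == 1:  # nothing outstanding: send immediately
            self._send(connection_id, operation)

    def on_reply(self, connection_id: str) -> None:
        # The head operation got its confirmed/failed reply; send the next.
        queue = self._pending[connection_id]
        queue.popleft()
        if queue:
            self._send(connection_id, queue[0])
        else:
            del self._pending[connection_id]
```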

> - Bad error messages from SOAP/WSDL stacks (and probably other things as well)

Please provide some examples.

> - XML/WSDL namespaces

You need to get your stack fixed :-)

> - Security

Definitely.  I locked a lot of people out.

> 
> I have some comments/suggestions as well:
> 
> The URN prefixes are just prefixes. They do not add any value, and have been a source of confusion. I suggest we remove them.

Namespaces are needed for flexibility.  If we remove them, then NSI can only work with the naming structures we define.  I really want to avoid this if possible.  You should not even be looking into them anyway; treat them as opaque strings and use string matching only.  I proposed a label for a display name in the topology file so we can have both uniqueness and something for people to display in GUIs.

> 
> I'm not sure the requester/provider role fields are really necessary. It should be clear from the security context (I'll get back to that) who one is communicating with.

This does need discussion.

> 
> The replyTo fields seem to me like a surrogate for the lack of addressing, as topology is not done. On the other hand, they also make it a lot easier to make clients, as a client does not have to exist in the topology to be able to communicate with an NSA. I suggest we think a bit about how we want this to work, and how we want to support potentially short-lived clients creating connections (because something needs to initiate connection creation).

It was mirroring the capabilities of WS-Addressing, but I am okay if we address it through NSA topology.

> 
> The distributed collaboration on developing NSI agents was initially a bit fuzzy and hindered by some barriers. The Skype room improved this a lot by bringing down the latency in developer communication.

The only better way is to get everyone in a single room with beer.

> 
> Reordering of messages from "logical order". I still think the protocol design is a bit clumsy, especially when combined with the lack of guidance on how to handle network errors (unavailable hosts, etc.) and short-lived clients. I've been thinking a bit about it, but haven't really come up with anything substantial.

Please clarify "logical ordering" and we can discuss.

> 
> Some people (me included, if not especially) have struggled quite a bit with their SOAP/WSDL stacks, and the lack of checks and the bugs in them. However, several people have also been puzzled by the error messages from them, e.g., getting an integer parsing error when an element was simply missing.

Might be an issue with a mandatory field not being provided and the error handling of your stack being coded by a bunch of monkeys :-)

> 
> Somewhat similarly, I had an issue where my SOAP stack used the wrong namespace on an element. The issue here was not so much the bug in the SOAP stack (it happens), but that I could not figure out what was right and wrong by looking at the WSDLs.

Takes an experienced eye for these things.  Sorry I couldn't help earlier to identify the problem.

> 
> Security. Let me be very clear here. HTTP basic and some SAML attributes have nothing to do with security.
> They do not provide integrity, confidentiality, or assurance. I am also puzzled by the choice of SAML, as SAML is intended for communication between identity providers and services, but there are no identity providers in NSI. Also, I don't think that anyone in this group actually understands SAML (I don't). My suggestion for security is to use TLS with certificates (from a recognized CA) at each end, and nothing more. It is not the most trivial thing in the world (but isn't really difficult either), but it is fairly well understood and has widespread support.

There are two solutions required here.  The first is NSA-to-NSA security and the second is end-user "session" security.  Each has different requirements and solutions.  The security solution we have discussed and agreed upon for NSA-to-NSA security is:
1. TLS with mutual authentication for encryption and confidentiality (and transport authentication).
2. HTTP Basic authentication for application-level authentication.  Yes, this seems like double the effort, but Basic is supported in software stacks via JAAS.  TLS certificates are not typically supported for application-level security; however, there are ways to get access to them.
3. SOAP digital signatures for message integrity.
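A rough sketch of the first two items using only the Python standard library, assuming the certificate and CA files are loaded separately with `load_cert_chain()` and `load_verify_locations()` (the paths are deployment-specific).  SOAP digital signatures are out of scope here, and the helper names are hypothetical.

```python
import base64
import ssl


def basic_auth_header(username: str, password: str) -> str:
    """Value for the HTTP Authorization header (Basic scheme, RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def mutual_tls_server_context() -> ssl.SSLContext:
    """Server-side TLS context that rejects peers without a valid client
    certificate; the caller still has to load its own certificate chain
    and the trusted CAs."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED  # mutual authentication
    return ctx
```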

The sessionSecurityAttr holds authentication and authorization information for the end user.  There is a security proposal document that describes the use of this element, the types of roles supported, and how certificates could be passed for integration into existing end-user security solutions.  We do not use these fields for NSA authentication.

> Lastly, I think the group is suffering heavily from having thought too much and having constructed too little. This has somewhat changed after Rio, but I fear that we are now so far into the process of writing the standard that it is difficult to get any big changes done, as people do not want to change what has already been made. The project group also still operates in a very ad-hoc fashion. This can work great to a certain extent, but I think that limit was crossed some time ago. We need to get better organized, but this is not really my area of expertise, so I won't suggest anything.

Well, some of the people in the NSI working group are a bit overworked, and that was before the implementations.  I had previously offered to coordinate development efforts, and I think we need to get a focused implementers group together with dedicated mailing lists and resources.  The GLIF had been looking to form a working group to do this, but I think we can run something very informal.  I will offer it up again.

It is never too late to change the protocol.  This effort was a proof point for the protocol, so we could see what needs improvement before official publication of an endorsed protocol.  People know the current version is not final, and I have no issue changing it if needed.

There was a lot of stress on the last day, so there were some complaints flying, but I think everyone did a great job.  One issue I heard pop up was the excessive chatter on the Skype IM session from people, many of whom were not actually testing implementations.  People were excited and wanted to help however they could, so I can't fault anyone, but it forced a few of us to break off into individual chat sessions to focus on the task at hand.  We will need to be careful of this in the future.

We need to push on to SuperComputing with a better coordinated effort.  I know many people are taking a deep breath and will spend the next week catching up on work they ignored over the last couple of weeks.  I will try to kick off the more organized effort next week.

> 
> 
> I'm on vacation until Monday, so take your time replying :-)
> 

Lucky you.

> 
>     Best regards, Henrik
> 
>  Henrik Thostrup Jensen <htj at ndgf.org>
>  NORDUnet / Nordic Data Grid Facility.
> _______________________________________________
> nsi-wg mailing list
> nsi-wg at ogf.org
> http://www.ogf.org/mailman/listinfo/nsi-wg