[Nsi-wg] Post-Rio dissemination of all things NSI

Mon Sep 19 09:25:52 CDT 2011

Hi, thanks for answering.

Many/most of these replies here are for the group as a whole, I am not 
targeting John :-).

On Wed, 14 Sep 2011, John MacAuley wrote:

> Before I start I do want to make a generalized comment about the
> complexity of the NSI protocol.  Going through the implementation this 
> last couple of weeks really opened my eyes.  I had been grumbling about 
> it when writing the reference WSDL, but I would like to have a 
> discussion at the weekly NSI call (or maybe dial into OGF next week) so 
> we can make sure some of the requirements driving the complexity are 
> really not just nice to haves.  I am not talking about namespaces or
> topology, I am referring to requirements that have forced us into long 
> duration operations and my currently very least favourite 
> provision/release operations.

Well, sure. But I think we have to differentiate between complexity from 
the problem area (provisioning of network connections), and "accidental" 
complexity arising from the protocol we create to solve the problem. The 
latter should be kept to a minimum :-).

>> Protocol:
>>
>> The schedule has a start time, end time and duration.
>> - AFAICT we do not need duration. Can anyone explain what we need it for?
>
> We had it in Fenius and many NRMs support it as well.  Programmatically 
> it can be achieved with endTime unless we decide it has a specific 
> behaviour for reservations that start now, when now may take a while to
> set up.  This would be extremely complicated to coordinate so I think we
> should probably take it out.

I think taking it out is the right thing. We don't want ambiguity in how
to define when something should start and end.
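
To illustrate (a minimal sketch, not taken from the spec; the class and
field names are made up for the example): if the schedule only carries a
start time and an end time, the duration is simply derived, and the three
values cannot disagree.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Schedule:
    """Hypothetical schedule holding only the two unambiguous values."""
    start_time: datetime
    end_time: datetime

    def __post_init__(self):
        # Reject schedules that would end before (or exactly when) they start.
        if self.end_time <= self.start_time:
            raise ValueError("end_time must be after start_time")

    @property
    def duration(self) -> timedelta:
        # Derived on demand, never transmitted, so it cannot be inconsistent.
        return self.end_time - self.start_time


example = Schedule(
    start_time=datetime(2011, 9, 19, 12, 0, tzinfo=timezone.utc),
    end_time=datetime(2011, 9, 19, 14, 0, tzinfo=timezone.utc),
)
```

Anyone who wants a duration can still compute it locally from the two
timestamps.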

>> Is there a rationale for the minimum and maximum bandwidth?
>> (I entered the NSI community a bit late, so humor me).
>
> Some services being offered by networks have flexible bandwidth 
> capabilities such as bursting capabilities to higher bandwidth when 
> other circuits are idle.  Providing a minimum would put a floor 
> (committed) on the reservation bandwidth.

OK, I can see how it can provide a more flexible approach to delivering 
bandwidth. I must admit I doubt it will be used, but I am not a network 
expert.

>> The callback model makes it very non obvious to handle failures. - In 
>> general, dealing with network problems has not been thought through. - 
>> This became fairly clear when we had problems establishing connections.
>> There is nothing specified on how to continue from there (i.e., who 
>> carries the responsibility for propagating state updates).
>
> Yes, the callback model is much more complex but was required based on 
> the input requirements.  We have discussed a number of strategies to 
> handle these failure scenarios.  First off we need to have retries on 
> sending for requests, confirmed, and failed messages.  Obviously, if we 
> are having authentication failures no number of retries will fix the 
> problem.  If your retry timers run out, then you have no option but to
> toss the message.  In this situation the requestingNSA may reissue the
> request, or do a query operation to determine the state of the
> state machine on the providerNSA.  Similarly, the providerNSA may query
> the requesterNSA to see if their states match for the connection in 
> question.

Ultimately it is the requester which is interested in the information, so 
I think it would make sense to have the main retry logic there. I am not 
sure the message retry would accomplish a lot except implementation 
complexity.

I am essentially suggesting a fallback to polling, but almost every
message-based system (which we are somewhat emulating with the callback)
falls back to that for error recovery. I'm just not overly thrilled with
the prospect of having multiple ways to propagate state.
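
A rough sketch of what such a requester-side polling fallback could look
like. The query callable, the state names and the retry parameters are all
assumptions for the example, not anything defined by NSI.

```python
import time


def poll_state(query, connection_id, wanted, attempts=5, delay=1.0,
               sleep=time.sleep):
    """Poll the provider until the connection reaches one of `wanted`.

    `query` is a hypothetical callable returning the provider's current
    state for a connection. Returns the state reached, or None if the
    attempts run out.
    """
    for attempt in range(attempts):
        try:
            state = query(connection_id)
        except OSError:
            # Provider unreachable (ConnectionError is an OSError); retry.
            state = None
        if state in wanted:
            return state
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

The requester owns the whole recovery loop here; the provider only has to
answer queries.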

>> The TechnologySpecificAttributes does not seem to have any usage,
>> except "future compatibility", but there are no examples or use cases
>> for them. - I suggest removal, and then adding specific fields if we
>> need them later.
>
> They are there specifically to let service providers add service
> specific attributes such as frame size, QoS, SLRG, etc. into a
> request.  The protocol would not need to change, but the underlying 
> implementation would to support the parameters.  We did not want to 
> revise the protocol every time a new service parameter was defined.

OK. Are the parameters then optional to understand, or must they all be
understood? The latter would make sense, but the semantics are still
unclear (and some examples in the spec. would be nice).

>> The reservationConfirmed message includes all the reservation details. 
>> Is this really necessary? Couldn't it just be a simple acknowledgement?
>
> The reason we included the reservation details in the message is that 
> the original reservation request is a "space" that can be more fully 
> qualified by the providerNSA when satisfying the request.  For example, 
> if I specify a number of mandatory and desired parameters in the 
> reservation request, the reservationConfirmed will hold the parameters 
> satisfied by the reservation.  It is more work but is needed.

I am not convinced. I think Jerry hit the nail on the head with the
argument that we are trying to create something more high-level. When
creating a connection a number of requirements are filled out. Either these
requirements can be fulfilled or the reservation cannot be completed. E.g.,
when reserving a hotel room, the room number is not returned to you (sure
you get it later for practical reasons), but you are interested in the
service of having a place to sleep, not the number on the door.

>> I would consider adding a state to indicate that the connection failed 
>> for some reason. The terminated state is a bit broad in what it 
>> describes.
>
> Yes, I had originally proposed that as well.  I would also like to 
> formalize the minimum length of time a reservation will remain in the NSA
> when in the terminated state so that it can be queried.

I agree, having that information available for some time is necessary for 
this to make sense.

>> Having two ways of querying seems like one too many. Remember that both 
>> have to be implemented. Is there a reason we cannot just have one?
>
> There is only a single query message, however, it can go from requester 
> to provider, or provider to requester.  This was needed for some of the 
> error handling cases.  Is this what you mean?

No, I was referring to the Details/Summary distinction, which seems
superfluous to me.

>> The term "NsiExceptionType" seems to have gotten into the spec. It
>> should be called serviceException.
>
> Service is a really overloaded term. Do you mean an NSI protocol 
> exception or a service reservation exception?

In the NSI document, the term "NsiExceptionType" is mentioned, which
sounds an awful lot like something from the WSDL. Elsewhere in the
document it is referred to as serviceException.

>> The messageId in the serviceException should either have a number of 
>> options or not be there.
>
> The agreement was that we would enumerate through the implementation but
> not force a set into the XSD so we have flexibility.

OK, I can see the sense in not having it in the WSDL, but it should be in
the spec. at least.

>> - A possibility could be to adopt the HTTP error codes.
>
> That is a possibility.

Or at least use them as a base. We might want some indicators specific to
connection provisioning.

>> The text and variables solution in serviceException seems like 
>> overkill. Why not just have the text there?
>
> Here is an example that shows the additional flexibility.  One generic 
> "Invalid or missing parameter" error message and the parameters causing 
> the issue in the variables.
>
>    <messageId>SVC0001</messageId>
>    <text>Invalid or missing parameter</text>
>    <variables>
>        <Attribute Name="replyTo" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:basic">
>            <AttributeValue xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>                >&lt;null&gt;</AttributeValue>
>        </Attribute>
>    </variables>

OK, but wouldn't the messageId just be the code for "Invalid or missing
parameter", making one of those two fields redundant?
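
A small sketch of that point, assuming a catalogue of message texts keyed
by messageId (the dictionary and function are hypothetical; only SVC0001
and its text come from the example above). The text can be looked up from
the code, so only the messageId and the variables need to travel.

```python
# Hypothetical catalogue; the spec would enumerate the real entries.
MESSAGES = {
    "SVC0001": "Invalid or missing parameter",
}


def describe(message_id, variables=None):
    """Build a readable error line from a messageId and its variables."""
    text = MESSAGES.get(message_id, "Unknown error")
    if variables:
        details = ", ".join(f"{name}={value}" for name, value in variables.items())
        return f"{message_id}: {text} ({details})"
    return f"{message_id}: {text}"


line = describe("SVC0001", {"replyTo": "<null>"})
```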

>> WSDL:
>>
>> xsd:dateTime allows value without a timezone, which is problematic. - I 
>> suggest that the protocol dictates that all protocol timestamps should 
>> be in zulu time (which is really the only sensible thing to send over a 
>> wire IMHO)
>
> I did look up the dateTime specification and am sorry I missed this.  I
> always use the Java XMLGregorianCalendarImpl class which puts the 
> timezone on by default.  We will put this on the list of items to 
> resolve.

Yes, it is certainly not ideal for something that is supposed to be sent
over the wire. I would suggest we do not allow time zones either, and just
deal with everything in UTC / zulu time over the wire.
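
A minimal sketch of what "UTC only on the wire" could mean in practice. The
function names and the exact wire format (whole seconds, trailing "Z") are
assumptions for the example, not something the spec currently says.

```python
from datetime import datetime, timedelta, timezone

# Assumed wire format: UTC, whole seconds, literal trailing "Z".
WIRE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_wire(dt):
    """Serialize an aware datetime as a UTC timestamp string."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # A naive timestamp is exactly the ambiguity we want to rule out.
        raise ValueError("refusing to serialize a timestamp without a time zone")
    return dt.astimezone(timezone.utc).strftime(WIRE_FORMAT)


def from_wire(text):
    """Parse a wire timestamp; anything not ending in "Z" is rejected."""
    return datetime.strptime(text, WIRE_FORMAT).replace(tzinfo=timezone.utc)


cdt = timezone(timedelta(hours=-5))
wire = to_wire(datetime(2011, 9, 19, 9, 25, 52, tzinfo=cdt))
```

Local time zones are then purely a presentation matter at each end.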

>> connectionId is enforced as a UUID, which is not in tune with the protocol
>> spec. which specifies that the connectionId only has to be unique 
>> within the requester NSA scope.
>
> Yes, we will change this.  Local uniqueness only causes implementation 
> issues with zero upside value. You then need to maintain tuples for 
> uniqueness.  NSI is complicated enough without this added complexity.

Sorry, change what to what? Change the WSDL to match the spec., or vice
versa?

>> I've also compiled a list of issues which have been confusing people. 
>> The purpose of this list is simply to have a spec which is easy(er) to
>> implement, which IMHO is a very important quality of a standard (and one
>> we are far from).
>>
>> - The URN prefixes
>
> We need to use namespaces to allow for flexibility when other namespaces 
> are needed to be used.  We must remember that a good protocol can be 
> used flexibly.

Maybe we can just agree on allowing direct SSH access to everyone's network
equipment and we won't even have to do the protocol :-). It will be very,
very flexible :-).

OK, more seriously. URNs are typically used to denote a "what" instead of a
"where" (URLs), and are typically used either to decouple location (URN ->
URL resolving), or to make it possible to mix different types of
resources. However we are usually very clear when denoting resources,
e.g.:

<stpId>urn:ogf:network:stp:Martinique:M1</stpId>

Do we plan on having anything other than an STP in the stpId element? I
hope not.

>> - Requester / provider role fields
>
> I was concerned with these originally as well.  If we remove the replyTo 
> and place it in topology definition as the csRequesterEndpoint then we 
> will at least need the requesterNSA attribute.
>
> Based on the current spec we should rename these fields if they do not hold an NSA URN.
>
>> - replyTo / addressing in general
>
> As above.  I think we need to maintain the flexibility to support both one and two endpoints.

Yeah, this pretty much falls into the "think about addressing" category.
Though I am not really sure we need the two endpoint flexibility.

>> - Reordering of messages from "logical order"
>
> Does logical order refer to protocol order or the order operations were
> issued?  In any distributed highly parallel system message ordering is
> hard to maintain when multiple thread processing is involved.  There are 
> queuing strategies to handle some of this but the best mechanism is the 
> requester serializing :-)

Yeah, sure. But some abstractions are easier to work with than others :-)

>> - Bad error messages from SOAP/WSDL stacks (and probably other things as well)
>
> Please provide some examples.

I sent a reserve request to the AutoBAHN implementation (I think), and
they got an "Error parsing BigInteger" or similar. It turned out that a
bandwidth parameter was missing.

>> - XML/WSDL namespaces
>
> You need to get your stack fixed :-)

Well, yes.

The problem is that once one moves outside the Java world, good SOAP/WSDL
stacks become very, very sparse (perhaps C# should be included). In fact
many languages do not have one. I think this is a big problem. SOAP and
WSDL are certainly better than a lot of the alternatives (especially
custom binary protocols), and the merits can be discussed endlessly (one
sure does get a lot almost for free when it is working, but the position is
the opposite when it doesn't). It does, however, set the entry bar for NSI
at an awkward position.

>> I have some comments/suggestions as well:
>>
>> The URN prefixes are just prefixes. They do not add any value, and have 
>> been a source of confusion. I suggest we remove them.
>
> Namespaces are needed for flexibility.  If we remove them then NSI can 
> only work with the naming structures we define.  I really want to avoid 
> this if possible.  You should not even be looking into them anyways. 
> String match only.  I proposed a label for display name in the topology 
> file so we can have both uniqueness and something for people to display 
> in GUIs.

See comment further up. I still don't see how they add value (now or 
later).

>> I'm not sure the requester/provider role fields are really necessary. 
>> It should be clear from the security context (I'll get back to that), 
>> who it is one is communicating with.
>
> This does need discussion.

An interesting remark: A lot of people confused these with networks
instead of NSAs. I think the fields could be replaced with the
networks. But this raises the question of whether they are even needed.

What is needed is a way to identify the entity calling you, but this
should really be a security thing. If someone/something manages to
contact you, but intended something else, chances are they won't get
the provider field correct either :-).

>> Reordering of messages from "logical order". I still think the protocol 
>> design is a bit clumsy, especially when combined with the lack of how 
>> to handle network errors (unavailable hosts, etc.) and short-lived 
>> clients. I've been thinking a bit about it, but haven't really come up
>> with anything substantial.
>
> Please clarify "logical ordering" and we can discuss.

I am referring to the situation where a reservationConfirmed is received
before the reservation ACK is received and similar situations.
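
One way a requester could cope with this, sketched under my own
assumptions (the class, the method names and the three state labels are
invented for the example and are not NSI states): treat a confirmation as
implying the ACK, so both arrival orders end up in the same place.

```python
class ReservationTracker:
    """Hypothetical requester-side bookkeeping for a single reservation."""

    def __init__(self):
        self.acked = False
        self.confirmed = False

    def on_ack(self):
        # An ACK arriving after the confirmation changes nothing.
        self.acked = True

    def on_confirmed(self):
        # A confirmation implies the provider received the request,
        # so the ACK is treated as seen even if it has not arrived yet.
        self.confirmed = True
        self.acked = True

    @property
    def state(self):
        if self.confirmed:
            return "reserved"
        if self.acked:
            return "reserving"
        return "requested"


in_order = ReservationTracker()
in_order.on_ack()
in_order.on_confirmed()

reordered = ReservationTracker()
reordered.on_confirmed()
reordered.on_ack()
```

It works, but every implementer has to discover and handle this case on
their own, which is the clumsiness I mean.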

>> Some people (me included, if not especially) have struggled quite a bit 
>> with their SOAP/WSDL stacks, and the lack of checks and bugs in there. 
>> However several people have also been puzzled with the error messages
>> from them, e.g., getting an integer parsing error when an element was
>> simply missing.
>
> Might be an issue with a mandatory field not being provided and the 
> error handling of your stack being coded by a bunch of monkeys :-)

Well, that would certainly explain a lot of things. The error message in 
question came from one of the common stacks in Java (the one the AutoBAHN 
people are using), not the one I was using.

I am not just bringing up situations which have troubled me (some of these
haven't at all); in general, these issues are things I've observed.

>> Somewhat similar, I had an issue where my SOAP stack used the wrong 
>> namespace on an element. The issue here was not so much the bug in the
>> SOAP stack (it happens), but that I could not figure out what was right 
>> and wrong by looking at the WSDLs.
>
> Takes an experienced eye for these things.  Sorry I couldn't help 
> earlier to identify the problem.

It is not really your fault, but more the combination of WSDL complexity
mixed with too few experts. However it could have been remedied very
easily if there were examples of the message payloads or similar available.

>> Security. Let me be very clear here. HTTP basic and some SAML 
>> attributes have nothing to do with security. They do not provide integrity,
>> confidentiality, or assurance. I am also puzzled by the choice of SAML, 
>> as SAML is intended for communication between identity providers and 
>> services, but there are no identity providers in NSI. Also, I don't 
>> think that anyone in this group actually understands SAML (I don't). My 
>> suggestion for security is to use TLS with certificates (from a 
>> recognized CA) in each end, and nothing more. It is not the most 
>> trivial thing in the world (but isn't really difficult either), but it 
>> is fairly well understood and has widespread support.
>
> There are two solutions required here.  The first is NSA-to-NSA security 
> and the second is end user "session" security.  Each has different
> requirements and solutions.  The proposed security solution we have 
> discussed and agreed upon for NSA-to-NSA security is:

> 1. TLS with mutual authentication for encryption and confidentiality (and transport authentication).

Yes please.

> 2. HTTP Basic authentication for authentication.  Yes this seems like 
> double the effort, but BASIC is supported in software stacks via JAAS. 
> TLS certificates are not typically supported for application level 
> security, however, there are ways to get access to it.

HTTP Basic will not really provide adequate security for anything. It does
not protect against tampering or replay and does not provide
confidentiality. Once a single request with the Authorization header has
been issued, it provides exactly as much "security" as if it wasn't there
at all.
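
To make that concrete: the Basic Authorization header is only the base64
encoding of "user:password", so anyone who sees one request can recover
the credentials and replay them. A short sketch (the helper names are mine;
the credentials are the well-known example pair from the HTTP Basic RFC):

```python
import base64


def basic_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def recover_credentials(header):
    """Recover user and password from a Basic header; no key is needed."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_header("Aladdin", "open sesame")
stolen = recover_credentials(header)
```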

> 3. SOAP digital signatures for message integrity.

Kill me now :-).

> The sessionSecurityAttr holds authentication and authorization
> information for the end user. There is a security proposal document that
> describes the use of this element, the types of roles supported, and
> how certificates could be passed for integration into existing security
> solutions for the end user.  We do not use these fields for NSA 
> authentication.

OK. But is more than the user identity needed, and is it needed at all?
The important question is whether we trust another NSA to make a
reservation or not. Sure, we can include some metadata, but could we call
it something without the word "security" :-).

> We need to push on to SuperComputing with a better coordinated effort. 
> I know many people are taking a deep breath and will spend the next week 
> catching up on work they ignored over the last couple of weeks.  I will 
> try to kick off the more organized effort next week.

Sounds good.

     Best regards, Henrik

  Henrik Thostrup Jensen <htj at ndgf.org>
  NORDUnet / Nordic Data Grid Facility.