[Nsi-wg] An alternative Modify() proposal - a "shadow" approach

Wed Jul 4 19:26:08 EDT 2012

Jerry,

I like your thinking.  Chin and I went down a similar path, trying to use existing operation primitives in conjunction with a new Modify primitive.  We ruled out the idea because we needed to add additional conditional logic to the existing primitives in order to handle the modify-specific behaviours.  In addition, it just didn't work, since there were situations where you couldn't properly provide the needed modify behaviours (specifically around error handling and backing out a modify failure in a subtree).  It began to feel unnatural.  The bastard child of an unholy wedlock. ;-)

I have provided detailed comments below.  I would like you to think hard about whether overloading the existing primitives to support modify, with all the complexity that adds, will really save us anything over a separate modify command set whose state machine is independent of the existing, unmodified state machine.

John.

On 2012-07-03, at 3:09 PM, Jerry Sobieski wrote:

> Hi everyone-
> 
> The connection modification capability for version 2.0 was initially presented as a simple enhancement to extend the scheduled end time, or perhaps to increase the bandwidth, of an existing reservation.  This was supposed to be a very limited functional tweak for v2.0.

Yes, I am still proposing this is the only capability we implement in 2.0.

> 
> But then we decided "hitless" was a requirement, and then we added "path preservation" as a requirement.  It was *assumed* that we needed a unique Modify() primitive to do this... probably because prior tools have them...  Suddenly, we are re-defining the entire state machine (yet again), and making it still more complex, in order to make this "simple" enhancement.

Sigh (throwing hands up into the air, then smashing head against table)... What I proposed is a new state machine modelling the Modify lifecycle, with no changes to the existing state machine.  Two-phase reserve is a separate issue from Modify.

> 
> This increasing complexity is actually counter to what we were trying to do in Oxford: to simplify the state machine.  And in general, counter to good protocol design.

Okay, I will not write an extensive dissertation on how this statement is inaccurate.  The original intent of the Oxford state machine was to fix the delayed provisionConfirm message.  Henrick, Chin, and Tomohiro did a great job trying to rationalize the existing state machine and simplify it where possible.  What we landed on was two separate state machines to more simply describe each NSA role.  Out of Oxford we actually ended up with more state machines and states than we went into Oxford with.  Why is this, you ask?  It is because we are designing a complex distributed reservation and provisioning system.  This is not a simple task given the behavioural constraints we have placed on the team.  People need to realize that sometimes correctness is complex and not as simple as first thought.  I think this NSI project is a perfect example of this.

> I think the existing state machine has been thoroughly vetted and is adequate for the protocol, and that we should consider functions like "Modify" as higher layer constructs that should be implemented using the existing atomic primitives we already have.   Things like protection circuits, and diversity attributes, and the like will all pose similar challenges - and we can't keep changing the state machine every time someone has a "simple" feature they can't live without...
> 

You are making an assumption that I believe is the key flaw in your argument - you are assuming that the existing vetted state machine will not change if you reuse the existing operation primitives.  I am not convinced it would stay unchanged.  It all comes down to whether we decide to model the "shadow connection" conditional states within the machine.

> Given the developing complexity, we should step back and re-evaluate  a) the urgency for Modify(),   b) the means/scope of implementing it,   and c) the timeline it will require to "do it properly".  

At the rate this working group makes decisions and closes on actions, we could still be debating this next year.  We have time to agree and prototype before closing on the NSI 2.0 specification.  If I may point out, the only things we have actually closed on are changes I made to the WSDL that fixed deficiencies in release 1.1.  So far we have no new features in 2.0 fully agreed and committed.  To be honest, I still do not understand the process we follow.

> I would like to also propose an alternative "shadow" approach to provide a modify capability in version 2.0:
> 
> In a shadow approach, we build a simple second "shadow" connection reservation, and then perform a Release()-Provision() sequence to cut over to the modified service instance when ready.  This shadow approach uses only existing protocol primitives and existing state machine.    (This is similar to John's talk about "bridge and roll"... but without a bridge:-)
> 
> Currently, a separate circuit approach like this would require separate STPs as endpoints for the modified connection reservation.  However, given virtual STPs (e.g. VLANs), a shadow connection would not *really* need to terminate at the same source or destination STP to be useful - i.e. the A and Z endpoints of a modified connection could be different VLANs without imposing any detectable performance hit on end-to-end data flow (!) - the sending system simply begins using a new tag when the shadow provisioning is completed.   (This requires the end systems' agents to know this will occur, but, strictly speaking, this is entirely feasible.)   The shadow path would likely even be along the same geographic route - i.e. the packets would transit all the same network infrastructure, just with different tags.  Given this situation, the need to "modify" an *existing* connection, particularly with Ethernet-based STPs, seems somewhat unnecessary if you can simply request another connection with the desired new attributes along the same path and start using it whenever you please... 

This breaks down completely with anything other than VLAN circuits.  If I have an existing EPL circuit that is encapsulating the entire contents of an Ethernet port end-to-end, I have no ability to use another Ethernet port as an STP in this operation.  I must use the existing STP, since this could be the only port dropped at my location.  Adding the requirement for an additional set of STPs just to modify the endTime of an existing reservation seems to add unnecessary complexity not only to the NSI implementation, but also for the end user consuming the service.

> Being pragmatic though, there are many applications that will not be able to change their termination point, thus the source/destination STPs should be simultaneously acceptable for both the shadow connection as well as the working connection.  Likewise, other resources (say bandwidth) may not be sufficient to reserve a completely separate upgraded Connection, and so the path finders ought to be able to "double-book" resources assigned to the working connection to be used by the shadow connection.  Since the working connection and the shadow connection should never both be active, this double booking will never cause a conflict.  This ability for shadows to double-book resources of their working counterpart provides the functionality we initially wanted: simply upgrading the existing path.   

Looks like we agree on the application requirement.  Woo hoo!
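To make the double-booking idea concrete, here is a minimal Python sketch of the admission check a path finder might run on a single link.  The `Link` class, its fields and the `conn-...` identifiers are hypothetical, not anything defined in NSI-CS.  It assumes, as proposed above, that a working connection and its shadow are never active at the same time, so a linked pair consumes only the larger of the two bookings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Link:
    """Hypothetical per-link booking table kept by a path finder."""
    capacity_mbps: int
    bookings: Dict[str, int] = field(default_factory=dict)   # connectionId -> Mbps
    shadow_of: Dict[str, str] = field(default_factory=dict)  # shadow id -> working id

    def used(self, exclude: Optional[str] = None) -> int:
        """Committed bandwidth on the link.

        A working connection and its shadow are assumed never to be active at
        the same time, so a linked pair consumes only the larger of the two.
        """
        total = 0
        for cid, mbps in self.bookings.items():
            if cid in self.shadow_of or cid == exclude:
                continue  # shadows are counted together with their working connection
            linked = [self.bookings[s] for s, w in self.shadow_of.items() if w == cid]
            total += max([mbps] + linked)
        return total

    def reserve(self, cid: str, mbps: int, shadow_of: Optional[str] = None) -> bool:
        """Admit a booking; a shadow may reuse its working connection's share."""
        free = self.capacity_mbps - self.used(exclude=shadow_of)
        if mbps > free:
            return False
        self.bookings[cid] = mbps
        if shadow_of is not None:
            self.shadow_of[cid] = shadow_of
        return True
```

On a 1000 Mbps link carrying a 600 Mbps working connection and a 300 Mbps neighbour, an unrelated 500 Mbps request is refused, while a 700 Mbps shadow of the working connection fits because it reuses the working connection's 600 Mbps.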

> 
> We can easily indicate when we wish to create a shadow Reservation within the existing protocol:
> We simply specify an existing ConnectionID in a Reservation Request.

How do I distinguish between an RA wanting to modify an existing reservation and a naughty NSA sending down a duplicate connectionId that we currently reject? I guess we will now always assume it is a modification...

> If the ReservationRequest specifies an existing Reservation rather than a new Reservation, then a [new] shadow Reservation/Connection is to be created and linked to the original "working" reservation.

So can I assume that the STPs are the same?  What else in the reservation needs to be maintained?  Is anything up for change?

> Thus, an otherwise normal Connection is identified as a "shadow" connection solely by the link to a working Connection.  

> When a reservation is confirmed, if it links to a working connection, the RA will immediately replace the working with the shadow and Terminate the working reservation.

When you say "working connection", do you mean provisioned and active in the network?  Just to argue the point, we will need to modify the existing state machine to allow reserve requests to arrive on an existing connection, which could be in any of the defined states.  Is the action the RA takes here on confirm the subsequent Release?
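Here is one reading of the rule as a minimal Python sketch of provider-side handling.  `ProviderAgent`, `reserve`, `on_reserve_confirmed` and the plain-string states are hypothetical names, not NSI-CS operations.  It shows the two things I am worried about: a second Reserve on a known connectionId is always treated as a modify, and what happens on confirm depends on whether the working connection is already Activated.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Connection:
    connection_id: str
    state: str            # e.g. "Reserved" or "Activated"
    end_time: int         # hypothetical: an epoch timestamp
    shadow: Optional["Connection"] = None


class ProviderAgent:
    """Hypothetical PA handling of the shadow rule: a Reserve naming an
    existing connectionId creates a shadow linked to the working connection."""

    def __init__(self) -> None:
        self.connections: Dict[str, Connection] = {}

    def reserve(self, connection_id: str, end_time: int) -> str:
        working = self.connections.get(connection_id)
        if working is None:
            self.connections[connection_id] = Connection(connection_id, "Reserved", end_time)
            return "new"
        # Today a duplicate connectionId is rejected; under the proposal it can
        # only be read as a modify, so a misbehaving NSA looks identical.
        working.shadow = Connection(connection_id, "Reserved", end_time)
        return "shadow"

    def on_reserve_confirmed(self, connection_id: str) -> None:
        working = self.connections[connection_id]
        shadow = working.shadow
        if shadow is None:
            return
        if working.state != "Activated":
            # Not in service: the shadow replaces the working reservation now,
            # and the old working reservation is dropped (i.e. Terminated).
            self.connections[connection_id] = shadow
        # If Activated, the shadow waits in Reserved until the working is Released.
```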

> In the one case where the working connection is Active, the shadow will remain in its Reserved state as if it had passed the start time and was awaiting a provision request. 
> When a Release occurs for the working connection, a check is made to see if a shadow is linked to it.   If so, the shadow will then replace the working, and the working connection is Terminated.

Are you saying that the Release operation will trigger a Release of the existing working path, and automatically provision the new modified working path, or are you saying I would need to do another Provision?  I think you mean Release and then Provision so that you do an Activated->Releasing->Reserved->Provisioning->Provisioned->Activated sequence of state transitions.  Anything else will require a complete change of the state machine.

The issue with requiring the Release and then Provision again is that your service will take a traffic hit.  We are not talking about a short blip either.  We are talking about a considerable period as the operation messages filter down and up the tree multiple times.  I would definitely dismiss this mechanism based solely on this deficiency.  I need something not so intrusive, especially if it is just an endTime extension.
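A minimal Python sketch of the Release-then-Provision path as a transition table.  The states are the ones in the Activated->Releasing->Reserved->Provisioning->Provisioned->Activated sequence; the event names (`release`, `releaseConfirmed` and so on) are hypothetical labels, not NSI-CS message names.  Walking the path shows four intermediate states in which the connection is not Activated, and, assuming each request/confirm pair filters down and up the NSA tree, that is the window in which traffic is lost.

```python
from typing import List

# Transitions for the Release-then-Provision path. The event names are
# hypothetical labels; the states are taken from the sequence discussed above.
TRANSITIONS = {
    ("Activated", "release"): "Releasing",
    ("Releasing", "releaseConfirmed"): "Reserved",
    ("Reserved", "provision"): "Provisioning",
    ("Provisioning", "provisionConfirmed"): "Provisioned",
    ("Provisioned", "activate"): "Activated",
}


def walk(start: str, events: List[str]) -> List[str]:
    """Apply events in order, failing on any transition not in the table."""
    state, path = start, [start]
    for event in events:
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            raise ValueError(f"{event!r} is not allowed in state {state!r}")
        state = nxt
        path.append(state)
    return path


def not_activated(path: List[str]) -> List[str]:
    """States on the path in which the connection is not Activated."""
    return [s for s in path if s != "Activated"]
```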

The one key piece missing from this strategy is how to back out a shadow connection.  Once again, the key point of the two-phase commit is to handle a failure to reserve the additional resources across the entire connection path.  How do you handle the case where part of your shadow reservation fails?  I can't send down a Terminate, since this will terminate the entire connection.  I can't overload the Terminate operation either, since part of the tree has failed the reservation modification and therefore has no record of it; those NSAs would terminate the original reservation on receiving the Terminate request.  Even if I force the RA to send down another Reserve to force the shadow connection back to the original pre-modify path, there is nothing saying the original resources are even available any longer, as they may have been consumed by a new reservation.
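A minimal Python sketch of the back-out problem, using a two-child tree in which one child refuses the shadow.  `ChildNSA`, `reserve_shadow`, `abort_modify` and `modify` are hypothetical names for illustration only.  If Terminate is overloaded to mean "drop the shadow if you have one", the child that refused never recorded the shadow and so tears down the original reservation; a back-out that only ever touches the shadow leaves both originals in place.

```python
from typing import Callable, Dict, List, Optional


class ChildNSA:
    """Hypothetical child NSA holding a working reservation and, if its part
    of the modify succeeded, a linked shadow."""

    def __init__(self, name: str, accepts_shadow: bool) -> None:
        self.name = name
        self.accepts_shadow = accepts_shadow
        self.working: Optional[str] = "working"
        self.shadow: Optional[str] = None

    def reserve_shadow(self) -> bool:
        if self.accepts_shadow:
            self.shadow = "shadow"
        return self.accepts_shadow

    def terminate(self) -> None:
        # Overloaded Terminate: drop the shadow if there is one; otherwise the
        # only thing this NSA knows about the connectionId is the working copy.
        if self.shadow is not None:
            self.shadow = None
        else:
            self.working = None

    def abort_modify(self) -> None:
        # A dedicated back-out that only ever touches the shadow.
        self.shadow = None


def modify(children: List[ChildNSA],
           back_out: Callable[[ChildNSA], None]) -> Dict[str, Optional[str]]:
    """Reserve a shadow on every child; if any child refuses, back out everywhere."""
    results = [child.reserve_shadow() for child in children]
    if not all(results):
        for child in children:
            back_out(child)
    return {child.name: child.working for child in children}
```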

Also, when I query the connectionId during this shadow reservation do I see the existing in service reservation, or do I see the modified values?

> 
> This process does not change the NSI-CS protocol or the state machine.  It incurs [minor] code additions to the existing primitives, but does not change the event-driven state transitions.  Pathfinders should also be enhanced to double-book shadow resources.

I think what you mean to say is that you do not require any new operation primitives.  You have changed the behaviour of the NSI-CS protocol by overloading the existing operation primitives.  I am still not convinced the existing state machine does not need to change, and you have some other issues to address as well.  What I did was call a spade a spade and define a new operation set to do effectively the same thing as what you are doing.  I guess the big question is whether adding complexity to the existing operations is worth it just to avoid introducing a new set of operations better named for the activity at hand.

> 
> This "shadow" approach has this major advantage:  Since it is essentially just building a second reservation, it does not require changing the fundamental NSI-CS protocol or the state machine.   All the "modification" processing is implemented using existing primitives and state transitions.  The cost to the user is minimal: a single *potential* brief hit as the A and Z endpoints are switched to the [new/modified] connection.  And since the user initiated the modify() in the first place, and will need to adjust the behaviour of the application to take advantage of the new characteristics, it does not seem unreasonable to expect the user to be able to deal with a hiccup - if it occurs.

I disagree on the hiccup.  Why take the hit when there is no need to?

> 
> 
> Finally, as a general recommendation:  Modifying the existing primitives and the associated state machine should be a last resort.  Any new feature should have a very strong case for modifying the NSI-CS state machine, and alternatives that do not do so should be strongly encouraged.   We should only modify the NSI core protocols in order to simplify them, delivering additional features through higher level service constructs wherever possible.
> 
> Thoughts?
> Jerry
> 
> 
> 
> On 7/2/12 11:06 PM, John MacAuley wrote:
>> 
>> Peoples,
>> 
>> Here is the new and improved NSI CS state machine fresh off the presses and ready for your viewing pleasure.  Please study it and prepare questions for the Wednesday call.  We would like to close on this action ASAP.
>> 
>> Thank you,
>> John.