[Nsi-wg] New state machine with two phase reserve and modify

Wed Jul 4 17:50:48 EDT 2012

On 2012-07-03, at 5:23 PM, Jerry Sobieski wrote:

> Hi John-
> 
> Regarding this state machine-
> We really need to consider how this Modify can be handled more like ballet than hockey.  :-) 
> 
> 
> --- First, what are we trying to solve with 2PC?   The multi-domain pass/fail scenario where some succeed and some fail causing the successful modifications to be rolled back is a resource management issue - not a protocol state issue.  If the successful domains will still have to be rolled back - we don't escape this fact.   It's just that the RA is not responsible for doing it.   You are placing the task on the PA.  And forcing the RA to now issue two messages to make sure he wants what he just asked for.
> 
> A better approach would be think of this differently - think of the new modified connection as a separate new connection ... what is it about the new connection that you want?  The same path as the old connection?  Ok. We do not want to release the old resources into the wild...may lose them.  OK.  We want to minimize the impact to the data flow of the resource modification.  OK.  

So far this sounds the same as the modify command, except for the "separate new connection" part.
> 
> So make a constraint for the path finder: "You must use resources that are assigned to ( "available" | "Connection X") "  The protocol runs normally... The reservation is made.  Resources are already multi-booked to different connections for scheduling purposes, so double booking resources to two connections simultaneously should not be technically difficult.  Just need to make sure we can switch them over appropriately - but fortunately this isn't a state machine issue, it's a configuration issue and resource database issue.

Okay, still exactly the same as the modify.  The path finder is using the exact same set of resources and just seeing if it can extend the time, add additional bandwidth, etc.
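
As a rough illustration of the path-finder constraint being discussed here (resources must be either available or already assigned to "Connection X"), here is a minimal Python sketch.  The `Resource` type, the `holder` field and the `usable_for` helper are hypothetical names made up for this example; they are not part of NSI or of any slide deck, and a real scheduler would track bookings per time window rather than a single holder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    # Hypothetical model of a network resource, for illustration only.
    name: str
    capacity_mbps: int
    holder: Optional[str] = None  # connection id holding it, None if available


def usable_for(resources: List[Resource], connection_id: str) -> List[Resource]:
    """Keep resources that are available or already assigned to connection_id."""
    return [r for r in resources if r.holder is None or r.holder == connection_id]


links = [
    Resource("A-B", 1000, holder="conn-X"),  # already part of Connection X
    Resource("B-C", 1000, holder="conn-Y"),  # held by another connection
    Resource("B-D", 1000),                   # available
]

candidates = usable_for(links, "conn-X")
print([r.name for r in candidates])
```

Whether this filter is driven by a modify operation or by a second "shadow" reservation, the path finder sees the same candidate set, which is the point made above.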
> 
> Please see my alternative "shadow" proposal that does not require wholesale replacement of the state machine.

I still do not understand this comment.  The existing state machine stays as is, with the modify state machine being an entirely different overlay machine.  It has no impact on the existing state machine.

> 
> --- This proposal seems to require 2PC for normal Reservations as well as Modifications.   Is this right?

No, the 2PC on reservations was an action I took out of Delft and it has nothing to do with the Modify except someone said "If we are doing a 2PC on Modify why don't we fix the Reserve at the same time?"

> This is a major (and unnecessary IMO) departure from the existing State Machine we have worked on so hard for over two years.   We have had so many Very Long discussions on this in early 2010 in which we decided the simple single Reserve command with an active Terminate was preferable to the 2PC (the 2PC requiring a top down confirm...doubling the number of messages required, and introducing race conditions with timeouts that were not easily resolved.)  Why, after so long and demonstrated success in v1.0, do we suddenly need to change horses to a two phase commit for the entire model?    I fear this will cause extensive re-writes ... for what feature again?  To extend a schedule?

As I said - I have no issue leaving Reserve as is, but we did agree back in release 1.0 to revisit it in release 2.0 at Tomohiro's request.  

> 
> 2PC is an optimization strategy where you assume a fail is likely to occur down tree, and the RA does not wish to be responsible for cleaning up unneeded reservations.   If you assume reservations will mostly be successful, then a 2PC imposes twice the messaging, and imposes this coding complexity on all applications so that some very few can use this complex approach to extend their schedule?   This is way too complex an approach.

Given our current topology and pathfinding model I think it is highly likely that the majority of reservations will fail on first attempt.  This will happen even more often if we have successful take-up of the protocol and the network (a scarce resource) becomes more congested.

> 
> --- The 10 minute timeout for backing out a reservation implies that those resources are blocked until they are freed - which poses all sorts of security and resource management problems.  If an agent is sophisticated enough to manage a Modify function, it should be sophisticated enough to be responsible for retracting unused reservations.  Indeed, the 2PC should charge based upon request, rather than confirm - since a user could arbitrarily request many big connections and just never confirm them.

10 minutes, 5 minutes, the time is not important.  What you are suggesting is that we do not need a timer to protect the network from dangling resources associated with incomplete reservations or modification operations.  I am okay with this just so long as there is a way for the RA to clean up afterwards, and any resources it strands are counted against the user's network cap.

> 
> --- Other protocols recognize long delays in provisioning.   Most of these have a "Continuing" message that represents a heartbeat of sorts...  If a protocol agent cannot complete its processing in a set time - but is still trying, it sends a "Continuing" message to the requester to indicate it is still working.   As long as CONT messages are being passed between nodes downstream, the upstream nodes will continue issuing CONT messages.   If someone along the line gets fed up, they issue Terminates appropriately and everything is torn down.  NSI could do this as well.  

I have no issue with this as an alternative design, other than it generates unneeded noise within the tree to just hold onto resources temporarily.  I would need to see a detailed message and state machine to understand the proposal.
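
To make the timing rule in the "Continuing" idea concrete, here is a small Python sketch.  It is not a message definition or a state machine; the function name `await_confirmation`, the `patience` parameter and the message strings are assumptions made up for this example.  It only shows that each CONT pushes the requester's deadline out, and that silence past the deadline leads to a Terminate.

```python
from typing import Iterable, Tuple


def await_confirmation(events: Iterable[Tuple[int, str]], patience: int) -> str:
    """Decide the requester's outcome from a time-ordered message trace.

    events: (timestamp, message) pairs, message is "CONT", "CONFIRMED" or "FAILED".
    patience: how long the requester waits without hearing anything.
    Returns "CONFIRMED", "FAILED", or "TERMINATE" if the requester gave up.
    """
    deadline = patience
    for timestamp, message in events:
        if timestamp > deadline:
            # Nothing arrived in time: the requester tears everything down.
            return "TERMINATE"
        if message == "CONT":
            # Downstream is still working: extend the deadline.
            deadline = timestamp + patience
        else:
            # A final answer arrived before the deadline.
            return message
    # The trace ended with no final answer.
    return "TERMINATE"


print(await_confirmation([(5, "CONT"), (12, "CONT"), (18, "CONFIRMED")], patience=10))
print(await_confirmation([(5, "CONT"), (20, "CONFIRMED")], patience=10))
```

In the first trace the CONT messages keep the request alive until the confirmation arrives; in the second the gap after the first CONT exceeds the requester's patience, so it terminates.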

> 
> --- In slide 6 (?) you describe a bunch of NRM functional requirements.   I think I understand what you mean to do, but we should not phrase these as NRM functions.  NSI has nothing to do with NRM functions.  It's inconsistent with the spec and can be very misleading and confusing.   ->  As far as NSI is concerned, all events are NSI protocol events.  For example - given a NSA "John" presiding over a local domain "Ottawa", and say peering with two domains, Guy/Cambridge and Inder/Berkeley, all interactions between John and Guy  or John and Inder are handled by NSI CS protocol.   The local Ottawa domain - which John considers his "local" domain - is managed by John's NRM consisting of a dozen minions with hockey sticks.   From an NSI standpoint, that local Ottawa domain could be placed under control of another [more sophisticated] NSA Jerry, and John would then interact with Jerry/Ottawa using the NSI protocol instead of body checks. (:-)   As far as NSI is concerned, the interaction John had with Ottawa using his traditional ways is functionally and equivalently replaced by NSA Jerry running NSI.  Thus, the *only* events that NSI needs to stipulate are those NSI events.
> :-)   (The NSA implementation is responsible for translating any local events into their appropriate NSI protocol events, but the spec does not care anything about this translation.)

These are the standard NRM operations and events associated with a reservation, which have existed since version 1 of the state machine.  I just added the equivalent ones for the modify operation.  Are you saying you no longer believe we need the uPA state machine?  Did you just notice these now?

> 
> --- In general, it seems to me this whole proposal is trying to dance around the issue of adding or removing resources to a connection in service using our existing inflexible path finding and constraints specifications.   All this means is the Path Finder should be able to compute a new path using the existing allocated resources - albeit allocated to a specific connection.  It seems that if we were able to indicate this simple change to path finders, then all the funky modify state becomes unnecessary.  We just reserve a connection that can do double booking of resources, and provision it when it is confirmed.   If the new connection overlays the existing connection, all the better.  Right?   And we can use existing primitives.

Okay, just to be 100% clear - I am not "dancing around the issue".  I have taken the time (on numerous occasions now) to propose a viable solution to the Modify requirements.  I used a two-phase commit model to perform controlled securing of network resources against an existing reservation, then the committing and activation of these new resources within the network.  I presented the new set of operations and a funky new state machine.  At its heart it even does with the path finder what you are looking to do, however, it does not overload existing NSI operations, but instead defines a clear set of new operations in support of the reservation modification.
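
A minimal Python sketch of the two-phase idea described above: first secure the new resources against the existing reservation, then commit them, with the committed reservation untouched until the commit.  The names `modify_reserve`, `modify_commit` and `modify_abort` are hypothetical, chosen for this illustration; they are not the operation names from the slides, and the sketch leaves out the messaging, timers and the overlay state machine itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reservation:
    # Committed parameters of the existing reservation.
    bandwidth_mbps: int
    end_time: int
    # Held-but-uncommitted changes (phase one); None when no modify is in progress.
    pending: Optional[Dict[str, int]] = None

    def modify_reserve(self, **changes: int) -> None:
        """Phase one: hold the requested changes without touching the committed values."""
        if self.pending is not None:
            raise RuntimeError("a modification is already in progress")
        unknown = set(changes) - {"bandwidth_mbps", "end_time"}
        if unknown:
            raise ValueError(f"unsupported parameters: {sorted(unknown)}")
        self.pending = dict(changes)

    def modify_commit(self) -> None:
        """Phase two: apply the held changes to the reservation."""
        if self.pending is None:
            raise RuntimeError("nothing to commit")
        for key, value in self.pending.items():
            setattr(self, key, value)
        self.pending = None

    def modify_abort(self) -> None:
        """Drop the held changes; the original reservation is unaffected."""
        self.pending = None


r = Reservation(bandwidth_mbps=1000, end_time=100)
r.modify_reserve(end_time=200)   # resources held, nothing changed yet
print(r.end_time)                # still the original end time
r.modify_commit()
print(r.end_time)                # the extended end time
```

Because the pending changes live beside the committed values rather than replacing them, an abort (or a failure in some other domain) leaves the original reservation exactly as it was.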

I will now look at your shadow proposal.

> 
> Best regards
> Jerry
> On 7/2/12 11:06 PM, John MacAuley wrote:
>> 
>> Peoples,
>> 
>> Here is the new and improved NSI CS state machine fresh off the presses and ready for your viewing pleasure.  Please study it and prepare questions for the Wednesday call.  We would like to close on this action ASAP.
>> 
>> Thank you,
>> John.
>> 
>> 
>> _______________________________________________
>> nsi-wg mailing list
>> nsi-wg at ogf.org
>> https://www.ogf.org/mailman/listinfo/nsi-wg
