[Nsi-wg] time issue

Jerry Sobieski jerry at nordu.net
Wed Sep 29 08:24:22 CDT 2010


Hi Jeff - glad to see you chime in!   See my response in line...

Jeff W. Boote wrote:
>
> On Sep 28, 2010, at 3:40 PM, Jerry Sobieski wrote:
>
>> Hi Radek - see comment below...
>>
>> Radek Krzywania wrote:
>>> Hi Jerry,
>>> IMHO setup and tear down times should be considered global (all NSA 
>>> along a path). User is not interested in partial results, as he/she 
>>> is not even aware/interested in which NSAs/domains are involved. 
>>> User doesn’t care (if everything works fine ;) ).
>> But we cannot expect all NSAs' clocks to be exactly synchronized.
>> Clocks are critical to book-ahead scheduling, and independent 
>> quasi-synchronized clocks (however slightly skewed) will cause 
>> problems.  Some of those problems are evident in this discussion.
>
> Exact synchronization is not required. The protocol can (and probably 
> should) define a reasonable synchronization requirement. i.e. NSAs 
> MUST be synchronized within 1 second. (Or even 10 seconds) That should 
> be a relatively trivial requirement, and bounds this problem.
I think the gotcha here is that even if we define a maximum skew 
between two interacting NSAs, that skew can be additive from one NSA to 
the next down/across the service tree.  And if the service tree contains 
a number of NSAs, the skew may become large.  Regardless of whether the 
skew is large or small, it must still be analyzed carefully.  The 
protocol can be defined to handle these effects - we just need to do 
a thorough analysis of the timing permutations.  This is careful work, 
but not particularly difficult - certainly not a roadblock.
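To make the additive effect concrete, here is a minimal sketch (the 1-second pairwise bound is just Jeff's example number, not anything the protocol defines):

```python
# Worst-case clock skew is additive along a chain of NSAs: with a
# pairwise bound of `pair_skew` seconds between adjacent NSAs, a service
# tree `depth` NSAs deep can accumulate up to depth * pair_skew seconds
# of disagreement between the first and last clocks.

def worst_case_skew(depth: int, pair_skew: float = 1.0) -> float:
    """Upper bound on end-to-end clock disagreement across `depth` hops."""
    return depth * pair_skew

# A modest 5-NSA path already permits 5 seconds of end-to-end skew:
print(worst_case_skew(5))  # 5.0
```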

Further, if we require *any* type of clock synchronization for the 
protocol to work, we then need to define a mechanism within the protocol 
to either synchronize clocks or at least detect broken clocks.  And 
this check must be performed periodically during the NSA-NSA session to 
ensure the clocks don't drift out of conformance.  IMO, any kind of 
required clock synchronization creates substantial complexity.  I think 
it safe to assume that NSA clocks will be "approximately" the same, but 
we still need to handle even slight skew effects rigorously in the protocol.
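As a sketch of what periodic in-session drift detection might look like (the message timestamp field and the 10-second tolerance are assumptions for illustration, not anything from the NSI definitions):

```python
import time

MAX_SKEW = 10.0  # seconds; an assumed per-session tolerance, not an NSI value

def peer_clock_ok(peer_timestamp, now=None, max_skew=MAX_SKEW):
    """Rough sanity check on a timestamp carried in a peer NSA's message.

    One-way message latency is deliberately ignored, so this can only
    catch clocks that are badly broken - it cannot measure skew precisely.
    It would be run periodically during an NSA-NSA session to detect drift.
    """
    if now is None:
        now = time.time()
    return abs(now - peer_timestamp) <= max_skew
```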

*Question for discussion:*  What happens if the time-of-day clocks are 
way off?  Effectively, the reservations may never be successful, or the 
provisioning may never succeed.  But the protocol should still work 
correctly even if someone's calendar is messed up.  Remember: the day 
clock may be messed up in one NSA, but the protocol agent must function 
for many possible service trees.  We can discuss this: I don't think 
we want to make it a function of the Connection Service to ensure that 
reservation clocks are right.  Or maybe we should?  Maybe the 
"scheduling" function - which uses a time-of-day clock that *should* be 
close to correct - does need some coordination...what do folks think?
IMO, the timers and protocol functions for the connection life cycle 
state machine should be able to function with independent clocks that 
are approximately correct - whether that error is a few milliseconds or 
a few days.
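One lightweight guard, sketched below under assumed names and limits (nothing like it exists in the protocol today), is for each NSA to reject reservations its own calendar considers nonsensical - so a badly broken day clock surfaces at reserve time rather than silently producing a connection that never provisions:

```python
def reservation_plausible(start, end, now,
                          max_past=300.0,
                          max_future=365 * 24 * 3600.0):
    """Reject reservations whose schedule is nonsensical by the local clock.

    max_past:   how far in the past an end time may lie (absorbs modest skew)
    max_future: how far ahead bookings are accepted at all

    Both limits are illustrative; the point is that a failed check can
    name the timestamps involved so a human can spot the broken calendar.
    """
    if end <= start:
        return False          # empty or inverted interval
    if end < now - max_past:
        return False          # reservation lies entirely in the past
    if start > now + max_future:
        return False          # absurdly far in the future
    return True
```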

>
>>
>> Open issue for discussion:  How do we address the issue of clock skew 
>> across a scheduled network? 
>>>  
>>> For tear down, it does not matter where you start removing the 
>>> configuration (end point, or any point along the path). Once you 
>>> remove a single configuration point – the service is not available any 
>>> more. That is the time when the available time ends.
>> Actually, I would assert that the "available" time ends when the End 
>> Time elapses.  The "End Time" is by definition when the connection is 
>> no longer available, and the user should therefore not assume the 
>> resources are usable past that time.  Maybe the path resources will 
>> stay in place, but maybe not.  During reconfiguration, the state of 
>> cross connects along the path is indeterminate, and if/when they are 
>> reconfigured, user data can be [mis]routed to 3rd parties, and 
>> 3rd party data may be [mis]routed to the egress STP.
>>
>> The real issue in my mind is "when is the actual End Time?"  Given 
>> that we cannot guarantee exactly when each NSA may reach their 
>> respective End Time, the End Time should be (IMHO :-) an Estimated End 
>> Time "plus or minus", and the user should consider the ramifications 
>> of this.
>>
>> We do not know which NSA will reach End Time first and begin to tear 
>> down the connection.  Nor do we know the delta between this first 
>> NSA's clock and the user's clock.   
>
> I don't think the NSA should be attempting to understand the delta of 
> the user's clock. It can simply treat requests relative to 'true time'. 
> And we should expect End Time to be handled similarly to Start Time. In 
> other words, the user's requested time indicates when they expect to 
> 'use' the circuit. If tear-down takes time, the resource time should 
> add a delta to that. Tear-down should not start until 'after' the 
> user's requested end time.
But who has the "True Time"?  This is the fundamental problem.  Every 
NSA thinks their time is the One True Time (:-).  And they all vary a 
little.  How does the protocol react when it discovers that some 
message or event has not occurred according to what it believes to be the 
proper time?  Clocks are either *exactly* synchronized, or they are 
not.  Since the latter is the real-world case, we need to make sure 
the protocol is designed to handle that.

IMO, we should treat time as relative.  I.e., each NSA maintains its own 
Time, but it must allow for others whose Time may be skewed.  So we 
need to consider what it actually /means/ when an event occurs in each 
state and design the protocol to react accordingly.
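One way to encode "allow for others whose Time may be skewed" is to treat every scheduled instant as a window rather than a point, widened by the worst-case skew the NSA will tolerate (a sketch; all names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class SkewWindow:
    """A scheduled instant widened by the worst-case skew `slack` (seconds).

    An event nominally due at `nominal` by some peer's clock is treated
    as on time if observed anywhere in [nominal - slack, nominal + slack]
    on the local clock.
    """
    nominal: float
    slack: float

    def on_time(self, local_time: float) -> bool:
        return self.nominal - self.slack <= local_time <= self.nominal + self.slack

    def definitely_late(self, local_time: float) -> bool:
        """True only once even the most skewed peer must agree it is late."""
        return local_time > self.nominal + self.slack
```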
>>
>> I think the ideal situation is that the user sees the Estimated End 
>> Time approaching (via their own clock) and stops sending some user-
>> defined time prior to End Time.  The user lets the connection drain - 
>> still prior to End Time.  Once the user traffic is drained, the 
>> user RA issues a Release Request.  We can, I think, assume that if 
>> the user issues the manual Release Request before the scheduled 
>> availability has ended, then the user has verified that no important 
>> data remains in the pipe.  But due to clock skew and due to the 
>> estimated End Time, the user's estimate of how much time is remaining 
>> may be substantially over-estimated.  This is why I think maybe a 
>> 2-minute warning might be useful - the user can request a warning of 
>> "n" seconds, and that warning will bubble up from the NSA whose clock 
>> is most advanced.  The user can then throttle down their traffic 
>> accordingly and then issue a Release Request.  While this might be 
>> nice, it is not fundamentally necessary for the protocol.  It helps 
>> the user, but the protocol must still be able to deterministically 
>> handle a user that ignores the warning and drives off the cliff.
>>
>> Fundamentally, we want to make sure the user isn't surprised by an 
>> earlier-than-expected release due to clock skew.  And we won't know 
>> until close to the End Time whose clock is going to trigger the 
>> Release first.  The warning reveals that NSA and gives the 
>> user fair warning.
>
> This is reasonable and becomes manageable by the user if the protocol 
> defines the maximum clock skew allowed.
Ah...but as noted above - even a maximum clock skew is additive.  And 
the skew is measured against...what?  And when is it measured?  How often?
>>
>> Whatever the method for initiating the release, the network should 
>> ensure that any user data accepted at the ingress prior to the End 
>> Time is not misrouted - even after the End Time.  The network's 
>> only options are to try to deliver in-flight data properly or drop 
>> it.  Since the End Time has been reached, the network can no longer 
>> assume that any segments are still usable, so delivering it is not 
>> really an option either.  The network must drop any stranded traffic. 
>> Thus, we need some means of blocking new ingress data, and 
>> ensuring bytes in flight get dropped asap.
>>
>> One might take a different view if we hold the connection in place 
>> for some safety/guard time past the local End Time.  This would do 
>> several things:  1) it would make sure the End Time has elapsed for 
>> all NSAs, especially the user RA, thus allowing full use during the 
>> available timeframe, and 2) wait a few milliseconds longer (latency 
>> time) so that any data in flight is delivered.  At this point (after 
>> all NSAs have reached the End Time plus a latency factor) any 
>> remaining data in flight was definitely sent after the reservation.  
>> Bad user, bad user.  In this case, any data in flight is no longer 
>> the network's concern.  Then we can reconfigure without regard to 
>> securing the user information.
>>
>> Finally, we might consider how to ensure that the connection is not 
>> torn down until *all* NSAs have reached the End Time.  This could be 
>> indicated by flooding an "End Time Alert" notification or some similar 
>> message along the tree.  When that message is acknowledged by all 
>> NSAs in the connection, then a Release can begin.  Of course, here 
>> again, if an acknowledgement is not received in a finite time, the 
>> connection is torn down unilaterally.
>>
>> I do, however, think we need to address End Time processing in V1.0.  
>> This is important - we need to have a clearly defined lifecycle and 
>> primitives that do not promise something the protocol cannot 
>> deliver.  From this discussion, we cannot clearly state when the 
>> availability of the connection ends.
>>
>> These are some very interesting and challenging nuances.  I hope 
>> these were useful musings...
>> br
>> Jerry
>>> We can discuss whether it should be synchronized or signaled, but I 
>>> would even leave it for v2 (or v1.1, or whatever we decide). Once 
>>> ALL segments of the connection have their configuration removed, the 
>>> resource time is ended. I agree that resource time is difficult to 
>>> forecast, yet we need to fit that into a calendar full of other 
>>> reservations and synchronize them.  Thus we need to estimate, guess, 
>>> or use magic to get those values as realistic as possible. 
>>> Overlapping is forbidden, and leaving gaps of unused resources would 
>>> be a waste of resources and money in the end.
>>>  
>>> The “two minute warning” does not speak to me. I don’t see a reason to 
>>> warn a domain or user that the connection will be closed soon, since 
>>> the user knows what was requested and the domain is tracking that with 
>>> a calendar. We can discuss some internal notifiers, but that’s 
>>> implementation.
>>>  
>>> Best regards
>>> Radek
>>>  
>>> ________________________________________________________________________
>>> Radoslaw Krzywania                 Network Research and Development
>>> radek.krzywania at man.poznan.pl    Poznan Supercomputing and
>>> +48 61 850 25 26                   Networking Center
>>>                                    http://www.man.poznan.pl
>>> ________________________________________________________________________
>>>  
>> _______________________________________________
>> nsi-wg mailing list
>> nsi-wg at ogf.org
>> http://www.ogf.org/mailman/listinfo/nsi-wg
>