[Nsi-wg] time issue

Artur Barczyk Artur.Barczyk at cern.ch
Thu Sep 30 16:11:47 CDT 2010


Hi Jerry,

oh, wasn't my intention to raise the heat... :-)

I understand the problem, and I agree with you that making the protocol
function correctly is the key issue here.
Here's how I see the main points (most of this appeared in one post
or another; I'm not taking credit for it):

- the provisioning time has to be *subtracted* from the requested start time
- each NSA knows how much provisioning time it needs
   - if other NSAs need to be aware of it, this needs to be communicated.
     This would apply to the "user initiated" provisioning, if that's going
     to stay
   - in this case, the longest provisioning time among all involved domains
     has to be subtracted (by all NSAs) to obtain the provisioning start time
- each NSA is synchronised to a good time source (requirement). A few more
   seconds are subtracted to get the definitive provisioning start time - as
   in Aaron's very good mail. (A short sketch of this calculation follows
   after this list.)
- the NSA which received the original request receives the status from each
   NSA in the path, and once all are up, it notifies the user agent
- if the entire path is not provisioned within a timeout value (a minute
   could be acceptable), an error condition is declared, the user is
   notified, and provisioning is cancelled, including teardown of already
   provisioned segments.
   Or, if the status changes are always propagated to the user agent, we can
   get rid of the timeout altogether, since the user will know when the
   circuit comes up. If that doesn't happen within the user app's timeout,
   it can abort.
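
To make this concrete, here is a minimal sketch - my own illustration, not
something specified in the thread, with assumed names and values - of how an
NSA could compute when to begin provisioning (subtracting the longest
provisioning time among the involved domains plus a few seconds of clock-skew
margin from the requested start time), and how the path-level timeout could
be checked:

    from datetime import datetime, timedelta, timezone

    # Illustrative values only - each NSA would use its own configuration.
    CLOCK_SKEW_MARGIN = timedelta(seconds=5)       # "a few seconds"
    PATH_PROVISION_TIMEOUT = timedelta(minutes=1)  # timeout for the whole path

    def provisioning_start(requested_start: datetime,
                           domain_provisioning_times: list) -> datetime:
        """When this NSA should begin provisioning so the circuit is up on time."""
        longest = max(domain_provisioning_times)
        return requested_start - longest - CLOCK_SKEW_MARGIN

    def path_timed_out(requested_start: datetime, now: datetime) -> bool:
        """True if the end-to-end path is still not up after the agreed timeout."""
        return now > requested_start + PATH_PROVISION_TIMEOUT

    # Example: start requested for 14:00 UTC; domains need 30 s, 2 min and 10 min.
    start = datetime(2010, 9, 30, 14, 0, tzinfo=timezone.utc)
    times = [timedelta(seconds=30), timedelta(minutes=2), timedelta(minutes=10)]
    print(provisioning_start(start, times))   # 2010-09-30 13:49:55+00:00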

There's no explicit synchronisation between the NSAs. So one remaining key
issue will be how to catch a de-synchronisation situation, and how to deal
with it. Using timestamps in the messages between the NSAs would be an easy
option for v1: when an NSA gets any message from another NSA, it checks the
message timestamp against its own clock, and raises an alarm if it detects
a time skew of more than N seconds (a short sketch follows below).
For v2, we can discuss whether it makes sense to add explicit sync messages.
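
As an illustration of that v1 option, here is a minimal sketch - again my
own, with an assumed threshold N and an assumed way of raising the alarm - of
the timestamp check an NSA could apply to every incoming inter-NSA message:

    import logging
    from datetime import datetime, timedelta, timezone

    SKEW_THRESHOLD_SECONDS = 10  # "N seconds" - an assumed, configurable value

    def check_message_skew(message_timestamp: datetime) -> None:
        """Compare a peer NSA's message timestamp to the local clock and raise
        an alarm (here simply a log warning) if the skew exceeds the threshold.
        In practice some allowance for message transit delay is also needed."""
        skew = abs((datetime.now(timezone.utc) - message_timestamp).total_seconds())
        if skew > SKEW_THRESHOLD_SECONDS:
            logging.warning("Clock skew of %.1f s detected with peer NSA "
                            "(threshold %d s)", skew, SKEW_THRESHOLD_SECONDS)

    # Example: a message stamped 30 seconds in the past triggers the alarm.
    check_message_skew(datetime.now(timezone.utc) - timedelta(seconds=30))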

Makes sense?

Cheers,
Artur



On 09/30/2010 06:53 PM, Jerry Sobieski wrote:
> Hi Artur-  I accept the challenge!
>
> First, let me calm the nerves...  The question of setup time - 
> particularly the issue of taking 10 minutes or more - has mostly to do 
> with provisioning all-optical systems, where amplification and 
> attenuation across a mesh take a significant time.   In most cases, 
> the provisioning will be a more conventional few seconds to a minute 
> (or so).     And for smaller domains with more conventional switching 
> gear, maybe a few seconds at most.
>
> So we should all try to keep perspective here: much of this 
> discussion has to do with ensuring the protocol functions correctly, 
> consistently, and reliably as service infrastructure.   And much of 
> this is driven by making sure it works even in the corner cases 
> where it might take 15 minutes to provision, or two NSAs might have 
> clocks that differ by 10 seconds, etc.
>
> But the real-world fact is that nothing is perfect, and a globally 
> distributed complex system such as our networks is never going to be 
> perfectly synchronized - practically, if not theoretically - or offer 
> exact predictability.   We are trying to provide more/better 
> predictability, not perfect predictability.
>
> You are right about the user's expectations that the connection will be 
> available at the requested time.   But nothing is exact.  Even if we 
> knew and could predict exactly the setup time, if something was broken 
> in the network and we couldn't meet the committed start time, what 
> would the user do?
>
> Ok.  deep breath....exhale.... feel better?   Ok good.   Now let me 
> defend our discussions...
>
> To be blunt, it could be argued that any user application that blindly 
> puts data into the pipe without getting some verification that the 
> pipe *AND* the application agent at the far end are ready has no real 
> idea if it is working AT ALL!   If the agent at the other end is not 
> functioning (not a network problem), this is fundamentally 
> indistinguishable from a network connection not being available.   How 
> would the user be able to claim the network is broken?
>
> On the other hand, if there *is* out of band coordination going on 
> between the local user agent and the destination user agent, then the 
> application is trying to deal with an imperfect world in which it 
> needs to determine and synchronize the state of the application agent 
> on the far end before it proceeds.  --->  Why would doing so with the 
> network resource not be of equal importance?
>
> In *general* (Fuzzy logic alert) we will make the start time.  Indeed, 
> in most instances we will be ready *before* the start time.   But if 
> by chance we miss the start time by only 15 seconds, is that 
> acceptable to you?  Or to the application that just dumped 19 MBytes 
> of data down a hole?
>
> What if it was the user application that had a slightly fast clock and 
> started 10 seconds early?   *His* clock said 1pm, mine said 12:59:50.  
> Who is broken?   The result is the same.   What if the delta was 5 
> minutes, or 50 milliseconds?   Where do we draw the line?   Draw a 
> line, and there will still be some misses...
>
> The point here is that nothing is perfect and exact.  And yet these 
> systems function "correctly"!  We need to construct a protocol that 
> can function in the face of these minor (on a human scale) time 
> deltas.  But even seconds are not minor on the scale at which a computer 
> agent operates.  So we necessarily need to address these nuances so 
> that it works correctly on a timescale of milliseconds and less.
>
> In order to address the issue of [typically] slight variations of 
> actual start time, we are proposing that the protocol would *always* 
> notify the originating RA when the circuit is ready, albeit after the 
> fact, but it says deterministically "the circuit is now ready."   
> And we are also proposing a means for the RA to determine the state if 
> that ProvisionComplete message is not received when it was expected - 
> whether there is a hard error or just a slow/late provisioning process 
> still taking place.
>
> But given the fact that we cannot *exactly* synchronize each and every 
> agent and system around the world - and keep them that way - and that we 
> cannot predict perfectly how long each task will take before the fact, 
> we have to face the fact that we need to be able to function correctly 
> with these uncertainties.   Without meaning to preach, the user 
> application needs to do so too.
>
> Small is relative. (There is an old joke here about a prostitute and an 
> old man... but I won't go into it. :-)
>
> Best regards
> Jerry
>
>
>
> So we want to provide the service at the requested time.  And we will 
> make our best effort to do so.  And in most cases we will succeed.  
> But what will the application do if we miss it?     What should the 
> protocol do in an imperfect world?   It truly cannot function on fuzzy 
> logic.
>
> One approach to addressing this is to say the RA will always be 
> notified when the connection goes into service.   This is a positive 
> sign that the connection is end-to-end.
>
> Artur Barczyk wrote:
>> Hi Radek, All,
>>
>> hmmmm, for my part I would be quite annoyed (to put it mildly) if I 
>> missed the first 15 minutes of today's HD conf call just because I 
>> reserved the resources a week in advance. "Around" has no place in a 
>> well-defined protocol. No fuzzy logic, please :-)
>> Consider also the "bored child in a car" scenario:
>> RA: are we there yet? PA: no... RA: are we there yet? PA: nooo.... 
>> RA: are we there yet? PA: NO! etc.
>>
>> Be aware that users who complain are users you quickly lose. You 
>> don't want that.
>>
>> So let's consider two example users:
>> - high-volume data transfers through a managed system: a data movement 
>>   scheduler has reserved some bandwidth at a given time. When this time 
>>   comes, the application will just throw data onto the network; it might 
>>   use a connection-less protocol or not, but the result will be an error. 
>>   It cannot wait "around" 15 minutes, as that would throw the transfer 
>>   schedule into complete disorder. Such a "service" is just useless.
>> - video conferencing/streaming: you reserve the network resource for 3pm 
>>   because your meeting starts then. How do you explain to the video 
>>   conference participants that the network kept the conference from 
>>   starting for "around" 15 minutes? (Well, you can, but this will be the 
>>   last time you'll see that user on your network :-) )
>>
>> In short, the only reasonable thing to do is to put the right mechanism 
>> in place to guarantee the service is up when the user requested it (and 
>> you confirmed it). The only acceptable reason for failing this is an 
>> error condition like the network being down (and we'll talk about 
>> protection in v2 :-) )
>>
>> I also think it is very dangerous to use "providing a service" as an 
>> argument while the underlying protocols are not yet correctly specified. 
>> This is not theoretical: the service needs to be useful to the end-user 
>> if you want some uptake. Fuzzy statements make it useless. The very 
>> reason people are interested in this is that it's deterministic - you 
>> know what you get and when. Otherwise, use the routed network. :-)
>>
>> Cheers,
>> Artur
>>
>>
>>
>> On 09/30/2010 03:37 PM, Radek Krzywania wrote:
>>>
>>> Hi,
>>>
>>> It’s getting hard to solve everything here, so let’s not try to solve 
>>> everything at once. How about defining the start time as best effort 
>>> for v1? We promise to deliver the service, yet we are unable to 
>>> guarantee the exact start time with a precision of seconds. If the user 
>>> wants the connection to be available at 2pm, it will be around that 
>>> time, but we can’t guarantee when exactly (1:50, 2:01, 2:15). Let’s 
>>> take a fairly long time as a timeout (e.g. 20 minutes), and start 
>>> booking the circuit 5 or 10 minutes in advance (no discussion for v1, 
>>> just a best-feeling guess). The result will be that in most cases we 
>>> will deliver the service at AROUND the specified time. For v1 that is 
>>> enough, as we will be able to deliver a service, while in v2 we can 
>>> discuss possible upgrades (unless our engineering approach discovers 
>>> it’s fine enough :) ).
>>>
>>> For #1 – it may be a problem for instant reservations. Here the user 
>>> wants a circuit ASAP. We define ASAP as (see the approach above) less 
>>> than 20 minutes (typically 5-10 minutes probably, but that’s my guess), 
>>> or not at all. Users may or may not complain about that. In the first 
>>> case we are good. In the second case we will need to design an upgrade 
>>> for v2.
>>>
>>> Synchronization IMHO is important, and out of scope at the same time. 
>>> We can make the assumption that agents’ clocks are synchronized with a 
>>> precision of, let’s say, 10 seconds, which should be more than enough. 
>>> The agents will use system clocks, so these need to be synchronized in 
>>> the end (NTP or whatever), but that is not even an implementation issue 
>>> - it’s a deployment issue. So let’s put into the specification: “the 
>>> NSI protocol requires time synchronization to within 10 seconds”. If 
>>> we discover it’s insufficient, let’s upgrade it for v2.
>>>
>>> We already have some features to implement, just to see if it works 
>>> fine (works at all, actually). If the user is booking a circuit a week 
>>> in advance, I guess he will not mind if we set it up 15 minutes after 
>>> the start time (the user IS aware of that, as we specify this in the 
>>> protocol description). We can’t, however, deliver the service for 
>>> shorter than the user-defined time. So we can agree (by voting, not 
>>> discussing) on the fixed time values. My proposal is as above:
>>>
>>> - 20 minutes of set-up time for the reservation
>>>
>>> - Service availability time (e.g. 13 h)
>>>
>>> - Service tear-down time (it’s not important from the user’s 
>>> perspective, as once any segment of the connection is removed, the 
>>> service is not available any more, but let’s say 15 minutes)
>>>
>>> In that way, the calendar booking needs to reserve resources for 
>>> 13 h 35 minutes. IMHO we can agree on that by a simple vote for v1 
>>> (Doodle maybe), and collect more detailed requirements for v2 later 
>>> on. I get the feeling we have started a rather theoretical discussion 
>>> based on assumptions and “what if” guessing, instead of focusing on 
>>> delivering any service (even with limited guarantees).
>>>
>>> Best regards
>>>
>>> Radek
>>>
>>> ________________________________________________________________________
>>>
>>> Radoslaw Krzywania
>>> Network Research and Development
>>> Poznan Supercomputing and Networking Center
>>> radek.krzywania at man.poznan.pl
>>> +48 61 850 25 26
>>> http://www.man.poznan.pl
>>>
>>> ________________________________________________________________________
>>>
>>> From: nsi-wg-bounces at ogf.org [mailto:nsi-wg-bounces at ogf.org] On Behalf Of Jerry Sobieski
>>> Sent: Wednesday, September 29, 2010 9:33 PM
>>> To: Jeff W.Boote
>>> Cc: nsi-wg at ogf.org
>>> Subject: Re: [Nsi-wg] time issue
>>>
>>> Ok.  I can buy this approach of #1.   The Requested Start Time is 
>>> immutable as the request goes down the tree (which disallows #2) - 
>>> it is still a Requested Start Time, but NSAs are not allowed to 
>>> change the requested start time as the request propagates.   But 
>>> you can't prevent #3 if that's what an NSA somewhere down the tree 
>>> decides to do.   The result would be a promise it may not be able to 
>>> keep - but that's acceptable because the Estimated Start Time is just 
>>> an estimate; it's not binding.
>>>
>>> The point is, the local NSA cannot tell whether a remote NSA is 
>>> using #1 or #3, since it's totally up to the remote NSA to select the 
>>> guard time appropriate for that request.   Likewise, even if the 
>>> remote NSA misses the Estimated Start Time, the requesting RA has no 
>>> recourse other than to a) just wait until the provisioning completes 
>>> or b) give up and release the connection.    An SLA might influence 
>>> the bad NSA not to lowball its provisioning guard time in the 
>>> future, or it may provide a rebate for the jilted user, but these 
>>> are not protocol or standards issues.
>>>
>>> This goes to John's comment on the call today about what happens 
>>> inside the NSA between the PA role and the RA role...  These actions 
>>> are captured in "state routines" that are invoked when protocol 
>>> events occur.   These actions are generalized in the standard, but 
>>> any heuristics like these approaches to guard time cannot always be 
>>> mandated.   In a protocol standard, whatever components are 
>>> "required" or "must" items must be verifiable in a conformance 
>>> test.   I.e., if someone comes up with an NSI implementation, we 
>>> should be able to put the reference implementation against the test 
>>> implementation and we should be able to tell via protocol operation 
>>> if the implementation under test is doing all the "must" items.   If 
>>> we say an NSA must use #1 above, there is no way to test it and 
>>> confirm that it is doing so.   If the test implementation uses #3, 
>>> the only outward sign is that it may miss the start time on some 
>>> connection(s), but it could just as easily have been a poor judgment 
>>> call on the provisioning time - which is ok.
>>>
>>> So, in the standard, we can only recommend #1 be used.   Or we can 
>>> say the NSA "should" use #1.   But we cannot require it.
>>>
>>> my $.02
>>> Jerry
>>>
>>> Jeff W.Boote wrote:
>>>
>>> On Sep 29, 2010, at 7:31 AM, Gigi Karmous-Edwards wrote:
>>>
>>>
>>>
>>> Jerry,
>>>
>>> For your question : " While the guard times may be network specific, 
>>> we do need to at least consider what we would like an NSA to do if 
>>> for instance a provisioning guard time pushes a reservation forward 
>>> into a previous reservation:   Do we  1) reject the request since we 
>>> can't prepend our guard time and still make the Requested Start 
>>> Time?   OR  2)  Do we retard the Estimated Start Time to allow for 
>>> the guard time?   OR 3) do we reduce the guard time to fit the 
>>> available lead time?"
>>>
>>> In my opinion, I think the answer here has to be # 1): each NSA must 
>>> reject the request if its process to establish the requested 
>>> connection cannot meet the start time. In my opinion an NSA should 
>>> NOT be allowed to change the requested start time (this will cause 
>>> all types of problems for other NSAs), so # 2) is not an option. The 
>>> guard time for each NSA will most likely be vastly different and 
>>> very dependent on the tools used by that network domain to configure 
>>> the network elements for the requested path, so an individual NSA's 
>>> guard time is also non-negotiable, which rules out option # 3).
>>>
>>> I agree #1 seems the most deterministic.
>>>
>>>
>>>
>>>
>>> I agree with Radek, ONLY Start times and End times should be used in 
>>> the protocol and that guard times are only private functions of each 
>>> individual NSA.
>>>
>>> I agree with this. The guard times are not additive across each NSA. 
>>> The guard time from the perspective of the user will effectively be 
>>> the maximum of each NSA's guard time in the chain. But, the user 
>>> doesn't care as long as provisioning is accomplished by the user's 
>>> requested start time. That time would be in the protocol and would 
>>> remain unchanged through each step of the chain. And, it shouldn't 
>>> matter how long it takes to tear down the circuit either as long as 
>>> the circuit is available until their requested end time.
>>>
>>> As to how to manage this time synchronization... I think it is 
>>> totally reasonable to depend upon existing protocols. There are 
>>> other protocols that already depend upon time synchronization, and 
>>> many of them use NTP. We are not talking about needing very tight 
>>> synchronization anyway. 1 second or even 10 seconds is plenty close 
>>> enough. It is more about bounding that error.
>>>
>>> jeff
>>>
>>>
>>>
>>>
>>> Kind regards,
>>> Gigi
>>>
>>> On 9/29/10 8:45 AM, Jerry Sobieski wrote:
>>>
>>> Hi Inder-   I am not sure I agree with all of this...
>>>
>>> Inder Monga wrote:
>>>
>>> Radek
>>>
>>> I agree with your statements;
>>>
>>>      User is not interested in partial results, as he/she is not
>>>     even aware/interested in which NSAs/domains are involved. User
>>>     doesn’t care (if everything works fine ;) ).
>>>
>>> The protocol should be designed with the user in mind. The user does 
>>> not care about guard time values, differences in setup times for 
>>> MPLS vs optical lambdas, or the choices an NSA/NRM will make in 
>>> path-finding.
>>>
>>> The protocol designers can keep the user in mind, but /the protocol 
>>> is between the RA and the PA/ and has a specific purpose: to 
>>> reserve and instantiate a connection across the globe.  We need to 
>>> keep in mind that the RA is not always the end user - it is by 
>>> definition another NSA and could be an NSA in the tree/chain 
>>> somewhere.  If we want to differentiate between the user and the 
>>> network, then we can create a simplified User to Network API, and a 
>>> different Network to Network API... but I don't think that's what we 
>>> want to do :-)   We need IMO to *not* think about the user, but to 
>>> think about the Requesting Agent - regardless of who it represents.
>>>
>>> Perhaps once the RA-PA protocol is tightly defined in all its 
>>> nuances, we can develop/recommend an end user API that simplifies 
>>> the application's required interactions?   This would allow an 
>>> application to embed an RA in a runtime library/module and the 
>>> application itself would only have to deal with the basic connection 
>>> requirements....  just a thought.
>>>
>>> In my opinion,
>>>
>>> a. the user should specify "Expected Start Time, Expected End Time". 
>>> The NSAs/domains along the path determine resource availability and 
>>> booking in their schedules based on their own configured guard time 
>>> (guard times are not specified by NSI protocol. NSI connection 
>>> service architecture should discuss them as a suggested concept).
>>>
>>> While the guard times may be network specific, we do need to at 
>>> least consider what we would like an NSA to do if for instance a 
>>> provisioning guard time pushes a reservation forward into a previous 
>>> reservation:   Do we  1) reject the request since we can't prepend 
>>> our guard time and still make the Requested Start Time?   OR  2)  Do 
>>> we retard the Estimated Start Time to allow for the guard time?   OR 
>>> 3) do we reduce the guard time to fit the available lead time?
>>>
>>> I think we now agree that the Start Time is just an estimate, due 
>>> primarily to the guard time itself being just an estimate.  So none 
>>> of these times are etched in stone...So which option do we recommend 
>>> or require?   The protocol is sensitive to these various times - 
>>> they cause timers to go off, messages to be sent, error handling to 
>>> kick in...   If they are adjusted during scheduling or provisioning, 
>>> we MUST understand what impact they will have on the protocol and 
>>> how that will be carried through the service tree.
>>>
>>> b. Within reasonable limits, the connection should be up as close to 
>>> the start time as possible. The user can set his own 
>>> policy/configuration on how long to wait after the start time to 
>>> accept a connection. Since the resources are guaranteed, this is a 
>>> matter of setup/provisioning only. Hence, there is no protocol 
>>> state transition when the start time is passed other than the messages 
>>> that indicate the circuit is established end to end, or a teardown 
>>> message initiated by the client.
>>>
>>> Ah, but the rub here is that the "user" is an RA...but not all RAs 
>>> are the end user.  We are defining the actions of an RA, regardless 
>>> of whether it is a user NSA or a network NSA.  So we must ensure 
>>> that if the RA gets tired of waiting for provisioning to complete, 
>>> that whatever actions it is allowed to take will be consistent and 
>>> predictable throughout the service tree for all the RA/PA 
>>> interactions.    So the "user" actions are not irrelevant to the 
>>> protocol.
>>>
>>>
>>> c. We should not design a protocol that depends on time 
>>> synchronization to work. In my opinion, the start time, expected 
>>> time to provision aka guard time is best handled/shared as a 
>>> SLA/Service definition issue.
>>>
>>> I agree:  We cannot expect perfectly/exactly synchronized clocks 
>>> anywhere in the network.  And therefore we cannot depend upon clock 
>>> synchronization for any part of the protocol to work.   Which 
>>> implies that the protocol must work when the clocks are NOT 
>>> synchronized.   How do we ensure this?   --> rigorous protocol analysis.
>>>
>>> While the values of certain timers may be left to the Service 
>>> Definition/SLA, as I stated before, we must make sure that the 
>>> protocol can function predictably and consistently in the face of 
>>> all the timing permutations that are possible among NSAs.  This 
>>> rapidly gets very complex if we allow too many variables for the 
>>> SD/SLA to define.  Sometimes it's OK to identify constants that the 
>>> protocol must use, so that we can validate the protocol and simplify 
>>> implementation and deployment.  Indeed, oftentimes when clocks are 
>>> only slightly skewed they introduce race conditions that become more 
>>> likely to occur, requiring more careful consideration.
>>>
>>> d. Similar semantics apply to the end-time as well.
>>>
>>> Pretty much.  Across the board, things like clock events, 
>>> estimates, and service-specific choices will create situations where 
>>> we need to ensure the protocol and state machines will function 
>>> properly across the full range of possible permuted values.   This 
>>> is in general why protocol designers say "make it only as complex as 
>>> it needs to be, and no more" - options breed complexity.
>>>
>>> br
>>> Jerry
>>>
>>>
>>
>> -- 
>> Dr Artur Barczyk
>> California Institute of Technology
>> c/o CERN, 1211 Geneve 23, Switzerland
>> Tel:    +41 22 7675801

-- 
Dr Artur Barczyk
California Institute of Technology
c/o CERN, 1211 Geneve 23, Switzerland
Tel:    +41 22 7675801


