[Nsi-wg] time issue
Artur Barczyk
Artur.Barczyk at cern.ch
Thu Sep 30 15:51:57 CDT 2010
Very good, I support this.
Cheers,
Artur
On 09/30/2010 07:54 PM, Aaron Brown wrote:
> I'm probably oversimplifying, but it seems to me this problem becomes
> much easier with Jeff's idea of having all clocks synchronized to
> within no more than some number of seconds. If the clocks aren't
> synchronized, you run into a whole bunch of errors related to making
> absolute time-based reservations anyway.
>
> The protocol mandates clock offsets of no more than X seconds. Each
> domain selects its own setup time of "no more than Y minutes" and a
> tear-down time of "Z minutes". If a user requests a reservation from
> time A to time B, the domain reserves the interval from A-X-Y through
> B+X+Z. When it comes to setting up the circuit, the domain starts
> setting it up at time A-X-Y. If the circuit isn't ready by time A-X,
> the domain throws a setup error and handles that error condition the
> way it would handle an actual error that occurred during circuit
> setup. The circuit remains active until time B+X, at which point the
> domain starts tearing it down. If, while the circuit is running, the
> hosts become desynchronized, one of the domains will (from the
> perspective of either the clients or the other domains) end the
> circuit earlier than expected and report the tear-down. The other
> domains/clients will handle that much as they would a cancel.
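Aaron's windowing scheme can be sketched concretely. This is a minimal illustration only; the function and parameter names are mine, not NSI protocol terms:

```python
from datetime import datetime, timedelta

def reservation_window(start, end, x, y, z):
    """Compute the interval a domain reserves internally for a circuit
    requested from `start` to `end`, given the protocol clock-offset
    bound `x`, the domain's setup time `y`, and its tear-down time `z`
    (all timedeltas). Setup begins at start-x-y; if the circuit is not
    ready by start-x, the domain raises a setup error.
    """
    reserve_from = start - x - y        # begin holding resources / setting up
    reserve_until = end + x + z         # keep holding through tear-down
    setup_deadline = start - x          # ready by here, or throw a setup error
    return reserve_from, reserve_until, setup_deadline
```

With X = 10 s, Y = 5 min, Z = 2 min, a 14:00-15:00 request would be held internally from 13:54:50 through 15:02:10, with a ready-by deadline of 13:59:50.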
>
> Again, I may be vastly oversimplifying the problem.
>
> Cheers,
> Aaron
>
> On Sep 30, 2010, at 1:31 PM, Radek Krzywania wrote:
>
>> Hi,
>> Setting up a circuit via the Alcatel NMS takes 2 minutes. This time
>> is mostly consumed by the NMS finding a path through the domain and
>> warming the room with CPU heat. A few seconds or a minute is still a
>> guess anyway :) I can agree to use those values (instead of 20
>> minutes), but in my current experience a lot of timeouts will appear.
>> I fully support the statement that “We are trying to provide
>> more/better predictability, not perfect predictability.” This should
>> be on the title page of the NSI design, BTW :) The whole point is
>> that everything is relative, not exact. The example of user clocks
>> illustrates it quite well (thanks, Jerry, for pointing that out).
>> Best regards
>> Radek
>> ________________________________________________________________________
>> Radoslaw Krzywania               Network Research and Development
>> radek.krzywania at man.poznan.pl   Poznan Supercomputing and
>> +48 61 850 25 26                 Networking Center
>> http://www.man.poznan.pl
>> ________________________________________________________________________
>> *From:* Jerry Sobieski [mailto:jerry at nordu.net]
>> *Sent:* Thursday, September 30, 2010 6:54 PM
>> *To:* Artur Barczyk
>> *Cc:* radek.krzywania at man.poznan.pl; 'Jeff W.Boote'; nsi-wg at ogf.org
>> *Subject:* Re: [Nsi-wg] time issue
>> Hi Artur- I accept the challenge!
>>
>> First, let me calm the nerves... The question of setup time -
>> particularly the issue of it taking 10 minutes or more - has mostly
>> to do with provisioning all-optical systems, where amplification and
>> attenuation across a mesh take a significant time. In most cases the
>> provisioning will be a more conventional few seconds to a minute (or
>> so). And for smaller domains with more conventional switching gear,
>> maybe a few seconds at most.
>>
>> So we should all try to keep perspective here: much of this
>> discussion has to do with ensuring the protocol functions correctly,
>> consistently, and reliably as service infrastructure. And much of
>> this is driven by making sure it works even in the corner cases,
>> where it might take 15 minutes to provision, or two NSAs might have
>> clocks that differ by 10 seconds, etc.
>>
>> But the real-world fact is that nothing is perfect, and a globally
>> distributed complex system such as our networks is never going to
>> be, practically or even theoretically, perfectly synchronized or
>> exactly predictable. We are trying to provide more/better
>> predictability, not perfect predictability.
>>
>> You are right about the user's expectation that the connection will
>> be available at the requested time. But nothing is exact. Even if
>> we knew and could predict the setup time exactly, if something were
>> broken in the network and we couldn't meet the committed start time,
>> what would the user do?
>>
>> Ok. deep breath....exhale.... feel better? Ok good. Now let me
>> defend our discussions...
>>
>> To be blunt, it could be argued that any user application that
>> blindly puts data into the pipe without getting some verification
>> that the pipe *AND* the application agent at the far end are ready
>> has no real idea whether it is working AT ALL! If the agent at the
>> other end is not functioning (not a network problem), this is
>> fundamentally indistinguishable from a network connection not being
>> available. How would the user be able to claim the network is broken?
>>
>> On the other hand, if there *is* out-of-band coordination going on
>> between the local user agent and the destination user agent, then the
>> application is trying to deal with an imperfect world in which it
>> needs to determine and synchronize the state of the application agent
>> on the far end before it proceeds. ---> Why would doing so with the
>> network resource not be of equal importance?
>>
>> In *general* (Fuzzy logic alert) we will make the start time.
>> Indeed, in most instances we will be ready *before* the start time.
>> But if by chance we miss the start time by only 15 seconds, is that
>> acceptable to you? Or to the application that just dumped 19 MBytes
>> of data down a hole?
>>
>> What if it was the user application that had a slightly fast clock
>> and started 10 seconds early? *His* clock said 1pm, mine said
>> 12:59:50. Who is broken? The result is the same. What if the
>> delta was 5 minutes, or 50 milliseconds? Where do we draw the
>> line? Draw a line, and there will still be some misses...
>>
>> The point here is that nothing is perfect and exact. And yet these
>> systems function "correctly"! We need to construct a protocol that
>> can function in the face of these minor (on a human scale) time
>> deltas. But even seconds are not minor on the scale at which a
>> computer agent functions. So we necessarily need to address these
>> nuances so that it works correctly on a timescale of milliseconds
>> and less.
>>
>> In order to address the issue of [typically] slight variations in
>> the actual start time, we are proposing that the protocol would
>> *always* notify the originating RA when the circuit is ready, albeit
>> after the fact, but saying deterministically: "the circuit is now
>> ready." And we are also proposing a means for the RA to determine
>> the state if that ProvisionComplete message is not received when
>> expected - whether there is a hard error or just a slow/late
>> provisioning process still taking place.
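The proposed RA behavior - wait for ProvisionComplete, and query the PA's state if it does not arrive on time - might look roughly like this. A sketch with invented names; only ProvisionComplete comes from the discussion above:

```python
import queue
import time

def await_provision_complete(events, deadline_ts, query_state):
    """Wait for a ProvisionComplete event until `deadline_ts` (epoch
    seconds). If it does not arrive in time, ask the PA for its state
    to distinguish a hard error from a slow/late provisioning process.

    `events` is a queue.Queue of event-name strings; `query_state` is a
    callable returning e.g. 'provisioning' or 'failed'. These names are
    illustrative, not taken from the NSI specification.
    """
    while True:
        remaining = deadline_ts - time.time()
        if remaining <= 0:
            return query_state()        # late: ask the PA what is going on
        try:
            ev = events.get(timeout=remaining)
        except queue.Empty:
            return query_state()        # timed out waiting for the event
        if ev == "ProvisionComplete":
            return "ready"              # deterministic "circuit is now ready"
```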
>>
>> But given that we cannot *exactly* synchronize each and every agent
>> and system around the world - and keep them that way - and that we
>> cannot predict perfectly how long each task will take before the
>> fact, we have to face the fact that we need to be able to function
>> correctly with these uncertainties. Without meaning to preach, the
>> user application needs to do so too.
>>
>> Small is relative. (There is an old joke here about a prostitute
>> and an old man... but I won't go into it. :-)
>>
>> Best regards
>> Jerry
>>
>>
>>
>> So we want to provide the service at the requested time. And we
>> will make our best effort to do so. And in most cases we will
>> succeed. But what will the application do if we miss it? What
>> should the protocol do in an imperfect world? It truly cannot
>> function on fuzzy logic.
>>
>> One approach to addressing this is to say the RA will always be
>> notified when the connection goes into service. This is a positive
>> sign that the connection is end-to-end.
>>
>> Artur Barczyk wrote:
>> Hi Radek, All,
>>
>> hmmmm, I for my part would be quite annoyed (to put it mildly) if I
>> missed the first 15 minutes of today's HD conf call just because I
>> reserved the resources a week in advance. "Around" has no place in
>> a well-defined protocol. No fuzzy logic, please :-)
>> Consider also the "bored child in a car" scenario:
>> RA: are we there yet? PA: no... RA: are we there yet? PA: nooo....
>> RA: are we there yet? PA: NO! etc.
>>
>> Be aware that users who complain are users quite quickly lost. You
>> don't want that.
>>
>> So let's consider two example users:
>> - High-volume data transfers through a managed system: a data
>> movement scheduler has reserved some bandwidth at a given time.
>> When that time comes, the application will just throw data at the
>> network - it might use a connection-less protocol, or not - but it
>> will result in an error. It cannot wait "around" 15 minutes, as
>> that would throw the transfer schedule into complete disorder.
>> Such a "service" is just useless.
>> - Video conferencing/streaming: you reserve the network resource
>> for 3pm because your meeting starts then. How do you explain to
>> the video conference participants that the network prevented the
>> conference from starting for "around" 15 minutes? (Well, you can,
>> but it will be the last time you see that user on your network :-) )
>>
>> In short, the only reasonable thing to do is to put the right
>> mechanisms in place to guarantee the service is up when the user
>> requested it (and you confirmed it). The only acceptable reason for
>> failing this is an error condition like the network being down (and
>> we'll talk about protection in v2 :-) )
>>
>> I also think it is very dangerous to use "providing a service" as an
>> argument while the underlying protocols are not yet correctly
>> specified. This is not theoretical: the service needs to be useful
>> to the end user if you want some uptake. Fuzzy statements make it
>> useless. The very reason people are interested in this is that it's
>> deterministic - you know what you get and when. Otherwise, use the
>> routed network. :-)
>>
>> Cheers,
>> Artur
>>
>>
>>
>> On 09/30/2010 03:37 PM, Radek Krzywania wrote:
>> Hi,
>> It's getting hard to solve everything here, so let's not try to
>> solve everything at once. How about defining the start time as best
>> effort for v1? We promise to deliver the service, yet we are unable
>> to guarantee the exact start time with a precision of seconds. If
>> the user wants the connection to be available at 2pm, it will be
>> around that time, but we can't guarantee when exactly (1:50, 2:01,
>> 2:15). Let's take a quite long time as a timeout (e.g. 20 minutes),
>> and start booking the circuit 5 or 10 minutes in advance (no
>> discussion for v1, just a best-feeling guess). The result will be
>> that in most cases we will deliver the service at AROUND the
>> specified time. For v1 that is enough, as we will be able to
>> deliver a service, while in v2 we can discuss possible upgrades
>> (unless our engineering approach discovers it's fine enough :) ).
>> For #1 - it may be a problem for instant reservations. Here the
>> user wants a circuit ASAP. We define ASAP as (see the approach
>> above) less than 20 minutes (typically 5-10 minutes, probably, but
>> that's my guess), or not at all. Users may or may not complain
>> about that. In the first case we are good. For the second case we
>> will need to design an upgrade for v2.
>> Synchronization IMHO is important, and out of scope at the same
>> time. We can make the assumption that agents' clocks are
>> synchronized with a precision of, let's say, 10 seconds, which
>> should be more than enough. The agents will use system clocks, so
>> those need to be synchronized underneath (NTP or whatever), but that
>> is not even an implementation issue - it's a deployment issue. So
>> let's put into the specification: "The NSI protocol requires time
>> synchronization with a precision of no worse than 10 seconds." If
>> we discover that's insufficient, let's upgrade it for v2.
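Such a deployment-level bound could be expressed as a simple sanity check. Illustrative only; the constant and function names are mine, not specification text:

```python
MAX_CLOCK_OFFSET = 10.0  # seconds - the precision bound proposed above

def clocks_within_bound(local_ts, peer_ts, bound=MAX_CLOCK_OFFSET):
    """Return True if two agents' clock readings (seconds since the
    epoch) differ by no more than the protocol's stated bound. A real
    deployment would rely on NTP to keep the clocks inside the bound;
    this only checks whether they currently are.
    """
    return abs(local_ts - peer_ts) <= bound
```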
>> We already have some features to implement, just to see if it works
>> fine (works at all, actually). If a user books a circuit a week in
>> advance, I guess he will not mind if we set it up 15 minutes after
>> the start time (the user IS aware of that, as we specify it in the
>> protocol description). We can't, however, deliver the service for a
>> shorter period than the user requested. So we can agree (by voting,
>> not discussing) on fixed time values. My proposal is as above:
>> - 20 minutes for reservation setup time
>> - Service availability time (e.g. 13 h)
>> - Service tear-down time (it's not important from the user's
>> perspective, since once any segment of the connection is removed
>> the service is not available any more, but let's say 15 minutes)
>> In that way, the calendar booking needs to reserve resources for
>> 13 h 35 minutes. IMHO we can agree on that by a simple vote for v1
>> (Doodle maybe), and collect more detailed requirements for v2 later
>> on. I get the feeling we have started a quite theoretical
>> discussion based on assumptions and "what if" guessing, instead of
>> focusing on delivering any service (even with limited guarantees).
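Radek's calendar arithmetic can be written out directly, using the strawman values from his message (the names are mine):

```python
from datetime import timedelta

# Fixed guard values proposed above for v1.
SETUP_GUARD = timedelta(minutes=20)
TEARDOWN_GUARD = timedelta(minutes=15)

def calendar_hold(service_duration):
    """Total interval the resource calendar must block for one
    reservation: setup guard + user-visible service time + tear-down
    guard."""
    return SETUP_GUARD + service_duration + TEARDOWN_GUARD
```

A 13 h service then blocks the calendar for 13 h 35 min, as stated.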
>> Best regards
>> Radek
>> ________________________________________________________________________
>> Radoslaw Krzywania               Network Research and Development
>> radek.krzywania at man.poznan.pl   Poznan Supercomputing and
>> +48 61 850 25 26                 Networking Center
>> http://www.man.poznan.pl
>> ________________________________________________________________________
>> *From:* nsi-wg-bounces at ogf.org [mailto:nsi-wg-bounces at ogf.org]
>> *On Behalf Of* Jerry Sobieski
>> *Sent:* Wednesday, September 29, 2010 9:33 PM
>> *To:* Jeff W.Boote
>> *Cc:* nsi-wg at ogf.org
>> *Subject:* Re: [Nsi-wg] time issue
>> Ok. I can buy this approach of #1. The Requested Start Time is
>> immutable as the request goes down the tree (which disallows #2) -
>> it is still a Requested Start Time, but NSAs are not allowed to
>> change it as the request goes down the tree. But you can't prevent
>> #3 if that's what an NSA somewhere down the tree decides to do. The
>> result would be a promise it may not be able to keep - but that's
>> acceptable, because the Estimated Start Time is just an estimate;
>> it's not binding.
>>
>> The point is, the local NSA cannot tell whether a remote NSA is
>> using #1 or #3, since it's totally up to the remote NSA to select
>> the guard time appropriate for that request. Likewise, even if the
>> remote NSA misses the Estimated Start Time, the requesting RA has no
>> recourse other than to a) just wait until the provisioning
>> completes, or b) give up and release the connection. An SLA might
>> influence the bad NSA not to lowball its provisioning guard time in
>> the future, or it may provide a rebate for the jilted user, but
>> these are not protocol or standards issues.
>>
>> This goes to John's comment on the call today about what happens
>> inside the NSA between the PA role and the RA role... These actions
>> are captured in "state routines" that are invoked when protocol
>> events occur. These actions are generalized in the standard, but
>> heuristics like these approaches to guard time cannot always be
>> mandated. In a protocol standard, whatever components are
>> "required" or "must" items must be verifiable in a conformance
>> test. I.e., if someone comes up with an NSI implementation, we
>> should be able to put the reference implementation against the test
>> implementation and be able to tell, via protocol operation, whether
>> the implementation under test is doing all the "must" items. If we
>> say an NSA must use #1 above, there is no way to test it and
>> confirm that it is doing so. If the test implementation uses #3,
>> the only outward sign is that it may miss the start time on some
>> connection(s), but that could as easily have been a poor judgment
>> call on the provisioning time - which is ok.
>>
>> So, in the standard, we can only recommend #1 be used. Or we can
>> say the NSA "should" use #1. But we cannot require it.
>>
>> my $.02
>> Jerry
>>
>> Jeff W.Boote wrote:
>> On Sep 29, 2010, at 7:31 AM, Gigi Karmous-Edwards wrote:
>>
>>
>>
>> Jerry,
>>
>> For your question : " While the guard times may be network specific,
>> we do need to at least consider what we would like an NSA to do if
>> for instance a provisioning guard time pushes a reservation forward
>> into a previous reservation: Do we 1) reject the request since we
>> can't prepend our guard time and still make the Requested Start
>> Time? OR 2) Do we retard the Estimated Start Time to allow for
>> the guard time? OR 3) do we reduce the guard time to fit the
>> available lead time?"
>>
>> In my opinion, the answer here has to be #1: each NSA must reject
>> the request if its process for establishing the requested
>> connection cannot meet the start time. In my opinion an NSA should
>> NOT be allowed to change the requested start time (this will cause
>> all kinds of problems for other NSAs), so #2 is not an option. The
>> guard time for each NSA will most likely be vastly different and
>> very dependent on the tools used by that network domain to
>> configure the network elements for the requested path, so an
>> individual NSA's guard time is also non-negotiable, and option #3
>> is not an option either.
>> I agree #1 seems the most deterministic.
>>
>>
>>
>>
>> I agree with Radek that ONLY start times and end times should be
>> used in the protocol, and that guard times are private functions of
>> each individual NSA.
>> I agree with this. The guard times are not additive across the
>> NSAs. The guard time from the perspective of the user will
>> effectively be the maximum of each NSA's guard time in the chain.
>> But the user doesn't care, as long as provisioning is accomplished
>> by the user's requested start time. That time would be in the
>> protocol and would remain unchanged through each step of the chain.
>> And it shouldn't matter how long it takes to tear down the circuit
>> either, as long as the circuit is available until the requested end
>> time.
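Jeff's point that guard times are not additive can be stated in one line. A sketch under the assumption that all domains provision in parallel once the reservation is placed; the function name is mine:

```python
def effective_user_guard(guard_times_s):
    """User-visible lead time across a chain of NSAs, in seconds.

    If every domain starts provisioning at the same moment, the
    end-to-end circuit is ready when the slowest domain is ready, so
    the effective guard time is the maximum per-NSA guard time in the
    chain, not the sum.
    """
    return max(guard_times_s)
```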
>> As to how to manage this time synchronization... I think it is
>> totally reasonable to depend upon existing protocols. There are other
>> protocols that already depend upon time synchronization, and many of
>> them use NTP. We are not talking about needing very tight
>> synchronization anyway. 1 second or even 10 seconds is plenty close
>> enough. It is more about bounding that error.
>> jeff
>>
>>
>>
>>
>> Kind regards,
>> Gigi
>>
>> On 9/29/10 8:45 AM, Jerry Sobieski wrote:
>> Hi Inder- I am not sure I agree with all of this...
>>
>> Inder Monga wrote:
>> Radek
>> I agree with your statements;
>>
>> The user is not interested in partial results, as he/she is not
>> even aware of or interested in which NSAs/domains are involved. The
>> user doesn't care (if everything works fine ;) ).
>>
>> The protocol should be designed with the user in mind. The user
>> does not care about guard time values or differences in setup times
>> for MPLS vs optical lambdas, nor does it concern itself with the
>> choices an NSA/NRM will make in path-finding.
>> The protocol designers can keep the user in mind, but /the protocol
>> is between the RA and the PA/ and has a specific purpose: to
>> reserve and instantiate a connection across the globe. We need to
>> keep in mind that the RA is not always the end user - it is by
>> definition another NSA, and could be an NSA in the tree/chain
>> somewhere. If we want to differentiate between the user and the
>> network, then we can create a simplified User-to-Network API and a
>> different Network-to-Network API... but I don't think that's what
>> we want to do (:-) We need, IMO, to think *not* about the user,
>> but about the Requesting Agent - regardless of who it represents.
>>
>> Perhaps once the RA-PA protocol is tightly defined in all its
>> nuances, we can develop/recommend an end-user API that simplifies
>> the application's required interactions?? This would allow an
>> application to embed an RA in a runtime library/module, and the
>> application itself would only have to deal with the basic
>> connection requirements... just a thought.
>>
>>
>> In my opinion,
>> a. the user should specify "Expected Start Time, Expected End Time".
>> The NSAs/domains along the path determine resource availability and
>> booking in their schedules based on their own configured guard time
>> (guard times are not specified by NSI protocol. NSI connection
>> service architecture should discuss them as a suggested concept).
>> While the guard times may be network specific, we do need to at least
>> consider what we would like an NSA to do if for instance a
>> provisioning guard time pushes a reservation forward into a previous
>> reservation: Do we 1) reject the request since we can't prepend
>> our guard time and still make the Requested Start Time? OR 2) Do
>> we retard the Estimated Start Time to allow for the guard time? OR
>> 3) do we reduce the guard time to fit the available lead time?
>>
>> I think we now agree that the Start Time is just an estimate, due
>> primarily to the guard time itself being just an estimate. So none
>> of these times are etched in stone... So which option do we
>> recommend or require? The protocol is sensitive to these various
>> times - they cause timers to go off, messages to be sent, error
>> handling to kick in... If they are adjusted during scheduling or
>> provisioning, we MUST understand what impact they will have on the
>> protocol and how that will be carried through the service tree.
>>
>>
>> b. Within reasonable limits, the connection should be up as close
>> to the start time as possible. The user can set his own
>> policy/configuration for how long to wait after the start time to
>> accept a connection. Since the resources are guaranteed, this is a
>> matter of setup/provisioning only. Hence, there is no protocol
>> state transition when the start time passes, other than the
>> messages that indicate the circuit is established end to end or a
>> teardown message initiated by the client.
>> Ah, but the rub here is that the "user" is an RA... but not all RAs
>> are the end user. We are defining the actions of an RA, regardless
>> of whether it is a user NSA or a network NSA. So we must ensure
>> that if the RA gets tired of waiting for provisioning to complete,
>> whatever actions it is allowed to take will be consistent and
>> predictable throughout the service tree for all the RA/PA
>> interactions. So the "user" actions are not irrelevant to the
>> protocol.
>>
>>
>>
>> c. We should not design a protocol that depends on time
>> synchronization to work. In my opinion, the start time and the
>> expected time to provision (aka guard time) are best handled/shared
>> as an SLA/Service Definition issue.
>> I agree: we cannot expect perfectly/exactly synchronized clocks
>> anywhere in the network. And therefore we cannot depend upon clock
>> synchronization for any part of the protocol to work. Which implies
>> that the protocol must work when the clocks are NOT synchronized.
>> How do we ensure this? --> rigorous protocol analysis.
>>
>> While the values of certain timers may be left to the Service
>> Definition/SLA, as I stated before, we must make sure that the
>> protocol can function predictably and consistently in the face of
>> all timing permutations that are possible among NSAs. This rapidly
>> gets very complex if we allow too many variables for the SD/SLA to
>> define. Sometimes it's OK to identify constants that the protocol
>> must use, so that we can validate the protocol and simplify
>> implementation and deployment. Indeed, often when clocks are only
>> slightly skewed they introduce race conditions that become more
>> likely to occur, requiring more careful consideration.
>>
>>
>> d. Similar semantics apply to the end-time as well.
>> Pretty much. Across the board, things like clock events, estimates,
>> and service specific choices will create situations where we need to
>> insure the protocol and state machines will function properly across
>> the full range of possible permuted values. This is in general why
>> protocol designers say "make it only as complex as it needs to be,
>> and no more" - options breed complexity.
>>
>> br
>> Jerry
>>
>>
>>
>>
>> _______________________________________________
>> nsi-wg mailing list
>> nsi-wg at ogf.org
>> http://www.ogf.org/mailman/listinfo/nsi-wg
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> nsi-wg mailing list
>> nsi-wg at ogf.org <mailto:nsi-wg at ogf.org>
>> http://www.ogf.org/mailman/listinfo/nsi-wg
>>
>>
>> _______________________________________________
>> nsi-wg mailing list
>> nsi-wg at ogf.org <mailto:nsi-wg at ogf.org>
>> http://www.ogf.org/mailman/listinfo/nsi-wg
>>
>>
>>
>> --
>> Dr Artur Barczyk
>> California Institute of Technology
>> c/o CERN, 1211 Geneve 23, Switzerland
>> Tel: +41 22 7675801
>
--
Dr Artur Barczyk
California Institute of Technology
c/o CERN, 1211 Geneve 23, Switzerland
Tel: +41 22 7675801