[Nsi-wg] time issue

Jeff W.Boote boote at internet2.edu
Thu Sep 30 13:20:09 CDT 2010


++

jeff

On Sep 30, 2010, at 12:02 PM, Evangelos Chaniotakis wrote:

> Seconding this.
>
> On Sep 30, 2010, at 1:54 PM, Aaron Brown wrote:
>
>> I'm probably oversimplifying, but it seems to me this problem
>> becomes much easier with Jeff's idea of having all clocks
>> synchronized to within no more than some number of seconds. If
>> the clocks aren't synchronized, you run into a whole bunch of errors
>> related to making absolute time-based reservations anyway.
>>
>> The protocol mandates clock offsets of no more than X seconds. Each
>> domain selects its own setup time of "no more than Y minutes" and a
>> tear-down time of "Z minutes". If a user requests a reservation from
>> time A to time B, the domain reserves time from A-X-Y through B+X+Z.
>> When it comes to setting up the circuit, the domain starts setting
>> it up at time A-X-Y. If the circuit isn't ready by time A-X, the
>> domain throws a setup error and handles that error condition the way
>> it would handle an actual error that occurred during circuit setup. The
>> circuit remains active until time B+X, at which time the domain
>> starts tearing it down. If, while the circuit is running, the hosts
>> become desynchronized, one of the domains will (from the perspective
>> of the clients or the other domains) end the circuit earlier than
>> expected and report the tear-down. The other domains/clients will
>> handle that much as if a cancel had occurred.
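Aaron's windowing arithmetic can be written down directly; a minimal sketch (the variable names and concrete values are illustrative, not from the thread):

```python
from datetime import datetime, timedelta

def reservation_window(start, end, max_offset, setup_guard, teardown_guard):
    """Compute the timeline Aaron describes for a reservation [A, B].

    Returns (reserve_from, setup_deadline, teardown_at, reserve_until):
    setup begins at A-X-Y, a setup error is raised if the circuit is not
    ready by A-X, teardown begins at B+X, resources are held until B+X+Z.
    """
    reserve_from   = start - max_offset - setup_guard    # A - X - Y
    setup_deadline = start - max_offset                  # A - X
    teardown_at    = end + max_offset                    # B + X
    reserve_until  = end + max_offset + teardown_guard   # B + X + Z
    return reserve_from, setup_deadline, teardown_at, reserve_until

# Example: one-hour reservation, 10 s clock bound, 5 min guards.
A = datetime(2010, 9, 30, 14, 0, 0)
B = datetime(2010, 9, 30, 15, 0, 0)
w = reservation_window(A, B, timedelta(seconds=10),
                       timedelta(minutes=5), timedelta(minutes=5))
# w[0] = 13:54:50, w[1] = 13:59:50, w[2] = 15:00:10, w[3] = 15:05:10
```

Note that the user-visible times A and B never change; only the domain-internal window around them grows.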
>>
>> Again, I may be vastly oversimplifying the problem.
>>
>> Cheers,
>> Aaron
>>
>> On Sep 30, 2010, at 1:31 PM, Radek Krzywania wrote:
>>
>>> Hi,
>>> Setting up a circuit via Alcatel NMS takes 2 minutes. This time is
>>> mostly consumed by NMS to find a path through domain and warm the
>>> room with CPU heat. A seconds or minute is still a guess anyway :)
>>> I can agree to use those values (instead of 20 minutes) but
>>> according to my current experience – lot of timeouts will appear.
>>> I fully support the statement of “We are trying to provide more/
>>> better predictability, not perfect predictability.” This should be
>>> on the title page of NSI design BTW :) All the case is about
>>> everything is relevant and not exact. The example of user clocks
>>> depicts it quite well (thanks Jerry for pointing that).
>>>
>>> Best regards
>>> Radek
>>>
>>> ________________________________________________________________________
>>> Radoslaw Krzywania              Network Research and Development
>>> radek.krzywania at man.poznan.pl  Poznan Supercomputing and Networking Center
>>> +48 61 850 25 26                http://www.man.poznan.pl
>>> ________________________________________________________________________
>>>
>>> From: Jerry Sobieski [mailto:jerry at nordu.net]
>>> Sent: Thursday, September 30, 2010 6:54 PM
>>> To: Artur Barczyk
>>> Cc: radek.krzywania at man.poznan.pl; 'Jeff W.Boote'; nsi-wg at ogf.org
>>> Subject: Re: [Nsi-wg] time issue
>>>
>>> Hi Artur-  I accept the challenge!
>>>
>>> First, let me calm the nerves... The question of setup time -
>>> particularly the issue of it taking 10 minutes or more - has mostly
>>> to do with provisioning all-optical systems, where amplification and
>>> attenuation across a mesh take a significant time. In most
>>> cases, the provisioning will be a more conventional few seconds to
>>> a minute (or so). And for smaller domains with more
>>> conventional switching gear, maybe a few seconds at most.
>>>
>>> So we should all try to keep perspective here: much of this
>>> discussion has to do with ensuring the protocol functions
>>> correctly, consistently, and reliably as service infrastructure.
>>> And much of this is driven by making sure it works even in the
>>> corner cases where it might take 15 minutes to provision, or two
>>> NSAs might have clocks that differ by 10 seconds, etc.
>>>
>>> But the real-world facts are that nothing is perfect, and a
>>> globally distributed complex system such as our networks is never
>>> practically, if not theoretically, going to be perfectly synchronized
>>> or offer exact predictability. We are trying to provide more/
>>> better predictability, not perfect predictability.
>>>
>>> You are right about the user's expectation that the connection will
>>> be available at the requested time. But nothing is exact. Even
>>> if we knew and could predict the setup time exactly, if something
>>> were broken in the network and we couldn't meet the committed start
>>> time, what would the user do?
>>>
>>> Ok.  deep breath....exhale.... feel better?   Ok good.   Now let me
>>> defend our discussions...
>>>
>>> To be blunt, it could be argued that any user application that
>>> blindly puts data into the pipe without getting some verification
>>> that the pipe *AND* the application agent at the far end are ready
>>> has no real idea whether it is working AT ALL! If the agent at the
>>> other end is not functioning (not a network problem), this is
>>> fundamentally indistinguishable from a network connection not being
>>> available. How would the user be able to claim the network is
>>> broken?
>>>
>>> On the other hand, if there *is* out-of-band coordination going on
>>> between the local user agent and the destination user agent, then
>>> the application is trying to deal with an imperfect world in which
>>> it needs to determine and synchronize the state of the application
>>> agent on the far end before it proceeds. ---> Why would doing so
>>> with the network resource not be of equal importance?
>>>
>>> In *general* (Fuzzy logic alert) we will make the start time.
>>> Indeed, in most instances we will be ready *before* the start
>>> time.   But if by chance we miss the start time by only 15 seconds,
>>> is that acceptable to you?  Or to the application that just dumped
>>> 19 MBytes of data down a hole?
>>>
>>> What if it was the user application that had a slightly fast clock
>>> and started 10 seconds early? *His* clock said 1pm, mine said
>>> 12:59:50. Who is broken? The result is the same. What if the
>>> delta was 5 minutes, or 50 milliseconds? Where do we draw the
>>> line? Draw a line, and there will still be some misses...
>>>
>>> The point here is that nothing is perfect and exact. And yet these
>>> systems function "correctly"! We need to construct a protocol that
>>> can function in the face of these minor (on a human scale) time
>>> deltas. But even seconds are not minor on the scale at which a
>>> computer agent functions. So we necessarily need to address these
>>> nuances so that it works correctly on a timescale of milliseconds
>>> and less.
>>>
>>> In order to address the issue of [typically] slight variations in
>>> actual start time, we are proposing that the protocol would
>>> *always* notify the originating RA when the circuit is ready -
>>> albeit after the fact, but it says deterministically "the circuit
>>> is now ready." And we are also proposing a means for the RA to
>>> determine the state if that ProvisionComplete message is not
>>> received when expected - whether there is a hard error or just a
>>> slow/late provisioning process still taking place.
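The RA-side behavior Jerry sketches (wait for ProvisionComplete, fall back to a state query on timeout) might look like this; all API names here are hypothetical, not from any NSI draft:

```python
import time

def await_provision_complete(pa, estimated_start, grace, poll=0.5,
                             clock=time.monotonic):
    """Wait for ProvisionComplete until estimated_start + grace, then query.

    Distinguishes a hard error from a slow/late provisioning process that
    is still taking place.
    """
    deadline = estimated_start + grace
    while clock() < deadline:
        if pa.provision_complete_received():  # the deterministic "ready" signal
            return "READY"
        time.sleep(poll)
    # No notification by the expected time: ask the PA for its state.
    state = pa.query_state()
    return "STILL_PROVISIONING" if state == "PROVISIONING" else "ERROR"
```

On "STILL_PROVISIONING" the RA can then choose to keep waiting or to give up and release the connection, exactly the two recourses discussed later in this thread.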
>>>
>>> But given that we cannot *exactly* synchronize each and
>>> every agent and system around the world - and keep them that way -
>>> and that we cannot predict perfectly how long each task will take
>>> before the fact, we have to face the fact that we need to be able to
>>> function correctly with these uncertainties. Without meaning to
>>> preach, the user application needs to do so too.
>>>
>>> Small is relative. (There is an old joke here about a prostitute and
>>> an old man... but I won't go into it. :-)
>>>
>>> Best regards
>>> Jerry
>>>
>>>
>>>
>>> So we want to provide the service at the requested time. And we will
>>> make our best effort to do so. And in most cases we will succeed.
>>> But what will the application do if we miss it? What should the
>>> protocol do in an imperfect world? It truly cannot function on
>>> fuzzy logic.
>>>
>>> One approach to addressing this is to say the RA will always be
>>> notified when the connection goes into service. This is positive
>>> confirmation that the connection is up end-to-end.
>>>
>>> Artur Barczyk wrote:
>>> Hi Radek, All,
>>>
>>> hmmmm, I for my part would be quite annoyed (to put it mildly) if
>>> I missed the first 15 minutes of today's HD conf call just because
>>> I reserved the resources a week in advance. "Around" has no place
>>> in a well-defined protocol. No fuzzy logic, please :-)
>>> Consider also the "bored child in a car" scenario:
>>> RA: are we there yet? PA: no... RA: are we there yet? PA: nooo....
>>> RA: are we there yet? PA: NO! etc.
>>>
>>> Be aware that complaining users are users quite quickly lost. You
>>> don't want that.
>>>
>>> So let's consider two example users:
>>> - high-volume data transfers through a managed system: a data
>>> movement scheduler has reserved some bandwidth at a given time.
>>> When this time comes, the application will just throw data onto
>>> the network; it might use a connectionless protocol, or not, but
>>> it will result in an error. It cannot wait "around" 15 minutes, as
>>> that would throw the transfer schedule into complete disorder.
>>> Such a "service" is just useless.
>>> - video conferencing/streaming: you reserve the network resource
>>> for 3pm because your meeting starts then. How do you explain to
>>> the video conference participants that the network prevented the
>>> conference from starting for "around" 15 minutes? (Well, you can,
>>> but this will be the last time you see the user using your
>>> network :-) )
>>>
>>> In short, the only reasonable thing to do is to put the right
>>> mechanisms in place to guarantee the service is up when the user
>>> requested it (and you confirmed it). The only acceptable reason for
>>> failing this is an error condition like the network being down (and
>>> we'll talk about protection in v2 :-) )
>>>
>>> I also think it is very dangerous to use "providing a service" as an
>>> argument while the underlying protocols are not yet correctly
>>> specified. This is not theoretical: the service needs to be useful
>>> to the end user if you want some uptake. Fuzzy statements make it
>>> useless. The very reason people are interested in this is that it's
>>> deterministic - you know what you get and when. Otherwise, use the
>>> routed network. :-)
>>>
>>> Cheers,
>>> Artur
>>>
>>>
>>>
>>> On 09/30/2010 03:37 PM, Radek Krzywania wrote:
>>> Hi,
>>> It's getting hard to solve everything here, so let's not try to
>>> solve everything at once. How about defining the start time as
>>> best-effort for v1? We promise to deliver the service, yet we are
>>> unable to guarantee the exact start time with a precision of
>>> seconds. If the user wants the connection to be available at 2pm,
>>> it will be around that time, but we can't guarantee exactly when
>>> (1:50, 2:01, 2:15). Let's take a fairly long timeout (e.g. 20
>>> minutes), and start booking the circuit 5 or 10 minutes in advance
>>> (no discussion for v1, just a best-feeling guess). The result will
>>> be that in most cases we will deliver the service at AROUND the
>>> specified time. For v1 that is enough, as we will be able to
>>> deliver a service, while in v2 we can discuss possible upgrades
>>> (unless our engineering approach discovers it's fine as is :) ).
>>> For #1 - it may be a problem for instant reservations. Here the
>>> user wants a circuit ASAP. We define ASAP as (see the approach
>>> above) less than 20 minutes (typically 5-10 minutes, probably, but
>>> that's my guess), or not at all. Users may or may not complain
>>> about that. In the first case we are good. For the second case we
>>> will need to design an upgrade for v2.
>>>
>>> Synchronization IMHO is important, and out of scope at the same
>>> time. We can make the assumption that agents' clocks are
>>> synchronized with a precision of, let's say, 10 seconds, which
>>> should be more than enough. The agents will use system clocks, so
>>> these need to be synchronized in the end (NTP or whatever), but
>>> that is not even an implementation issue but a deployment one. So
>>> let's put into the specification: "The NSI protocol requires time
>>> synchronization with a precision of no worse than 10 seconds". If
>>> we discover it's insufficient, let's upgrade it for v2.
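On the implementation side, a spec clause like that reduces to a single bound check; a trivial sketch (the 10-second constant is the proposal above, the function name is mine):

```python
MAX_CLOCK_OFFSET = 10.0  # seconds - the synchronization bound proposed for the spec

def within_sync_bound(local_ts, peer_ts, max_offset=MAX_CLOCK_OFFSET):
    """True if two agents' timestamps for the same event differ by no more
    than the protocol's required synchronization precision. Keeping the
    clocks inside this bound (via NTP or similar) is a deployment concern,
    not part of the protocol itself.
    """
    return abs(local_ts - peer_ts) <= max_offset
```

For example, timestamps 4.5 s apart pass the check, while timestamps 12 s apart would indicate a deployment out of spec.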
>>>
>>> We already have some features to implement, just to see if it works
>>> fine (works at all, actually). If a user is booking a circuit a week
>>> in advance, I guess he will not mind if we set it up 15 minutes
>>> after the start time (the user IS aware of that, as we specify this
>>> in the protocol description). We can't, however, deliver the service
>>> for a shorter period than the user requested. So we can agree (by
>>> voting, not discussing) on the fixed time values. My proposal is as
>>> above:
>>> - 20 minutes for reservation set-up time
>>> - service availability time (e.g. 13 h)
>>> - service tear-down time (it's not important from the user's
>>> perspective, since once any segment of the connection is removed
>>> the service is not available any more, but let's say 15 minutes)
>>> In that way, the calendar booking needs to reserve resources for
>>> 13 h 35 minutes. IMHO we can agree on that by a simple vote for v1
>>> (Doodle maybe), and collect more detailed requirements for v2 later
>>> on. I get the feeling we have started a quite theoretical discussion
>>> based on assumptions and guessing "what if", instead of focusing on
>>> delivering any service (even with limited guarantees).
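The calendar arithmetic above checks out, taking the values straight from the proposal:

```python
from datetime import timedelta

# Calendar booking = set-up guard + service availability + tear-down guard.
setup    = timedelta(minutes=20)
service  = timedelta(hours=13)
teardown = timedelta(minutes=15)

calendar_booking = setup + service + teardown
print(calendar_booking)  # 13:35:00, i.e. 13 h 35 minutes, as stated
```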
>>>
>>> Best regards
>>> Radek
>>> ________________________________________________________________________
>>> Radoslaw Krzywania              Network Research and Development
>>> radek.krzywania at man.poznan.pl  Poznan Supercomputing and Networking Center
>>> +48 61 850 25 26                http://www.man.poznan.pl
>>> ________________________________________________________________________
>>>
>>> From: nsi-wg-bounces at ogf.org [mailto:nsi-wg-bounces at ogf.org] On
>>> Behalf Of Jerry Sobieski
>>> Sent: Wednesday, September 29, 2010 9:33 PM
>>> To: Jeff W.Boote
>>> Cc: nsi-wg at ogf.org
>>> Subject: Re: [Nsi-wg] time issue
>>>
>>> Ok. I can buy this approach of #1. The Requested Start Time is
>>> immutable as the request goes down the tree (which disallows #2) -
>>> it is still a Requested Start Time, but NSAs are not allowed to
>>> change it as the request goes down the tree.
>>> But you can't prevent #3 if that's what an NSA somewhere down the
>>> tree decides to do. The result would be a promise it may not be
>>> able to keep - but that's acceptable because the Estimated Start
>>> Time is just an estimate; it's not binding.
>>>
>>> The point is, the local NSA cannot tell whether a remote NSA is
>>> using #1 or #3, since it's entirely up to the remote NSA to select
>>> the guard time appropriate for that request. Likewise, even if the
>>> remote NSA misses the Estimated Start Time, the requesting RA has
>>> no recourse other than to a) just wait until the provisioning
>>> completes, or b) give up and release the connection. An SLA might
>>> influence the bad NSA not to lowball its provisioning guard time
>>> in the future, or it may provide a rebate for the jilted user, but
>>> these are not protocol or standards issues.
>>>
>>> This goes to John's comment on the call today about what happens
>>> inside the NSA between the PA role and the RA role... These
>>> actions are captured in "state routines" that are invoked when
>>> protocol events occur. These actions are generalized in the
>>> standard, but heuristics like these approaches to guard time
>>> cannot always be mandated. In a protocol standard, whatever
>>> components are "required" or "must" items must be verifiable in a
>>> conformance test. I.e. if someone comes up with an NSI
>>> implementation, we should be able to put the reference
>>> implementation against the implementation under test and be able
>>> to tell, via protocol operation, whether the implementation under
>>> test is doing all the "must" items. If we say an NSA must use #1
>>> above, there is no way to test and confirm that it is doing
>>> so. If the test implementation uses #3, the only outward sign is
>>> that it may miss the start time on some connection(s), but it could
>>> just as easily have been a poor judgment call on the provisioning
>>> time - which is ok.
>>>
>>> So, in the standard, we can only recommend that #1 be used. Or we
>>> can say the NSA "should" use #1. But we cannot require it.
>>>
>>> my $.02
>>> Jerry
>>>
>>> Jeff W.Boote wrote:
>>>
>>> On Sep 29, 2010, at 7:31 AM, Gigi Karmous-Edwards wrote:
>>>
>>>
>>>
>>> Jerry,
>>>
>>> For your question : " While the guard times may be network
>>> specific, we do need to at least consider what we would like an NSA
>>> to do if for instance a provisioning guard time pushes a
>>> reservation forward into a previous reservation:   Do we  1) reject
>>> the request since we can't prepend our guard time and still make
>>> the Requested Start Time?   OR  2)  Do we retard the Estimated
>>> Start Time to allow for the guard time?   OR 3) do we reduce the
>>> guard time to fit the available lead time?"
>>>
>>> In my opinion, I think the answer here has to be #1: each NSA
>>> must reject the request if its process to establish the requested
>>> connection cannot meet the start time. In my opinion an
>>> NSA should NOT be allowed to change the requested start time (this
>>> will cause all sorts of problems for other NSAs), so #2 is not an
>>> option. The guard time for each NSA will most likely be vastly
>>> different and very dependent on the tools used by that network
>>> domain to configure the network elements for the requested path, so
>>> an individual NSA's guard time is also non-negotiable, and option
>>> #3 is not an option either.
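Option #1 as framed here is a pure admission check; a minimal sketch (names and the scenario of a colliding previous reservation are illustrative only):

```python
from datetime import datetime, timedelta

def admit(requested_start, guard_time, earliest_setup_start):
    """Option #1: reject unless the NSA's own guard time still fits in
    front of the immutable requested start time (e.g. the interval before
    the start may be blocked by a previous reservation)."""
    return requested_start - guard_time >= earliest_setup_start

prior_end = datetime(2010, 9, 30, 13, 0)   # a previous reservation ends here
start     = datetime(2010, 9, 30, 13, 30)  # immutable requested start time
accept = admit(start, timedelta(minutes=20), prior_end)  # setup fits: True
reject = admit(start, timedelta(minutes=45), prior_end)  # guard collides: False
```

The requested start time itself is never modified, which is exactly what rules out options #2 and #3 above.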
>>>
>>> I agree #1 seems the most deterministic.
>>>
>>>
>>>
>>>
>>> I agree with Radek, ONLY Start times and End times should be used
>>> in the protocol and that guard times are only private functions of
>>> each individual NSA.
>>>
>>> I agree with this. The guard times are not additive across each
>>> NSA. The guard time from the perspective of the user will
>>> effectively be the maximum of each NSA's guard time in the chain.
>>> But the user doesn't care, as long as provisioning is accomplished
>>> by the user's requested start time. That time would be in the
>>> protocol and would remain unchanged through each step of the chain.
>>> And it shouldn't matter how long it takes to tear down the circuit
>>> either, as long as the circuit is available until the requested
>>> end time.
>>>
>>> As to how to manage this time synchronization... I think it is
>>> totally reasonable to depend upon existing protocols. There are
>>> other protocols that already depend upon time synchronization, and
>>> many of them use NTP. We are not talking about needing very tight
>>> synchronization anyway. 1 second or even 10 seconds is plenty close
>>> enough. It is more about bounding that error.
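"Bounding that error" can be made concrete by treating every remote timestamp as an uncertainty interval; a sketch (mine, not from the thread):

```python
def definitely_before(t1, t2, max_error):
    """With clocks synchronized to within max_error seconds, a timestamp t
    is only known to lie in [t - max_error, t + max_error]. Two events can
    be ordered reliably only when those intervals do not overlap."""
    return t1 + max_error < t2 - max_error

E = 10.0  # seconds: the kind of bound NTP can comfortably maintain
certain   = definitely_before(100.0, 130.0, E)  # 30 s apart: order is certain
ambiguous = definitely_before(100.0, 115.0, E)  # 15 s apart: ambiguous under +/-10 s
```

Anything inside the ambiguous zone is exactly where the protocol, not the clocks, has to guarantee consistent behavior.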
>>>
>>> jeff
>>>
>>>
>>>
>>>
>>> Kind regards,
>>> Gigi
>>>
>>> On 9/29/10 8:45 AM, Jerry Sobieski wrote:
>>> Hi Inder-   I am not sure I agree with all of this...
>>>
>>> Inder Monga wrote:
>>> Radek
>>>
>>> I agree with your statements:
>>> the user is not interested in partial results, as he/she is not even
>>> aware of/interested in which NSAs/domains are involved. The user
>>> doesn't care (if everything works fine ;) ).
>>>
>>> The protocol should be designed with the user in mind. The user
>>> does not care about guard time values or differences in setup times
>>> for MPLS vs optical lambdas, nor does it concern itself with the
>>> choices an NSA/NRM will make in path-finding.
>>>
>>> The protocol designers can keep the user in mind, but the protocol
>>> is between the RA and the PA and has a specific purpose: to
>>> reserve and instantiate a connection across the globe. We need to
>>> keep in mind that the RA is not always the end user - it is by
>>> definition another NSA, and could be an NSA in the tree/chain
>>> somewhere. If we want to differentiate between the user and the
>>> network, then we can create a simplified User-to-Network API and a
>>> different Network-to-Network API... but I don't think that's what we
>>> want to do (:-) We need, IMO, to think *not* about the user but
>>> about the Requesting Agent - regardless of who it represents.
>>>
>>> Perhaps once the RA-PA protocol is tightly defined in all its
>>> nuances, we can develop/recommend an end-user API that simplifies
>>> the application's required interactions? This would allow
>>> an application to embed an RA in a runtime library/module, and the
>>> application itself would only have to deal with the basic
>>> connection requirements... just a thought.
>>>
>>>
>>> In my opinion,
>>> a. the user should specify "Expected Start Time, Expected End
>>> Time". The NSAs/domains along the path determine resource
>>> availability and booking in their schedules based on their own
>>> configured guard time (guard times are not specified by NSI
>>> protocol. NSI connection service architecture should discuss them
>>> as a suggested concept).
>>> While the guard times may be network specific, we do need to at
>>> least consider what we would like an NSA to do if for instance a
>>> provisioning guard time pushes a reservation forward into a
>>> previous reservation:   Do we  1) reject the request since we can't
>>> prepend our guard time and still make the Requested Start Time?
>>> OR  2)  Do we retard the Estimated Start Time to allow for the
>>> guard time?   OR 3) do we reduce the guard time to fit the
>>> available lead time?
>>>
>>> I think we now agree that the Start Time is just an estimate, due
>>> primarily to the guard time itself being just an estimate. So none
>>> of these times are etched in stone... So which option do we
>>> recommend or require? The protocol is sensitive to these various
>>> times - they cause timers to go off, messages to be sent, error
>>> handling to kick in... If they are adjusted during scheduling or
>>> provisioning, we MUST understand what impact they will have on the
>>> protocol and how that will be carried through the service tree.
>>>
>>>
>>> b. Within reasonable limits, the connection should be up as close
>>> to the start time as possible. The user can set his own policy/
>>> configuration on how long to wait after the start time to accept a
>>> connection. Since the resources are guaranteed, this is a question
>>> of setup/provisioning only. Hence, there is no protocol state
>>> transition when the start time is passed, other than the messages
>>> that indicate the circuit is established end to end or a teardown
>>> message initiated by the client.
>>> Ah, but the rub here is that the "user" is an RA... but not all RAs
>>> are the end user. We are defining the actions of an RA regardless
>>> of whether it is a user NSA or a network NSA. So we must ensure
>>> that if the RA gets tired of waiting for provisioning to complete,
>>> whatever actions it is allowed to take will be consistent and
>>> predictable throughout the service tree for all the RA/PA
>>> interactions. So the "user" actions are not irrelevant to the
>>> protocol.
>>>
>>>
>>>
>>> c. We should not design a protocol that depends on time
>>> synchronization to work. In my opinion, the start time and the
>>> expected time to provision (aka guard time) are best handled/shared
>>> as an SLA/Service Definition issue.
>>> I agree: we cannot expect perfectly/exactly synchronized clocks
>>> anywhere in the network. And therefore we cannot depend upon clock
>>> synchronization for any part of the protocol to work. Which
>>> implies that the protocol must work when the clocks are NOT
>>> synchronized. How do we ensure this? --> Rigorous protocol
>>> analysis.
>>>
>>> While the values of certain timers may be left to the Service
>>> Definition/SLA, as I stated before, we must make sure that the
>>> protocol can function predictably and consistently in the face of
>>> all possible timing permutations among NSAs.
>>> This rapidly gets very complex if we allow too many variables for
>>> the SD/SLA to define. Sometimes it's ok to identify constants that
>>> the protocol must use, so that we can validate the protocol and
>>> simplify implementation and deployment. Indeed, oftentimes even
>>> slightly skewed clocks introduce race conditions that become more
>>> likely to occur, requiring more careful consideration.
>>>
>>>
>>>
>>> d. Similar semantics apply to the end-time as well.
>>> Pretty much. Across the board, things like clock events,
>>> estimates, and service-specific choices will create situations
>>> where we need to ensure the protocol and state machines will
>>> function properly across the full range of possible permuted
>>> values. This is in general why protocol designers say "make it
>>> only as complex as it needs to be, and no more" - options breed
>>> complexity.
>>>
>>> br
>>> Jerry
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> nsi-wg mailing list
>>> nsi-wg at ogf.org
>>> http://www.ogf.org/mailman/listinfo/nsi-wg
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -- 
>>> Dr Artur Barczyk
>>> California Institute of Technology
>>> c/o CERN, 1211 Geneve 23, Switzerland
>>> Tel:    +41 22 7675801
>>
>


