[Nsi-wg] time issue
Evangelos Chaniotakis
haniotak at es.net
Thu Sep 30 13:02:17 CDT 2010
Seconding this.
On Sep 30, 2010, at 1:54 PM, Aaron Brown wrote:
> I'm probably oversimplifying, but it seems to me this problem
> becomes much easier with Jeff's idea about having all clocks
> synchronized within a period of no more than some number of seconds. If
> the clocks aren't synchronized, you run into a whole bunch of errors
> related to making absolute time-based reservations anyway.
>
> The protocol mandates clock offsets of no more than X seconds. Each
> domain selects its own setup time of "no more than Y minutes" and a
> tear down time of "Z minutes". If a user requests a reservation from
> time A to time B, the domain reserves time from A-X-Y through B+X+Z.
> When it comes to setting up the circuit, the domain starts setting
> it up at time A-X-Y. If the circuit isn't ready by time A-X, the
> domain throws a setup error and handles that error condition the way
> it would handle an actual error that occurred during circuit setup. The
> circuit remains active until time B+X, at which time the domain
> starts tearing it down. If, while the circuit is running, the hosts
> become desynchronized, one of the domains will (from either the
> clients' or other domains' perspectives) end the circuit earlier than
> expected and report the tear down. The other domains/clients will
> handle that much as they would handle a cancel.
>
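For what it's worth, Aaron's window arithmetic fits in a few lines of Python. This is only a rough sketch: the names DomainTimers and booking_window are made up for illustration, and X, Y and Z are whatever the protocol and each domain end up choosing, nothing the group has agreed on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DomainTimers:
    """Hypothetical container for the three values in the proposal."""
    max_clock_offset: timedelta  # X: protocol-wide bound on clock offset
    setup_time: timedelta        # Y: this domain's "no more than" setup time
    teardown_time: timedelta     # Z: this domain's tear down time


def booking_window(start: datetime, end: datetime, t: DomainTimers) -> dict:
    """Times a domain acts on for a user request from A (start) to B (end)."""
    return {
        "reserve_from": start - t.max_clock_offset - t.setup_time,    # A-X-Y
        "setup_deadline": start - t.max_clock_offset,                 # A-X
        "teardown_start": end + t.max_clock_offset,                   # B+X
        "reserve_until": end + t.max_clock_offset + t.teardown_time,  # B+X+Z
    }
```

Setup starts at reserve_from, a setup error is raised if the circuit is not ready by setup_deadline, and teardown begins at teardown_start. The user-requested A and B are never modified.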
> Again, I may be vastly oversimplifying the problem.
>
> Cheers,
> Aaron
>
> On Sep 30, 2010, at 1:31 PM, Radek Krzywania wrote:
>
>> Hi,
>> Setting up a circuit via the Alcatel NMS takes 2 minutes. This time is
>> mostly consumed by the NMS finding a path through the domain and warming
>> the room with CPU heat. Seconds or minutes, it is still a guess anyway :)
>> I can agree to use those values (instead of 20 minutes), but
>> in my current experience a lot of timeouts will appear.
>> I fully support the statement “We are trying to provide more/
>> better predictability, not perfect predictability.” This should be
>> on the title page of the NSI design BTW :) The whole point is that
>> everything is relative and not exact. The example of user clocks
>> depicts it quite well (thanks Jerry for pointing that out).
>>
>> Best regards
>> Radek
>>
>> ________________________________________________________________________
>> Radoslaw Krzywania
>> Network Research and Development
>> Poznan Supercomputing and Networking Center
>> radek.krzywania at man.poznan.pl
>> +48 61 850 25 26
>> http://www.man.poznan.pl
>> ________________________________________________________________________
>>
>> From: Jerry Sobieski [mailto:jerry at nordu.net]
>> Sent: Thursday, September 30, 2010 6:54 PM
>> To: Artur Barczyk
>> Cc: radek.krzywania at man.poznan.pl; 'Jeff W.Boote'; nsi-wg at ogf.org
>> Subject: Re: [Nsi-wg] time issue
>>
>> Hi Artur- I accept the challenge!
>>
>> First, let me calm the nerves... The question of setup time -
>> particularly the issue of taking 10 minutes or more - has mostly to
>> do with provisioning all-optical systems, where amplification and
>> attenuation across a mesh take a significant time. In most
>> cases, the provisioning will be a more conventional few seconds to
>> a minute (or so). And for smaller domains with more
>> conventional switching gear, maybe a few seconds at most.
>>
>> So we should all try to keep perspective here that much of this
>> discussion has to do with ensuring the protocol functions
>> correctly, consistently, and reliably as service infrastructure.
>> And much of this is driven by making sure it works even in the
>> corner cases where it might take 15 minutes to provision, or two
>> NSAs might have clocks that differ by 10 seconds, etc.
>>
>> But the real world facts are that nothing is perfect, and a
>> globally distributed complex system such as our networks is never,
>> practically if not theoretically, going to be perfectly synchronized
>> or offer exact predictability. We are trying to provide more/
>> better predictability, not perfect predictability.
>>
>> You are right about the user's expectation that the connection will
>> be available at the requested time. But nothing is exact. Even
>> if we knew and could predict exactly the setup time, if something
>> was broken in the network and we couldn't meet the committed start
>> time, what would the user do?
>>
>> Ok. deep breath....exhale.... feel better? Ok good. Now let me
>> defend our discussions...
>>
>> To be blunt, it could be argued that any user application that
>> blindly puts data into the pipe without getting some verification
>> that the pipe *AND* the application agent at the far end is ready
>> has no real idea if it is working AT ALL! If the agent at the
>> other end is not functioning (not a network problem), this is
>> fundamentally indistinguishable from a network connection not being
>> available. How would the user be able to claim the network is
>> broken?
>>
>> On the other hand, if there *is* out of band coordination going on
>> between the local user agent and the destination user agent, then
>> the application is trying to deal with an imperfect world in which
>> it needs to determine and synchronize the state of the application
>> agent on the far end before it proceeds. ---> Why would doing so
>> with the network resource not be of equal importance?
>>
>> In *general* (Fuzzy logic alert) we will make the start time.
>> Indeed, in most instances we will be ready *before* the start
>> time. But if by chance we miss the start time by only 15 seconds,
>> is that acceptable to you? Or to the application that just dumped
>> 19 MBytes of data down a hole?
>>
>> What if it was the user application that had a slightly fast clock and
>> started 10 seconds early? *His* clock said 1pm, mine said
>> 12:59:50. Who is broken? The result is the same. What if the
>> delta was 5 minutes, or 50 milliseconds? Where do we draw the
>> line? Draw a line, and there will still be some misses...
>>
>> The point here is that nothing is perfect and exact. And yet these
>> systems function "correctly"! We need to construct a protocol that
>> can function in the face of these minor (on a human scale) time
>> deltas. But even seconds are not minor on the scale that a
>> computer agent functions. So we necessarily need to address these
>> nuances so that it works correctly on a timescale of milliseconds
>> and less.
>>
>> In order to address the issue of [typically] slight variations of
>> actual start time, we are proposing that the protocol would
>> *always* notify the originating RA when the circuit is ready,
>> albeit after the fact, but it says deterministically "the circuit
>> is now ready." And we are also proposing a means for the RA to
>> determine the state if that ProvisionComplete message is not
>> received when it was expected - if there is a hard error or just a
>> slow/late provisioning process still taking place.
>>
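A rough sketch of how an RA might act on Jerry's two proposals (always get a ProvisionComplete, and be able to ask about state when it does not arrive in time). Everything here beyond the ProvisionComplete message name is an assumption for illustration: the function await_circuit, the query_state callable, and the state strings "provisioning" and "failed" are hypothetical, not part of any agreed NSI message set.

```python
import queue


def await_circuit(events: queue.Queue, query_state, patience_s: float) -> str:
    """Wait for ProvisionComplete; if it is late, ask the PA what is going on.

    events      -- queue on which messages from the PA are delivered (assumed)
    query_state -- hypothetical callable returning "provisioning" or "failed"
    patience_s  -- how long the RA is willing to wait before asking
    """
    try:
        msg = events.get(timeout=patience_s)
    except queue.Empty:
        # No notification yet: distinguish slow/late provisioning
        # from a hard error instead of guessing.
        state = query_state()
        return "keep-waiting" if state == "provisioning" else "give-up"
    return "ready" if msg == "ProvisionComplete" else "give-up"
```

The point of the sketch is only that the RA never has to infer readiness from its own clock: it either hears "ready" or it asks.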
>> But given the fact that we cannot *exactly* synchronize each and
>> every agent and system around the world- and keep them that way,
>> and that we cannot predict perfectly how long each task will take
>> before the fact, we have to face facts that we need to be able to
>> function correctly with these uncertainties. Without meaning to
>> preach, the user application needs to do so too.
>>
>> Small is relative. (there is an old joke here about a prostitute and
>> an old man...but I won't go into it.:-)
>>
>> Best regards
>> Jerry
>>
>>
>>
>> So we want to provide the service at the request time. And we will
>> make our best effort to do so. And in most cases we will succeed.
>> But what will the application do if we miss it? What should the
>> protocol do in an imperfect world? It truly cannot function on
>> fuzzy logic.
>>
>> One approach to addressing this is to say the RA will always be
>> notified when the connection goes into service. This is a
>> positive sign that the connection is end-to-end.
>>
>> Artur Barczyk wrote:
>> Hi Radek, All,
>>
>> hmmmm, I for my part would be quite annoyed (to put it mildly), if
>> I miss the first
>> 15 minutes of today's HD conf call just because I reserved the
>> resources a week
>> in advance. "Around" has no place in a well defined protocol. No
>> fuzzy logic, please :-)
>> Consider also the "bored child in a car" scenario:
>> RA: are we there yet? PA: no... RA: are we there yet? PA: nooo....
>> RA: are we there yet? PA: NO! etc.
>>
>> Be aware that complaining users are users quickly lost. You
>> don't want that.
>>
>> So let's consider two example users:
>> - high volume data transfers through a managed system: a data
>> movement scheduler has reserved some bandwidth at a given time. When
>> this time comes, the application will just throw data on the network.
>> It might use a connectionless protocol, or not, but it will
>> result in an error. It cannot wait "around" 15 minutes, as that will
>> throw the transfer schedule into complete disorder. Such a "service"
>> is just useless.
>> - video conferencing/streaming. You reserve the network resource
>> for 3pm because your meeting starts then. How do you explain to the
>> video conference participant that the network prevented the
>> conference from starting for "around" 15 minutes? (Well, you can, but
>> this will be the last time you'll see the user using your
>> network :-) )
>>
>> In short, the only reasonable thing to do is to put the right
>> mechanism in place to
>> guarantee the service is up when the user requested it (and you
>> confirmed it).
>> The only acceptable reason for failing this is an error condition
>> like network down (and we'll
>> talk about protection in v2 :-) )
>>
>> I also think it is very dangerous to use "providing a service" as
>> an argument while the underlying protocols are not yet correctly
>> specified. This is not theoretical: the service needs to be useful
>> to the end-user if you want some uptake. Fuzzy statements make it
>> useless. The very reason people are interested in this is that it's
>> deterministic - you know what you get and when. Otherwise use the
>> routed network. :-)
>>
>> Cheers,
>> Artur
>>
>>
>>
>> On 09/30/2010 03:37 PM, Radek Krzywania wrote:
>> Hi,
>> It’s getting hard to solve everything here, so let’s not try to
>> solve everything at once. So how about defining the start time
>> as best effort for v1? We promise to deliver the service, yet
>> we are unable to guarantee the exact start time to a precision of
>> seconds. If a user wants a connection to be available at 2pm, it will
>> be around that time, but we can’t guarantee exactly when (1:50, 2:01,
>> 2:15). Let’s take quite a long time as a timeout (e.g. 20 minutes),
>> and start booking the circuit 5 or 10 minutes in advance (no
>> discussion for v1, just a best guess). The result will be
>> that in most cases we will deliver the service at AROUND the specified
>> time. For v1 that is enough, as we will be able to deliver a service,
>> while in v2 we can discuss possible upgrades (unless our
>> engineering approach discovers it’s fine enough :) ).
>> For #1: it may be a problem for instant reservations. Here the user
>> wants a circuit ASAP. We define ASAP as (see above approach) less than
>> 20 minutes (typically 5-10 minutes probably, but that’s my guess), or
>> not at all. Users may or may not complain about that. In the first
>> case we are good. For the second case we will need to design an
>> upgrade for v2.
>>
>> Synchronization IMHO is important, and out of scope at the same
>> time. We can make an assumption that the agents’ clocks are
>> synchronized to within, let’s say, 10 seconds, which should be good
>> enough. The agents will use system clocks, so those need to be
>> synchronized in the end (NTP or whatever), but that is not even an
>> implementation issue but a deployment issue. So let’s put into the
>> specification: “NSI protocol requires time synchronization with
>> precision no worse than 10 seconds”. If we discover it’s
>> insufficient, let’s upgrade it for v2.
>>
>> We already have some features to implement, just to see if it works
>> fine (works at all, actually). If a user is booking a circuit a week
>> in advance, I guess he will not mind if we set it up 15 minutes
>> after the start time (the user IS aware of that as we specify this in
>> the protocol description). We can’t however deliver the service for
>> less than the user-defined time. So we can agree (by voting, not
>> discussing) on the fixed time values. My proposal is as above:
>> - 20 minutes for reservation as set up time
>> - Service availability time (e.g. 13 h)
>> - Service tear down time (it’s not important from the user’s
>> perspective, since once any segment of the connection is removed the
>> service is not available any more, but let’s say 15 minutes)
>> In that way, the calendar booking needs to reserve resources for
>> 13h 35 minutes. IMHO we can agree on that by simply voting for v1
>> (doodle maybe), and collect more detailed requirements for v2 later
>> on. I get the feeling we started a quite theoretical discussion based
>> on assumptions and guessing “what if”, instead of focusing on
>> delivering any service (even with a limited guarantee).
>>
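Radek's calendar arithmetic, written out as a small sketch. The 20 and 15 minute values are his proposed v1 constants; the function name calendar_booking is made up here for illustration.

```python
from datetime import timedelta

# Proposed fixed values for v1 (from the message above, not yet agreed).
SETUP_ALLOWANCE = timedelta(minutes=20)
TEARDOWN_ALLOWANCE = timedelta(minutes=15)


def calendar_booking(service_time: timedelta) -> timedelta:
    """Total time the calendar must hold resources for one reservation."""
    return SETUP_ALLOWANCE + service_time + TEARDOWN_ALLOWANCE


# A 13 hour service needs 20 min + 13 h + 15 min of booked resources.
total = calendar_booking(timedelta(hours=13))
```

Here total comes out as 13 hours 35 minutes, matching the figure in the proposal. The user still gets the full 13 hours; the extra 35 minutes are only visible in the domain's own calendar.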
>> Best regards
>> Radek
>> ________________________________________________________________________
>> Radoslaw Krzywania
>> Network Research and Development
>> Poznan Supercomputing and Networking Center
>> radek.krzywania at man.poznan.pl
>> +48 61 850 25 26
>> http://www.man.poznan.pl
>> ________________________________________________________________________
>>
>> From: nsi-wg-bounces at ogf.org [mailto:nsi-wg-bounces at ogf.org] On
>> Behalf Of Jerry Sobieski
>> Sent: Wednesday, September 29, 2010 9:33 PM
>> To: Jeff W.Boote
>> Cc: nsi-wg at ogf.org
>> Subject: Re: [Nsi-wg] time issue
>>
>> Ok. I can buy this approach of #1. The Requested Start Time is
>> immutable as the request goes down the tree (which disallows #2) -
>> it is still a Requested Start Time, but NSAs are not allowed to
>> change the requested start time as the request goes down the tree.
>> But you can't prevent #3 if that's what an NSA somewhere down the
>> tree decides to do. The result would be a promise he may not be
>> able to keep - but that's acceptable because the Estimated Start
>> Time is just an estimate, it's not binding.
>>
>> The point is, the local NSA cannot tell whether a remote NSA is
>> using #1 or #3 since it's totally up to the remote NSA to select the
>> guard time appropriate for that request. Likewise, even if the
>> remote NSA misses the Estimated Start Time, the requesting RA has
>> no recourse other than to a) just wait until the provisioning
>> completes or b) give up and release the connection. An SLA might
>> influence the bad NSA to not lowball his provisioning guard time
>> in the future, or it may provide a rebate for the jilted user, but
>> these are not a protocol or a standards issue.
>>
>> This goes to John's comment on the call today about what happens
>> inside the NSA between the PA role and the RA role... These
>> actions are captured in "state routines" that are invoked when
>> protocol events occur. These actions are generalized in the
>> standard, but any heuristics like these approaches to guard time
>> cannot always be mandated. In a protocol standard, whatever
>> components are "required" or "must" items must be verifiable in a
>> conformance test. I.e. if someone comes up with an NSI
>> implementation, we should be able to put the reference
>> implementation against the test implementation and we should be
>> able to tell via protocol operation if the implementation under
>> test is doing all the "must" items. If we say an NSA must use #1
>> above, there is no way to test it and confirm that it is doing
>> so. If the test implementation uses #3, the only outward sign is
>> that it may miss the start time on some connection(s), but it could
>> have as easily just been a poor judgment call on the provisioning
>> time - which is ok.
>>
>> So, in the standard, we can only recommend #1 be used. Or we can
>> say the NSA "should" use #1. But we cannot require it.
>>
>> my $.02
>> Jerry
>>
>> Jeff W.Boote wrote:
>>
>> On Sep 29, 2010, at 7:31 AM, Gigi Karmous-Edwards wrote:
>>
>>
>>
>> Jerry,
>>
>> For your question : " While the guard times may be network
>> specific, we do need to at least consider what we would like an NSA
>> to do if for instance a provisioning guard time pushes a
>> reservation forward into a previous reservation: Do we 1) reject
>> the request since we can't prepend our guard time and still make
>> the Requested Start Time? OR 2) Do we retard the Estimated
>> Start Time to allow for the guard time? OR 3) do we reduce the
>> guard time to fit the available lead time?"
>>
>> In my opinion, I think the answer here has to be # 1) each NSA
>> must reject the request if their process to establish the
>> connection requested cannot meet the Start time. In my opinion an
>> NSA should NOT be allowed to change the requested start time (this
>> will cause all types of problems for other NSAs), so # 2) is not an
>> option. The guard time for each NSA will most likely be vastly
>> different and very dependent on the tools used by that network
>> domain to configure the network elements for the requested path, so
>> an individual guard time of an NSA is also nonnegotiable, so option
>> # 3) is not an option.
>>
>> I agree #1 seems the most deterministic.
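Option #1 is also the easiest of the three to write down. A minimal sketch, assuming each NSA knows when the resource becomes free (the end of the previous reservation, or "now" for an immediate request); the function name admit and its parameters are hypothetical, not protocol terms.

```python
from datetime import datetime, timedelta


def admit(requested_start: datetime,
          guard_time: timedelta,
          resource_free_from: datetime) -> bool:
    """Option #1: accept only if the full guard time fits before the start.

    The requested start time is never shifted (that would be option #2)
    and the guard time is never shortened (that would be option #3).
    """
    provisioning_begins = requested_start - guard_time
    return provisioning_begins >= resource_free_from
```

For example, with a start of 13:00 and the resource free from 12:55, a 2 minute guard time is admitted and a 10 minute guard time is rejected.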
>>
>>
>>
>>
>> I agree with Radek, ONLY Start times and End times should be used
>> in the protocol and that guard times are only private functions of
>> each individual NSA.
>>
>> I agree with this. The guard times are not additive across each
>> NSA. The guard time from the perspective of the user will
>> effectively be the maximum of each NSA's guard time in the chain.
>> But, the user doesn't care as long as provisioning is accomplished
>> by the user's requested start time. That time would be in the
>> protocol and would remain unchanged through each step of the chain.
>> And, it shouldn't matter how long it takes to tear down the circuit
>> either as long as the circuit is available until their requested
>> end time.
>>
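Jeff's point that guard times are not additive can be shown in two lines. A sketch only: the function name and the example guard times are invented, and it assumes every NSA provisions its own segment in parallel against the same unchanged start time.

```python
from datetime import timedelta


def effective_guard_time(guard_times: list[timedelta]) -> timedelta:
    """Lead time seen by the user: the slowest NSA in the chain, not the sum."""
    return max(guard_times)


# Hypothetical chain: a small switched domain, an NMS-driven domain,
# and an all-optical domain.
chain = [timedelta(seconds=10), timedelta(minutes=2), timedelta(minutes=15)]
lead = effective_guard_time(chain)
```

With this chain the lead time is 15 minutes, whereas adding the guard times would give 17 minutes 10 seconds. Since each guard time stays private to its NSA, nothing but the start and end times needs to appear in the protocol.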
>> As to how to manage this time synchronization... I think it is
>> totally reasonable to depend upon existing protocols. There are
>> other protocols that already depend upon time synchronization, and
>> many of them use NTP. We are not talking about needing very tight
>> synchronization anyway. 1 second or even 10 seconds is plenty close
>> enough. It is more about bounding that error.
>>
>> jeff
>>
>>
>>
>>
>> Kind regards,
>> Gigi
>>
>> On 9/29/10 8:45 AM, Jerry Sobieski wrote:
>> Hi Inder- I am not sure I agree with all of this...
>>
>> Inder Monga wrote:
>> Radek
>>
>> I agree with your statements;
>> User is not interested in partial results, as he/she is not even
>> aware/interested in which NSAs/domains are involved. User doesn’t
>> care (if everything works fine ;) ).
>>
>> The protocol should be designed with the user in mind. The user
>> does not care about guard time values or differences in setup times
>> for MPLS vs optical lambdas, nor concern itself with choices an NSA/
>> NRM will make in path-finding.
>>
>> The protocol designers can keep the user in mind, but the protocol
>> is between the RA and the PA and has a specific purpose: to
>> reserve and instantiate a connection across the globe. We need to
>> keep in mind that the RA is not always the end user - it is by
>> definition another NSA and could be an NSA in the tree/chain
>> somewhere. If we want to differentiate between the user and the
>> network, then we can create a simplified User to Network API, and a
>> different Network to Network API...but I don't think that's what we
>> want to do (:-) We need to IMO *not* think about the user, but to
>> think about the Requesting Agent - regardless of who it represents.
>>
>> Perhaps once the RA-PA protocol is tightly defined in all its
>> nuances, we can develop/recommend an end user API that simplifies
>> the application's required interactions ?? This would allow
>> an application to embed an RA in a runtime library/module and the
>> application itself would only have to deal with the basic
>> connection requirements.... just a thought.
>>
>>
>> In my opinion,
>> a. the user should specify "Expected Start Time, Expected End
>> Time". The NSAs/domains along the path determine resource
>> availability and booking in their schedules based on their own
>> configured guard time (guard times are not specified by NSI
>> protocol. NSI connection service architecture should discuss them
>> as a suggested concept).
>> While the guard times may be network specific, we do need to at
>> least consider what we would like an NSA to do if for instance a
>> provisioning guard time pushes a reservation forward into a
>> previous reservation: Do we 1) reject the request since we can't
>> prepend our guard time and still make the Requested Start Time?
>> OR 2) Do we retard the Estimated Start Time to allow for the
>> guard time? OR 3) do we reduce the guard time to fit the
>> available lead time?
>>
>> I think we now agree that the Start Time is just an estimate, due
>> primarily to the guard time itself being just an estimate. So none
>> of these times are etched in stone...So which option do we
>> recommend or require? The protocol is sensitive to these various
>> times - they cause timers to go off, messages to be sent, error
>> handling to kick in... If they are adjusted during scheduling or
>> provisioning, we MUST understand what impact they will have to the
>> protocol and how that will be carried through the service tree.
>>
>>
>> b. Within reasonable limits, the connection should be up as close
>> to the start time as possible. The user can set his own policy/
>> configuration on how long to wait after the start time to accept a
>> connection. Since the resources are guaranteed, this is a
>> question of setup/provisioning only. Hence, there is no protocol
>> state transition when start time is passed other than the messages
>> that indicate the circuit is established end to end or teardown
>> message initiated by the client.
>> Ah, but the rub here is that the "user" is an RA...but not all RAs
>> are the end user. We are defining the actions of an RA, regardless
>> of whether it is a user NSA or a network NSA. So we must ensure
>> that if the RA gets tired of waiting for provisioning to complete,
>> whatever actions it is allowed to take will be consistent and
>> predictable throughout the service tree for all the RA/PA
>> interactions. So the "user" actions are not irrelevant to the
>> protocol.
>>
>>
>>
>> c. We should not design a protocol that depends on time
>> synchronization to work. In my opinion, the start time, expected
>> time to provision aka guard time is best handled/shared as a SLA/
>> Service definition issue.
>> I agree: We cannot expect perfectly/exactly synchronized clocks
>> anywhere in the network. And therefore we cannot depend upon clock
>> synchronization for any part of the protocol to work. Which
>> implies that the protocol must work when the clocks are NOT
>> synchronized. How do we ensure this? --> rigorous protocol
>> analysis.
>>
>> While the values of certain timers may be left to the Service
>> Definition/SLA, as I stated before, we must make sure that the
>> protocol can function predictably and consistently in the face of
>> all possible timing permutations that are possible among NSAs.
>> This rapidly gets very complex if we allow too many variables for
>> the SD/SLA to define. Sometimes, it's ok to identify constants that
>> the protocol must use so that we can validate the protocol and
>> simplify implementation and deployment. Indeed, oftentimes when
>> clocks are only slightly skewed they introduce race conditions that
>> become more likely to occur, requiring more careful consideration.
>>
>>
>>
>> d. Similar semantics apply to the end-time as well.
>> Pretty much. Across the board, things like clock events,
>> estimates, and service specific choices will create situations
>> where we need to ensure the protocol and state machines will
>> function properly across the full range of possible permuted
>> values. This is in general why protocol designers say "make it
>> only as complex as it needs to be, and no more" - options breed
>> complexity.
>>
>> br
>> Jerry
>>
>>
>>
>>
>> _______________________________________________
>> nsi-wg mailing list
>> nsi-wg at ogf.org
>> http://www.ogf.org/mailman/listinfo/nsi-wg
>>
>> --
>> Dr Artur Barczyk
>> California Institute of Technology
>> c/o CERN, 1211 Geneve 23, Switzerland
>> Tel: +41 22 7675801