[Nsi-wg] NSI error handling draft - next version
John MacAuley
john.macauley at surfnet.nl
Wed Apr 28 10:50:09 CDT 2010
Peoples,
Had someone show up in my office so I missed the conversation over
"Resource change from available to not available." I thought I would
provide some input on the topic based on my DRAC experiences.
I think there are three types of events that can initiate a topology
change that should be understood when defining the error handling. Two
of these are actually not errors but normal operating procedures within
a network:
1. Physical network failure resulting in a topology change - typically
the temporary removal of a link from topology with no knowledge of when
it will be restored.
2. The permanent removal of a link from the topology by a network
administrator. Actually, this one should include the reconfiguration of
the network where an entire node could be removed.
3. The temporary removal of a link by a network administrator for
maintenance purposes. This will typically have a defined start and end
time based on the maintenance window.
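
A minimal Python sketch of how these three event types might be modelled.
All names here (TopologyEvent, LinkOutage, is_error) are hypothetical and
are not taken from the NSI draft or from DRAC; using end=None to stand for
"unknown or never" is likewise an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class TopologyEvent(Enum):
    FAILURE = 1            # 1. physical fault, restoration time unknown
    PERMANENT_REMOVAL = 2  # 2. administrator removes a link (or node) for good
    MAINTENANCE = 3        # 3. administrator removes a link for a defined window


@dataclass
class LinkOutage:
    link: str
    kind: TopologyEvent
    start: datetime
    end: Optional[datetime] = None  # assumption: None means unknown or never

    def is_error(self) -> bool:
        # Only the physical failure is an error; the other two are
        # normal operating procedures.
        return self.kind is TopologyEvent.FAILURE
```

Only a MAINTENANCE outage would normally carry both a start and an end
time; the other two leave end unset.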
#1 is interesting in that it impacts existing schedules in an in-service
state, reserved schedules not yet in service, and any new reservation
requests.
a) Those schedules in-service using the links impacted by the topology
change may undergo some type of restoration. If this was a protected
circuit then the underlying transport will restore the service and we may
not want to do anything about it. If this was an unprotected service
then perhaps a re-dial could be initiated by the NRM in an attempt to
achieve a lazy restore.
b) Depending on the estimated length of the temporary topology change we
may need to recompute the paths of those schedules reserved but not yet
provisioned. We should not recompute the paths from the point of
failure to the end of time, but only for some predefined floating window
that is optimistic enough to give the failure time to recover and that
reduces the number of schedules to be recomputed. For example, a floating
one hour window would mean all reservations up to an hour in the future
that could be impacted by the failure are recomputed. If the failure is
cleared and the topology is restored, then there is a one hour window on
that link that should now be clear of reservations. The interesting
side-effect is we now have a window of time to make sure the link remains
trouble free. The question is: have we blocked that link from use, or can
a new schedule use the remaining hour if it comes in after the trouble
has cleared?
c) If a new reservation request for a future point in time arrives while
a failure has taken the link out of topology, do we remove the link from
computation, or do we add an optimistic guard time after which we can
assume the link will be restored?
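
The three cases a) to c) can be sketched roughly as follows. This is only
an illustration under stated assumptions: the Reservation record, the
function names and the returned action strings are all hypothetical, and
the guard-time check in link_usable shows just one of the two options
raised in c), not a recommendation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Reservation:
    name: str
    link: str
    start: datetime
    end: datetime
    in_service: bool = False
    protected: bool = False


def in_service_action(res: Reservation) -> str:
    # a) Protected circuits are restored by the underlying transport;
    #    unprotected ones may be re-dialed (lazy restore).
    return "leave-to-transport" if res.protected else "re-dial"


def to_recompute(reservations: List[Reservation], link: str,
                 now: datetime, window: timedelta) -> List[Reservation]:
    # b) Recompute only reserved (not yet in-service) schedules on the
    #    failed link that fall inside the floating window [now, now + window).
    horizon = now + window
    return [r for r in reservations
            if r.link == link and not r.in_service
            and r.start < horizon and r.end > now]


def link_usable(request_start: datetime, failure_time: datetime,
                guard: timedelta) -> bool:
    # c) Optimistic guard time: assume the link is restored once the
    #    guard has elapsed since the failure.
    return request_start >= failure_time + guard
```

Because the window is measured from "now", calling to_recompute again
later slides the window forward for as long as the failure persists.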
#2 is different from a fault condition in that an administrator has
removed the link from topology. We can model this gracefully if we can
have a high priority (preemptive) administration reservation that can
block the bandwidth on a link from the point in time the link will be
removed through to infinity. Any schedules this preemptive schedule
impacts will need to be recomputed as discussed in the previous example,
or, if provisioned, switched to protection or re-dialed to restore. At some
point on or after the start of the preemptive schedule the link can be
permanently removed from topology and the reservation blocking that link
cleared.
#3 is similar to #2 except there is a defined end time for the
preemptive schedule blocking the link. Only reservations overlapping
with the maintenance window would need to be recomputed. Obviously, any
provisioned schedules would need to be switched to protection or
re-dialed to restore.
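
A rough sketch of the preemptive administrative reservation covering both
#2 and #3. The AdminBlock and Schedule records, the impacted function and
the action strings are hypothetical names chosen for illustration;
representing "until infinity" as end=None (mapped to datetime.max) and
treating intervals as half-open are assumptions, not part of the draft.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Schedule:
    name: str
    link: str
    start: datetime
    end: datetime
    provisioned: bool = False


@dataclass
class AdminBlock:
    link: str
    start: datetime
    end: Optional[datetime] = None  # None: permanent removal (#2)

    def overlaps(self, s: Schedule) -> bool:
        if s.link != self.link:
            return False
        block_end = self.end if self.end is not None else datetime.max
        # Half-open intervals [start, end): touching endpoints do not overlap.
        return s.start < block_end and self.start < s.end


def impacted(block: AdminBlock,
             schedules: List[Schedule]) -> List[Tuple[str, str]]:
    # Provisioned schedules are switched to protection or re-dialed;
    # reserved ones only need their paths recomputed.
    out = []
    for s in schedules:
        if block.overlaps(s):
            action = "switch-or-re-dial" if s.provisioned else "recompute"
            out.append((s.name, action))
    return out
```

With a finite end time (#3) only schedules overlapping the maintenance
window are returned; with end=None (#2) every schedule on the link that
runs past the block's start is returned.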
John.
On 10-04-28 2:14 AM, Inder Monga wrote:
> Hi All,
>
> An updated draft based on comments. We attached a table in the front
> to summarize and use it for discussions. Look forward to discussing this
> tomorrow.
>
> Thanks,
> Inder
>
>
>
> On Apr 20, 2010, at 10:49 PM, Chin Guok wrote:
>
>> Hi all,
>>
>> I've attached a draft of the error handling section that Inder and I
>> came up with for the NSI Architecture document.
>>
>> This is a rough first draft, and there are some obvious portions
>> missing, but it gives an idea of where we are heading.
>>
>> Comments are most welcome.
>>
>> Thanks.
>>
>> - Chin
>> <NSI Error Handling Chin_Inder v2.docx>
>> _______________________________________________
>> nsi-wg mailing list
>> nsi-wg at ogf.org
>> http://www.ogf.org/mailman/listinfo/nsi-wg