[Nsi-wg] NSI error handling draft - next version

John MacAuley john.macauley at surfnet.nl
Wed Apr 28 10:50:09 CDT 2010


Peoples,

Had someone show up in my office so I missed the conversation over 
"Resource change from available to not available."  I thought I would 
provide some input on the topic based on my DRAC experiences.

I think there are three types of events that can initiate a topology 
change that should be understood when defining the error handling.  Two 
of these are actually not errors but normal operating procedures within 
a network:

1. Physical network failure resulting in a topology change - typically 
the temporary removal of a link from topology with no knowledge of when 
it will be restored.

2. The permanent removal of a link from the topology by a network 
administrator.  Actually, this one should include the reconfiguration of 
the network where an entire node could be removed.

3. The temporary removal of a link by a network administrator for 
maintenance purposes.  This will typically have a defined start and end 
time based on the maintenance window.

#1 is interesting in that it impacts existing schedules in an in-service 
state, reserved schedules not yet in service, and any new reservation 
requests.

a) Those schedules in-service using the links impacted by the topology 
change may undergo some type of restoration.  If this was a protected 
circuit then the underlying transport will restore the service and we 
may not want to do anything about it.  If this was an unprotected 
service then perhaps a re-dial could be initiated by the NRM in an 
attempt to achieve a lazy restore.

b) Depending on the estimated length of the temporary topology change we 
may need to recompute the paths of those schedules reserved but not yet 
provisioned.  We should not recompute paths from the point of failure to 
the end of time, but only for some predefined floating window that is 
optimistic enough to give the failure time to recover and that reduces 
the number of schedules recomputed.  For example, a floating one hour 
window would mean that all reservations up to an hour in the future that 
could be impacted by the failure are recomputed.  If the failure is 
cleared and the topology is restored, then the link is left with a one 
hour window that has been cleared of reservations.  The interesting 
side-effect is that we now have a window of time to make sure the link 
remains trouble free.  The question is whether we have blocked that link 
from use, or whether a new schedule can use the remaining hour if it 
comes in after the trouble has cleared.

c) If a new reservation request for a future point in time arrives while 
a failure has taken the link out of topology, do we remove the link from 
computation, or do we add an optimistic guard time after which we can 
assume the link will be restored?
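
To make (b) and (c) a bit more concrete, here is a rough Python sketch of 
the two checks.  Everything in it is my own illustrative assumption and 
not anything from DRAC or the NSI draft: the Reservation record, the 
function names, and the one hour defaults for the floating window and 
the guard time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reservation:
    # Hypothetical record of a reserved (not yet provisioned) schedule.
    name: str
    start: datetime
    end: datetime
    links: frozenset  # identifiers of the links on the computed path


def reservations_to_recompute(reservations, failed_link, now,
                              window=timedelta(hours=1)):
    """Case (b): pick reservations that use the failed link and start
    inside the floating window [now, now + window)."""
    horizon = now + window
    return [r for r in reservations
            if failed_link in r.links and now <= r.start < horizon]


def link_usable_for_request(request_start, failure_time,
                            guard=timedelta(hours=1)):
    """Case (c): optimistically assume the link is back once the guard
    time after the failure has elapsed."""
    return request_start >= failure_time + guard
```

Because the window floats, the first check would be re-run with a fresh 
"now" for as long as the failure persists, so reservations further out 
are only recomputed if the link stays down.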

#2 is different from a fault condition in that an administrator has 
removed the link from topology.  We can model this gracefully if we can 
have a high priority (preemptive) administration reservation that can 
block the bandwidth on a link from the point in time the link will be 
removed through until infinity.  Any schedules this preemptive schedule 
impacts will need to be recomputed as discussed in the previous example 
or, if provisioned, switched to protection or re-dialed to restore.  At 
some point on or after the start of the preemptive schedule the link can 
be permanently removed from topology and the reservation blocking that 
link cleared.

#3 is similar to #2 except there is a defined end time for the 
preemptive schedule blocking the link.  Only reservations overlapping 
with the maintenance window would need to be recomputed.  Obviously, any 
provisioned schedules would need to be switched to protection or 
re-dialed to restore.
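
As a rough illustration of #2 and #3, the preemptive administration 
reservation can be treated as a time interval on a link, with an open 
end for permanent removal.  The sketch below is only my assumption of 
how that might look: the Schedule and AdminBlock records, the half-open 
interval convention, and the "restore"/"recompute" labels are made up 
for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Schedule:
    # Hypothetical schedule record; provisioned means in-service.
    name: str
    start: datetime
    end: datetime
    links: frozenset
    provisioned: bool = False


@dataclass
class AdminBlock:
    # Preemptive administration reservation on a single link.
    link: str
    start: datetime
    end: Optional[datetime] = None  # None means "until infinity" (#2)


def overlaps(block, sched):
    """Half-open interval overlap between the block and a schedule."""
    if sched.end <= block.start:
        return False
    return block.end is None or sched.start < block.end


def plan_actions(block, schedules):
    """Map each impacted schedule name to the action it needs."""
    actions = {}
    for s in schedules:
        if block.link in s.links and overlaps(block, s):
            # In-service schedules switch to protection or re-dial;
            # reserved ones get their path recomputed.
            actions[s.name] = "restore" if s.provisioned else "recompute"
    return actions
```

With an end time set (#3) only schedules overlapping the maintenance 
window are returned; with end left as None (#2) every schedule on the 
link from the start time onward is returned.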

John.

On 10-04-28 2:14 AM, Inder Monga wrote:
> Hi All,
>
> An updated draft based on comments. We attached a table in the front 
> to summarize and use it for discussions. Look forward to discussing 
> this tomorrow.
>
> Thanks,
> Inder
>
>
>
> On Apr 20, 2010, at 10:49 PM, Chin Guok wrote:
>
>> Hi all,
>>
>> I've attached a draft of the error handling section that Inder and I 
>> came up with for the NSI Architecture document.
>>
>> This is a rough first draft, and there are some obvious portions 
>> missing, but it gives an idea of where we are heading.
>>
>> Comments are most welcomed.
>>
>> Thanks.
>>
>> - Chin
>>
>> <NSI Error Handling Chin_Inder v2.docx>
>> _______________________________________________
>> nsi-wg mailing list
>> nsi-wg at ogf.org
>> http://www.ogf.org/mailman/listinfo/nsi-wg
>
>
