[Nsi-wg] NSI error handling draft - next version

John MacAuley john.macauley at surfnet.nl
Wed Apr 28 13:46:53 CDT 2010


I think the unwritten suggestion was that network administrators should 
have the facilities in the NRM to identify/manage #2 and #3 in a 
graceful fashion to avoid chaos during scheduled maintenance.  They 
would need to incorporate the additional procedures into their process.  
We just need to make sure we support the concept of preemptive schedules 
on topological links (although I know some people who would like a 
general preemptive schedule to whack existing schedules if needed).

John.

On 10-04-28 2:38 PM, Inder Monga wrote:
> John,
>
> Great points about administrative and maintenance procedures.
>
> We would have to assume that the NSA/NRM receives an event carrying 
> the right "notification" of the reason for the topology change, 
> delivered through the OSS/network management platform. Otherwise, we 
> will not be able to tell the causes of a topology change apart, and 
> will not be able to estimate how long the change will last, as we 
> could in the case of maintenance. We can assume the default case to 
> be #1 if not notified of the exact cause.
>
> Thanks,
> inder
>
> On Apr 28, 2010, at 8:50 AM, John MacAuley wrote:
>
>> Peoples,
>>
>> Had someone show up in my office, so I missed the conversation about 
>> "Resource change from available to not available."  I thought I would 
>> provide some input on the topic based on my DRAC experiences.
>>
>> I think there are three types of events that can initiate a topology 
>> change that should be understood when defining the error handling.  
>> Two of these are actually not errors but normal operating procedures 
>> within a network:
>>
>> 1. Physical network failure resulting in a topology change - 
>> typically the temporary removal of a link from topology with no 
>> knowledge of when it will be restored.
>>
>> 2. The permanent removal of a link from the topology by a network 
>> administrator.  This case should also cover reconfiguration of the 
>> network in which an entire node is removed.
>>
>> 3. The temporary removal of a link by a network administrator for 
>> maintenance purposes.  This will typically have a defined start and 
>> end time based on the maintenance window.
>>
>> #1 is interesting in that it impacts existing schedules in an 
>> in-service state, reserved schedules not yet in service, and any new 
>> reservation requests.
>>
>> a) Schedules in service on the links impacted by the topology change 
>> may undergo some type of restoration.  If this was a protected 
>> circuit, the underlying transport will restore the service and we may 
>> not want to do anything about it.  If this was an unprotected 
>> service, the NRM could initiate a re-dial in an attempt to achieve a 
>> lazy restore.
>>
>> b) Depending on the estimated length of the temporary topology change, 
>> we may need to recompute the paths of schedules that are reserved but 
>> not yet provisioned.  We should not recompute the paths from the 
>> point of failure to the end of time, but only for some predefined 
>> floating window, optimistic enough to give the failure time to 
>> recover and to reduce the number of schedules that would be 
>> recomputed.  For example, a floating one-hour window would mean that 
>> all reservations up to an hour in the future that could be impacted 
>> by the failure can be recomputed.  If the failure is cleared and the 
>> topology is restored, we are left with a one-hour window that has 
>> already been cleared of reservations on that link.  The interesting 
>> side-effect is that we now have a window of time to make sure the 
>> link remains trouble free.  The question is whether we have blocked 
>> that link from use, or whether a new schedule can use the remaining 
>> hour if it comes in after the trouble has cleared.
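
To make the floating window in (b) concrete, here is a rough Python sketch, assuming each reservation records its start/end time and the set of links on its computed path. `Reservation` and `to_recompute` are hypothetical names, and selecting reservations whose start falls in [failure time, failure time + window) is only one possible reading of "up to an hour in the future".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List


@dataclass(frozen=True)
class Reservation:
    name: str
    start: datetime
    end: datetime
    links: FrozenSet[str]  # links on the currently computed path


def to_recompute(reservations: List[Reservation], failed_link: str,
                 failure_time: datetime,
                 window: timedelta = timedelta(hours=1)) -> List[Reservation]:
    """Pick reserved-but-not-provisioned schedules that start inside the
    floating window and whose path crosses the failed link."""
    horizon = failure_time + window
    return [r for r in reservations
            if failed_link in r.links
            and failure_time <= r.start < horizon]
```

Reservations beyond the horizon are left untouched; if the failure persists, the window can slide forward so that later reservations fall into it on a subsequent pass, which is one reading of "floating".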
>>
>> c) If a new reservation request for a future point in time arrives 
>> while a failure has taken the link out of topology, do we remove the 
>> link from computation, or do we add an optimistic guard time after 
>> which we can assume the link will be restored?
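
The two options in (c) can be expressed as a single predicate consulted during path computation. This is a hypothetical Python sketch: `link_usable_for_new_request` and its `guard` parameter are illustrative names, and the guard duration itself is a policy choice the thread leaves open.

```python
from datetime import datetime, timedelta
from typing import Optional


def link_usable_for_new_request(link_down_since: Optional[datetime],
                                requested_start: datetime,
                                guard: Optional[timedelta]) -> bool:
    """Decide whether path computation may use a link for a new request.

    link_down_since: when the failure took the link out (None if up).
    guard: optimistic guard time; None means exclude the link entirely
    until the failure clears (the first option in c).
    """
    if link_down_since is None:
        return True            # link is in topology, nothing to decide
    if guard is None:
        return False           # conservative: remove link from computation
    # Optimistic: assume the link is restored once the guard time has elapsed.
    return requested_start >= link_down_since + guard
```
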
>>
>> #2 is different from a fault condition in that an administrator has 
>> removed the link from topology.  We can model this gracefully if we 
>> can have a high priority (preemptive) administrative reservation that 
>> blocks the bandwidth on a link from the point in time the link will 
>> be removed through to infinity.  Any schedules this preemptive 
>> schedule impacts will need to be recomputed as discussed in the 
>> previous example or, if already provisioned, switched to protection 
>> or re-dialed to restore.  At some point on or after the start of the 
>> preemptive schedule, the link can be permanently removed from 
>> topology and the reservation blocking that link cleared.
>>
>> #3 is similar to #2 except there is a defined end time for the 
>> preemptive schedule blocking the link.  Only reservations overlapping 
>> with the maintenance window would need to be recomputed.  Obviously, 
>> any provisioned schedules would need to be switched to protection or 
>> re-dialed to restore.
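
Cases #2 and #3 differ only in whether the blocking reservation has an end time, so both can be sketched with one hypothetical Python model in which `end=None` stands in for "until infinity". `Schedule`, `AdminBlock` and `preempt` are illustrative names, not an existing API. The function splits the affected schedules into provisioned ones (switch to protection or re-dial) and reserved ones (recompute).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Schedule:
    name: str
    start: datetime
    end: datetime
    links: FrozenSet[str]
    provisioned: bool  # True if already in service


@dataclass(frozen=True)
class AdminBlock:
    """Hypothetical high-priority reservation blocking a link.
    end=None models case #2 (blocked until infinity); a datetime models #3."""
    link: str
    start: datetime
    end: Optional[datetime] = None


def overlaps(s: Schedule, b: AdminBlock) -> bool:
    # Half-open intervals [start, end); an open-ended block never ends.
    block_end = b.end if b.end is not None else datetime.max
    return s.start < block_end and b.start < s.end


def preempt(schedules: List[Schedule],
            block: AdminBlock) -> Tuple[List[str], List[str]]:
    """Return (to_restore, to_recompute) for schedules hit by the block.
    Provisioned ones are switched to protection or re-dialed; the rest
    get their paths recomputed."""
    hit = [s for s in schedules if block.link in s.links and overlaps(s, block)]
    to_restore = [s.name for s in hit if s.provisioned]
    to_recompute = [s.name for s in hit if not s.provisioned]
    return to_restore, to_recompute
```

With a bounded block, only schedules overlapping the maintenance window are touched, as #3 describes; with an open-ended block, every later schedule on the link is affected, as in #2.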
>>
>> John.
>>
>> On 10-04-28 2:14 AM, Inder Monga wrote:
>>> Hi All,
>>>
>>> An updated draft based on comments. We attached a table at the front 
>>> that summarizes the draft, to use for discussion. Looking forward to 
>>> discussing this tomorrow.
>>>
>>> Thanks,
>>> Inder
>>>
>>>
>>>
>>> On Apr 20, 2010, at 10:49 PM, Chin Guok wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've attached a draft of the error handling section that Inder and 
>>>> I came up with for the NSI Architecture document.
>>>>
>>>> This is a rough first draft, and there are some obvious portions 
>>>> missing, but it gives an idea of where we are heading.
>>>>
>>>> Comments are most welcomed.
>>>>
>>>> Thanks.
>>>>
>>>> - Chin
>>>>
>>>> <NSI Error Handling Chin_Inder v2.docx>
>>>> _______________________________________________
>>>> nsi-wg mailing list
>>>> nsi-wg at ogf.org
>>>> http://www.ogf.org/mailman/listinfo/nsi-wg
>>>
>>>
>>
>
> ---
> Inder Monga http://100gbs.lbl.gov
> imonga at es.net <mailto:imonga at es.net> http://www.es.net
> (510) 499 8065 (c)
> (510) 486 6531 (o)
>
