[Nsi-wg] NSI error handling draft - next version

John Vollbrecht jrv at internet2.edu
Tue May 4 16:20:40 CDT 2010


This seems like a good place to think about the relationship between the
Management and Service planes.  That relationship is -- I think -- different
from the one between the transport and service planes.  Interesting -- the
planes picture might come into its own.

John

On Apr 28, 2010, at 2:38 PM, Inder Monga wrote:

> John,
>
> Great points about administrative and maintenance procedures.
>
> We would have to assume that the NSA/NRM gets an event, through the
> OSS/network management platform, with the right "notification" of the
> reason for the topology change. Otherwise, we will not be able to
> differentiate between the causes of a topology change, and will not
> be able to estimate the duration of that change, as we can in the case
> of maintenance. We can assume the default case to be #1 if the NSA/NRM
> is not notified of the exact cause.
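A rough sketch of that default rule, in Python. The event fields and the reason strings ("maintenance", "decommission") are assumptions made up for illustration, not anything an OSS or the NSI draft defines:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Cause(Enum):
    FAILURE = 1            # case #1: unplanned, no known restore time
    PERMANENT_REMOVAL = 2  # case #2: administrator removes the link for good
    MAINTENANCE = 3        # case #3: planned outage with a start and end time


@dataclass
class TopologyEvent:
    link_id: str
    reason: Optional[str] = None     # hypothetical reason string from the OSS
    start: Optional[datetime] = None
    end: Optional[datetime] = None


def classify(event: TopologyEvent) -> Cause:
    """Map an event to one of the three cases, defaulting to case #1."""
    # Maintenance only counts as such when the window is actually known.
    if event.reason == "maintenance" and event.start and event.end:
        return Cause.MAINTENANCE
    if event.reason == "decommission":
        return Cause.PERMANENT_REMOVAL
    # No usable notification of the cause: treat it as a failure.
    return Cause.FAILURE
```

An event that arrives with no reason, or with a maintenance reason but no window, falls through to the failure case.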
>
> Thanks,
> inder
>
> On Apr 28, 2010, at 8:50 AM, John MacAuley wrote:
>
>> Peoples,
>>
>> Had someone show up in my office so I missed the conversation over  
>> "Resource change from available to not available."  I thought I  
>> would provide some input on the topic based on my DRAC experiences.
>>
>> I think there are three types of events that can initiate a  
>> topology change that should be understood when defining the error  
>> handling.  Two of these are actually not errors but normal  
>> operating procedures within a network:
>>
>> 1. Physical network failure resulting in a topology change -  
>> typically the temporary removal of a link from topology with no  
>> knowledge of when it will be restored.
>>
>> 2. The permanent removal of a link from the topology by a network  
>> administrator.  Actually, this one should include the  
>> reconfiguration of the network where an entire node could be removed.
>>
>> 3. The temporary removal of a link by a network administrator for  
>> maintenance purposes.  This will typically have a defined start  
>> and end time based on the maintenance window.
>>
>> #1 is interesting in that it impacts existing schedules in an in- 
>> service state, reserved schedules not yet in service, and any new  
>> reservation requests.
>>
>> a) Those schedules in-service using the links impacted by the  
>> topology change may undergo some type of restoration.  If this was  
>> a protected circuit then the underlying transport will restore the  
>> service and we may not want to do anything about it.  If this was  
>> an unprotected service then perhaps re-dial could be initiated by  
>> the NRM in an attempt to achieve a lazy restore.
>>
>> b) Depending on the estimated length of the temporary topology  
>> change we may need to recompute the paths of those schedules  
>> reserved but not yet provisioned.  We should not recompute the  
>> paths from the point of failure to the end of time but for some  
>> predefined floating window optimistic enough to give the failure  
>> time to recover, and to reduce the number of schedules that would be  
>> recomputed.  For example, a floating one hour window would mean all  
>> reservations up to an hour in the future that could be impacted by  
>> the failure can be recomputed.  If the failure is cleared and the  
>> topology is restored then there is a one hour window that should  
>> have been cleared.  The interesting side-effect is we now have a  
>> window of time to make sure the link remains trouble free.  The  
>> question is: have we blocked that link from use, or can a new  
>> schedule use the remaining hour if it comes in after the trouble  
>> has cleared?
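The floating window in b) could be sketched roughly as below. The Reservation type, its field names, and the needs_recompute helper are hypothetical; the one hour default just mirrors the example above:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Set


@dataclass
class Reservation:
    name: str
    start: datetime
    end: datetime
    links: Set[str]


def needs_recompute(reservations: List[Reservation],
                    failed_link: str,
                    now: datetime,
                    window: timedelta = timedelta(hours=1)) -> List[Reservation]:
    """Pick reserved-but-not-yet-provisioned schedules to recompute.

    Only schedules that use the failed link and start inside the
    floating window [now, now + window) are selected; anything later
    is left alone in the hope that the failure clears first.
    """
    horizon = now + window
    return [r for r in reservations
            if failed_link in r.links and now <= r.start < horizon]
```

Calling this again as time passes slides the window forward, so a long outage gradually pulls in later reservations while a short one touches only the first hour.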
>>
>> c) If a new reservation request for a future point in time arrives  
>> while a failure has taken the link out of topology do we remove the  
>> link from computation, or do we add an optimistic guard time after  
>> which we can assume the link will be restored?
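For c), the second option (an optimistic guard time) might look something like this. The function name and the four hour default are assumptions for illustration only; the thread does not settle on a value or on this option:

```python
from datetime import datetime, timedelta
from typing import Optional


def link_usable(request_start: datetime,
                failed_at: Optional[datetime],
                guard: timedelta = timedelta(hours=4)) -> bool:
    """Decide whether path computation may use a link for a new request.

    A link with no outstanding failure is always usable.  A failed link
    is assumed restored once the guard time has elapsed, so only
    requests starting at or after failed_at + guard may use it.
    """
    if failed_at is None:
        return True
    return request_start >= failed_at + guard
```

The first option, removing the link from computation entirely, is the same check with an unbounded guard.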
>>
>> #2 is different from a fault condition in that an administrator has  
>> removed the link from topology.  We can model this gracefully if we  
>> can have a high priority (preemptive) administration reservation  
>> that can block the bandwidth on a link from the point in time the  
>> link will be removed through until infinity.  Any schedules this  
>> preemptive schedule impacts will need to be recomputed as discussed  
>> in the previous example, or if provisioned switched to protection/ 
>> re-dialed to restore.  At some point on or after the start of the  
>> preemptive schedule the link can be permanently removed from  
>> topology and the reservation blocking that link cleared.
>>
>> #3 is similar to #2 except there is a defined end time for the  
>> preemptive schedule blocking the link.  Only reservations  
>> overlapping with the maintenance window would need to be  
>> recomputed.  Obviously, any provisioned schedules would need to be  
>> switched to protection or re-dialed to restore.
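Cases #2 and #3 can share one model if the administrative reservation has an optional end time. A minimal sketch, assuming hypothetical AdminBlock and Reservation types (end=None stands in for "until infinity"):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class Reservation:
    name: str
    start: datetime
    end: datetime
    links: Set[str]


@dataclass
class AdminBlock:
    """Preemptive administrative reservation blocking a whole link."""
    link_id: str
    start: datetime
    end: Optional[datetime] = None  # None: permanent removal (case #2)


def impacted(block: AdminBlock, reservations: List[Reservation]) -> List[Reservation]:
    """Return the reservations that overlap the block on its link."""
    result = []
    for r in reservations:
        if block.link_id not in r.links:
            continue
        if r.end <= block.start:
            continue  # finishes before the block begins
        if block.end is not None and r.start >= block.end:
            continue  # starts after the maintenance window closes (case #3)
        result.append(r)
    return result
```

Whatever comes back would then be recomputed if only reserved, or switched to protection or re-dialed if already provisioned.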
>>
>> John.
>>
>> On 10-04-28 2:14 AM, Inder Monga wrote:
>>>
>>> Hi All,
>>>
>>> Here is an updated draft based on comments. We attached a table at  
>>> the front to summarize and to use for discussion. Looking forward  
>>> to discussing this tomorrow.
>>>
>>> Thanks,
>>> Inder
>>>
>>>
>>>
>>> On Apr 20, 2010, at 10:49 PM, Chin Guok wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've attached a draft of the error handling section that Inder  
>>>> and I came up with for the NSI Architecture document.
>>>>
>>>> This is a rough first draft, and there are some obvious portions  
>>>> missing, but it gives an idea of where we are heading.
>>>>
>>>> Comments are most welcome.
>>>>
>>>> Thanks.
>>>>
>>>> - Chin
>>>> <NSI Error Handling Chin_Inder v2.docx>
>>>> _______________________________________________
>>>> nsi-wg mailing list
>>>> nsi-wg at ogf.org
>>>> http://www.ogf.org/mailman/listinfo/nsi-wg
>>>
>
> ---
> Inder Monga				http://100gbs.lbl.gov
> imonga at es.net			http://www.es.net
> (510) 499 8065 (c)		
> (510) 486 6531 (o)		
