[Nsi-wg] [NSI imp] NSI implementation call cancelled

Wed Mar 12 15:03:01 EDT 2014

Thank you for the well-thought-out feedback.  I have included some comments here:

> Scalability: When aggregators cannot handle the load of messages, the only thing
> that can be done is to use a more powerful system for the aggregators (scale up
> architecture).

Yes, this is true; however, there are two options for this scale-up approach: put in place a larger machine, or distribute the Aggregator functions onto multiple machines.  Each NSI service is logically separated with advertised interfaces, so the connection service function can be on a separate server from the discovery function.  In addition, a reasonable software architecture can distribute load across multiple servers if really required.

> Scalability: One aggregator that cannot handle the load or has other problems will affect
> the performance of the whole system because other aggregators might depend on his updates
> (see also next item). 

Yes, any peer-to-peer system where there are super nodes is impacted by possible performance issues.  Same is true for all routing based systems as I understand them.

> - Scalability: The general idea is that an aggregator filters messages / documents from other
> aggregators and only subscribe for updates on one peer. Updates might not arrive and cannot
> be flooded if that peer has problems. As I understand it, this filtering is also needed for loop
> prevention.

Not exactly: an aggregator can subscribe for updates from all peers and discard duplicate notifications.  An optimization was to subscribe for update events from only a designated peer, but if people are worried about redundancy then there is no reason not to subscribe for updates from all peers.  If I have subscribed for updates from my peers, then the only time an update will not make it across the network is if there is a choke point at a single aggregator.  If there is, then the administrators deployed the network incorrectly.
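To make the "subscribe to all peers and discard duplicates" idea concrete, here is a minimal sketch.  It assumes each notification carries a document identifier and an increasing version number; those field names, the `DocumentStore` class and the example URN are hypothetical and are not taken from the discovery specification.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentStore:
    """Remembers the newest version seen for each document identifier."""
    versions: dict = field(default_factory=dict)

    def accept(self, doc_id: str, version: int) -> bool:
        """Return True if the notification is new and worth propagating."""
        known = self.versions.get(doc_id)
        if known is not None and version <= known:
            return False  # duplicate (or stale) copy from another peer
        self.versions[doc_id] = version
        return True


store = DocumentStore()

# The same update arrives from two peers, followed by a newer version.
notifications = [
    ("peer-a", "urn:example:nsa:net1", 3),
    ("peer-b", "urn:example:nsa:net1", 3),
    ("peer-b", "urn:example:nsa:net1", 4),
]

accepted = [
    (peer, doc_id, version)
    for peer, doc_id, version in notifications
    if store.accept(doc_id, version)
]
```

In this sketch the second copy of version 3 is dropped no matter which peer delivered it, so a single slow peer does not block the update, and a copy that has already been seen is not forwarded again.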

> Scalability: Current routing protocols such as RIP, OSPF and BGP do receive updates
> from all neighbors. This way they can quickly react to changes in the topology. However,
> the updates in current routing protocols are much smaller and easier to parse than Discovery
> and Topology Documents in NSI.

I think we need to set some context here with respect to discovery - the discovery service itself is not processing the contents of the documents.  It has no clue what is in the document.  It is the other NSI functions that are using the discovered documents.  Discovery is only providing the exchange framework.

Now onto the issue of the rate of change these documents will encounter.  There seems to be some preconceived notion that an NSA Discovery document is going to have a rapid rate of change.  These documents should be able to go days, weeks, months, even years without requiring an update when on a stable system.  They do not contain data that changes rapidly, and therefore have very little impact on the system.

As for topology, I think the rate of change and what gets exchanged is up for debate.  We are modelling service topologies and not individual packet forwarding topologies.  Does this make a difference?  Maybe.  For now we decided on exchanging the full topology documents because they are not expected to have a large rate of change in the short term.  As we get more advanced and decide to model more dynamic changes, a better approach may be flooding only updates.

The following table has the estimated document sizes for NSA Discovery and for NML topology with 1,000 and 300 bidirectional ports using PortGroup summarization.  I picked 1,000 E-NNI and UNI ports as representative of an R&E network today, with 70% UNI ports and 30% E-NNI ports.  The 1,000 ports describe all edge ports, while the 300 are only the inter-domain ports.

Document                      Uncompressed    Compressed
--------------------------    ------------    ----------
NSA Discovery                 5 KB            2 KB
NML Topology (1,000 ports)    1.5 MB          85 KB
NML Topology (300 ports)      450 KB          26 KB

We can then extrapolate the total size of XML documents an aggregator would need to maintain for 1,000 ports per network using the following table.  Sorry, I did not cover 100k networks but you can do the math.

Global network size    Combined sizes (uncompressed)    Combined sizes (compressed)
-------------------    -----------------------------    ---------------------------
10,000 networks        14.6 GB                          850 MB
5,000 networks         7.3 GB                           425 MB
1,000 networks         1.5 GB                           85 MB
500 networks           750 MB                           42 MB
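For anyone who wants to do the math for other network counts, including the 100k case, a rough sketch follows.  It assumes the combined size is simply the per-network NML topology size (1,000 ports) multiplied by the number of networks, with the much smaller NSA Discovery document ignored; the function name and the unit conventions (1 MB = 1000 KB, 1 GB = 1024 MB) are my assumptions, chosen because they land close to the figures above.

```python
TOPOLOGY_UNCOMPRESSED_MB = 1.5  # per network, 1,000 ports
TOPOLOGY_COMPRESSED_KB = 85     # per network, 1,000 ports


def combined_sizes(networks: int) -> tuple[float, float]:
    """Return (uncompressed MB, compressed MB) held for `networks` networks."""
    uncompressed_mb = networks * TOPOLOGY_UNCOMPRESSED_MB
    compressed_mb = networks * TOPOLOGY_COMPRESSED_KB / 1000  # assumes 1 MB = 1000 KB
    return uncompressed_mb, compressed_mb


for n in (500, 1_000, 5_000, 10_000, 100_000):
    uncompressed_mb, compressed_mb = combined_sizes(n)
    # GB shown assuming 1 GB = 1024 MB
    print(f"{n:>7,} networks: {uncompressed_mb / 1024:6.1f} GB uncompressed, "
          f"{compressed_mb:7.1f} MB compressed")
```

Small differences from the table come down to rounding and to which unit convention is used.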

If we expect each network to issue an NSA discovery and NML topology document change once a day, then the worst case for an aggregator will be 850 MB x n-peers of data.  Spread over a 24 hour period we have a worst case average of 81 Kbps per peer.
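The per-peer average can be checked with a few lines.  This is only a sketch of the arithmetic: the function name is mine, and the binary unit convention (1 MB = 1024 KB, 1 Kb = 1024 bits) is an assumption that reproduces the 81 Kbps figure; with decimal units the same 850 MB per day works out to roughly 79 Kbps.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_kbps(megabytes_per_day: float) -> float:
    """Average rate in Kbps for a daily volume, assuming binary units."""
    kilobits = megabytes_per_day * 1024 * 8  # MB -> KB -> Kb
    return kilobits / SECONDS_PER_DAY


per_peer_kbps = average_kbps(850)  # 850 MB of compressed documents per peer per day
print(f"{per_peer_kbps:.1f} Kbps per peer")
```

The total for an aggregator scales linearly with the number of peers it subscribes to.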

> We cannot expect aggregators to be as fast and to be able
> to process as much updates as with current routing protocols. The proposed solution combines
> some properties from OSPF (exchange of full topology, although in a different way), BGP
> (multi-domain, signaling path based on policy) and routing protocols in general (flooding of
> messages) but there are drawbacks to that and NSI is a whole different beast that might require
> a whole different approach.

My main assumption for the first release was that the rate of change on NML documents would not be at the rates you are discussing.  I would assume the level of our service abstractions should insulate us somewhat from unstable lower level infrastructure.

I can also bring arguments to the table about using COTS hardware and the available power versus an embedded routing engine inside a vendor's piece of equipment.  In my last job we ran a full SOAP protocol stack for our interface, along with telephony protocols (SIP, JTAPI, etc.) with thousands of messages a second in and out of the box.  Most messages were about the size of our NSA Discovery documents.  We had CPU to spare.

> Scalability: Aggregators are overloaded with functionality. All functionality is put into one box.
> From what I understand aggregators are used to exchange Discovery and Topology Documents,
> for path finding, for connection setup, for relaying monitoring information... And every new service
> that will be invented will make use of the same signaling path that consists of the same aggregators.
> Excessive load on one of these functions will affect the performance of other functions.

As discussed on the call, an aggregator is a logical entity that can be separated into physical components.  In addition, it is possible using well-defined software architectures to distribute even these single functions across multiple servers.

> Scalability: Checking signatures and resigning Discovery and Topology Documents will add a
> considerable amount of load on the aggregators which must be taken into account, especially
> because you can only scale up and not scale out the aggregators.

Once again, the discovery service does not touch the document itself, and does not need to validate the signature.  That is left up to the end user of the data, whether that is the path finder, the connection service engine, or an end application.

> Scalability: In general, the aggregators seem to be a bottleneck in the proposed model. Services
> are not segregated and there is no easy way to scale the performance of the aggregators.

I understand your concerns, but I believe we do have clear segregation.  Scalability may be an issue at some point, but this can also be addressed in turn.  Our routing does not need to be flat; we can create hierarchies if needed.

> Flexibility: No distinction is made between identifiers and addresses (no mobility, multi-homing…)

Not sure I understand this one.  Discovery just deals with identifiers within the document metadata used to move documents through the network.  The only addresses being used are the discovery protocol endpoints for accessing the associated HTTP server.

> Security: All trust is put into the aggregators. Because anyone can build and operate his own
> aggregator and aggregators resign all documents with their own private key, this can be seen
> as a security risk. It will have the same problems as BGP has. Currently, for BGP, this problem
> is being fixed with RPKI and BGPsec. I propose to learn from history and do it right from the start.
> Even it is not implemented in the beginning, it should be clear that a feasible, scalable and secure
> solution is possible within the proposed architecture.

Sorry this was not clearer, but the document is signed by the issuer of the document.  This is the original uPA for NSA Discovery and NML topology, or an Aggregator for its own NSA discovery document.  The signing is only done at document generation time, and the document is left untouched as it moves through the network.

> Security: At this point, it is not known how the PKI will be implemented. Besides the technical part
> there is also an important organizational part within a PKI. Because the PKI and the signing of
> documents are the primary way that is proposed to secure the system, it is advisable to have at
> least a proposal ready that works on paper before any agreement is made on the NSI Discovery
> and Topology Distribution Services solutions. Security must be part of the design, not an
> afterthought.  

Well, at this time I would say no PKI and no signatures and it goes with the same level of security as the NSI CS v2.0 protocol.  In NSI CS there is no way of determining the originator of a service request, nor if the original request has been tampered with.  No reason to raise the bar with the discovery service when we can have parity.  This also effectively stops the discussion.

> Some of the mentioned issues can be solved relatively easily. Others are much harder to solve.
> One of the main issues I have is that the aggregators that form the signaling plane (together with
> uRA’s and uPA’s) seem to form a bottleneck. I do see where it comes from however. It is part of
> the architecture described in the Network Services Framework document. I think it is fine for the
> NSI Connection Service, but I also think it will form a bottleneck when all services will make use
> of it. The distinction between identifiers and addresses can be incorporated if this is something
> of value to the NSI community (I think it is). The security part will need some more work before we
> can actually decide if the proposed solution can be standardized. 

Yes, the road to a standard is a long and winding one.  Unfortunately, we are deploying prototype solutions now into production while the group agrees on a standard.  I will not stand in the way of the group's success.

John

On 2014-03-12, at 8:02 AM, Diederik Vandevenne <diederik.vandevenne at surfsara.nl> wrote:

> Hi everyone,
> 
> I want to react to John’s e-mail from the 5th of March "NSI implementation call cancelled" and hear the opinion of the whole NSI community on how to go forward.
> 
>> I know we are a very opinionated group with a lot of good feedback; however, we need to make progress on key issues before production implementations can be deployed.  For this reason we need people with ideas and constructive feedback on the key topics to get it down in contributions with clearly defined solutions that meet the requirements and fit into the NSI Service Framework.  We do not want contributions that redefine the existing NSI architecture and throw out 5 years worth of requirements, so please keep it focused and on track. 
> 
> I partly agree with John. In my opinion, the most important key issues at this moment are the NSI identifier format (and the identifier versus address issue), the NSI Discovery and Topology Services (as they are similar) and the security of those services (including “Verification of topology”). Despite the fact there is little discussion on the mailing lists, it seems it is hard to agree on a specific solution and some members are even implementing their own ideas. 
> 
> However, I do not think that pushing forward and creating standards is the best route to take. Maybe we have to hold back a moment and think about what we really want to accomplish. The requirements from 5 years back may not reflect the needs we see now. 
> 
> I think the NSI identifier and/or address format is related to the distribution of topology information. We can only agree on a solution if we see the full picture and thus we should merge those two problems. 
> 
> In general, the proposal from ESnet about the NSI Discovery and Topology Services might be the best solution (with some little changes and additions) if we hold to the requirements from 5 years back and the current architecture of the Network Services Framework (Are those requirements documented somewhere?). However, I have some issues with it and I am not sure if it will work the way we want it to. I have attached a list of issues to this e-mail. 
> 
> My question to the community is whether this is really what we want. Or do we want to take a better look at the requirements first to be able to come up with a better solution that fits our needs? I have attached a list of requirements I have to this e-mail to make a start. I would like to ask you to make your own list of requirements and send it to the mailing list.
> 
> I do have my own ideas about how to approach the NSI Discovery and Topology Services problem. However, my solution does not fit the current requirements and Network Services Framework, although it has similarities. I will present an architectural overview later. If anyone is interested in it, I might spend more time on it. If everyone thinks we should stick to the solution proposed by ESnet and the current requirements, I will put my energy into that and other issues we still have.
> 
> My goal is to contribute to a solution that meets the requirements of the community and to start a constructive conversation about how to go forward. I have no intentions to create more separation in the group.
> 
> The documents I present here and the ideas I have are influenced by the regular talks I had with Miroslav (UvA) and Freek (SURFsara). However they may have another opinion on some parts so I do not speak for anyone but myself. 
> 
> 
> Kind regards,
> 
> Diederik
> 
> 
> 
> 
> SURFsara has a new general telephone number: 020 800 1300
> 
> | Diederik Vandevenne | Infrastructure Services  | SURFsara |
> | Science Park 140 | 1098 XG | Amsterdam |
> | T 06 4798 8196 | diederik.vandevenne at surfsara.nl | www.surfsara.nl |
> 
> ________________________________________
> From: John MacAuley [macauley at es.net]
> Sent: Wednesday, March 05, 2014 3:08 AM
> To: NSI implementation group; NSI Working Group
> Subject: [NSI imp] NSI implementation call cancelled
> 
> Peoples,
> 
> I have decided to cancel this Wednesday's NSI implementation call.  I do not believe the call would be productive and we should all be preparing for the face-to-face in Atlanta in less than two weeks.
> 
> We have had some good discussions since the meetings in Oxford, but I am not completely happy with the overall progress.  To quote Scrum terminology: we have very few pigs and a whole lot of chickens.  We need more pigs.
> 
> I know we are a very opinionated group with a lot of good feedback; however, we need to make progress on key issues before production implementations can be deployed.  For this reason we need people with ideas and constructive feedback on the key topics to get it down in contributions with clearly defined solutions that meet the requirements and fit into the NSI Service Framework.  We do not want contributions that redefine the existing NSI architecture and throw out 5 years worth of requirements, so please keep it focused and on track.
> 
> Here are the items that need to be addressed in the Atlanta meetings.  It is long and I know we do not have a lot of time.  I will discuss with Guy to determine in which time slots the topics are best discussed.
> 
> ·      NSI identifier format (STP, et al.)
> ·      NSA Discovery Document (John to present overview and next steps)
> ·      NSI Discovery Service (John to present overview of his solution, ?)
> ·      NSI Topology Service (if distinct from Discovery Service)
> ·      Verification of topology information (Chin)
> ·      User security (Hans to present SURFnet solution)
> ·      Ethernet extensions (John and Freek)
> ·      NML Adaptation (John and Freek)
> ·      NSI-EXT document (John)
> 
> Are there additional topics to discuss?
> 
> If you are planning on contributing a presentation or formal document please let me know so we can allocate you a time slot.  This is your chance to shed those feathers and be a resplendent pig.
> 
> John
> <NSI-Discovery-Topology-Distribution-Requirements.txt><Issues-ESnet-Proposal.txt>
