[Nsi-wg] Pathfinding and Service (a new topology suggestion)
Henrik Thostrup Jensen
htj at nordu.net
Thu Feb 5 09:59:33 EST 2015
Hi
(part three on security and pathfinding).
This one ended up being not so much about security, and more about
pathfinding and services. And quite long. Sorry about that.
We currently have a number of topology suggestions. None of them can
adequately describe complex transit policies. I will try to address the
issues of NML and GNS (The UvA suggestion is not a topology model),
suggest a new model and present some further issues which we haven't
considered yet.
* Overview
The main policy tool of NML is the switching service, which describes how
ports in a node can be connected. This approach falls short when selective
transit is provided, as this cannot be described as a switching service.
It also means that it is not possible to enforce this policy locally by
looking at the cross connect.
The GNS suggestion announces reachability to other networks, done at NSA
level. Hans pointed out that it should really be per network. However
there are cases where networks are connected over multiple links with
different policies, meaning that reachability would have to be provided
per link. This approach is then getting awfully close to that of BGP.
In BGP all reachability is explicit, as IP prefixes are announced (default
route is the exception, but reachability is still explicit). Further, AS
paths are announced, which makes it possible to build a network model.
This enables faster recovery in case of link/router failures as it is
possible to re-route traffic more efficiently. This is typically only done
in backbone networks.
Unlike BGP, NSI does not have a mechanism for aggregating its address
space (ports). Since Uppsala, we can infer the network from the port name,
however there is no way indicating which ports are closer to a certain
demarcation link, which makes optimal pathing difficult. However, it is
possible to split a network into multiple smaller networks for this, but a
metric is still needed. No good technical metric exists to announce
distances, and due to link pricing and policies, a distance metric must be
administrative (works well in BGP).
* Network & Services
Given our decision to link STPs and network identifiers it becomes
possible to do path finding to a network, instead to a specific port. This
drastically reduces the amount of information a path finder must be aware
of, and removes the problem of connecting to newly created STPs.
Path finding per network is somewhat difficult in NML as the problem of
adaptation between technologies in order to get the right service requires
intricate knowledge of the capabilities of the networks. IP solves this
problem by being a single universal service that can live on top of
different technologies. Since we want to be able to provide multiple
service types through NSI, this will not work for us. We also been
struggling quite a bit of tying together technology, adaptations and
services in NML as the combination becomes complicated quite quickly.
Recently I've talked to several persons about the ANA infrastructure. The
ANA infrastructure is interesting as the hardware comes from multiple
vendors, and the infrastructure must provide different services (best
effort circuits, circuits with guarantied bandwidth, and the same with
protected circuits). Focusing on technological capabilities becomes quite
messy in this scenario. Instead the important thing is to focus on what
services can be provided, and let the NSI Agents handle the service to
technology mapping. This is quite different from our previous approach.
In short: I think we need to move to a model that is service centric
instead of focusing on the technology of the equipment.
* Topology Model
I suggest a model that focuses on describing capability instead of the
basic resources. It is based around two principles:
- Network Reachability
- Service Capability
The first allows us to express complex transit policies in simple and well
proved way (BGP). The second allows us to abstract away technology
specific models and focus on what can services can actually be provided.
This model puts policy and services before technological capabilities.
By describing reachability, path finding becomes simple and predictable.
This is major advantage over the graph search approach in NML which can
produce extremely unfortunate effects such as cross-atlantic loops.
Defining reachability will require manual work, however this will always
be the case when describing policies.
Here is how the topology description could look like in XML:
<Network id="urn:ogf:network:aruba" version="123">
<Name>Aruba</Name>
<Link id="urn:ogf:network:aruba:bonaire-otn" demarcation="urn:ogf:network:bonaire:aruba-otn">
<Name>Bonaire OTN Link</Name>
<Service type="EVTS">
<ReachableNetwork id="urn:ogf:network:bonaire" distance="1" />
<ReachableNetwork id="urn:ogf:network:curacao" distance="2" />
</Service>
<Service type="OTN">
<ReachableNetwork id="urn:ogf:network:bonaire" distance="1" />
</Service>
</Link>
<Link id="urn:ogf:network:aruba:dominica-eth" demarcation="urn:ogf:network:dominica:aruba-eth">
<Name>Dominica Ethernet Link</Name>
<ServiceTransit type="EVTS" />
</Link>
<Service type="SwitchingService" />
</Network>
The topology description is really more of a transit table for services
than an actual topology description (however we have never been describing
topology, only networks and demarcations). Reachability is unidirectional
and cannot be turned around, however the service provided might
bidirectional. The description covers what the link can be used for from
outside the network.
The above example describes a network (Aruba), with two links. First link
is an OTN link towards the Bonaire network. This link provides two
services: EVTS and OTN. The EVTS service can reach the bonaire and curacao
network, where the OTN service can only reach bonaire. How the EVTS
service is carried on the OTN link is up to the aruba and bonaire network.
The details are encapsulated. The second link in the network is an
ethernet link towards the dominica network. This link provides what is
called service transit, for the EVTS service. This means that the dominica
can use the link to connect to any EVTS endpoint in the NSI system. This
is similar to the concept of a default route in BGP. Finally a switching
service is listed in the network. This denotes the capability to create a
service in the network.
There are several choices and tradeoffs in the design:
- Only describe demarcation links.
This means significantly less data to distribute and that the data will be
fairly stable. This also removes the need for complex distribution
mechanisms.
- For each link, the supported services and reachable network for each service
is listed along with a cost.
This makes path finding straightforward: Select the link that provides
reachability to the desired network that has the lowest value.
Cost is an administrative value. Lower is better (think hops).
The service first structure was chosen as I suspect it will be the
smallest, but it could as well be network first and services listed under
that.
If a network cannot bridge a service type from one link to another,
reachability for that network is not announced on link. This encapsulates
complicated technology restrictions in a simple way, that would otherwise
have to be built-in to pathfinders.
- A link can provide service transit. This means that all networks supporting
the service type should be reachable through that link. This would typically
be used by a backbone network towards its customers.
A customer may have other links that it could favor for some networks.
Similar to how most networks have a transit provider, but will have private
peering/links to other networks.
- Can describe network services
Some networks may provide services that are hosted in the network. This
could be a switching service which could be used to connect p2p links in
order to build a multi-domain switched network (I'll refrain from debating
if this is a good idea).
A list of potential network services:
- Switching service.
Will on creation return a number of STPs that can be used to connection P2P links
- Aggregation.
Aggregate the data from two or more unidirectional circuits into one.
- Ping service
Provides an STP with a pingable address.
The network may of course also supply these like today with pre-configured STP.
- Perfsonar
Would spawn a perfsonar instance and return an STP endpoint.
Label swapping capability is intentionally left out. We should stop
engineering for old technology, and underspecified STPs goes a long way
towards solving the issue if it should occur.
By describing services and the capabilities of the network in this way, it
becomes easier to introduce new services and possible to describe policies
that would otherwise be complicated or impossible to express with previous
suggestions.
* Pathfinding
As the networks explicitly describe reachability, an NSI Agent only needs
to fetch the topology (service table) document from its peers to know
which networks are reachable. This means that a document-distributing
system is not necessary, as the connections between networks is largely
static. The reachability document would only have to be updated at low
intervals. For leaf networks, an NSA would simply be setup with another
NSA as its transit provider. This NSA would then leave all pathfinding to
be done by the backbone network. This situation is similar to how many
organizations and smaller NRENs have a single transit provider, which does
all their connectivity.
The scheme assumes a chain-like path setup, which is intentional. Setting
up a circuit in chain mode allows each NSA to verify that the transit
policies of the network is enforced as it can see the destination network,
and check it against announced reachability. It is not possible to perform
such a check for UPAs when a circuit is setup in tree mode, as only the
crossconnect is know.
However chain mode will probably have to broken when crossing exchanges,
as the NSI agents for such will typically be UPAs and not care about
transit policies and link AUPs. Please refer to my previous email on
secure crossing of exchanges for details on this. This means that we will
end up with a hybrid pathfinding that does a tree with one branch being
one layer deep each time an exchange is crossed.
As chain pushes the decision for how to connect downstream, it does not
allow the same control over the circuit (unless using EROs). However
delegating this decision is likely to lead to fewer circuit setup
failures. Consider the following setup:
A = B = C
Three networks: A,B and C. A and B are connected via two links, and
similar for B and C (the = sign is two links). If A wants to create a
circuit to C, it will setup a link to B and ask B to setup a cross connect
into C. If one of the circuits between B and C cannot support the
capacity, B can automatically choose the link with enough capacity to C.
Only B and C has the knowledge about the state of their link, and are
hence better suited towards taking the decision of which one to use.
With explicit reachability and distances the process for setting up
circuits becomes simpler and more predictable. The distance vector in
particular makes sure that an optimal path is taken. This in contrast to
NML that has no mechanism to prefer certain paths over others.
* Handling Failure in Pathfinding
Handling failure circuit(service) setup failures and attempting alternate
paths (re-pathing) with only reachability information can be a bit tricky.
This section covers how this can be solved.
Currently we have the notion that aggregators should not try and fix
failures, only return them to the UPA. If we relax this latter notion, it
opens up the possibility for aggregators along the path to try alternate
paths. However such attempts should be restricted to certain behavior,
e.g., a backbone network can try alternative links towards a customer.
Re-pathing should not use transit/peers towards customers or use transit
to go to peers.
If the notion of aggregators not being able to re-pathing holds, it is
possible for a UPA to perform re-pathing, by constructing an ERO that
doesn't take the link without enough capacity. Constructing such an ERO is
likely to require more information than what is available from the
topology of the peers and transit provider for the UPA. To construct the
ERO it would have to fetch the topology of the backbone networks to build
up a model of the network to find an alternative path. The reason only
backbone networks is relevant is that end organization are typically
multi-homed into the same provider, which makes smaller network largely
irrelevant for re-pathing. Since the number of backbone network is
relatively small and the information in their topology description is
largely static, fetching them directly is simple and convenient.
Performing re-pathing at the UPA is however suboptimal. Leaf networks will
typically be very limited in their connectivity, where the service
provider (backbone network) will have a much degree of connectivity. Hence
it makes sense for the service provider to perform re-pathing. This is
similar to how BGP re-routing is not done by smaller tier-2 and tier-3
networks. Re-pathing at smaller networks does not make sense as their
connectivity is typically rather limited. However customers and providers
can agree where and if re-pathing should be done (similar to how some IP
customers get a full routing table, and others do not)
Best regards, Henrik
Henrik Thostrup Jensen <htj at nordu.net>
Software Developer, NORDUnet
More information about the nsi-wg
mailing list