[Nsi-wg] Message Delivery Layer

Fri Dec 7 07:49:42 EST 2012

Hi

Shouldn't the protocol take care of protocol stuff only? The messaging layer is
something out of scope IMHO. There are already multiple more or less
standardized approaches to sending a message so that it is either
delivered or reported as failed, and working frameworks are available for this
depending on the implementation (JMS, or some concepts related to SOA). IMHO we
should list the requirements and recommendations for this layer (e.g.
security, ciphering, maximum delivery time), which are crucial for protocol
operations, but we should not define yet another message transport layer.
This is in fact flexibility: one can adapt NSI to anything which gives
the features we need (which are rather common, like please deliver it, and
please keep the message order). We are interested in three things:

- Send a message to a remote Agent (use timeout for delivery)

- Notify if transport failed after timeout

- Get a message from a remote Agent
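
The three operations listed above could be written down as a transport-neutral contract. This is only a minimal sketch in Python; the names `MessageLayer` and `LoopbackLayer` and all parameters are hypothetical and do not come from any NSI document.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional, Tuple


class MessageLayer(ABC):
    """Hypothetical contract the protocol layer would require of any transport."""

    @abstractmethod
    def send(self, agent_id: str, message: bytes, timeout_s: float,
             on_failure: Callable[[str, bytes], None]) -> None:
        """Deliver message to the remote agent, or call on_failure after timeout_s."""

    @abstractmethod
    def receive(self) -> Optional[Tuple[str, bytes]]:
        """Return (agent id, message), or None if nothing is waiting."""


class LoopbackLayer(MessageLayer):
    """Toy in-memory transport: a message sent to a known agent lands in the
    local inbox; a message to an unknown agent fails immediately."""

    def __init__(self, known_agents):
        self.known_agents = set(known_agents)
        self.inbox = []

    def send(self, agent_id, message, timeout_s, on_failure):
        if agent_id in self.known_agents:
            self.inbox.append((agent_id, message))
        else:
            # A real layer would keep retrying until timeout_s has elapsed.
            on_failure(agent_id, message)

    def receive(self):
        return self.inbox.pop(0) if self.inbox else None
```

Any transport (WebServices, JMS, or something else) that can implement these two methods would satisfy the requirements in the list.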

Would that be WebServices? Actually I don't care, as long as all the correct
fields are there and it is aligned with the state machine. Is it written in
Java or C++? Actually I don't care, feel free to use brainfuck, OISC, Thue,
Whitespace or, if those are too easy, use C# ;)

The demo cloud, or an operational cloud, requires an agreement to use message
layer tools which can collaborate, i.e. agree to use one (so we use
WebServices). But we should not prevent people from using it a different
way. Yes, an NSI agent will not be compatible with another NSI agent using a
different message layer. But the protocol definition should not solve all the
issues with communication at once. This is implementation independent, not
protocol dependent. That's my 5 cents. Disagreements are welcome (and rather
expected :) )

Best regards

Radek

______________________________________________

Radosław Krzywania

Network Department of

Poznan Supercomputing and Networking Center

http://www.man.poznan.pl

______________________________________________

From: nsi-wg-bounces at ogf.org [mailto:nsi-wg-bounces at ogf.org] On Behalf Of
Jerry Sobieski
Sent: Friday, December 07, 2012 1:22 PM
To: Jeroen van der Ham
Cc: NSI Working Group
Subject: Re: [Nsi-wg] Message Delivery Layer

My suggestion is to formalize the Message Transport Layer:

The MTL is responsible for taking a message from the NSI protocol layer,
placing it in a transmit queue, and ensuring that: a) the msg gets sent, or
b) there is an indication that the message cannot be sent.

The NSI protocol layer invokes this MTL_Send() function by presenting 1) the
NSI protocol message, 2) the *NSA ID* destination (not a web service
endpoint), 3) a callback for normal completion, 4) a timeout value, and 5) a
callback for timeout error.

The MTL places the message on a queue for the indicated target NSA, and
transmits the messages in a FIFO order.  Thus there is a different queue for
each destination NSA.    
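
The MTL_Send() call and the per-destination FIFO queues described above could look roughly like this. It is a minimal sketch in Python; `SendQueues`, `QueuedMessage`, the `now` parameter, and the example NSA IDs are all hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class QueuedMessage:
    payload: str
    deadline: float                      # absolute time at which the send times out
    on_sent: Callable[[str], None]       # callback for normal completion
    on_timeout: Callable[[str], None]    # callback for timeout error


class SendQueues:
    """One FIFO queue per destination NSA ID (not per web service endpoint)."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def mtl_send(self, payload, dest_nsa_id, on_sent, timeout_s, on_timeout, now):
        # The five arguments from the description, plus the current time.
        msg = QueuedMessage(payload, now + timeout_s, on_sent, on_timeout)
        self.queues[dest_nsa_id].append(msg)

    def next_for(self, dest_nsa_id):
        """Peek at the message at the front of the queue for this NSA, if any."""
        q = self.queues[dest_nsa_id]
        return q[0] if q else None
```

Keying the queues by NSA ID keeps ordering independent per destination, so a slow or unreachable NSA does not block traffic to the others.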

Whenever a message is queued for transmission to a destination NSA, the MTL
will check to see if there is an open TCP/SSL session to that destination
NSA.   If there is, then the message is transmitted on that session using
the exchange below.  If there is no active session, the MTL will try to find
the appropriate addressing information for that target NSA and open a
session to it.  If successful, the local MTL will proceed with the
transmission.  If the session is not established (e.g. no target address info
is available, or the dest is unreachable, or behind a FW, etc.) the local
sending MTL leaves the message on the send queue and waits a period before
retrying the session establishment.

The session can also be established by the remote MTL.  When a TCP/SSL
session is received from a remote agent, the MTLs exchange NSA identifiers
and the session is bound to the appropriate NSA send queue.  (Sessions are
bi-directional.)

Each NSI protocol message is exchanged by MTLs using the following sequence:

    Source NSA                  Dest NSA
    MsgXmit(msg, msgid)    ->                     Sends the entire message along
                                                  with a message id/seq number
                           <-   MsgRecv(msgid)    Responds that the message id has
                                                  been received, stored, and queued
    MsgRecvAck(msgid)      ->                     Acknowledges the completion of
                                                  the transaction.
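
The three-step exchange can be simulated in a few lines. This is a toy sketch in Python with direct method calls standing in for the TCP/SSL session; the class `Mtl` and its attribute names are hypothetical.

```python
class Mtl:
    """Toy MTL endpoint; the 'store' dict stands in for the persistent store."""

    def __init__(self, name):
        self.name = name
        self.store = {}         # msgid -> msg, received and saved
        self.unacked = {}       # msgid -> msg, sent but not yet confirmed
        self.completed = set()  # msgids whose exchange the sender has closed

    def msg_xmit(self, peer, msg, msgid):
        # Step 1: send the entire message with its id/sequence number.
        self.unacked[msgid] = msg
        reply = peer.on_xmit(msg, msgid)
        # Step 3: on MsgRecv, acknowledge completion of the transaction.
        if reply == ("MsgRecv", msgid):
            del self.unacked[msgid]
            peer.on_recv_ack(msgid)
            return True
        return False

    def on_xmit(self, msg, msgid):
        # Step 2: store the message, then respond that it was received.
        self.store[msgid] = msg
        return ("MsgRecv", msgid)

    def on_recv_ack(self, msgid):
        self.completed.add(msgid)
```

After a successful `msg_xmit`, the sender holds nothing unacknowledged and the receiver knows the sender has seen its MsgRecv.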

This exchange conclusively indicates to both NSAs that the message has been
moved from the local sending system to the destination system where it has
been saved in a persistent store and queued for processing by the
destination NSA.  

Upon successful transmission, the message is removed from the send queue,
the timer is canceled, and the successful send callback is invoked. 

The MTL will continue to try to send the message at the front of the send
queue until the timeout expires.   When the timeout expires, the message is
removed from the queue, and the timeout callback is invoked.   Other
messages destined for that NSA may exist in the queue as well and are
blocked.   Their timeouts may also expire and are treated similarly.   So
any message whose timer expires will be removed from the queue and the
timeout callback invoked.
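
The expiry rule above applies to every queued message, not just the one at the front. A minimal sketch in Python, assuming each queue entry is a hypothetical `(deadline, payload, on_timeout)` tuple:

```python
from collections import deque


def expire(queue, now):
    """Remove every message whose deadline has passed and fire its timeout callback.

    queue holds (deadline, payload, on_timeout) tuples in FIFO order. Messages
    blocked behind the head can expire too, so the whole queue is scanned.
    """
    survivors = deque()
    for deadline, payload, on_timeout in queue:
        if deadline <= now:
            on_timeout(payload)
        else:
            survivors.append((deadline, payload, on_timeout))
    # Replace the queue contents in place, preserving FIFO order of survivors.
    queue.clear()
    queue.extend(survivors)
```

For example, with deadlines 10, 50 and 12 and `now=15`, the first and third messages are timed out and only the second stays queued.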

For sending a message, this process allows the protocol layer to simply
earmark a message for a destination NSA and only be informed of success or
failure.  The NSI protocol layer does not care why a message was unable to
be sent, just that it did not make it after a pre-specified time of trying
to do so.   The MTL can log attempts or perform other actions so that more
detailed forensic information is available or actions can be taken, but
these other actions are not NSI protocol layer functions.  

Further, this MTL mechanism can take advantage of proven protocols such as
TCP to assure delivery of the MTL message exchanges - substantially
simplifying the MTL and minimizing redundant functionality.  However, TCP or
SSL or HTTP/S will not do everything - the MTLs are responsible for managing
the NSA send queues (note the NSI layer may be multithreaded), session
establishment/retries, store management, timeout processing, etc.

Similar processing is done by the MTL upon receiving a message.  The
received message is timestamped and entered into a persistent store to
enable recovery should the receiving host or process be interrupted.  After
the message is successfully stored, the message is placed on the input event
queue of the local NSA protocol layer.  Then the MsgRecv message is sent
to the source MTL.
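
The ordering on the receive side matters: persist first, queue second, confirm last. A minimal sketch in Python, with a dict standing in for the persistent store; `Receiver`, `send_reply`, and `clock` are hypothetical names.

```python
import time
from collections import deque


class Receiver:
    """Toy receive path; a dict stands in for the persistent store."""

    def __init__(self):
        self.store = {}              # msgid -> (timestamp, msg)
        self.input_events = deque()  # input event queue of the local NSA protocol layer

    def on_message(self, msg, msgid, send_reply, clock=time.time):
        self.store[msgid] = (clock(), msg)   # 1. timestamp and persist
        self.input_events.append(msg)        # 2. queue for the protocol layer
        send_reply(("MsgRecv", msgid))       # 3. only then confirm to the source MTL
```

If the process is interrupted before step 3, the sender never sees MsgRecv and will retransmit, so the message is not lost.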

Note:  There are two timers of interest here: the transmission timeout value
mentioned above, and the NSI protocol response timeout value that dictates
how long the local NSA is willing to wait for a protocol response from the
remote NSA.    The NSI protocol layer will place a message on the send queue
with a transmission timeout.  But the NSI layer is actually only concerned
with whether the protocol primitive was acted upon by the remote NSA within a
specific timeframe - enter the response timer.  If there is no response from
the remote NSA, then the local response timer expires and the local NSA has
to recover.   Then and only then does the *protocol* layer need to recover.
And knowing if the remote NSA ever got the message is an important piece of
info in determining how to recover from the protocol response timeout.
Thus, if the transmission timeout exceeds the response timeout, the NSI
layer may time out before the MTL has given up trying to send the message.
Else, if the transmission timeout is small compared to the response timer,
the send timeout may occur too fast - not allowing the remote system enough
time to establish a session before the sent message is timed out.   So, it
is suggested that the response timer should be set in sequence with the
transmission timer - not overlapping.  Thus the response timeout would not
begin until the transmission has succeeded.   However, issues such as slow
session establishment can still impact upstream response timers in the
service tree.   This remains an issue of concern.
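
Running the two timers in sequence can be expressed as simple arithmetic. A small sketch in Python; the function names and the example values are hypothetical.

```python
def response_deadline(sent_ok_at, response_timeout_s):
    """The response timer starts only when the MTL reports a successful send."""
    return sent_ok_at + response_timeout_s


def worst_case_wait(transmission_timeout_s, response_timeout_s):
    """Longest time before the protocol layer learns that it must recover:
    the send succeeds at the last moment, then no response ever arrives."""
    return transmission_timeout_s + response_timeout_s
```

With a 30 second transmission timeout and a 60 second response timeout, the local NSA may wait up to 90 seconds, which is the delay an upstream NSA in the service tree would have to allow for.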

Thoughts?
Jerry

On 12/7/12 5:04 AM, Jeroen van der Ham wrote:

Hi,

The problem that I am trying to solve is the situation where the client is
possibly behind a firewall/NAT/whatever, where the client is the only one
capable of setting up a bidirectional TCP session.

Right now the NSI protocol breaks in that situation, because it insists on
sending the acknowledgements through a separate channel that is independently
set up by the server back to the client.

In my opinion the simplest way to solve that is indeed to make the callback
optional and allow clients to poll for updates. The reserveConfirmed may or
may not be sent. But a failure to deliver it would not trigger a fault.

Jeroen.

On 7 Dec 2012, at 10:26, Henrik Thostrup Jensen <htj at nordu.net> wrote:

On Thu, 6 Dec 2012, Jeroen van der Ham wrote:

A feasible first step in my opinion would be to make acknowledgements
optional. Combined with a good query interface this already makes it
possible to operate a client NSA behind a firewall.

Are you suggesting making the callback optional and resorting to polling for
updates? I.e., never getting a reserveConfirmed message from a reserve
request.

(btw. I think polling is completely acceptable - most people who have built
a distributed system with events are painfully aware that they sometimes
disappear and that one will have to resort to polling as a fallback).

OR:

Throwing out the callback scheme, i.e., getting a reserveConfirmed as the
direct reply to a reserve request. This will mean some potentially
long-standing requests (not that it is a problem), probably some minutes.

This can also be optional (replyTo -> yes to callback, no replyTo -> direct
reply). I'd prefer not to have this dual behavior due to implementation
complexity.

--

Originally we chose to have the callbacks as some of the commands could take
a very long time to complete. I think this was especially for provision,
which would not trigger until the link came up (could be weeks), however
with the notification mechanism in NSI2, that is no longer the case
(provisionConfirmed now indicates that all NSAs have acknowledged the
provision request).

If we are willing to handle request delays of a couple of minutes (most will
be faster), we could forgo the callbacks for requests, and only deal with
callbacks for notifications like active and forcedEnd.

   Best regards, Henrik

Henrik Thostrup Jensen <htj at nordu.net>
Software Developer, NORDUnet

_______________________________________________
nsi-wg mailing list
nsi-wg at ogf.org
https://www.ogf.org/mailman/listinfo/nsi-wg
