[Nsi-wg] Message Delivery Layer

Fri Dec 7 07:37:52 EST 2012

Hi,

What is so incredibly different or outlandish about the transportation of NSI messages that we need another layer? IMO NSI has a few generic requirements: messages should be transported reliably, they should be able to be sent securely, and the transport should be interoperable.

We've basically already settled on SOAP and WSDL, which fit these requirements. So I don't see a reason for defining yet another layer.

Jeroen.

On 7 Dec 2012, at 13:21, Jerry Sobieski <jerry at nordu.net> wrote:

> My suggestion is to formalize the Message Transport Layer:
> 
> The MTL is responsible for taking a message from the NSI protocol layer, placing it in a transmit queue, and ensuring that: a) the msg gets sent, or b) there is an indication that the message cannot be sent.
> 
> The NSI protocol layer invokes this MTL_Send() function by presenting 1) the NSI protocol message, 2) the *NSA ID* destination (not a web service endpoint), 3) a callback for normal completion, 4) a timeout value, and 5) a callback for timeout error.
> 
> The MTL places the message on a queue for the indicated target NSA, and transmits the messages in FIFO order.  Thus there is a separate queue for each destination NSA.
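A minimal Python sketch of the MTL_Send() entry point and the per-destination FIFO queues described above. The names `MessageTransportLayer`, `mtl_send` and `PendingMessage` are hypothetical, and the NSA IDs are placeholders; the argument order simply follows the five items listed in the proposal.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class PendingMessage:
    """One NSI protocol message waiting in an MTL send queue."""
    msg: bytes
    dest_nsa: str                   # destination NSA ID, not a web service endpoint
    on_success: Callable[[], None]  # normal-completion callback
    on_timeout: Callable[[], None]  # timeout-error callback
    deadline: float                 # absolute time at which the timeout fires


class MessageTransportLayer:
    """Hypothetical MTL front end: one FIFO send queue per destination NSA ID."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.queues: Dict[str, Deque[PendingMessage]] = {}

    def mtl_send(self, msg: bytes, dest_nsa: str,
                 on_success: Callable[[], None], timeout: float,
                 on_timeout: Callable[[], None]) -> PendingMessage:
        # Turn the relative timeout into an absolute deadline, then append the
        # message to the tail of the destination's queue (created on first use).
        entry = PendingMessage(msg, dest_nsa, on_success, on_timeout,
                               self.clock() + timeout)
        self.queues.setdefault(dest_nsa, deque()).append(entry)
        return entry
```

Keying the queues by NSA ID rather than by endpoint URL is what lets the protocol layer stay ignorant of addressing; resolving an address becomes the MTL's problem.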
> 
> Whenever a message is queued for transmission to a destination NSA, the MTL will check to see if there is an open TCP/SSL session to that destination NSA.   If there is, then the message is transmitted on that session using the exchange below.  If there is no active session, the MTL will try to find the appropriate addressing information for that target NSA and open a session to it.  If successful, the local MTL will proceed with the transmission.  If the session cannot be established (e.g. no target address info is available, the dest is unreachable, or it is behind a FW), the local sending MTL leaves the message on the send queue and waits a period before retrying the session establishment.
> 
> The session can also be established by the remote MTL.  When a TCP/SSL session is received from a remote agent, the MTLs exchange NSA identifiers and the session is bound to the appropriate NSA send queue.  (Sessions are bi-directional.)
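The session handling in the two paragraphs above (reuse an open session, otherwise look up an address and connect, otherwise wait before retrying, and accept sessions opened by the remote side) could be sketched as follows. `SessionManager`, `lookup`, `connect` and the fixed `retry_interval` are all assumptions for illustration; the proposal does not specify a retry policy.

```python
from typing import Any, Callable, Dict, Optional


class SessionManager:
    """Sketch of per-NSA session handling with a fixed retry interval.

    `lookup` (NSA ID -> address or None) and `connect` (address -> session
    or None) are hypothetical hooks supplied by the deployment.
    """

    def __init__(self, lookup: Callable[[str], Optional[str]],
                 connect: Callable[[str], Optional[Any]],
                 retry_interval: float) -> None:
        self.lookup = lookup
        self.connect = connect
        self.retry_interval = retry_interval
        self.sessions: Dict[str, Any] = {}
        self.next_attempt: Dict[str, float] = {}

    def get_session(self, nsa_id: str, now: float) -> Optional[Any]:
        # Reuse an open session to this NSA if one exists.
        if nsa_id in self.sessions:
            return self.sessions[nsa_id]
        # Inside the wait period after a failed attempt: the message simply
        # stays on the send queue.
        if now < self.next_attempt.get(nsa_id, float("-inf")):
            return None
        address = self.lookup(nsa_id)
        session = self.connect(address) if address is not None else None
        if session is None:
            self.next_attempt[nsa_id] = now + self.retry_interval
            return None
        self.sessions[nsa_id] = session
        return session

    def bind_incoming(self, nsa_id: str, session: Any) -> None:
        # Session opened by the remote MTL: once NSA IDs are exchanged, bind
        # it to that NSA so the local send queue can use it (bi-directional).
        self.sessions[nsa_id] = session
```

Note that `bind_incoming` is what rescues the firewalled case: if the local side can never reach the peer, a session the peer opens still drains the local queue.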
> 
> Each NSI protocol message is exchanged by MTLs using the following sequence:
>    Source NSA                 Dest NSA
>    MsgXmit(msg, msgid)   ->                             Sends the entire message along with a message id/seq number
>                          <-   MsgRecv(msgid)            Responds that the message id has been received, stored, and queued
>    MsgRecvAck(msgid)     ->                             Acknowledges the completion of the transaction
> 
> This exchange conclusively indicates to both NSAs that the message has been moved from the local sending system to the destination system, where it has been saved in a persistent store and queued for processing by the destination NSA.
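An in-memory Python sketch of the MsgXmit / MsgRecv / MsgRecvAck exchange. `ReceivingMTL` and `exchange` are hypothetical names, and a dict stands in for the persistent store. One assumption goes beyond the proposal: a retransmitted msgid is stored and queued only once, while MsgRecv is repeated so the sender can still complete the exchange.

```python
from typing import Dict, List, Set, Tuple


class ReceivingMTL:
    """In-memory stand-in for the destination MTL in the three-message exchange."""

    def __init__(self) -> None:
        self.store: Dict[int, bytes] = {}   # stands in for the persistent store
        self.input_queue: List[bytes] = []  # NSI protocol layer input events
        self.completed: Set[int] = set()    # msgids whose MsgRecvAck arrived

    def on_msg_xmit(self, msgid: int, msg: bytes) -> Tuple[str, int]:
        # Assumed duplicate suppression: store and queue each msgid once,
        # but always answer with MsgRecv.
        if msgid not in self.store:
            self.store[msgid] = msg
            self.input_queue.append(msg)
        return ("MsgRecv", msgid)

    def on_msg_recv_ack(self, msgid: int) -> None:
        self.completed.add(msgid)


def exchange(receiver: ReceivingMTL, msgid: int, msg: bytes) -> bool:
    """Sender side: MsgXmit, expect the matching MsgRecv, then send MsgRecvAck."""
    reply = receiver.on_msg_xmit(msgid, msg)
    if reply != ("MsgRecv", msgid):
        return False  # leave the message on the send queue and retry later
    receiver.on_msg_recv_ack(msgid)
    return True
```

The msgid is what makes a retransmission after a lost MsgRecv harmless: the destination NSA still sees the message exactly once.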
> 
> Upon successful transmission, the message is removed from the send queue, the timer is canceled, and the successful send callback is invoked.
> 
> The MTL will continue to try to send the message at the front of the send queue until the timeout expires.   When the timeout expires, the message is removed from the queue, and the timeout callback is invoked.   Other messages destined for that NSA may exist in the queue as well and are blocked behind it.   Their timeouts may also expire and are treated similarly.   So any message whose timer expires will be removed from the queue and the timeout callback invoked.
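The success and timeout handling in the two paragraphs above could look like the following sketch. `Entry`, `complete_head` and `expire` are hypothetical names; the point illustrated is that expiry sweeps the whole queue, including messages blocked behind the head, while the survivors keep their FIFO order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass
class Entry:
    msg: str
    deadline: float
    on_success: Callable[[], None]
    on_timeout: Callable[[], None]


def complete_head(queue: Deque[Entry]) -> None:
    """Head message transmitted: dequeue it (retiring its timer) and report success."""
    entry = queue.popleft()
    entry.on_success()


def expire(queue: Deque[Entry], now: float) -> None:
    """Remove every expired message, wherever it sits in the queue, and invoke
    its timeout callback; unexpired messages stay in FIFO order."""
    survivors: Deque[Entry] = deque()
    while queue:
        entry = queue.popleft()
        if now >= entry.deadline:
            entry.on_timeout()
        else:
            survivors.append(entry)
    queue.extend(survivors)
```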
> 
> For sending a message, this process allows the protocol layer to simply earmark a message for a destination NSA and only be informed of success or failure.  The NSI protocol layer does not care why a message was unable to be sent, just that it did not make it after a pre-specified time of trying to do so.   The MTL can log attempts or perform other actions so that more detailed forensic information is available or actions can be taken, but these other actions are not NSI protocol layer functions.
> 
> Further, this MTL mechanism can take advantage of proven protocols such as TCP to assure delivery of the MTL message exchanges, substantially simplifying the MTL and minimizing redundant functionality.  However, TCP or SSL or HTTP/S will not do everything: the MTLs are responsible for managing the NSA send queues (note the NSI layer may be multithreaded), session establishment/retries, store management, timeout processing, etc.
> 
> Similar processing is done by the MTL upon receiving a message.  The received message is timestamped and entered into a persistent store to enable recovery should the receiving host or process be interrupted.  After the message is successfully stored, the message is placed on the input event queue of the local NSA protocol layer. Then the MsgRecv message is sent to the source MTL.
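A sketch of the receive-side ordering (timestamp, persist, enqueue, then MsgRecv). The append-only JSON-lines file used as the persistent store, and the function names `receive` and `recover`, are assumptions for illustration; the essential part is that `os.fsync` returns before the MsgRecv is sent.

```python
import json
import os
import time
from typing import Callable, List


def receive(msgid: str, msg: str, store_path: str, input_queue: List[dict],
            send_msg_recv: Callable[[str], None],
            clock: Callable[[], float] = time.time) -> None:
    """Receive-side ordering: timestamp, persist, enqueue, then send MsgRecv."""
    record = {"msgid": msgid, "received_at": clock(), "msg": msg}
    # Assumed store: an append-only JSON-lines file.
    with open(store_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # force the record to disk before acknowledging
    input_queue.append(record)  # hand off to the local NSI protocol layer
    send_msg_recv(msgid)        # only now tell the source MTL it was received


def recover(store_path: str) -> List[dict]:
    """After a restart, reload stored messages so they can be re-queued."""
    if not os.path.exists(store_path):
        return []
    with open(store_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

If the process dies between the fsync and the MsgRecv, the source MTL retransmits and the message is already in the store, which is why the receiver should tolerate duplicate msgids.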
> 
> Note:  There are two timers of interest here: the transmission timeout value mentioned above, and the NSI protocol response timeout value that dictates how long the local NSA is willing to wait for a protocol response from the remote NSA.    The NSI protocol layer will place a message on the send queue with a transmission timeout. But the NSI layer is actually only concerned with whether the protocol primitive was acted upon by the remote NSA within a specific timeframe; enter the /response/ timer.  If there is no response from the remote NSA, then the local response timer expires and the local NSA has to recover.   Then and only then does the *protocol* layer need to recover.   And knowing whether the remote NSA ever got the message is an important piece of info in determining how to recover from the protocol response timeout.    Thus, if the transmission timeout exceeds the response timeout, the NSI layer may time out before the MTL has given up trying to send the message. Conversely, if the transmission timeout is small compared to the response timer, the send timeout may occur too quickly, not allowing the remote system enough time to establish a session before the sent message is timed out.   So, it is suggested that the response timer should be set in sequence with the transmission timer, not overlapping.  Thus the response timeout would not begin until the transmission has succeeded.   However, issues such as slow session establishment can still impact upstream response timers in the service tree.   This remains an issue of concern.
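The "in sequence, not overlapping" timer arrangement suggested above can be sketched as a small state holder. `SequentialTimers` and its fields are hypothetical; the example values (30 s transmission, 120 s response) are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SequentialTimers:
    """Response timer armed only after the MTL reports successful transmission."""
    transmit_timeout: float
    response_timeout: float
    response_deadline: Optional[float] = None

    def on_transmitted(self, now: float) -> None:
        # Transmission succeeded: only now start the protocol response clock.
        self.response_deadline = now + self.response_timeout

    def response_expired(self, now: float) -> bool:
        # Before transmission succeeds there is no response timer to expire.
        return self.response_deadline is not None and now >= self.response_deadline

    def worst_case_wait(self) -> float:
        # Back-to-back timers: the longest the protocol layer can wait before
        # it must recover is the sum of the two timeouts.
        return self.transmit_timeout + self.response_timeout
```

The sum in `worst_case_wait` is exactly the quantity that propagates up the service tree, which is the residual concern raised at the end of the note.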
> 
> Thoughts?
> Jerry
> 
> 
> On 12/7/12 5:04 AM, Jeroen van der Ham wrote:
>> Hi,
>> 
>> The problem that I am trying to solve is the situation where the client is possibly behind a firewall/NAT/whatever, where the client is the only one capable of setting up a bidirectional TCP session.
>> 
>> Right now the NSI protocol breaks in that situation, because it insists on sending the acknowledgements through a separate channel that is independently set up by the server back to the client.
>> 
>> In my opinion the simplest way to solve that is indeed to make the callback optional and allow clients to poll for updates. The reserveConfirmed may or may not be sent, but a failure to deliver it would not trigger a fault.
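A sketch of the client-side polling fallback being discussed: the client only ever opens outbound requests, so it works from behind a firewall or NAT. `poll_until`, the `query` hook and the state names in the usage are hypothetical placeholders, not taken from the NSI WSDL.

```python
import time
from typing import Callable, Container, Optional


def poll_until(query: Callable[[str], str], connection_id: str,
               terminal_states: Container[str], interval: float,
               max_polls: int,
               sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Query the provider until the connection reaches a terminal state.

    Returns that state, or None if max_polls queries never reached one.
    """
    for attempt in range(max_polls):
        state = query(connection_id)
        if state in terminal_states:
            return state
        if attempt < max_polls - 1:  # no pointless sleep after the last query
            sleep(interval)
    return None
```

For example, `poll_until(query, "conn-1", {"ReserveHeld", "ReserveFailed"}, 5.0, 10)` would stop as soon as a query reports either placeholder state.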
>> 
>> Jeroen.
>> 
>> On 7 Dec 2012, at 10:26, Henrik Thostrup Jensen <htj at nordu.net> wrote:
>> 
>>> On Thu, 6 Dec 2012, Jeroen van der Ham wrote:
>>> 
>>>> A feasible first step in my opinion would be to make acknowledgements optional. Combined with a good query interface this already makes it possible to operate a client NSA behind a firewall.
>>> Are you suggesting to make callbacks optional and resort to polling for updates? I.e., never getting a reserveConfirmed message from a reserve request.
>>> 
>>> (btw. I think polling is completely acceptable - most people who have built a distributed system with events are painfully aware that events sometimes disappear and that one will have to resort to polling as a fallback).
>>> 
>>> OR:
>>> 
>>> Throwing out the callback scheme, i.e., getting a reserveConfirmed as the direct reply to a reserve request. This will mean some potentially long-standing requests (not that it is a problem), probably lasting some minutes.
>>> 
>>> This can also be optional (replyTo -> yes to callback, no replyTo -> direct reply). I'd prefer not to have this dual behavior due to implementation complexity.
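To make the dual-behaviour trade-off concrete, here is a sketch of a provider-side handler with both paths. `handle_reserve`, the `process`, `send_callback` and `defer` hooks, and the dict keys are all hypothetical; only the replyTo-present/absent split comes from the message above. The second branch is the implementation complexity being weighed.

```python
from typing import Any, Callable, Dict


def handle_reserve(request: Dict[str, Any],
                   process: Callable[[Dict[str, Any]], Dict[str, Any]],
                   send_callback: Callable[[str, Dict[str, Any]], None],
                   defer: Callable[[Callable[[], None]], None]) -> Dict[str, Any]:
    """replyTo present: acknowledge now, deliver the result later via callback.
    replyTo absent: keep the request open and return the result directly."""
    reply_to = request.get("replyTo")
    if reply_to:
        # Callback path: schedule the work (e.g. on a worker thread) and ack.
        defer(lambda: send_callback(reply_to, process(request)))
        return {"acknowledgment": True}
    # Direct-reply path: the caller blocks until processing finishes,
    # possibly for minutes.
    return process(request)
```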
>>> 
>>> --
>>> 
>>> Originally we chose to have the callbacks as some of the commands could take a very long time to complete. I think this was especially for provision, which would not trigger until the link came up (could be weeks), however with the notification mechanism in NSI2, that is no longer the case (provisionConfirmed now indicates that all NSAs have acknowledged the provision request).
>>> 
>>> If we are willing to handle request delays of a couple of minutes (most will be faster), we could forgo the callbacks for requests, and only deal with callbacks for notifications like active and forcedEnd.
>>> 
>>> 
>>>    Best regards, Henrik
>>> 
>>> Henrik Thostrup Jensen <htj at nordu.net>
>>> Software Developer, NORDUnet
>>> 
>>> _______________________________________________
>>> nsi-wg mailing list
>>> nsi-wg at ogf.org
>>> https://www.ogf.org/mailman/listinfo/nsi-wg
>