[Nsi-wg] Message Delivery Layer

Fri Dec 7 07:21:56 EST 2012

My suggestion is to formalize the Message Transport Layer:

The MTL is responsible for taking a message from the NSI protocol layer, 
placing it in a transmit queue, and ensuring that: a) the msg gets sent, 
or b) there is an indication that the message cannot be sent.

The NSI protocol layer invokes this MTL_Send() function by presenting 1) 
the NSI protocol message, 2) the *NSA ID* destination (not a web service 
endpoint), 3) a callback for normal completion, 4) a timeout value, and 
5) a callback for timeout error.

The MTL places the message on a queue for the indicated target NSA, and 
transmits the messages in a FIFO order.  Thus there is a different queue 
for each destination NSA.
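
To make the calling convention concrete, here is a rough sketch of the send interface and the per-destination FIFO queues. The names SendRequest, SendQueues, mtl_send, head and depth are made up for illustration, and an in-memory deque stands in for whatever queue a real MTL would use.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Optional


@dataclass
class SendRequest:
    """The five items the NSI protocol layer presents to the MTL (names are hypothetical)."""
    message: bytes                        # 1) the NSI protocol message
    dest_nsa_id: str                      # 2) destination NSA ID, not a web service endpoint
    on_sent: Callable[[bytes], None]      # 3) callback for normal completion
    timeout_s: float                      # 4) transmission timeout value
    on_timeout: Callable[[bytes], None]   # 5) callback for timeout error


class SendQueues:
    """One FIFO send queue per destination NSA."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[SendRequest]] = defaultdict(deque)

    def mtl_send(self, message, dest_nsa_id, on_sent, timeout_s, on_timeout) -> None:
        # Queue the message for its destination; transmission happens elsewhere.
        request = SendRequest(message, dest_nsa_id, on_sent, timeout_s, on_timeout)
        self._queues[dest_nsa_id].append(request)

    def head(self, dest_nsa_id: str) -> Optional[SendRequest]:
        # The next message to transmit to this NSA, or None if nothing is queued.
        queue = self._queues.get(dest_nsa_id)
        return queue[0] if queue else None

    def depth(self, dest_nsa_id: str) -> int:
        return len(self._queues.get(dest_nsa_id, ()))
```

The caller only names the destination NSA; which session or address is used stays inside the MTL.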

Whenever a message is queued for transmission to a destination NSA, the 
MTL will check to see if there is an open TCP/SSL session to that 
destination NSA.   If there is, then the message is transmitted on that 
session using the exchange below.  If there is no active session, the 
MTL will try to find the appropriate addressing information for that 
target NSA and open a session to it.  If successful, the local MTL will 
proceed with the transmission.  If the session is not established (e.g. 
no target address info is available, or the dest is unreachable, or 
behind a FW, etc.) the local sending MTL leaves the message on the send 
queue and waits a period before retrying the session establishment.
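
The session lookup described above might look roughly like this. ensure_session, resolve and connect are hypothetical names; the resolver and connector are passed in because how NSA addressing information is found is not specified here. A None result means "leave the message on the send queue and retry after a wait".

```python
from typing import Any, Callable, Dict, Optional


def ensure_session(
    dest_nsa_id: str,
    sessions: Dict[str, Any],
    resolve: Callable[[str], Optional[Any]],
    connect: Callable[[Any], Any],
) -> Optional[Any]:
    """Return an open session to dest_nsa_id, or None if none can be opened right now."""
    session = sessions.get(dest_nsa_id)
    if session is not None:
        return session              # reuse the existing session

    address = resolve(dest_nsa_id)
    if address is None:
        return None                 # no target address info available

    try:
        session = connect(address)
    except OSError:
        return None                 # unreachable, refused, behind a FW, etc.

    sessions[dest_nsa_id] = session  # bind the session to this NSA
    return session
```

A session opened by the remote MTL would simply be entered into the same sessions table once the NSA identifiers have been exchanged.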

The session can also be established by the remote MTL.  When a TCP/SSL 
session is received from a remote agent, the MTLs exchange NSA 
identifiers and the session is bound to the appropriate NSA send queue.  
(Sessions are bi-directional.)

Each NSI protocol message is exchanged by MTLs using the following sequence:
     Source NSA                    Dest NSA
     MsgXmit(msg, msgid)   ->
          Sends the entire message along with a message id/seq number.
                           <-      MsgRecv(msgid)
          Responds that the message id has been received, stored, and queued.
     MsgRecvAck(msgid)     ->
          Acknowledges the completion of the transaction.

This exchange conclusively indicates to both NSAs that the message has 
been moved from the local sending system to the destination system where 
it has been saved in a persistent store and queued for processing by the 
destination NSA.
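
As a sketch of the three-step exchange, the two sides could be modelled as below. The class and method names are invented for illustration, a dict stands in for the persistent store, and the handling of a retransmitted msgid (acknowledge again but do not queue twice) is my assumption rather than something stated above.

```python
class DestMTL:
    def __init__(self):
        self.store = {}         # stand-in for the persistent store
        self.inbox = []         # input event queue of the local NSA protocol layer
        self.completed = set()  # msgids whose transaction has been acknowledged

    def on_msg_xmit(self, msgid, msg):
        # Store first, then queue, then respond.
        if msgid not in self.store:   # assumption: duplicates are not queued twice
            self.store[msgid] = msg
            self.inbox.append(msgid)
        return ("MsgRecv", msgid)

    def on_msg_recv_ack(self, msgid):
        self.completed.add(msgid)


class SourceMTL:
    def __init__(self):
        self.pending = {}       # msgid -> message awaiting MsgRecv

    def xmit(self, msgid, msg):
        self.pending[msgid] = msg
        return ("MsgXmit", msgid, msg)

    def on_msg_recv(self, msgid):
        # The destination has stored and queued the message; it can leave the send queue.
        self.pending.pop(msgid, None)
        return ("MsgRecvAck", msgid)
```

Usage: pass the result of xmit() to on_msg_xmit(), the MsgRecv reply to on_msg_recv(), and the resulting MsgRecvAck to on_msg_recv_ack().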

Upon successful transmission, the message is removed from the send 
queue, the timer is canceled, and the successful send callback is invoked.

The MTL will continue to try to send the message at the front of the 
send queue until the timeout expires.   When the timeout expires, the 
message is removed from the queue, and the timeout callback is invoked. 
   Other messages destined for that NSA may exist in the queue as well 
and are blocked.   Their timeouts may also expire and are treated 
similarly.   So any message whose timer expires will be removed from the 
queue and the timeout callback invoked.
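
A minimal sketch of that timeout sweep over one send queue, assuming each queued entry carries an absolute deadline (the dict keys and the function name expire_timed_out are hypothetical):

```python
from collections import deque


def expire_timed_out(queue, now):
    """Remove every entry whose deadline has passed and invoke its timeout callback.

    Entries that have not expired keep their FIFO order. Returns the number removed.
    """
    kept = deque()
    expired = 0
    while queue:
        entry = queue.popleft()
        if entry["deadline"] <= now:
            entry["on_timeout"](entry["message"])
            expired += 1
        else:
            kept.append(entry)
    queue.extend(kept)
    return expired
```

Note that a message blocked behind the head of the queue can expire before the head does, so the sweep has to look at the whole queue and not just the front.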

For sending a message, this process allows the protocol layer to simply 
earmark a message for a destination NSA and only be informed of success 
or failure.  The NSI protocol layer does not care why a message was 
unable to be sent, just that it did not make it after a pre-specified 
time of trying to do so.   The MTL can log attempts or perform other 
actions so that more detailed forensic information is available or 
actions can be taken, but these other actions are not NSI protocol layer 
functions.

Further, this MTL mechanism can take advantage of proven protocols such 
as TCP to assure delivery of the MTL message exchanges - substantially 
simplifying the MTL and minimizing redundant functionality.  However, 
TCP or SSL or HTTP/S will not do everything - the MTLs are responsible 
for managing the NSA send queues (note the NSI layer may be 
multithreaded), session establishment/retries, store management, timeout 
processing, etc.

Similar processing is done by the MTL upon receiving a message.  The 
received message is timestamped and entered into a persistent store to 
enable recovery should the receiving host or process be interrupted.  
After the message is successfully stored, the message is placed on the 
input event queue of the local NSA protocol layer. Then the MsgRecv 
message is sent to the source MTL.

Note:  There are two timers of interest here: the transmission timeout 
value mentioned above, and the NSI protocol response timeout value that 
dictates how long the local NSA is willing to wait for a protocol 
response from the remote NSA.    The NSI protocol layer will place a 
message on the send queue with a transmission timeout. But the NSI layer 
is actually only concerned with whether the protocol primitive was acted 
upon by the remote NSA within a specific timeframe - enter the /response/ 
timer.  If there is no response from the remote NSA, then the local 
response timer expires and the local NSA has to recover.   Then and only 
then does the *protocol* layer need to recover.   And knowing if the 
remote NSA ever got the message is an important piece of info in 
determining how to recover from the protocol response timeout.    Thus, 
if the transmission timeout exceeds the response timeout, the NSI layer 
may time out before the MTL has given up trying to send the message. 
Else, if the transmission timeout is small compared to the response 
timer, the send timeout may occur too fast - not allowing the remote 
system enough time to establish a session before the sent message is 
timed out.   So, it is suggested that the response timer should be set 
in sequence with the transmission timer - not overlapping.  Thus the 
response timeout would not begin until the transmission has succeeded.   
However, issues such as slow session establishment can still impact 
upstream response timers in the service tree.   This remains an issue of 
concern.
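
To illustrate running the two timers in sequence, here is a small sketch. RequestTimers and its fields are made-up names and times are plain seconds; the point is only that the response timer is not started until the successful-send callback fires.

```python
class RequestTimers:
    def __init__(self, queued_at, transmit_timeout_s, response_timeout_s):
        self.response_timeout_s = response_timeout_s
        # The transmission timer starts when the message is queued.
        self.transmit_deadline = queued_at + transmit_timeout_s
        # The response timer is not running until transmission succeeds.
        self.response_deadline = None

    def on_sent(self, now):
        # Called from the successful-send callback: start the response timer.
        self.response_deadline = now + self.response_timeout_s

    def worst_case_response_deadline(self):
        # Latest possible expiry if the send succeeds at the last moment.
        return self.transmit_deadline + self.response_timeout_s


timers = RequestTimers(queued_at=0.0, transmit_timeout_s=30.0, response_timeout_s=60.0)
timers.on_sent(now=12.0)
```

With these example numbers the response timer would expire at 72 seconds, and the worst case is 90 seconds. An upstream NSA waiting on this one has to budget for that worst case, which is the remaining concern noted above.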

Thoughts?
Jerry

On 12/7/12 5:04 AM, Jeroen van der Ham wrote:
> Hi,
>
> The problem that I am trying to solve is the situation where the client is possibly behind a firewall/NAT/whatever, where the client is the only one capable of setting up a bidirectional TCP session.
>
> Right now the NSI protocol breaks in that situation, because it insists on sending the acknowledgements through a separate channel that is independently set up by the server back to the client.
>
> In my opinion the simplest way to solve that is indeed to make the callback optional and allow clients to poll for updates. The reserveConfirmed may or may not be sent, but a failure to deliver it would not trigger a fault.
>
> Jeroen.
>
> On 7 Dec 2012, at 10:26, Henrik Thostrup Jensen <htj at nordu.net> wrote:
>
>> On Thu, 6 Dec 2012, Jeroen van der Ham wrote:
>>
>>> A feasible first step in my opinion would be to make acknowledgements optional. Combined with a good query interface this already makes it possible to operate a client NSA behind a firewall.
>> Are you suggesting to make the callback optional and resort to polling for updates? I.e., never getting a reserveConfirmed message from a reserve request.
>>
>> (btw. I think polling is completely acceptable - most people who have built a distributed system with events are painfully aware that they sometimes disappear and that one will have to resort to polling as a fallback).
>>
>> OR:
>>
>> Throwing out the callback scheme, i.e., getting a reserveConfirmed as the direct reply to a reserve request. This will mean some potentially long-standing requests (not that it is a problem), probably some minutes.
>>
>> This can also be optional (replyTo -> yes to callback, no replyTo -> direct reply). I'd prefer not to have this dual behavior due to implementation complexity.
>>
>> --
>>
>> Originally we chose to have the callbacks as some of the commands could take a very long time to complete. I think this was especially for provision, which would not trigger until the link came up (could be weeks), however with the notification mechanism in NSI2, that is no longer the case (provisionConfirmed now indicates that all NSAs have acknowledged the provision request).
>>
>> If we are willing to handle request delays of a couple of minutes (most will be faster), we could forgo the callbacks for requests, and only deal with callbacks for notifications like active and forcedEnd.
>>
>>
>>     Best regards, Henrik
>>
>> Henrik Thostrup Jensen <htj at nordu.net>
>> Software Developer, NORDUnet
>>
>> _______________________________________________
>> nsi-wg mailing list
>> nsi-wg at ogf.org
>> https://www.ogf.org/mailman/listinfo/nsi-wg
