[SAGA-RG] SAGA Message API Extension

Mon Jan 22 01:11:08 CST 2007

Andre Merzky wrote:
> Quoting [Pascal Kleijer] (Jan 18 2007):
>> From: Pascal Kleijer <k-pasukaru at ap.jp.nec.com>
>> To: Andre Merzky <andre at merzky.net>, SAGA RG <saga-rg at ogf.org>
>> CC: John Shalf <JShalf at lbl.gov>, Werner Benger <benger at zib.de>
>> Subject: Re: [SAGA-RG] SAGA Message API Extension
>>
>> Hello all,
>>
>> I jump into the conversation a bit later then expected. It seems that 
>> Andre always choose a time in space-time that is inconvenient for me :p
>>
>> Here is some food:
>>
>> A. Typos
>> I cannot access the CVS so I post some of my stuff here.
>> 1. Copyright at 2006? Should it not be 2007? See page 1 and 23
>> 2. Most notes of the detailed spec should be "Notes: - see notes *on* 
>> memory management."
>> 3. Section 2.1, "[ibidem]" ???
>> 4. Section 2.1.3, paragraph 2 "A message sent by *an* endpoint"
>> 5. Page 17 "Format: serve" instead of " Format: connect"
> 
> Perfect, thanks, will fix.  
> I'll check the CVS access - I seem to remember that you
> should have a login...

Well I have a login, the problem is that I am behind a proxy gate, so I 
can only access CVS if it is configured to support web access (HTTP or 
HTTPS). This is not the case I think.

> ibidem: basically means: same as last reference.  I wanted
> to avoid referenceing the core spec again and again.  Not
> sure if this is commonly used though...

I am not so familiar with this one, I tend to directly add the reference 
again. If the reference is just a paragraph away I don't add the 
reference at all.

> 
>> There might be more, but this just catched my eyes.
> 
> 
>> B. API
>> 1. I would called the method "receive()" and not "recv()", in section 
>> 2.1.3 you have it right ;)
> 
> Yes, I had 'receive' first, but then the examples don't line
> up so well:
> 
>   ep.send (msg);
>   ep.recv (msg);
> 
> Isn't that nice? :-P  Also, send/recv is the same as the
> POSIX send/recv.

POSIX was made by C hacker, so no need to emulate them and do the things 
properly from start. Obfuscated code (http://www.ioccc.org/) should not 
be part of "Simple API". Sorry to hammer on this one, but I tend to 
correct my programmers all the time on that: Portability, Clarity, 
Simplicity, Re-usability, etc.

As for making long names, who cares, I never write a full name, my IDE
takes care of selecting the right method after 2 or 4 characters types.

> But you may probably right: verbosity may be better here.
> Lets flip a coin at the OGF session (i.e. lets have a 30 min
> deiscussion and a random vote about it)? :-)  I'm fine
> either way, really...

Not really verbosity IMO. Just clarity for a reader that enters the spec 
for the first time.

> 
> 
>> 2. The "test()" method should be called "available()" because it sounds 
>> more like test code then API code. The method should be non-blocking and 
>> return -1 if no message is available. The timeout should be ignored IMO.
> 
> Agree, test is misleading.  But possibly 'check'?  (shorter
> but also explicit)?
> 
> Reason for timeout: see later.  That allows to have test
> blocking... 

Hmm, "check" is to vague. You need a method that looks for a possible 
message and returns information on the message. Basically it returns two 
values: a boolean to tell their is something and a message size.

>> 3. Abusive usage of the NotImplemented exception. Most method are 
>> mandatory anyway, clean out the spec would be nice.
> 
> NotImplemented is on all calls, because the whole package
> might be NotImplemented.  We changes that in the Core spec
> too, lately, and list exceptions like NotImplemented and
> NoSuccess basically everywhere...
> 
> The statement from the Core spec intro still holds though:
> 
>   "The NotImplemented exception MUST, however, be used only
>   in necessary cases, for example if an underlying Grid
>   middleware does not provide some capability, and if this
>   capability can also not be emulated.  The implementa- tion
>   MUST carefully document and motivate the use of the
>   NotImplemented exception."
> 

OK, then some verbatim in the document is necessary. Just telling ppl to 
read another document will not help. I didn't read the Core API spec 
document for sometime so I forgot about that point.

>> 4. Metric RemoteDisConnect should be RemoteDisconnect.
> 
> Agree.
> 
> 
>> 5. The "msg" class should be called "message"
> 
> kind of agree - I like the shorter, but again you are
> probably right, and verbosity is better.
> 
> 
>> Andre, you are to C (hacker) centric :p Make it simple for use folx with 
>> full names.
> 
> I actually wonder that nobody complaend about 'endpoint' -
> that is what I have some issues with, because its so generic
> - who knows what other communication schemes we are going to
> add, which also have endpoints.  'message_endpoint' would be
> much better, but is too long really.  Any suggestion?
> msg_ep? har har har :-P  

Well I was ready to tell something, but other matter were more 
important. Also in WS we have EPR, which is basically a pointer 
container and not a manager as here. We can think about MessageBus :)

>> C. Others
>> Here several more complex things.
>>
>> 1. State machine
>> Why do you not include the failed states? Depending on how the message 
>> API is handled the failure might be permanent and the object remains in 
>> the given state. We can imaging that the dropped can be recovered for 
>> some time: You have two attributes attempts and delay, the object is in 
>> dropped state until it can reconnect, if the number of attempts are 
>> consumed the state is permanent, otherwise it will sleep (delay) for 
>> some time and try a new connection.
> 
> Very good point!  We had Failed (and Dropped) earlier.  The
> problem is, when do you enter Failed?  Assume you have an
> endpoint with three open connections, and a send() succeeds
> to send on two connections, but fails on the third.  If the
> EP is 'Unrealiable', thats not an error anyway.  if
> communication is 'Reliable', its certainly an error, and
> should get flagged - but should the two good sessions be
> dropped as well (which would happen when we go to Failed I
> assume, otherwise the state is meaningless)?
> 
> States could be assigned to the connections as well, but we
> don't have any handle on them as of now, and introducing
> those would REALLY complicate the API (I tried).
> 
> Any idea?

I think the whole issues is due to the support of multi cast/message 
bus. If one end point would handle a single point we would make the API 
easier. We can always add an aggregation class that supports more then 
one endpoint based on single end-points. In that case we would simplify 
the API and allow extensions.

Now the "serve" method should be thrown out of the end point and 
delegated to a endpoint factory. In case of a TCP/IP implementation this 
could be a ServerSocket returning Sockets.

>> I have this problem with the NAREGI project, for the moment I only allow 
>> one more attempt when the connection is dropped or fails (time out, 
>> others). After that a fatal exception is thrown.
> 
> Ugh, that SHOULD be hidden in the implementation I think, no
> need to expose that on API level IMHO.  Or?  

It is hidden now in the implementation, but I want to make some of the 
controls visible for advanced tweaking of the system. By default it 
would remain as is.

>> This might be an optional implementation but will ensure resilient 
>> implementation. If the failure is permanent it must be handled at 
>> application level. Of course the two attribute can be modified anytime 
>> during the life time of the object.
> 
> Yes, I agree, it might help to stabilize the implementation
> on protocol/stream level, but what can the application do
> with that information, really?  drop the connection and
> reconnect?  That would mean to close the endpoint and set up
> the whole thing again.  Much better to handle reconnect on
> implementation level, and tell the application when that
> fails -- the app then can still do the same (close/restart).

The application might unable to handle the problem, it must then 
transfer the problem to the user. In the NAREGI project we do it, in 
case of failure and the application can not recover, the application 
tells the user of the problem. The user has then to decide what to do.

>> 2. Silently Discarded & Reliability
>> In section 2.1.2 second paragraph! Gasp no, in a (un)reliable system 
>> this cannot be! You should at least return a flag with the "send()" in 
>> synchronous connection telling the message was send, ev. acked. In 
>> asynchronous it will be a monitorable field, like a ticked for the 
>> "send()" call or the task that does the job.
> 
> Hmm....
> 
> 
>> Whatever Reliability, Correctness and Order is applied there should be a 
>> way to tell the application, ev. Optional implementation, that the 
>> message was send. If the underlying API does not support reliability the 
>> application can implement one. In a reliable system anyway this should 
>> not happen.
> 
> IMHO, if the application needs reliability, it sjould use a
> realible implementation of the message API - the idea is to
> avoid application level management of the message
> transpoint.
> 
> Anyway, I think you make a good point, an ACK  might be
> useful for more than app level reliability.  But what do you
> return?  Again, assume an EP is connected to three clients,
> mode is unreliable, and two messages get through - what do
> you report?  0.6666? ;-)  2?  Is of no use if the app has to
> check how many clients are connected - that can not be done
> atomically, as its a separate operation to the send
> (get_receivers()).  So, return true if at least one message
> got through?  What is the use of that?  If its not reliable,
> id does not matter anyway.  If its reliable, an error will
> be flagged anyway if only one client fails...
> 
> So: you may be right, but I think the semantics is
> non-trivial.

See above my response about pooled endpoint connections.

> 
> 
>> 3. Correctness
>> I would rather see this as an attribute you can set. If the message is 
>> received I am not always interested in knowing if the content is 
>> correct. Just getting it is enough. The overhead for correctness depends 
>> on the implementation but in most cases this means extra CPU usages that 
>> I don?t want. I can think about MPEG streams, if we loose some part or 
>> some is corrupted, I don?t care; we go to the next frame. It would be 
>> even more powerful if you can set correctness for a given direction 
>> specifically.
> 
> Good point.  We are getting more and more flags though...
> Anyway, I'll add it, but we should discuss if we can
> collapse the (now four) options to the EP constructor.
> :-(

Hmm, not necessary to add into the constructor. It might be a attribute 
set mechanism. Only when you start to serve the endpoint will it tell 
the user that it cannot if some features are not supported.

> 
> 
>> 4. Order
>> Same as correctness. In most cases I don?t care order. I sometimes 
>> prefer have control over it on at the application level. This 
>> dramatically simplifies the underlying implementation and to have it at 
>> application level can be rather easy. In the NAREGI project I have such 
>> mechanism. I don?t care about the order for 95% of the messages, just 
>> some must be ordered. It would be even more powerful if you can set 
>> correctness for a given direction specifically.
> 
> That is in the spec by now.
> 
> 
>> 5. Opaque messages
>> In the NAREGI project we have already implement a messaging API. 
>> Actually we have several, (Socket, File, SOAP, and HTTP), and all use 
>> the same message container. Now the API for each implementation isn?t 
>> unified yet. We do use opaque messages content because when an 
>> application component wants to send a piece of something it just drops 
>> its content. That can be any object. When the message moves down to the 
>> messaging API it passes through filters that will transform it into byte 
>> arrays. The same is true for incoming messages. Filters will transform 
>> the content when it moves up in the application.
>>
>> The message class has an extra method called "getBytes()" which ensures 
>> to return a byte array. If the content cannot be serialized to a byte 
>> array, then nothing happen and null is returned. The "getSize()" will 
>> return the size of usable block of the array. The "getData()" might 
>> return an array but is not guarantied. In addition the message container 
>> has an ID flag to hint the receiving application in how the filters must 
>> be applied. It has also other attributes but of no interest so far for SAGA.
> 
> See answer to Werners comments:  I certainly agree that
> support for data typing and conversion is handy - I only
> doubt that its easy to agree on a model here.  I am not sure
> if you share Werners opinion that data type support should
> be independent from a data model - do you?
> 
> My (personal) opinion still is that msg (or message ;-)
> should be a byte buffer only, moving all type conversion,
> packaging etc. on application level, and to later work on
> other messages with better data support which can then
> easily be mapped onto the original (unstructured) message.
> That way we can go in more than one direction in the
> future, and have something to start with very fast.
> 
> Lets discuss this at OGF again - I am very open to other
> options.  But I really would like to keep the message API
> simple, for now at least.

Indeed, this can be handled by filters on application level. For the 
moment in the NAREGI project the message itself can handle raw byte 
arrays and strings (can be translated in bytes directly). Other formats 
are up to the filters you add above the stack.

>> 6. Memory Management
>> In general for the sending part I totally agree. The management is up to 
>> the application. Now for the receiving part I am not entirely satisfied. 
>> When the management is done by the implementation I am fine, there is 
>> not much to say. But in the case if the application management I have 
>> issues: manly concerning efficient memory usage of the message buckets. 
>> With message intensive applications the message container should be 
>> handled by the application to avoid memory burns/leaks. This is the case 
>> for the NAREI project.
>>
>> For example, the application creates a pool of buckets (messages) of 
>> lets say 512 bytes each. +90% of the incoming messages are always 
>> smaller then 512, so we have no trouble with the buckets since the array 
>> capacity is always larger then the size of the message. Now what happen 
>> when we have a message larger and we didn?t do the available test 
>> because we want to block the current thread with the received? In the 
>> current proposal my message gets truncated. That is not good. In the 
>> current implementation the implementation will change the array by 
>> reallocating the array.
>>
>> The message container has two fields: size of the actual message, and 
>> capacity of the current array buffer. In the next version of the API for 
>> NAREGI I will add offset in case we want to use larger array blocks, 
>> typically from network buffers and allow sub blocks to be used. In C 
>> implementation this is not necessary because you can just shift the 
>> pointer, Java cannot.
> 
> Do I understand that correctly?
> 
>   - application allocates 512 bytes
>   - message arrives, and is larger than 512 bytes
>   - the implementation re-allocates the memory
>   - the application later frees the memory
> 
> I am sure that works, but I would be hesitant to introduce
> semantics which distributes memory management in different
> layers - I would think that this can give serious trouble
> for several languages.
> 
> Also, I would expect you to do a blocking test() (check())
> to wait for the message?  (see earlier)

Yeap I see the point now. I am also too Java oriented with this nice GC 
doing the work for you.

> 
> 
>> 7. Serve time out and shutdown
>> I think that we should allow as optional implementation or usage to have 
>> the server method time out. This will either stop the internal thread or 
>> for example with a TCP/IP messaging solution to set the SO_TIMEOUT flag. 
>> We need this in order to check what is happening with processes. In some 
>> cases the job submission tool reports the process running but no IO 
>> connection is active, a time out might help us release the hook and do 
>> something different before making another attempt.
> 
> Serve by now allows an int parameter to specify for how many
> clients it should serve (i.e. wait for).  That is close to
> what you say I think, but not specifying a timeout in
> seconds, but an availability for <n> clients.
> 
> Also, I certainly agree that more sophisticated management
> of both the serve() and the individual connections is
> possible - it would, however, imply a more complicated API.
> 
> Having said that: yes, a timeout would be possible on send.
> 
> I could not find SO_TIMEOUT? (I am offline though, will
> recheck...)
> 
> 
>> We should also be able to properly stop the service with a "disconnect" 
>> or "shutdown", in the case the close doesn?t do it. That means that the 
>> server must stop and all the connection handled by the class must be 
>> terminated. That is important for us to free the underlying memory and 
>> objects allocated. Typically we should be able to close the server 
>> sockets in a proper way.
> 
> close is supposed to do exactly that:
>   
>   - close
>     Purpose: close the endpoint, and release all
>              resources
> 
>> 8. Endpoint reuse
>> The current semantic does not allow reusing the same object after a 
>> close. It might be handy to support a reconnect with an open if the 
>> implementation does not support reconnection. In case the application 
>> must handle stable connection it must create a new object at each time, 
>> which might be a memory burden. If the URL is not modified and the 
>> states are correct I don?t see why we cannot open a connection again 
>> after a close.
> 
> I am not sure about that.  You would not have a final state
> anymore.  Not sure if that is a problem.  Also, as close
> released all resources anyway, an open() would imply a
> similar overhead to a new object creation, wouldn't it?
> (minus allocating the object itself though).
> 
> 
>> OK That is all for now. I might have other things popping
>> up.
> 
> Thanks a lot!  I'm sure we can converge on all these points.
> It seems we have to take care of what use cases we want to
> cover really, that seems the main point in your mail, and in
> the other comments.
> 
> Cheers, Andre.
> 
> 
> -- "So much time, so little to do..."  -- Garfield
> 

-- 

Best regards,
Pascal Kleijer

----------------------------------------------------------------
   HPC Marketing Promotion Division, NEC Corporation
   1-10, Nisshin-cho, Fuchu, Tokyo, 183-8501, Japan.
   Tel: +81-(0)42/333.6389       Fax: +81-(0)42/333.6382
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 4385 bytes
Desc: S/MIME Cryptographic Signature
Url : http://www.ogf.org/pipermail/saga-rg/attachments/20070122/9f313bae/attachment-0003.bin