[SAGA-RG] SAGA Message API Extension

Andre Merzky andre at merzky.net
Tue Jan 23 20:04:29 CST 2007


Hi group, 

Werner and I, and separately Hartmut and I, chatted about
the message object, structured and typed data buffers etc.

We would like to propose the following approach:

  - leave the message data buffer unstructured and untyped
  - allow language bindings to add support for packing
    native data types into the buffer (similar to MPI and
    PVM)
  - use this message buffer for the message API, but
    possibly also for streams and files (in addition to the
    char* we support right now)
  - later discuss more structured message buffers on top of
    the unstructured ones
  - even later discuss message buffers with specific data
    models on top of the structured ones (that may be domain
    specific, and outside of the SAGA scope per se - more as
    informational docs or community practice)
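
To make the second bullet concrete, here is a minimal sketch of
what a language binding could layer on top of the untyped buffer.
The class packed_buffer and its methods are hypothetical names for
illustration only, not part of the draft.  The sketch copies raw
bytes and does no byte order conversion, so it assumes sender and
receiver share the same data representation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper, not part of the SAGA message API draft:
// an untyped byte buffer with typed pack/unpack layered on top.
class packed_buffer
{
  public:
    // Append the raw bytes of a trivially copyable value.
    template <typename T>
    void pack (const T & value)
    {
      static_assert (std::is_trivially_copyable <T>::value,
                     "only plain data can be packed");
      const unsigned char * p =
        reinterpret_cast <const unsigned char *> (&value);
      bytes_.insert (bytes_.end (), p, p + sizeof (T));
    }

    // Read the next value of type T.  Values must be unpacked in
    // the same order, and with the same types, as they were packed.
    template <typename T>
    T unpack ()
    {
      static_assert (std::is_trivially_copyable <T>::value,
                     "only plain data can be unpacked");
      assert (pos_ + sizeof (T) <= bytes_.size ());
      T value;
      std::memcpy (&value, bytes_.data () + pos_, sizeof (T));
      pos_ += sizeof (T);
      return value;
    }

    // The untyped view, as the transport would see it.
    const std::vector <unsigned char> & data () const { return bytes_; }

  private:
    std::vector <unsigned char> bytes_;
    std::size_t                 pos_ = 0;
};
```

The message API itself would only ever see data (); the typed
calls stay entirely within the language binding.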

Let's pick this up at OGF next week from here...

Best regards, 

  Andre.


Quoting [Werner Benger] (Jan 19 2007):
> To: Andre Merzky <andre at merzky.net>
> Subject: Re: SAGA Message API Extension
> From: Werner Benger <benger at zib.de>
> Cc: Andrei Hutanu <ahutanu at cct.lsu.edu>, John Shalf <JShalf at lbl.gov>,
> 	SAGA RG <saga-rg at ogf.org>,
> 	Gregor von Laszewski <gregor at mcs.anl.gov>
> 
> Hi Andre,
> 
> On Thu, 18 Jan 2007 15:47:34 -0600, Andre Merzky <andre at merzky.net> wrote:
> > Hi Werner,
> >
> > Quoting [Werner Benger] (Jan 18 2007):
> >>
> >> Hi Andre,
> >>
> >>   I have two other remarks, which might be orthogonal to the current
> >> draft, but might still be good to have mentioned there:
> >>
> >>   * Structured messages:
> >>
> >>   The current draft just talks about transporting an array of bytes,
> >>   but in practice we might want to transfer floats/doubles/ints etc.
> >>   While this *might* be implemented on top of the current msg API,
> >>   this would be a waste if the low-level protocol implementation
> >>   (e.g. MPI) already would support such types (including byte ordering
> >>   conversion). As such, it would be useful to have the option to use
> >>   such mechanisms from a low-level protocol if supported. If not,
> >>   then it would need to be taken care of on top of the current level.
> >
> > Ah, good point - that at least needs clarification in the
> > spec!
> >
> > Yes, you are right: the focus on opaque messages is a
> > limitation for many use cases.  OTOH, support for primitive
> > types such as ints or floats doesn't buy you that much,
> > and for more complex structures... - well, who knows better
> > than you that agreeing on a data model is a reeaaally
> > difficult job? ;-)
> >
> ah, data model issues are certainly shooting too high here.
> however, a self-descriptive typed message structure, as is
> possible in MPI or also HDF5, would do the job in a generic
> way. It doesn't need to do any more than native C allows, i.e.
> support native types and arbitrary structures built from them.
> That's just the same level as other low-level protocols might
> already support.
> 
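
As a rough illustration of such a self-descriptive typed message,
the sketch below prefixes the payload with a type tag and an
element count, and writes all multi-byte fields in big-endian
order so the receiver does not depend on the sender's byte order.
The wire layout, the tag value and the function names are
assumptions made up for this example; they are not taken from the
draft, MPI or HDF5:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical wire layout, for illustration only:
//   1 byte type tag | 4 byte element count | elements
// All multi-byte fields are written in big-endian order.
enum type_tag : std::uint8_t { TAG_UINT32 = 1 };

static void put_u32 (std::vector <std::uint8_t> & out, std::uint32_t v)
{
  for (int shift = 24; shift >= 0; shift -= 8)
    out.push_back (static_cast <std::uint8_t> ((v >> shift) & 0xFF));
}

static std::uint32_t get_u32 (const std::vector <std::uint8_t> & in,
                              std::size_t                        pos)
{
  if (pos + 4 > in.size ())
    throw std::runtime_error ("truncated message");

  return (std::uint32_t (in[pos    ]) << 24) |
         (std::uint32_t (in[pos + 1]) << 16) |
         (std::uint32_t (in[pos + 2]) <<  8) |
          std::uint32_t (in[pos + 3]);
}

// Build a message that describes its own content type and length.
std::vector <std::uint8_t>
encode_u32_array (const std::vector <std::uint32_t> & values)
{
  std::vector <std::uint8_t> out;
  out.push_back (TAG_UINT32);
  put_u32 (out, static_cast <std::uint32_t> (values.size ()));
  for (std::uint32_t v : values)
    put_u32 (out, v);
  return out;
}

// Check the tag, then read back as many elements as the header says.
std::vector <std::uint32_t>
decode_u32_array (const std::vector <std::uint8_t> & msg)
{
  if (msg.empty () || msg[0] != TAG_UINT32)
    throw std::runtime_error ("unexpected type tag");

  std::uint32_t n = get_u32 (msg, 1);
  std::vector <std::uint32_t> values;
  for (std::uint32_t i = 0; i < n; ++i)
    values.push_back (get_u32 (msg, 5 + 4 * std::size_t (i)));
  return values;
}
```

A transport that already understands typed data could skip the
manual byte shuffling and hand the values to its own typed send
call instead.
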
> > So, basically the message API tries to avoid that topic for
> > the main reason that it seems difficult to define.  I would
> > wholeheartedly support any activity which tries to define
> > domain or use case specific flavours of the API.  That would
> > be a simple exercise: you would only need to redefine the
> > set_data method on the msg class accordingly.
> >
> A generic self-description of data should be fully independent
>  from use cases or application domains.
> 
> > So, the question is: is a very limited support for primitive
> > data types something (really) useful?
> >
> In practice, you would need it anyway. So it's kind of moving
> operations that the app needs to do anyway into the common
> denominator SAGA. Plus we get the option that these operations
> might be done by a lower-level protocol interface which might
> be more efficient than if done by the application.
> 
> What I would tend to avoid is using e.g. a protocol which knows
> about floats and byte ordering just to shuffle bytes, and then
> doing the byte ordering manually again.
> 
> 
> >>   * Interfacing Event Loops:
> >>
> >>   If we want to use this API from within a larger application instead
> >>   of just self-standing programs, we might want to use mechanisms such
> >>   as socket callbacks for event handling (eg. the QSocketNotifier or
> >>   under X11 using XtAppAddInput). Would be good to have some support
> >>   to allow this, even though it might be optional.
> >
> > Right, that's an important point, in particular for the
> > visualization use cases.  It's actually in the spec, but well
> > hidden :-)  The endpoint class definition says:
> >
> >   class endpoint : implements saga::object
> >                    implements saga::async
> >                    implements saga::monitoring
> >                    [...]
> >
> > saga::async is actually an empty interface, but what that
> > means is that the class will contain several versions of
> > every class method: a synchronous one, and 3 additional
> > ones.  In C++ the rendering would look like:
> >
> >   // connection setup
> >   saga::endpoint ep;
> >   ep.serve ();
> >
> >   // normal, synchronous version
> >   saga::msg m = ep.recv ();
> >
> >   // task version 1: synchronous
> >   saga::task t1 = ep.recv <saga::task::Sync>  (m);
> >
> >   // task version 2: asynchronous
> >   saga::task t2 = ep.recv <saga::task::ASync> (m);
> >
> >   // task version 3: task
> >   saga::task t3 = ep.recv <saga::task::Task>  (m);
> >
> > These three versions of the recv method all return a task;
> > the tasks differ only in their state: t1 is Done, t2 is
> > Running, and t3 is New (not yet running).  You can get
> > notified when a task is Done etc.
> >
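
The three task states described above can be sketched with a
small mock.  The class mock_task below is a hypothetical stand-in
for saga::task, written only to show the state each mode returns
in.  It is single-threaded, so the operation only executes inside
wait (); a real implementation would presumably run it in the
background:

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for saga::task, for illustration only.
class mock_task
{
  public:
    enum state { New, Running, Done };
    enum mode  { Sync, ASync, Task  };

    mock_task (mode m, std::function <std::string ()> op)
      : op_ (std::move (op))
    {
      if (m == Sync)  { run (); wait (); }  // finished on return
      if (m == ASync) { run ();          }  // started on return
                                            // Task: stays in New
    }

    // Move a New task to Running; no effect in any other state.
    void run ()
    {
      if (state_ == New)
        state_ = Running;
    }

    // This mock has no threads, so the operation executes here.
    // Waiting on a task that was never run does nothing.
    void wait ()
    {
      if (state_ == Running)
      {
        result_ = op_ ();
        state_  = Done;
      }
    }

    state               get_state () const { return state_;  }
    const std::string & result    () const { return result_; }

  private:
    std::function <std::string ()> op_;
    state                          state_ = New;
    std::string                    result_;
};
```

Constructing one mock_task per mode gives Done, Running and New
respectively, matching t1, t2 and t3 in the snippet above.
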
> Does Task mean a thread, or is this just some saga-internal data
> type?
> 
> >
> > Additionally, the spec defines some metrics on the endpoint,
> > among them:
> >
> >       // Metrics:
> >       //   name:  Message
> >       //   desc:  fires if a message arrives
> >       //   mode:  Read
> >       //   unit:  1
> >       //   type:  String
> >       //   value: ""
> >       //   notes: - the value is the endpoint URL of the
> >       //            sending party, if known.
> >
> > These metrics are used by the monitoring interface, which is
> > also implemented by the endpoint.  With that, you can add
> > callbacks to an endpoint which get called when a new msg
> > arrives:
> >
> >   saga::endpoint ep;
> >   ep.add_callback ("Message", my_cb);
> >   ep.serve ();
> >
> > my_cb is a user defined class which implements
> > saga::callback, and whose cb() method then gets called on
> > incoming messages.
> >
> >
> > Sorry if that was somewhat lengthy.  Anyway, point is: async
> > ops and notification are covered, by means of the SAGA Core
> > Look&Feel, which is inherited by this API.
> >
> Hm, ok. Good. Maybe you can add some concrete examples in the
> appendix, such as what interfacing with Qt would really look like?
> 
> 	Werner
> 
> 
> 
> 
> > Cheers, Andre.
> >
> >>
> >> 	Werner
> >>
> >>
> >> On Thu, 18 Jan 2007 14:18:52 -0600, Andre Merzky <andre at merzky.net> wrote:
> >>
> >> > Hi John, Andrei,
> >> >
> >> > you are right: getting some feedback from the transport
> >> > level folx is certainly a good idea.  The API draft won't go
> >> > into public comment for another month or so (at least), and
> >> > then it will stay in public comment for another 2 months or
> >> > longer - that should give us enough time to contact them.
> >> >
> >> > About ordering: the text Andrei cited is in the spec because
> >> > ordering is, as of now, not an attribute of the connection
> >> > or endpoint - so the spec tries to nail it down.  It says
> >> > "MUST be ordered, but no global ordering is required"
> >> > because I thought that this covers the majority of use
> >> > cases.
> >> >
> >> > I don't think there are use cases which require global
> >> > ordering - or at least not enough to justify a requirement
> >> > for global ordering.  What is your opinion?  Also, that's
> >> > really difficult to implement in Grids IMHO.
> >> >
> >> > Use cases which do not require ordering should be happy with
> >> > order preserving connections, too.  Question now is: does
> >> > the benefit of un-ordered implementations (simpler, smaller
> >> > footprint) justify an attribute on API level?  Or are there
> >> > use cases which require non-ordered delivery for other
> >> > reasons?
> >> >
> >> > Cheers, Andre.
> >> >
> >> >
> >> > Quoting [Andrei Hutanu] (Jan 18 2007):
> >> >>
> >> >> Hi,
> >> >>
> >> >> >>
> >> >> >>2) I see ordering is enforced, could that be an option?
> >> >> >
> >> >> >
> >> >> >I think ordering is *not* enforced, but I do wonder if it should be
> >> >> >an option or a channel property (certainly semireliable will likely
> >> >> >result in some reordering whereas a TCP channel would enforce ordering
> >> >> >of the messages for instance).
> >> >> >
> >> >> >This is a controversial topic in the HPC message passing community
> >> >> >(whether msg. ordering is a good or bad-thing to enforce at the
> >> >> >hardware level).
> >> >> >
> >> >> I was thinking the same (no strong feelings for either option or
> >> >> property) but the text tells otherwise :
> >> >> In 2.1 introduction :
> >> >> In contrast, this message API extension guarantees that message blocks
> >> >> of arbitrary size are delivered in order, and intact, without the need
> >> >> for additional application level coordination or synchronization.
> >> >> and
> >> >>
> >> >> then in 2.1.7 reliability correctness and ordering
> >> >> The order of sent messages MUST be preserved by the implementation.
> >> >> Global ordering is, however, not guaranteed to be preserved:
> >> >>
> >> >> Assume three endpoints A, B and C, all connected to each other. If A
> >> >> sends two messages [a1, a2], in this order, it is guaranteed that both B
> >> >> and C receive the messages in this order [a1, a2]. If, however, A sends
> >> >> a message [a1] and then B sends a message [b1], C may receive the
> >> >> messages in either order, [a1, b1] or [b1, a1].
> >> >>
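
The ordering rule in the quoted spec text can be expressed as a
small check.  The sketch below is only an illustration; the
message type and the function name are made up here.  It accepts
a received sequence if each sender's messages arrive in the order
they were sent, and puts no constraint on how different senders
interleave:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// A message is identified by its sender and a per-sender sequence
// number, e.g. a1 is {"A", 1} and b1 is {"B", 1}.
using message = std::pair <std::string, int>;

// True if, for every sender, the messages in `received` appear with
// strictly increasing sequence numbers.  The interleaving between
// different senders is not constrained (no global ordering).
bool per_sender_order_ok (const std::vector <message> & received)
{
  std::map <std::string, int> last_seen;

  for (const message & m : received)
  {
    auto it = last_seen.find (m.first);
    if (it != last_seen.end () && m.second <= it->second)
      return false;  // this sender's messages arrived out of order

    last_seen[m.first] = m.second;
  }
  return true;
}
```

With this check, both [a1, b1] and [b1, a1] are valid sequences at
C, while [a2, a1] is not valid at any receiver.
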
> >> >> Andrei
-- 
"So much time, so little to do..."  -- Garfield


