[SAGA-RG] document updates, phone call cadence, and more
Andre Merzky
andre at merzky.net
Wed Apr 28 04:13:55 CDT 2010
Hi all,
Quoting [Andre Merzky] (Apr 04 2010):
> From: Andre Merzky <andre at merzky.net>
> To: Thilo Kielmann <kielmann at cs.vu.nl>
> Cc: SAGA RG <saga-rg at ogf.org>
> Subject: Re: [SAGA-RG] notes from the OGF28 session on 15/03, 16:00-17:30
>
> attached is another revision of the SAGA Core API Experience
> document, which contains changes as discussed at OGF28. I hope the
> changes reflect the discussion points.
I just wanted to let you know that both the advert API extension and
the Core experience document have been submitted to the OGF editor,
and both docs should be entering public comment sometime soon. That
means that the Core API (including errata) is now definitely frozen,
unless the public comments require additional changes. The
submitted documents can be found in
https://svn.cct.lsu.edu/repos/saga-ogf/trunk/documents/saga-package-advert/tags/v1.0rc1/
https://svn.cct.lsu.edu/repos/saga-ogf/trunk/documents/saga-core-experience/tags/v1.0rc1/
https://svn.cct.lsu.edu/repos/saga-ogf/trunk/documents/saga-core/tags/v1.1rc1/
> So, a couple of additional errata from the Naregi group have been
> applied to the Core API - hopefully the last ones. However, there
> remains one item unresolved:
>
> apparently we never considered adding a flush() method to the
> saga::file instance. As is, our API implies that all writes are
> immediately flushed. While that is certainly valid, the question
> remains if we should consider an explicit flush() method, which
> would, amongst others, allow implementations to perform client side
> caching of write operations. Iff that is considered useful, one
> could further discuss if that should be introduced on namespace
> level, so that other namespace derived packages (replica, advert,
> etc) can also benefit from flush(). FWIW, a close() should always
> imply a flush() IMHO.
>
> So, please voice your opinion!
There was not much feedback on this item, so I added it to the list
of open items for SAGA 2.0. As of now, caching behaviour on write
remains undefined, and the safest assumption (for SAGA implementors)
is to always flush after write, even if that is costly in terms of
performance.
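To make the trade-off concrete, here is a minimal Python sketch of what an explicit flush() with client-side write caching could look like. It is not SAGA API and not taken from any implementation: the class name CachedWriter and the backend object are hypothetical, and an in-memory io.BytesIO stands in for the remote file. It only illustrates the two semantics discussed above: writes may be held back until flush(), and close() always implies a flush().

```python
import io


class CachedWriter:
    """Hypothetical wrapper (not SAGA API): caches writes until flush()."""

    def __init__(self, backend):
        self._backend = backend   # assumption: any object with write(bytes)
        self._pending = []        # client-side write cache
        self._closed = False

    def write(self, data: bytes) -> int:
        if self._closed:
            raise ValueError("write on closed file")
        self._pending.append(data)  # nothing reaches the backend yet
        return len(data)

    def flush(self) -> None:
        # Push all cached writes to the backend in a single operation.
        if self._pending:
            self._backend.write(b"".join(self._pending))
            self._pending.clear()

    def close(self) -> None:
        # close() always implies flush(), as suggested in the mail.
        if not self._closed:
            self.flush()
            self._closed = True


backend = io.BytesIO()            # stands in for the remote file
f = CachedWriter(backend)
f.write(b"hello ")
f.write(b"world")
before_flush = backend.getvalue()  # cached data is not yet visible
f.close()
after_close = backend.getvalue()   # close() flushed the cache
```

An implementation that follows the "always flush after write" assumption would simply call flush() at the end of every write(), which is safe but gives up the batching shown here.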
That raises the question of whether, and when, we should start to
discuss a next version of the core API. FWIW, I append the current
list of open issues to this mail.
We have not had a phone call since OGF28. There are a number of
open TODO items however, and I am not sure that any calls are useful
at this point, beyond reiterating that those items need to be dealt
with :-P
So, I suggest suspending the calls until at least some of these
items are handled:
- CPR package needs to be finalized
- message API examples need to be rendered in different versions,
to come to a conclusion on the general design approach.
- Python bindings need to be shown to be functional on both Java
and C++
- the SAGA rendering of GridRPC.v2 needs to be synced with the
final version of GridRPC.v2
If anybody has other items to discuss, please let me know, and I'll
schedule the calls. Also, the above items are obviously open for
input from all of you, so, please feel free to contribute in any
form.
Finally, the conversion of our CVS repository to SVN is completed.
CCT support did not manage to make the CVS repository read-only, but
please don't commit there anymore. The new SVN URL is, as you
probably guessed from above,
https://svn.cct.lsu.edu/repos/saga-ogf/trunk
That repository should be world-readable. Please let me know if you
would like to have write permissions.
Best, Andre.
Current known open issues for SAGA Core v2.0
--------------------------------------------
- file / stream server / rpc could have state (Unknown, New,
Open, Closed).
- task: get_task_description
just like job desc, would give you information about what
the task does, e.g.
- "method" = "copy"
- "args" = "internet.txt" "internet.bak" (vector attrib (type??))
- "started" = "11:35pm 12/22/2006"
- "finished" = "11:35pm 12/22/2007"
inspection would be useful to get type and return type of
task after getting it from a task_container.
- I/O tasks could have a get_buffer() method, to free
application from keeping/tracking I/O buffers. That would
return a shallow copy of the buffer object which was given
as inout parameter. Method would need to be templatized for
the different buffer classes we have in the spec (or limited
to the buffer base class)
- make state transitions less prone to race conditions. E.g.,
allow suspend() also on jobs in Suspended state, and cancel()
on jobs in a final state (state remains the same). Needs
some thought...
- what error is thrown on incorrectly formatted attributes,
and when?
- wait() to also report on other state changes, like
suspend/resume (see DRMAA-II).
- add inspection: job.list_interfaces ()
- monitorable
- attributable
- steerable?
- checkpointable?
to provide seamless integration of extensions, which then
can define additional interfaces for core classes (see cpr).
- add resource assignment to job description, e.g.:
// name: CPUID
// desc: CPU id to assign the process thread to
// mode: ReadWrite, optional
// type: Int
// value: '1'
// notes: - if supported, the process is guaranteed to
// run on the CPU identified by the id.
// - id starts at 1
// - not supported by JSDL, DRMAA.v1
//
// name: CPUCoreID
// desc: CPU core id to assign the process thread to
// mode: ReadWrite, optional
// type: Int
// value: '1'
// notes: - if supported, the process is guaranteed to
// run on the CPU core identified by the id.
// - id starts at 1
// - not supported by JSDL, DRMAA.v1
This could also go into a resource management package,
obviously, together with 'queue' attribute btw (see mailing
list discussion with Sylvain, and discussion about DRMAA.v2).
- session.list_contexts (string type = "");
returns all contexts of that type. Also works on default
session! If no type is given, all contexts are returned.
- trigger metrics should have a value of 0 or 1, to allow
polling for triggers. So, in fact Trigger metrics should be
Boolean.
- we have mtime for ns entries - there is no reason not to
have ctime, or even atime, even if that is not widely
supported. So what.
Now we have to add ctime to the cpr package... Messy.
- properties which are available via get_xyz() and is_abc()
should generally also be expressed as attributes (see
get_size(), get_mtime(), but also is_file() etc)
- attributes and metrics should be unified. Either a metric
IS-A attribute, or even better, callbacks can be added to
attributes - no metric needed anymore.
- file::dir should inherit file::entry *and* ns::dir. Makes
in particular sense for advert and cpr ns derivatives, which
then don't need to duplicate methods anymore. Language
bindings may not allow/encourage multiple inheritance, but
it would make the spec (IDL) simpler.
- we are not sticking to SIDL syntax anyway, so probably
should remove references to it, and define our own *blush*.
See attributes, metrics, c'tors, multiple inheritance, etc.
- reconsider splitting the core into Look-and-Feel and API packages :-/
- reconsider file.get_fd(), for example for checkpoint
writing/reading, where apps often have their own native IO
routines. But of course, if they get a saga::fs::file, they
can just close() it, and reopen the location natively...
- file.flush is missing :-( Same for replica etc. Not sure
if it makes sense on the ns::entry though.
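As an illustration of the state-transition item in the list above (suspend() on an already suspended job, cancel() on a job in a final state), the following Python sketch shows what such tolerant transitions could look like. It is only a sketch under assumptions: the State and Job classes are hypothetical, the state names are loosely modeled on the job state model, and the job is assumed to start in Running to keep the example short.

```python
from enum import Enum


class State(Enum):
    RUNNING = "Running"
    SUSPENDED = "Suspended"
    DONE = "Done"
    CANCELED = "Canceled"
    FAILED = "Failed"


FINAL_STATES = {State.DONE, State.CANCELED, State.FAILED}


class IncorrectState(Exception):
    """Raised when a transition is not allowed from the current state."""


class Job:
    """Hypothetical job object with race-tolerant transitions."""

    def __init__(self):
        self.state = State.RUNNING  # assumption: job is already running

    def suspend(self):
        if self.state is State.SUSPENDED:
            return  # already suspended: no-op instead of an error
        if self.state is not State.RUNNING:
            raise IncorrectState(f"cannot suspend in {self.state.value}")
        self.state = State.SUSPENDED

    def cancel(self):
        if self.state in FINAL_STATES:
            return  # final state: no-op, state remains the same
        self.state = State.CANCELED


job = Job()
job.suspend()
job.suspend()            # second call does not raise
suspended_state = job.state

finished = Job()
finished.state = State.DONE  # simulate the job finishing on its own
finished.cancel()            # racing cancel() leaves Done untouched
final_state = finished.state
```

The point of the no-op branches is that a caller who loses a race against the backend (the job finished or was suspended a moment earlier) does not have to catch and interpret an exception for an outcome it wanted anyway.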
--
Nothing is ever easy.