[saga-rg] Comments on Strawman API

Thu Feb 2 05:27:44 CST 2006

Andre & Shantenu,

Thanks for the reply. I have added comments to some of the points below.

It is interesting that several of the issues that I raise are specific 
to Java [and other languages]. I think that it is valuable to tease out 
these distinctions. It would be useful to define where the language 
bindings may (or are expected to) differ from the SAGA specification.

At present I am [fairly] faithful to the SAGA specification in my 
preliminary Java bindings. I will continue development informed by your 
comments, and attempt to capture any deviation from the spec in a document.

Graeme

>>   -2.1 What is the purpose of NSEntry? This is not documented.
> 
> Right, that should be better documented.
> The purposes are:
> 
>   - NSEntry should be a class, not an interface.  So
>     Physical and Logical files can both be handled as
>     NSEntries.
> 
>   - There will be methods on NSEntry: get_parent, get_name,
>     is_file etc.  

In the spec NSDir is described as implementing NSEntry. Is that also 
correct?

>>   -3.2 There should be a specified exception model for the elements of 
>> the SAGA API that are not provided by an implementation. This is 
>> described in the introduction of the API. May I suggest the SIDL 
>> specification specify all methods that may throw a 'NotImplemented' 
>> exception ('NotImplemented' is more consistent stylistically than 
>> 'NOT_IMPLEMENTED').
> 
> It is difficult to specify which methods can be left unimplemented, as
> that highly depends on the middleware your implementation builds upon.
> E.g. middleware one might not have logical files, but middleware two
> might have no support for streams.  Or MW-1 can copy files, but not
> obtain their size (ftp), while others can get size of the file, but
> can't write (http in some configurations).
> 
> So, we say in the Intro that every implementable call must be
> implemented.  Now, implementable means a lot, of course - what we mean
> (which possibly should be made more clear), is "every operation
> natively supported by the middleware".  Does that make sense?

I still think that this exception is important. When providing an
implementation the developers will very quickly encounter methods that
cannot be provided. It is necessary to handle this in a sensible 
fashion, particularly when the client is not developing for a specific 
implementation. My thinking on this subject stems from an interest in 
selecting the SAGA implementation at runtime (depending on the resource 
selected by the user).

My naive solution would be to indicate that [almost?] every method may 
throw a 'NotImplemented' exception. When an implementation cannot 
implement a method, that method should throw this exception, and the 
documentation should also state that the method is not supported.

The developer writing client code against a specific implementation 
would use the documentation to determine which methods are supported. 
Generic client code written against the SAGA API would have to catch and 
handle the NotImplemented exception in an appropriate manner.

Perhaps this is another Java-specific problem. I define the Java 
bindings of the API as interfaces that specify all of the methods an 
implementation must provide. Therefore the implemented classes will 
end up providing methods that cannot work, at which point an error 
should be thrown.
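
A minimal sketch of this idea in Java follows. All names here 
(NotImplementedException, SagaFile, FtpLikeFile, GenericClient) are 
hypothetical and are not taken from the SAGA specification. The 
ftp-like class stands in for a middleware that can copy files but 
cannot report their size.

```java
// Hypothetical types for illustration only; none come from the SAGA spec.
class NotImplementedException extends Exception {
    NotImplementedException(String message) {
        super(message);
    }
}

// Every method in the binding declares that it may be unsupported.
interface SagaFile {
    void copy(String target) throws NotImplementedException;
    long getSize() throws NotImplementedException;
}

// Stand-in for an ftp-like middleware: copy works, size is unavailable.
class FtpLikeFile implements SagaFile {
    String lastCopyTarget;

    public void copy(String target) {
        // A real implementation would transfer the file; here we only record the call.
        lastCopyTarget = target;
    }

    public long getSize() throws NotImplementedException {
        throw new NotImplementedException("getSize is not supported by this middleware");
    }
}

// Generic client code written against the interface, not an implementation.
class GenericClient {
    // Returns the file size, or -1 when the implementation cannot provide it.
    static long sizeOrUnknown(SagaFile file) {
        try {
            return file.getSize();
        } catch (NotImplementedException e) {
            return -1L;
        }
    }
}
```

Because the exception is checked in this sketch, generic client code is 
forced by the compiler to decide what to do when a method is missing. 
An unchecked exception would be less intrusive but easier to overlook.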

>>   -3.3 There is no close() method defined for the File interface. The 
>> user should be able to explicitly release a file.
> 
> Right now, a file is closed on object deletion.  We are aware that e.g.
> in Java, where garbage collection makes that a non-determinable point
> in time, a close method might be necessary.  However, that should go
> to the language binding.
> 
> That statement is somewhat hidden in the namespace section: "
> However, bindings for languages with garbage collection MAY add the
> definition of an explicit close method." -- will make it more
> obvious.  Does that make sense?  Do we need close for all languages?

Yup, I simply added close() to SAGA.File.File.

> 
>>   -3.6 It may be appropriate to define attribute specific get/set 
>> methods for known attributes of interfaces extending SAGA.Attribute 
>> (i.e. Context, JobDefinition, JobExitStatus, JobInfo, Stream, 
>> StreamServer). For example JobExitStatus defines three known attributes:
>>      SAGA_ExitCode
>>      SAGA_Signaled
>>      SAGA_Termsig
> 
> Then we would not need attributes anymore :-)  Indeed, setters/getters
> was discussed as alternative to attributes, in particular because of
> their type safety.
> 
> The counter arguments have been, IIRC, that the number of API calls
> would increase dramatically (more often than not, we have been told
> that the number of calls is a measurement of how simple an API is.
> It's a stupid argument I think: a SAGA API with a single call would be
> possible: SAGA_RUN (string operation, string_vector args), but highly
> useless, and certainly NOT simple to use.)
> 
> Also, we felt that initially the list of required attributes might
> change over time, and we wanted to have some flexibility in that
> respect: obvious attributes might make it into the API as
> setter/getter eventually (get_size in File...). 
> 
> 
>> These convenience methods would expose the semantics of the interface and 
>> would provide a typesafe way to handle these pre-defined attributes 
>> (which have a specified type). For example;
>>      void getExitCode(out integer exitcode);
>>      void setExitCode(in integer exitcode);
>> Although perhaps setting values should be protected or handled by the 
>> constructor.
>> 	This approach would also reduce the occurrence of runtime errors 
>> resulting from typos in the attribute names (since these would be 
>> detected at compile time). I had several of these whilst playing with my 
>> toy implementation.
> 
> I think the bottom line is that we are aware of the shortcomings of
> the attribute approach, but think it's the most simple and flexible way
> right now (but not the safest for sure).
> 
> Also, we added introspection to the interface, which should catch
> some of the cases you list (but not all, and it does not make the
> interface simpler, really).

I understand the desire to keep the API simple; however, my preference 
would certainly be to extend the API to keep the client's code simple. 
Casting to and from strings is a pain, and any additional complexity in 
the client's code is an opportunity for [unnecessary] bugs to be introduced.

The get and set methods could exist over the Attribute interface to 
provide some structure in addition to the existing flexibility. 
Constructors allow type-safe setting of attributes, but without an 
equally transparent way to retrieve the value.
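
As a rough sketch of typed accessors layered over a string-based 
attribute store: the Attribute method names below (getAttribute, 
setAttribute) and the map-backed storage are assumptions made for 
illustration, not the SAGA definitions. Only the attribute name 
SAGA_ExitCode comes from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape of a string-keyed, string-valued attribute interface.
interface Attribute {
    String getAttribute(String key);
    void setAttribute(String key, String value);
}

// Hypothetical JobExitStatus with a typed accessor over the generic store.
class JobExitStatus implements Attribute {
    // Defining the key once avoids typos scattered through client code.
    static final String EXIT_CODE = "SAGA_ExitCode";

    private final Map<String, String> values = new HashMap<>();

    // The constructor sets the value in a type-safe way.
    JobExitStatus(int exitCode) {
        values.put(EXIT_CODE, Integer.toString(exitCode));
    }

    public String getAttribute(String key) {
        return values.get(key);
    }

    public void setAttribute(String key, String value) {
        values.put(key, value);
    }

    // Typed getter: the string conversion lives here, not in client code.
    // Throws NumberFormatException if the stored string is not an integer.
    int getExitCode() {
        return Integer.parseInt(values.get(EXIT_CODE));
    }
}
```

The generic attribute calls remain available, so the flexibility of the 
attribute approach is kept while common attributes gain a checked name 
and type.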

> 
>>   -3.9 Could the enumerations be defined as integers in the range 
>> 0-MAXVALUE? This makes it simpler to validate the values, for example 
>> where contextType is defined as:
>>          public final class contextType {
>>              public final static int X509            = 0;
>>              public final static int MyProxy         = 1;
>>              public final static int SSH             = 2;
>>              public final static int Kerberos        = 3;
>>              public final static int UserPass        = 4;
>>              public final static int MAXVALUE        = 5;
>>          }
>>      a supplied integer value could be validated with the code:
>>          if (type>=0 && type<SAGA.contextType.MAXVALUE)
>>      This would require the following enumerations to be altered: 
>> File.openFlags, LogicalFile.openFlags, NameSpace.copyFlags, 
>> NameSpace.linkFlags, NameSpace.makeDirFlags, NameSpace.moveFlags, 
>> NameSpace.removeFlags, Stream.ActivityType.
> 
> Would be nice, BUT: saga::file::MAXVALUE might then refer to either
> the MAXVALUE of OpenMode or ReadMode, with conflicting values.  So at
> least in cpp that's not possible :-(  However, we want to introduce -1
> as Unknown, to simplify initialization.
> 

Ignore my suggestion; if I wanted type-safe enumerations I should have 
used them in the first place.
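
For comparison, a type-safe version of contextType might look like the 
sketch below. It assumes Java 5 or later (for the enum keyword), and the 
fromInt helper is a hypothetical addition for code that still receives 
raw integers. The constant order mirrors the integer values in the 
class quoted above.

```java
// Sketch of a type-safe alternative to the int constants; requires Java 5+.
enum ContextType {
    X509, MY_PROXY, SSH, KERBEROS, USER_PASS;

    // Hypothetical helper: maps a legacy int value (0..4) to a constant.
    static ContextType fromInt(int value) {
        ContextType[] all = values();
        if (value < 0 || value >= all.length) {
            throw new IllegalArgumentException("Unknown context type: " + value);
        }
        return all[value];
    }
}
```

With an enum the range check disappears from client code entirely, 
since a method taking a ContextType parameter cannot be handed an 
out-of-range value (only null remains to be checked).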

> 
> 
>>   -3.12 The Session objects should provide some method to allow them to 
>> be serialised/deserialised. This would allow the user to submit jobs, 
>> quit the application and resume at a later date. At the simplest save() 
>> and load() methods could be defined.
>>
>> [...]
> 
> We have considered serialization, for more than session (e.g. file,
> JobServer etc), but left it out for now: firstly, we have been unsure
> if we would need to define a serialization scheme as well, and
> secondly we did not have enough use cases for that.  However,
> serialization is something which could potentially get added at a
> later point.

It should not be necessary to specify a serialization scheme along with 
save() and load(); that would be an issue for the implementation.

> 
>>   -3.15 It is not clear to me how the ErrorHandler interface corresponds 
>> to the Java exception handling model. Which errors are intended to be 
>> handled via this interface? Any serious errors I will throw immediately, 
>> in my trial interface the error handler became the repository for crud 
>> exceptions that could otherwise be ignored, which is wasted effort. I 
>> suggest that the ErrorHandler interface is redundant in Java because it 
>> is supplanted by the natural way to handle exceptions in this language. 
>> The ErrorHandler may be appropriate in other languages, however in many 
>> cases I would expect there already to be a tried and tested method for 
>> dealing with errors. Rather than reinvent the wheel I would remove this 
>> section of the API and leave this to the language bindings to handle 
>> errors in the manner appropriate for each language. (see 3.20)
> 
> I think that the error handler interface should not necessarily be
> implemented in all languages, in particular not if that language has
> exceptions.
> 
> Also, the ErrorHandler conflicts with the task model I think: on two
> async copies of the same file, where is the error attached?  at the
> file?  In which order?  At the task?
> 

This is a point at which the Java bindings may differ from the SAGA 
specification.

>>   -3.17 The replication of the enumerations openDirFlags and openFlags 
>> between the packages SAGA.Namespace, SAGA.File and SAGA.LogicalFile is a 
>> little confusing. I understand that File and LogicalFile extend\override 
>> these enumerations, is there a neat solution to this problem? - Ignore 
>> this otherwise.
> 
> We do not know a good solution to the various enums, which are
> distributed over various name spaces.  In cpp this is somewhat
> painful, really, I guess it's worse in C.  Any idea?
> 

In an earlier email you indicated that you had removed the enumerations 
from SAGA.NameSpace; this is sufficient to resolve the problem for me.

>>   -3.18 How should permission denied exceptions be modelled by 
>> SAGA.File.Directory? A new exception type is required: 'PermissionDenied'.
> 
> Right, thanks.  A remark: we left permissions mostly out of the spec
> right now, as we don't know which security paradigms should be
> reflected in the API at all (what is a user again?).  We wait for the
> security area to give us input on that...
> 

There are other areas in which authentication and authorisation 
exceptions should be expected.

> 
>>   -3.19 NSDir.list(dir) should return a String array rather than a 
>> String. Also would it not be preferable to have a NSDir.list() method to 
>> return the contents of the directory?
> 
> It does return an array now, thanks.  contents: see above (I think
> what you mean is to return a list of NSEntries?).
> 

No, this is my mistake; the specification was correct.

> 
>>   -3.20 I am very dubious about the value of the SAGA.Task API within 
>> the context of high-level languages. Where asynchronous method 
>> invocation exists, supported either within the language (i.e. delegates 
>> in C#) or documented design patterns (i.e. inner classes in Java), I 
>> suggest that the semantics of the language should take precedence over 
>> the semantics of the SAGA API (this also applies to the Exception handling 
>> in high-level languages). I am not certain whether asynchronous method 
>> invocation should simply be left to the language bindings, or be 
>> overridden in the language bindings when accepted solutions already 
>> exist. (see 3.15)
> 
> Ah, java speaking ;-)  Believe me, in other languages you DO want an
> async API, and DON'T want to code threads all the time...
> 
> I am not sure if we should allow some languages not to implement tasks
> - async notification would then be gone as well, and also
> task_containers!  Opinions?
> 

It may be better for the language bindings to specify a preferred 
solution for languages where an accepted solution already exists.

> 
>>   -3.21 JobService.runJob() returns Job and stdin, stdout, stderr. 
>> However stdin, stdout, stderr are available from Job.getJobInfo()? I 
>> understand that this is a convenience function but this is a prime 
>> example of the multiple return values problem. Can runJob() simply 
>> return an object Job?
> 
> Yes, but the convenience of that convenience call is exactly that it
> simplifies the most common case: running a remote job, and controlling
> its STDIO.  Otherwise, we could use submit_job...
> 
> It's the only obvious call with multiple output parameters I think,
> apart from the reads (no chance of change there I guess).  

This is not a big issue for me. I have further comments on the semantics 
of the read() methods that may resolve this problem.

>>   -3.22 I dislike the capitalisation of namespaces in the API, in 
>> particular where the namespace is the same as a class that it contains, 
>> for example;
>>           SAGA.File.File
>>           SAGA.LogicalFile.LogicalFile
>>           SAGA.Stream.Stream
>>           SAGA.Task.Task
>> This is merely a stylistic point, however it would be clearer if the 
>> namespace elements were changed to lowercase, for example;
>>           SAGA.file.File
> 
> I think that's a language issue - in CPP we have all name spaces
> lowercase, as that seems the natural way to do it: saga::file
> 
> OTOH, we have _not_ implemented the SIDL packages as name spaces in
> cpp.  Hmm, I am not sure if that should be done, really: as you point
> out, this will usually merely increase naming conflicts.
> 
> I am not sure about this point though...

I think that I will alter the Java language bindings to meet the 
stylistic expectations of Java users ;)

>>   -5.3 Enumerations are supplied as a class containing constant field 
>> values as public static ints. This is the simplest solution, however a 
>> typesafe solution could be provided if required.
> 
> We are considering to use ints in cpp as well, for various reasons,
> which also would ignore type safety.  So my biased opinion is that
> this is ok ;-)

Hehe, I would use enumerations if I could have them.

> 
>>   -5.4 The API is defined as a collection of interfaces that may be 
>> implemented to provide access to Grid resources. Interfaces are the most 
>> suitable solution. However the inability to describe constructors in a 
>> Java interface is problematic since constructors are occasionally 
>> defined in SIDL specification. There is no way to make implementing 
>> classes honour a specific constructor signature, instead the 
>> documentation for the interface should suggest constructors that should 
>> be provided by any implementing class.
> 
> Most interfaces are classes now in the spec, with the exception of:
> Attribute, ErrorHandler, NSEntry, and NSDir.  In cpp, we found it
> useful to have NSEntry and NSDir as classes as well, so we actually
> want to propose to change that in the spec.  Seems to be consistent
> with what you are saying, IIUYC.
> 

I ignored the interface/class definitions in the spec and went for an 
interface-only solution.

> 
>>   -5.5 Implementations could be provided for generic classes to prevent 
>> developers from needing to re-implement them, for example; Attribute, 
>> Context. Obviously implementation specific subclasses of these would 
>> have to be defined as 'abstract' classes. Rather I think that an 
>> interface only solution is simpler and more flexible in the long term, 
>> default implementations of generic classes could still be provided.
> 
> See above: the spec talks about classes by now.  Do you think that
> this limits flexibility overly much?  In our opinion, the implementors
> of SAGA have the same flexibility as before, and the users of SAGA
> don't need it, really.  What do you think?

I am not clear about the interface/class distinction in SIDL.

In the Java world interfaces are more *fashionable* than class 
inheritance. There is little to be gained from abstract classes apart 
from tying developers to code defined by the language bindings. It is 
also harder to introduce a bug into a Java interface.

The purpose of the SAGA API is to define an interface. The 
implementation developers are not tied to an implementation of the 
utility classes, and the users should not care.

>>   -5.8 Relative paths are relative to user's home directory, they should 
>> not be relative to the elusive Java current working directory. Therefore 
>> it is necessary to perform a mapping using java.io.File objects (see the 
>> article 'Java Applications and the "Current Directory"' 
>> http://www.devx.com/tips/Tip/13804).
> 
> Well, I am not qualified to say if that is a language issue, or a
> semantic change.  I (as more c and perl programmer) find the absence
> of notion of a working directory disturbing ;-)

This is an implementation issue and not really your problem.
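
For what it is worth, the mapping in 5.8 can be sketched roughly as 
follows. The class and method names are hypothetical. The sketch relies 
only on java.io.File and the standard "user.home" system property, and 
assumes that property is set to the user's home directory.

```java
import java.io.File;

// Hypothetical helper: resolves relative paths against the user's home
// directory instead of the JVM's current working directory.
class HomeRelativePaths {
    static File resolve(String path) {
        File candidate = new File(path);
        if (candidate.isAbsolute()) {
            // Absolute paths are returned unchanged.
            return candidate;
        }
        // Relative paths are anchored at the "user.home" system property.
        return new File(System.getProperty("user.home"), path);
    }
}
```

For example, resolve("data/in.txt") yields a path under the home 
directory regardless of where the JVM was started.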

Your reply to your reply to 2.3 is absolutely correct. Jobs and 
directories should both have a concept of CWD, however what these are 
will be entirely resource dependent.