[saga-rg] Fwd (hupfeld at zib.de): Re: SAGA Strawman API Version 1.0

Wed Jun 15 08:54:19 CDT 2005

>> <snip>
>> If SAGA chooses to give single-system-like guarantees, this must be
>> explicitly stated. All interfaces that deal with data are unusable
>> without a specification of consistency guarantees.

I don't believe that it is possible, or even desirable, to try to  
make distributed systems look like they are not distributed.  For  
example, I don't think you should provide POSIX behaviour on a  
distributed filesystem.  If you look at AFS, it doesn't fit the POSIX  
model.  Most people write code that ignores what the filesystem might  
be, and assume POSIX.  How many people check the failure status on a  
file close?  With AFS, you can get "Host not found" when you do a  
file close.  You can wait, and try again.  If you quit, your changes  
are lost.  (As a library writer, you can try and "squash" the errors  
by putting a clever layer of code between the app and the filesystem  
that knows tricks like this.  The Condor people do this, I seem to  
recall.)
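
To make the "wait, and try again" idea concrete, here is a minimal sketch in Python of the kind of layer a library writer might put between the app and the filesystem. Everything in it is an assumption made for illustration: `commit_with_retry`, its parameters, and the `FlakyStore` stand-in are hypothetical names, not part of SAGA, AFS, or Condor. The only point is that the failure status is checked, retried a bounded number of times, and reported to the caller if the retries run out, rather than being silently dropped.

```python
import time


class FlakyStore:
    """Hypothetical stand-in for a remote filesystem.

    The first `failures` calls to commit() raise OSError, mimicking a
    transient "Host not found" at close time; later calls succeed.
    """

    def __init__(self, failures):
        self.failures = failures
        self.committed = False

    def commit(self):
        if self.failures > 0:
            self.failures -= 1
            raise OSError("Host not found")
        self.committed = True


def commit_with_retry(commit, attempts=3, delay=1.0, sleep=time.sleep):
    """Call commit() until it succeeds or `attempts` tries are used up.

    `commit` is assumed to be a zero-argument callable that pushes pending
    changes to storage and raises OSError on failure. Returns the number
    of tries it took; re-raises the last OSError if every try failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            commit()
            return attempt
        except OSError as err:
            last_error = err
            if attempt < attempts:
                sleep(delay)  # wait, then try again
    # Out of retries: tell the caller instead of losing the changes quietly.
    raise last_error


if __name__ == "__main__":
    store = FlakyStore(failures=2)
    tries = commit_with_retry(store.commit, attempts=3, delay=0.1)
    print("committed after", tries, "tries")
```

Note that the sketch only works because the caller knows the operation can fail transiently and is safe to repeat, which is exactly the knowledge that code assuming a local POSIX filesystem does not have.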

The point here isn't that developers should never assume a POSIX  
filesystem, it is that they should know what kind of filesystem they  
are dealing with, so that they can write appropriate code.  When you  
go distributed, there is a whole new set of error conditions that  
can occur.  I don't think that there is anything to be gained from  
pretending that remote objects are the same as local objects, so that  
people's code can stay the same.  If the code doesn't know it's  
dealing with something that is remote, rather than local, then at  
best (i.e. if there is lots of error checking) it will fail far more  
often.  More likely, though, it simply won't be robust.

It might be worth looking at the following paper, which says  
eloquently what I'm grasping for.

"A note on distributed computing" by Jim Waldo et al., available  
from: http://research.sun.com/techrep/1994/abstract-29.html

Cheers,

Jon.