[saga-rg] Fwd (hupfeld at zib.de): Re: SAGA Strawman API Version 1.0
Jon MacLaren
maclaren at cct.lsu.edu
Wed Jun 15 08:54:19 CDT 2005
>> <snip>
>> If SAGA chooses to give single-system-like guarantees, this must be
>> explicitly stated. All interfaces that deal with data are unusable
>> without a specification of consistency guarantees.
I don't believe that it is possible, or even desirable, to try to
make distributed systems look like they are not distributed. For
example, I don't think you should provide POSIX behaviour on a
distributed filesystem. If you look at AFS, it doesn't fit the POSIX
model. Most people write code that ignores what the filesystem might
be, and assume POSIX. How many people check the failure status on a
file close? With AFS, you can get "Host not found" when you do a
file close. You can wait, and try again. If you quit, your changes
are lost. (As a library writer, you can try to "squash" the errors
by putting a clever layer of code between the app and the filesystem
that knows tricks like this. The Condor people do this, I seem to
recall.)
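As a rough sketch of what such a layer might look like (an illustration only, not what Condor or AFS actually do): the hypothetical helper below treats a failed close as a failed write, waits, and then redoes the whole open/write/close cycle. The function name, the `attempts` and `delay` parameters, and the `opener` hook are all assumptions made for this example. Redoing the whole write, rather than calling close again on the same handle, is a deliberate choice, since a handle may no longer be usable after a failed close.

```python
import time


def write_with_checked_close(path, data, attempts=3, delay=1.0, opener=open):
    """Write bytes to path, treating an error on close as a failed write.

    Hypothetical helper: returns the number of attempts that were needed,
    or re-raises the last OSError once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            f = opener(path, "wb")
            try:
                f.write(data)
            finally:
                # On a distributed filesystem, buffered data may only be
                # sent to the server here, so close can fail too.
                f.close()
            return attempt
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # wait, then redo the whole write
    raise last_error
```

A caller would use `write_with_checked_close("results.dat", payload)` and still has to handle the final `OSError`; the layer reduces how often the application sees the failure, it does not make the failure impossible.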
The point here isn't that developers should never assume a POSIX
filesystem, it is that they should know what kind of filesystem they
are dealing with, so that they can write appropriate code. When you
go distributed, there is a whole new set of error conditions that
can occur. I don't think that there is anything to be gained from
pretending that remote objects are the same as local objects, so that
people's code can stay the same. If the code doesn't know it's
dealing with something that is remote, rather than local, then at
best (i.e. if there is lots of error checking) it will fail far more
often. More likely, though, it simply won't be robust.
It might be worth looking at the following paper, which says
eloquently what I'm trying to get at.
"A note on distributed computing" by Jim Waldo et al., available
from: http://research.sun.com/techrep/1994/abstract-29.html
Cheers,
Jon.