[saga-rg] context problem
Pascal Kleijer
k-pasukaru at ap.jp.nec.com
Tue Jul 18 19:21:30 CDT 2006
Hi Andre,
in-lined comments...
>> OK, long discussion about the context. I just went quickly through it
>> and here are my comments.
>>
>> 1) When using pure OO languages, the lifetime of the context depends on
>> how many references to the object exist, regardless of where it is
>> created. So you never lose the object as long as a pointer to it exists
>> somewhere.
>>
>> 2) In general, no copies are made in OOP; only the pointer is passed
>> around, so negligible time is spent on it, unless the method clearly
>> stipulates that the object will be cloned (full copy).
>>
>> 3) In OO languages the garbage collector handles the actual
>> destruction, so no object is flushed when it goes out of scope
>> (unless no more pointers to it exist).
>
> Well, that is certainly true for Java, but not, for example,
> for C++. Also, although the spec is OO, we need to define
> that lifetime explicitly, to allow semantically
> identical mappings in non-OO languages.
>
> However, I would agree that the behaviour you describe would
> be the one to wish for.
True, non-OO languages are trickier to handle. But can the spec not state
that this problem is to be handled by the language binding? If you use
a high-level language the task will be easy; when going down to more
primitive languages like Fortran or C, the binding will carry a big
burden.
In the case of non-OO languages it might be preferable to pass by copy
rather than by reference. This will have an impact on the runtime, but
will save you a lot of trouble.
>
>
>> 4) When you have a task running and the objects it shares are changed
>> (state, attributes, etc) it is no problem. The task should properly
>> handle it. Either it fails (error, exception, etc.) or it continues with
>> new values (if possible). This is a typical concurrent programming
>> issue. If you use a language like Java you can also put monitors/mutexes
>> on critical sections, so the context cannot be changed concurrently.
>>
>> Ultimately this is a language binding problem. The main spec can
>> mention the problem for some particular cases, or state that in
>> concurrent mode the default behavior is "...". Also, it is up to the
>> programmer to know what he is doing when doing concurrent programming.
>> It is not up to SAGA to solve all the issues.
>
> I strongly agree with that: SAGA should not strive to solve
> the concurrent programming problems, but should allow
> existing practices to be adopted.
>
>
>> I did a lot of concurrent programming in the past and the basic rules are:
>> - All states, variables and data that must be used get a local copy. I do
>> not copy objects, only primitives. An object just gets a reference copy.
>> - Critical elements are put within a monitor, but it must be a
>> minimal monitor or have a signal mechanism to avoid long locks and
>> possibly deadlocks.
>> - If an internal object state changes and the current thread cannot
>> handle it anymore, an exception is raised and the thread ends.
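The three rules above can be sketched in C++ as follows. This is a minimal sketch, not SAGA code: `SharedFile` and `append_line` are hypothetical names, and a `std::mutex` plays the role of the Java monitor.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical shared object standing in for a SAGA object (not the SAGA API).
struct SharedFile {
    std::mutex m;          // monitor guarding the fields below
    bool open = true;      // state that another thread may change
    std::string contents;  // shared data
};

// Hypothetical worker body following the three rules:
//  1. primitive data (line, repeat) arrives by value, i.e. as a local copy,
//     while the object itself is shared only through a reference (shared_ptr);
//  2. the critical section is minimal: the buffer is built outside the lock;
//  3. if the shared state no longer allows the operation, throw.
void append_line(std::shared_ptr<SharedFile> file, std::string line, int repeat) {
    std::string buffer;
    for (int i = 0; i < repeat; ++i) buffer += line + "\n";

    std::lock_guard<std::mutex> lock(file->m);  // minimal monitor
    if (!file->open)
        throw std::runtime_error("file was closed by another thread");
    file->contents += buffer;
}
```

A thread running `append_line` would catch the `std::runtime_error` in its body and end cleanly, rather than writing into an object whose state it can no longer handle.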
>>
>> In the case explained below with the write lines, the outcome can differ
>> widely between implementations. If you use monitors in the code, then
>> the code
>>
>> saga::task t1 = f.write ("line 1\n"); t1.run (); t1.wait ();
>> saga::task t2 = f.write ("line 2\n"); t2.run (); t2.wait ();
>> saga::task t3 = f.write ("line 3\n"); t3.run (); t3.wait ();
>>
>> will result, for example, in a file
>>
>> line 3
>> line 1
>> line 2
>>
>> Or any combination of these 3 lines. However, each line will be written
>> fully before the monitor is released. This will cost execution time,
>> but at least the critical section will be coherent. If not, it is a
>> free-for-all. SAGA just needs to state whether run() must ensure the
>> critical execution or whether it is a free-for-all.
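For the case where the wait() calls are omitted, so the writes really run concurrently, the monitor behaviour described above might look like this. Assumptions: `LockedFile` and `write_concurrently` are hypothetical, plain `std::thread`s stand in for SAGA tasks, and characters are appended one at a time to mimic a non-atomic write.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical file object shared by reference between concurrent "tasks".
class LockedFile {
public:
    // Appends one character at a time (a deliberately non-atomic write), but
    // holds the monitor for the whole line, so lines never interleave.
    void write(const std::string& line) {
        std::lock_guard<std::mutex> lock(m_);
        for (char c : line) data_.push_back(c);
    }
    std::string contents() {
        std::lock_guard<std::mutex> lock(m_);
        return data_;
    }
private:
    std::mutex m_;
    std::string data_;
};

// Starts one thread per line without waiting in between (no wait() after
// each run()), then joins them all; the resulting line order is unspecified.
std::string write_concurrently(const std::vector<std::string>& lines) {
    LockedFile f;
    std::vector<std::thread> tasks;
    for (const auto& l : lines)
        tasks.emplace_back([&f, l] { f.write(l); });
    for (auto& t : tasks) t.join();
    return f.contents();
}
```

Calling `write_concurrently({"line 1\n", "line 2\n", "line 3\n"})` yields the three lines in some order, each one intact; dropping the lock_guard in `write` would allow the "free-for-all" instead.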
>
> The above example should actually, IMHO, result in an
> ordered file, as the wait() calls are synchronizing the
> tasks on application level. Only if the waits are omitted
> the tasks could be executed in any order, resulting in a
> mixed file.
>
> Do I miss something?
No, you didn't. I did a blunt cut & paste and forgot to remove the
wait() calls. :(
>>>> Date: Sun, 16 Jul 2006 19:38:54 +0200
>>>> From: Andre Merzky <andre at merzky.net>
>>>> To: Thilo Kielmann <kielmann at cs.vu.nl>
>>>> Cc: Andre Merzky <andre at merzky.net>
>>>> Subject: Re: Fwd (andre at merzky.net): Re: Fwd (andre at merzky.net): Re:
>>>> [saga-rg] context problem
>>>>
>>>> Quoting [Thilo Kielmann] (Jul 16 2006):
>>>>> Merging 2 mails from Andre:
>>>>>
>>>>>> very good points, and indeed (1) seems cleanest. However,
>>>>>> it has its own semantic pitfalls:
>>>>>>
>>>>>> saga::file f (url);
>>>>>> saga::task t = f.write <saga::task::Task> ("hello world", ...);
>>>>>>
>>>>>> f.seek (100, saga::file::SeekSet);
>>>>>>
>>>>>> t.run ();
>>>>>> t.wait ();
>>>>>>
>>>>>>
>>>>>> If on task creation the file object gets copied over, the
>>>>>> subsequent seek (sync) and write (async) work on different
>>>>>> object copies. In particular, these copies will have
>>>>>> different state - seek on one copy will have no effect on
>>>>>> where the write will occur.
>>>>> I cannot see a problem here: with object copying, you will simply have
>>>>> the same file open twice. And given the operations you do, this might
>>>>> even be the right thing...
>>>>> This example is very academic: can you show an example where the
>>>>> sharing of state between tasks is useful, actually?
>>>> The problem here is that I as a user would expect the write
>>>> to happen at byte 100, but it will happen at byte 0: the
>>>> seek happens on a different object than the write.
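The difference between option (1), copying objects into the task at creation, and option (3), passing them by reference, can be modelled with a toy file object. This is a sketch under stated assumptions: `FileObject` and `write_position` are hypothetical, the remote file is a shared `std::string`, and the file pointer is per-object state.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical model of a file object (not the SAGA API): storage shared by
// all copies, plus a file pointer that belongs to each object instance.
struct FileObject {
    std::shared_ptr<std::string> storage;
    std::size_t offset = 0;

    void seek(std::size_t pos) { offset = pos; }
    void write(const std::string& s) {
        if (storage->size() < offset + s.size())
            storage->resize(offset + s.size());   // pads a "hole" with '\0'
        storage->replace(offset, s.size(), s);
        offset += s.size();
    }
};

// Returns the offset at which a deferred write lands when the task was created
// before a synchronous seek on the caller's object.
std::size_t write_position(bool by_copy) {
    FileObject f{std::make_shared<std::string>()};
    FileObject copy = f;                      // option (1): task captures a copy
    FileObject& target = by_copy ? copy : f;  // option (3): task refers to f
    f.seek(100);                              // sync seek on the caller's object
    std::size_t where = target.offset;        // where the task will write
    target.write("hello world");              // the task finally runs
    return where;
}
```

With copying, the write lands at byte 0 because the seek moved a different file pointer; with reference semantics it lands at byte 100, as the user expects.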
>>>>
>>>> What might be a more obvious example, which goes wrong along
>>>> the same lines:
>>>>
>>>> f.write ("line 1\n");
>>>> f.write ("line 2\n");
>>>> f.write ("line 3\n");
>>>>
>>>> That will result in a file
>>>>
>>>> line 1
>>>> line 2
>>>> line 3
>>>>
>>>> whereas the code
>>>>
>>>> saga::task t1 = f.write ("line 1\n"); t1.run (); t1.wait ();
>>>> saga::task t2 = f.write ("line 2\n"); t2.run (); t2.wait ();
>>>> saga::task t3 = f.write ("line 3\n"); t3.run (); t3.wait ();
>>>>
>>>> will result in a file
>>>>
>>>> line 3
>>>>
>>>> the last write will start at 0, as the previous write
>>>> operated on a different file pointer. In general, you
>>>> cannot execute any two tasks on a single object, at least
>>>> not if any state is of concern, such as file pointer, pwd,
>>>> replica name, stream server port, job id, ...
>>>>
>>>> That is a no-go in my opinion, as it is counter-intuitive,
>>>> and breaks a large number of use cases. And it is inconsistent
>>>> with the synchronous method calls.
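The three sequential run()/wait() pairs can be replayed under both semantics with a similar toy model. Assumptions: `File`, `make_write_task` and `run_three_tasks` are hypothetical, and tasks are modelled as `std::function` objects rather than `saga::task`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <memory>
#include <string>

// Hypothetical model (not SAGA): shared file contents plus a per-object pointer.
struct File {
    std::shared_ptr<std::string> data = std::make_shared<std::string>();
    std::size_t pos = 0;
    void write(const std::string& s) {
        if (data->size() < pos + s.size()) data->resize(pos + s.size());
        data->replace(pos, s.size(), s);
        pos += s.size();
    }
};

// A "task" is a deferred call. With copy semantics it owns a snapshot of the
// object (including its file pointer); with reference semantics it uses f.
std::function<void()> make_write_task(File& f, const std::string& s, bool by_copy) {
    if (by_copy)
        return [snapshot = f, s]() mutable { snapshot.write(s); };
    return [&f, s]() { f.write(s); };
}

std::string run_three_tasks(bool by_copy) {
    File f;
    for (const char* line : {"line 1\n", "line 2\n", "line 3\n"}) {
        auto task = make_write_task(f, line, by_copy);
        task();  // run() followed by wait(): done before the next task is created
    }
    return *f.data;
}
```

`run_three_tasks(true)` leaves only `"line 3\n"` in the file, since every copy starts at offset 0, while `run_three_tasks(false)` gives the same ordered result as the synchronous calls.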
>>>>
>>>> Yes, you can wreak havoc with state as well:
>>>>
>>>> saga::task t1 = f.write ("line 1\n");
>>>> saga::task t2 = f.write ("line 2\n");
>>>>
>>>> t1.run ();
>>>> t2.run ();
>>>>
>>>> t1.wait ();
>>>> t2.wait ();
>>>>
>>>> will likely result in
>>>>
>>>> linline 2
>>>> e 1
>>>>
>>>> or such - the user does need to think when doing multiple
>>>> async ops at once. I don't see a way around that (and don't
>>>> see a need for it either: we want to make the Grid stuff
>>>> easy, but not revolutionize programming styles).
>>>>
>>>>
>>>>>> I should have added that I'd prefer 3:
>>>>>>
>>>>>>>> 3. when creating a task, all parameter objects are passed "by
>>>>>>>> reference"
>>>>>>>> + no enforced copying overhead
>>>>>>>> - all objects are shared, lots of potential error conditions
>>>>>> The error conditions I could think of are:
>>>>>>
>>>>>> - change state of object while a task is running, hence
>>>>>> having the task doing something differently than
>>>>>> intended
>>>>> Change of state,
>>>> That is intentional - see above.
>>>>
>>>>
>>>>> like destruction of objects
>>>> Well, that is what we are discussing :-) 3 would delay destruction
>>>> until it is safe (the state is no longer needed).
>>>>
>>>>
>>>>> or change of objects.
>>>> What do you mean here?
>>>>
>>>>
>>>>> Not to speak of synchronization conditions: supposed you
>>>>> have non-atomic write operations (which is everything that
>>>>> writes more than a single word to memory): do you thus
>>>>> also enforce object locking by doing this?
>>>>> If not, you can have inconsistent object state that can be
>>>>> seen by one task, just because another task is halfway
>>>>> through writing the object... (all classical problems of
>>>>> shared-memory communication apply)
>>>> See above. You are right, but I don't see a way around
>>>> that, without causing more harm than good (child and bathtub
>>>> come to my mind for some reason...).
>>>>
>>>> BTW: the bulk optimization we have now assumes that tasks
>>>> which run at the same time are, by their very definition,
>>>> independent from each other, do not depend on any specific
>>>> order of execution, and do not depend on each other with
>>>> respect to object state. Those are the very points we are talking
>>>> about here - I think it's a very sensible assumption. I have
>>>> the same behaviour on a unix shell BTW:
>>>>
>>>> touch file
>>>> date >> file &
>>>> date >> file &
>>>>
>>>> I would not be able to make assumptions about the file
>>>> contents... (well, here I could make a safe bet, but you
>>>> know what I mean).
>>>>
>>>>
>>>>>> - limited control over resource deallocation
>>>>> this is the same thing as above
>>>>>
>>>>> The problem really is that there is no "object lifecycle"
>>>>> defined. There is no way to define which task or thread
>>>>> might be responsible or even allowed to destroy objects or
>>>>> change objects. Or is there???
>>>> Yes, that is what I mean with limited control.
>>>>
>>>> We had a discussion on this list and in Tokyo about the
>>>> semantics of cancel(), which touches the same problem:
>>>> should task.cancel() block until resources are freed? As we
>>>> might talk about remote resources, and Grids are unreliable,
>>>> we might block forever. That does not make sense, at least
>>>> not always.
>>>>
>>>> The resolution we came up with is that cancel() is advisory,
>>>> so non-blocking, but can also use a timeout parameter (with
>>>> -1 meaning forever) to block until resources are freed.
>>>>
>>>> Timeouts do not make sense on destructors I believe, but
>>>> 'advisory destruction' does, IMHO.
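The advisory cancel() resolution (non-blocking by default, with an optional timeout where -1 means forever) might be sketched like this. `CancellableTask`, `cancel` and `resources_released` are hypothetical names chosen for illustration; they are not taken from the SAGA spec.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical task state illustrating an advisory cancel(timeout).
class CancellableTask {
public:
    // Records the cancel request. With timeout 0 it returns at once; with a
    // positive timeout it waits up to that many seconds, and with -1 forever,
    // for the resources to be released. Returns true if they were released.
    bool cancel(double timeout_seconds) {
        std::unique_lock<std::mutex> lock(m_);
        cancel_requested_ = true;
        auto freed = [this] { return resources_freed_; };
        if (timeout_seconds < 0) {        // -1: block until freed
            cv_.wait(lock, freed);
            return true;
        }
        return cv_.wait_for(lock, std::chrono::duration<double>(timeout_seconds), freed);
    }

    // Called by whatever actually tears down the (possibly remote) resources.
    void resources_released() {
        {
            std::lock_guard<std::mutex> lock(m_);
            resources_freed_ = true;
        }
        cv_.notify_all();
    }

    bool cancel_requested() {
        std::lock_guard<std::mutex> lock(m_);
        return cancel_requested_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool cancel_requested_ = false;
    bool resources_freed_ = false;
};
```

`cancel(0)` only registers the request, which matches the advisory semantics; callers that must know the resources are gone pay for it explicitly with a positive or infinite timeout.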
>>>>
>>>>
>>>>>> The advantages I see:
>>>>>>
>>>>>> - no copy overhead (but, as you say, that is of no
>>>>>> concern really)
>>>>> ok, but minor point.
>>>> Right. Let's forget that from now on.
>>>>
>>>>
>>>>>> - simple, clear defined semantics
>>>>> no, it is the most dangerous of the three versions
>>>> Well, see above - I think it's the most sensible semantics
>>>> :-)
>>>>
>>>>
>>>>>> - tasks keep objects they operate on alive -
>>>>>> objects keep sessions they live in alive -
>>>>>> sessions keep contexts they use alive
>>>>> what is the meaning of "alive" here??? Now that you have
>>>>> ruled out memory management...
>>>> see above: resources get freed if not needed anymore.
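The lifetime rule of option (3) (tasks keep objects alive, objects keep sessions alive, sessions keep contexts alive) maps naturally onto reference counting. In this sketch `Context`, `Session`, `File`, `Task` and `make_task` are hypothetical stand-ins for the SAGA classes, `std::shared_ptr` provides the counting, and the "x509" context type is purely illustrative.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical types, not the SAGA API.
struct Context { std::string type; };

struct Session {
    std::vector<std::shared_ptr<Context>> contexts;  // session keeps its contexts alive
};

struct File {
    std::shared_ptr<Session> session;                // object keeps its session alive
};

struct Task {
    std::shared_ptr<File> target;                    // task keeps its object alive
};

// Builds the chain and returns a task that is the only remaining owner of it.
// probe observes the context without owning it.
Task make_task(std::weak_ptr<Context>& probe) {
    auto ctx = std::make_shared<Context>(Context{"x509"});
    probe = ctx;
    auto s = std::make_shared<Session>();
    s->contexts.push_back(ctx);
    auto f = std::make_shared<File>(File{s});
    return Task{f};  // ctx, s and f go out of scope here; the chain survives
}
```

A `std::weak_ptr` to the context stays valid for as long as the task exists and expires once the task is destroyed, i.e. resources are freed when they are no longer needed. Ownership cycles (e.g. a context pointing back to its session) would defeat this scheme and would need `weak_ptr` back-links.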
>>>>
>>>>>> - sync and async operations operate on the same
>>>>>> object instance.
>>>>> Let's forget about "sync" here: it is the task that is
>>>>> running in the current thread, so multiple tasks share
>>>>> object instances.
>>>> Well, it would be nice to have the same semantics for sync and
>>>> async, don't you think? :-)
>>>>
>>>>
>>>>>> Either way (1, 2 or 3), we have to have the user of the
>>>>>> API thinking while using it - none of them is
>>>>>> idiot-proof.
>>>>> Well, we should strive to limit the mental load on the
>>>>> programmer as much as possible...
>>>>>
>>>>>> I think (2) is most problematic, if I understand your
>>>>>> 'hand-over' correctly: that would mean you can't use the
>>>>>> object again until the task was finished?
>>>>> No, it means you will never ever again be allowed to use
>>>>> these objects. (hand over includes the hand over of the
>>>>> responsibility to clean up...)
>>>> Right. So you can never do an async read, then a sync
>>>> seek, and then an async read again. At least not with
>>>> sensible results.
>>>>
>>>> Also, I need to create 100 file instances to do 100 reads?
>>>> Remember that opening a file is a remote op in itself,
>>>> potentially. Then we don't need the task model anymore.
>>>>
>>>> That is broken IMHO.
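Thilo's hand-over option (2) corresponds closely to C++ move semantics with `std::unique_ptr`: the task takes ownership, including the duty to destroy the object, and the caller's handle becomes empty. `File` and `HandOverTask` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical file object (not the SAGA API).
struct File {
    std::string data;
    void write(const std::string& s) { data += s; }
};

// Option (2), "hand-over": the task owns the object from now on and
// destroys it together with itself; the caller keeps no usable handle.
class HandOverTask {
public:
    HandOverTask(std::unique_ptr<File> f, std::string s)
        : file_(std::move(f)), text_(std::move(s)) {}
    void run() { file_->write(text_); }
    const File& file() const { return *file_; }
private:
    std::unique_ptr<File> file_;
    std::string text_;
};
```

After `HandOverTask t(std::move(f), "hello");` the caller's `f` is null, so no sync seek can be issued between two async reads on that object; every task needs its own `File`, which is exactly the cost objected to above.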
>>>>
>>>> Cheers, Andre.
>>>>
>>>>
>>>>> Thilo
>>>> -- "So much time, so little to do..." -- Garfield
>>> ----- End forwarded message -----
--
Best regards,
Pascal Kleijer
----------------------------------------------------------------
HPC Marketing Promotion Division, NEC Corporation
1-10, Nisshin-cho, Fuchu, Tokyo, 183-8501, Japan.
Tel: +81-(0)42/333.6389 Fax: +81-(0)42/333.6382