[saga-rg] context problem

Tue Jul 18 19:21:30 CDT 2006

Hi Andre,

in-lined comments...

>> OK long discussion about the context. Well I just went quickly through it 
>> and here are my comments.
>>
>> 1) When using pure OO languages the context is dependent on how many 
>> references to the object exist, regardless of where it is created. So 
>> you never lose the object if a pointer to it exists somewhere.
>>
>> 2) In general no copies are done in OOP, just the pointer is maintained, 
>> thus negligible time is spent on it, unless the method clearly stipulates 
>> that the object will be cloned (full copy).
>>
>> 3) In OO languages the garbage collector handles the effective 
>> destruction, so no object is flushed when it goes out of scope 
>> (unless no more pointers to it exist).
> 
> Well, that is certainly true for Java, but not, for example,
> for C++.  Also, although the spec is OO, we need to define
> that lifetime explicitly, to allow semantically
> identical mappings in non-OO languages.
> 
> However, I would agree that the behaviour you describe would
> be the one to wish for.

True, non-OO languages are trickier to handle. But can the spec not state 
that this problem should be handled according to the binding used? If you use 
a higher-level language you will have an easy task; when going down to 
more primitive languages like Fortran or C, well, the binding will carry a 
big burden.

In the case of non-OO languages it might be preferable to pass by copy 
rather than by reference. This will have an impact on the runtime, but 
it will spare you a lot of trouble.
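
To make the trade-off concrete, here is a minimal sketch of how the two
passing modes differ for the seek/write case quoted further down. ToyFile,
CopyTask and RefTask are hypothetical stand-ins invented for this
illustration; they are not part of the SAGA API, and the "file" is just an
in-memory string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical in-memory stand-in for a file with a file pointer (not SAGA).
struct ToyFile {
    std::string data;     // file contents
    std::size_t pos = 0;  // current file pointer

    void seek(std::size_t p) { pos = p; }

    // Writes s at the current position, padding with spaces if needed.
    void write(const std::string& s) {
        if (data.size() < pos + s.size()) data.resize(pos + s.size(), ' ');
        data.replace(pos, s.size(), s);
        pos += s.size();
    }
};

// Task that takes a private copy of the file object when it is created.
struct CopyTask {
    ToyFile file;
    std::string payload;
    void run() { file.write(payload); }
};

// Task that keeps a reference to the caller's file object.
struct RefTask {
    ToyFile& file;
    std::string payload;
    void run() { file.write(payload); }
};

// Offset at which the write lands when the task works on a copy.
std::size_t write_offset_by_copy() {
    ToyFile f;
    CopyTask t{f, "hello"};
    f.seek(100);  // only moves the caller's file pointer
    t.run();
    return t.file.pos - t.payload.size();
}

// Offset at which the write lands when the task shares the object.
std::size_t write_offset_by_reference() {
    ToyFile f;
    RefTask t{f, "hello"};
    f.seek(100);  // the task sees this seek
    t.run();
    assert(f.data.size() == 105);  // 100 bytes of padding plus "hello"
    return f.pos - t.payload.size();
}
```

Under these assumptions write_offset_by_copy() gives 0 and
write_offset_by_reference() gives 100: the copy is protected from outside
changes, but it also ignores the seek the caller intended.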

> 
> 
>> 4) When you have a task running and the objects it shares are changed 
>> (state, attributes, etc) it is no problem. The task should properly 
>> handle it. Either it fails (error, exception, etc.) or it continues with 
>> new values (if possible). This is a typical concurrent programming 
>> issue. If you use a language like Java you can also put monitors/mutexes 
>> on critical sections, so the context cannot be changed concurrently.
>>
>> Ultimately this is a language binding problem. The main spec can 
>> reference the problem for some particular cases or state that in 
>> concurrent mode the default behavior is "...". Also it is up to the 
>> programmer to know what he is doing when doing concurrent programming. It is 
>> not up to SAGA to solve all the issues.
> 
> I strongly agree with that: SAGA should not strive to solve
> the concurrent programming problems, but should allow
> adopting existing practices.
> 
> 
>> I did a lot of concurrent programming in the past and the basic rules are: 
>> - All states, variables, data that must be used get a local copy. I do 
>> not copy objects, only primitives. An object just gets a reference copy.
>> - Elements that are critical are put within a monitor, but it must be a 
>> minimal monitor or have a signal mechanism to avoid long locks and 
>> possible deadlocks.
>> - If an internal object state changes and the current thread cannot 
>> handle it anymore, an exception is raised and the thread ends.
>>
>> In the case explained below with the write lines, this can differ 
>> widely between implementations. If you use monitors in the code 
>> you will see something like:
>>
>>  the code
>>
>>    saga::task t1 = f.write ("line 1\n"); t1.run (); t1.wait ();
>>    saga::task t2 = f.write ("line 2\n"); t2.run (); t2.wait ();
>>    saga::task t3 = f.write ("line 3\n"); t3.run (); t3.wait ();
>>
>>  will result in a file such as, for example,
>>
>>    line 3
>>    line 1
>>    line 2
>>
>> Or any combination of these 3 lines. However each line will be written 
>> fully before the monitor is released. This will cost execution time, 
>> but at least the critical section will be coherent. Without monitors it is a 
>> free-for-all. SAGA just has to state whether run() must ensure the 
>> critical execution or whether it is a free-for-all.
> 
> The above example should actually, IMHO, result in an
> ordered file, as the wait() calls are synchronizing the
> tasks at application level.  Only if the waits are omitted
> could the tasks be executed in any order, resulting in a
> mixed file.
> 
> Am I missing something?

No, you didn't. I did a blunt cut & paste and forgot to remove the 
"wait" calls. :(
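
What I meant, without the waits in between, can be sketched roughly as
follows. This is only an illustration under my own assumptions: MonitoredFile
and run_three_writers are hypothetical names, std::thread stands in for
saga::task, and an in-memory string stands in for the file. None of it is
SAGA API.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical shared sink standing in for the file (not the SAGA API).
class MonitoredFile {
public:
    // The lock is held for the whole write, so a line is never split.
    void write(const std::string& line) {
        std::lock_guard<std::mutex> guard(lock_);
        for (char c : line) contents_.push_back(c);
    }

    std::string contents() const {
        std::lock_guard<std::mutex> guard(lock_);
        return contents_;
    }

private:
    mutable std::mutex lock_;
    std::string contents_;
};

// Starts three writers and only joins them at the end (no wait in between).
// Returns the lines found in the file, sorted so the arrival order is ignored.
std::vector<std::string> run_three_writers() {
    MonitoredFile f;
    std::vector<std::thread> tasks;
    for (const char* line : {"line 1\n", "line 2\n", "line 3\n"})
        tasks.emplace_back([&f, line] { f.write(line); });
    for (auto& t : tasks) t.join();

    // Split the contents at newlines.
    std::vector<std::string> lines;
    std::string current;
    for (char c : f.contents()) {
        if (c == '\n') {
            lines.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    assert(current.empty());  // every write ended with a newline
    std::sort(lines.begin(), lines.end());
    return lines;
}
```

The order in which the three lines arrive is not defined, but each line
arrives whole. Remove the lock_guard in write() and you are in the
free-for-all case, where fragments of lines may interleave.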

>>>> Date: Sun, 16 Jul 2006 19:38:54 +0200
>>>> From: Andre Merzky <andre at merzky.net>
>>>> To: Thilo Kielmann <kielmann at cs.vu.nl>
>>>> Cc: Andre Merzky <andre at merzky.net>
>>>> Subject: Re: Fwd (andre at merzky.net): Re: Fwd (andre at merzky.net): Re: 
>>>> [saga-rg] context problem
>>>>
>>>> Quoting [Thilo Kielmann] (Jul 16 2006):
>>>>> Merging 2 mails from Andre:
>>>>>
>>>>>> very good points, and indeed (1) seems cleanest.  However,
>>>>>> it has its own semantic pitfalls:
>>>>>>
>>>>>>  saga::file f (url);
>>>>>>  saga::task t = f.write <saga::task::Task> ("hello world", ...);
>>>>>>
>>>>>>  f.seek (100, saga::file::SeekSet);
>>>>>>
>>>>>>  t.run  ();
>>>>>>  t.wait ();
>>>>>>
>>>>>>
>>>>>> If on task creation the file object gets copied over, the
>>>>>> subsequent seek (sync) and write (async) work on different
>>>>>> object copies.  In particular, these copies will have
>>>>>> different state - seek on one copy will have no effect on
>>>>>> where the write will occur.
>>>>> I cannot see a problem here: With object copying, you will simply have
>>>>> the same file open twice. And given the operations you do, this might
>>>>> even be the right thing...
>>>>> This example is very academic: can you show an example where the sharing
>>>>> of state between tasks is useful, actually?
>>>> The problem here is that I, as a user, would expect the write
>>>> to happen at byte 100, but it will happen at byte 0: the
>>>> seek happens on a different object than the write.
>>>>
>>>> What might be a more obvious example, which goes wrong along
>>>> the same lines:
>>>>
>>>>  f.write ("line 1\n");
>>>>  f.write ("line 2\n");
>>>>  f.write ("line 3\n");
>>>>
>>>> That will result in a file
>>>>
>>>>  line 1
>>>>  line 2
>>>>  line 3
>>>>
>>>> whereas the code
>>>>
>>>>  saga::task t1 = f.write ("line 1\n"); t1.run (); t1.wait ();
>>>>  saga::task t2 = f.write ("line 2\n"); t2.run (); t2.wait ();
>>>>  saga::task t3 = f.write ("line 3\n"); t3.run (); t3.wait ();
>>>>
>>>> will result in a file
>>>>
>>>>  line_3
>>>>
>>>> the last write will start on 0, as the previous write
>>>> operated on a different file pointer.  In general, you
>>>> cannot execute any two tasks on a single object, at least
>>>> not if any state is of concern, such as file pointer, pwd,
>>>> replica name, stream server port, job id, ...
>>>>
>>>> That is a no-go in my opinion, as it is counter-intuitive,
>>>> and breaks a large number of use cases.  And it is inconsistent
>>>> with the synchronous method calls.
>>>>
>>>> Yes, you can wreak havoc with state as well:
>>>>
>>>>  saga::task t1 = f.write ("line 1\n");
>>>>  saga::task t2 = f.write ("line 2\n"); 
>>>>
>>>>  t1.run (); 
>>>>  t2.run (); 
>>>>
>>>>  t1.wait ();
>>>>  t2.wait ();
>>>>
>>>> will likely result in
>>>>
>>>>  linline 2
>>>>  e 1
>>>>
>>>> or such - the user does need to think when doing multiple
>>>> async ops at once.  I don't see a way around that (and don't
>>>> see a need for it either: we want to make the Grid stuff
>>>> easy, but not revolutionize programming styles).  
>>>>  
>>>>
>>>>>> I should have added that I'd prefer 3:
>>>>>>
>>>>>>>> 3. when creating a task, all parameter objects are passed "by reference"
>>>>>>>>   + no enforced copying overhead
>>>>>>>>   - all objects are shared, lots of potential error conditions
>>>>>> The error conditions I could think of are:
>>>>>>
>>>>>>  - change state of object while a task is running, hence
>>>>>>    having the task doing something differently than
>>>>>>    intended
>>>>> Change of state, 
>>>> That is intentional - see above.
>>>>
>>>>
>>>>> like destruction of objects 
>>>> Well, that is what we are discussing :-)  3 would delay destruction
>>>> until it's safe (state is not needed anymore).
>>>>
>>>>
>>>>> or change of objects.
>>>> What do you mean here?
>>>>
>>>>
>>>>> Not to speak of synchronization conditions: supposed you
>>>>> have non-atomic write operations (which is everything that
>>>>> writes more than a single word to memory): do you thus
>>>>> also enforce object locking by doing this?
>>>>> If not, you can have inconsistent object state that can be
>>>>> seen by one task, just because another task is halfway
>>>>> through writing the object...  (all classical problems of
>>>>> shared-memory communication apply)
>>>> See above.  You are right, but I don't see a way around
>>>> that, without causing more harm than good (child and bathtub
>>>> come to my mind for some reason...).
>>>>
>>>> BTW: the bulk optimization we have now assumes that tasks
>>>> which run at the same time are, by their very definition,
>>>> independent from each other, do not depend on any specific
>>>> order of execution, and do not depend on each other with
>>>> respect to object state.  Those are the very points we are
>>>> talking about here - I think it's a very sensible assumption.  I have
>>>> the same behaviour on a unix shell BTW:
>>>>
>>>>  touch file
>>>>  date >> file &
>>>>  date >> file &
>>>>
>>>> I would not be able to make assumptions about the file
>>>> contents... (well, here I could make a safe bet, but you
>>>> know what I mean).
>>>>
>>>>
>>>>>>  - limited control over resource deallocation
>>>>> this is the same thing as above
>>>>>
>>>>> The problem really is that there is no "object lifecycle"
>>>>> defined.  There is no way to define which task or thread
>>>>> might be responsible or even allowed to destroy objects or
>>>>> change objects. Is it???
>>>> Yes, that is what I mean by limited control.
>>>>
>>>> We had a discussion on this list and in Tokyo about the
>>>> semantics of cancel(), which touches the same problem:
>>>> should task.cancel() block until resources are freed?  As we
>>>> might talk about remote resources, and Grids are unreliable,
>>>> we might block forever.  That does not make sense, at least
>>>> not always.
>>>>
>>>> The resolution we came up with is that cancel() is advisory,
>>>> so non-blocking, but can also use a timeout parameter (with
>>>> -1 meaning forever) to block until resources are freed.
>>>>
>>>> Timeouts do not make sense on destructors I believe, but
>>>> 'advisory destruction' does, IMHO.
>>>>
>>>>
>>>>>> The advantages I see:
>>>>>>
>>>>>>  - no copy overhead (but, as you say, that is of no
>>>>>>  concern really)
>>>>> ok, but minor point.
>>>> right.  Let's forget that from now on.
>>>>
>>>>
>>>>>>  - simple, clear defined semantics 
>>>>> no, it is the most dangerous of the three versions
>>>> Well, see above - I think it's the most sensible semantics
>>>> :-)
>>>>
>>>>
>>>>>>    - tasks    keep objects  they operate on alive -
>>>>>>    objects  keep sessions they live in    alive -
>>>>>>    sessions keep contexts they use        alive
>>>>> what is the meaning of "alive" here???  Now that you have
>>>>> ruled out memory management...
>>>> see above: resources get freed if not needed anymore.
>>>>
>>>>>>    - sync and async operations operate on the same
>>>>>>    object instance.
>>>>> Let's forget about "sync" here: it is the task that is
>>>>> running in the current thread, so multiple tasks share
>>>>> object instances.
>>>> Well, it would be nice to have same semantics for sync and
>>>> async, don't you think? :-)
>>>>
>>>>
>>>>>> Either way (1, 2 or 3), we have to have the user of the
>>>>>> API thinking while using it - none of them is
>>>>>> idiot-proof.
>>>>> Well, we should strive to limit the mental load on the
>>>>> programmer as much as possible...
>>>>>
>>>>>> I think (2) is most problematic, if I understand your
>>>>>> 'hand-over' correctly: that would mean you can't use the
>>>>>> object again until the task has finished?
>>>>> No, it means you will never ever again be allowed to use
>>>>> these objects.  (hand over includes the hand over of the
>>>>> responsibility to clean up...)
>>>> Right.  So you can never do an async read, and then a sync
>>>> seek, and then an async read again.  At least not with
>>>> sensible results.
>>>>
>>>> Also, I need to create 100 file instances to do 100 reads?
>>>> Remember that opening a file is a remote op in itself,
>>>> potentially.  Then we don't need the task model anymore.
>>>>
>>>> That is broken IMHO.
>>>>
>>>> Cheers, Andre.
>>>>
>>>>
>>>>> Thilo
>>>> -- "So much time, so little to do..."  -- Garfield
>>> ----- End forwarded message -----

-- 

Best regards,
Pascal Kleijer

----------------------------------------------------------------
   HPC Marketing Promotion Division, NEC Corporation
   1-10, Nisshin-cho, Fuchu, Tokyo, 183-8501, Japan.
   Tel: +81-(0)42/333.6389       Fax: +81-(0)42/333.6382