[DRMAA-WG] normal exit status causes drmaa_wifaborted

Tim Harsch harsch1 at llnl.gov
Wed Mar 28 20:22:01 CDT 2007


Daniel,
    How does the Java binding bind to the C binding? (For example, the
Perl binding uses SWIG...)

I'm diving into the Perl binding now, but it's been about 4 years since I
wrote it, so it's gonna take me some time, I think.

PS It's really odd: the problem showed up in code I've had in regular use
for a long time. I know bugs don't just introduce themselves, but this part
of the binding worked fine, and I haven't upgraded SGE or Perl or recompiled
Schedule::DRMAAc, yet the problem just appeared. I'm thinking the sysadmins
ran up2date on my RH4 box and a library the C binding depends on changed.
But if your Java binding is actively using the C binding, that would rule
that out...


----- Original Message ----- 
From: "Daniel Templeton" <Dan.Templeton at Sun.COM>
To: "Tim Harsch" <harsch1 at llnl.gov>
Cc: "DRMAA-WG" <drmaa-wg at gridforum.org>
Sent: Wednesday, March 28, 2007 10:48 AM
Subject: Re: [DRMAA-WG] normal exit status causes drmaa_wifaborted


> Tim,
>
> Looks like something localized to the Perl binding or your
> configuration.  I did the same test on the Java language binding, which
> is also based on the C binding, and it worked fine for me.  Output
> below, program attached.
>
> Could the problem be that you're sending the full command line as the
> remote command and "1" as the args, instead of "csh" as the remote
> command and "-c", "'exit 1'" as the args?  What is the meaning of
> setting the args to "1"?
>
> ---
>
> % java -cp /sge/lib/drmaa.jar:. -d64 Test
> Exited: true
> Aborted: false
> Signaled: false
>
> ---
>
> Daniel
>
> Tim Harsch wrote:
>> I don't understand why a simple non-zero exit status is causing
>> drmaa_wifaborted to be set.
>>
>> The easiest way for me to demo this is to change line 38 of
>> t/08_posix_tests.t of the Schedule::DRMAAc CPAN module to be
>> my $remote_cmd = "csh -c 'exit 1'";
>>
>> And then running "make test TEST_VERBOSE=1", which would produce:
>> <SNIP>
>> ok 12 - drmaa_wait says jobid did not change?
>> #     Failed test (t/08_posix_tests.t at line 83)
>> not ok 13 - drmaa_wait should say there is more info available in
>> POSIX funcs
>> ok 15 - drmaa_wifaborted error?
>> #     Failed test (t/08_posix_tests.t at line 90)
>> not ok 16 - normal job should not abort.
>> ok 17 - drmaa_wifexited returned 3 of 3 args
>> ok 18 - drmaa_wifexited error?
>> #     Failed test (t/08_posix_tests.t at line 97)
>> not ok 19 - normal job should exit.
>> <SNIP>
>>
>> I've attached test 8 to this email, in case you want to see how the
>> calls are made in Perl.
>>
>> Any ideas?
>>
>> Thanks,
>> Tim Harsch
>> ------------------------------------------------------------------------
>>
>> --
>>   drmaa-wg mailing list
>>   drmaa-wg at ogf.org
>>   http://www.ogf.org/mailman/listinfo/drmaa-wg
>>
>
>
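For comparison, this is how a raw POSIX wait status is conventionally decoded. It is a minimal sketch assuming the traditional Linux/glibc bit layout (exit code in bits 8 to 15, terminating signal in bits 0 to 6). The `stat` value returned by `drmaa_wait()` is implementation-specific and should only be inspected through `drmaa_wifexited()`, `drmaa_wexitstatus()` and the related calls, so treat this as an analogy for checking expectations, not as how SGE encodes it. The `WaitStatus` class and its method names are hypothetical. For a job that runs `exit 1`, the expected answer is: exited, exit status 1, not signaled. There is no "aborted" bit in a POSIX status at all; in DRMAA 1.0, `drmaa_wifaborted` is meant to report a job that ended before entering the running state, so a job that actually ran and returned 1 should not set it.

```java
public class WaitStatus {
    // Hypothetical helpers mirroring the glibc WIFEXITED/WEXITSTATUS/
    // WIFSIGNALED/WTERMSIG macros. Assumes the traditional Linux layout;
    // other platforms (and DRMAA's opaque stat) may differ.
    static boolean ifExited(int status) {
        return (status & 0x7f) == 0;          // no signal bits set
    }

    static int exitStatus(int status) {
        return (status >> 8) & 0xff;          // exit code lives in bits 8-15
    }

    static boolean ifSignaled(int status) {
        int sig = status & 0x7f;
        return sig != 0 && sig != 0x7f;       // 0x7f marks a stopped child
    }

    static int termSig(int status) {
        return status & 0x7f;
    }

    public static void main(String[] args) {
        int status = 1 << 8;                  // raw status produced by "exit 1" (256)
        // prints: exited=true exitStatus=1 signaled=false
        System.out.println("exited=" + ifExited(status)
                + " exitStatus=" + exitStatus(status)
                + " signaled=" + ifSignaled(status));
    }
}
```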


--------------------------------------------------------------------------------


> import org.ggf.drmaa.*;
>
> public class Test {
>     public static void main(String[] args) throws Exception {
>         Session s = SessionFactory.getFactory().getSession();
>         s.init("");
>         try {
>             JobTemplate jt = s.createJobTemplate();
>             // Pass the executable and its arguments separately.
>             jt.setRemoteCommand("/usr/bin/csh");
>             jt.setArgs(new String[] {"-c", "'exit 1'"});
>             String job = s.runJob(jt);
>             // TIMEOUT_WAIT_FOREVER is a static constant on Session.
>             JobInfo ji = s.wait(job, Session.TIMEOUT_WAIT_FOREVER);
>             System.out.println("Exited: " + ji.hasExited());
>             System.out.println("Aborted: " + ji.wasAborted());
>             System.out.println("Signaled: " + ji.hasSignaled());
>             s.deleteJobTemplate(jt);
>         } finally {
>             // Release the DRMAA session even if submission or wait fails.
>             s.exit();
>         }
>     }
> }
> 
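The attached test passes the executable and its arguments separately, which is the split Daniel suggests instead of handing the whole `csh -c 'exit 1'` line to the remote command. A hypothetical helper that produces that split from a shell-style command line might look like the sketch below. It assumes only single quotes matter, splits on whitespace outside them, and deliberately keeps the quote characters so the result matches the attached test's `"'exit 1'"` argument; whether the quotes should survive depends on whether the DRM re-parses arguments through a shell, which this sketch does not try to decide. `CommandSplit` and `split` are illustrative names, not part of any DRMAA binding.

```java
import java.util.ArrayList;
import java.util.List;

public class CommandSplit {
    // Hypothetical helper: split on whitespace, treating a single-quoted run
    // as part of one token. Quote characters are kept in the token.
    static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuote = false;
        for (char c : line.toCharArray()) {
            if (c == '\'') {
                inQuote = !inQuote;
                cur.append(c);                        // keep the quote character
            } else if (Character.isWhitespace(c) && !inQuote) {
                if (cur.length() > 0) {               // end of a token
                    tokens.add(cur.toString());
                    cur.setLength(0);
                }
            } else {
                cur.append(c);                        // ordinary char or quoted space
            }
        }
        if (cur.length() > 0) {
            tokens.add(cur.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> t = split("csh -c 'exit 1'");
        String remoteCommand = t.get(0);              // would go to setRemoteCommand()
        String[] jobArgs = t.subList(1, t.size()).toArray(new String[0]);  // setArgs()
        // prints: remote command: csh
        System.out.println("remote command: " + remoteCommand);
        // prints: args: -c | 'exit 1'
        System.out.println("args: " + String.join(" | ", jobArgs));
    }
}
```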

