[DRMAA-WG] normal exit status causes drmaa_wifaborted
Tim Harsch
harsch1 at llnl.gov
Wed Mar 28 20:22:01 CDT 2007
Daniel,
By what method does the Java binding bind to the C binding? (The Perl
binding uses SWIG, for example.)
I'm diving into the Perl binding now, but it's been about 4 years since I
wrote it, so it's gonna take me some time, I think.
PS It's really odd: the problem showed up in code I've had in regular use
for a long time. I know bugs don't just introduce themselves, but this part
of the binding worked fine, and I haven't upgraded SGE or Perl or recompiled
Schedule::DRMAAc, yet the problem just appeared. I'm thinking the sysadmins
ran up2date on my RH4 box and a library the C binding depends on changed.
But if your Java binding is actively using it, that would rule that
out...
----- Original Message -----
From: "Daniel Templeton" <Dan.Templeton at Sun.COM>
To: "Tim Harsch" <harsch1 at llnl.gov>
Cc: "DRMAA-WG" <drmaa-wg at gridforum.org>
Sent: Wednesday, March 28, 2007 10:48 AM
Subject: Re: [DRMAA-WG] normal exit status causes drmaa_wifaborted
> Tim,
>
> Looks like something localized to the Perl binding or your
> configuration. I did the same test on the Java language binding, which
> is also based on the C binding, and it worked fine for me. Output
> below, program attached.
>
> Could the problem be that you're sending the full command line as the
> remote command and "1" as the args, instead of "csh" as the remote
> command and "-c", "'exit 1'" as the args? What is the meaning of
> setting the args to "1"?
>
> ---
>
> % java -cp /sge/lib/drmaa.jar:. -d64 Test
> Exited: true
> Aborted: false
> Signaled: false
>
> ---
>
> Daniel
>
> Tim Harsch wrote:
>> I don't understand why causing a simple non-zero exit status is
>> causing drmaa_wifaborted to be set.
>>
>> The easiest way for me to demo this is to change line 38 of
>> t/08_posix_tests.t of the Schedule::DRMAAc CPAN module to be
>> my $remote_cmd = "csh -c 'exit 1'";
>>
>> And then running "make test TEST_VERBOSE=1", which would produce:
>> <SNIP>
>> ok 12 - drmaa_wait says jobid did not change?
>> # Failed test (t/08_posix_tests.t at line 83)
>> not ok 13 - drmaa_wait should say there is more info available in POSIX funcs
>> ok 15 - drmaa_wifaborted error?
>> # Failed test (t/08_posix_tests.t at line 90)
>> not ok 16 - normal job should not abort.
>> ok 17 - drmaa_wifexited returned 3 of 3 args
>> ok 18 - drmaa_wifexited error?
>> # Failed test (t/08_posix_tests.t at line 97)
>> not ok 19 - normal job should exit.
>> <SNIP>
>>
>> I've attached test 8 to this email, in case you want to see how the
>> calls are made in Perl.
>>
>> Any ideas?
>>
>> Thanks,
>> Tim Harsch
>
>
--------------------------------------------------------------------------------
> import org.ggf.drmaa.*;
>
> public class Test {
>     public static void main(String[] args) throws Exception {
>         Session s = SessionFactory.getFactory().getSession();
>         s.init("");
>         JobTemplate jt = s.createJobTemplate();
>         // The executable is the remote command; everything else goes in the args.
>         jt.setRemoteCommand("/usr/bin/csh");
>         jt.setArgs(new String[] {"-c", "'exit 1'"});
>         String job = s.runJob(jt);
>         // Block until the job finishes, then report how it ended.
>         JobInfo ji = s.wait(job, Session.TIMEOUT_WAIT_FOREVER);
>         System.out.println("Exited: " + ji.hasExited());
>         System.out.println("Aborted: " + ji.wasAborted());
>         System.out.println("Signaled: " + ji.hasSignaled());
>         s.deleteJobTemplate(jt);
>         s.exit();
>     }
> }
>
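To illustrate Daniel's point about passing the executable as the remote command and the rest as args, here is a minimal sketch of how a single command-line string such as "csh -c 'exit 1'" breaks down into those two parts. The CommandSplit class and its split method are hypothetical names made up for this example; they are not part of DRMAA or of either binding. The splitter only understands whitespace and single quotes, and it strips the quotes, whereas the program quoted above keeps them in the arg ("'exit 1'"). Whether the quotes are needed probably depends on how the DRM launches the job, so treat that detail as an open assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of the DRMAA API.
class CommandSplit {

    // Split a command line on whitespace, treating single-quoted text as one token.
    // Assumption: no escape characters or double quotes need to be handled.
    static List<String> split(String commandLine) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (char c : commandLine.toCharArray()) {
            if (c == '\'') {
                inQuotes = !inQuotes;   // quote characters are dropped
                hasToken = true;        // so that '' still yields an empty token
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> parts = split("csh -c 'exit 1'");
        // First token is what would go to the remote command attribute,
        // the remaining tokens are what would go to the args attribute.
        String remoteCommand = parts.get(0);
        String[] jobArgs = parts.subList(1, parts.size()).toArray(new String[0]);
        System.out.println("remote command: " + remoteCommand);
        System.out.println("args: " + Arrays.toString(jobArgs));
    }
}
```

With a split like this, the first token would be handed to the remote command attribute and the remaining tokens to the args attribute, rather than sending the whole string as the remote command.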