[drmaa-wg] perl DRMAA, SGE and working directory

Martín Sarachu msarachu at biol.unlp.edu.ar
Tue Dec 14 08:56:14 CST 2004


So you can also specify a host in the DRMAA_WD variable, not only a directory?

Martin

Quoting "Rajic, Hrabri" <hrabri.rajic at intel.com>:

> Hi,
> 
> If we look at the string /home/msarachu/.showdb.04.12.09:17.37.42
> and consider the following expression (we need to add this syntax into the
> spec for the working directory as well)
> 
> [hostname]:file_path
> 
> then it is not surprising for the runtime to look for directory 17.37.42.
> Unfortunately, the second error, not being able to find host
> /home/msarachu/.showdb.04.12.09 is not displayed.
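
For illustration, here is a minimal Python sketch of the [hostname]:file_path
reading described above. The helper name split_host_path is hypothetical (it is
not part of DRMAA, Schedule-DRMAAc or SGE), and splitting at the first colon is
an assumption; it only shows why a colon inside a directory name gets read as
a host separator.

```python
def split_host_path(value):
    """Split a '[hostname]:file_path' string at the first colon.

    Hypothetical helper for illustration only; not DRMAA or SGE code.
    Returns (host, path), with host set to None when no host is given.
    """
    host, sep, path = value.partition(":")
    if not sep:
        # No colon at all: the whole string is a plain path.
        return None, value
    # An empty host part (":/some/dir") also means "no host given".
    return (host or None), path


wd = "/home/msarachu/.showdb.04.12.09:17.37.42"
host, path = split_host_path(wd)
print(host)  # /home/msarachu/.showdb.04.12.09
print(path)  # 17.37.42
```

Under this reading everything before the colon is taken as the host, so only
the remainder is left as the directory to change into.
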
> 
> Hope this helps from the standard point of view.
> 
> Regards,
>     -Hrabri
> 
> 
> -----Original Message-----
> From: owner-drmaa-wg at ggf.org [mailto:owner-drmaa-wg at ggf.org] On Behalf Of
> Martín Sarachu
> Sent: Tuesday, December 14, 2004 8:28 AM
> To: drmaa-wg at ggf.org
> Subject: [drmaa-wg] perl DRMAA, SGE and working directory
> 
> Dear list,
> 
> I'm using Schedule-DRMAAc-0.81 and SGE to be able to queue jobs from a web
> interface.
> 
> Here's my problem: when I launch a job with something like /home/msarachu as
> the $DRMAA_WD it runs OK, but when I use a directory like
> /home/msarachu/.showdb.04.12.09:17.37.42 as $DRMAA_WD the script does not run
> and the error reported by SGE is "28  : changing into working directory".
> I also passed the directory "escaped" (\.showdb\.04\.12\.09\:17\.37\.42) and
> got the same error, although passing the string
> "/home/msarachu/.showdb.04.12.09:17.37.42/job.sh" as the
> $DRMAA_REMOTE_COMMAND argument works fine, because the job is sent to the
> queue.
> Is there any way to mask this directory so that the change into the working
> directory succeeds?
> 
> Below is an email from a failed job I tried to run with
> DRMAA_WD = /home/msarachu/wProjects/tope/.showdb.04.12.14:16.30.47
> Look at the shepherd error: apparently it is truncating the directory just
> before the ":".
> 
> If I submit the job from
> /home/msarachu/wProjects/tope/.showdb.04.12.14:16.30.47
> with command 'qsub -cwd job.sh' it works ok.
> 
> -----
> Job 123 caused action: Job 123 set to ERROR
>  User        = msarachu
>  Queue       = all.q at pentiumIV.embnet-ar.org
>  Host        = pentiumIV.embnet-ar.org
>  Start Time  = <unknown>
>  End Time    = <unknown>
> failed changing into working directory:can't read usage file for job 123.1
> 
> Shepherd trace:
> 12/13/2004 16:31:04 [502:24622]: shepherd called with uid = 0, euid = 502
> 12/13/2004 16:31:04 [502:24622]: starting up 6.0u1
> 12/13/2004 16:31:04 [502:24622]: setpgid(24622, 24622) returned 0
> 12/13/2004 16:31:04 [502:24622]: no prolog script to start
> 12/13/2004 16:31:04 [502:24623]: pid=24623 pgrp=24623 sid=24623 old pgrp=24622 getlogin()=<no login set>
> 12/13/2004 16:31:04 [502:24623]: setosjobid: uid = 0, euid = 502
> 12/13/2004 16:31:04 [502:24623]: RLIMIT_CPU setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
> 12/13/2004 16:31:04 [502:24623]: RLIMIT_FSIZE setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
> 12/13/2004 16:31:04 [502:24623]: RLIMIT_DATA setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
> 12/13/2004 16:31:04 [502:24623]: RLIMIT_STACK setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
> 12/13/2004 16:31:04 [502:24623]: RLIMIT_CORE setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
> 12/13/2004 16:31:04 [502:24623]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
> 12/13/2004 16:31:04 [502:24623]: RLIMIT_RSS setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
> 12/13/2004 16:31:04 [500:24623]: closing all filedescriptors
> 12/13/2004 16:31:04 [500:24623]: further messages are in "error" and "trace"
> 12/13/2004 16:31:04 [502:24622]: forked "job" with pid 24623
> 12/13/2004 16:31:04 [502:24622]: child: job - pid: 24623
> 12/13/2004 16:31:04 [502:24622]: wait3 returned 24623 (status: 7168; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 28)
> 12/13/2004 16:31:04 [502:24622]: job exited with exit status 28
> 12/13/2004 16:31:04 [502:24622]: reaped "job" with pid 24623
> 12/13/2004 16:31:04 [502:24622]: job exited not due to signal
> 12/13/2004 16:31:04 [502:24622]: job exited with status 28
> 12/13/2004 16:31:04 [502:24622]: now sending signal KILL to pid -24623
> 12/13/2004 16:31:04 [502:24622]: no tasker to notify
> 12/13/2004 16:31:04 [502:24622]: failed starting job
> 12/13/2004 16:31:04 [502:24622]: no epilog script to start
> 
> Shepherd error:
> 12/13/2004 16:31:04 [500:24623]: error: can't chdir to :16.30.47: No such file or directory
> 
> Shepherd pe_hostfile:
> pentiumIV.embnet-ar.org 1 all.q at pentiumIV.embnet-ar.org UNDEFINED
> -----
> 
> I sent this same email to Tim and also to the SGE users list. Tim also
> suggested sending it to this list.
> 
> Thanks in advance.
> 
> Best regards,
> 
> Martin 
> 
> -- 
> Martín Sarachu
> msarachu at biol.unlp.edu.ar
> EMBnet Argentina
> http://www.ar.embnet.org
> 
> 


-- 
Martín Sarachu
msarachu at biol.unlp.edu.ar
EMBnet Argentina
http://www.ar.embnet.org




