[ogsa-wg] More comments: HPC Use Cases -- Base Case and Common Cases

Ian Foster foster at mcs.anl.gov
Fri Apr 28 11:45:32 CDT 2006


Susanne:

I'd like to respond to your comments.

I believe that the reference to "network partitions" refers to the fact 
that in a distributed environment, unlike a single machine environment, we 
cannot be sure that messages will be delivered: a network failure can 
result in any message being lost. Thus, a job submission may not receive a 
response, and in that case, we cannot know whether the job was submitted 
(i.e., the request got through, but the response was lost) or not (i.e., 
the request was lost).

One convenient way of dealing with this problem is to allow users to 
associate a "unique job id" with a job request. A scheduler that receives a 
second or subsequent submission with the same job id should simply return 
the response it provided to the first request received.

It's true that a user can achieve a similar effect by searching for the 
submitted job in the scheduler queue. However, this approach is more 
complex, and it also suffers from the problem that the job might have 
already completed and thus can't be found that way.
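To make the idea concrete, here is a minimal sketch of submission made idempotent by a client-supplied unique job id. All names and data structures are illustrative only; they are not taken from JSDL, ESI, or BES:

```python
# Illustrative sketch: a scheduler that deduplicates retried submissions
# by a client-chosen unique job id. Names here are made up for the example.

class Scheduler:
    def __init__(self):
        self._responses = {}  # job id -> response returned for the first request
        self._queue = []      # jobs actually accepted for execution

    def submit(self, job_id, job_spec):
        # A duplicate (retried) request gets the original response back
        # instead of enqueueing the job a second time.
        if job_id in self._responses:
            return self._responses[job_id]
        self._queue.append((job_id, job_spec))
        response = {"job_id": job_id, "status": "submitted"}
        self._responses[job_id] = response
        return response


# A client that never saw a response can safely retry with the same id:
sched = Scheduler()
first = sched.submit("job-42", {"executable": "/bin/hostname"})
retry = sched.submit("job-42", {"executable": "/bin/hostname"})
assert first == retry          # same response as the first request, and...
assert len(sched._queue) == 1  # ...the job was enqueued only once
```

The point of the sketch is that the retry is safe regardless of which message was lost: if the request was lost, the retry submits the job; if only the response was lost, the retry merely re-fetches it.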

In our recently circulated ESI specification, we proposed an optional 
"unique job id" field in JSDL as a way of addressing this requirement. This 
notion was discussed on a BES call, and people seemed sympathetic to the idea.

Ian.


At 12:20 PM 4/28/2006 -0400, Balle, Susanne wrote:

>Marvin,
>
>Enclosed find the remainder of my comments:
>
>Page 5. (top paragraph) I think I know what you mean by "with the
>ambiguity of distinguishing between scheduler crashes and network
>partitions". "Scheduler crashes" is obvious. I am assuming that by
>"network partitions" you are implying that various sub-networks are
>going to have different response times, which will affect the time it
>takes to deliver a call-back message.
>
>Reading further along in the same paragraph I am now not sure I know
>what you mean by "network partitions".
>
>Page 5. Section 3.3
>The topic of this section is clear (described in the first line of the
>paragraph) but the rest of the section is a little confusing.
>
>"possibility that a client cannot directly tell whether its job
>submission request has been successful ..." --> Do we expect the client
>to re-submit the job if the submission failed, or do we expect users to
>check that their job has in fact been submitted and resubmit if
>needed? If we assume the latter, wouldn't that result in users
>re-launching their jobs several times if they do not see their job
>listed in some state when polling the job scheduler for the state of
>their job?
>
>I guess I do not understand why so much emphasis is put on the
>"At-Most-Once" or "Exactly-Once".
>
>Can't the client poll the job scheduler and ask the JS for a list of
>jobs queued, running, terminated, failed, etc.? It might be useful for
>the client to be able to submit jobs with a special keyword like
>JOB_SUBMITTED_BY, since that would reduce the list it gets back. It
>would be nice if the value for the keyword was a unique identifier, but
>it doesn't have to be. Most schedulers allow you to name jobs or
>associate them with a group, so that feature could be used as the
>special keyword.
>
>Page 6. Section 3.4
>General question: Are you taking into account that user applications
>will require different software?
>
>1. For example, if my executable is compiled for a Linux/Intel platform,
>then I would like to run it on a Linux/Intel system and not a Linux/AMD
>system.
>
>2. Are you assuming that the program will be compiled on the fly on the
>allocated system? or pre-compiled and then staged?
>
>I agree that staging the data is going to be an interesting topic.
>
>All this is probably out of scope for the HPC JS Profile but should be
>considered somewhere. I am sure it is; I just don't know where.
>
>I like the section on virtual machines and think that they will be used
>more and more in the future.
>
>Page 7. Extended Resource Features
>The second approach (arbitrary resource types ...) is the only one that
>makes sense to me, since that approach is extensible. I believe that
>Moab implements this approach as well.
>
>Page 8. Extended Client/System Administrator Operations
>
>Are you assuming that System Administrators will be able to perform sys
>admin operations on somebody else's system? I don't think that is right.
>
>You mention suspend-resume. Are you thinking of suspending a job running
>across several clusters that are in different organizations? Or just
>suspending a job on a single cluster/server?
>
>Again, I am trying to figure out how this fits in with "One important
>aspect, is that the individual clusters should remain under the control
>of their local system administrator and/or of their local policies".
>
>I believe that suspend-resume is a JS operation or an operation to be
>performed by the local sys admin, NOT by remote sys admins.
>
>If we are now talking about a meta-scheduler, then yes, it makes sense.
>In the case of a meta-scheduler, it might take over the individual JSs
>and schedule jobs based on its own policies, its job reservation system,
>etc. In this case I look at it as having one deciding entity (the
>meta-scheduler) and several "slaves". Moab and Maui are the only
>meta-schedulers I am familiar with, and they do take over the scheduling
>decisions, node allocations, etc., and just submit jobs to the local job
>schedulers.
>
>This does of course assume that the local system administrators have
>agreed on a schedule when their cluster is shared within this greater
>infrastructure. This is a different approach than having jobs passed
>onto their local scheduler and run on their systems.
>
>This just seems to be a different approach from the one that is taken in
>this paper.
>I might be wrong. If I am please educate me.
>
>Page 9. Section 3.10
>
>Don't forget UPC (Unified Parallel C: http://upc.nersc.gov/). This
>parallel programming paradigm is getting more and more interest from
>several communities.
>We'll need to provide support for UPC as well.
>
>Page 10. Section 3.13
>A meta-scheduler approach that makes sense to me is to allow developers
>to submit their jobs to their local cluster using their "favorite"
>scheduler commands and then have the meta-scheduler load-balance the
>work and forward the jobs to another system/cluster if needed. Moab from
>Cluster Resources supports this approach even if the clusters have
>different JSs. They have a list of supported JSs such as LSF, PBSpro,
>SLURM, etc., and they can "translate" one JS's commands into another
>within that supported set.
>
>Page 11. SLURM is missing.
>
>Let me know what you think,
>
>Regards
>
>Susanne
>
>---------------------------------------------------------------
>Susanne M. Balle,
>Hewlett-Packard
>High Performance Computing R&D Organization
>110 Spit Brook Road
>Nashua, NH 03062
>
>Phone: 603-884-7732
>Fax:     603-884-0630
>
>Susanne.Balle at hp.com

_______________________________________________________________
    Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
       Globus Alliance: www.globus.org.
