[ogsa-wg] FW: Comments to the HPC Use Cases: Base Case and Common Cases

Tue Apr 25 06:27:42 CDT 2006

-----Original Message-----
From: Marvin Theimer [mailto:theimer at microsoft.com] 
Sent: Monday, April 24, 2006 9:06 PM
To: Balle, Susanne
Cc: Treadwell, Jem; Marvin Theimer
Subject: RE: Comments to the HPC Use Cases: Base Case and Common Cases

Hi;

Thanks for your input!  My comments on your comments are in-line.

How do you feel about my posting this email to the ogsa-wg mailing list
so that others can see the issues you've raised plus my responses?  No
problem if you'd rather not, but I think it would be interesting to the
larger community if you do feel comfortable with it.  Let me know -- I
will keep this email thread private until you tell me otherwise.

Marvin.

-----Original Message-----
From: Balle, Susanne [mailto:Susanne.Balle at hp.com]
Sent: Monday, April 24, 2006 4:44 PM
To: Marvin Theimer
Cc: Balle, Susanne; Treadwell, Jem
Subject: Comments to the HPC Use Cases: Base Case and Common Cases

Marvin,

I read the document "HPC Use Cases: Base Case and Common Cases" and
thought it was a good start. 

I do have a couple of comments about the documents which I have enclosed
below:

1. I would re-title the document to "HPC Job Scheduling Use Cases --
Base Case and Common Case". The main reason is that too many topics
which are important to HPC are "out-of-band" areas with regards to the
focus of this document. The document does only focus on job scheduling
so why not just cal it that?
[MT] Good point.  I'll try it and see how others react.

2. Under Base Case
In this section I would like to add a point about "users being able to
query for available resources". I do that all the time before I launch a
job. It is nice to know what resources are currently available or
currently up so that I don't submit a job for 512 nodes and only 510 of
the 512 nodes are up. 
[MT] This seems reasonable.  A key thing will be to define what the
"minimal" set of useful information is.  Additional "commonly desired"
information should then be defined as common case extensions. 

3. Page 3 (top). Reading this Section made me decide that renaming the
document made more sense. This is NOT "HPC Use Cases" but "HPC Job
Scheduling Base Cases". This will allow others to follow your lead and
create "HPC Other topic (.ie. Data management, etc.) Use Cases".
[MT] Agreed.

4. Page 3 (bottom). One important aspect that you have left out here is
that the individual clusters should remain under the control of their
local system administrator and/or of their local policies. You cannot
impose a FIFO policy onto clusters with a different scheduling policy. 
[MT] You raise a good point.  My goal in specifying FIFO was to say that
the "simplest" scheduling policy would be the only thing "required" of
all schedulers.  But I think the true base case is that a scheduler is
free to pick whatever scheduling policy (or policies) that it wishes.
I.e. that specification of required scheduling policies is out-of-scope
for the base case.

5. Page 3 (bottom) I am not quick sure what you mean by "the only
scheduling policy supported is FIFO". Where do you mean? Aren't you
planning on passing the jobs onto the local scheduler which will then
apply whatever policy it is set to obey? Do you mean that in the
scheduler infrastructure you want to create you will only support FIFO
which in fact will just consist in passing it on to the local scheduler?
I am a little confused after reading this section.  
[MT] My comment to point 4 is applicable here.  My goal was to define
the "smallest" or "simplest" set of interop requirements possible for
the base case.  Extensions could then define scheduling policies that a
scheduler promises to provide.

Does "only supporting FIFO" mean that you will only submit one job at
the time to a cluster?
[MT] No, if more than one job will fit on the cluster then there is no
reason not to run two or more jobs simultaneously.  But I'm changing the
base case to not require any particular scheduling policy in any case.

6. " A job consists of a single program running on a single node". I
believe this is too restrictive. MPI programs need to be considered to
make sure that we have the right level of confidence that the
design/infrastructure will work for parallel programs as well. How about
OpenMP or threaded programs? Have you taken this type of programs into
consideration? I do not see them mentioned anywhere. 
[MT] These fall under the "common cases that will be handled via
extensions" category.  Note that support for MPI programs typically
involves infrastructure (such as SMPD daemons) that not all scheduling
systems necessarily support; hence MPI programs are not explicitly
supported in the base case.  The whole purpose of defining the various
common cases in the next section of the document is to ensure that we
address how to handle things like MPI programs correctly by means of
thought-out extensions.  The same applies for threaded/OpenMP programs.

7. After having read Section 2 I believe that the common case is too
restrictive. I understand that you want things to be simple but maybe
they have become too simple. I am worried that you cannot extend this
simple base case to fit most common requirements for HPC applications.
[MT] Figuring out how to extend the base case to cover all the listed
common cases is the task of the working group. :-)  Personally I believe
that we can define suitable extensions that will enable the simple base
case I've defined to be extended to the common cases.  If that's not the
case then we'll certainly change the base case as necessary.  But I want
to start with the simplest possible base case since it's a constant
battle to keep peoples' pet features out of the base design and it's a
slippery slope to hell once you step away from the absolutely most
minimal base case that lets the whole edifice (base case plus
extensions) hang together.

8. Section 3.1 
You forgot SLURM in the list and also Moab which is the commercial
product from Cluster Resources who support Maui. Moab's Grid scheduler
is very interested and offers a lot of very desirable features for a
Grid environment. 

More info on SLURm is available at:
http://www.llnl.gov/linux/slurm/slurm.html

I am working on SLURM and would be happy to provide you with what you
need for your document or to answer any questions you have about SLURM.
[MT] Great!  I really appreciate it.

I have some more comments on the remaining sections. I will send them in
a separate email tomorrow.

Regards

Susanne 

-----------------------------------------
Susanne M. Balle
Hewlett-Packard
High Performance Computing R&D Organization
MS ZKO01-3
110 Spit Brook Road
Nashua, NH 03062

Phone: 603-884-7732
Fax:     603-884-0630

Susanne.Balle at hp.com