[jsdl-wg] Questions and potential changes to JSDL, as seen from HPC Profile point-of-view

Marvin Theimer theimer at microsoft.com
Fri Jun 9 20:26:52 CDT 2006


Hi;

 

My responses are in-line below.

 

Marvin.

 

________________________________

From: A S McGough [mailto:asm at doc.ic.ac.uk] 
Sent: Friday, June 09, 2006 3:03 AM
To: Marvin Theimer
Cc: JSDL Working Group; ogsa-bes-wg at ggf.org; Ed Lassettre; Ming Xu
(WINDOWS)
Subject: Re: [jsdl-wg] Questions and potential changes to JSDL, as seen
from HPC Profile point-of-view

 

Hi Marvin,

Thanks for the (fairly long) email - you've raised quite a few
interesting points, which I'll address inline below. First off I'd just
like to say that the JSDL document is meant to be a language
specification document, so a large number of the issues about how JSDL
should be used and what implementations have to support are not really
in scope for that document. However, I do agree with you that such a
document needs to exist - but for all uses of JSDL, not just HPC. I
would like to take your straw man and use it as the starting point for
the section on "using JSDL for HPC". Let me know what you think.

[Marvin] As long as the HPC profile specification has some
document/specification that it can employ to normatively define
behaviors, I'm happy. Presumably "compliance" with JSDL will be defined
to mean compliance with this second document that you propose to create?



More comments below:

Marvin Theimer wrote: 

Hi;

Coming from the point-of-view of the HPC Profile working group, I have
several questions about JSDL, as well as some straw man thoughts about
how JSDL should/could relate to the HPC Profile specification that I'm
involved with.  Some of my questions lead me to restrictions on JSDL
that an HPC profile specification might make.  Other questions lead to
potential changes that might be made as part of creating future versions
of JSDL.  (I'm well aware that JSDL 1.0 was meant as a starting point
rather than the final word on job submission descriptions and so please
interpret my questions as being an attempt at constructive suggestions
rather than a criticism of a very fine first step by the JSDL working
group.)

Will do.



At a high level, there are several general questions that came up when
reading the JSDL 1.0 specification:

1.                  Can JSDL documents describe jobs other than
Linux/Unix/Posix jobs?  For example, things like mount points and mount
sources do not map in a completely straight-forward manner to how file
systems are provided in the Windows world.

The idea is that JSDL (possibly through extensions) will be able to
describe all kinds of jobs that can be submitted. This may include
database queries, control of instruments or web invocations. We did the
Posix job type first because that was the one the majority of people in
the group wanted. We're currently working on a ParallelApplication
extension for JSDL. We'd be more than happy to see if an extension for
Windows (or any other system) can be done through tweaks to the existing
setup or by adding a new extension.

Could you say how file systems don't map to the Windows world? My naive
assumption was that you could do it.

[Marvin] With suitable - and hopefully relatively minor - restrictions
the concepts of mount-point and mount-source might be made to map to the
Windows world.  It's the details that may not map precisely.  In
general, there are concepts in Unix and Windows file systems that don't
map to each other.  For example, there are no symbolic links in NTFS and
there is no notion of fine-grained op-locks in the Unix file system.





2.                  Is JSDL expressive enough to describe all the needs
of a job?  For example, it is unclear how one would specify a
requirement for something like a particular instruction set variation of
the x86 architecture (e.g. the SSE3 extensions of the Pentium) or how one
would specify that AMD processors are required rather than Intel ones
(because the optimized libraries and the optimizations generated by the
compiler used will differ for each).  For another example, it is unclear
how one would specify that all the compute nodes used for something like
an MPI job should have the same hardware.

No - I doubt JSDL is expressive enough in its current state to describe
the needs of all jobs. We're working with the Information Model people
in the OGSA group on this at the moment - please help! I liked some of
your ideas for this below, by the way.
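To make the expressiveness gap concrete: JSDL 1.0's Resources section can name a CPU architecture from the normative enumeration, but finer requirements such as instruction-set extensions, vendor, or node homogeneity have no home in the core schema. A sketch (the hpc-ext namespace and its elements are invented here purely for illustration; only the jsdl elements come from the 1.0 spec):

```xml
<jsdl:Resources xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
                xmlns:hpc-ext="http://example.org/hpc-extension">
  <jsdl:CPUArchitecture>
    <!-- x86 is one of the normative JSDL 1.0 enumeration values -->
    <jsdl:CPUArchitectureName>x86</jsdl:CPUArchitectureName>
  </jsdl:CPUArchitecture>
  <!-- Hypothetical extension elements; nothing like these exists in JSDL 1.0 -->
  <hpc-ext:CPUVendor>AMD</hpc-ext:CPUVendor>
  <hpc-ext:InstructionSetExtension>SSE3</hpc-ext:InstructionSetExtension>
  <hpc-ext:HomogeneousNodes>true</hpc-ext:HomogeneousNodes>
</jsdl:Resources>
```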



3.                  How will JSDL's normative set of enumeration values
for things like processor architecture and operating system be kept
up-to-date and relevant?  Also, how should things like operating system
version get specified in a normative manner that will enable
interoperability among multiple clients and job scheduling services?
For example, things like Linux and Windows versions are constantly being
introduced, each with potentially significant differences in
capabilities that a job might depend on.  Without a normative way of
specifying these constantly evolving version sets it will be difficult,
if not impossible, to create interoperable job submission clients and
job scheduling services (including meta-scheduling services where
multiple schedulers must interoperate with each other).

Agreed. We don't yet have a way to add to the normative enumerations. I
think you suggest below moving these into a separate document so that
they can be updated more easily - this seems a good idea. As for OS
versioning I have my own ideas, though JSDL doesn't have a central plan
yet. Again, input here would be appreciated.
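As an illustration of the versioning gap: JSDL 1.0 draws the OS name from a normative enumeration but leaves the version as free text, so two clients can describe the same system differently and defeat matchmaking. A sketch, with element names as I recall them from the JSDL 1.0 schema:

```xml
<jsdl:OperatingSystem xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:OperatingSystemType>
    <!-- LINUX comes from the normative enumeration -->
    <jsdl:OperatingSystemName>LINUX</jsdl:OperatingSystemName>
  </jsdl:OperatingSystemType>
  <!-- Free-text version: "2.6.16", "SLES 9 SP3" and "RHEL 4" could all
       describe overlapping systems, with no normative way to compare them -->
  <jsdl:OperatingSystemVersion>2.6.16</jsdl:OperatingSystemVersion>
</jsdl:OperatingSystem>
```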



4.                  Although JSDL specifies a means of including
additional non-normative elements and attributes in a document,
non-normative extensions make interoperability difficult.  This implies
the need for normative extensions to JSDL beyond the Posix extension
currently described in the 1.0 specification.  Are there plans to define
additional extension profiles to address the above questions surrounding
expressive power and normative descriptions of things like current OS
types and versions?

Yes. The intention with JSDL has always been to produce more normative
extensions post JSDL 1.0.



5.                  If one accepts the need for a variety of extension
profiles then this raises the question of what should be in the base
case.  For example, it could be argued that data staging - with its
attendant aspects such as mount points and mount sources - should be
defined in an extension rather than in the core specification that will
need to cover a variety of systems beyond just Linux/Unix/Posix.
Similarly, one might argue that the base case should focus on what's
functionally necessary to execute a job correctly and should leave
things that are "optimization hints", such as CPU speed and network
bandwidth specifications, to extension profiles.

Personally I'd agree with you that file staging should be in an
extension, though the view of the group was that most current DRM
systems which would consume JSDL had file staging as a core element. I
also agree on the idea of "optimization hints".



6.                  How are concepts such as IndividualCPUSpeed and
IndividualNetworkBandwidth intended to be defined and used in practice?
I understand the concept of specifying things like the amount of
physical memory or disk space that a job will require in order to be
able to run.  However, CPU speed and network bandwidth don't represent
functional requirements for a job - meaning that a job will correctly
run and produce the same results irrespective of the CPU speed and
network bandwidth available to it.  Also, the current definitions seem
fuzzy: the megahertz number for a CPU does not tell you how fast a given
compute node will be able to execute various kinds of jobs, given all
the various hardware factors that can affect the performance of a
processor (consider the presence/absence of floating point support, the
memory caching architecture, etc.).  Similarly, is network bandwidth
meant to represent the theoretical maximum of a compute node's network
interface card?  Is it expected to take into account the performance of
the switch that the compute node is attached to?  Since switch
performance is partially a function of the pattern of (aggregate)
traffic going through it, the network bandwidth that a job such as an
MPI application can expect to receive will depend on the type of
communications patterns employed by the application.  How should this
aspect of network bandwidth be reflected - if at all - in the network
bandwidth values that a job requests and that compute nodes advertise?

As said above we really need to define this in a separate "profile"
document.



7.                  JSDL is intended for describing the requirements of
a job being submitted for execution.  To enable matchmaking between
submitted jobs and available computational resources there must also be
a way of describing existing/available resources.  While much of JSDL
can be used for this purpose, it is also clear that various extensions
are necessary.  For example, to describe a compute cluster requires that
one be able to specify the resources for each compute node in the
cluster (which may be a heterogeneous lot).  Similarly, to describe a
compute node with multiple network interfaces would require an extension
to the current model, which assumes that only a single instance of such
things can exist.  This raises the question of whether something other
than JSDL is intended to be used for describing available computational
resources or whether there are intentions to extend JSDL to enable it to
describe such resources. 

The writing of a resource description language was something we were
told we couldn't do in the JSDL group. I do agree that it's now
important that we have one. I think we'd need to go back to GGF (or
whatever their name is this week) and ask to set up a group to do this.
Perhaps we could take all the stuff out of JSDL which is appropriate as
a starting point?

[Marvin] Whichever group the work is done in, the HPC profile working
group will need to deal with the matter sooner rather than later (during
this summer, to be precise).  It may be the case that the HPC profile
working group will end up defining a "Basic Resource Description"
specification in the same spirit as BES is a "basic" version of what's
being pursued in the EMS working group.  But that's a personal
speculation thus far.





8.                  The current specification stipulates that conformant
implementations must be able to parse all the elements and attributes
defined in the spec, but doesn't require that any of them be supplied.
Thus, a scheduling service that does nothing could claim to be compliant
as long as it can correctly parse JSDL documents.  For interoperability
purposes, I would argue that the spec should define a minimum set of
elements that any compliant service must be able to supply. Otherwise
clients will not be able to make any assumptions about what they can
specify in a JSDL document and, in particular, client applications that
programmatically submit job submission requests will not be possible
since they can't assume that any valid JSDL document will actually be
acceptable by any given job submission service.

Yes - this is true - though as the current document is a description of
the JSDL "language", this is intentional. These issues should all be
clarified in the profile document.



9.                  I have a number of questions about data staging:

10.  Although the notions of working directory and environment variables
are defined in the posix extension, they are implicitly assumed in the
data staging section of the core specification.  This implies to me that
either (a) data staging is made an extension or (b) these concepts are
made a normative, required part of the core specification.

Hmm - well spotted. Personally, as I've said, I'd like to see it made
into an extension. This probably needs some discussion on the list.
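To make the implicit dependency concrete: in a JSDL 1.0 DataStaging element, a relative FileName only has meaning if some notion of a working directory exists - and that notion is currently defined only by the posix extension. A sketch (element names per the JSDL 1.0 schema; treat as illustrative):

```xml
<jsdl:DataStaging xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <!-- A relative name like this presumes a working directory, which is
       currently a posix-extension concept, not a core one -->
  <jsdl:FileName>input/data.dat</jsdl:FileName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
  <jsdl:Source>
    <jsdl:URI>http://example.org/datasets/data.dat</jsdl:URI>
  </jsdl:Source>
</jsdl:DataStaging>
```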



11.  Recursive directory copying can be specified, but is not required
to be supplied by any job submission service.  This makes it difficult
to write applications that programmatically define their data staging
needs since they cannot in the current design determine whether any
given job submission service implements recursive directory copying.  In
practice this may mean that programmatically generated job submissions
will only ever use lists of individual files to stage.

This is a major problem as many of the systems that are currently
available out there do not support recursive directory copying. Again we
could clarify the use of this through an HPC profile.

12.  The current definitions of the well-known file systems seem
imprecise to me.  In particular:

13.  What are the navigation rules associated with each?  Can you cd out
of the subtree that each represents?  ROOT almost certainly does not
allow that.  Is there an assumption that one can cd out of HOME or TMP
or SCRATCH?  Hopefully not, since that would make these file systems
even more Unix/Linux-centric, plus one would now need to specify what
clients can expect to see when they do so.

Again not defined here. Though I'd assume we can easily say in the
profile that you can't cd out of it.



14.  What is ROOT intended to be used for?  Are there assumptions about
what resides under root?  Are there assumptions about what an
application can read/write under the ROOT subtree?  (ROOT also seems
like the most Unix-specific of the 4 file system types defined.)

Personally I don't have a use for it. Anyone else?



15.  What are the sharing/consistency semantics of each file system in
situations where a job is a multi-node application running on something
like a cluster?  Is HOME visible to all compute nodes in a
data-consistent manner?  I'm guessing that TMP would be assumed to be
strictly local to each compute node, so that things like MPI
applications would need to be cognizant that they are writing multiple
files to multiple separate storage systems when they write to a file in
TMP - and furthermore that data staging of such files after a job has
run will result in multiple files that all map to the same target file.

Again profile issue.



16.  Can other users write over or delete your data in TMP and/or
SCRATCH?  Is data in these file systems visible to other users or does
each job get its own private TMP and SCRATCH?

Profile.



17.  How long does data in SCRATCH stay around?  Without some normative
definition - or at least a normative lower bound - on data lifetime
clients will have to assume that the data can vanish arbitrarily and
things like multi-job workflows will be very difficult to write if they
try to take advantage of SCRATCH space to avoid unnecessary data staging
actions to/from a computing facility.

Profile.



18.  From an interoperability and programmatic submission point-of-view,
it is important to know which transports any given job submission
service can be expected to support.  This seems like another area where
a normative minimal set that all job submission services must implement
needs to be defined.

This gets very difficult and political! Though we should be able to come
up with a core set for the profile.
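Since JSDL identifies staging sources and targets by URI, the transport is implied by the URI scheme, which is where the interoperability question bites. A profile might, for instance, require http/ftp support while leaving grid-specific schemes optional (the particular choice below is purely illustrative):

```xml
<!-- The URI scheme implies the transport the service must implement;
     http://, ftp://, gsiftp:// and scp:// would each require a
     different transport in the staging implementation -->
<jsdl:Source xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:URI>gsiftp://datahost.example.org:2811/projects/input.dat</jsdl:URI>
</jsdl:Source>
```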



Given these questions, as well as the mandate for the HPC profile to
define a simple base interface (that can cover the HPC use case of
submitting jobs to a compute cluster), I would like to present the
following straw man proposal for feedback from this community:

19.              Restructure the JSDL specification as a small core
specification that must be universally implemented - i.e. not just
parsable, but also suppliable by all compliant job submission services -
and a number of optional extension profiles.

Hopefully the language as it stands at the moment (with a few
exceptions) is a good core set. With profiles for different use cases we
could mandate what must be implemented, too.



20.              Declare concepts such as executable path, command-line
arguments, environment variables, and working directory to be generic
and include them in the core JSDL specification rather than the posix
extension.  This may enable the core specification to support things
like Windows-based jobs (TBD).  The goal here is to define a core JSDL
specification that in-and-of-itself could enable job submission to a
fairly wide range of execution subsystems, including both the
Unix/Linux/Posix world and the Windows world.

Why do these need to be in the core? We had problems before in a
pre-release version when they were in the core as people who wanted to
do database submissions (and other things) were trying to map these into
such elements.

[Marvin] A Windows HPC job is not completely posix-compliant, yet has
overlap on the above-listed set of concepts (and actually many more).
So I would argue that we need something that abstracts out the core
concepts of a traditional HPC job.  Given the presence of file data
staging elements in the core specification - which I would argue are
meaningless for database submissions - it seems like the above-listed
elements are at least as generic as the data staging elements.  
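The concepts in question are exactly the ones the posix extension already pins down; a sketch of the current jsdl-posix form (element names per the JSDL 1.0 posix extension) shows how little of it is actually Unix-specific:

```xml
<jsdl-posix:POSIXApplication
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <!-- Executable path, arguments, environment and working directory are
       all meaningful on Windows too - the argument for a generic core -->
  <jsdl-posix:Executable>/usr/local/bin/simulate</jsdl-posix:Executable>
  <jsdl-posix:Argument>--iterations</jsdl-posix:Argument>
  <jsdl-posix:Argument>1000</jsdl-posix:Argument>
  <jsdl-posix:Environment name="OMP_NUM_THREADS">4</jsdl-posix:Environment>
  <jsdl-posix:WorkingDirectory>/home/user/run1</jsdl-posix:WorkingDirectory>
</jsdl-posix:POSIXApplication>
```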





21.              Move data staging to an extension.

22.  Create precise definitions of the various concepts introduced in
the data staging extension, including normative requirements about
whether or not one can change directory up and out of a file system's
root directory, etc.

23.  Define which transports are expected to be implemented by all
compliant services.

Quite possibly - and the use of a profile.



24.              Move the various enumeration types - e.g. for CPU
architecture and OS - to separate specification documents so that they
can evolve without requiring corresponding and constant revision of the
core JSDL specification.

Sounds good. Even better if we can get someone else to update these for
us.



25.              Define extension profiles (eventually, not right away)
that enable richer description of hardware and software requirements,
such as details of the CPU architecture or OS capabilities.  As part of
this, move optimization hints, such as CPU speed and network bandwidth
elements out of the JSDL core and into a separate extension profile.

This should come from the work we are doing with the Information model
people - please join in.



26.              Embrace the issue of how to specify available resources
at an execution subsystem.  Start by defining a base case that allows
the description of compute clusters by creating a compound JSDL document
that consists of an outer element that ties together a sequence of
individual JSDL elements, each of which describes a single compute node
of a compute cluster.  Define an explicit notion of extension profiles
that could define other ways of describing computational resources
beyond just an array of simple JSDL descriptions.

Not entirely sure what you mean by this one - can you explain further?

[Marvin] I'm basically advocating two things: (a) tackle the problem of
how to describe available resources since it's so closely allied to the
topic of describing required resources, and (b) start with a simple
approach and a means of allowing evolution/extension to support richer
approaches later on.
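On that reading, the base case might look like a wrapper element containing one resource description per node. The wrapper element and its namespace below are invented purely for this sketch - no such element exists in JSDL 1.0; only the jsdl:Resources content comes from the spec:

```xml
<!-- Hypothetical wrapper; element and namespace are invented here -->
<cluster:ComputeCluster xmlns:cluster="http://example.org/cluster-description"
                        xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <!-- One JSDL Resources element per compute node; the nodes may be a
       heterogeneous lot, as in the 4 GB and 8 GB nodes below -->
  <jsdl:Resources>
    <jsdl:IndividualPhysicalMemory>
      <jsdl:Exact>4294967296</jsdl:Exact>
    </jsdl:IndividualPhysicalMemory>
  </jsdl:Resources>
  <jsdl:Resources>
    <jsdl:IndividualPhysicalMemory>
      <jsdl:Exact>8589934592</jsdl:Exact>
    </jsdl:IndividualPhysicalMemory>
  </jsdl:Resources>
</cluster:ComputeCluster>
```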





 

Now, as presented above, my straw man proposal looks like suggestions
for changes that might go into a JSDL-1.1 or JSDL-2.0 specification.  In
the near-term, the HPC profile working group will be exploring what can
be done with just JSDL-1.0 and restrictions to that specification.  The
restrictions would correspond to disallowing those parts of the JSDL-1.0
specification that the above proposal advocates moving to extension
profiles.  It will also explore whether a restricted version of the
posix extension could be used to cover most common Windows cases.

 

 

Marvin.

OK, for those who have made it this far (possibly not many): I'm going
to propose a JSDL call on this in a new email so all can see it.

[Marvin] Great idea.  I will try hard to be on that call.  (If you send
me a direct email then that will increase the likelihood since all my
GGF email now goes into one folder and I sometimes miss important ones
in the deluge of all emails.)



steve..




-- 
------------------------------------------------------------------------
Dr A. Stephen McGough                       http://www.doc.ic.ac.uk/~asm
------------------------------------------------------------------------
Technical Coordinator, London e-Science Centre, Imperial College London,
Department of Computing, 180 Queen's Gate, London SW7 2BZ, UK
tel: +44 (0)207-594-8409                        fax: +44 (0)207-581-8024
------------------------------------------------------------------------