[ogsa-bes-wg] Questions and potential changes to BES, as seen from HPC Profile point-of-view

Mon Jun 5 18:19:10 CDT 2006

Hi;

Coming from the point-of-view of the HPC Profile working group, I have
several questions about BES (including recent discussions on the mailing
list), as well as some straw man thoughts about how BES should relate to
the HPC profile spec.

Based on the BES-1.3 spec that Andrew Grimshaw recently sent out, at an
abstract level, there seem to be the following aspects to BES:

*        A core set of operations around activities:

    *        CreateActivityFromJSDL

    *        GetActivityStatus

    *        RequestActivityStateChange

    *        GetActivityJSDLDocuments

*        A set of BES factory-specific system management operations and
resource properties (RPs):

    *        StartAcceptingNewActivities

    *        StopAcceptingNewActivities

    *        IsAcceptingNewActivities RP

*        Support for notifications.

*        Support for various resource properties (or their equivalent in
a non-WSRF version) having to do with an information model for
describing various things about a BES factory, the associated container
it represents, and any activities it is currently running.

*        An extensible activity state model.

Things explicitly NOT in the BES specification are:

*        Generic system management interface.

*        Security design.

*        Interface for directly controlling/manipulating an activity
once it has been created.

Things that used to be in the BES spec but now seem to be extensions
(please correct me if I'm wrong here!):

*        Data staging

*        Suspension

I have the following questions about BES and the various discussions
that have recently occurred (including the ESI integration):

*        Extensibility:

*        Given that BES has bought into the notion of an extensible
activity state diagram, it needs to also normatively define how clients
can learn of the extensions that a given BES service supports.  Is that
something that will be added to the BES specification?  Or will the
specification point to some other place where notions of extensibility
are defined more generically?  (Personally, I'd vote for the former
approach.)

*        Is the "base case" for BES now fig.2, which shows states of
{new, pending, running, canceled, failed, finished}?

*        Previously included states, such as Execution-Pending, will
presumably be defined in suitable extension profiles?

*        Assuming that data staging and suspension are now extensions to
the base BES spec, will they be defined as such in an appendix of the
spec, or as a separate extension profile?

*        The original BES spec describes a fairly sophisticated data
staging design that supports parallelism.  Is there any interest in
defining a second, simpler data staging extension that avoids the
complexity of the parallelism support?

*        Will the suspension extension be the simple one that is
currently presented in sec. 4 as an example?  Or do people feel that a
more complicated version, such as the ESI one, is necessary/important?
Can/should we define both?

*        Given that suspension is no longer in the base design,
presumably the createInSuspendedState parameter to
CreateActivityFromJSDL should disappear?

*        RequestActivityStateChange: I believe this operation will pose
challenges in an extensible design.  The current design is imperative by
nature: it specifies an explicit state to move an activity to.  However,
a client who does not know of all the extensions that a BES service
implements may not know how to pick the appropriate state to transition
to.  It seems better to introduce a more declarative approach in which
clients specify "actions" they wish to occur, such as 'CancelActivity'.
This approach would allow the BES service to make the appropriate state
transition in response to a desired action requested by a client.
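
A minimal sketch of the action-based idea, in Python, assuming the base
states of fig. 2 as listed above. The class name, the method names, and
the "suspended" extension state are hypothetical and chosen only for
illustration; nothing here is taken from the BES spec itself.

```python
# Base states as listed for fig. 2; extensions may add more.
BASE_STATES = {"new", "pending", "running", "canceled", "failed", "finished"}
TERMINAL_STATES = {"canceled", "failed", "finished"}


class ActivityService:
    """Hypothetical service: the client names an action, the service picks the state."""

    def __init__(self, extension_states=()):
        # The service knows its own extension states; the client need not.
        self.known_states = BASE_STATES | set(extension_states)
        self._activities = {}

    def set_state(self, activity_id, state):
        """Service-internal helper used here to place an activity in a state."""
        if state not in self.known_states:
            raise ValueError(f"unknown state: {state}")
        self._activities[activity_id] = state

    def cancel_activity(self, activity_id):
        """Action-style operation: cancel from any non-terminal state."""
        current = self._activities[activity_id]
        if current not in TERMINAL_STATES:
            self._activities[activity_id] = "canceled"
        return self._activities[activity_id]


# A client that has never heard of "suspended" can still cancel the activity.
service = ActivityService(extension_states={"suspended"})
service.set_state("a1", "suspended")
final_state = service.cancel_activity("a1")
```

The point of the sketch is that the client never has to name a target
state, so it does not need to know which extensions the service
implements.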

*        Information model:

*        JSDL seems to inherently be focused on describing a single job
or a single computational resource.  For example, it has no notion of
describing all the differing compute nodes of a (heterogeneous) compute
cluster.  By incorporating JSDL elements into the BES information model
it seems that BES is foreclosing the ability to describe things like
compute clusters.  This issue also affects what can get returned from
GetActivityJSDLDocuments.  If I'm wrong about this, then it seems like
it would be worth having an explicit explanation about how to achieve
this functionality somewhere in the specification.

*        The BES information model now includes various posix-specific
elements of JSDL.  How would other systems - such as a Windows system -
be described?

*        The spec requires that all BES services "support" all the
various attributes listed in sec. 5, but they don't have to implement
them.  What exactly does that mean?  For example, if a JSDL doc
specifies a CPU-Speed requirement and a particular BES service doesn't
implement it (meaning it doesn't keep track of it), then does the
associated CreateActivityFromJSDL request have to fail?  If so, then do
clients have to figure out what the minimal set of implemented
attributes is in a system and then only use those in job descriptions?
Is there a notion of "optional" attributes that can be ignored, that is,
ones that specify desired attribute values rather than required ones?

*        Is there any notion of specifying that all compute nodes should
have the same value for some attribute (e.g. CPU architecture, CPU
speed, NIC card)?  This seems to be missing from the JSDL specification,
but seems very important for BES if it is to support things like compute
clusters.

*        Some of the elements seem to be incompletely specified, to have
definitions that are open to multiple interpretations, or to have
definitions that would be very difficult to implement in practice.  In
particular:

*        CPU architecture seems like it can't describe all the
variations - let alone all the peripherals such as GPUs - that a single
computing resource might have, never mind a cluster.

*        CPU speed seems like the tip of an iceberg having to do with
characterizing the performance of a system, which will depend on all
manner of things like details of the processor chip used, cache sizes,
bus used, etc.

*        Network bandwidth: is this the theoretical maximum of the NIC
on a compute node or is it the current bandwidth actually available in a
(shared) system?  Note that the latter is difficult to measure in a
practically useful way.  Note also that network bandwidth only describes
one aspect of communications performance and that several others are
arguably equally important (e.g. latency).

All this leads to the question of whether BES will have a notion of
extending the information model that is supplied. If so, then that leads
to the question of what the base case should be and whether it should
include a smaller set of things than is currently listed in the spec.

Are there any plans to tighten the definitions of some of the more vague
information elements?  (I guess this really is an issue more for the
JSDL WG than for BES.)

*        GetActivityJSDLDocuments returns a JSDL document for each
specified activity.  Is this sufficient to capture the entire
"provenance" for what has happened to the activity?  In particular,
would it be sufficient to allow someone to (a) run the same activity on
another BES service (assuming same hardware and software) and get the
same results and (b) debug what has happened to an errant activity?  I
would argue that both capabilities have proven to be important in actual
systems.
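
One possible reading of the "supported but not implemented" question
raised above (the CPU-Speed example) can be sketched as follows. This is
only an illustration of the required-versus-desired distinction: the
function name, the result layout, and the attribute names are
assumptions, not definitions from BES or JSDL.

```python
def check_request(required, desired, implemented):
    """Decide whether a job request can be accepted.

    required:    attribute names the job cannot run without
    desired:     attribute names the job would like but can live without
    implemented: attribute names this service actually keeps track of
    """
    implemented = set(implemented)
    unsupported_required = sorted(set(required) - implemented)
    ignored_desired = sorted(set(desired) - implemented)
    return {
        # Fail only when a *required* attribute is not implemented.
        "accepted": not unsupported_required,
        "unsupported_required": unsupported_required,
        # Desired attributes that are not implemented are simply ignored.
        "ignored_desired": ignored_desired,
    }


# Illustrative attribute names only.
service_attributes = {"CPUArchitecture", "PhysicalMemory"}
strict = check_request({"CPUSpeed"}, set(), service_attributes)
relaxed = check_request({"CPUArchitecture"}, {"CPUSpeed"}, service_attributes)
```

Under this reading, a client would not need to discover the minimal set
of implemented attributes up front; it would only need to mark which of
its attributes are hard requirements.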

*        System management operations:

*        Currently BES supports two specific system management
operations: StartAcceptingNewActivities and StopAcceptingNewActivities.
Most schedulers support a variety of scheduling-specific system
management operations, so why were these two singled out in particular
to be part of the base case?

*        These operations seem to require a different set of
authorization credentials than the other interface operations since they
should be invoked by system administrators rather than random users.
How will that work, given that these operations are in the same WSDL as
the other operations?  Wouldn't this argue for moving these operations
to a separate system management interface?

*        Array operations:

*        Currently one can create a single activity, but all other
operations accept an array of AEDs as input.  Was there some reason why
an array creation operation wasn't included so that, for example,
parameter sweep applications can be created with a single request
instead of N requests (where N can be in the thousands)?

*        Given that BES seems to have bought into the notion of
extensibility, should the base case be a "non-array" one?  For example,
currently if you want to handle a fault for a RequestActivityStateChange
operation on a single activity you need to look inside the returned
array of results to see if a fault infoset was returned.  All the
exception handling machinery that modern tooling provides can't get used
because RequestActivityStateChange never returns an actual fault message
(as compared to a fault infoset for the appropriate array elements that
are returned).
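
The difference between the two fault-handling styles described above can
be sketched in Python. Both functions, the `ActivityFault` exception, and
the result layout are hypothetical stand-ins, assumed only for
illustration; they do not reproduce the actual BES message formats.

```python
class ActivityFault(Exception):
    """Hypothetical stand-in for a real fault message."""


def request_state_change_array(states, requests):
    """Array style: one result per request, with faults embedded as data."""
    results = []
    for activity_id, target_state in requests:
        if activity_id not in states:
            results.append({"id": activity_id, "fault": "UnknownActivity"})
        else:
            states[activity_id] = target_state
            results.append({"id": activity_id, "fault": None})
    return results


def cancel_activity(states, activity_id):
    """Single-activity style: a failure surfaces as an exception."""
    if activity_id not in states:
        raise ActivityFault(f"unknown activity: {activity_id}")
    states[activity_id] = "canceled"


states = {"a1": "running"}

# Array style: the call itself succeeds, so the client must inspect each element.
results = request_state_change_array(states, [("missing", "canceled")])
array_faults = [r for r in results if r["fault"] is not None]

# Single-activity style: ordinary exception handling applies.
try:
    cancel_activity(states, "missing")
    caught = None
except ActivityFault as fault:
    caught = str(fault)
```

In the array style even a request on a single activity forces the client
to look inside the result list, whereas the non-array style lets ordinary
exception handling do the work.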

*        Other questions:

*        An entire (small) section is devoted to talking about the
optional use of WS-Names.  However, since the specification doesn't
require them, it's unclear to me whether BES needs to say anything about
WS-Names.  As far as I understand things, whether an EPR is a WS-Name or
not can be determined by inspecting it.  Hence the only reason to have a
special property on a BES service that indicates what kind of AEDs it
returns is to alert potential clients ahead of time about this feature
of the service.  But it's not clear to me what a client would do with
that information, as compared to deciding opportunistically to exploit a
WS-Name AED for, e.g. resolution, at the time that that would be
necessary.  Is there a use case that describes how clients would exploit
the AED-type resource property?

*        Since JSDL documents are self-describing, a BES service can
figure out by inspection whether the job description infoset parameter
to CreateActivityFromJSDL is JSDL or something else.  This would seem to
imply that naming the operation CreateActivity would lose no information
and would allow for transparent extension to other job description
infosets simply by using them (assuming they are self-describing).

*        Container attributes that I have questions about:

*        LocalResourceManagerType: where do these get defined
normatively?

*        Job Credential Service and File Credential Service:  these
imply a specific security model.  Given that security is undefined in
the BES spec, is this appropriate - especially given the rather vague
definition of both?

Given these questions, as well as the mandate for the HPC profile to
define a simple base interface, I would like to present the following
straw man proposal for a modified BES specification for feedback from
this community:

*        Operations:

    *        CreateActivity(jsdlDoc) --> EPR

    *        GetActivity(EPR) --> activityState

    *        GetActivityProvenance(EPR) --> either JSDL doc (if that can
    describe all the necessary provenance info) or JSDL+

    *        CancelActivity(EPR)

    *        For non-WSRF versions: QueryResources() -->
    schedulerResourcesInfoset

        *        'schedulerResourcesInfoset' is essentially the union of
        the RPs that would be exported in a WSRF-based version for
        describing the resources that are available for use at this BES
        service.  Note that a BES service might also want to expose
        other kinds of information that would not be returned from this
        operation - this operation is there so that clients can
        determine whether or not a BES service could potentially meet
        their needs and is necessary for meta-scheduling scenarios.

        *        One might argue that one could use WS-Transfer for this
        operation.  However, since a BES service might want to export
        other kinds of information, this would require an extra level of
        indirection so that the BES service could expose which EPRs to
        use for retrieving which kinds of information.

*        Additional topics/summary:

    *        Simple state diagram and no notion of array operations,
    data staging, suspension, or notifications in base BES case.

    *        Extensions defined as separate profiles for array
    operations, data staging, suspension, and notifications.

    *        RequestActivityStateChange replaced by operations
    specifying desired actions rather than states.  Base case supports
    activity cancellation; extensions can define additional operations
    (e.g. SuspendActivity).

    *        Information model: small base set plus extensions model
    (which ones to include in the base set TBD)

    *        All system management functions moved out to a separate
    interface.
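
The straw man operations listed above can be sketched as a small
in-memory Python class. This is a sketch under stated assumptions, not a
proposed binding: EPRs are modelled as plain strings, the initial state
is assumed to be "new", and the class name, method names, and resource
fields are all hypothetical.

```python
import uuid
from dataclasses import dataclass

TERMINAL_STATES = ("canceled", "failed", "finished")


@dataclass
class Activity:
    jsdl_doc: str
    state: str = "new"  # assumed initial state, taken from the fig. 2 state list


class SimpleBES:
    """In-memory stand-in for the straw man base interface."""

    def __init__(self, resources):
        self._resources = dict(resources)
        self._activities = {}

    def create_activity(self, jsdl_doc):
        """CreateActivity(jsdlDoc) --> EPR (modelled here as a string)."""
        epr = f"urn:uuid:{uuid.uuid4()}"
        self._activities[epr] = Activity(jsdl_doc)
        return epr

    def get_activity(self, epr):
        """GetActivity(EPR) --> activityState."""
        return self._activities[epr].state

    def get_activity_provenance(self, epr):
        """GetActivityProvenance(EPR) --> here, just the submitted JSDL doc."""
        return self._activities[epr].jsdl_doc

    def cancel_activity(self, epr):
        """CancelActivity(EPR): no effect on already-terminal activities."""
        activity = self._activities[epr]
        if activity.state not in TERMINAL_STATES:
            activity.state = "canceled"

    def query_resources(self):
        """QueryResources() --> schedulerResourcesInfoset (a dict here)."""
        return dict(self._resources)


bes = SimpleBES({"cpu_count": 8})
epr = bes.create_activity("<JobDefinition/>")
bes.cancel_activity(epr)
```

Extensions such as suspension would add further action-style methods
(for example a SuspendActivity operation) without changing the base
ones.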

Thanks for any and all feedback on these questions and this straw man
proposal,

Marvin.
