[ogsa-wg] Paper proposing "evolutionary vertical design efforts"

Christopher Smith csmith at platform.com
Tue Mar 21 20:32:13 CST 2006


Yes ... in the LSF case they're written to stable storage, but there are no
"transactional" semantics in the protocol for submission. That's, I believe,
what this discussion is about (the at-most-once submission in the protocol),
not whether job submission is a reliable operation or not.

-- Chris



On 21/3/06 18:19, "Ian Foster" <foster at mcs.anl.gov> wrote:

> Marvin:
> 
> I'm sure you are wrong in implying that systems such as LSF do not write
> information about jobs to stable storage. LSF and other similar systems MUST
> be highly reliable so that they can guarantee that jobs will not be lost. I am
> sure that this means that they write some record of each job submitted to
> stable storage. Chris or others will I am sure correct me if I am wrong in
> this assertion.
> 
> That said, I certainly agree that we need input from the scheduler developers.
> We should be careful not to use the word "transactional semantics" as that is
> not what we are talking about.
> 
> Ian.
> 
> At 05:14 PM 3/21/2006 -0800, Marvin Theimer wrote:
> 
>> Hi;
>> 
>>  
>> 
>> I know that systems like LSF get used in high throughput settings where the
>> service time for a job request is an issue.  An example is running very large
>> numbers of relatively short jobs through a compute cluster.  Implementing
>> at-most-once semantics implies doing disk writes to get the necessary
> >> persistence.  I'm just speculating, but doing those disk writes efficiently
> >> enough (e.g., via group commits) to support the throughputs that I've been told
>> of by, for example, customers in the financial industry is not a trivial
>> design and implementation task.  If my assumption is correct, then this
>> common use case in the HPC world may be one that many, if not most job
>> schedulers would have a hard time supporting if they have to provide
>> at-most-once transactional semantics for all job submissions.
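> >> 
> >> To make the performance concern concrete, here is a minimal sketch (Python,
> >> with invented names; this is not any scheduler's actual code) of the
> >> group-commit idea: submission records are appended to a log and synced to
> >> disk once per batch, so the cost of the stable-storage write is shared by
> >> many concurrent submissions.
> >> 
> >>     import os, json, threading, queue
> >> 
> >>     class GroupCommitLog:
> >>         # Sketch only: append job-submission records to a log and fsync once
> >>         # per batch, so many concurrent submits share one stable-storage write.
> >>         def __init__(self, path, max_batch=64, max_wait_s=0.005):
> >>             self._fh = open(path, "ab", buffering=0)
> >>             self._q = queue.Queue()
> >>             self._max_batch = max_batch
> >>             self._max_wait_s = max_wait_s
> >>             threading.Thread(target=self._writer, daemon=True).start()
> >> 
> >>         def persist(self, record):
> >>             # Blocks until this record is on disk, then returns to the submitter.
> >>             done = threading.Event()
> >>             self._q.put((json.dumps(record).encode() + b"\n", done))
> >>             done.wait()
> >> 
> >>         def _writer(self):
> >>             while True:
> >>                 batch = [self._q.get()]                  # wait for the first record
> >>                 try:
> >>                     while len(batch) < self._max_batch:  # briefly gather more
> >>                         batch.append(self._q.get(timeout=self._max_wait_s))
> >>                 except queue.Empty:
> >>                     pass
> >>                 self._fh.write(b"".join(data for data, _ in batch))
> >>                 os.fsync(self._fh.fileno())              # one sync for the whole batch
> >>                 for _, done in batch:
> >>                     done.set()                           # release every waiting submitter
> >> 
> >> Whether this kind of batching gets within range of the throughputs those
> >> customers need is exactly the question I'd like the scheduler
> >> vendors/suppliers to answer.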
>> 
>>  
>> 
>> So do we exclude that use case and tell the scheduler vendors servicing that
>> market that they need to come up with a separate interaction protocol for
>> that use case?  I would prefer to define a base use case without
>> transactional semantics and immediately also define an extension that
>> provides those semantics. Any scheduler wanting to play in the wider grid
>> world would implement the extension because clients will be looking
>> for/insisting on it.  Any scheduler wanting to provide ultra-high-throughput
>> non-transactional semantics could do so via the base case.  Providing a
>> front-end that implements transactional semantics for a high-throughput
>> scheduler is arguably better and easier than forcing the high
>> throughput-scheduler to implement either a complicated high-performance job
>> metadata repository or an additional separate protocol for the
>> ultra-high-throughput case.
>> 
>>  
>> 
> >> Now, of course, it may be the case that it's really not that hard to provide
>> transactional semantics for such ultra-high-throughput use cases for all the
> >> schedulers we care about; in which case I'd be thrilled to agree to include
> >> them in the base use case and move on.  This is where I'd really like to
>> obtain input/guidance from representatives of those scheduler
>> vendors/suppliers.  If anyone from Platform Computing, Altair, SUN, and the
>> other scheduler vendor/providers is monitoring this email thread then please
>> speak up!
>> 
>>  
>> 
>> Marvin.
>> 
>>  
>> 
> From: Ian Foster [mailto:foster at mcs.anl.gov]
> Sent: Tuesday, March 21, 2006 10:53 AM
> To: Marty Humphrey; Marvin Theimer; 'Carl Kesselman'
> Cc: ogsa-wg at ggf.org; Marvin Theimer
> Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
> 
>  
> 
> Marty:
> 
> I wasn't trying to be philosophical, just commenting that at-most-once submission
> semantics is important. If a client can't be sure whether a job has been submitted or
> not, clients get very complicated.
> 
> Ian.
> 
> 
> At 01:43 PM 3/21/2006 -0500, Marty Humphrey wrote:
> 
> 
> But this is not so simple. The knee-jerk reaction is to separate these two
> concerns into implementation vs. interface, and develop each one
> independently. But taken to the extreme, a system that appears to be rich in
> its capabilities might not be so in reality for some time (if EVER!).
> 
>  
> 
> Let's assume that we truly separate these concerns and build sophisticated
> interfaces. But then what about the potential consumer of such services?
> Building an overly complex interface to such a service (without any practical
> implementations behind it) might promote further complicated clients (which
> promotes further complexity upstream...). "Build the interface and they will come
> with implementations" is a variation on a theme that doesn't always come true.
> Arguably, complexity is what we're trying to get away from.
> 
>  
> 
> And no, I'm not advocating only an interface that matches existing
> capabilities. I'm just saying that it's NOT obvious that the most effective
> approach is to entirely decouple these two concerns.
> 
>  
> 
> -- Marty
> 
>  
> 
> From: Ian Foster [mailto:foster at mcs.anl.gov]
> Sent: Tuesday, March 21, 2006 1:34 PM
> To: Marvin Theimer; Carl Kesselman
> Cc: humphrey at cs.virginia.edu; ogsa-wg at ggf.org; Marvin Theimer
> Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
> 
>  
> 
> Marvin:
> 
> I think you are mixing two things together: the capabilities of the scheduler
> and the capabilities of the remote submission interface. The proposal that we
> support at-most-once submission capabilities is a proposal for capabilities in
> the remote submission interface, not the scheduler. I wouldn't expect existing
> schedulers to provide this capability, just as they don't (for the most part)
> support Web Services interfaces. But once we define a Web Services-based
> remote submission interface, at-most-once submission capabilities become
> important.
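> 
> To sketch what I mean (purely illustrative, with invented names; not a
> proposal for the actual interface), a thin front-end can provide at-most-once
> submit semantics on top of a scheduler that lacks them by persisting a table
> of client-supplied submission tokens and never creating a second job instance
> for a token it has already seen:
> 
>     import json, os
> 
>     class SubmissionFrontEnd:
>         # Illustrative only: a front-end that layers at-most-once submit
>         # semantics over a scheduler that lacks them (invented names).
>         def __init__(self, scheduler, state_path):
>             self._scheduler = scheduler
>             self._state_path = state_path
>             self._seen = self._load()      # client token -> submission record
> 
>         def submit(self, client_token, job_description):
>             record = self._seen.get(client_token)
>             if record is not None:
>                 return record              # replay: never start a second instance
>             self._seen[client_token] = {"status": "pending"}
>             self._save()                   # record the intent on stable storage first
>             job_id = self._scheduler.submit(job_description)
>             self._seen[client_token] = {"status": "submitted", "job_id": job_id}
>             self._save()
>             return self._seen[client_token]
> 
>         def _load(self):
>             if os.path.exists(self._state_path):
>                 with open(self._state_path) as fh:
>                     return json.load(fh)
>             return {}
> 
>         def _save(self):
>             tmp = self._state_path + ".tmp"
>             with open(tmp, "w") as fh:
>                 json.dump(self._seen, fh)
>                 fh.flush()
>                 os.fsync(fh.fileno())
>             os.replace(tmp, self._state_path)   # atomically replace the old state
> 
> Note that the token is recorded before the underlying submit call; otherwise a
> crash between the two steps would let a replayed request create a duplicate.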
> 
> Ian.
> 
> 
> At 10:28 AM 3/21/2006 -0800, Marvin Theimer wrote:
> 
> 
> Hi;
> 
>  
> 
> Whereas I agree with you that at-most-once semantics are very desirable, I
> would like to point out that not all existing job schedulers implement them.
> I know that both LSF and CCS (the Microsoft HPC job scheduler) don't.  I've been
> trying to find out whether PBS and SGE do or don't.
> 
>  
> 
> So, this brings up the following slightly more general question: should the
> simplest base case be the simplest case that does something useful, or should
> it be more complicated than that?  I can see good arguments on both sides:
> 
> ·        Whittling things down to the simplest possible base case maximizes
> the likelihood that parties can participate.  Every feature added represents
> one more feature that some existing system may not be able to support or that
> a new system has to provide even when it's not needed in the context of that
> system.  Suppose, for example, that PBS and SGE don't provide transactional
> semantics of the type you described. Then 4 of the 6 most common job
> scheduling systems would not have this feature and would need to somehow add
> it to their implementations. In this particular case it might not be too difficult
> to add in practice, but in general there might be problems.
> 
> ·        On the other hand, since there are many clients and arguably far
> fewer server implementations, features that substantially simplify client
> behavior/programming and that are not too onerous to implement in existing and
> future systems should be part of the base case.  The problem, of course, is
> that this is a slippery slope at the end of which lies the number 42 (ignore
> that last phrase if you're not a fan of The Hitchhiker's Guide to the Galaxy).
> 
>  
> 
> Personally, the slippery slope argument makes me lean towards defining the
> simplest possible base use case, since otherwise we'll spend a (potentially
> very) long time arguing about which features are important enough to justify
> being in the base case.  One possible way forward on this issue is to have
> people come up with lists of features that they feel belong in the base use
> case and then we agree to include only those that have a large majority of the
> community arguing for their inclusion in the base case.
> 
>  
> 
> Unfortunately, defining what "large majority" should be is also not easy or
> obvious.  Indeed, one can argue that we can't even afford to let all votes be
> equal.  Consider the following hypothetical (and contrived) case: 100 members
> of a particular academic research community show up and vote that the base
> case must include support for a particular complicated scheduling policy and
> the less-than-ten suppliers of existing job schedulers with significant
> numbers of users all vote against it. Should it be included in the base case?
> What happens if the major scheduler vendors/suppliers decide that they can't
> justify implementing it and therefore can't be GGF spec-compliant, and therefore
> go off and define their own job scheduling standard?  The hidden issue is, of
> course, whether those voting are representative of the overall HPC user
> population.  I can't personally answer that question, but it does again lead me
> to want to minimize the number of times I have to ask that question, i.e., the
> number of features that I have to consider for inclusion in the base case.
> 
>  
> 
> So this brings me to the question of next steps.  Recall that the approach I'm
> advocating, and that others have bought into as far as I can tell, is that we
> define a base case and the mechanisms and approach to how extensions of the
> base case are done.  I assert that the absolutely most important part of
> defining how extension should work is ensuring that multiple extensions don't
> end up producing a hairball that's impossible to understand, implement, or use.
> In practice this means coming up with a restricted form of extension since
> history is pretty clear on the pitfalls of trying to support arbitrarily
> general extension schemes.
> 
>  
> 
> This is one of the places where identification of common use cases comes in.
> If we define the use cases that we think might actually occur then we can ask
> whether a given approach to extension has a plausible way of achieving all the
> identified use cases.  Of course, future desired use cases might not be
> achievable by the extension schemes we come up with now, but that possibility
> is inevitable given anything less than a fully general extension scheme.
> Indeed, even among the common use cases we identify now, we might discover
> that there are trade-offs where a simpler (and hence probably more
> understandable and easier to implement and use) extension scheme can cover 80%
> of the use cases while a much more complicated scheme is required to cover
> 100% of the use cases.
> 
>  
> 
> Given all this, here are the concrete next steps I'd like to propose:
> 
> ·        Everyone who is participating in this design effort should define
> what they feel should be the HPC base use case.  This represents the simplest
> use case and associated features like transactional submit semantics that you
> feel everyone in the HPC grid world must implement.  We will take these use
> case candidates and debate which one to actually settle on.
> 
> ·        Everyone should define the set of HPC use cases that they believe
> might actually occur in practice.  I will refer to these as the common use
> cases, in contrast to the base use case.  The goal here is not to define the
> most general HPC use case, but rather the more restricted use cases that might
> occur in real life.  For example, not all systems will support job migration,
> so whereas a fully general HPC use case would include the notion of job
> migration, I argue that one or more common use cases will not include job
> migration.
> 
> Everyone should also prioritize and rank their common use cases so that we can
> discuss 80/20-style trade-offs concerning which use cases to support with any
> given approach to extension.  Thus prioritization should include the notion of
> how common you think a use case will actually be, and hence how important it
> will be to actually support that use case.
> 
> ·        Everyone should start thinking about what kinds of extension
> approaches they believe we should define, given the base use case and common
> use cases that they have identified.
> 
>  
> 
> As multiple people have pointed out, an exploration of common HPC use cases
> has already been done one or several times before, including in the EMS
> working group.  I'm still catching up on reading GGF documents, so I don't know
> how much those prior efforts explored the issue from the point-of-view of base
> case plus extensions.  If these prior explorations did address the topic of
> base-plus-extensions and you agree with the specifics that were arrived at
> then this exercise will be a quick-and-easy one for you: you can simply
> publish the appropriate links to prior material in an email to this mailing
> list.  I will personally be sending in my list independent of prior efforts in
> order to provide a newcomer's perspective on the subject.  It will be interesting
> to see how much overlap there is.
> 
>  
> 
> One very important point that I'd like to raise is the following: time is short
> and "best" is the enemy of "good enough".  Microsoft is planning to provide a Web
> services-based interoperability interface to its job scheduler sometime in the
> next year or two.  I know that many of the other job scheduler
> vendors/suppliers are also interested in having an interoperability story in
> place sooner rather than later.  To meet this schedule on the Microsoft side
> will require locking down a first fairly complete draft of whatever design
> will be shipped by essentially the end of August.  That's so that we can do
> all the necessary debugging, interoperability testing, security threat
> modeling, etc. that goes with shipping an actual finished product.  What that
> means for the HPC profile work is that, come the end of August, Microsoft and
> possibly other scheduler vendors/suppliers will need to lock down and start
> coding some version of Web Services-based job scheduling and data transfer
> protocols.  If there is a fairly well-defined, feasible set of specs/profile
> coming out of the GGF HPC working group (for recommendation NOT yet for actual
> standards approval) that has some reasonable level of consensus by then, then
> that's what Microsoft will very likely go with.  Otherwise Microsoft will need
> to defer the idea of shipping anything that might be GGF compliant to version
> 3 of our product, which will probably ship about 4 years from now.
> 
>  
> 
> The chances of coming up with the "best" HPC profile by the end of August are
> slim.  The chances of coming up with a fairly simple design that is "good
> enough" to cover the most important common cases by means of a relatively
> simple, restricted form of extension seem much more feasible.  Covering a
> richer set of use cases would need to be deferred to a future version of the
> profile, much in the manner that BES has been defined to cover an important
> sub-category of use cases now, with a fuller EMS design being done in parallel
> as future work.  So I would argue that perhaps the most important thing this
> design effort and the planned HPC profile working group that will be set up in
> Tokyo can do is to identify what a "good enough" version 1 HPC profile should be.
> 
>  
> 
> Marvin.
> 
>  
> 
>  
> 
> From: Carl Kesselman [mailto:carl at isi.edu]
> Sent: Thursday, March 16, 2006 12:49 AM
> To: Marvin Theimer
> Cc: humphrey at cs.virginia.edu; ogsa-wg at ggf.org
> Subject: Re: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
> 
>  
> 
> Hi,
> 
> In the interest of furthering agreement, I was not arguing that the
> application had to be restartable. Rather, what has been shown to be important
> is that the protocol be restartable in the following sense: if you submit a
> job and the far-end server fails, is the job running or not, and if you
> resubmit, do you get another job instance?  The GT submission protocol and
> Condor have transactional semantics so that you can have at-most-once submit
> semantics regardless of client and server failures. The fact that your
> application may be non-idempotent is exactly why having well-defined semantics
> in this case is important.
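> 
> To make that concrete, the client side of such a protocol can look roughly
> like the sketch below (Python, hypothetical API; the GT and Condor wire
> details differ). The point is that the client chooses the submission id
> before the first attempt, so re-sending after a timeout or a crash can never
> produce a second job instance.
> 
>     import uuid
> 
>     def submit_at_most_once(endpoint, job_description, retries=5):
>         # The submission id is fixed before the first attempt; the server
>         # treats a repeated id as the same submission, not a new job.
>         submission_id = str(uuid.uuid4())
>         last_error = None
>         for _ in range(retries):
>             try:
>                 return endpoint.submit(submission_id, job_description)
>             except TimeoutError as err:
>                 last_error = err   # outcome ambiguous, but retrying the same id is safe
>         raise last_error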
> 
> So what is the next step?
> 
> Carl
> 
> Dr. Carl Kesselman                             email:   carl at isi.edu
> USC/Information Sciences Institute        WWW: http://www.isi.edu/~carl
> 4676 Admiralty Way, Suite 1001          Phone:  (310) 448-9338
> Marina del Rey, CA 90292-6695           Fax:      (310) 823-6714
> 
> 
> 
> -----Original Message-----
> From: Marvin Theimer <theimer at microsoft.com>
> To: Carl Kesselman <carl at isi.edu>
> CC: Marvin Theimer <theimer at microsoft.com>; Marty Humphrey
> <humphrey at cs.virginia.edu>; ogsa-wg at ggf.org <ogsa-wg at ggf.org>
> Sent: Wed Mar 15 14:26:36 2006
> Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design  efforts"
> 
> Hi;
> 
> 
> 
> I suspect that we're mostly in agreement on things.  In particular, I think
> your list of four core aspects is a great starting point for a discussion on
> the topic.
> 
> 
> 
> I just replied to an earlier email from Ravi with a description of what I'm
> hoping to get out of examining various HPC use cases:
> 
> ·        Identification of the simplest base case that everyone will have to
> implement.
> 
> ·        Identification of common cases we want to optimize.
> 
> ·        Identification of how evolution and selective extension will work.
> 
> 
> 
> I totally agree with you that the base use case I described isn't really a
> "grid" use case.  But it is an HPC use case; in fact, it is arguably the most
> common use case in current existence. :-)  So I think it's important that we
> understand how to seamlessly integrate and support that common and very simple
> use case.
> 
> 
> 
> I also totally agree with you that we can't let a solution to the simplest HPC
> use case paint us into a corner that prevents supporting the richer use cases
> that grid computing is all about.  That's why I'd like to spend significant
> effort exploring and understanding the issues of how to support evolution and
> selective extension.  In an ideal world a legacy compute cluster job scheduler
> could have a simple grid "shim" that let it participate at a basic level, in a
> natural manner, in a grid environment, while smarter clients and HPC services
> could interoperate with each other in various selectively richer manners by
> means of extensions to the basic HPC grid design.
> 
> 
> 
> One place where I disagree with you is your assertion that everything needs to
> be designed to be restartable.  While that's a good goal to pursue, I'm not
> convinced that you can achieve it in all cases.  In particular, there are at
> least two cases that I claim we want to support that aren't restartable:
> 
> ·        We want to be able to run applications that aren't restartable; for
> example, because they perform non-idempotent operations on the external
> physical environment.  If such an application fails during execution then the
> only one who can figure out what the proper next steps are is the end user.
> 
> ·        We want to be able to include (often-times legacy) systems that aren't
> fault-tolerant, such as simple small compute clusters where the owners didn't
> think that fault tolerance was worth paying for.
> 
> Of course any acceptable design will have to enable systems that are
> fault-tolerant to export/expose that capability.  To my mind it's more a matter of
> ensuring that non-fault-tolerant systems aren't excluded from participation in
> a grid.
> 
> 
> 
> Other things we agree on:
> 
> ·        We should certainly examine what remote job submission systems do.
> We should certainly look at existing systems like Globus, Unicore, and Legion.
> In general, we should be looking at everything that has any actual experience
> that we can learn from and everything that is actually deployed and hence
> represents a system that we potentially need to interoperate with. (Whether a
> final design is actually able to interoperate at any but the most basic level
> with various exotic existing systems is a separate issue.)
> 
> ·        We should absolutely focus on codifying what we know how to do and
> avoid doing research as part of a standards process.  I believe that thinking
> carefully about how to support evolution and extension is our best hope for
> allowing people to defer trying to bake their pet research topic into
> standards, since it provides a story for why today's standards don't preclude
> tomorrow's improvements.
> 
> 
> 
> So I would propose that next steps are:
> 
> ·        Continue to explore and classify various HPC use cases of various
> differing levels of complexity.
> 
> ·        Describe the requirements and limitations of existing job scheduling
> and remote job submission systems.
> 
> ·        Continue identifying and discussing key "features" of use cases and
> potential design solutions, such as the four that you identified in your last
> email.
> 
> 
> 
> Marvin.
> 
> 
> 
> ________________________________
> 
> From: Carl Kesselman [mailto:carl at isi.edu]
> Sent: Tuesday, March 14, 2006 7:50 AM
> To: Marty Humphrey; ogsa-wg at ggf.org
> Cc: Marvin Theimer
> Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
> 
> 
> 
> Hi,
> 
> 
> 
> Just to be clear, I'm not trying to suggest that the scope be expanded. I agree
> that the approach of focusing on a baby step is a good one, and with many of the
> assumptions stated in Marvin's list I am in total agreement. However, in
> taking baby steps I think that it is important that we end up walking, and
> that in defining the use case, one can easily create solutions that will not
> get you to the next step. This is my point about looking at what we know how
> to do and have been doing in production settings for many years now. In my
> mind, one of the scope grandness problems has been that there has been far too
> little focus on codifying what we know how to do in favor of using a standards
> process as an excuse to design new things.  So at the risk of sounding
> partisan, the simplified use case that Marvin is proposing is exactly the use
> case that GRAM has been doing for over ten years now (I think the same can be
> said about UNICORE and Legion).
> 
> 
> 
> So let me try to be constructive.  One of the things that falls out of
> Marvin's list could be a set of basic concepts/operations that need to be
> defined.  These include:
> 
> 1) A way of describing "local" job configuration, i.e. where to find the
> executable, data files, etc. This should be very conservative with its
> assumptions on shared file systems and accessibility. In general, what needs
> to be stated here is which aspects of the underlying resource are exposed to
> the outward-facing interface.
> 
> 2) A way of naming a submission point (should probably have a way of modeling
> queues).
> 
> 3) A core set of job management operations: submit, status, kill. These need
> to be defined in such a way as to be tolerant of a variety of failure
> scenarios, in that the state needs to be well defined in the case of failure.
> 
> 4) A state model that one can use to describe what is going on with the jobs
> and a way to access that state.  Can be simple (queued, running, done), may
> need to be extensible.  One can view the accounting information as being
> exposed through this state as well.  (A minimal sketch of these four elements
> follows below.)
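> 
> Purely as an illustration (Python, invented names, not a proposed schema),
> the four elements above might be captured as simply as:
> 
>     from dataclasses import dataclass, field
>     from enum import Enum
>     from typing import Optional
> 
>     class JobState(Enum):                 # 4) a simple, extensible state model
>         QUEUED = "queued"
>         RUNNING = "running"
>         DONE = "done"
> 
>     @dataclass
>     class JobConfiguration:               # 1) "local" job configuration
>         executable: str
>         arguments: list = field(default_factory=list)
>         working_directory: str = "."
>         environment: dict = field(default_factory=dict)
> 
>     @dataclass
>     class SubmissionPoint:                # 2) a named submission point, optionally a queue
>         endpoint_url: str
>         queue: Optional[str] = None
> 
>     class JobManager:                     # 3) core operations: submit, status, kill
>         def submit(self, where: SubmissionPoint, job: JobConfiguration) -> str:
>             # Must leave the job in a well-defined state even if the call fails.
>             raise NotImplementedError
> 
>         def status(self, job_id: str) -> JobState:
>             raise NotImplementedError
> 
>         def kill(self, job_id: str) -> None:
>             raise NotImplementedError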
> 
> 
> 
> So, one thing to do would be to agree that these are (or are not) the right
> four things that need to be defined and, if so, start to flesh these out in a
> way that supports the core use case but doesn't introduce assumptions that
> would preclude more complex use cases in the future.
> 
> 
> 
> 
> 
> Carl
> 
> 
> 
> ________________________________
> 
> From: owner-ogsa-wg at ggf.org [mailto:owner-ogsa-wg at ggf.org] On Behalf Of Marty
> Humphrey
> Sent: Tuesday, March 14, 2006 6:32 AM
> To: ogsa-wg at ggf.org
> Cc: 'Marvin Theimer'
> Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
> 
> 
> 
> Carl,
> 
> 
> 
> Your comments are very important. We would love to have your active
> participation in this effort. Your experience is, of course, matched by few!
> 
> 
> 
> I re-emphasize that this represents (my words, not anyone else's) "baby
> steps" that are necessary and important for the Grid community.  In my opinion,
> the biggest challenge will be to fight the urge to expand the scope beyond a
> small size. You cannot ignore the possibility that the GGF has NOT made as
> much progress as it should have to date. Furthermore, one plausible
> explanation is that the scope is too grand.
> 
> 
> 
> -- Marty
> 
> 
> 
> 
> 
> ________________________________
> 
> From: owner-ogsa-wg at ggf.org [mailto:owner-ogsa-wg at ggf.org] On Behalf Of Carl
> Kesselman
> Sent: Tuesday, March 14, 2006 8:47 AM
> To: Marvin Theimer; Ian Foster; ogsa-wg at ggf.org
> Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
> 
> 
> 
> Hi,
> 
> 
> 
> While I have no wish to engage in the "what is a Grid" argument, there are some
> elements of your base use case that I would be concerned about.  Specifically,
> the assumption that the submission is into a "local cluster" on which there is an
> existing account may lead one to a solution that does not generalize to the
> case of submission across autonomous policy domains.  I would also argue that
> ignoring issues of fault tolerance from the beginning is problematic.  One must
> at least design operations that are restartable (for example, at-most-once
> submission semantics).
> 
> 
> 
> I would finally suggest that while examining existing job scheduling systems is
> a good thing to do, we should also examine existing remote submission systems
> (dare I say Grid systems).  The basic HPC use case is one in which there is a
> significant amount of implementation and usage experience.
> 
> 
> 
> Thanks,
> 
> 
> Carl
> 
> 
> 
> 
> 
> ________________________________
> 
> From: owner-ogsa-wg at ggf.org [mailto:owner-ogsa-wg at ggf.org] On Behalf Of Marvin
> Theimer
> Sent: Monday, March 13, 2006 2:42 PM
> To: Ian Foster; ogsa-wg at ggf.org
> Cc: Marvin Theimer
> Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
> 
> 
> 
> Hi;
> 
> 
> 
> Ian, you are correct that I view job submission to a cluster as being one of
> the simplest, and hence most basic, HPC use cases to start with. Or, to be
> slightly more general, I view job submission to a "black box" that can run jobs
> (be it a cluster, an SMP, an SGI NUMA machine, or what-have-you) as being the
> simplest and hence most basic HPC use case to start with.  The key distinction
> for me is that the internals of the "box" are for the most part not visible to
> the client, at least as far as submitting and running compute jobs is
> concerned.  There may well be a separate interface for dealing with things
> like system management, but I want to explicitly separate those things out in
> order to allow for use of "boxes" that might be managed by proprietary means or
> by means obeying standards that a particular job submission client is
> unfamiliar with.
> 
> 
> 
> I think the use case that Ravi Subramaniam posted to this mailing list back on
> 2/17 is a good one to start a discussion around.  However, I'd like to present
> it from a different point-of-view than he did. The manner in which the use
> case is currently presented emphasizes all the capabilities and services
> needed to handle the fully general case of submitting a batch job to a
> computing utility/service.  That's a great way of producing a taxonomy against
> which any given system or design can be compared to see what it has to offer.
> I would argue that the next step is to ask what's the simplest subset that
> represents a useful system/design and how should one categorize the various
> capabilities and services he has identified so as to arrive at meaningful
> components that can be selectively used to obtain progressively more capable
> systems.
> 
> 
> 
> Another useful exercise to do is to examine existing job scheduling systems in
> order to understand what they provide.  Since in the real world we will have
> to deal with the legacy of existing systems it will be important to understand
> how they relate to the use cases we explore.  In the same vein, it will be
> important to take into account and understand other existing infrastructures
> that people use that are related to HPC use cases.  I'm thinking of things like
> security infrastructures, directory services, and so forth.  From the
> point-of-view of managing complexity and reducing total-cost-of-ownership, it
> will be important to understand the extent to which existing infrastructure
> and services can be reused rather than reinvented.
> 
> 
> 
> To kick off a discussion around the topic of a minimalist HPC use case, I
> present a straw man description of such below and then present a first attempt
> at categorizing various areas of extension.  The categorization of extension
> areas is not meant to be complete or even all that carefully thought-out as
> far as componentization boundaries are concerned; it is merely meant to be a
> first contribution to get the discussion going.
> 
> 
> 
> A basic HPC use case: Compute cluster embedded within an organization.
> 
> ·     This is your basic batch job scheduling scenario.  Only a very basic
> state transition diagram is visible to the client, with the following states
> for a job: queued, running, finished.  Additional states -- and associated
> state transition request operations and functionality -- are not supported.
> Examples of additional states and associated functionality include suspension
> of jobs and migration of jobs.
> 
> ·     Only "standard" resources can be described, for example: number of
> cpus/nodes needed, memory requirements, disk requirements, etc.  (think
> resources that are describable by JSDL).
> 
> ·     Once a job has been submitted it can be cancelled, but its resource
> requests can't be modified.
> 
> ·     A distributed file system is accessible from client desktop machines and
> client file servers, as well as compute nodes of the compute cluster.  This
> implies that no data staging is required, that programs can be (for the most
> part) executed from existing file system locations, and that no program
> "provisioning" is required (since you can execute them from wherever they are
> already installed).  Thus in this use case all data transfer and program
> installation operations are the responsibility of the user.
> 
> ·     Users already have accounts within the existing security infrastructure
> (e.g. Kerberos).  They would like to use these and not have to create/manage
> additional authentication/authorization credentials (at least at the level
> that is visible to them).
> 
> ·     The job scheduling service resides at a well-known network name and it
> is aware of the compute cluster and its resources by "private" means (e.g. it
> runs on the head node of the cluster and employs private means to monitor and
> control the resources of the cluster).  This implies that there is no need for
> any sort of directory services for finding the compute cluster or the
> resources it represents other than basic DNS.
> 
> ·     Compute cluster system management is opaque to users and is the concern
> of the compute cluster's owners.  This implies that system management is not
> part of the compute cluster's public job scheduling interface.  This also
> implies that there is no need for a logging interface to the service.  I
> assume that application-level logging can be done by means of libraries that
> write to client files; i.e. that there is no need for any sort of special
> system support for logging.
> 
> ·     A simple polling-based interface is the simplest form of interface to
> something like a job scheduling service (a small client-side sketch of such a
> polling loop appears after this list).  However, a simple call-back
> notification interface is a very useful addition that potentially provides
> substantial performance benefits, since it can avoid a great deal of
> unnecessary network traffic.  Only job state changes result in notification
> messages.
> 
> ·     There are no notions of fault tolerance. Jobs that fail must be
> resubmitted by the client.  Neither the cluster head node nor its compute
> nodes are fault tolerant.  I do expect the client software to return an
> indication of failure-due-system-fault when appropriate.  (Note that this may
> also occur when things like network partitions occur.)
> 
> ·     One does need some notion of how to deal with orphaned resources and 
> jobs.  The notion of job lifetime and post-expiration garbage collection is a 
> natural approach here.
> 
> ·     The scheduling service provides a fixed set of scheduling policies, with 
> only a few basic choices (or maybe even just one), such as FIFO or 
> round-robin.  There is no notion, in general, of SLAs (which are a form of 
> scheduling policy).
> 
> ·     Enough information must be returned to the client when a job finishes to 
> enable basic accounting functionality.  This means things like total 
> wall-clock time the job ran and a summary of resources used.  There is not a 
> need for the interface to support any sort of grouping of accounting 
> information.  That is, jobs do not need to be associated with projects, 
> groups, or other accounting entities and the job scheduling service is not 
> responsible for tracking accounting information across such entities.  As long 
> as basic resource utilization information is returnable for each job, 
> accounting can be done externally to the job scheduling service.  I do assume 
> that jobs can be uniquely identified by some means and can be uniquely 
> associated with some principal entity existing in the overall system, such as 
> a user name.
> 
> ·     Just as there is no notion of requiring the job scheduling service to 
> track any but the most basic job-level accounting information, there is no 
> notion of the service enforcing quotas on jobs.
> 
> ·     Although it is generally useful to separate the notions of resource 
> reservation from resource usage (e.g. to enable interactive and debugging use 
> of resources), it is not a necessity for the most basic of job scheduling 
> services. 
> 
> ·     There is no notion of tying multiple jobs together, either to support 
> things like dependency graphs or to support things like workflows.  Such 
> capabilities must be implemented by clients of the job scheduling service.
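> 
> To illustrate how little a client needs in this base case, here is a minimal
> polling-style sketch (Python, hypothetical API, not a proposed interface):
> submit, poll the three visible states, and collect enough per-job usage
> information for external accounting.
> 
>     import time
> 
>     def run_to_completion(scheduler, job_description, poll_interval_s=30):
>         # Hypothetical client for the straw-man base case described above.
>         job_id = scheduler.submit(job_description)
>         while True:
>             info = scheduler.status(job_id)   # state is queued, running, or finished
>             if info["state"] == "finished":
>                 # enough per-job data for accounting done outside the service
>                 return {"job_id": job_id,
>                         "exit_code": info.get("exit_code"),
>                         "wall_clock_s": info.get("wall_clock_s")}
>             if info["state"] not in ("queued", "running"):
>                 raise RuntimeError("unexpected job state: " + str(info["state"]))
>             time.sleep(poll_interval_s)       # a callback interface would avoid this traffic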
> 
> 
> 
> Interesting extension areas:
> 
> ·      Additional scheduling policies
> 
> o     Weighted fair-share, ...
> 
> o     Multiple queues
> 
> o     SLAs
> 
> o     ...
> 
> ·      Extended resource descriptions
> 
> o     Additional resource types, such as GPUs
> 
> o     Additional types of compute resources, such as desktop computers
> 
> o     Condor-style class ads
> 
> ·      Extended job descriptions (as returned to requesting clients and sys 
> admins)
> 
> ·      Additional classes of security credentials
> 
> ·      Reservations separated from execution
> 
> o     Enabling interactive and debugging jobs
> 
> o     Support for multiple competing schedulers (incl. desktop cycle stealing 
> and market-based approaches to scheduling compute resources)
> 
> ·      Ability to modify jobs during their existence
> 
> ·      Fault tolerance
> 
> o     Automatic rescheduling of jobs that failed due to system faults
> 
> o     Highly available resources:  This is partly a policy statement by a 
> scheduling service about its characteristics and partly the ability to rebind 
> clients to migrated service endpoints
> 
> ·      Extended state transition diagrams and associated functionalities
> 
> o     Job suspension
> 
> o     Job migration
> 
> o     ...
> 
> ·      Accounting & quotas
> 
> ·      Operating on arrays of jobs
> 
> ·      Meta-schedulers, multiple schedulers, and ecologies and hierarchies of 
> multiple schedulers
> 
> o     Meta-schedulers
> 
> ·      Hierarchical job scheduling with a meta-scheduler as the only entry 
> point; forwarding jobs to the meta-scheduler from other subsidiary schedulers
> 
> o     Condor-style matchmaking
> 
> ·      Directory services
> 
> o     Using existing directory services
> 
> o     Abstract directory service interface(s)
> 
> ·      Data transfer topics
> 
> o     Application data staging
> 
> ·      Naming
> 
> ·      Efficiency
> 
> ·      Convenience
> 
> ·      Cleanup
> 
> o     Program staging/provisioning
> 
> ·      Description
> 
> ·      Installation
> 
> ·      Cleanup
> 
> 
> 
> 
> 
> Marvin.
> 
> 
> 
> ________________________________
> 
> From: Ian Foster [mailto:foster at mcs.anl.gov]
> Sent: Monday, February 20, 2006 9:20 AM
> To: Marvin Theimer; ogsa-wg at ggf.org
> Cc: Marvin Theimer; Savas Parastatidis; Tony Hey; Marty Humphrey; 
> gcf at grids.ucs.indiana.edu
> Subject: Re: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
> 
> 
> 
> Dear All:
> 
> The most important thing to understand at this point (IMHO) is the scope of 
> this "HPC use case," as this will determine just how minimal we can be.
> 
> I get the impression that the principal goal may be "job submission to a 
> cluster." Is that correct? How do we start to circumscribe the scope more 
> explicitly?
> 
> Ian.
> 
> 
> 
> At 05:45 AM 2/16/2006 -0800, Marvin Theimer wrote:
> 
> Enclosed is a paper that advocates an additional set of activities that the 
> authors believe that the OGSA working groups should engage in.
> 
> 
> 
> Broadly speaking, the OGSA and related working groups are already doing a 
> bunch of important things:
> 
> ·         There is broad exploration of the big picture, including enumeration 
> of use cases, taxonomy of areas, identification of research issues, etc.
> 
> ·         There is work going on in each of the horizontal areas that have 
> been identified, such as EMS, data services, etc.
> 
> ·         There is work going on around individual specifications, such as
> BES, JSDL, etc.
> 
> 
> 
> Given that individual specifications are beginning to come to fruition, the 
> authors believe it is time to also start defining vertical "profiles" that
> precisely describe how groups of individual specifications should be employed 
> to implement specific use cases in an interoperable manner.  The authors also 
> believe that the process of defining these profiles offers an opportunity to 
> close the design loopby relating the various on-going protocol and standards 
> efforts back to the use cases in a very concrete manner.  This provides an 
> end-to-end setting in which to identify holes and issues that might require 
> additional protocols and/or (incremental) changes to existing protocols.  The 
> paper introduces both the general notion of doing focused "vertical design
> efforts" and then focuses on a specific vertical design effort, namely a minimal
> HPC design. 
> 
> 
> 
> The paper derives a specific HPC design in a "first principles" manner since the
> authors believe that this increases the chances of identifying issues.  As a 
> consequence, existing specifications and the activities of existing working 
> groups are not mentioned and this paper is not an attempt to actually define a 
> specifications profile.  Also, the absence of references to existing work is 
> not meant to imply that such work is in any way irrelevant or inappropriate.  
> The paper should be viewed as a first abstract attempt to propose a new kind 
> of activity within OGSA.  The expectation is that future open discussions and 
> publications will explore the concrete details of such a proposal.
> 
> 
> 
> This paper was recently sent to a few key individuals in order to get feedback 
> from them before submitting it to the wider GGF community. Unfortunately that 
> process took longer than intended and some members of the community may have 
> already seen a copy of the paper without knowing the context within which it was
> written.  This email should hopefully dispel any misconceptions that may have 
> occurred.
> 
> 
> 
> For those people who will be around for the F2F meetings on Friday, Marvin
> Theimer will be giving a talk on the contents of this paper at a time and 
> place to be announced.
> 
> 
> 
> Marvin Theimer, Savas Parastatidis, Tony Hey, Marty Humphrey, Geoffrey Fox
> 
> 
> 
> _______________________________________________________________
> Ian Foster                   www.mcs.anl.gov/~foster 
> <http://www.mcs.anl.gov/~foster> 
> Math & Computer Science Div.  Dept of Computer Science
> Argonne National Laboratory   The University of Chicago   
> Argonne, IL 60439, U.S.A.     Chicago, IL 60637, U.S.A.
> Tel: 630 252 4619            Fax: 630 252 1997
>         Globus Alliance, www.globus.org <http://www.globus.org/> 
> <http://www.globus.org/>
> 
> 



