[Pgi-wg] OGF PGI Session 2 and 3 at OGF 33 on Wednesday 21 September - Draft Minutes

Wed Sep 21 13:11:10 CDT 2011

Steve,

Concerning OGF PGI Session 2 and 3 at OGF 33 on Wednesday 21 September :

Thank you very much for your 'Draft Minutes' :  They are readable, 
understandable and quite accurate.

I suggest the following improvement for the beginning of Session 3:

Can you replace  "If cancelled -> [end], it's purged." by
"[end] means purged.  Add failed -> pending (automatic resubmission if 
requested inside JSDL)."

Thank you in advance.

Best regards

-----------------------------------------------------
Etienne URBAH         LAL, Univ Paris-Sud, IN2P3/CNRS
                       Bat 200   91898 ORSAY    France
Tel: +33 1 64 46 84 87      Skype: etienne.urbah
Mob: +33 6 22 30 53 27      mailto:urbah at lal.in2p3.fr
-----------------------------------------------------

On Wed, 21/09/2011 15:28, Steve Crouch wrote:
> 14:00-15:30 PGI Session 3
> -------------------------
>
> Chair: Andrew Grimshaw
> Minutes: Steve Crouch
>
> Attendance:
>
> Name shortcuts:
>    AG - Andrew Grimshaw
>    EU - Etienne Urbah
>    DK - Daniel Katz
>    OS - Oxana Smirnova
>    JPN - JP Navarro
>    KS - Katsushige Saga
>    SC - Steve Crouch
>    DM - David Meredith
>
> New actions:
>
> Minutes
>
> Discussion of state model - how can we accommodate additional states?
>
> EU: doc number is 16306 under OGSA-BES WG on GridForge.
>
> [Page 31, fig 6]
>
> EU: take same states as original BES, add in transitions. Add purge
> finished ->  [end], add failure transition pending ->  failed.
Next line to be replaced :
> If cancelled ->  [end], it's purged.
[end] means purged.  Add failed -> pending (automatic resubmission if 
requested inside JSDL).
>
> AG: this is change in underlying state model.
>
> EU: could drop it, it exists on one system. If it doesn't fit, we drop it.
>
> EU: next diagram for held states [fig 7, pg 34]. Based on original BES,
> only added some transitions.
>
> AG: if specified to go into suspended state initially in JSDL, it starts
> in suspend until told to resume ('proceeding'). Transitions from
> suspended/proceeding to both cancelled and failed. Still need to add a
> transition. Automatic resubmission perhaps not put in.
>
> AG: other one is extended job state [referring to session slide 3] from
> RENKEI.
>
> KS: EU covered this in more detail yesterday. (Discussion of incoming
> queues for pre-processing).
>
> JPN: other terms meaning same thing: pre-processing:pending,
> pre-processing:complete.
>
> AG: are we trying to go to a different state model? We've handled this as
> specified in the spec, which has substates dealing with staging in and
> out. e.g. staging-in is pre-processing state. Also running:executing is
> a substate. Didn't we want to stay with the basic state model and extend
> via substates? This state model is extended.
>
> EU: not compatible with basic state model?
>
> KS: mapping exists from this to basic state model substates [next slide].
>
> AG: good profile some of these substates - we need to know what these
> substates actually mean. Are these proposed JSDL extensions?
>
> KS: this implementation is just for testing. We don't care deeply about
> extensions.
>
> AG: you name a substate where you want it to stop?
>
> KS: yes.
>
> AG: thought something similar to EU's idea. In original state model, two
> places for helds (need pre and post): in running (as EU did), in pending,
> and terminated held/finished held failed/held. But this is subsumed into
> EU's.
>     Do we want to give advice for how to order additional substates? Do
> we recommend these in proceeding or suspended? If we use substates as in
> data staging, execution in running where to put the helds?
>
> SM: better to have a named state for data staging pre/post. Confusing
> for client otherwise.
>
> AG: optional client initiated staging hold. Letting client do whatever
> they want to do. Don't want to hold for all purposes.
>
> Mike: you know it's held for pre-processing.
>
> AG: what does delegated refer to? Delegated queue for queued, delegated
> running for running.
>
> SM: want to have pre and post for delegated.
>
> EU: names not well chosen.
>
> AG: suggest change names for some of these states.
>
> SM: delegated - incoming is queued.
>
> AG: delegated should be processing. Delegated:incoming should be
> delegated:queued-in. Processing:outgoing (renamed) not visible to
> outside, used as transitory state.
>
> Mike: no error around it, it's mandatory right?
>
> AG: if somebody were to subscribe to notification about this, ...
>
> AG: processing:running, processing:hold.
>     If no queueing system, just executes - can it go directly to running.
>
> SM: debugging problematic. Held not implemented for all middlewares.
>
> AG: not sure I like this (confusion with states of running jobs for
> held/running for executing jobs).
>
> SM: have you seen EMI execution service? Can take some of these from there.
>
> AG: keep same 5 states.
>
> SM: have attributes for substates.
>
> AG: need to agree on these attributes. To avoid having to guess on
> substate meanings.
>
> SM: many reasons for including more info in substates e.g. errors -
> storage is down, etc.
>
> EU: doc - 10th March on pgi list. PGI execution specification.
>
> [Discussion about pg 20 and state definitions - pre-processing and
> processing]
>
> AG: mapping of these states to our discussed states.
>
> SM discusses how this mapping works.
>
> AG: referring to current state model...
>     Agreed yesterday not to change state model otherwise, have to change
> spec. Profile substates, don't have to.
>
> SM: too simple to represent all things. Problems with users to
> distinguish clearly with states.
>
> AG: use [substates] and no pushback from users.
>
> Mike: but we'd have to change the spec.
>
> AG: want to expedite the process. If it can be done in context of the
> existing state model, it vastly simplifies things unless it's too ugly.
>
> SM having multiple states.
>
> AG: how do we feel about this.
>
> SM: many substates of failed - how do we identify that?
>
> SC: this would break state model, new model, should use existing spec
> where possible.
>
> SM: make concept smoother - define running, focus on pre/main/post
> processing.
>
> AG: we went around the state discussions before, with substates you can
> model everything.
>     Doing a whole new spec would take longer.
>
> SM: one more possible point - model from EU/KS they have this community
> requirement.
>
> AG: NAREGI state model - all these collapse into running.
>
> SM: limiting, should find a better way.
>
> AG: can't go backwards and change the document.
>
> SM: staging as substate - what does this mean?
>
> Mike: it may need to change anyway. A cleaner way would be to redo this.
>
> SM: substates of failed, terminate...
>
> AG: legitimate to do this in model.
>
> MM: problem we found was no additional transition to failure
> state. It can occur at any moment. We introduced artificial transition
> to failure state.
>     Afraid about too much substating of running state - other substates
> have more informative meaning e.g. running: data-staging.
>
> AG: if we do a new version of the spec base it on existing spec, and
> just change the bits we need e.g. FactoryAttributes, vector operations,
> but keep it simple, otherwise it will escalate - genie out of the bottle.
>     SM is right - various people have substated this in very similar ways
> in implementations - substates in running. This is fine, can even have k
> layers deep of substates. What are transitions introduced? To error states?
>
> AG: too much of a decision to reach today. If we redo the spec, we
> timebox it. If not, we will profile. Otherwise it will get into a
> problem of arguing about stuff as in PGI.
>
> JPN: use cases for these requirements?
>
> AG: held states got us here. They wanted client-initiated held states
> for data staging, then can say 'go'. Then be able to stop it afterwards.
> No requirement for us, since data staging is automated, or done manually.
>
> JPN: requirement for all BES to support these states?
>
> AG: specified in JSDL, if you don't support it, you throw a fault.
>
> EU: not in favour of manual staging in complicated state model. Have
> documented it in documentation so people can understand these
> complexities. Advocate existing BES states, just adding few transitions
> that are missing e.g. pending ->  failed.
>
> AG: pending ->  failed could be done as an addendum e.g. it was
> forgotten, ok. Adding new states is different.
>
> SM: have given this requirement to PGI and EMI.
>
> EU: have a requirement to users. They should think about workflow model
> before trying to send jobs.
>
> SM: our users are expecting this held, manual staging. We shouldn't
> perhaps introduce hold mechanism, just implement simple flags. But
> doesn't have to be supported.
>
> EU: do we have to implement new state model.
>
> SM: no. Different issue.
>
> EU: related to post/pre processing.
>
> AG: hold and manual can be substated away. SM wants to restate
> misleading substates.
>
> SM: e.g. running:queued.
>
> EU: just rename.
>
> DM: haven't heard a convincing argument for these pre/post processing
> substates...
>
> AG: it's ugly, a bit of ugly at a time. Term for this: technical debt.
> The easy path, some point in future clean it up, not today - fine with
> me! Not me who will have to implement.
>     SM from UNICORE. PGI and EMI had this as their requirements. PGI/WG
> had this as a requirement driven from gLite and ARC. But now UNICORE?
>
> SM: yes.
>
> AG: how is functionality different from telling them to copy data in first?
>
> SM: job starts, get EPR & storage service reference, use this to stage
> data into. Can upload files/directory into this, once done, start job
> physically.
>
> AG: 3 choices
>    1) We stay on profile only path, incorporate reqs fully later on
>    2) Bite bullet and do the spec rewrites
>    3) Addendum - additional profile to stick changes in only
>
> AG: addendum to add state ok, changing state space wouldn't be an
> addendum as such.
>     Other than UNICORE, others want to profile.
>     UNICORE - do it properly, rewrite.
> Need input from others - gLite, CREAM, SAGA, NorduGrid, etc. Need to get
> this approach right.
>
> SM: first priority data staging profiling ok, state model - secondary.
> Thirdly (personal) BES - meaning of substates from user's point of view.
>     Rename running as processing.
>
> AG: addendum?
>
> Mike: can we do all this in profiles?
>
> AG: don't think vectors can be done this way. Is it important? Not sure
> - may slide on important scale. Those that wanted it have walked away
> from the table.
>
> EU: in EDGI, we're implementing handling of vectors. Tell users if you
> want vector, you explain in separate file internally and generate
> internally as many jobs as necessary using existing client interfaces.
>
> AG: support in our client implementation by generating multiple internal
> JSDLs from a single one, returning single EPR. If you send list, we'll
> accept and start them all.
>     Param sweep spec never says what to return.
>
> SM: rename states.
>
> AG: perhaps not a bad idea. Notion of addendums to specs; very easily done.
>     a) transition from pending to failed.
>     b) rename running to processing.
>
> AG: can have conditional - support both running and processing (but they
> mean the same thing).
>     On to the substate model. Assuming processing, not running. Perhaps
> not enough time. Back and forth transitions are problematic, not sure
> it's way to go. Alternative way would be to have hold states within
> processing; not clear how existing imps that break down running into
> substates would handle this. Look at RENKEI's mapping.
>
> EU: see previous drawing.
>
> KS: we implement pre-processing/post-processing only for data staging.
> Maybe original PGI specification state model comes from strawman. Also
> doc says pre-post used for data staging.
>
> AG: think they are.
>
> KS: implement this state model. In our system we have workflow system,
> with this system, we can transfer data directory from and to computer
> resources. We implemented these states.
>
> EU: pre/post processing not only for manual but automatic data staging.
>
> AG: right.
>     Not strictly a requirement to support suspend/resume.
>
> AG: consensus to take running renamed as processing, change from
> running:stage-in/running:stage-out to new processing states.
>
> SM: how to do this resume in service?
>
> AG: discussed yesterday, new port type.
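
As a rough illustration of the Session 3 state model discussion (this sketch is not part of the minutes), the following Python encodes the five basic states together with the two addendum items AG listed: the pending -> failed transition, and "processing" accepted as another name for "running". The baseline transition table and all Python names here are assumptions made for illustration; they are not quoted from the BES specification.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    FINISHED = "finished"
    CANCELLED = "cancelled"
    FAILED = "failed"


# Assumed baseline transitions (illustrative only, not quoted from the spec).
BASE_TRANSITIONS = {
    State.PENDING: {State.RUNNING, State.CANCELLED},
    State.RUNNING: {State.FINISHED, State.CANCELLED, State.FAILED},
    State.FINISHED: set(),
    State.CANCELLED: set(),
    State.FAILED: set(),
}

# Addendum item (a) from the minutes: pending -> failed.
ADDENDUM_TRANSITIONS = {State.PENDING: {State.FAILED}}

# Addendum item (b): accept "processing" as another name for "running".
ALIASES = {"processing": State.RUNNING}


def parse_state(name: str) -> State:
    """Map a state name (or the 'processing' alias) to a State member."""
    key = name.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    return State(key)  # raises ValueError for unknown names


def can_transition(src: State, dst: State, with_addendum: bool = True) -> bool:
    """Check whether src -> dst is allowed, optionally including the addendum."""
    allowed = set(BASE_TRANSITIONS[src])
    if with_addendum:
        allowed |= ADDENDUM_TRANSITIONS.get(src, set())
    return dst in allowed
```

Keeping the addendum in a separate table mirrors the point made in the discussion: adding a forgotten transition or an alias leaves the five states untouched, whereas new states would mean editing the base table itself.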

On Wed, 21/09/2011 12:30, Steve Crouch wrote:
> 11:00-12:30 PGI Session 2
> -------------------------
>
> Chair: Andrew Grimshaw
> Minutes: Steve Crouch
>
> Attendance: ~16
>
> Name shortcuts:
>   AG - Andrew Grimshaw
>   EU - Etienne Urbah
>   DK - Daniel Katz
>   OS - Oxana Smirnova
>   JPN - JP Navarro
>   KS - Katsushige Saga
>   SC - Steve Crouch
>
> New actions:
>
> [BS] Write-up proposal for including benchmarking requirements in JSDL
> [EU/OS] Write-up proposal for using benchmarks as measurement units in
> terms of the resource requirements
> [SC/AG] Another file staging profile++
>
> Minutes:
>
> Summary of yesterday's session on agreement of moving forward with
> existing specs, profiling/updating individual ones and moving towards an
> overall profile which brings these together. See AG's session
> presentation for more details.
>
> BES++
>
> [See AG's session presentation for more details. List of proposals for
> last session and open questions.]
>
> JSDL
>
> Which XML rendering to use for GLUE2 embedded in JSDL, hold before/after
> execution. Any others?
>
> AM: rumour of final version of XML rendering from GLUE group is
> imminent. [Is it flat or hierarchical?] Can use sub-elements from GLUE2
> and include smaller parts.
>
> AG: representative from GLUE?
>
> AM: JP Navarro good candidate.
>
> AG: JSDL's session poorly attended.
>
> ActivityManagement Port Type
>
> i.e. suspend(), resume(), get_status(), terminate()
>
> AG: Idea being that you could have hold states prior to data staging.
> With this port type, should have other sensible management activities e.g.
> terminate().
>
> EU: a 'purge' exists in PGI requirements.
>
> AG: may get rid of info about the job.
>
> 'Activity Integration Profile'
>
> AG: profiling for this management port type. e.g. RNS 1.1, OGSA-ByteIO
> 1.0 (yesterday's session).
>
> JPN: ad-hoc GLUE2 session - need understanding of how JSDL wants to use
> GLUE, to address use cases. 1) allow GLUE2 subelement rendering in other
> schemas e.g. JSDL; 2) understanding PGI use cases. GLUE2 describes
> configuration, not matching rules. JSDL may want a, b, c but ranges
> cannot be described in GLUE2, it's static description.
>
> AG: 1) have BES resources use GLUE2 to describe itself; 2) specify job
> requirements. In JSDL, can specify ranges?
>
> AM: how is this matched with the two elements?
>
> AG: shortcoming of JSDL - we need to do OR's, but more really guarded
> statements.
>    We originally want players in JSDL/BES/GLUE to present in these sessions.
>    Do we want to go depth-first? i.e. JSDL first. BES appears
> straightforward, EU mentioned one way yesterday. Decision on substate
> model? Then can profile out.
>
> EU: can KS provide links to their state model?
>
> KS: will provide this this afternoon.
>
> OS: user likes to describe job assuming resources offer different
> configurations, but discovery and matching already done, but when it
> gets to resource, it's a single item. Do we need logic instructions?
>
> AG: JSDL that user submits and what ends up in BES can be different.
> Important to have consensus on how this is described in JSDL, even if a
> broker exists in the middle dealing with this translation.
>    User wants to run app, don't care where. May be diff staging thing to
> do. With ORs? Better with guarded statements with predicates. Can decide
> based on these. This is better.
>    Guarded statements presented in CSP, incorporated into lots of
> languages in 80's.
>
> OS: implementors - don't know how to implement discovery with logic,
> through workflows, or deterministic model. Either end of execution
> service, or in client.
>
> AG: request things that have been modified in JSDL from groups e.g. UNICORE.
>
> OS: limit JSDL to what arrives at BES. User tasks can describe whatever
> they like.
>
> AG: not a big fan. Would like to take JSDL docs and throw them to
> European BES'. Not necessary to define these things, but will hurt
> interop, won't be able to use others' JSDLs and vice versa.
>
> OS: broker?
>
> AG: EMS can take in documents and transform them along the way.
>
> EU: Oxana - do you see BES as a site service, or a machine service? ...
>
> AG: would like as symmetric as possible. A site or machine service, but
> can be entire set of sites or machines. I like recursive layer not being
> part of design.
>
> AG: [on guarded statements]
>
> Select
>   (condition expression): action
>   (condition expression): action
>   ...
>
> Separate our resource req section, but repeating groups of that. For
> each one have job description, staging elements. On diff machines, want
> to execute diff apps and move diff data. i.e. diff actions dependent on
> conditions. Too much of a ++ or BES? We want to have small steps really,
> instead have smaller pieces. Leave until later?
>
> AG: postpone for now, not much enthusiasm for tackling this.
>
> EU: inside JSDL, have POSIXApplication. Would like to deprecate it. In
> same element, it describes subelements on environment, and others
> belonging to resources. e.g. input/output/error/working directory -
> really belong in env of execution. Already sep blocks for env and
> resources in JSDL.
>
> Mike: need to be careful.
>
> BS: +1 on deprecating limits.
>
> Mike: places where we specify OS, GLUE for this. We want to still allow
> JSDL to specify this, and profile to specify this as GLUE2 attributes.
>
> EU: but conflict of attributes ...
>
> Mike: it complicates the logic.
>
> AG: deprecate - not something to count on in future. Take it out of
> follow-on spec. If moving forward, can take these out, just discourage
> use in future.
>    Go thru JSDL issues and put them on table. Maybe not make decision
> yet, but just to list them. BES things fairly well understood. Just JSDL
> for now.
>    For BES++, what should a BES return on a param sweep [added to slides].
>
> EU: param sweep is dynamic creation of jobs. Not possible to return
> fixed list of EPRs since created ... ?
>
> AG: cardinality of set known at request time.
>
> EU: for bulk requests yes, but param sweep number is much higher.
>
> AG: need to get to what a BES should return.
>
> EU: jobs created one after the other in dynamic way. Supposed that there
> are too many to list them.
>
> AG: our imp generates EPRs from param sweep at the time.
>
> Mike: timeout situation?
>
> AG: would like to push this back.
>
> EU: propose to replace param sweep with vector or bulk.
>
> AG: disagree strongly.
>
> EU: param sweep is different than vector/bulk. Vector/bulk are finite
> lists = finite list of EPRs. Param sweep, not finite, but algorithm to
> dynamically create these.
>
> AG: no conditional statements in algorithm, deterministic. Size can be
> worked out. Open to discussing param sweep on call, but esoteric for
> now, perhaps not useful.
>    Another JSDL thing (Mark Morgan originally) found resource factory
> attributes not enough, sometimes want to match on a property, exec host
> have requirement that job has attribute. e.g. Kraken, Crays require
> statically linked binaries. Can't send dynamically linked program. App
> needs to say this about itself. So would like to have matching token
> which has these descriptions for matching on.
>
> EU: solved by GLUE?
>
> AG: in JSDL ability to describe wanting something, but not predefined
> e.g. supply as a token.
>
> EU: GLUE env entity.
>
> AG: can have strings? Corresponding element in the resource description?
>
> JPN: app env subelement of other GLUE2 elements, including resource
> descriptions.
>
> AG: could we do - host has BLAST 3.4 and app needs BLAST 3.4 for this.
> Similarly, relationship for specifying apps as statically linked.
>
> JPN: statically linked - haven't heard of this. Perhaps turn this into
> GLUE2 requirements - not currently advertised but could be.
>
> EU: in GLUE - app name, app version, state of app, can add static desc.
> in 'Other info'.
>
> AG: just a string? e.g. accepts VISA? [Yes] we could include this as we go.
>
> BS: difficult - our main thing is having extensible resource model, to
> have something that allows resources defined by us (not in GLUE/JSDL
> days) to stay extensible for things we don't know yet. Extension -
> key/value pairs.
>
> AG: can add key/value pairs to GLUE2? JPN?
>
> JPN: extension elements all over GLUE - the way to do it. Don't know if
> formatted as key/value pairs, perhaps strings.
>
> AG: hinge upon things expressed in JSDL. With GLUE, you must have what
> BES must support, and features for matching.
>    We'll be guinea pigs.
>
> EU: good to use benchmarks as measurement units.
>
> BS: scalable requirements ... e.g. 2000 CPU hours as benchmark value.
>
> AG: normalisation of CPU hours based on benchmark?
>
> BS: impossible or very very hard. Benchmark very approximate.
>
> AG: scaling factor depends on app.
>
> OS: benchmarks may be related to GLUE2, but not JSDL - a valid unit?
>
> EU: don't like normalisation or scaling factor. Don't use scalable time.
> Want to specify 2000 SPECints; it means something.
>
> EG: IEEE computer (Freund) - how do we do scaling based on affinities.
> Very app dependent. you suggest we just use specint family?
>
> OS: any.
>
> AG: here's a table - this resource has this factor.
>
> EU: in GLUE2, can specify.
>
> AG: can you point to a benchmark and use it?
>
> DK: can go into long list, but best to think of a couple of parameters.
>
> AG: really want a performance estimator. Problem is either use
> simplistic methods, or complicated models. Fine with what ever is desired.
>
> DK: needs to be something app env will provide. More complicated, less
> likely.
>
> AM: allowing users to submit benchmarks, nothing could match it.
>
> AG: [removing normalisation and benchmarks on slides] - want to specify
> benchs in terms of what we want. It makes sense.
>
> EU: yes.
>
> AG: need to come up with extensible list of benchmarks that each have
> float/int associated with them.
>
> EU: already in GLUE.
>
> AG: BES endpoint manager advertises that they are e.g. 27 in this
> benchmark. Request says at least 30.
>    Do we say time I want is based on this?
>
> DK: 1) using benchmarks nice, need to say what these are - painful, poss
> worth it; 2) what do you do with it?
>
> AG: break this up into separate standalone profile in JSDL e.g. JSDL
> Benchmark Profile?
>
> OS: depends on application.
>
> DK: no, depend on something generic.
>
> AG: on benchmark. If extensible, have BLAST benchmark.
>
> EU: GLUE bogomips, specint, ... it's extensible.
>
> AG: these sorts of benchmarks typically function of machine.
>
> EU: open enumeration equivalent of string.
>
> AG: this machine, running this problem size, ... could be all over the
> map for tightly coupled systems. Suggest ask those that want it to
> provide proposal - what would it look like for JSDL and for BES Factory
> Attributes for how it would be characterised. At least 2 groups
> interested - no reason not to profile it, but not as a MUST for resource
> providers.
>
> DK: do res provs have obligation to provide some benchmarks?
>
> AG: not mandatory, but profile in terms of how it _can_ be specified.
>
> DK: interesting to know how it would be used.
>
> AG: yes, people who care about this write this up. e.g. profile, we have
> use cases, in JSDL do this as a profile and not mandatory in JSDL.
>
> EU: go to PGI reqs Wiki and JD20.
>
> BS: ideally have a schema for this, but prob not necessary. Have diff
> models for providing resources and requesting resources. Often not the
> same. ...
>
> AG: action item to write this up?
>
> Action: [BS] Write-up proposal for extensible resource model for systems
> we don't know yet - key/value pairs
>
> AG: leave guarded statements for now, but have agreed on many things
> [see session slides].
>    Want names to some of these things.
>
> Action: [EU/OS] Write-up proposal for using benchmarks as measurement
> units in terms of the resource requirements
>
> EU: should be able to describe diff data sources in sequential-try mode
> e.g. first fails, second fails, use third (it works).
>    Output file semantics not clear, don't discuss.
>
> AG: JSDL is declarative.
>
> EU: order doesn't matter.
>
> AG: shortcoming of JSDL from users - can't do wildcard staging,
> including differentiating between files and folders.
>    Right now we have File Staging extensions. Do a profile for
> additional file staging protocols.
>
> Action: [SC/AG] Another file staging profile++
>
> AG: related to wild card staging, may want to do -r recursive,
> inclusion/exclusion patterns.
>    How to prioritise - last 9 mins, get consensus on this. High, medium,
> low groups.
>
> [See AG's session slides for prioritisation list]
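
As a rough illustration of the benchmark discussion in Session 2 (this sketch is not part of the minutes), the following Python treats benchmarks as an open, string-keyed list of numeric values advertised by a resource and compares them with the minimum values in a request. The function name, the dictionary layout, and the rule that an unadvertised benchmark counts as a non-match are all assumptions made for illustration; nothing here is taken from the JSDL or GLUE2 schemas.

```python
def meets_benchmarks(advertised: dict, required: dict) -> bool:
    """Return True if every required benchmark is advertised at or above its minimum.

    Both arguments map a benchmark name (an open enumeration, i.e. any string)
    to a number. An empty `required` always matches, since the minutes treat
    benchmarks as optional rather than mandatory.
    """
    for name, minimum in required.items():
        value = advertised.get(name)
        # Assumption: a benchmark the resource does not advertise cannot match.
        if value is None or value < minimum:
            return False
    return True


# The example from the minutes: the endpoint advertises 27, the request says at least 30.
resource = {"specint": 27.0}
request = {"specint": 30.0}
print(meets_benchmarks(resource, request))  # prints False
```

Because the keys are plain strings, an application-specific benchmark (the minutes mention a BLAST benchmark) could be added without changing the matching logic.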
