[Pgi-wg] OGF PGI Session 1, 14:00-15:30 Draft Minutes

Steve Crouch s.crouch at omii.ac.uk
Wed Sep 21 04:14:14 CDT 2011


14:00-15:30 PGI Session 1
-------------------------

[These are draft minutes, please respond with revisions/comments from 
those present]

Project representation;
  - IGE Steve Crouch (SC)
  - SAGA Andre Merzky (AM)
  - EMI Jon Kiier Nielsen (JKN)
  - XSEDE, Genesis II - A Grimshaw (AG)
  - RENKEI Katzushige Saga (KS)
  - UNICORE (who was representative here?)
  - QCG-Computing Marius Mamonski (MM)

[Is this list complete?]

Minutes: Steve Crouch

New actions:

[AG] will write a description on the differences between these two 
approaches (new port type, or on EPR)

Minutes

AG: One big spec or multiple specs/iterations? What should we do?

SC: Not one big spec. 200 reqs in one spec would not be scalable in 
terms of development and would be very impractical. AG agrees - just too 
difficult.

DW: we don't have all the spec experts required in the room to put into 
this. Not enough representation. We shouldn't do this.

DK: any other representatives that should be here?

AG: 5th F2F PGI meeting - stakeholders have come in and out over time. 
Some big groups (interested in a big spec) have been represented here 
quite a bit.

AG: do we start with what we have, or start again? We can leverage our 
experiences and what is already there if we go with what we have.

AM: should we split the work into various areas with various specs?

AG: agreement to use GLUE2 resource properties, replace in JSDL resource 
properties - Etienne. BES: PGI wanted state model changes (primary thing 
wanted), vector operations (I see as trivial). Job management - JSDL for 
description, BES for management. GridFTP, Byte/IO - these play into 
how the data is managed in JSDL - a secondary concern.

EU: HPC file staging - ppl interested in how to specify file staging.

AG: we want to do this; do we change JSDL for it or not? Also the SPMD, 
ParamSweep profile.

EU: we should reuse where possible, where no problems exist with doing so.

AG: big difference between profiling and extending JSDL, and doing 
something new (new namespaces, whole doc, etc). Do tweaks and profiles 
where possible.

AG: look at using existing specs, and building on these, rather than 
restarting - a consensus?

DW: EGI virtualisation TF is all about profiling existing specs.

Alan E Sill (AES): speaking for the security group - it fully intends to 
take what works, what exists, and work fast using this approach.

[Consensus agreed on doing this, i.e. building on existing specs]

AG: PGI in Catania - we can agree on func bits forever, but security has 
to work too. Don't drop security on the floor.

JJ: need to find right people to engage with the security ???? group, to 
develop in areas that overlap with PGI.

AES: PGI needs to state clearly what needs to be done to handoff to 
other groups for the work.

AG: take existing PGI work, as inputs to the process. Authn/z/delegation 
pass to security group. We have 2 profiles for delegation, for each 
style of delegation: message-level delegation (SAML assertions) and 
transport layer. Problems in PGI - not much compromise possible with 
security, people wouldn't change transport level and vice versa (hence 
two profiles).

AES: all authz implementations use SAML or XACML.

AG: not always - e.g. Genesis II uses ACLs.

AES/JJ: profile work to decide on SAML assertion for VO-using sites whose 
s/w internally does its own thing.

AG: transport of auth mechanism already profiled in WS-I BSP. Semantics 
need to be profiled. Orthogonal for Access Control policy; that's 
completely site dependent.

DW: security not original focus, what's desired session outcome?

AG: decide whether to move forward on modifying existing specs - have consensus 
here. Continue along the JSDL/BES path. Or reject, and go another way. 
If we use BES - 2 approaches. 1) change state machine to meet 
requirement to hold jobs at various data-related stages; 2) history 
about BES - some historic bad decisions, should we revisit those 
decisions. You get EPR back, subsequent management done through factory 
port type - some thought one service. Re-examining this would mean 
changing this.

EU: three diagrams about this state model - is there a bottleneck on 
only one endpoint? BES issue.

DM: interested in param sweeping. Don't drop it.

DW: writing profiles.

AG: agree on work items for group.

DW: we should have experiences doc on interop work done so far?

AG: already written - SC.

AG:
  - data staging FSE+
  - param sweep
  - state model extensions (getstatus)
  - vector ops
  - JSDL changes

EU: many of these profiles already exist - should we build on them or 
make something different?

AG: motion to change JSDL to reflect JSDL group changes in resource 
profiling (through GLUE2).

DW: we need something coherent and whole, would like to see something 
that fits all things all together e.g. GLUE fits with BES fits with 
JSDL. With this top-level doc referencing other profile docs.

DM: some work needs to be done to move towards this position first.

DW: agreed.

AES: e.g. CDMI interesting as a spec from this standpoint.

DW: Virtualisation session - everything broken up into smaller parts. 
We keep revisiting this, but need to approach it sensibly so it all 
comes together again at the top.

MM: extension related to BES used in production(?) - how to deal with this?

EU/AG: PGI use cases and requirements analysis done last year - painful.

EU: moving forward, pick one of the above listed items.

AG: jsdl changes need to be done, really want to do data staging. BES 
without data staging is more or less useless. You need to understand 
protocols.

DM: can we bring up faults in JSDL data staging?

AG: yes, perhaps in next session?

AG: need new working group chairs decided in Salt Lake City - Michel 
Drescher, Andrew Grimshaw.

AG: what problems are people having with BES? We changed it to support 
RNS (small change).

MM: change - added things to factory attributes. Manual data staging, 
other changes.

AG: have manual staging as a separate interface.

Might want activity status achieved through one resource, but that forces 
one service to take all requests.

AG: leap into JSDL stuff (not good, JSDL have their own ideas), or leap 
into BES stuff - e.g. status requests through a single endpoint.

AM: should go through requirements made to PGI.

AG: for BES - manual data staging. Current state model says you can't 
make new states, only substates. PGI req from some members stated that 
job could be submitted, but in blocked state until explicit 'go' telling 
it to move into next state. Support through new port type.

AM: add a new port type - don't break it badly.

Some clients wouldn't know about suspend/resume substates.

EU: full suspend/resume state model is complicated.

AG: define new port type which defines what happens for each substate in 
BES. Circulate this idea. Have this as a compliance target in one of the 
uber-high level profiles.

DM: mentioned already in OGSA-DMI - would be good for us.

EU: existing state model of BES good enough to be profiled. But we need 
a valid state change from Pending to Failed.

AG: one other thing - to get to PGI use case, have it going from Pending 
to Running:Suspended initially. We would need to make a few more changes 
for this. JSDL might not like it, but we can put it in a JSDL element to 
state 'suspended'. But not a clean way since JSDL should only be a job 
description, not management.
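
[Illustrative sketch of the state model points above, in Python. The 
table names, the helper functions and the assumed starting set of 
transitions are hypothetical and chosen only for illustration; they are 
not taken from the BES spec text. It shows the two changes raised here 
(Pending to Failed, and Pending to Running:Suspended) and how a client 
that does not know about substates would still see a plain 'Running'.]

```python
# Hypothetical sketch only: state names follow the discussion, but the
# starting transition table is an assumption, not a copy of the BES spec.
ASSUMED_BASE = {
    "Pending": {"Running", "Cancelled"},
    "Running": {"Finished", "Failed", "Cancelled"},
    "Finished": set(),
    "Failed": set(),
    "Cancelled": set(),
}

# Copy the base table, then add the changes raised in the session.
DISCUSSED = {state: set(targets) for state, targets in ASSUMED_BASE.items()}
DISCUSSED["Pending"] |= {"Failed", "Running:Suspended"}
DISCUSSED["Running"] |= {"Running:Suspended"}
# Suspended is a substate of Running, not a new top-level state.
DISCUSSED["Running:Suspended"] = {"Running", "Failed", "Cancelled"}


def can_move(table, src, dst):
    """Return True if the table allows a transition from src to dst."""
    return dst in table.get(src, set())


def top_level(state):
    """State as seen by a client that ignores substates."""
    return state.split(":", 1)[0]


if __name__ == "__main__":
    print(can_move(ASSUMED_BASE, "Pending", "Failed"))
    print(can_move(DISCUSSED, "Pending", "Running:Suspended"))
    print(top_level("Running:Suspended"))
```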

AG: motion - we will use this basic state model, ask JSDL to include 
'start in this suspended state', or new port type that controls states a 
job is in.

DM: yes, because being able to suspend and resume at off peak times 
would be useful for OGSA-DMI.

AG: ok, we define a new port type with suspend and resume. It is 
resolved! It will have 'suspend' with an EPR (or a vector of EPRs), and 
similarly 'resume' with an EPR (or vector of EPRs).

EU: this relates back to the recent PGI requirements doc I wrote for 
these requirements.

AM: ppl will complain hold and suspend mean same thing.

AG: suspend is at BES level, hold gets put in JSDL.

EU: rename hold as suspend?

DW: it'll do whatever LRMS supports at back end. Which support hold?

EU/AG: if it can't do suspend, it faults.
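
[Illustrative sketch of the suspend/resume port type as resolved above, 
in Python. All names here (SuspendResumePort, NotSupportedFault, the 
use of plain strings as EPRs, the per-EPR result map) are hypothetical; 
no interface has been written yet. It assumes each operation accepts a 
single EPR or a vector of EPRs, and that the whole call faults when the 
back-end LRMS cannot suspend.]

```python
class NotSupportedFault(Exception):
    """Hypothetical fault raised when the back-end LRMS cannot suspend."""


class SuspendResumePort:
    """Sketch of a separate port type controlling suspend and resume."""

    def __init__(self, states, lrms_supports_suspend=True):
        # states maps an EPR (a plain string in this sketch) to a state name.
        self.states = states
        self.lrms_supports_suspend = lrms_supports_suspend

    def _move(self, eprs, expected, target):
        if not self.lrms_supports_suspend:
            raise NotSupportedFault("LRMS cannot suspend/resume")
        if isinstance(eprs, str):
            eprs = [eprs]  # accept a single EPR as well as a vector
        results = {}
        for epr in eprs:
            if self.states.get(epr) == expected:
                self.states[epr] = target
                results[epr] = target
            else:
                # Per-EPR outcome so one bad entry does not hide the rest.
                results[epr] = "fault: not in state " + expected
        return results

    def suspend(self, eprs):
        return self._move(eprs, "Running", "Running:Suspended")

    def resume(self, eprs):
        return self._move(eprs, "Running:Suspended", "Running")


if __name__ == "__main__":
    port = SuspendResumePort({"job-1": "Running", "job-2": "Pending"})
    print(port.suspend(["job-1", "job-2"]))
    print(port.resume("job-1"))
```

[Returning a per-EPR result rather than one overall answer is a design 
choice of this sketch for the vector case, not something agreed in the 
session.]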

AG: could put suspend/resume on the activity itself. Instead of new port 
type on BES, instead call on EPR to have

Action [AG]: will write a description on the differences between these 
two approaches (new port type, or on EPR)

AG: feels like we've made good progress today!

DW: those who are not here should have limited voice.


