[Pgi-wg] OGF PGI Use Cases and Scope

Etienne URBAH urbah at lal.in2p3.fr
Fri May 14 15:52:08 CDT 2010


Balazs, Morris, Luigi, Johannes and all PGI members,


Concerning OGF PGI Use Cases and Scope :


UML diagrams
------------
As announced, I just published inside 
http://forge.gridforum.org/sf/docman/do/listDocuments/projects.pgi-wg/docman.root.input_documents 
the source file and the pictures of UML Class and Collaboration Diagrams 
(designed with the ArgoUML tool) :

-  showing the INTERFACES of Distributed Data Processing which are 
necessary or useful to standardize,

-  permitting assessment of the impact of ARCHITECTURE on the list of 
INTERFACES which absolutely must be standardized to permit minimum 
interoperability.

For each interface, I have added the known relevant standard(s) inside 
square brackets.  Please note that Accounting Backend is standardized by 
'UR' (I just forgot it).

Inside the Collaboration Diagrams, arrows in RED depict relationships 
NOT using interfaces.  These relationships hinder interoperability and 
should be avoided when possible.


Terminology / Vocabulary
------------------------
-  PAYLOAD :  Any Grid or Cloud Entity directly useful for a scientist 
(Data, Activity, Instrument, ...)

-  SUPPORT :  Any other Grid or Cloud Entity (Security, Info, 
Application, License, VM image, Log, Accounting, ...)
    Security, Info, Log and Accounting are absolutely needed to operate 
a Production Grid.
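
The PAYLOAD / SUPPORT distinction can be sketched as a small 
classification table (a minimal illustration only; the entity names 
follow the definitions above, and the function name is my own, not part 
of any standard) :

```python
from enum import Enum

class EntityKind(Enum):
    PAYLOAD = "payload"   # directly useful to a scientist
    SUPPORT = "support"   # everything else needed to run the Grid

# Classification following the definitions above; entity names are
# illustrative, not taken from any OGF specification.
ENTITY_KIND = {
    "Data": EntityKind.PAYLOAD,
    "Activity": EntityKind.PAYLOAD,
    "Instrument": EntityKind.PAYLOAD,
    "Security": EntityKind.SUPPORT,
    "Info": EntityKind.SUPPORT,
    "Application": EntityKind.SUPPORT,
    "License": EntityKind.SUPPORT,
    "VM image": EntityKind.SUPPORT,
    "Log": EntityKind.SUPPORT,
    "Accounting": EntityKind.SUPPORT,
}

# SUPPORT entities without which a Production Grid cannot operate.
MANDATORY_SUPPORT = {"Security", "Info", "Log", "Accounting"}

def is_mandatory(entity: str) -> bool:
    """True if the entity is absolutely needed to operate a Production Grid."""
    return ENTITY_KIND[entity] is EntityKind.SUPPORT and entity in MANDATORY_SUPPORT
```

For example, `is_mandatory("Security")` holds, while a License service, 
although a SUPPORT entity, is not in the mandatory set.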


Files
-----
http://forge.gridforum.org/sf/go/doc15977?nav=1   ArgoUML source file

http://forge.gridforum.org/sf/go/doc15978?nav=1   Class Diagram

http://forge.gridforum.org/sf/go/doc15979?nav=1   Small Collab. Diagram

http://forge.gridforum.org/sf/go/doc15980?nav=1   Big   Collab. Diagram

http://forge.gridforum.org/sf/go/doc15981?nav=1   Scope with BES + JSDL

http://forge.gridforum.org/sf/go/doc15982?nav=1   Interoperability with 
a Monolithic Execution Service


Standardization priorities
--------------------------
The collaboration diagrams show in particular that :

-  The interfaces for Activity Management are only connected to the 
Activity Managers and the Computing Resources.
    So, it is easy to build a gateway bridging Activity Management 
between 2 completely different Grid or Cloud infrastructures  (the EDGeS 
3G bridge is in full operation between gLite- and BOINC-powered 
infrastructures).
    Therefore, Activity Management alone does NOT require urgent 
standardization.

-  The interfaces for Security, Info, Log and Accounting are directly 
connected to most functionalities.
    Therefore, interoperability between Production Grids absolutely 
requires urgent standardization of these interfaces.
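
The gateway argument above can be illustrated with a minimal adapter 
sketch (all class and method names below are hypothetical; this is NOT 
the EDGeS 3G bridge API, only a picture of why bridging is easy when 
Activity Management touches so few components) :

```python
from abc import ABC, abstractmethod

class ActivityManager(ABC):
    """Hypothetical per-infrastructure Activity Management interface."""
    @abstractmethod
    def submit(self, job_description: dict) -> str: ...
    @abstractmethod
    def status(self, activity_id: str) -> str: ...

class GLiteManager(ActivityManager):
    """Illustrative stand-in; a real bridge would call gLite services."""
    def __init__(self):
        self._jobs = {}
    def submit(self, job_description):
        activity_id = f"glite-{len(self._jobs)}"
        self._jobs[activity_id] = "RUNNING"
        return activity_id
    def status(self, activity_id):
        return self._jobs[activity_id]

class Gateway:
    """Accepts jobs on behalf of one infrastructure and resubmits them to
    another.  Because only Activity Managers and Computing Resources use
    these interfaces, the gateway requires no changes anywhere else in
    either Grid."""
    def __init__(self, target: ActivityManager):
        self.target = target
    def forward(self, job_description: dict) -> str:
        return self.target.submit(job_description)
```

Usage: `Gateway(GLiteManager()).forward({"executable": "/bin/hostname"})` 
returns an activity identifier on the target infrastructure.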


Criticism, remarks and suggestions are welcome.

Best regards.

-----------------------------------------------------
Etienne URBAH         LAL, Univ Paris-Sud, IN2P3/CNRS
                       Bat 200   91898 ORSAY    France
Tel: +33 1 64 46 84 87      Skype: etienne.urbah
Mob: +33 6 22 30 53 27      mailto:urbah at lal.in2p3.fr
-----------------------------------------------------


On Fri, 14/05/2010 18:03, Etienne URBAH wrote:
> Balazs, Morris, Luigi, Johannes and all PGI members,
>
>
> Lot of thanks to Oxana, Aleksandr and Andrew for their contribution to
> OGF PGI :
>
> We finally come to what we all should have begun with, that is, the USE
> CASES and the SCOPE.
>
>
> +-------------+
> |  USE CASES  |
> +-------------+
> I suggest that :
>
> - We all provide clear descriptions of Use Cases inside
> http://forge.gridforum.org/sf/docman/do/listDocuments/projects.pgi-wg/docman.root.input_documents.use_cases
>
> In particular, Use Cases for the five LHC categories described by Oxana
> are welcome.
> Could the already published GROMACS Use Case be included inside one LHC
> category ?
>
> - Each Use Case (in particular the GROMACS Use Case) must clearly
> indicate whether the required file stagings should be performed :
> - automatically by the Execution Service (from locations specified
> inside the Job Description), or
> - manually by the Submitter of the Activity, which then requires the
> Activity to be in a 'Hold' state, with its 'session directory' (or the
> like) published.
>
> - We describe these Use Cases with a graphical UML tool, such as ArgoUML
> (which runs on MS-Windows, MAC OS X and Linux), and publish the
> corresponding source files inside GridForge (open collaborative process).
>
> - Simple example of Use Case :
> The Submitter of an Activity MAY be a scientist using a Scientific Grid
> Portal that permits submitting predefined Applications.
> Therefore, this Activity Submitter MAY have very little knowledge of
> Grids, and MAY have very little knowledge of the Application executed as
> Payload on the Computing Resource.
>
>
> +---------+
> |  SCOPE  |
> +---------+
> Execution of Activities or Jobs is only a small part of the
> 'Distributed Data Processing' functionalities that a Production Grid is
> required to provide.
>
> In particular, we MUST clearly understand the difference in 'Quality of
> Service' required by :
> - Transient entities (such as Activities or Jobs), which MAY fail at any
> time for any reason,
> - Persistent entities (such as Grid resource descriptors, Security
> descriptors, Data sets, Accounting Records, Log records, ...), which
> SHOULD be securely kept.
>
> I will publish very soon the source file and the pictures of UML
> Collaboration Diagrams (designed with the above mentioned ArgoUML tool)
> showing :
>
> - INTERFACES of Distributed Data Processing which are necessary or
> useful to standardize
>
> - The impact of ARCHITECTURE on the list of INTERFACES which absolutely
> must be standardized to permit minimum interoperability.
>
>
> +------------------------+
> |  FOUNDATION STANDARDS  |
> +------------------------+
> In order to ease mutual understanding and general agreement, we need
> Foundation Standards as sound basis.
>
> Requirement NF6 (162) : JSPG (Security Policies)
> -------------------------------------------------
> Without agreement on AUTHN and AUTHZ, there is simply NO interoperability.
>
> Requirement IS1 (1) : GLUE model
> ---------------------------------
> - At the Amsterdam meeting, someone said about GLUE : 'Information model
> does not concretely say anything'.
> - I have the totally opposite opinion : I strongly suggest using the
> GLUE model (currently GLUE 2.0) as one of these Foundation Standards,
> and describing as many concepts as possible using GLUE entities.
>
>
> +----------------------------+
> |  TERMINOLOGY / VOCABULARY  |
> +----------------------------+
> For clarity, I suggest the following definitions :
>
> Client
> ------
> Holder of credentials belonging to a member of a GLUE UserDomain.
> For example, a Client MAY submit an Activity to an Execution Service, or
> query the Status of an Activity.
>
> Activity = Job (full synonyms)
> --------------
> Remote processing which a Client describes in a 'Job Description', which
> the Client then submits to an Execution Service.
>
> Payload
> -------
> Anything (Application, Script, Pilot Job, ...) executed by a Computing
> Resource on request of the Activity. The Payload MAY completely ignore
> that it is executed inside a grid Activity.
>
> Simple Activity
> ---------------
> A simple Job Description containing only ONE local job executed by only
> ONE batch system, WITHOUT 'Hold' states OR manual staging.
> - This is a suggestion of restrictive evolution of requirement JM5 (55).
> - An Activity requiring 'Hold' states and/or manual staging is then
> called 'Interactive Activity'.
>
>
> Criticism, remarks and suggestions are welcome.
>
> Best regards.
>
> -----------------------------------------------------
> Etienne URBAH LAL, Univ Paris-Sud, IN2P3/CNRS
> Bat 200 91898 ORSAY France
> Tel: +33 1 64 46 84 87 Skype: etienne.urbah
> Mob: +33 6 22 30 53 27 mailto:urbah at lal.in2p3.fr
> -----------------------------------------------------
>
>
> On 13/05/2010 21:00, Oxana Smirnova wrote:
>> Hi Andrew, all,
>>
>> allow me to start from the very beginning, to explain the "typical"
>> workload Aleksandr referred to.
>>
>> Both ARC and gLite "grew" from the requirements of the High Energy
>> Physics community, more specifically - those of LHC experiments. I'll
>> come back later to what it means in practice.
>>
>> The basic difference between gLite and ARC starting requirements is that
>> gLite is designed for resources owned or controlled to a large extent by
>> their users, while ARC is designed to support resources shared between
>> different users and controlled by fairly independent resource providers.
>>
>> The immediate difference is policies: while the gLite community is
>> largely expected to comply with policies devised by the LHC Grid (WLCG)
>> Joint Security and Policy Group, the ARC community has no unique set of
>> policies. ARC sites that contribute to LHC computing in general tend to
>> respect the WLCG policies, but not too closely, giving priority to the
>> local policies of resource owners. Needless to say, this introduces
>> extra complexity into the requirements and reduces the number of simple
>> use cases.
>>
>> Now, LHC experiments are huge communities, between 500 and 3000 members
>> in each. All well-educated, computer-savvy people who never hesitate
>> to come up with their own brilliant solutions. Even within one
>> experiment the divergence is huge, and there are sites that support 4
>> experiments. This adds to the complexity: not only do we have diverging
>> or contradictory requirements from resource owners, we also have all
>> sorts of diverging requirements from users.
>> The lowest common denominator is 1, meaning "ping", because even
>> "hello world" is ambiguous:
>> - where do you send the output, to a file or to a standard output?
>> If it is standard output, do you write it to a log?
>> Who said "hello world", the individual user or the whole experiment?
>> Do we have a right to log individual activities at all? And so on.
>>
>> There are several basic kinds of typical jobs by LHC experiments, in
>> general they can be separated in 5 categories:
>>
>> 1. Monte Carlo generation (no input, small resource consumption, small
>> output)
>> 2. Detector simulation (one input file, moderate resource consumption,
>> moderate output)
>> 3. Signal reconstruction (multiple input files, moderate resource
>> consumption, large output)
>> 4. Data merging (very many input files - like, 400, large resource
>> consumption, large output)
>> 5. Data analysis (huge amount of input files not necessarily known in
>> advance, small resource consumption, small output)
>>
>> The job can have a number of states, e.g.:
>>
>> 1. Job is defined (may require authorisation)
>> 2. Job matched to a site (requires authorisation, detailed site
>> information, maybe data availability information)
>> 3. Data are made available to a job (authorisation, probably delegation
>> of data staging rights to a staging service)
>> 4. Job processes data (CPU, I/O, network access to external databases
>> requiring authorisation)
>> 5. Job places data at [multiple] pre-defined destinations
>> (authorisation, contacting external databases, probably delegation to a
>> staging service)
>> 6. Job is finished
>> 7. Job has failed
>> 8. Job is terminated (requires authorisation)
>> 9. Job is resubmitted (authorisation, information)
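
[Editor's note: the nine states above can be sketched as a minimal 
transition model.  State names and transitions are purely illustrative, 
not the actual state model of any middleware; note how a transient 
Activity MAY fail or be terminated from any active state.]

```python
from enum import Enum, auto

class JobState(Enum):
    # Illustrative names for the nine states listed above.
    DEFINED = auto()
    MATCHED = auto()
    STAGING_IN = auto()
    PROCESSING = auto()
    STAGING_OUT = auto()
    FINISHED = auto()
    FAILED = auto()
    TERMINATED = auto()
    RESUBMITTED = auto()

# Hypothetical happy-path and recovery transitions.
TRANSITIONS = {
    JobState.DEFINED: {JobState.MATCHED},
    JobState.MATCHED: {JobState.STAGING_IN},
    JobState.STAGING_IN: {JobState.PROCESSING},
    JobState.PROCESSING: {JobState.STAGING_OUT},
    JobState.STAGING_OUT: {JobState.FINISHED},
    JobState.FAILED: {JobState.RESUBMITTED},
    JobState.RESUBMITTED: {JobState.MATCHED},
}

ACTIVE = {JobState.DEFINED, JobState.MATCHED, JobState.STAGING_IN,
          JobState.PROCESSING, JobState.STAGING_OUT}

def can_move(src: JobState, dst: JobState) -> bool:
    """Check a transition; any active state may also fail or be terminated,
    since transient entities MAY fail at any time for any reason."""
    if src in ACTIVE and dst in (JobState.FAILED, JobState.TERMINATED):
        return True
    return dst in TRANSITIONS.get(src, set())
```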
>>
>> Each state may have a number of sub-states, depending on the
>> experiment-specific framework.
>>
>> Authorisation may be per Virtual Organisation (file reading), per Role
>> and/or Group within a VO (job types 1-4), per person (job type 5), or
>> even per service (some frameworks accept services as VO members).
>>
>> Delegation of rights in general is very much needed, because of the
>> large number of auxiliary services, distributed environment and general
>> complexity of the workflow. No-one really knows how to achieve the goals
>> without delegation.
>>
>> Each experiment has their own framework. Most such frameworks circumvent
>> Grid services because they are too generic. This means that jobs are
>> highly non-trivial as they attempt to re-implement Grid services such as
>> information gathering and publishing, data movement and registration,
>> resubmission, etc.; they also tend to tweak authorisation by executing
>> the payload of users not authorised by Grid means. This complicates the
>> job state picture even further.
>>
>> If PGI's outcome makes any of the above-mentioned jobs impossible,
>> most key ARC and gLite customers will not use the PGI specs, and they
>> will have only academic value. This was not the PGI goal, as "P" stands
>> for "Production".
>>
>> Cheers,
>> Oxana
>>
>>
>> 13.05.2010 15:20, Andrew Grimshaw wrote:
>>> Aleksandr,
>>> Referring to your sentence/paragraph
>>>
>>> "Such "simple" job is very far from being "typical". At least in
>>> NorduGrid world AFAIK."
>>>
>>> Could you elaborate. I see in my work basically two "types" of jobs
>>> that dominate - sets of HTC "parameter space" jobs, and true parallel
>>> MPI jobs.
>>> In both cases the "job" is basically a command line - either an
>>> mpiexec/mpirun, an application with parameters, or a script with
>>> parameters.
>>> The job has known inputs and outputs, or a directory tree location
>>> where it needs to run.
>>> The job runs to completion, or it fails; in either case there are
>>> output files and result codes.
>>> Sometimes the job is a workflow, but when you pick that apart it turns
>>> into jobs that have inputs and outputs along with a workflow engine
>>> orchestrating it all.
>>>
>>> What is a typical job that you see?
>>> When I say "typical" I mean covers 80% of the jobs.
>>>
>>> A
>>>
>>> -----Original Message-----
>>> From: Aleksandr Konstantinov [mailto:aleksandr.konstantinov at fys.uio.no]
>>> Sent: Sunday, May 02, 2010 3:36 PM
>>> To: pgi-wg at ogf.org
>>> Cc: Andrew Grimshaw; 'Oxana Smirnova'; 'Etienne URBAH';
>>> 'David SNELLING'; lodygens at lal.in2p3.fr
>>> Subject: Re: [Pgi-wg] OGF PGI Requirements - Flexibility and clarity
>>> versus Rigidity and confusion
>>>
>>> Hello,
>>>
>>> I agree that the problem is too difficult to solve.
>>> One should take into account that initially the task was different.
>>> Originally, AFAIR, it was an attempt by a few grid projects to make a
>>> common interface suitable for them. Later those were brought to OGF,
>>> and the problem grew to be almost unsolvable.
>>>
>>> Andrew Grimshaw at Saturday 01 May 2010 15:36 wrote:
>>>> Oxana,
>>>> Well said.
>>>>
>>>> I would add that I fear we may be trying too much to solve all
>>>> problems the first time around - to "boil the ocean".
>>>> To completely solve the whole problem is a daunting task indeed
>>>> as there are so many issues.
>>>>
>>>> I personally believe we will make more progress if we solve the
>>>> minimum problem first, e.g., securely run a simple job from
>>>> infrastructure/sw-stack A on infrastructure/sw-stack B.
>>>
>>> This problem is already solved, and it was done in a few ways:
>>> 1. Client stacks supporting multiple service stacks
>>> 2. BES + GSI
>>> 3. Other combinations currently in use
>>> None of these is fully suitable for real production.
>>> So unless the task of PGI is considered purely theoretical, this
>>> approach would amount to one more delay.
>>>
>>>>
>>>> "Infrastructure/sw-stack A" means a set of resources (e.g., true
>>>> parallel-Jugene, clusters, sets of desktops) running a middleware
>>>> stack (e.g., Unicore 6 or Arc) configured a particular way.
>>>> In the European context this might mean an NGI such as D-Grid with
>>>> Unicore 6 running a job on NorduGrid running Arc.
>>>> (Please forgive me if I have the particulars of the NGIs wrong.)
>>>>
>>>> "Simple job" means a job that is typical, not special.
>>>> This is not to say that its resource requirements are simple, it may
>>>> have very particular requirements (cores per socket, interconnect,
>>>> memory), rather I mean that the job processing required is simple:
>>>> run w/o staging, simple staging,
>>>
>>> Such "simple" job is very far from being "typical". At least in
>>> NorduGrid world AFAIK.
>>>
>>>> perhaps client interaction with the session directory pre, post, and
>>>> during execution.
>>>> Try to avoid complex job state models that will be hard to agree on,
>>>> and difficult to implement in some environments.
>>>>
>>>> "Securely" means sufficient authentication information required at B
>>>> is provided to B in a form it will accept from a policy perspective.
>>>> Further, that we try as much as possible to avoid a delegation
>>>> definition that extends inwards beyond the outer boundary of a
>>>> particular infrastructure/sw-stack.
>>>
>>> I'm lost. Is it the delegation or the definition which extends?
>>>
>>>> (The last sentence is a bit awkward, I personally think that we will
>>>> need to have two models of authentication and delegation
>>>> - a legacy transport layer mechanism,
>>>> and a message layer mechanism based on SAML, and that inside of a
>>>> software stack we cannot expect sw-stacks to change their internal
>>>> delegation mechanism.)
>>>>
>>>> I believe authentication/delegation is the most critical item:
>>>> if we cannot get the authentication/delegation issues solved,
>>>> the rest is moot with respect to a PRODUCTION environment.
>>>> We may be able to do demos and stunts while punting on
>>>> authentication/delegation, but we will not integrate production
>>>> systems.)
>>>
>>> Wasn't delegation voted down during the last review?
>>>
>>>
>>> A.K.
