[Pgi-wg] OGF PGI - AGU Execution Service Strawman Rendering
Etienne URBAH
urbah at lal.in2p3.fr
Fri Oct 16 11:53:23 CDT 2009
Balazs, Morris, Luigi, Johannes and all,
Concerning the 'AGU Execution Service Strawman Rendering' of OGF PGI and
the telephone conference of last week on 09 October 2009 :
- Many thanks to Morris for having given detailed explanations on
chapter 2.1 'CreateActivity Operation'.
I now understand much better what is described inside an 'operation'.
- Many thanks to Johannes for the Report and for the Action list.
Consistency between the CreateActivity operation and the State Model
--------------------------------------------------------------------
Inside chapter 2.1 'CreateActivity operation', I found discrepancies
between the current description of the 'CreateActivity' operation and
the PGI Single Job State Model :
- Inside the PGI Single Job State Model, the Execution Service :
- Allocates a Jobid (or an EPR) to the Job and sends it back to the
Submitter at the end of the 'Submitted' state, BEFORE any storage
allocation could be performed,
- Notifies the submitter with allocated storage resources for
stage-in only inside the 'Pre-processing:Hold' state.
- The current description of the 'CreateActivity' operation encompasses
both the 'Submitted' and 'Pre-processing' states, and states that the
response can contain information about storage resources for stage-in.
In fact :
- The 'CreateActivity' operation should be limited to the
'Submitted' state, and the response can only be a vector of Jobids
(or EPRs). Information about storage resources for stage-in can only be
given later, through a 'GetActivityInfo' request or a notification to
the submitter.
- In order to permit notification, the 'CreateActivity' operation
should allow a 'Notification EPR' as an additional optional input
parameter.
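
As an illustration only, the proposed split could be sketched in Python
as below. All names (ExecutionService, create_activity,
enter_preprocessing_hold, get_activity_info, the 'job-N' identifiers) are
hypothetical and are NOT taken from the Strawman Rendering. The
'Notification EPR' is modelled as a plain callback, which is an
assumption. The sketch only shows that the 'CreateActivity' response
carries Jobids, while stage-in storage becomes visible later.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a 'Notification EPR': a callback(job_id, storage_url).
NotificationEPR = Callable[[str, str], None]


@dataclass
class Activity:
    job_id: str
    state: str = "Submitted"
    stage_in_storage: Optional[str] = None  # unknown until 'Pre-processing:Hold'


class ExecutionService:
    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._activities: Dict[str, Activity] = {}
        self._notify: Dict[str, NotificationEPR] = {}

    def create_activity(
        self,
        descriptions: List[str],
        notification_epr: Optional[NotificationEPR] = None,
    ) -> List[str]:
        """Limited to the 'Submitted' state: returns only a vector of Jobids."""
        job_ids = []
        for _ in descriptions:
            job_id = f"job-{next(self._counter)}"
            self._activities[job_id] = Activity(job_id)
            if notification_epr is not None:
                self._notify[job_id] = notification_epr
            job_ids.append(job_id)
        return job_ids

    def enter_preprocessing_hold(self, job_id: str, storage_url: str) -> None:
        """Storage for stage-in is allocated here, after the Jobid was returned."""
        activity = self._activities[job_id]
        activity.state = "Pre-processing:Hold"
        activity.stage_in_storage = storage_url
        callback = self._notify.get(job_id)
        if callback is not None:
            callback(job_id, storage_url)

    def get_activity_info(self, job_id: str) -> Activity:
        """Polling alternative to the notification."""
        return self._activities[job_id]
```

A Submitter without a 'Notification EPR' would simply poll
get_activity_info until the stage-in storage is filled in.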
I have updated the document with changes highlighted at
http://forge.gridforum.org/sf/go/doc15628?nav=1
Hold substate inside the 'Submitted' state ?
--------------------------------------------
See mail below.
Best regards.
-----------------------------------------------------
Etienne URBAH LAL, Univ Paris-Sud, IN2P3/CNRS
Bat 200 91898 ORSAY France
Tel: +33 1 64 46 84 87 Skype: etienne.urbah
Mob: +33 6 22 30 53 27 mailto:urbah at lal.in2p3.fr
-----------------------------------------------------
On Thu, 13 Aug 2009, Etienne URBAH wrote:
> Balazs, Morris and all,
>
>
> Concerning the last OGF PGI telephone conference on 05 August 2009 :
>
>
> Meeting notes
> -------------
> I see NO meeting notes about this telephone conference at
> http://forge.gridforum.org/sf/discussion/do/listTopics/projects.pgi-wg/discussion.meetings
>
>
> So I am working with my own (fragmentary) notes.
>
> For all future OGF PGI telephone conferences, is it possible that a
> secretary or a chair takes meeting notes, then writes them down in an
> understandable form, and publishes them at the above-mentioned page ?
>
>
> Creation of a 'Submitted:Hold' substate ?
> -----------------------------------------
> First, as general rules, I consider that :
>
> - In order to AVOID keeping (potentially large) grid resources while
> NOT computing, grid Jobs should be designed to be processed completely
> automatically, with NO provision for 'Hold' substates,
>
> - A grid Job needing many 'Hold' substates can NOT be handled by an
> automatic Submitter, but should be submitted by a human grid User as an
> 'Interactive Job', as described for example at
> https://edms.cern.ch/file/722398//gLite-3-UserGuide.html#SECTION00084400000000000000
>
>
>
> Someone asked for the creation of a 'Hold' substate inside the
> 'Submitted' state, like inside other states.
>
> This 'Submitted:Hold' substate would make sense only if the Job
> Submitter could perform an operation on this substate.
>
> In order to request such an operation, the Job Submitter needs the Jobid
> (or Job EPR).
>
> This Jobid (or Job EPR) is guaranteed to be allocated by the Execution
> Service only at the END of the 'Submitted' state, but NOT before.
>
> Therefore, I consider that the 'Submitted' state can NOT contain a
> 'Hold' substate.
>
> If anyone thinks otherwise, can he/she please present a convincing Use
> Case ?
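
The argument above can be sketched in a few lines of Python. This is
only an illustration: the Job class, leave_submitted and resume are
hypothetical names, and the sketch assumes that any operation on a
'Hold' substate must be addressed by Jobid.

```python
from typing import Optional


class Job:
    def __init__(self) -> None:
        self.state = "Submitted"
        # The Submitter has no Jobid while the Job is still 'Submitted'.
        self.job_id: Optional[str] = None

    def leave_submitted(self, allocated_id: str) -> None:
        # The Jobid is guaranteed only at the END of the 'Submitted' state.
        self.job_id = allocated_id
        self.state = "Pre-processing"


def resume(job_id: Optional[str]) -> str:
    """Hypothetical operation releasing a Job from a 'Hold' substate."""
    if job_id is None:
        raise ValueError("no Jobid: the Submitter cannot address this Job")
    return f"resume requested for {job_id}"
```

Calling resume(job.job_id) on a Job that is still 'Submitted' raises an
error, because the Submitter has nothing to address the request with.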
>
>
> Clarifications on the 'Finished with Success or Error' state
> ------------------------------------------------------------
> Someone asked that the 'Error' case of the 'Finished with Success or
> Error' state should be moved to the 'Failed' state.
>
> In fact, inside the current Job State Model, a Job reaches the 'Finished
> with Success or Error' state if and only if it has successively reached the
> end of the following states, without failure or cancellation at the JOB level :
> - 'Pre-processing'
> - 'Delegated', whatever the Application result :
> - Success = Application return code equal to zero
> - Error = Application return code different from zero
> - 'Post-processing'
>
> Inside the 'Finished with Success or Error' state :
> - Success means 'Application return code was equal to zero',
> - Error means 'Application return code was different from zero'.
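
A minimal sketch of this rule, assuming a hypothetical helper
final_state and a boolean flag for 'failure at the JOB level'
(cancellation is left out for brevity):

```python
def final_state(job_level_failure: bool, app_return_code: int) -> str:
    """Map a Job outcome to its final state in the model described above."""
    if job_level_failure:
        # Unrecoverable inconsistency detected by the Execution Service.
        return "Failed"
    # The Job itself completed; only the Application result differs.
    outcome = "Success" if app_return_code == 0 else "Error"
    return f"Finished with Success or Error ({outcome})"
```

The Application return code is consulted only when no failure occurred
at the JOB level, which is the separation argued for here.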
>
> I copied this behavior from the Job State Model of 'gLite', where the
> 'Done' state contains both the 'Success' and 'Exit Code !=0' cases, as
> can be seen in the 'bookkeeping information' at
> https://edms.cern.ch/file/722398//gLite-3-UserGuide.html#SECTION00084100000000000000
>
>
>
> I consider this design, and the strong separation between the
> 'Failed' and 'Finished with Success or Error' states, as fully justified
> by the following reasons :
>
> - Whenever a Job reaches the 'Failed' state, the grid Execution Service
> has detected an unrecoverable inconsistency at the JOB level.
> Therefore, the Job output sandbox and the post-processed Application
> output files may be NOT consistent, and NOT even accessible by the
> Job Submitter.
> In order to investigate the Job failure, the grid User then needs
> some grid knowledge (and often experience and expertise) to retrieve and
> interpret :
> - the Job failure code and message,
> - the Job logging and bookkeeping, in comparison with the Job
> description.
> This 'grid level' investigation can sometimes prove that the cause of
> the Job failure came from the Application, but is ALWAYS necessary.
>
> - Whenever a Job reaches the 'Finished with Success or Error' state,
> the grid Execution Service was able to create the Job output sandbox, and
> to perform post-processing on Application output files, WITHOUT detecting
> any unrecoverable inconsistency at the JOB level.
> Therefore, the Job output sandbox, and the post-processed Application
> output files, can be assumed to be consistent and easily accessible by
> the Job Submitter.
> On a non-zero return code of the Application, the grid User :
> - first has to look (WITHOUT needing any grid knowledge) at the Job
> output sandbox and at the post-processed Application output files for an
> Application problem,
> - before, if necessary, using grid knowledge (and often experience
> and expertise) to provide any evidence that the Application error was
> caused by a faulty Job description, the Batch system, or the grid
> Execution Service.
>
> In summary, I consider that the 'Error' case of the 'Finished with
> Success or Error' state should be kept as it is, and NOT be moved to the
> 'Failed' state.
>
> If anyone thinks otherwise, can he/she please present convincing reasons ?
>
>
> Strawman Rendering
> ------------------
> I will work on the ODT version of 'Strawman Rendering' at
> http://forge.gridforum.org/sf/go/doc15628?nav=1 in order to :
>
> - include the above clarifications on states,
>
> - include the 'Types of grid Jobs' section of my 'PGI Execution Service
> Overview' document,
>
> - check consistency, and present the relationships between the
> operations described in chapter 2 'Interface: Execution Port-Type' and
> the different states of the different types of grid Jobs.
>
>
> Joining +9900827049931906 (plus perhaps Skype typing) on Friday 14
> August 2009 at 16h CET.
>
> Best regards.
>
> -----------------------------------------------------
> Etienne URBAH LAL, Univ Paris-Sud, IN2P3/CNRS
> Bat 200 91898 ORSAY France
> Tel: +33 1 64 46 84 87 Skype: etienne.urbah
> Mob: +33 6 22 30 53 27 mailto:urbah at lal.in2p3.fr
> -----------------------------------------------------