[DRMAA-WG] Meeting Minutes - Conference call - Apr 27th - 19:00 UTC

Peter Tröger peter at troeger.eu
Mon May 2 01:42:25 CDT 2011


Hi,

>> Participants: Daniel, Mariusz, Roger, Andre (SAGA), Peter
>> 
>> Line 707 - Reaction on reaching soft / hard limits
>> - Grid Engine: Signal depends on particular limit type
>> - Agreement that crossing a hard limit should lead to FAILED state of
>> the DRMAA job
>> - Agreement to remove softResourceLimits completely, since DRMAA cannot
>> promise any kind of common semantics, and since the attribute is not
>> important enough to add it as opaque concept (as with slots)
> 
> I promised to do some research, so:
> 
> we are mixing different resources whose limits serve different purposes
> and thus have different associated policies:
> 
> enum ResourceLimitType { CORE_FILE_SIZE, CPU_TIME, DATA_SEG_SIZE,
> FILE_SIZE, OPEN_FILES, STACK_SIZE, VIRTUAL_MEMORY,
> WALLCLOCK_TIME };
> 
> let's take the first one:
> 
> CORE_FILE_SIZE  and Grid Engine
> 
> man queue_conf: " The  remaining parameters in the queue configuration
> template specify per job soft and hard resource limits as implemented
> by the setrlimit(2) ..."
> 
> man setrlimit " RLIMIT_CORE Maximum size of core file. When 0 no core
> dump files are created.  When non-zero, larger dumps are truncated to
> this size."
> 
> and the difference between Soft and Hard limit is defined as follows:
> " The hard limit acts as a  ceiling  for  the  soft  limit:  an
> unprivileged  process  may only set its soft limit to a value in the
> range from 0 up to the hard limit, and (irreversibly) lower its hard
> limit."
> 
> exceeding other limits like OPEN_FILES would just result in errors from
> calls like open(), which the application can handle and still exit with 0.
> 
> So the agreement that "crossing a hard limit should lead to FAILED"
> should be valid only to some of the limits e.g.: WALLCLOCK_TIME,
> CPU_TIME.
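
To make the OPEN_FILES case concrete, here is a minimal sketch, assuming a POSIX system and Python's standard resource module. The helper name open_until_limit and the cap of 64 descriptors are made up for illustration. It lowers the soft RLIMIT_NOFILE, opens files until open() fails, and returns the errno the process saw. The process keeps running afterwards, so it is free to exit with 0.

```python
import errno
import os
import resource


def open_until_limit(soft_cap=64):
    """Lower the soft RLIMIT_NOFILE, hit it, and return the errno from open().

    soft_cap is an arbitrary illustrative value, not taken from any DRM system.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may always lower its soft limit.
    if old_soft == resource.RLIM_INFINITY:
        new_soft = soft_cap
    else:
        new_soft = min(soft_cap, old_soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    fds = []
    failure = None
    try:
        # Opening more descriptors than the soft limit allows must fail.
        for _ in range(new_soft + 1):
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        # The limit shows up as an ordinary error, not as a signal.
        failure = exc.errno
    finally:
        for fd in fds:
            os.close(fd)
        # Raising the soft limit back up to its old value stays within the hard limit.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    return failure


if __name__ == "__main__":
    code = open_until_limit()
    print("open() failed with", errno.errorcode.get(code, code))
```

The job itself decides what to do with the error, which is why a blanket "limit crossed means FAILED" rule does not fit this limit type.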

That's an issue. I see basically three options here: 

1) We define the hard limit violation behavior per parameter. In this case, we could add the soft limits again with the same approach.
2) We declare that job termination MAY happen at any time after the violation, and stick with leaving out the soft limits.
3) We drop resource limits completely.

Number 1 is the most explicit (== good), but demands careful research at the operating system level. Number 2 is our usual safety net. Number 3 is as explicit as number 1, but people may miss the feature. And no, doing it the 'slots' way is not an option ;-) ...
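
To illustrate what option 1 would mean in practice, here is a rough sketch of a per-parameter table. The outcome categories and the classification of each limit are my own assumptions, read off the default behavior described in the Linux setrlimit(2) man page. They are not agreed DRMAA semantics, and WALLCLOCK_TIME has no setrlimit counterpart at all, so its entry assumes the DRM system kills the job.

```python
from enum import Enum


class ResourceLimitType(Enum):
    CORE_FILE_SIZE = "CORE_FILE_SIZE"
    CPU_TIME = "CPU_TIME"
    DATA_SEG_SIZE = "DATA_SEG_SIZE"
    FILE_SIZE = "FILE_SIZE"
    OPEN_FILES = "OPEN_FILES"
    STACK_SIZE = "STACK_SIZE"
    VIRTUAL_MEMORY = "VIRTUAL_MEMORY"
    WALLCLOCK_TIME = "WALLCLOCK_TIME"


class Outcome(Enum):
    """Hypothetical categories for what a hard-limit violation does to a job."""
    TERMINATED = "job is killed by default"
    CALL_FAILS = "a system call returns an error the job can handle"
    TRUNCATED = "output is silently truncated"


# Assumed classification, based on default behavior in setrlimit(2) on Linux.
HARD_LIMIT_OUTCOME = {
    ResourceLimitType.CORE_FILE_SIZE: Outcome.TRUNCATED,    # larger core dumps are cut off
    ResourceLimitType.CPU_TIME: Outcome.TERMINATED,         # SIGKILL at the hard limit
    ResourceLimitType.DATA_SEG_SIZE: Outcome.CALL_FAILS,    # brk()/sbrk() fail with ENOMEM
    ResourceLimitType.FILE_SIZE: Outcome.TERMINATED,        # SIGXFSZ, fatal unless handled
    ResourceLimitType.OPEN_FILES: Outcome.CALL_FAILS,       # open() fails with EMFILE
    ResourceLimitType.STACK_SIZE: Outcome.TERMINATED,       # SIGSEGV on stack overflow
    ResourceLimitType.VIRTUAL_MEMORY: Outcome.CALL_FAILS,   # mmap()/brk() fail with ENOMEM
    ResourceLimitType.WALLCLOCK_TIME: Outcome.TERMINATED,   # assumption: enforced by the DRM system
}


def implies_failed(limit):
    """Return True if crossing this hard limit should put the job into FAILED."""
    return HARD_LIMIT_OUTCOME[limit] is Outcome.TERMINATED


if __name__ == "__main__":
    for limit in ResourceLimitType:
        print(limit.name, "->", HARD_LIMIT_OUTCOME[limit].name)
```

Even this small table shows the research burden: FILE_SIZE and STACK_SIZE only terminate the job if the application does not install a signal handler, so each entry would need its caveats spelled out in the spec.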

Best regards,
Peter.