[DRMAA-WG] MonitoringSession

Mariusz Mamoński mamonski at man.poznan.pl
Mon Nov 8 09:58:03 CST 2010


Can we split the discussion into two paths? ;-)

Does anyone vote against point 1 (MachineInfo)?
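
A minimal sketch of what point 1 might look like, written in Python only for illustration. The attribute names (machineSockets, coresPerSocket, threadsPerCore, drmsSpecific) and the getMachineInfo method come from this thread; the class layout, the validation, the `get_machine_info` spelling and the `_MACHINES` lookup table are assumptions made for the example, not agreed IDL.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class MachineInfo:
    """Hypothetical Python rendering of the proposed MachineInfo struct."""
    machineSockets: int = 1   # 0 is not allowed
    coresPerSocket: int = 1   # 0 is not allowed
    threadsPerCore: int = 1   # default of 1, as discussed in the thread
    # Extension point for DRMS-specific attributes.
    drmsSpecific: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        if self.machineSockets < 1 or self.coresPerSocket < 1:
            raise ValueError("machineSockets and coresPerSocket must be >= 1")

    @property
    def processors(self) -> int:
        # Invariant from the thread: coresPerSocket * machineSockets == "processors"
        return self.machineSockets * self.coresPerSocket


# Hypothetical in-memory table standing in for a real DRM query.
_MACHINES = {
    "node01": MachineInfo(machineSockets=2, coresPerSocket=4, threadsPerCore=2),
}


def get_machine_info(machine_name: str) -> MachineInfo:
    """Single accessor for all machine attributes (assumed lookup by name)."""
    return _MACHINES[machine_name]
```

With one accessor returning the whole struct, adding a further attribute later would not require a new method on the monitoring session.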

Regarding point 2:

 We could make maxSlotsCount optional (like threadsPerCore) and
define it only as an upper bound, used solely for computing
host/cluster "capacity". Alternatively, what do you think about:
  - renaming threadsPerCore -> maxThreads
  - relaxing the meaning of threadsPerCore/maxThreads to "the maximum
number of threads that can run *simultaneously* on a given machine",
without saying whether the limit is imposed by the hardware
configuration or by system policy.
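
The two alternatives can be compared with a rough Python sketch. All function names here are hypothetical. The fallback to the hardware thread count when maxSlotsCount is not reported is an assumption for the example, not something decided in this thread.

```python
from typing import Iterable, Optional


def capacity_with_optional_slots(machine_sockets: int,
                                 cores_per_socket: int,
                                 threads_per_core: int = 1,
                                 max_slots_count: Optional[int] = None) -> int:
    """Alternative A: maxSlotsCount is optional and acts only as an upper bound."""
    if max_slots_count is not None:
        return max_slots_count
    # Assumed fallback: derive the bound from the hardware description.
    return machine_sockets * cores_per_socket * threads_per_core


def capacity_with_max_threads(max_threads: int) -> int:
    """Alternative B: maxThreads already is the per-machine bound,
    whether it stems from hardware or from system policy."""
    return max_threads


def cluster_capacity(host_capacities: Iterable[int]) -> int:
    """Cluster capacity as the sum of the per-host upper bounds."""
    return sum(host_capacities)
```

Under alternative A a client needs the hardware attributes as a fallback; under alternative B it reads a single number but can no longer tell hardware limits from policy limits.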

Cheers,


2010/11/8 Daniel Gruber <daniel.x.gruber at oracle.com>:
> Adding slotsCount to machineInfo would destroy the separation between queue
> level and host level information. Different users could have different
> slotsCount values on the same machine. The information must be retrieved on
> queue level (we have the queueMaxSlotsAllowed() method for that).
But in this case the slots can span multiple hosts.
> Cheers
>
> Daniel
>
>
> On 11/08/10 16:01, Mariusz Mamoński wrote:
>>
>> Hi all,
>>
>>  If we are talking about the monitoring session... what do you think
>> about the idea of:
>>
>>  1. creating a new data struct MachineInfo with all the predefined
>> machine attributes (e.g. threadsPerCore) plus "readonly attribute
>> Dictionary drmsSpecific;" (an extension point), and providing one
>> method, "MachineInfo getMachineInfo(in string machineName)", for
>> accessing all of them
>>  2. adding a new attribute "slotsCount", which denotes the maximum
>> number of single-process jobs that can run concurrently on a given
>> machine (use case: the system administrator may choose a
>> configuration where one process runs per physical core or per hardware
>> thread, or choose any number that is totally independent of the
>> hardware configuration)
>>
>> Cheers,
>>
>>
>> 2010/11/8 Daniel Gruber <daniel.x.gruber at oracle.com>:
>>>
>>> Ok. For simplicity we take 1 as the default value, with the
>>> drawback that we lose the information whether the SMT value
>>> is available (and correct) or not.
>>>
>>> Regards,
>>>
>>> Daniel
>>>
>>> On 11/08/10 15:14, Peter Tröger wrote:
>>>>
>>>> Hi,
>>>>
>>>> I can agree to the new "threadsPerCore" attribute, but would prefer to
>>>> have "1" as the default value. In our understanding of a core, each one can
>>>> always execute at least one thread. It would also allow computing an
>>>> estimate of the number of parallel threads without looking at the
>>>> specific numbers.
>>>>
>>>> Best,
>>>> Peter.
>>>>
>>>>
>>>> Am 08.11.2010 um 10:55 schrieb Daniel Gruber:
>>>>
>>>>> Hi,
>>>>>
>>>>> in the MonitoringSession we have machineSockets and coresPerSocket
>>>>> at machine level. To be consistent we should also add threadsPerCore.
>>>>> At least OGE/SGE supports this. I added it to our spreadsheet.
>>>>> If this is not supported by a DRM/OS, it could return 0 as the value
>>>>> for unknown.
>>>>>
>>>>> 0 is not allowed for coresPerSocket and machineSockets, since we should
>>>>> define coresPerSocket*machineSockets=="processors" in case a DRM or OS
>>>>> does not support this kind of architectural information. I suggest
>>>>> leaving it open to the DRMAA implementation whether it maps the
>>>>> "processors" information to coresPerSocket or to machineSockets when
>>>>> architectural details are missing.
>>>>>
>>>>> If there is no objection I'll take this as accepted.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Daniel
>>>>> --
>>>>>  drmaa-wg mailing list
>>>>>  drmaa-wg at ogf.org
>>>>>  http://www.ogf.org/mailman/listinfo/drmaa-wg
>>>
>>>
>>
>>
>>
>
>



-- 
Mariusz

