[DRMAA-WG] Load average interval ?

Mariusz Mamoński mamonski at man.poznan.pl
Thu Mar 25 09:03:44 CDT 2010


That's fine with me as well. Since we are discussing the monitoring
interface, I propose two more changes to its machine monitoring part:

1. Add a new data struct called "MachineInfo" with attributes like
Load, PhysMemory, ... and a getMachineInfo(in String machineName) method
in the Monitoring interface. Rationale: the same as for JobInfo
(consistency; fetching all of a machine's attributes at once is more
natural in DRMS APIs than querying for each attribute separately).

2. Change machineCoresPerSocket to machinesCores. If one has
machineSockets, he or she can easily derive
machineCoresPerSocket. The problem with the current API is that if the
DRM does not support "machineSockets" (as far as I checked, only LSF
provides this two-level granularity, @see Google Doc), we lose the most
essential information: "how many single processing units do we have on
a single machine?"
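
To make both proposals concrete, here is a minimal Python sketch. It is
not taken from any DRMAA binding: the class layout, the snake_case field
names, the use of None for "not reported by the DRM", and the in-memory
MonitoringSession are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MachineInfo:
    """Hypothetical snapshot of one machine; None means 'not reported'."""
    name: str
    load: Optional[float] = None            # load average (interval undecided)
    phys_memory: Optional[int] = None       # unit is an assumption (bytes)
    machine_cores: Optional[int] = None     # total processing units
    machine_sockets: Optional[int] = None   # only some DRMs report this

    def cores_per_socket(self) -> Optional[int]:
        """Derive cores per socket when both totals are known."""
        if self.machine_cores is None or not self.machine_sockets:
            return None
        return self.machine_cores // self.machine_sockets


class MonitoringSession:
    """Toy in-memory stand-in for a DRM-backed monitoring session."""

    def __init__(self, machines: Iterable[MachineInfo]) -> None:
        self._machines = {m.name: m for m in machines}

    def get_machine_info(self, machine_name: str) -> MachineInfo:
        # One call returns every attribute, mirroring the JobInfo style.
        return self._machines[machine_name]


session = MonitoringSession([
    MachineInfo("node01", load=0.42, machine_cores=8, machine_sockets=2),
    MachineInfo("node02", machine_cores=4),  # DRM without socket info
])

with_sockets = session.get_machine_info("node01")
without_sockets = session.get_machine_info("node02")
print(with_sockets.cores_per_socket())     # derived from the two totals
print(without_sockets.machine_cores)       # core count is still available
print(without_sockets.cores_per_socket())  # None: sockets are unknown
```

The point of the second machine is that the total core count survives
even when the DRM knows nothing about sockets, which is not the case if
only a per-socket value is exposed.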

Cheers,

On 23 March 2010 23:00, Daniel Templeton <daniel.templeton at oracle.com> wrote:
> That's fine with me.
>
> Daniel
>
> On 03/23/10 13:51, Peter Tröger wrote:
>>> Any non-SGE opinion ?
>>
>> Here is mine:
>>
>> I could only find one single source that explains the load average
>> source in Condor :)
>>
>> http://www.patentstorm.us/patents/5978829/description.html
>>
>> Condor provides only the 1-minute load average from the uptime command.
>>
>> Same holds for Moab:
>> http://www.clusterresources.com/products/mwm/docs/commands/checknode.shtml
>>
>> And PBS:
>> http://wiki.egee-see.org/index.php/Installing_and_configuring_guide_for_MonALISA
>>
>> And MAUI:
>> https://psiren.cs.nott.ac.uk/projects/procksi/wiki/JobManagement
>>
>> I vote for reporting only the 1-minute load average.
>>
>> /Peter.
>>
>>> And BTW, by using the uptime(1) load semantics, we lose Windows
>>> support. There is no such attribute there; load is measured as a
>>> percentage of non-idle time and has no direct relationship to the
>>> ready queue lengths.
>>>
>>> Best,
>>> Peter.
>>>
>>> Am 22.03.2010 um 16:02 schrieb Daniel Templeton:
>>>
>>>> SGE tends to look at the 5-minute average, although any can be
>>>> configured.  You could solve it the same way we did for SGE -- offer
>>>> three: machineLoadShort, machineLoadMed, machineLoadLong.
>>>>
>>>> Daniel
>>>>
>>>> On 03/22/10 06:05, Peter Tröger wrote:
>>>>> Hi,
>>>>>
>>>>> next remaining thing from OGF28:
>>>>>
>>>>> We support the determination of machineLoad average in the
>>>>> MonitoringSession interface. At OGF, we could not agree on which of
>>>>> the typical intervals (1/5/15 minutes) we want to use here. Maybe
>>>>> all of them ?
>>>>>
>>>>> Best,
>>>>> Peter.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>   drmaa-wg mailing list
>>>>>   drmaa-wg at ogf.org
>>>>>   http://www.ogf.org/mailman/listinfo/drmaa-wg



-- 
Mariusz

