[DRMAA-WG] Load average interval ?

Daniel Templeton daniel.templeton at oracle.com
Thu Mar 25 09:44:09 CDT 2010


I would tend to agree that total core count is more useful.  SGE also 
reports socket count as of 6.2u5, by the way.  (That's actually thanks 
to our own Daniel Gruber.)

Daniel

On 03/25/10 07:03, Mariusz Mamoński wrote:
> Also for me. As we are talking about the monitoring interface, I propose
> two more changes to the machine monitoring interface:
>
> 1. Having a new data struct called "MachineInfo" with attributes like
> Load, PhysMemory, ... and a getMachineInfo(in String machineName) method
> in the Monitoring interface. Rationale: the same as for JobInfo
> (consistency; fetching all machine attributes at once is more
> natural in DRMS APIs than querying for each attribute separately)
>
> 2. Change machineCoresPerSocket to machinesCores. If one has
> machineSockets, he or she can easily determine
> machineCoresPerSocket. The problem with the current API is that if the
> DRM does not support "machineSockets" (as far as I checked, only LSF
> provides this two-level granularity, @see Google Doc) we lose the most
> essential information: "how many single processing units do we have on
> a single machine?"
>
> Cheers,
>
> On 23 March 2010 23:00, Daniel Templeton <daniel.templeton at oracle.com> wrote:
>> That's fine with me.
>>
>> Daniel
>>
>> On 03/23/10 13:51, Peter Tröger wrote:
>>>> Any non-SGE opinion ?
>>>
>>> Here is mine:
>>>
>>> I could only find one single source that explains the load average
>>> source in Condor :)
>>>
>>> http://www.patentstorm.us/patents/5978829/description.html
>>>
>>> Condor provides only the 1-minute load average from the uptime command.
>>>
>>> Same holds for Moab:
>>> http://www.clusterresources.com/products/mwm/docs/commands/checknode.shtml
>>>
>>> And PBS:
>>> http://wiki.egee-see.org/index.php/Installing_and_configuring_guide_for_MonALISA
>>>
>>> And MAUI:
>>> https://psiren.cs.nott.ac.uk/projects/procksi/wiki/JobManagement
>>>
>>> I vote for reporting only the 1-minute load average.
>>>
>>> /Peter.
>>>
>>>> And BTW, by using the uptime(1) load semantics, we lose Windows
>>>> support. There is no such attribute there, load is measured in
>>>> percentage of non-idle time, and has no direct relationship to the
>>>> ready queue lengths.
>>>>
>>>> Best,
>>>> Peter.
>>>>
>>>> On 22.03.2010 at 16:02, Daniel Templeton wrote:
>>>>
>>>>> SGE tends to look at the 5-minute average, although any can be
>>>>> configured.  You could solve it the same way we did for SGE -- offer
>>>>> three: machineLoadShort, machineLoadMed, machineLoadLong.
>>>>>
>>>>> Daniel
>>>>>
>>>>> On 03/22/10 06:05, Peter Tröger wrote:
>>>>>> Hi,
>>>>>>
>>>>>> next remaining thing from OGF28:
>>>>>>
>>>>>> We support the determination of machineLoad average in the
>>>>>> MonitoringSession interface. At OGF, we could not agree on which of
>>>>>> the typical intervals (1/5/15 minutes) we want to use here. Maybe
>>>>>> all of them ?
>>>>>>
>>>>>> Best,
>>>>>> Peter.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>    drmaa-wg mailing list
>>>>>>    drmaa-wg at ogf.org
>>>>>>    http://www.ogf.org/mailman/listinfo/drmaa-wg
>>>>
>>>
>>
>>
>
>
>
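
For illustration, the two proposals quoted above (a single "MachineInfo" record fetched in one call, and a total core count from which cores per socket can be derived when a socket count is available) could be sketched roughly as follows. This is a minimal sketch in Python, not DRMAA IDL or any DRM's API: the class name, the field names, and get_local_machine_info are hypothetical, and the local host stands in for a DRM-reported machine. It assumes os.getloadavg() as the source of the 1/5/15-minute averages, which is not available on Windows, matching the concern raised in the thread.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MachineInfo:
    """Hypothetical record; field names are illustrative, not from the spec."""
    name: str
    load_short: Optional[float]     # 1-minute load average, None if unavailable
    load_med: Optional[float]       # 5-minute load average
    load_long: Optional[float]      # 15-minute load average
    cores: Optional[int]            # total processing units on the machine
    sockets: Optional[int] = None   # None when the DRM reports no socket count

    def cores_per_socket(self) -> Optional[int]:
        """Derive cores per socket only when both values are known."""
        if self.cores is None or not self.sockets:
            return None
        return self.cores // self.sockets


def get_local_machine_info(name: str = "localhost") -> MachineInfo:
    """Collect all attributes in one call, using the local host as a stand-in."""
    try:
        short, med, long_ = os.getloadavg()
    except (AttributeError, OSError):
        # No uptime(1)-style load average here (for example on Windows).
        short = med = long_ = None
    # os.cpu_count() reports logical CPUs; it is used here as an
    # approximation of the total core count.
    return MachineInfo(name, short, med, long_, os.cpu_count())
```

With both values present, MachineInfo("node1", 0.5, 0.4, 0.3, cores=8, sockets=2).cores_per_socket() gives 4. Without a socket count the total core count is still reported, which is the information the second proposal wants to preserve.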


