[DRMAA-WG] Load average interval ?

Peter Tröger peter at troeger.eu
Thu Mar 25 19:02:41 CDT 2010


Condor usually reports the number of cores including hyperthreaded ones,
which conforms to the 'concurrent threads' metric Daniel proposed. To
my (negative) surprise, they report nothing else:

http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#16294

If we look only at this case, the corresponding attribute could be
named 'supportedSlots', since we established the understanding of slots as
resources for concurrent job activities / threads / processes. The
sockets attribute would not be implementable in Condor. The value of
the cores attribute could be guessed (supportedSlots/2, assuming two
hardware threads per core).
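
A minimal sketch of that guess, assuming a fixed number of hardware threads per core. The function name and the threads_per_core parameter are made up for illustration; they are not part of DRMAA or Condor:

```python
def guess_cores(supported_slots: int, threads_per_core: int = 2) -> int:
    """Estimate the physical core count from a reported slot count.

    threads_per_core=2 is an assumption (two-way hyperthreading). The guess
    is wrong on machines without hardware multithreading or with a
    different number of threads per core.
    """
    if supported_slots < 1:
        raise ValueError("supported_slots must be at least 1")
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    # Integer division; never report fewer than one core.
    return max(1, supported_slots // threads_per_core)
```

With the default, guess_cores(8) yields 4. The result is only an estimate, which is why a separate, well-defined cores attribute would still be preferable where the DRM system can provide it.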

But Condor is not our primary use case ;-)

/Peter.


On 25.03.2010 at 16:50, Daniel Gruber wrote:

> I would also vote for the total number of cores and sockets :)
>
> We could also think about reporting the number of concurrent
> threads that are supported by the hardware (hyperthreading in
> case of Intel or chip-multithreading in case of Sun T2 processors).
> This could save the user from puzzling out what is meant by
> a core (is it a real one, or the hyperthreading/CMT thing).
>
> If not, we should at least define that a core is really a physical core.
>
> Daniel
>
>
> On 03/25/10 15:44, Daniel Templeton wrote:
>>
>> I would tend to agree that total core count is more useful.  SGE also
>> reports socket count as of 6.2u5, by the way.  (That's actually thanks
>> to our own Daniel Gruber.)
>>
>> Daniel
>>
>> On 03/25/10 07:03, Mariusz Mamoński wrote:
>>
>>> Also for me. As we are talking about the monitoring interface, I propose
>>> two more changes to the machine monitoring interface:
>>>
>>> 1. Having a new data struct called "MachineInfo" with attributes like
>>> Load, PhysMemory, ... and a getMachineInfo(in String machineName) method
>>> in the Monitoring interface. Rationale: the same as for JobInfo
>>> (consistency issue; fetching all machine attributes at once is more
>>> natural in DRMS APIs than querying for each attribute separately).
>>>
>>> 2. Change machineCoresPerSocket to machinesCores. If one has
>>> machineSockets, he or she can easily determine
>>> machineCoresPerSocket. The problem with the current API is that if the
>>> DRM does not support "machineSockets" (as far as I checked, only LSF
>>> provides this two-level granularity, @see Google Doc), we lose the most
>>> essential information: "how many single processing units do we have on
>>> a single machine?"
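
The two proposals above could be sketched roughly as follows. This is only an illustration under stated assumptions: the field names, the types, and the cores_per_socket() helper are hypothetical and are not taken from the DRMAA specification; only the names MachineInfo, Load, PhysMemory, and the total-cores idea come from the message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MachineInfo:
    """All monitoring attributes of one machine, fetched at once."""
    name: str
    load: float
    phys_memory: int                # unit is left open in the proposal
    cores: int                      # total cores on the machine
    sockets: Optional[int] = None   # None if the DRM cannot report sockets

    def cores_per_socket(self) -> Optional[int]:
        """Derive cores per socket, or None when sockets is unknown."""
        if not self.sockets:
            return None
        return self.cores // self.sockets
```

The point of the design is that the total core count stays available even when the socket count is missing, while cores per socket can still be derived whenever both values are reported.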
>>>
>>> Cheers,
>>>
>>> On 23 March 2010 23:00, Daniel Templeton <daniel.templeton at oracle.com> wrote:
>>>
>>>> That's fine with me.
>>>>
>>>> Daniel
>>>>
>>>> On 03/23/10 13:51, Peter Tröger wrote:
>>>>
>>>>>> Any non-SGE opinion ?
>>>>>>
>>>>> Here is mine:
>>>>>
>>>>> I could only find one single source that explains the load average
>>>>> source in Condor :)
>>>>>
>>>>> http://www.patentstorm.us/patents/5978829/description.html
>>>>>
>>>>> Condor provides only the 1-minute load average from the uptime command.
>>>>>
>>>>> Same holds for Moab:
>>>>> http://www.clusterresources.com/products/mwm/docs/commands/checknode.shtml
>>>>>
>>>>> And PBS:
>>>>> http://wiki.egee-see.org/index.php/Installing_and_configuring_guide_for_MonALISA
>>>>>
>>>>> And MAUI:
>>>>> https://psiren.cs.nott.ac.uk/projects/procksi/wiki/JobManagement
>>>>>
>>>>> I vote for reporting only the 1-minute load average.
>>>>>
>>>>> /Peter.
>>>>>
>>>>>
>>>>>> And BTW, by using the uptime(1) load semantics, we lose Windows
>>>>>> support. There is no such attribute there; load is measured as the
>>>>>> percentage of non-idle time, and has no direct relationship to the
>>>>>> ready queue lengths.
>>>>>>
>>>>>> Best,
>>>>>> Peter.
>>>>>>
>>>>>> On 22.03.2010 at 16:02, Daniel Templeton wrote:
>>>>>>
>>>>>>
>>>>>>> SGE tends to look at the 5-minute average, although any can be
>>>>>>> configured.  You could solve it the same way we did for SGE -- offer
>>>>>>> three: machineLoadShort, machineLoadMed, machineLoadLong.
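
A minimal sketch of the three-value variant, assuming that short, med, and long map to the 1-, 5-, and 15-minute averages of uptime(1). The MachineLoad type and read_machine_load() are hypothetical names; os.getloadavg() is the standard Python call and is only available on Unix-like systems:

```python
import os
from typing import NamedTuple, Optional


class MachineLoad(NamedTuple):
    short: float  # assumed: 1-minute average
    med: float    # assumed: 5-minute average
    long: float   # assumed: 15-minute average


def read_machine_load() -> Optional[MachineLoad]:
    """Return the local load averages, or None where they are unavailable."""
    getloadavg = getattr(os, "getloadavg", None)
    if getloadavg is None:
        # Platforms without uptime(1)-style load averages, e.g. Windows.
        return None
    try:
        return MachineLoad(*getloadavg())
    except OSError:
        # The load average could not be obtained on this system.
        return None
```

Returning None instead of a made-up number keeps the "not supported" case explicit for platforms that have no comparable load metric.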
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>> On 03/22/10 06:05, Peter Tröger wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> next remaining thing from OGF28:
>>>>>>>>
>>>>>>>> We support the determination of machineLoad average in the
>>>>>>>> MonitoringSession interface. At OGF, we could not agree on which of
>>>>>>>> the typical intervals (1/5/15 minutes) we want to use here. Maybe
>>>>>>>> all of them ?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Peter.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>    drmaa-wg mailing list
>>>>>>>>    drmaa-wg at ogf.org
>>>>>>>>    http://www.ogf.org/mailman/listinfo/drmaa-wg
>>>>>>>>


