[DRMAA-WG] Load average interval ?

Daniel Templeton daniel.templeton at oracle.com
Fri Mar 26 09:36:37 CDT 2010


The concept of slots in SGE is only loosely bound to CPU architecture. 
We assume a slot per thread or core, but it's only a suggestion. 
Administrators can configure an arbitrary number of slots.  For example, 
the 1-node test cluster I have running on my workstation currently has 
over 200 slots on a dual-core machine.
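
To make the mismatch concrete, here is a minimal sketch. The `ExecHost` record and both helper names are hypothetical, made up for illustration; they are not SGE or DRMAA data structures. The value 200 stands in for "over 200". It shows why a core count guessed from the slot count (the slots/2 idea quoted below) can be far off when slots are freely configurable:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecHost:
    """Hypothetical record for illustration; not an SGE or DRMAA type."""
    name: str
    hardware_cores: int    # what the hardware actually provides
    configured_slots: int  # whatever the administrator chose to configure

    def slots_per_core(self) -> float:
        """Ratio of configured slots to physical cores."""
        return self.configured_slots / self.hardware_cores


def guess_cores_from_slots(host: ExecHost) -> int:
    """Naive guess: assume two slots (hardware threads) per core."""
    return host.configured_slots // 2


# Illustrative numbers: a dual-core workstation configured with 200 slots.
workstation = ExecHost(name="workstation", hardware_cores=2, configured_slots=200)

print(workstation.slots_per_core())         # 100.0 slots per physical core
print(guess_cores_from_slots(workstation))  # 100, although the machine has 2 cores
```

So a slot count only tells you how many concurrent jobs the administrator allows, not what the hardware looks like.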

Daniel

On 03/25/10 17:09, Andre Merzky wrote:
> Quoting [Peter Tröger] (Mar 26 2010):
>>
>> Condor usually reports the number of cores incl. hyperthreaded ones,
>> which conforms to the 'concurrent threads' metric Daniel proposed. To
>> my (negative) surprise, they report nothing else:
>>
>> http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#16294
>>
>> If we look only at this case, the corresponding attribute could be
>> named 'supportedSlots', since we established the understanding of slots as
>> resources for concurrent job activities / threads / processes. The
>> sockets attribute would not be implementable in Condor. The value of
>> the cores attribute could be guessed (supportedSlots/2).
>
> Please don't hardcode that number '2': that is only valid for Intel's
> Hyperthreading, and only at this point in time... ;-)
>
> Anyway: if one has to choose, the hardware threads are likely more
> useful than cores, IMHO, although learning both, or even the full
> hierarchy (nodes/sockets/cores/threads) would be simply nice...
>
> Best, Andre.
>
>
>>
>> But Condor is not our primary use case ;-)
>>
>> /Peter.
>>
>>
>> On 25.03.2010 at 16:50, Daniel Gruber wrote:
>>
>>> I would also vote for the total number of cores and sockets :)
>>>
>>> We could also think about reporting the number of concurrent
>>> threads that are supported by the hardware (hyperthreading in
>>> case of Intel or chip-multithreading in case of Sun T2 processors).
>>> This could spare the user from puzzling out what is meant by
>>> a core (is it a real one, or the hyperthreading/CMT thing).
>>>
>>> If not, we should at least define that a core is really a physical
>>> core.
>>>
>>> Daniel
>>>
>>>
>>> On 03/25/10 15:44, Daniel Templeton wrote:
>>>>
>>>> I would tend to agree that total core count is more useful.  SGE also
>>>> reports socket count as of 6.2u5, by the way.  (That's actually thanks
>>>> to our own Daniel Gruber.)
>>>>
>>>> Daniel
>>>>
>>>> On 03/25/10 07:03, Mariusz Mamoński wrote:
>>>>
>>>>> Also for me. As we are talking about the monitoring interface, I propose
>>>>> two more changes to the machine monitoring interface:
>>>>>
>>>>> 1. Having a new data struct called "MachineInfo" with attributes like
>>>>> Load, PhysMemory, ... and a getMachineInfo(in String machineName) method
>>>>> in the Monitoring interface. Rationale: the same as for JobInfo
>>>>> (consistency; fetching all of a machine's attributes at once is more
>>>>> natural in DRMS APIs than querying for each attribute separately).
>>>>>
>>>>> 2. Change machineCoresPerSocket to machinesCores. If one has
>>>>> machineSockets, he or she can easily determine
>>>>> machineCoresPerSocket. The problem with the current API is that if the
>>>>> DRM does not support "machineSockets" (as far as I checked, only LSF
>>>>> provides this two-level granularity, @see Google Doc) we lose the most
>>>>> essential information: "how many single processing units do we have on
>>>>> a single machine?"
>>>>>
>>>>> Cheers,
>>>>>
>>>>> On 23 March 2010 23:00, Daniel Templeton <daniel.templeton at oracle.com> wrote:
>>>>>
>>>>>> That's fine with me.
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>> On 03/23/10 13:51, Peter Tröger wrote:
>>>>>>
>>>>>>>> Any non-SGE opinion ?
>>>>>>>>
>>>>>>> Here is mine:
>>>>>>>
>>>>>>> I could only find one single source that explains the load average
>>>>>>> source in Condor :)
>>>>>>>
>>>>>>> http://www.patentstorm.us/patents/5978829/description.html
>>>>>>>
>>>>>>> Condor provides only the 1-minute load average from the uptime
>>>>>>> command.
>>>>>>>
>>>>>>> Same holds for Moab:
>>>>>>> http://www.clusterresources.com/products/mwm/docs/commands/checknode.shtml
>>>>>>>
>>>>>>> And PBS:
>>>>>>> http://wiki.egee-see.org/index.php/Installing_and_configuring_guide_for_MonALISA
>>>>>>>
>>>>>>> And MAUI:
>>>>>>> https://psiren.cs.nott.ac.uk/projects/procksi/wiki/JobManagement
>>>>>>>
>>>>>>> I vote for reporting only the 1-minute load average.
>>>>>>>
>>>>>>> /Peter.
>>>>>>>
>>>>>>>
>>>>>>>> And BTW, by using the uptime(1) load semantics, we lose Windows
>>>>>>>> support. There is no such attribute there; load is measured as the
>>>>>>>> percentage of non-idle time, and has no direct relationship to the
>>>>>>>> ready queue lengths.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Peter.
>>>>>>>>
>>>>>>>> On 22.03.2010 at 16:02, Daniel Templeton wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> SGE tends to look at the 5-minute average, although any can be
>>>>>>>>> configured.  You could solve it the same way we did for SGE -- offer
>>>>>>>>> three: machineLoadShort, machineLoadMed, machineLoadLong.
>>>>>>>>>
>>>>>>>>> Daniel
>>>>>>>>>
>>>>>>>>> On 03/22/10 06:05, Peter Tröger wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> next remaining thing from OGF28:
>>>>>>>>>>
>>>>>>>>>> We support the determination of the machineLoad average in the
>>>>>>>>>> MonitoringSession interface. At OGF, we could not agree on which of
>>>>>>>>>> the typical intervals (1/5/15 minutes) we want to use here. Maybe
>>>>>>>>>> all of them ?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Peter.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>     drmaa-wg mailing list
>>>>>>>>>>     drmaa-wg at ogf.org
>>>>>>>>>>     http://www.ogf.org/mailman/listinfo/drmaa-wg
>>>>>>>>>>


