[DRMAA-WG] Load average interval ?

Mariusz Mamoński mamonski at man.poznan.pl
Fri Mar 26 09:42:39 CDT 2010


On 26 March 2010 15:36, Daniel Templeton <daniel.templeton at oracle.com> wrote:
> The concept of slots in SGE is only loosely bound to CPU architecture.
> We assume a slot per thread or core, but it's only a suggestion.
> Administrators can configure an arbitrary number of slots.  For example,
> the 1-node test cluster I have running on my workstation currently has
> over 200 slots on a dual-core machine.
Is it common to observe a production system that permits oversubscription
of CPUs? We can always add slots as a machineInfo attribute, in addition
to (or instead of) cpu/cores. A rough sketch of what that could look like
follows.
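
A minimal sketch of how such a structure could expose both views side by
side; all names here are illustrative, nothing below is agreed-upon spec:

    // Hypothetical attribute set -- names are placeholders, not spec.
    public interface MachineInfo {
        String getName();       // host name
        double getLoad();       // load average (interval still under discussion)
        long getPhysMemory();   // physical memory in bytes
        int getSockets();       // -1 if the DRMS cannot report sockets
        int getCores();         // total physical cores
        int getSlots();         // configured slots; may exceed cores
                                // when oversubscription is permitted
    }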
>
> Daniel
>
> On 03/25/10 17:09, Andre Merzky wrote:
>> Quoting [Peter Tröger] (Mar 26 2010):
>>>
>>> Condor usually reports the number of cores including hyperthreaded ones,
>>> which conforms to the 'concurrent threads' metric Daniel proposed. To
>>> my (negative) surprise, they report nothing else:
>>>
>>> http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#16294
>>>
>>> If we look only at this case, the corresponding attribute could be
>>> named 'supportedSlots', since we established the understanding of slots
>>> as resources for concurrent job activities / threads / processes. The
>>> sockets attribute would not be implementable in Condor. The value of
>>> the cores attribute could only be guessed at (supportedSlots/2).
>>
>> Please don't hardcode that number '2': it is only valid for Intel's
>> Hyper-Threading, and only at this point in time... ;-)
>>
>> Anyway: if one has to choose, the hardware threads are likely more
>> useful than cores, IMHO, although knowing both, or even the full
>> hierarchy (nodes/sockets/cores/threads), would simply be nice...
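>>
>> To illustrate, the per-machine part of that hierarchy could be as
>> little as three counters, with the total thread count derived from
>> them (a sketch only, assuming a homogeneous machine where every
>> socket and core is identical):
>>
>>     // Illustrative only: assumes every socket has the same number of
>>     // cores and every core the same number of hardware threads.
>>     public final class CpuTopology {
>>         public final int sockets;
>>         public final int coresPerSocket;
>>         public final int threadsPerCore; // 1 if no SMT/Hyper-Threading
>>
>>         public CpuTopology(int sockets, int coresPerSocket,
>>                            int threadsPerCore) {
>>             this.sockets = sockets;
>>             this.coresPerSocket = coresPerSocket;
>>             this.threadsPerCore = threadsPerCore;
>>         }
>>
>>         // The 'concurrent threads' metric discussed above.
>>         public int hardwareThreads() {
>>             return sockets * coresPerSocket * threadsPerCore;
>>         }
>>     }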
>>
>> Best, Andre.
>>
>>
>>>
>>> But Condor is not our primary use case ;-)
>>>
>>> /Peter.
>>>
>>>
>>> On 25.03.2010 at 16:50, Daniel Gruber wrote:
>>>
>>>> I would also vote for the total number of cores and sockets :)
>>>>
>>>> We could also think about reporting the number of concurrent
>>>> threads that are supported by the hardware (Hyper-Threading in the
>>>> case of Intel, or chip multithreading in the case of Sun T2 processors).
>>>> This could save the user from puzzling out what is meant by
>>>> a core (is it a real one, or the hyperthreading/CMT thing?).
>>>>
>>>> If not, we should at least define that a core really is a physical
>>>> core.
>>>>
>>>> Daniel
>>>>
>>>>
>>>> On 03/25/10 15:44, Daniel Templeton wrote:
>>>>>
>>>>> I would tend to agree that total core count is more useful.  SGE also
>>>>> reports socket count as of 6.2u5, by the way.  (That's actually
>>>>> thanks
>>>>> to our own Daniel Gruber.)
>>>>>
>>>>> Daniel
>>>>>
>>>>> On 03/25/10 07:03, Mariusz Mamoński wrote:
>>>>>
>>>>>> Also for me. As we are talking about the monitoring interface, I
>>>>>> propose two more changes to the machine monitoring interface:
>>>>>>
>>>>>> 1. Have a new data struct called "MachineInfo" with attributes like
>>>>>> Load, PhysMemory, ... and a getMachineInfo(in String machineName)
>>>>>> method in the Monitoring interface (see the sketch below). Rationale:
>>>>>> the same as for JobInfo (consistency; fetching all machine attributes
>>>>>> at once is more natural in DRMS APIs than querying for each attribute
>>>>>> separately).
>>>>>>
>>>>>> 2. Change machineCoresPerSocket to machineCores: if one has
>>>>>> machineSockets, he or she can easily derive machineCoresPerSocket.
>>>>>> The problem with the current API is that if the DRM does not support
>>>>>> "machineSockets" (as far as I checked, only LSF provides this
>>>>>> two-level granularity, @see Google Doc), we lose the most essential
>>>>>> information: "how many single processing units do we have on a
>>>>>> single machine?"
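>>>>>>
>>>>>> As a usage sketch only (assuming the illustrative MachineInfo
>>>>>> interface from earlier in this thread; none of these names are
>>>>>> agreed upon), the fetch-all style would look roughly like:
>>>>>>
>>>>>>     // One round trip per machine instead of one per attribute.
>>>>>>     MachineInfo info = monitoringSession.getMachineInfo("node01");
>>>>>>     System.out.println("load:  " + info.getLoad());
>>>>>>     System.out.println("cores: " + info.getCores());
>>>>>>     // machineCoresPerSocket becomes derived data when sockets
>>>>>>     // are known:
>>>>>>     if (info.getSockets() > 0) {
>>>>>>         System.out.println("cores/socket: "
>>>>>>                 + info.getCores() / info.getSockets());
>>>>>>     }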
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> On 23 March 2010 23:00, Daniel
>>>>>> Templeton <daniel.templeton at oracle.com> wrote:
>>>>>>
>>>>>>> That's fine with me.
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>> On 03/23/10 13:51, Peter Tröger wrote:
>>>>>>>
>>>>>>>>> Any non-SGE opinion?
>>>>>>>>>
>>>>>>>> Here is mine:
>>>>>>>>
>>>>>>>> I could find only a single source that explains how the load
>>>>>>>> average is obtained in Condor :)
>>>>>>>>
>>>>>>>> http://www.patentstorm.us/patents/5978829/description.html
>>>>>>>>
>>>>>>>> Condor provides only the 1-minute load average from the uptime
>>>>>>>> command.
>>>>>>>>
>>>>>>>> Same holds for Moab:
>>>>>>>> http://www.clusterresources.com/products/mwm/docs/commands/checknode.shtml
>>>>>>>>
>>>>>>>> And PBS:
>>>>>>>> http://wiki.egee-see.org/index.php/Installing_and_configuring_guide_for_MonALISA
>>>>>>>>
>>>>>>>> And MAUI:
>>>>>>>> https://psiren.cs.nott.ac.uk/projects/procksi/wiki/JobManagement
>>>>>>>>
>>>>>>>> I vote for reporting only the 1-minute load average.
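>>>>>>>>
>>>>>>>> Incidentally, the only load figure a portable Java implementation
>>>>>>>> can get from the standard library is also the 1-minute one, and
>>>>>>>> it already shows the Windows issue raised below (a sketch, not
>>>>>>>> part of any proposal):
>>>>>>>>
>>>>>>>>     import java.lang.management.ManagementFactory;
>>>>>>>>     import java.lang.management.OperatingSystemMXBean;
>>>>>>>>
>>>>>>>>     public class LoadProbe {
>>>>>>>>         public static void main(String[] args) {
>>>>>>>>             OperatingSystemMXBean os =
>>>>>>>>                     ManagementFactory.getOperatingSystemMXBean();
>>>>>>>>             // 1-minute load average; returns a negative value on
>>>>>>>>             // platforms without one (e.g. Windows).
>>>>>>>>             double load = os.getSystemLoadAverage();
>>>>>>>>             System.out.println("1-min load: " + load);
>>>>>>>>         }
>>>>>>>>     }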
>>>>>>>>
>>>>>>>> /Peter.
>>>>>>>>
>>>>>>>>
>>>>>>>>> And BTW, by using the uptime(1) load semantics, we lose Windows
>>>>>>>>> support. There is no such attribute there; load is measured as a
>>>>>>>>> percentage of non-idle time and has no direct relationship to the
>>>>>>>>> ready-queue length.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>>> On 22.03.2010 at 16:02, Daniel Templeton wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> SGE tends to look at the 5-minute average, although any of them
>>>>>>>>>> can be configured.  You could solve it the same way we did for
>>>>>>>>>> SGE -- offer three: machineLoadShort, machineLoadMed,
>>>>>>>>>> machineLoadLong.
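>>>>>>>>>>
>>>>>>>>>> Roughly, that variant could surface as three read-only
>>>>>>>>>> attributes (the three names are from the suggestion above;
>>>>>>>>>> everything else is illustrative):
>>>>>>>>>>
>>>>>>>>>>     // Sketch of the three-interval variant; NaN could signal
>>>>>>>>>>     // an interval the DRMS does not report.
>>>>>>>>>>     public interface MachineLoad {
>>>>>>>>>>         double machineLoadShort(); // typically 1-minute avg
>>>>>>>>>>         double machineLoadMed();   // typically 5-minute avg
>>>>>>>>>>         double machineLoadLong();  // typically 15-minute avg
>>>>>>>>>>     }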
>>>>>>>>>>
>>>>>>>>>> Daniel
>>>>>>>>>>
>>>>>>>>>> On 03/22/10 06:05, Peter Tröger wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> next remaining thing from OGF28:
>>>>>>>>>>>
>>>>>>>>>>> We support the determination of the machineLoad average in the
>>>>>>>>>>> MonitoringSession interface. At OGF, we could not agree on which
>>>>>>>>>>> of the typical intervals (1/5/15 minutes) we want to use here.
>>>>>>>>>>> Maybe all of them?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Peter.
>>>>>>>>>>>
>>>>>>>>>>>



-- 
Mariusz

