[DRMAA-WG] Load average interval?
Peter Tröger
peter at troeger.eu
Thu Mar 25 19:02:41 CDT 2010
Condor usually reports the number of cores including hyperthreaded ones,
which conforms to the 'concurrent threads' metric Daniel proposed. To
my (negative) surprise, they report nothing else:
http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#16294
If we consider only this case, the corresponding attribute could be
named 'supportedSlots', since we established the understanding of slots
as resources for concurrent job activities / threads / processes. The
sockets attribute would not be implementable in Condor, and the value of
the cores attribute could only be estimated (supportedSlots/2).
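A minimal sketch of that supportedSlots/2 estimate, assuming every core exposes exactly two hardware threads (typical Intel hyperthreading). The function name estimate_cores is hypothetical and not part of Condor or DRMAA:

```python
def estimate_cores(supported_slots: int, threads_per_core: int = 2) -> int:
    """Guess the physical core count from a logical CPU (slot) count.

    Hypothetical helper: assumes every core exposes the same number of
    hardware threads, so the result is only an estimate.
    """
    if supported_slots < 0 or threads_per_core < 1:
        raise ValueError("invalid slot or thread count")
    # Integer division; a remainder would hint that the assumption is wrong.
    return supported_slots // threads_per_core

print(estimate_cores(16))     # 8
print(estimate_cores(16, 1))  # 16 (hyperthreading disabled)
```

The weak point is the threads_per_core parameter: Condor itself does not report it, so any fixed value is a guess.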
But Condor is not our primary use case ;-)
/Peter.
On 25.03.2010 at 16:50, Daniel Gruber wrote:
> I would also vote for the total number of cores and sockets :)
>
> We could also think about reporting the number of concurrent
> threads supported by the hardware (hyperthreading in the case
> of Intel or chip multithreading in the case of Sun T2 processors).
> This could spare the user from puzzling out what is meant by
> a core (is it a real one, or the hyperthreading/CMT thing?).
>
> If not, we should at least define that a core is really a physical
> core.
>
> Daniel
>
>
> On 03/25/10 15:44, Daniel Templeton wrote:
>>
>> I would tend to agree that total core count is more useful. SGE also
>> reports socket count as of 6.2u5, by the way. (That's actually thanks
>> to our own Daniel Gruber.)
>>
>> Daniel
>>
>> On 03/25/10 07:03, Mariusz Mamoński wrote:
>>
>>> Same for me. As we are talking about the monitoring interface, I
>>> propose two more changes to the machine monitoring interface:
>>>
>>> 1. Add a new data struct called "MachineInfo" with attributes like
>>> Load, PhysMemory, ... and a getMachineInfo(in String machineName)
>>> method in the Monitoring interface. Rationale: the same as for
>>> JobInfo (consistency; fetching all machine attributes at once is
>>> more natural in DRMS APIs than querying each attribute separately).
>>>
>>> 2. Change machineCoresPerSocket to machineCores. If one has
>>> machineSockets, he or she can easily determine
>>> machineCoresPerSocket. The problem with the current API is that if
>>> the DRM does not support "machineSockets" (as far as I checked, only
>>> LSF provides this two-level granularity, see the Google Doc), we
>>> lose the most essential information: "How many single processing
>>> units do we have on a single machine?"
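A minimal sketch of the proposed MachineInfo record, assuming Python dataclasses and hypothetical snake_case field names (machine_cores for machineCores, and so on). It illustrates the argument in point 2: cores per socket can be derived whenever the socket count is known, while total cores cannot be recovered from a per-socket value if the DRM does not report sockets. get_machine_info and the inventory dict are illustrative stand-ins, not a DRMAA API:

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class MachineInfo:
    """Hypothetical per-machine record; names mirror the proposal, not a final API."""
    name: str
    machine_cores: int                     # machineCores: total cores, always reportable
    machine_sockets: Optional[int] = None  # machineSockets: None if the DRM cannot tell
    load: Optional[float] = None           # Load
    phys_memory: Optional[int] = None      # PhysMemory, in bytes (assumed unit)

    @property
    def cores_per_socket(self) -> Optional[int]:
        """Derived value; only available when the socket count is known."""
        if not self.machine_sockets:
            return None
        return self.machine_cores // self.machine_sockets

def get_machine_info(machine_name: str, inventory: Dict[str, MachineInfo]) -> MachineInfo:
    """Stand-in for getMachineInfo(): one call returns all attributes of a machine."""
    return inventory[machine_name]

inventory = {
    "node1": MachineInfo("node1", machine_cores=8, machine_sockets=2),
    "node2": MachineInfo("node2", machine_cores=4),  # e.g. a DRM without socket info
}
print(get_machine_info("node1", inventory).cores_per_socket)  # 4
print(get_machine_info("node2", inventory).machine_cores)     # 4
```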
>>>
>>> Cheers,
>>>
>>> On 23 March 2010 23:00, Daniel Templeton
>>> <daniel.templeton at oracle.com> wrote:
>>>
>>>> That's fine with me.
>>>>
>>>> Daniel
>>>>
>>>> On 03/23/10 13:51, Peter Tröger wrote:
>>>>
>>>>>> Any non-SGE opinions?
>>>>>>
>>>>> Here is mine:
>>>>>
>>>>> I could find only a single source that explains where the load
>>>>> average in Condor comes from :)
>>>>>
>>>>> http://www.patentstorm.us/patents/5978829/description.html
>>>>>
>>>>> Condor provides only the 1-minute load average from the uptime command.
>>>>>
>>>>> Same holds for Moab:
>>>>> http://www.clusterresources.com/products/mwm/docs/commands/checknode.shtml
>>>>>
>>>>> And PBS:
>>>>> http://wiki.egee-see.org/index.php/Installing_and_configuring_guide_for_MonALISA
>>>>>
>>>>> And MAUI:
>>>>> https://psiren.cs.nott.ac.uk/projects/procksi/wiki/JobManagement
>>>>>
>>>>> I vote for reporting only the 1-minute load average.
>>>>>
>>>>> /Peter.
>>>>>
>>>>>
>>>>>> And BTW, by using the uptime(1) load semantics, we lose Windows
>>>>>> support. There is no such attribute there; load is measured as a
>>>>>> percentage of non-idle time and has no direct relationship to
>>>>>> ready queue lengths.
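For comparison, the uptime(1) values are exponentially damped moving averages of the run-queue length (on Linux, tasks in uninterruptible sleep are counted too), sampled roughly every 5 seconds. The following simplified floating-point sketch (the kernel uses fixed-point arithmetic) shows why the 1-, 5- and 15-minute values diverge after a load change. The helper names are illustrative only:

```python
import math

SAMPLE_INTERVAL_S = 5.0  # the run queue is sampled roughly every 5 seconds

def damping_factor(window_minutes: float) -> float:
    """Weight kept from the previous average at each sample."""
    return math.exp(-SAMPLE_INTERVAL_S / (window_minutes * 60.0))

def update_load(load: float, queue_len: int, window_minutes: float) -> float:
    """One moving-average step: the old value decays, the new sample blends in."""
    e = damping_factor(window_minutes)
    return load * e + queue_len * (1.0 - e)

def simulate(queue_lengths, windows=(1, 5, 15)):
    """Feed a series of run-queue samples into 1/5/15-minute averages."""
    loads = {w: 0.0 for w in windows}
    for n in queue_lengths:
        for w in windows:
            loads[w] = update_load(loads[w], n, w)
    return loads

# An idle machine that suddenly has 4 runnable tasks for 2 minutes (24 samples):
# the 1-minute value is already about 3.5, the 15-minute value only about 0.5.
loads = simulate([4] * 24)
print({w: round(v, 2) for w, v in loads.items()})
```

A Windows-style "percentage of non-idle time" has no such queue-length component, which is why the two cannot be mapped onto each other directly.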
>>>>>>
>>>>>> Best,
>>>>>> Peter.
>>>>>>
>>>>>> On 22.03.2010 at 16:02, Daniel Templeton wrote:
>>>>>>
>>>>>>
>>>>>>> SGE tends to look at the 5-minute average, although any of them
>>>>>>> can be configured. You could solve it the same way we did for
>>>>>>> SGE and offer three: machineLoadShort, machineLoadMed,
>>>>>>> machineLoadLong.
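A minimal sketch of how a Unix implementation might fill those three attributes from Python's os.getloadavg(), which returns the 1-, 5- and 15-minute averages and is only available on Unix. The MachineLoad type and its field names are hypothetical; returning None is just one possible way to express the missing Windows support:

```python
import os
from typing import NamedTuple, Optional

class MachineLoad(NamedTuple):
    """Hypothetical container for the three proposed attributes."""
    machine_load_short: float  # machineLoadShort: 1-minute average
    machine_load_med: float    # machineLoadMed: 5-minute average
    machine_load_long: float   # machineLoadLong: 15-minute average

def local_machine_load() -> Optional[MachineLoad]:
    """Read the local uptime-style load averages, or None where unsupported."""
    getloadavg = getattr(os, "getloadavg", None)  # absent on Windows
    if getloadavg is None:
        return None
    try:
        one, five, fifteen = getloadavg()
    except OSError:  # raised if the load average is unobtainable
        return None
    return MachineLoad(one, five, fifteen)

print(local_machine_load())
```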
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>> On 03/22/10 06:05, Peter Tröger wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> next remaining thing from OGF28:
>>>>>>>>
>>>>>>>> We support determining the machineLoad average in the
>>>>>>>> MonitoringSession interface. At OGF, we could not agree on
>>>>>>>> which of the typical intervals (1/5/15 minutes) we want to use
>>>>>>>> here. Maybe all of them?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Peter.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> drmaa-wg mailing list
>>>>>>>> drmaa-wg at ogf.org
>>>>>>>> http://www.ogf.org/mailman/listinfo/drmaa-wg
>>>>>>>>
>>>
>>
>