[DRMAA-WG] Load average interval ?

Andre Merzky andre at merzky.net
Thu Mar 25 19:09:56 CDT 2010


Quoting [Peter Tröger] (Mar 26 2010):
> 
> Condor usually reports the number of cores incl. hyperthreaded ones,
> which conforms to the 'concurrent threads' metric Daniel proposed. To
> my (negative) surprise, they report nothing else:
> 
> http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#16294
> 
> Looking at this case alone, the corresponding attribute could be
> named 'supportedSlots', since we established the understanding of
> slots as resources for concurrent job activities / threads /
> processes. The sockets attribute would not be implementable in
> Condor. The value of the cores attribute could only be guessed
> (supportedSlots/2).

Please don't hardcode that number '2': it is only valid for Intel's
Hyper-Threading, and only at this point in time... ;-)

Anyway: if one has to choose, the hardware threads are likely more
useful than cores, IMHO, although learning both, or even the full
hierarchy (nodes/sockets/cores/threads), would simply be nice...
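
Just to make the idea concrete, here is a rough Java sketch of how such a
hierarchy could be exposed. The class and attribute names are invented for
this mail and are not taken from any draft:

    // Illustrative sketch only -- names are invented for this discussion.
    // A DRM that only knows the total number of hardware threads fills in
    // supportedSlots and leaves the finer-grained fields at UNKNOWN
    // instead of deriving cores with a hardcoded divisor.
    public class MachineTopology {

        public static final int UNKNOWN = -1;

        private final int sockets;         // physical CPU packages, or UNKNOWN
        private final int coresPerSocket;  // physical cores per package, or UNKNOWN
        private final int threadsPerCore;  // hardware threads (SMT/CMT) per core, or UNKNOWN
        private final int supportedSlots;  // total concurrent hardware threads

        public MachineTopology(int sockets, int coresPerSocket,
                               int threadsPerCore, int supportedSlots) {
            this.sockets = sockets;
            this.coresPerSocket = coresPerSocket;
            this.threadsPerCore = threadsPerCore;
            this.supportedSlots = supportedSlots;
        }

        /** Total physical cores, derivable only when the DRM reports enough detail. */
        public int totalCores() {
            if (sockets == UNKNOWN || coresPerSocket == UNKNOWN) {
                return UNKNOWN;
            }
            return sockets * coresPerSocket;
        }

        public int supportedSlots() {
            return supportedSlots;
        }
    }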

Best, Andre.


> 
> But Condor is not our primary use case ;-)
> 
> /Peter.
> 
> 
> On 25.03.2010 at 16:50, Daniel Gruber wrote:
> 
> > I would also vote for the total number of cores and sockets :)
> >
> > We could also think about reporting the number of concurrent
> > threads supported by the hardware (hyperthreading in the case of
> > Intel, or chip multithreading in the case of Sun T2 processors).
> > This could save the user from puzzling out what is meant by a core
> > (is it a real one, or the hyperthreading/CMT thing?).
> >
> > If not, we should at least define that a core is really a physical
> > core.
> >
> > Daniel
> >
> >
> > On 03/25/10 15:44, Daniel Templeton wrote:
> >>
> >> I would tend to agree that total core count is more useful.  SGE also
> >> reports socket count as of 6.2u5, by the way.  (That's actually  
> >> thanks
> >> to our own Daniel Gruber.)
> >>
> >> Daniel
> >>
> >> On 03/25/10 07:03, Mariusz Mamoński wrote:
> >>
> >>> Fine with me as well. Since we are talking about the monitoring
> >>> interface, I propose two more changes to the machine monitoring
> >>> interface:
> >>>
> >>> 1. Have a new data struct called "MachineInfo" with attributes like
> >>> Load, PhysMemory, ... and a getMachineInfo(in String machineName)
> >>> method in the Monitoring interface (a sketch follows below the list).
> >>> Rationale: the same as for JobInfo (consistency; fetching all machine
> >>> attributes at once is more natural in DRMS APIs than querying each
> >>> attribute separately)
> >>>
> >>> 2. Change machineCoresPerSocket to machineCores; if one has
> >>> machineSockets, he or she can easily determine machineCoresPerSocket.
> >>> The problem with the current API is that if the DRM does not support
> >>> "machineSockets" (as far as I checked, only LSF provides this
> >>> two-level granularity, @see Google Doc), we lose the most essential
> >>> information: "how many single processing units do we have on a
> >>> single machine?"
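
For what it's worth, a rough Java sketch of points 1 and 2 combined; the
names (MachineInfo, getMachineInfo, machineSockets, machineCores) follow
this thread rather than any agreed draft:

    // Hypothetical sketch; attribute names follow this thread, not a spec.
    // Only the interface is public so both types fit into one source file.
    class MachineInfo {
        double load;           // load average (interval still under discussion)
        long   physMemory;     // physical memory in bytes
        int    machineSockets; // -1 when the DRM cannot report sockets
        int    machineCores;   // total processing units on the machine

        /** machineCoresPerSocket stays derivable whenever sockets are known. */
        int coresPerSocket() {
            return (machineSockets > 0) ? machineCores / machineSockets : -1;
        }
    }

    public interface MonitoringSession {
        /** Fetches all attributes of one machine in a single call. */
        MachineInfo getMachineInfo(String machineName);
    }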
> >>>
> >>> Cheers,
> >>>
> >>> On 23 March 2010 23:00, Daniel Templeton
> >>> <daniel.templeton at oracle.com> wrote:
> >>>
> >>>> That's fine with me.
> >>>>
> >>>> Daniel
> >>>>
> >>>> On 03/23/10 13:51, Peter Tröger wrote:
> >>>>
> >>>>>> Any non-SGE opinion?
> >>>>>>
> >>>>> Here is mine:
> >>>>>
> >>>>> I could find only a single source that explains where the load
> >>>>> average in Condor comes from :)
> >>>>>
> >>>>> http://www.patentstorm.us/patents/5978829/description.html
> >>>>>
> >>>>> Condor provides only the 1-minute load average from the uptime  
> >>>>> command.
> >>>>>
> >>>>> Same holds for Moab:
> >>>>> http://www.clusterresources.com/products/mwm/docs/commands/checknode.shtml
> >>>>>
> >>>>> And PBS:
> >>>>> http://wiki.egee-see.org/index.php/Installing_and_configuring_guide_for_MonALISA
> >>>>>
> >>>>> And MAUI:
> >>>>> https://psiren.cs.nott.ac.uk/projects/procksi/wiki/JobManagement
> >>>>>
> >>>>> I vote for reporting only the 1-minute load average.
> >>>>>
> >>>>> /Peter.
> >>>>>
> >>>>>
> >>>>>> And BTW, by using the uptime(1) load semantics, we lose Windows
> >>>>>> support. There is no such attribute there; load is measured as a
> >>>>>> percentage of non-idle time and has no direct relationship to
> >>>>>> the ready queue length.
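
As a side note on that point: the standard Java OperatingSystemMXBean
exposes exactly the uptime-style 1-minute load average and is documented
to return a negative value where the metric is not available, which in
practice includes Windows. A small probe:

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    public class LoadProbe {
        public static void main(String[] args) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            // 1-minute load average, or a negative value where unavailable
            // (e.g. on Windows, which only offers utilization percentages).
            double load = os.getSystemLoadAverage();
            if (load < 0) {
                System.out.println("uptime-style load average not available here");
            } else {
                System.out.printf("1-minute load average: %.2f%n", load);
            }
        }
    }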
> >>>>>>
> >>>>>> Best,
> >>>>>> Peter.
> >>>>>>
> >>>>>> On 22.03.2010 at 16:02, Daniel Templeton wrote:
> >>>>>>
> >>>>>>
> >>>>>>> SGE tends to look at the 5-minute average, although any can be
> >>>>>>> configured.  You could solve it the same way we did for SGE --  
> >>>>>>> offer
> >>>>>>> three: machineLoadShort, machineLoadMed, machineLoadLong.
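
To make that concrete, a minimal Java sketch of the three-interval idea
folded into one call; the names are invented here, not taken from the spec:

    // Illustrative sketch only; names are invented for this discussion.
    public interface MachineLoadMonitoring {

        /** The three common averaging intervals (typically 1/5/15 minutes). */
        enum LoadInterval { SHORT, MEDIUM, LONG }

        /**
         * Load average of the given machine for the given interval, or a
         * negative value if the underlying DRM does not report that interval.
         */
        double getMachineLoad(String machineName, LoadInterval interval);
    }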
> >>>>>>>
> >>>>>>> Daniel
> >>>>>>>
> >>>>>>> On 03/22/10 06:05, Peter Tröger wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> next remaining thing from OGF28:
> >>>>>>>>
> >>>>>>>> We support determining the machineLoad average in the
> >>>>>>>> MonitoringSession interface. At OGF, we could not agree on
> >>>>>>>> which of the typical intervals (1/5/15 minutes) we want to
> >>>>>>>> use here. Maybe all of them?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Peter.
> >>>>>>>>
-- 
Nothing is ever easy.

