[DRMAA-WG] Call for participation: DRMAA2 machine monitoring

Mariusz Mamoński mamonski at man.poznan.pl
Tue Nov 10 12:01:03 CST 2009


Dear Peter,


2009/11/10 Peter Tröger <peter at troeger.eu>:
> In order to trigger some discussion, here is my first quick evaluation for LSF,
> based on
>
> http://www.zdv.uni-mainz.de/cms-extern/lsf/lsf6.0/pdf/manuals/lsf_ref_6.0.pdf
> http://www.cisl.ucar.edu/docs/LSF/7.0.3/command_reference/lshosts.cmdref.html
> http://ams.cern.ch/AMS/7/admin/troubleshooting.html
> http://www.slac.stanford.edu/comp/unix/package/lsf/LSF6.1_doc/html/lsf6.1_admin/manage_hosts.html
>
>> enum OperatingSystem {HPUX, LINUX, TRU64, DUNIX, OSF1, MACOS, SUNOS,
>> WIN, WINNT, AIX, UNIXWARE, BSD, OTHER}
>>
>> enum CpuArchitecture {ALPHA, PARISC, X86, X64, IA64, MIPS, PPC, PPC64,
>>  SPARC, SPARC64, OTHER}
>>
>> interface MonitoringSessions{
>>
>> readonly attribute string[] drmVersionString;
>> readonly attribute string[] drmMachineNames;
>> int machineSockets(in string machineName);
>> int machineCoresPerSocket(in string machineName);
>> int machineLoad(in string machineName, in long coreNumber);
>> int machinePhysMemory(in string machineName);
>> int machineVirtMemory(in string machineName);
>> OperatingSystem machineOS(in string machineName);
>> string machineOSVersion(in string machineName);
>> CpuArchitecture machineArch(in string machineName);
>>
>> };
>
> LSF supports the "lshosts" command, which can show (among other things) the
> following machine information:
>
> - host name (== machineName)
> - type, e.g. CRAYJ, SUNSOL, ALPHA, RS6K, SGI6, HPPA, LINUX86 (== machineOS ???)
> - model, e.g. Ultra2, SunSparc, DEC3000, IBM350, R10K, HP715, Intel_IA64,
> Ultra5S, PowerPC_G4, HP300  (== machineArch)

I will only add that LSF can easily be configured to report the osname and
osver resources in a JSDL-compatible way:
http://www.cisl.ucar.edu/docs/LSF/7.0.3/admin/jsdl.html#wp1593747

> - cpuf, the relative CPU performance factor (== ???)
> - ncpus (== machineSockets * machineCoresPerSocket)
> - nprocs (== machineSockets)
> - ncores (== machineCoresPerSocket)
> - maxmem (== machinePhysMemory)
> - maxswp (== machineVirtMemory - machinePhysMemory)
> - ndisks, the number of local disks (== ???)
> - maxtmp, the maximum available temporary space (== ???)
>
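The tentative mapping above can be checked mechanically. The following is a minimal sketch, not LSF or DRMAA code: the names LSF_TYPE_TO_OS, DrmaaMachineView and from_lshosts are hypothetical. The type-to-OS table is an assumption that covers only some of the quoted "type" examples, and everything else falls back to OTHER. It also assumes each lshosts record has already been parsed into a dict whose memory columns are plain megabyte integers. Raw lshosts output may carry unit suffixes or "-" placeholders that a real parser would have to handle.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical lookup from LSF host "type" strings to the draft's OperatingSystem
# identifiers; it only covers some of the examples quoted above, everything
# else falls back to OTHER.
LSF_TYPE_TO_OS: Dict[str, str] = {
    "LINUX86": "LINUX",
    "SUNSOL": "SUNOS",
    "HPPA": "HPUX",
    "RS6K": "AIX",
}


@dataclass
class DrmaaMachineView:
    """Hypothetical container for the values the draft interface would return."""
    machine_name: str
    machine_os: str
    machine_sockets: int
    machine_cores_per_socket: int
    machine_phys_memory_mb: int
    machine_virt_memory_mb: int


def from_lshosts(record: Dict[str, str]) -> DrmaaMachineView:
    """Apply the tentative lshosts-to-DRMAA mapping to one host record.

    Assumes the memory columns were already converted to plain megabyte integers.
    """
    nprocs = int(record["nprocs"])  # machineSockets
    ncores = int(record["ncores"])  # machineCoresPerSocket
    maxmem = int(record["maxmem"])  # machinePhysMemory
    maxswp = int(record["maxswp"])  # machineVirtMemory - machinePhysMemory
    return DrmaaMachineView(
        machine_name=record["host_name"],
        machine_os=LSF_TYPE_TO_OS.get(record["type"], "OTHER"),
        machine_sockets=nprocs,
        machine_cores_per_socket=ncores,
        machine_phys_memory_mb=maxmem,
        # Invert maxswp == virt - phys to recover the virtual memory size.
        machine_virt_memory_mb=maxmem + maxswp,
    )


view = from_lshosts({"host_name": "hostA", "type": "LINUX86", "nprocs": "2",
                     "ncores": "4", "maxmem": "16000", "maxswp": "4000"})
# ncpus in the list above corresponds to sockets * cores per socket.
print(view.machine_os, view.machine_sockets * view.machine_cores_per_socket,
      view.machine_virt_memory_mb)  # prints: LINUX 8 20000
```

Keeping the memory relation as an explicit addition makes it easy to revisit if maxswp turns out not to be "virtual minus physical" on some platforms.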
> This mapping is still incomplete, but at first glance our interface seems to
> fit. How to map the machine load information is still unclear to me. LSF seems to
> support a lot of mainframe / Unix architectures that are missing in our current
> list. The amount of tmp space available might be an interesting addition.
>
> Best,
> Peter.
>
> P.S.: In case nobody finds time to contribute, I will skip tomorrow's phone
> call. We need to do the offline work first.
>
>
>> --- snip
>>
>> Some rationales:
>>
>> The list of operating systems is a reduced version of the DMTF list I
>> sent earlier, and currently only considers the supported OS types in
>> Condor. The list of CPU architectures is a combination of the supported
>> identifiers in Condor + Debian.
>>
>> It is assumed that each OS identifier only makes sense with an OS
>> version number string, which is not standardized by DRMAA. It is
>> tempting to derive this version number string from "uname -r" by
>> default. However, this might be too much information for a DRMAA
>> application: you would get the Darwin kernel version on MacOS, or the
>> specific minor build revision of a Linux kernel. I think that such
>> information is not really useful for job submission decisions. Instead,
>> I favour interpreting it as the true "operating system version",
>> something that does not change when you do software updates on the
>> machine. Some examples:
>>
>> Snow Leopard: "MACOS" + "10.6"
>> Windows 7: "WINNT" + "6.1"
>> Ubuntu Jaunty Jackalope: "LINUX" + "2.6"
>> Solaris 10: "SUNOS" + "5.10"
>>
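To illustrate the "true operating system version" reading, here is a minimal sketch of reducing a "uname -r" release string to the coarse versions listed above. The helper name os_version_from_uname and its per-OS rules are assumptions, not part of the proposal. The Linux rule keeps only the kernel's major.minor. The MacOS rule assumes that for Mac OS X 10.x the marketing minor version equals the Darwin major version minus 4. Windows has no uname and is not covered.

```python
import re


def os_version_from_uname(os_id: str, release: str) -> str:
    """Reduce a `uname -r` release string to a coarse OS version.

    Hypothetical helper illustrating the "true operating system version"
    reading; the per-OS rules are assumptions, not part of the proposal.
    """
    if os_id == "LINUX":
        # Keep only the kernel's major.minor, e.g. "2.6.28-11-generic" -> "2.6".
        match = re.match(r"(\d+)\.(\d+)", release)
        return f"{match.group(1)}.{match.group(2)}" if match else release
    if os_id == "MACOS":
        # uname -r reports the Darwin kernel, e.g. "10.0.0" on Snow Leopard.
        # Assumption: for Mac OS X 10.x, the OS minor version is Darwin major - 4.
        darwin_major = int(release.split(".")[0])
        return f"10.{darwin_major - 4}"
    # SunOS already reports e.g. "5.10"; other systems are passed through.
    return release


for os_id, release in [("LINUX", "2.6.28-11-generic"), ("MACOS", "10.0.0"),
                       ("SUNOS", "5.10")]:
    print(os_id, os_version_from_uname(os_id, release))
```

The point of the sketch is that the normalization is OS-specific. A single generic rule applied to "uname -r" would not yield values like "10.6" for Snow Leopard.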
>> Things I am not sure about:
>>
>> - Do we need to distinguish the different BSD derivatives?
>> - Do we really need support for non-NT Windows and OSF/1?
>> - Is SCO OpenServer something different from SCO UnixWare? Is this a
>> relevant separation?
>> - Do we need to add mainframe operating systems?
>> - Do we need a finer-grained distinction between different SPARC
>> processors?
>> - What is missing for LSF / PBS / Globus / ...?
>>
>> Best,
>> Peter.
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> --
>>   drmaa-wg mailing list
>>   drmaa-wg at ogf.org
>>   http://www.ogf.org/mailman/listinfo/drmaa-wg
>


Regards,
-- 
Mariusz

