[DRMAA-WG] Call for participation: DRMAA2 machine monitoring

Peter Tröger peter at troeger.eu
Tue Nov 10 08:07:57 CST 2009


To trigger some discussion, here is my first quick evaluation of LSF, based on 
the following documentation:

http://www.zdv.uni-mainz.de/cms-extern/lsf/lsf6.0/pdf/manuals/lsf_ref_6.0.pdf
http://www.cisl.ucar.edu/docs/LSF/7.0.3/command_reference/lshosts.cmdref.html
http://ams.cern.ch/AMS/7/admin/troubleshooting.html
http://www.slac.stanford.edu/comp/unix/package/lsf/LSF6.1_doc/html/lsf6.1_admin/manage_hosts.html

> enum OperatingSystem {HPUX, LINUX, TRU64, DUNIX, OSF1, MACOS, SUNOS, 
> WIN, WINNT, AIX, UNIXWARE, BSD, OTHER}
> 
> enum CpuArchitecture {ALPHA, PA-RISC, X86, X64, IA-64, MIPS, PPC, PPC64, 
>  SPARC, SPARC64, OTHER}
> 
> interface MonitoringSessions{ 
> 
> readonly attribute string[] drmVersionString;
> readonly attribute string[] drmMachineNames;
> int machineSockets(in string machineName);
> int machineCoresPerSocket(in string machineName);
> int machineLoad(in string machineName, in long coreNumber);
> int machinePhysMemory(in string machineName); 
> int machineVirtMemory(in string machineName);
> OperatingSystem machineOS(in string machineName);
> string machineOSVersion(in string machineName);
> CpuArchitecture machineArch(in string machineName);
> 
> };

LSF provides the "lshosts" command, which shows (among other things) the 
following machine information:

- host name (== machineName)
- type, e.g. CRAYJ, SUNSOL, ALPHA, RS6K, SGI6, HPPA, LINUX86 (== machineOS ???)
- model, e.g. Ultra2, SunSparc, DEC3000, IBM350, R10K, HP715, Intel_IA64, 
Ultra5S, PowerPC_G4, HP300  (== machineArch)
- cpuf, the relative CPU performance factor (== ???)
- ncpus (== machineSockets * machineCoresPerSocket)
- nprocs (== machineSockets)
- ncores (== machineCoresPerSocket)
- maxmem (== machinePhysMemory)
- maxswp (== machineVirtMemory - machinePhysMemory)
- ndisks, the number of local disks (== ???)
- maxtmp, the maximum available temporary space (== ???)
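To make the proposed field mapping concrete, here is a minimal Python sketch. It assumes the lshosts fields have already been parsed into numbers (memory in MB); the names LshostsRecord, MachineInfo and to_machine_info are hypothetical and belong neither to LSF nor to DRMAA. type and model are passed through as raw strings, since that part of the mapping is still open.

```python
from dataclasses import dataclass


@dataclass
class LshostsRecord:
    """One already-parsed lshosts row (hypothetical structure, memory in MB)."""
    host_name: str
    host_type: str   # lshosts "type", e.g. "LINUX86"
    model: str       # lshosts "model", e.g. "Intel_IA64"
    cpuf: float      # relative CPU factor; no DRMAA counterpart yet
    ncpus: int
    nprocs: int
    ncores: int
    maxmem_mb: int
    maxswp_mb: int


@dataclass
class MachineInfo:
    """Values the proposed MonitoringSessions calls would return for one host."""
    machine_name: str      # machineName
    sockets: int           # machineSockets
    cores_per_socket: int  # machineCoresPerSocket
    phys_memory_mb: int    # machinePhysMemory
    virt_memory_mb: int    # machineVirtMemory
    lsf_type: str          # raw candidate for machineOS (mapping unclear)
    lsf_model: str         # raw candidate for machineArch


def to_machine_info(rec: LshostsRecord) -> MachineInfo:
    # The mapping assumes ncpus == nprocs * ncores; reject hosts where this
    # does not hold (e.g. if ncpus also counted hardware threads).
    if rec.ncpus != rec.nprocs * rec.ncores:
        raise ValueError(
            f"{rec.host_name}: ncpus={rec.ncpus} != nprocs*ncores="
            f"{rec.nprocs * rec.ncores}")
    return MachineInfo(
        machine_name=rec.host_name,
        sockets=rec.nprocs,
        cores_per_socket=rec.ncores,
        phys_memory_mb=rec.maxmem_mb,
        # maxswp == machineVirtMemory - machinePhysMemory, so add it back
        virt_memory_mb=rec.maxmem_mb + rec.maxswp_mb,
        lsf_type=rec.host_type,
        lsf_model=rec.model,
    )
```

For example, a host with 2 sockets, 2 cores per socket, 8192 MB maxmem and 4096 MB maxswp would report 12288 MB of virtual memory. Raising on an ncpus mismatch is deliberate: it makes the sockets * cores assumption explicit instead of silently reporting wrong numbers.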

This mapping is still incomplete, but at first glance our interface seems to 
fit. How to interpret the machine load information is still unclear to me. LSF 
seems to support many mainframe / Unix architectures that are missing from our 
current list. The amount of available tmp space might be an interesting 
addition.
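One way to see which LSF values fall outside our lists is a lookup table with an OTHER fallback. The sketch below is only an illustration: the table entries are my own guesses for the example type/model values above, and as far as I know these names are site-configurable in LSF, so a fixed table is an assumption. machine_os and machine_arch are hypothetical helper names.

```python
from enum import Enum

# Subset of the proposed enums needed here; hyphenated IDL names such as
# PA-RISC and IA-64 are written with underscores to be valid identifiers.
OperatingSystem = Enum("OperatingSystem", "HPUX LINUX DUNIX SUNOS AIX OTHER")
CpuArchitecture = Enum(
    "CpuArchitecture", "ALPHA PA_RISC IA_64 MIPS PPC SPARC SPARC64 OTHER")

# Hypothetical table for the example LSF "type" values; SGI6 (IRIX) and
# CRAYJ (UNICOS) are left out because the proposed list has no match.
LSF_TYPE_TO_OS = {
    "LINUX86": OperatingSystem.LINUX,
    "SUNSOL": OperatingSystem.SUNOS,
    "HPPA": OperatingSystem.HPUX,
    "RS6K": OperatingSystem.AIX,
    "ALPHA": OperatingSystem.DUNIX,
}

# Hypothetical table for the example "model" values; HP300 and IBM350 are
# left unmapped on purpose.
LSF_MODEL_TO_ARCH = {
    "Intel_IA64": CpuArchitecture.IA_64,
    "PowerPC_G4": CpuArchitecture.PPC,
    "R10K": CpuArchitecture.MIPS,
    "HP715": CpuArchitecture.PA_RISC,
    "DEC3000": CpuArchitecture.ALPHA,
    "SunSparc": CpuArchitecture.SPARC,
    # UltraSPARC is a 64-bit SPARC V9 design; SPARC vs. SPARC64 is open
    "Ultra2": CpuArchitecture.SPARC64,
    "Ultra5S": CpuArchitecture.SPARC64,
}


def machine_os(lsf_type: str) -> OperatingSystem:
    # Anything not in the table falls back to OTHER
    return LSF_TYPE_TO_OS.get(lsf_type, OperatingSystem.OTHER)


def machine_arch(lsf_model: str) -> CpuArchitecture:
    # Anything not in the table falls back to OTHER
    return LSF_MODEL_TO_ARCH.get(lsf_model, CpuArchitecture.OTHER)
```

Every value that ends up as OTHER here (SGI6, CRAYJ, HP300, IBM350) is a candidate for extending the enums.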

Best,
Peter.

P.S.: If nobody finds time to contribute, I will skip tomorrow's phone 
call. We need to do the offline work first.


> --- snip
> 
> Some rationales:
> 
> The list of operating systems is a reduced version of the DMTF list I 
> sent earlier, and currently covers only the OS types supported by 
> Condor. The list of CPU architectures combines the identifiers 
> supported by Condor and Debian.
> 
> It is assumed that each OS identifier only makes sense together with an 
> OS version number string, which is not standardized by DRMAA. It is 
> tempting to derive this version string from "uname -r" by default. 
> However, this might be too much information for a DRMAA application: 
> you would get the Darwin kernel version on Mac OS, or the specific 
> minor build revision of a Linux kernel. I think that such information 
> is not really useful for job submission decisions. Instead, I favour 
> interpreting it as the true "operating system version", something that 
> does not change when you apply software updates on the machine. Some 
> examples:
> 
> Snow Leopard: "MACOS" + "10.6"
> Windows 7: "WINNT" + "6.1"
> Ubuntu Jaunty Jackalope: "LINUX" + "2.6"
> Solaris 10: "SUNOS" + "5.10"
> 
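A minimal sketch of how the coarse version string from the examples above could still be derived from "uname -r" output. The helper name os_version is hypothetical, only LINUX, MACOS and SUNOS are handled, and the Darwin-to-Mac-OS-X rule is an assumption that only holds for Mac OS X 10.2 through 10.15.

```python
import re


def os_version(os_id: str, uname_release: str) -> str:
    """Hypothetical helper: reduce `uname -r` output to a coarse OS version."""
    if os_id == "LINUX":
        # Keep only kernel major.minor, e.g. "2.6.28-11-generic" -> "2.6"
        m = re.match(r"(\d+)\.(\d+)", uname_release)
        if m is None:
            raise ValueError(f"unexpected Linux release: {uname_release!r}")
        return f"{m.group(1)}.{m.group(2)}"
    if os_id == "MACOS":
        # uname -r reports the Darwin version; for Mac OS X 10.2 to 10.15,
        # Darwin N corresponds to 10.(N-4), e.g. Darwin 10.x -> "10.6"
        darwin_major = int(uname_release.split(".")[0])
        return f"10.{darwin_major - 4}"
    if os_id == "SUNOS":
        # Solaris already reports the SunOS release, e.g. "5.10" for Solaris 10
        return uname_release
    # Other systems (e.g. WINNT, whose "6.1" comes from the Windows API
    # rather than from uname) are not covered by this sketch
    raise ValueError(f"no rule for {os_id}")
```

The Mac OS case shows why a plain "uname -r" passthrough is not enough: the raw Darwin number has to be translated before it matches the version users know.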
> Things I am not sure about:
> 
> - Do we need to distinguish between the different BSD derivatives?
> - Do we really need support for non-NT Windows and OSF/1?
> - Is SCO OpenServer something different from SCO UnixWare? Is this a 
> relevant distinction?
> - Do we need to add mainframe operating systems?
> - Do we need a more fine-grained distinction between different SPARC 
> processors?
> - What is missing for LSF / PBS / Globus / ...?
> 
> Best,
> Peter.
>  
> 
> 
> ------------------------------------------------------------------------
> 
> --
>   drmaa-wg mailing list
>   drmaa-wg at ogf.org
>   http://www.ogf.org/mailman/listinfo/drmaa-wg

