[gin-info] Possible list of subset data for info interop

JP Navarro navarro at mcs.anl.gov
Wed Mar 1 10:40:28 CST 2006


Thomas,

Our (TeraGrid) definition of a subcluster is a homogeneous set of
resources, so moving certain h/w characteristics to the subcluster
level will enable users to select subclusters based on their
characteristics without having to sift through the properties of
each subcluster node.
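
As a rough illustration (hypothetical attribute names, nothing tied
to GLUE or any particular schema), selecting subclusters by their
advertised characteristics could look like this:

    # Hypothetical, schema-neutral subcluster records as a resource
    # might advertise them.
    subclusters = [
        {"name": "ia64-compute", "processor_type": "Itanium2",
         "total_memory_gb": 4, "smp_size": 2, "total_nodes": 128},
        {"name": "ia32-viz", "processor_type": "Xeon",
         "total_memory_gb": 2, "smp_size": 2, "total_nodes": 16},
    ]

    # Pick subclusters by their homogeneous characteristics instead of
    # inspecting every node individually.
    matches = [s for s in subclusters
               if s["processor_type"] == "Itanium2"
               and s["total_memory_gb"] >= 4]
    for s in matches:
        print(s["name"], s["total_nodes"], "nodes")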

Users can submit to a subcluster using queues and/or node properties.

JP

On Mar 1, 2006, at 10:00 AM, Jennifer M. Schopf wrote:

> Thomas-
>
>    I specifically put that list out as a set of attributes, not as
> anything bound to GLUE or CIM, so we could pick the information to
> be communicated in a schema-neutral way. The pieces should be looked
> at in that way, not as being categorized or grouped. If you'd like
> to pull pieces out or group them differently than I have below,
> that's fine.
>
> The total counts were something TG needed, and something many users
> request - and in fact we come up with them in MDS by adding up other
> attributes, not by having them reported natively. I'm trying to
> separate the data we need from the implementation of how it's
> obtained.
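>
> A minimal sketch of that kind of aggregation, using made-up
> per-subcluster records rather than real MDS output:
>
>     # Hypothetical per-subcluster records; site-wide totals are
>     # derived by summing attributes like these rather than being
>     # reported natively.
>     subclusters = [
>         {"total_nodes": 128, "free_cpus": 16},
>         {"total_nodes": 64, "free_cpus": 40},
>     ]
>     total_nodes = sum(s["total_nodes"] for s in subclusters)
>     free_cpus = sum(s["free_cpus"] for s in subclusters)
>     print(total_nodes, free_cpus)  # 192 56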
>
> We grouped things into subcluster data because that's how people
> are using the data - no one wants to wade through information on
> 1,000 individual nodes. That's why those attributes are there for
> our work with TG. This group may instead decide to have only node
> data, which is a discussion we should have. I would argue quite
> strongly that if we want this data to be useful, listing subcluster
> attributes will help, but perhaps that's a second stage.
>
> The unique ids really ARE unique in the data we collect from TG -  
> we preface local names with a resource-specific "unique-fying"  
> attribute. We need a way to differentiate queue names that are the  
> same but at different sites, for example. I would argue quite  
> strongly we need unique identifiers associated with all the  
> components.
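>
> A minimal sketch of that prefixing, with hypothetical site and queue
> names:
>
>     # Hypothetical: build a globally unique queue id by prefacing the
>     # local queue name with a resource-specific identifier.
>     def unique_queue_id(resource_id: str, queue_name: str) -> str:
>         return f"{resource_id}:{queue_name}"
>
>     # Two sites can both have a queue called "batch" and still be
>     # distinguished.
>     print(unique_queue_id("siteA.example.org", "batch"))
>     print(unique_queue_id("siteB.example.org", "batch"))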
>
>  -jen
>
>
>
>
> At 14:54 01/03/2006, Dr. Thomas Soddemann wrote:
>> Hi Jen,
>>
>> just a few thoughts on your suggestions:
>>
>> The GLUE schema addresses ComputingElements (CEs) rather than  
>> queues. IMHO, queues are bound to a specific LRMS. A grid resource  
>> management system like Globus GRAM or UNICORE NJS/TSI can make use  
>> of a queue but its details should not be in the queue definition  
>> (a queue does not need to know that it is used by a GRAM). In the  
>> GLUE case of CEs the definition makes sense since a CE can be seen  
>> as a stand-alone unit.
>> I do not see that the GLUE schema expresses what we may need here
>> -- a description of queues managed by local resource management
>> systems.
>>
>> - Another tiny thing: do we really need the total counts in the
>> cluster/subcluster data? A simple XPath expression would give them
>> (see the sketch after this list).
>> - SMP size should be in the host definition, shouldn't it? Same for
>> OS and memory.
>> - Storage information should be allowed for both host and cluster.
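>>
>> A minimal sketch of that, assuming lxml is available and using a
>> made-up element layout rather than the real GLUE rendering:
>>
>>     from lxml import etree
>>
>>     # Hypothetical XML layout; real documents would follow whatever
>>     # schema we agree on.
>>     doc = etree.fromstring(
>>         b"<Cluster>"
>>         b"<SubCluster><Host/><Host/><Host/></SubCluster>"
>>         b"<SubCluster><Host/></SubCluster>"
>>         b"</Cluster>")
>>
>>     # Total node and subcluster counts derived on demand instead of
>>     # being stored as separate attributes.
>>     print(doc.xpath("count(//Host)"))        # 4.0
>>     print(doc.xpath("count(//SubCluster)"))  # 2.0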
>>
>> What do you (or the GLUE people) mean by the unique ids? If the
>> hostname is its DN or its DNS entry name, it certainly is unique.
>> Does a queue really have to have a unique id? Same for the
>> clusters.
>>
>> Cheers,
>> Thomas
>>
>> P.S.: Maybe we should put these things into a UML diagram in order
>> to have a clearer view of the whole picture.
>>
>> Jennifer M. Schopf wrote:
>>> Hello-
>>>
>>> At the grid interop session at GGF I said I'd circulate a
>>> document describing the TeraGrid deployment of MDS4, which
>>> included a possible starting point for a list of attributes that
>>> we could ask resources to advertise. The document is attached
>>> (if you have any comments, or are interested in having your own
>>> MDS4 deployment, let me know!). The (possible) minimal list of
>>> attributes is:
>>>
>>> Queue data:
>>> Queue name
>>> Unique queue ID
>>> GRAM version
>>> GRAM host name
>>> GRAM port/URL
>>> LRMS type (e.g., LL)
>>> LRMS version (e.g., 3.2.1)
>>> Total CPUs (e.g., 128)
>>> Free CPUs (e.g., 16)
>>> Queue status ...
>>> Total jobs (e.g., 4)
>>> Running jobs (e.g., 4)
>>> Waiting jobs (e.g., 0)
>>> Policy: max wall clock time
>>> Policy: max CPU time
>>> Policy: max total jobs
>>> Policy: max running jobs
>>>
>>> Cluster/subcluster data:
>>> Type (cluster/subcluster)
>>> Cluster/subcluster name
>>> Cluster/subcluster unique ID
>>> Processor type
>>> Processor speed
>>> Total memory
>>> Operating system
>>> SMP size
>>> Total nodes
>>> Storage device name
>>> Storage device size
>>> Storage device available space
>>>
>>> Host data:
>>> Host Name
>>> Host Unique ID
>>> Node properties
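>>>
>>> Purely as an illustration of the shape of this data, here is one
>>> schema-neutral rendering of the queue attributes above, with
>>> hypothetical field names and the example values from the list:
>>>
>>>     # Hypothetical, schema-neutral queue record; field names are
>>>     # illustrative only, values are the examples quoted above.
>>>     queue = {
>>>         "queue_name": "batch",             # hypothetical local name
>>>         "unique_queue_id": "siteA:batch",  # resource-prefixed id
>>>         "lrms_type": "LL",
>>>         "lrms_version": "3.2.1",
>>>         "total_cpus": 128,
>>>         "free_cpus": 16,
>>>         "total_jobs": 4,
>>>         "running_jobs": 4,
>>>         "waiting_jobs": 0,
>>>         "policy": {
>>>             "max_wall_clock_time": None,  # site-specific limits
>>>             "max_cpu_time": None,
>>>             "max_total_jobs": None,
>>>             "max_running_jobs": None,
>>>         },
>>>     }
>>>     print(queue["unique_queue_id"], queue["free_cpus"], "free CPUs")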
>>>
>>> Note that about 95% of this is currently available as mapped into
>>> the GLUE schema; there are a couple of additions that aren't
>>> (total nodes and node properties, maybe one other, not sure).
>>>
>>> -jen
>>>
>>
>>
>>
>
>
> Dr. Jennifer M. Schopf
> Scientist, Distributed Systems Lab, Argonne National Laboratory
>   jms at mcs.anl.gov, http://www.mcs.anl.gov/~jms
> eInfrastructure Policy Advisor, National eScience Centre and JISC,
> The University of Edinburgh
>   jms at nesc.ac.uk, http://homepages.nesc.ac.uk/~jms
>
>




