[gin-info] Possible list of subset data for info interop

Thu Mar 2 09:18:53 CST 2006

The Glue schema has been around for about four years and has been used in production on LCG for over three years. This is about as long as the ARC schema has been used in NorduGrid. There is significant experience in this area, and the problem of describing a CE and the limitations of the current schemas are quite well understood. The problems we currently face are in other areas, such as how to describe in the information system the channels between different sites used by a file transfer service. The recent work on interoperation between LCG and ARC required translating the ARC schema to Glue. This mapping has already highlighted the important attributes that are required for minimal submission.

The attributes are used to query the information system, so it is the common use cases and queries that are important; the attributes should come out of these automatically.

In the case of the CE, the schema should abstract the details of the different systems into a common view that is useful to the user. The following is a list of queries, grouped into specific areas, which show the use case for each attribute.

1) Resource Discovery

a) Show me all the endpoints for a specific service. 

ServiceEndPoint
ServiceType

In this case ServiceType would be "Computing Service" and ServiceEndPoint would be a URL such as

gram://hostname:port/other-info
arc://hostname:port/other-info 
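
As a minimal sketch of query (a), assume the information system can be treated as a list of records carrying the two attributes above. The hostnames, ports, the "Storage Service" type with its srm:// URL, and the find_endpoints helper are made up for illustration; they are not part of Glue or ARC.

```python
# Hypothetical records; only the attribute names ServiceType and
# ServiceEndPoint come from the list above.
SERVICES = [
    {"ServiceType": "Computing Service",
     "ServiceEndPoint": "gram://ce1.example.org:2119/other-info"},
    {"ServiceType": "Computing Service",
     "ServiceEndPoint": "arc://ce2.example.org:2811/other-info"},
    {"ServiceType": "Storage Service",
     "ServiceEndPoint": "srm://se1.example.org:8443/other-info"},
]


def find_endpoints(services, service_type):
    """Return every endpoint URL published for the given service type."""
    return [s["ServiceEndPoint"] for s in services
            if s["ServiceType"] == service_type]


print(find_endpoints(SERVICES, "Computing Service"))
```

The point is that the client only names a service type and gets back URLs; whether the endpoint speaks gram or arc is visible only in the URL scheme.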

b) Show all the end points which I can use. 

ServiceAuth (or some similar name)
ServiceStatus

ServiceAuth should be the VO/groups/roles that the service supports. For example, I only want endpoints where I have the privileges of production manager for my VO. The format for describing VO/groups/roles is currently missing, and I believe that this is something the gin-security group should be looking at.
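
Query (b) could be sketched as a filter on ServiceStatus and ServiceAuth. Since the format for VO/group/role is not yet defined, the (vo, group, role) tuples below are only a placeholder, as are the "OK" and "Down" status values, the VO name and the hostnames.

```python
# Placeholder credential format: (vo, group, role).
# The real format is still to be agreed.
SERVICES = [
    {"ServiceEndPoint": "gram://ce1.example.org:2119/other-info",
     "ServiceStatus": "OK",
     "ServiceAuth": {("myvo", "/myvo", "production"),
                     ("myvo", "/myvo", "user")}},
    {"ServiceEndPoint": "arc://ce2.example.org:2811/other-info",
     "ServiceStatus": "OK",
     "ServiceAuth": {("myvo", "/myvo", "user")}},
    {"ServiceEndPoint": "gram://ce3.example.org:2119/other-info",
     "ServiceStatus": "Down",
     "ServiceAuth": {("myvo", "/myvo", "production")}},
]


def usable_endpoints(services, credential):
    """Keep endpoints that are up and support the caller's VO/group/role."""
    return [s["ServiceEndPoint"] for s in services
            if s["ServiceStatus"] == "OK" and credential in s["ServiceAuth"]]


print(usable_endpoints(SERVICES, ("myvo", "/myvo", "production")))
```

Here only ce1 is returned for the production manager: ce2 does not support that role and ce3 is not up.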

2) Service Selection

For this we must bear in mind that, in the near future, jobs might run in virtual machines on the worker node.

a) Show me all the end points where the hardware is...

Insert list of attributes here. 

This implies that a CE end point represents a list of homogeneous machines.

b) Show me the end points where the software environment is...

Insert list of attributes here.

This is difficult. We can't publish every package that is on the node, and an OS tag is not useful because different installations could install different packages. At the moment there is the concept of a runtime environment.

3) Service Optimization

a) Given the set of end points, which is the best to use, or which ones are better to use? These metrics should be based on the vo/groups/roles.

Currently we use
TotalJobSlots
RunningJobs
WaitingJobs
EstimatedTraversalTime

These might be better described as percentages; at the moment they are not.
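
To illustrate the percentage idea, here is a rough sketch that turns the job counts into percentages of TotalJobSlots and then ranks endpoints. The sample numbers, the assumption that EstimatedTraversalTime is in seconds, and the ranking rule (shortest traversal time first, then lowest occupancy) are mine and only one of several possible choices.

```python
def as_percentages(state):
    """Express running and waiting jobs as a percentage of the total job slots."""
    total = state["TotalJobSlots"]
    if total == 0:
        # No slots at all: treat the end point as fully occupied.
        return {"RunningPercent": 100.0, "WaitingPercent": 0.0}
    return {"RunningPercent": 100.0 * state["RunningJobs"] / total,
            "WaitingPercent": 100.0 * state["WaitingJobs"] / total}


def rank(endpoints):
    """Order end point names: shortest traversal time first, then least occupied."""
    def key(name):
        state = endpoints[name]
        return (state["EstimatedTraversalTime"],
                as_percentages(state)["RunningPercent"])
    return sorted(endpoints, key=key)


# Hypothetical values as seen by one vo/group/role.
ENDPOINTS = {
    "ce1": {"TotalJobSlots": 200, "RunningJobs": 150,
            "WaitingJobs": 50, "EstimatedTraversalTime": 600},
    "ce2": {"TotalJobSlots": 40, "RunningJobs": 10,
            "WaitingJobs": 0, "EstimatedTraversalTime": 0},
    "ce3": {"TotalJobSlots": 100, "RunningJobs": 90,
            "WaitingJobs": 0, "EstimatedTraversalTime": 0},
}

print(rank(ENDPOINTS))
```

With percentages, a small site with 10 of 40 slots busy compares directly against a large site with 150 of 200 busy, which the raw counts do not allow.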

In summary, a CE is an endpoint that represents a set of homogeneous machines, and it is the privileges that your roles give you which are important. Everything else should be hidden.

These attributes can be used for all services.
ServiceEndPoint
ServiceType
ServiceAuth
ServiceStatus

This is the metadata we need to know about the end point.
MemorySize
ProcessorSpeed
RuntimeEnvironment
MaxWallClock
MPISupport

For each vo/group/role supported we need to know
TotalJobSlots
RunningJobs
WaitingJobs
EstimatedTraversalTime
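
One way to picture the three groups above is as nested records. This is only a sketch: the class names, the Python types, the units in the comments and the example values are assumptions, not something Glue or ARC define.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VOState:
    """State published for one supported vo/group/role."""
    TotalJobSlots: int
    RunningJobs: int
    WaitingJobs: int
    EstimatedTraversalTime: int  # assumed to be seconds


@dataclass
class ComputingEndPoint:
    # Attributes common to all services
    ServiceEndPoint: str
    ServiceType: str
    ServiceAuth: List[str]          # vo/group/role strings, format undecided
    ServiceStatus: str
    # Metadata about the end point
    MemorySize: int                 # assumed to be MB
    ProcessorSpeed: int             # assumed to be MHz
    RuntimeEnvironment: List[str]
    MaxWallClock: int               # assumed to be minutes
    MPISupport: bool
    # Per vo/group/role state, keyed by the same strings as ServiceAuth
    State: Dict[str, VOState] = field(default_factory=dict)


ce = ComputingEndPoint(
    ServiceEndPoint="gram://ce1.example.org:2119/other-info",
    ServiceType="Computing Service",
    ServiceAuth=["myvo"],
    ServiceStatus="OK",
    MemorySize=2048,
    ProcessorSpeed=2800,
    RuntimeEnvironment=["EXAMPLE-ENV-1.0"],
    MaxWallClock=2880,
    MPISupport=False,
    State={"myvo": VOState(100, 60, 5, 300)},
)
print(ce.State["myvo"].WaitingJobs)
```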

This list is similar to the list that Jenny sent around; however, there are fewer attributes. What is missing is service monitoring, accounting, a link to the other services, and the parameters required for operations. However, I believe this is the minimal set that we need for job submission, and I know that both Glue and ARC should map to this.

Laurence