Web pages vs. Schema Translations (was Re: [gin-info] Notes from July 10 telecon)

Mon Jul 24 16:08:12 CDT 2006

I don't think the decision not to translate between schemas was ever 
(until Jen's recent mail) communicated to anyone who wasn't physically 
present at the Athens and Tokyo meetings. But more to the point, I think 
that's the wrong decision, so I'd like to open this for public 
discussion and present what I think are the two competing proposals, 
which I'll call the "web proposal" and the "translator proposal". Both 
proposals start by identifying a minimal set of information necessary 
for resource selection and end with displaying this information in a web 
page -- it's what happens in the middle that differs.

1. The web proposal is to create a web page that will display 
information from various sources by using various APIs (or scripts or 
whatever) to query the different information systems, then combine the 
results and display them in some uniform fashion (probably in an HTML 
table).

2. The translator proposal is to translate that minimal set of 
information between existing schemas, advertise at least that minimal 
set of information about all GIN resources in at least one information 
server of each type, and use one of the existing web-based interfaces to 
display it.

Each has its own advantages and disadvantages, which I will try to 
summarize here:

Schema development. I don't want to split hairs about the definition of 
a schema, but my feeling is that both plans to some extent involve 
defining a schema and mappings between the existing schemas and the new 
schema. The translator plan involves creating an intermediary schema 
that's used by the translating software (but that is not exposed to 
users) as part of the work plan. The web page plan involves creating a 
definition of an HTML table to display the collected information in a 
uniform manner and creating mappings of the existing schemas into that 
new schema.
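
To make the "intermediary schema" idea concrete, here is a rough 
sketch of what the translating software might do internally. Everything 
in it is an assumption for illustration: the schema names "schema_a" and 
"schema_b", the attribute names, and the function names are made up and 
are not taken from GLUE, NAREGI, TeraGrid, or the spreadsheet.

```python
# Hypothetical mappings from native attribute names to intermediary names.
# None of these names come from a real schema; they are placeholders.
MAPPINGS = {
    "schema_a": {"QueueName": "queue", "LRMSType": "scheduler", "TotalCPUs": "cpus"},
    "schema_b": {"queue_id": "queue", "batch_system": "scheduler", "cpu_count": "cpus"},
}


def to_common(schema, record):
    """Translate a native record into the intermediary form.

    Native attributes with no mapping are dropped; mapped attributes that
    are missing from the record are simply absent from the result.
    """
    mapping = MAPPINGS[schema]
    return {common: record[native]
            for native, common in mapping.items() if native in record}


def from_common(schema, common_record):
    """Translate an intermediary record into the named native schema."""
    mapping = MAPPINGS[schema]
    return {native: common_record[common]
            for native, common in mapping.items() if common in common_record}


def translate(src, dst, record):
    """Translate a record from schema `src` to schema `dst`."""
    return from_common(dst, to_common(src, record))
```

With N schemas this needs N mappings rather than one per pair of 
schemas, which is the main reason to have an intermediary form at all. 
The web page plan would need the `to_common` half of this in any case, 
with the HTML table's columns playing the role of the intermediary 
names.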

Schema flexibility. In the translator plan, users can view information 
about all GIN resources using whichever existing schema they prefer. In 
the web page plan, users who want to see information regarding all GIN 
resources must either use the web page schema (i.e., look at the web 
page) or do separate queries and combine data from all existing schemas.

Software development. Creating an "information provider" for one of the 
web-based display systems is probably more or less equivalent to 
creating an information provider to translate from a foreign schema to a 
native one (at least, that's the case in MDS/WebMDS). However, the 
translator plan probably involves writing more of these.

Software deployment. The web plan requires a single deployment at the 
site running the web server. The translator plan requires a deployment 
at each "edge" site.

Query language support. The translator plan allows users to run queries 
using their native query languages that will return information about 
all GIN resources (e.g., "show me all queues that use PBS"). The web 
page plan either requires that users run queries in each of the existing 
query languages and then combine the output or possibly defines and 
implements some new query language. The latter option probably involves 
a fair amount of development work, especially if it supports sorting or 
aggregation operations.
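
As a rough illustration of the "run separate queries and combine the 
output" case, the sketch below merges per-grid results and then answers 
the "all queues that use PBS" question on the combined data. The 
function names, the record fields ("queue", "scheduler"), and the idea 
that each source is wrapped in a plain Python callable are all 
assumptions made for the example, not part of any existing system.

```python
def query_all(sources):
    """Call every source and merge the rows into one list.

    `sources` maps a grid name to a zero-argument callable that returns a
    list of dicts already expressed with common field names (a stand-in
    for whatever API call or script each information system requires).
    """
    combined = []
    for grid, fetch in sources.items():
        for row in fetch():
            # Tag each row with the grid it came from.
            combined.append({"grid": grid, **row})
    return combined


def queues_using(rows, scheduler):
    """Return the rows whose scheduler matches, sorted by grid then queue."""
    matches = [r for r in rows if r.get("scheduler") == scheduler]
    return sorted(matches, key=lambda r: (r["grid"], r["queue"]))
```

Even this one filter-and-sort has to be written by hand outside the 
native query languages, and each additional kind of query (aggregation 
in particular) would need more of the same.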

I look forward to hearing people's comments.

-- Laura

On 07/11/2006 02:00 PM, Jennifer M. Schopf wrote:
> Fundamentally, yes, this is a big part of what Charlie said when this 
> work started.
>
> That's why we were NOT going to translate between things and deploy 
> new information providers -- the plan had been to leave things in their 
> native schema and simply have a web page (like WebMDS or the Pragma 
> monitoring tool) suck up the data and display it - that piece would 
> need to be built/adapted, but that would be one centralized thing, not 
> re-deploying/writing large numbers of information providers at 
> multiple sites.
>
> -j
>
>
> At 11:22 11/07/2006, Laura Pearlman wrote:
>> Jennifer M. Schopf wrote:
>>> I thought one of the fundamental aspects of GIN was that no new 
>>> software was to be created and deployed?
>>
>> None at all? I think that would severely limit what we could accomplish.
>>
>> I think we need to try our best to limit the restrictions that we put 
>> on the individual grids (e.g., by not requiring that everyone run the 
>> same monitoring system or use the same schema) and that we keep the 
>> gin-related software development as small as possible. But I think we 
>> need to balance that against the actual requirements of the project. 
>> For example, I think it's been accepted for quite some time now that 
>> we will create software to translate specific attributes from one 
>> schema into another.
>>
>> In this particular case, the issue that we have is that some of the 
>> proposed minimal attributes are not currently collected by TeraGrid. 
>> I am proposing that we look at the actual requirements and determine 
>> whether:
>>
>> * We need to collect this information everywhere, because the 
>> proposed GIN applications require it, or
>> * We need to advertise this information where available (that is, it 
>> should not be lost in the translation from one schema to another), 
>> but we don't need to collect it everywhere.
>>
>> If we have a fundamental rule against creating any new software 
>> (other than the translators we've already talked about), then I 
>> suppose the decision is made for us. But it seems to me that it would 
>> make more sense to balance the requirements against the effort 
>> required to implement them -- and that is what I'm asking for the 
>> community's help in doing.
>>
>> -- Laura
>>>
>>> -j
>>>
>>>
>>> At 07:23 11/07/2006, JP Navarro wrote:
>>>> Laura,
>>>>
>>>> See below.
>>>>
>>>> On Jul 11, 2006, at 1:35 AM, Laura Pearlman wrote:
>>>>
>>>>> Attending: Kazu, Yuji, and Laura.
>>>>>
>>>>> TeraGrid resources: after the last meeting, I was going to talk to
>>>>> Stu Martin about what TeraGrid resources are available for GIN;
>>>>> however, Stu is on vacation. I'll see whether anyone on tomorrow's
>>>>> wheels call knows the answer.
>>>>
>>>> Stu and I have been working together to perform GIN related
>>>> activities on the UC/ANL TeraGrid cluster. Let me know if
>>>> you'd like to implement something while Stu is on vacation.
>>>>
>>>>> Schema mapping: the spreadsheet that Kazu sent around looks pretty
>>>>> clear, but there are some issues using it for TeraGrid. TeraGrid
>>>>> is using (slightly modified versions of) standard Globus
>>>>> information providers, which report information in GLUE 1.1 schema,
>>>>> not GLUE 1.2. This means that a couple of the elements that appear
>>>>> in the spreadsheet (AuthVO and Software) are not advertised through
>>>>> TeraGrid's information systems. We have, I think, two options for
>>>>> dealing with this, depending on what our requirements are:
>>>>>
>>>>> 1. We could create extensions to the GLUE 1.1 schema to hold this
>>>>> information (the structure of these extensions would be the same as
>>>>> the corresponding elements in GLUE 1.2) and modify the TeraGrid
>>>>> information providers to provide this information.
>>>>
>>>> The TeraGrid schema was extended to meet its own requirement.
>>>> Extending it further in support of our GIN activities is also
>>>> good. As long as GIN extensions don't break the TeraGrid's
>>>> schema we should implement them on the TeraGrid. Also, it
>>>> would be good to present the other extensions the TeraGrid is
>>>> planning on to the GIN community to determine if it would make
>>>> sense to add them to the GIN schema.
>>>>
>>>> JP
>>>>
>>>>> 2. We could create schema extensions as above, but provide this
>>>>> information only for non-TeraGrid resources (that is, anyone
>>>>> looking at any information system, including TeraGrid's, would see
>>>>> AuthVO and Software information for NAREGI and EGEE resources but
>>>>> not for TeraGrid resources).
>>>>>
>>>>> It would be good to nail down the requirements and choose between
>>>>> these courses of action fairly soon.
>>>>>
>>>>> -- Laura
>>>
>>> ------------------------------------------------------------------------------------------------ 
>>>
>>> Dr. Jennifer M. Schopf
>>> Scientist eInfrastructure Policy Advisor
>>> Distributed Systems Lab National eScience Centre and JISC
>>> Argonne National Laboratory The University of Edinburgh
>>> jms at mcs.anl.gov jms at nesc.ac.uk
>>> http://www.mcs.anl.gov/~jms http://homepages.nesc.ac.uk/~jms
>>