[RUS-WG] [UR-WG] OGF20 Session recommendations

Xiaoyu Chen Xiaoyu.Chen at brunel.ac.uk
Fri May 4 07:45:42 CDT 2007


Hello, Rosario and everyone:
 
Here is one more usage scenario, from OSG Gratia, showing another way a RUS-like usage reporting system can be implemented with performance in mind.
   
Usage information is metered when jobs complete on the worker nodes and is stored in the local file system of each worker node. A centralised collector Web service accepts these job usage records and likewise stores them in its file system. On the collector machine, two configuration files (user accounts and CPU information) are initialised to provide the user-to-usage and resource-to-usage mappings needed to generate complete OGF-UR records. However, there is currently no RUS implementation for usage reporting. Instead, a separate publisher component periodically parses the OGF-UR usage repository and stores aggregate usage information in a relational database, to provide aggregate views per VO and per user.
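
As a rough illustration of what such a publisher step could look like, here is a minimal Python sketch. It is not Gratia's code: the record layout (VO, User, CpuSeconds), the table name usage_summary and the sample values are all assumptions made for the example, and real OGF-UR documents are namespaced and encode durations as ISO 8601 strings.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified usage records (not the real OGF-UR layout).
RECORDS = [
    "<UsageRecord><VO>cms</VO><User>alice</User><CpuSeconds>120</CpuSeconds></UsageRecord>",
    "<UsageRecord><VO>cms</VO><User>alice</User><CpuSeconds>30</CpuSeconds></UsageRecord>",
    "<UsageRecord><VO>atlas</VO><User>bob</User><CpuSeconds>45</CpuSeconds></UsageRecord>",
]


def publish(records, conn):
    """Aggregate job counts and CPU seconds per (VO, user) into a summary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage_summary ("
        "vo TEXT, username TEXT, jobs INTEGER, cpu_seconds INTEGER, "
        "PRIMARY KEY (vo, username))"
    )
    for xml_text in records:
        rec = ET.fromstring(xml_text)
        vo = rec.findtext("VO")
        username = rec.findtext("User")
        cpu = int(rec.findtext("CpuSeconds"))
        # Add to the existing aggregate row, or create it on first sight.
        cur = conn.execute(
            "UPDATE usage_summary SET jobs = jobs + 1, "
            "cpu_seconds = cpu_seconds + ? WHERE vo = ? AND username = ?",
            (cpu, vo, username),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO usage_summary VALUES (?, ?, 1, ?)",
                (vo, username, cpu),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
publish(RECORDS, conn)
summary = conn.execute(
    "SELECT vo, username, jobs, cpu_seconds FROM usage_summary "
    "ORDER BY vo, username"
).fetchall()
```

The point is only that the aggregate views are served from the relational table, while the individual records stay in the file-based repository.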
 
The situation is the same in LCG: usage information is metered and collected from sites across three production grids. It is stored at the sites in an LCG-specific, SQL-based schema. These site-level usage tables are shared with the GOC via R-GMA. The central relational database is maintained at the GOC and is reported through a PHP web portal.
 
Evidently there is no RUS implementation in existing production grids for usage monitoring and tracing, even though RUS is specified to provide this functionality through its extraction service interface definitions. There is a clear gap between real accounting systems and RUS, so shouldn't the specification designers take this into account? I agree that a service interface definition based on XML documents does provide a certain flexibility and standardisation. I am also interested in how DGAS will implement RUS as a reporting system over the usage records stored in the HLR. As far as I know, DGAS currently shares the same usage schema as LCG/APEL. Does it have a mechanism for mapping between SQL and XML?
 
      cheers!
      
      xiaoyu

________________________________

From: Rosario Michael Piro [mailto:piro at to.infn.it]
Sent: Fri 04/05/2007 12:01
To: Xiaoyu Chen
Cc: Mailing List for RUS-WG
Subject: Re: [RUS-WG] [UR-WG] OGF20 Session recommendations



Hi Xiaoyu,

(I have added the mailing list in CC; I suppose you forgot it in your reply,
but the topic is surely interesting for the other members as well.)

Xiaoyu Chen wrote:
> hello, Rosario:
> 
>          thanks for your comments! Just a small argument on:
> ------------------------------------------------------
>         4). Batch Query
>>                 Most applications have moved from XML databases to relational DBs for UR storage because of performance.
>
> Actually nearly all implementations I know of so far use XML databases,
> not relational DBs.
> -------------------------------------------------------
> 
> MCS-RUS, developed by Manchester for gMarket, initially used Xindice and has now moved to Oracle because of unbearably low performance when processing huge amounts of usage data. NGS in the UK and LCG at RAL both store usage records in relational databases. In a production grid, people care more about performance than about standards conformance. So here is the question: to what extent should a specification concern itself with implementations, given that the specification is designed for implementations?
> 

Yes, I bet the performance is horrible when using native XML databases,
but that's an implementation problem, not a problem of the interface.
(And by the way, relational databases can be quite slow as well if the
queries involve many join operations on large amounts of data.)

DGAS as well stores its legacy records in a relational database, so I'm
perfectly aware that performance is a major concern of production
environments, but the most important thing that we have to keep in mind
when defining the RUS specifications is _standardization_ and since we
treat XML documents we can achieve standardization only by applying
standard XML procedures. If, for example, we allowed query
statements to be written in SQL instead of XPath, then we would give up
all standardization efforts, because SQL queries that work on your
database will not work on mine and vice versa. That means if I
wanted to implement a client that is capable of querying your RUS server
with SQL statements, then I would have to keep in mind exactly how your
database is organized (table names, column names, joins between tables,
etc.) and that will then work with your RUS server, but not with others
that organize their data in different ways. The result would be exactly
the contrary of what we want to achieve: standardization and
interoperability. We would end up with clients that normally can talk
only with their own servers, but not with those of other implementations.
That's why I see no easy way around using standard XML procedures (like
XPath) when working with XML documents. But again: your implementation
can still use relational databases, it just has to take care of
converting received XPath statements into SQL statements to retrieve the
data (which is definitely not an easy task).
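
To make that conversion step concrete, here is a minimal Python sketch of a translator for a deliberately tiny XPath subset: a single /urf:UsageRecord step with equality predicates joined by "and". The COLUMN_MAP table and the function name xpath_to_sql are hypothetical, and the BaseUR columns are only an assumed layout; anything outside the subset is rejected rather than guessed at, which is exactly where the real difficulty lies.

```python
import re

# Hypothetical mapping from OGF-UR property names to backend columns.
COLUMN_MAP = {
    "urf:ProjectName": "BaseUR.ProjectName",
    "urf:MachineName": "BaseUR.MachineName",
}

# One predicate term of the form  name='value'
PREDICATE = re.compile(r"\s*([\w:]+)\s*=\s*'([^']*)'\s*")


def xpath_to_sql(xpath):
    """Translate /urf:UsageRecord[a='x' and b='y'] into (sql, params).

    Only equality tests on mapped properties are supported. Values that
    themselves contain ' and ' are not handled by this naive split.
    """
    match = re.fullmatch(r"/urf:UsageRecord\[(.*)\]", xpath.strip())
    if match is None:
        raise ValueError("unsupported XPath: " + xpath)
    clauses, params = [], []
    for term in re.split(r"\s+and\s+", match.group(1)):
        pred = PREDICATE.fullmatch(term)
        if pred is None or pred.group(1) not in COLUMN_MAP:
            raise ValueError("unsupported predicate: " + term)
        # Values go into bound parameters, never into the SQL text itself.
        clauses.append(COLUMN_MAP[pred.group(1)] + " = ?")
        params.append(pred.group(2))
    return "SELECT * FROM BaseUR WHERE " + " AND ".join(clauses), params


sql, params = xpath_to_sql(
    "/urf:UsageRecord[urf:ProjectName='myproject' and urf:MachineName='grid.infn.it']"
)
```

Covering functions, axes, nested predicates and attribute tests would need a real XPath parser, which is why the general case is so much harder than this.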

Is there any way to support SQL without compromising the standardization
efforts? We cannot dictate to people how they have to organize their
relational database such that we would have uniform SQL statements for
all implementations (and we cannot force them to do so if they want to
use an XML database), so defining a mandatory table/column structure is
not an option. And even if we find a way of supporting SQL we will force
_all_ developers to support that _even_ if they use XML databases (which
means they have to convert SQL statements into XPath, which is not easier
I guess). A possible approach might be: allow a subset of SQL statements
that do not include the table names (and joins) and use only the
properties defined in the OGF-UR specification, something like

SELECT * WHERE urf:ProjectName="myproject" AND urf:MachineName LIKE
"%infn.it";

How that is then mapped to the underlying database depends on the
implementation. It might be translated into:

SELECT * FROM BaseUR,ResourceExtensions WHERE
BaseUR.id=ResourceExtensions.id AND BaseUR.ProjectName="myproject" AND
BaseUR.MachineName LIKE "%infn.it";

But the result would still have to be converted into an XML document
(OGF-UR) in order to make sure the client understands it (that's
essential!), whatever the database structure may be.
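
A minimal Python sketch of that round trip, under stated assumptions: the BaseUR/ResourceExtensions layout is the one from the example above, the sample rows and the helper names (to_backend_sql, rows_to_xml) are made up, the namespace URI is assumed to be the OGF-UR one, and the string literals use single quotes because SQLite treats double quotes as identifier quoting. The prefix rewrite is naive and would also touch "urf:" inside a string literal.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

URF_NS = "http://schema.ogf.org/urf/2003/09/urf"  # assumed OGF-UR namespace

# Implementation-specific part that the client never sees.
FROM_CLAUSE = (
    "FROM BaseUR, ResourceExtensions "
    "WHERE BaseUR.id = ResourceExtensions.id AND "
)


def to_backend_sql(query):
    """Rewrite a table-less 'SELECT * WHERE ...' query for this backend."""
    prefix = "SELECT * WHERE "
    if not query.startswith(prefix):
        raise ValueError("unsupported query: " + query)
    # Map every urf:Property onto the (assumed) BaseUR column of the same name.
    condition = re.sub(r"\burf:(\w+)", r"BaseUR.\1", query[len(prefix):])
    return "SELECT BaseUR.* " + FROM_CLAUSE + condition


def rows_to_xml(cursor):
    """Serialise result rows as namespaced usage record elements."""
    ET.register_namespace("urf", URF_NS)
    root = ET.Element("{%s}UsageRecords" % URF_NS)
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        rec = ET.SubElement(root, "{%s}UsageRecord" % URF_NS)
        for name, value in zip(columns, row):
            if name != "id" and value is not None:  # skip the internal key
                ET.SubElement(rec, "{%s}%s" % (URF_NS, name)).text = str(value)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BaseUR (id INTEGER PRIMARY KEY, ProjectName TEXT, MachineName TEXT);
CREATE TABLE ResourceExtensions (id INTEGER PRIMARY KEY, VOName TEXT);
INSERT INTO BaseUR VALUES (1, 'myproject', 'ce01.to.infn.it');
INSERT INTO BaseUR VALUES (2, 'other', 'ce.example.org');
INSERT INTO ResourceExtensions VALUES (1, 'infn');
INSERT INTO ResourceExtensions VALUES (2, 'infn');
""")

query = "SELECT * WHERE urf:ProjectName='myproject' AND urf:MachineName LIKE '%infn.it'"
sql = to_backend_sql(query)
xml_result = rows_to_xml(conn.execute(sql))
```

A different implementation would swap in its own FROM_CLAUSE and column mapping, while the query the client sends and the XML it gets back stay the same.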

But how would you say in a "standardized" SQL that you want only records
with <Resource description="VOName">infn</Resource>?

SELECT * WHERE urf:Resource_description="VOName" AND urf:Resource="infn"; ?
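
For comparison, the same filter is natural on the XML side; in XPath 1.0 it could be written as //UsageRecord[Resource[@description='VOName']='infn']. Here is a minimal Python sketch using the limited XPath support of the standard library. The sample document, the RecordId element and the function name records_for_vo are assumptions for illustration, and namespaces are left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; real OGF-UR documents are namespaced.
DOC = """<UsageRecords>
  <UsageRecord><RecordId>r1</RecordId><Resource description="VOName">infn</Resource></UsageRecord>
  <UsageRecord><RecordId>r2</RecordId><Resource description="VOName">cms</Resource></UsageRecord>
  <UsageRecord><RecordId>r3</RecordId><Resource description="Queue">infn</Resource></UsageRecord>
</UsageRecords>"""


def records_for_vo(root, vo):
    """Return the ids of records carrying <Resource description="VOName">vo</Resource>."""
    matches = []
    for record in root.findall("UsageRecord"):
        # The attribute test is done by the path expression,
        # the text comparison in plain Python.
        for res in record.findall("Resource[@description='VOName']"):
            if res.text == vo:
                matches.append(record.findtext("RecordId"))
                break
    return matches


ids = records_for_vo(ET.fromstring(DOC), "infn")
```

Record r3 is skipped because its Resource element describes a queue, not a VO, which is exactly the distinction the flattened SQL form struggles to express.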

I think supporting some kind of "standardized RUS-SQL" is out of scope,
it would mean we first have to define such a RUS-SQL (and make sure it
fits all possible use cases) and then force developers to support it
even if they don't want to (why should somebody using an XML database
with UR XML documents be forced to accept SQL statements as queries?). The
point is we are working with XML documents so the extra implementation
effort should be made by those who do not want to work with XML in their
underlying implementation, not by those who want to use native XML stuff.

Does anyone else have a comment/opinion to share?

Cheers,

Rosario.



