[RUS-WG] [UR-WG] OGF20 Session recommendations

Rosario Michael Piro piro at to.infn.it
Fri May 4 06:01:51 CDT 2007


Hi Xiaoyu,

(I've added the mailing list in CC; I suppose you forgot it in your reply, 
but the topic is surely interesting for the other members as well).

Xiaoyu Chen wrote:
> hello, Rosario:
>  
>          thanks for your comments! Just a small point on:
> ------------------------------------------------------
>         4). Batch Query
>>                 Most applications have moved from XML databases to relational DBs for UR storage because of performance.
> 
> Actually nearly all implementations I know of so far use XML databases,
> not relational DBs.
> -------------------------------------------------------
>  
> MCS-RUS, developed by Manchester for gMarket, initially used Xindice and has now moved to Oracle because of unbearably low performance when processing huge amounts of usage data. NGS in the UK and LCG at RAL both store usage records in relational databases. In a production grid, people care more about performance than about standards conformance. So here is the question: to what extent should a specification concern itself with implementations, given that the specification is designed for implementations?
>  

Yes, I bet the performance is horrible when using native XML databases, 
but that's an implementation problem, not a problem of the interface. 
(And by the way, relational databases can be quite slow as well if the 
queries involve many join operations on large amounts of data).

DGAS also stores its legacy records in a relational database, so I'm 
perfectly aware that performance is a major concern in production 
environments, but the most important thing to keep in mind when 
defining the RUS specifications is _standardization_, and since we 
handle XML documents we can achieve standardization only by applying 
standard XML procedures. If, for example, we allowed query statements 
to be written in SQL instead of XPath, we would give up all 
standardization efforts, because SQL queries that work on your 
database will not work on mine and vice versa. That means that if I 
wanted to implement a client capable of querying your RUS server with 
SQL statements, I would have to know exactly how your database is 
organized (table names, column names, joins between tables, etc.). The 
client would then work with your RUS server, but not with others that 
organize their data differently. The result would be exactly the 
opposite of what we want to achieve: standardization and 
interoperability. We would end up with clients that normally can talk 
only to their own servers, but not to those of other implementations.
That's why I see no easy way around using standard XML procedures (like 
XPath) when working with XML documents. But again: your implementation 
can still use a relational database; it just has to take care of 
converting received XPath statements into SQL statements to retrieve the 
data (which is definitely not an easy task).
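To give a feeling for what such a conversion involves, here is a minimal Python sketch that handles only one very restricted XPath shape (equality tests joined by "and" on a UsageRecord). The BaseUR table, the column mapping and the function name are hypothetical, chosen only for illustration; real XPath support would need a proper parser and is much harder than this.

```python
import re

# Hypothetical mapping from OGF-UR element names to columns in ONE
# particular relational layout; other implementations would differ.
COLUMN_MAP = {
    "urf:ProjectName": "BaseUR.ProjectName",
    "urf:MachineName": "BaseUR.MachineName",
}

# Only this shape is accepted:
#   /urf:UsageRecord[urf:Name='value' and urf:Other='value']
_XPATH_RE = re.compile(r"^/urf:UsageRecord\[(?P<pred>.+)\]$")
_TERM_RE = re.compile(r"^(?P<name>urf:\w+)\s*=\s*'(?P<value>[^']*)'$")


def xpath_to_sql(xpath):
    """Translate a tiny XPath subset into (sql, params) for a DB-API driver."""
    match = _XPATH_RE.match(xpath.strip())
    if match is None:
        raise ValueError("unsupported XPath: " + xpath)
    conditions, params = [], []
    # Naive split: values containing " and " are not handled by this sketch.
    for term in match.group("pred").split(" and "):
        term_match = _TERM_RE.match(term.strip())
        if term_match is None or term_match.group("name") not in COLUMN_MAP:
            raise ValueError("unsupported predicate: " + term)
        # Values go into bound parameters instead of the SQL text.
        conditions.append(COLUMN_MAP[term_match.group("name")] + " = ?")
        params.append(term_match.group("value"))
    return "SELECT * FROM BaseUR WHERE " + " AND ".join(conditions), params
```

Anything outside this narrow pattern is rejected with a ValueError, which is exactly where the real implementation effort would start.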

Is there any way to support SQL without compromising the standardization 
efforts? We cannot dictate to people how to organize their relational 
database so that we would have uniform SQL statements for all 
implementations (and we cannot force them to do so if they want to use 
an XML database), so defining a mandatory table/column structure is not 
an option. And even if we find a way of supporting SQL, we will force 
_all_ developers to support it _even_ if they use XML databases (which 
means they have to convert SQL statements into XPath, which is not any 
easier, I guess). A possible approach might be to allow a subset of SQL 
statements that do not include table names (or joins) and use only the 
properties defined in the OGF-UR specification, something like:

SELECT * WHERE urf:ProjectName="myproject" AND urf:MachineName LIKE 
"%infn.it";

How that is then mapped to the underlying database depends on the 
implementation. It might be translated into:

SELECT * FROM BaseUR,ResourceExtensions WHERE 
BaseUR.id=ResourceExtensions.id AND BaseUR.ProjectName="myproject" AND 
BaseUR.MachineName LIKE "%infn.it";
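As a rough illustration of that mapping step, the following Python sketch rewrites the table-less statement into the implementation-specific one. The property-to-column dictionary and the fixed FROM/join clause are assumptions describing one hypothetical schema, and the function name is invented for this example.

```python
import re

# Hypothetical schema: which table.column holds each OGF-UR property.
PROPERTY_COLUMNS = {
    "urf:ProjectName": "BaseUR.ProjectName",
    "urf:MachineName": "BaseUR.MachineName",
}
# Fixed FROM clause and join condition of this one hypothetical layout.
FROM_AND_JOIN = ("FROM BaseUR,ResourceExtensions "
                 "WHERE BaseUR.id=ResourceExtensions.id")


def expand_query(generic_sql):
    """Expand 'SELECT * WHERE <conditions>;' into schema-specific SQL."""
    prefix = "SELECT * WHERE "
    statement = generic_sql.strip().rstrip(";")
    if not statement.startswith(prefix):
        raise ValueError("only 'SELECT * WHERE ...' is supported")
    conditions = statement[len(prefix):]

    def replace_property(match):
        name = match.group(0)
        if name not in PROPERTY_COLUMNS:
            raise ValueError("unknown property: " + name)
        return PROPERTY_COLUMNS[name]

    # Naive textual substitution: it would also rewrite "urf:..." inside a
    # quoted value, and it does no sanitizing of the client's input.
    conditions = re.sub(r"urf:\w+", replace_property, conditions)
    return "SELECT * " + FROM_AND_JOIN + " AND " + conditions + ";"
```

A server with a different table layout would only swap the dictionary and the join clause; the client-side statement stays the same.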

But the result would still have to be converted into an XML document 
(OGF-UR) in order to make sure the client understands it (that's 
essential!), whatever the database structure may be.
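A minimal Python sketch of that conversion step, assuming each database row arrives as a dictionary whose keys happen to match UR element names (a simplification). The namespace URI is assumed to be the one used by the UR schema, and the output is not a schema-valid record since mandatory elements are left out.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the OGF usage record format.
URF_NS = "http://schema.ogf.org/urf/2003/09/urf"
ET.register_namespace("urf", URF_NS)


def row_to_usage_record(row):
    """Build a urf:UsageRecord element from one database row (a dict)."""
    record = ET.Element("{%s}UsageRecord" % URF_NS)
    for name, value in row.items():
        if value is None:
            continue  # skip NULL columns instead of emitting empty elements
        child = ET.SubElement(record, "{%s}%s" % (URF_NS, name))
        child.text = str(value)
    return record


# Hypothetical row, as it might come back from a relational query.
row = {"ProjectName": "myproject", "MachineName": "grid.infn.it"}
xml_text = ET.tostring(row_to_usage_record(row), encoding="unicode")
```

Whatever the table layout is, only this last step determines what the client actually sees.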

But how would you say in a "standardized" SQL that you want only records 
with <Resource description="VOName">infn</Resource>?

SELECT * WHERE urf:Resource_description="VOName" AND urf:Resource="infn"; ?
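For comparison, the same selection is straightforward on the XML side. The Python sketch below uses the limited XPath subset of the standard library's ElementTree; the sample document, the namespace URI and the unprefixed "description" attribute (written as in the example above) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"urf": "http://schema.ogf.org/urf/2003/09/urf"}  # assumed URI

# Made-up sample data with two records.
DOCUMENT = """
<urf:UsageRecords xmlns:urf="http://schema.ogf.org/urf/2003/09/urf">
  <urf:UsageRecord>
    <urf:ProjectName>myproject</urf:ProjectName>
    <urf:Resource description="VOName">infn</urf:Resource>
  </urf:UsageRecord>
  <urf:UsageRecord>
    <urf:ProjectName>other</urf:ProjectName>
    <urf:Resource description="VOName">cms</urf:Resource>
  </urf:UsageRecord>
</urf:UsageRecords>
"""


def records_with_vo(root, vo_name):
    """Return the UsageRecord elements whose VOName resource equals vo_name.

    In full XPath 1.0 this is roughly:
      //urf:UsageRecord[urf:Resource[@description='VOName']='infn']
    ElementTree supports only a subset, so the text test is done in Python.
    """
    matches = []
    for record in root.findall("urf:UsageRecord", NS):
        resources = record.findall("urf:Resource[@description='VOName']", NS)
        if any(res.text == vo_name for res in resources):
            matches.append(record)
    return matches


root = ET.fromstring(DOCUMENT)
selected = records_with_vo(root, "infn")
```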

I think supporting some kind of "standardized RUS-SQL" is out of scope: 
it would mean we first have to define such a RUS-SQL (and make sure it 
fits all possible use cases) and then force developers to support it 
even if they don't want to (why should somebody using an XML database 
with UR XML documents be forced to accept SQL statements as queries?). 
The point is that we are working with XML documents, so the extra 
implementation effort should be made by those who do not want to work 
with XML in their underlying implementation, not by those who want to 
use native XML stuff.

Does anyone else have a comment/opinion to share?

Cheers,

Rosario.

