[RUS-WG] [UR-WG] OGF20 Session recommendations

Gilbert Netzer noname at pdc.kth.se
Fri May 4 09:02:27 CDT 2007


Hi Rosario, hi Xiaoyu,

In the "batch" slides I actually wanted to point to another issue, which I 
will write up in a separate email, but I think this discussion is well worth 
following up on!

Just to pick up on this thread, here are my thoughts on the performance 
issue and the choice of database:

I agree with Rosario that the type, model, or make of the database (if there 
is one at all; you could do something completely different) should not make 
any difference to the interface and the standard. In reality, however, the 
interface more or less makes the choice for you. In our case, if the 
interface allows queries with XPath, it is much easier if your db also 
supports that; if it doesn't, implementors face a lot of work (e.g. writing 
an XPath-to-whatever translation). But then again, that is true for whatever 
query dialect you specify (e.g. with some home-baked SQL, anyone using an 
XML db would have a hard time).

So my take is that the specification should use a well-known, standardized 
(very important!) query language that fits the format in which the data is 
presented. In our case that format is XML (the Usage Record Format), so the 
language should be tailored for querying XML documents; otherwise it will be 
awkward at best and hard to understand at worst. I see that as a strong 
argument for XPath and XQuery.
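
To make the "fits the format" point a bit more concrete, here is a minimal 
sketch (my own illustration, not taken from any RUS implementation) of the 
VOName filter from Rosario's mail quoted below, written against the XML 
directly with Python's standard library. The namespace URI, the wrapper 
element urf:UsageRecords and the sample records are placeholders I made up 
for the example; only the element and attribute names come from the thread.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for the sketch, not the official UR namespace.
NS = {"urf": "urn:example:urf"}

# Made-up sample data; urf:UsageRecords is a hypothetical wrapper element.
DOC = """\
<urf:UsageRecords xmlns:urf="urn:example:urf">
  <urf:UsageRecord>
    <urf:ProjectName>myproject</urf:ProjectName>
    <urf:MachineName>ce01.infn.it</urf:MachineName>
    <urf:Resource description="VOName">infn</urf:Resource>
  </urf:UsageRecord>
  <urf:UsageRecord>
    <urf:ProjectName>other</urf:ProjectName>
    <urf:MachineName>node.example.org</urf:MachineName>
    <urf:Resource description="VOName">cms</urf:Resource>
  </urf:UsageRecord>
</urf:UsageRecords>
"""


def records_for_vo(root, vo):
    """Return whole usage records whose VOName resource equals `vo`."""
    hits = []
    for record in root.findall("urf:UsageRecord", NS):
        # The attribute test is expressed as an XPath predicate; the text
        # comparison is done in Python to stay within ElementTree's
        # limited XPath support.
        resources = record.findall("urf:Resource[@description='VOName']", NS)
        if any(res.text == vo for res in resources):
            hits.append(record)
    return hits


root = ET.fromstring(DOC)
matches = records_for_vo(root, "infn")
names = [rec.findtext("urf:ProjectName", namespaces=NS) for rec in matches]
print(names)
```

The attribute/value pair maps one-to-one onto the query, which is exactly 
what gets awkward once you try to say the same thing in a table-less SQL 
dialect.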

The other side is performance in general. My opinion on this is roughly the 
following: a query language may allow any user to formulate very complex 
queries that are really slow and/or expensive. You can also do that in SQL, 
as Rosario pointed out. But Xiaoyu has a point in that implementations might 
get into big trouble: potentially you could mount a denial-of-service attack 
just by asking the right questions.


So my idea on this is the following:

- A server should be able to deny a query if it deems it too complex or 
costly. That could however be very bad for interoperability since if you 
deny a query, what should the client do then?

- Therefore all servers MUST guarantee to accept a certain subset of simple 
queries (e.g. to cover the simplest use cases). This would allow clients 
to fall back to safe questions in case the server denies the more advanced 
ones.

- For the MUST-execute queries, we could also specify a very restrictive 
subset of XPath, so that you could probably get away with a very simple 
parser to find out which query you received. Translation into whatever the 
backend needs is then much simpler.

- However, we (who are writing the specification) should make an effort to 
find out which queries should go into the standard as MUST-understand, so 
that clients can actually do something useful with them.

- This could also solve the problem of what to do with XPath expressions 
that select only part of a UR -> they could be declared illegal and simply 
denied.
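
As a rough sketch of how the points above could fit together (an 
illustration under my own assumptions, not a proposal for the actual 
subset): suppose the MUST-accept subset were "whole usage records, filtered 
by equality tests on direct child elements, joined with 'and'". Then a 
couple of regular expressions are enough as a parser, everything else is 
denied, and translation to a relational backend is trivial. The exact 
query shape, the QueryDenied exception, the table name usage_records and 
the column mapping are all hypothetical.

```python
import re

# Hypothetical guaranteed subset, e.g.
#   /urf:UsageRecord[urf:ProjectName='myproject' and urf:MachineName='x']
_QUERY = re.compile(r"^/urf:UsageRecord\[(?P<preds>.+)\]$")
_PRED = re.compile(r"^urf:(?P<field>[A-Za-z]+)='(?P<value>[^'\[\]]*)'$")

# Hypothetical mapping from UR element names to columns of a flat table.
COLUMNS = {"ProjectName": "project_name", "MachineName": "machine_name"}


class QueryDenied(Exception):
    """Raised when a query falls outside the guaranteed subset."""


def parse_simple_query(xpath):
    """Return [(field, value), ...] or raise QueryDenied."""
    match = _QUERY.match(xpath.strip())
    if match is None:
        # Covers partial-record selections, functions, '//' and so on.
        raise QueryDenied("not a simple whole-record query: " + xpath)
    conditions = []
    # Naive split: a value containing ' and ' is denied rather than parsed.
    for part in match.group("preds").split(" and "):
        pred = _PRED.match(part.strip())
        if pred is None:
            raise QueryDenied("unsupported predicate: " + part)
        conditions.append((pred.group("field"), pred.group("value")))
    return conditions


def to_sql(conditions, columns):
    """Build a parameterized SELECT for the hypothetical flat table."""
    clauses, params = [], []
    for field, value in conditions:
        if field not in columns:
            raise QueryDenied("unknown field: " + field)
        clauses.append(columns[field] + " = ?")
        params.append(value)
    sql = "SELECT * FROM usage_records WHERE " + " AND ".join(clauses)
    return sql, params


query = "/urf:UsageRecord[urf:ProjectName='myproject' and urf:MachineName='ce01.infn.it']"
sql, params = to_sql(parse_simple_query(query), COLUMNS)
print(sql, params)
```

A query such as /urf:UsageRecord/urf:ProjectName, which selects only part 
of a UR, does not match the pattern and is denied. A server with an XML 
backend could skip to_sql entirely and pass the accepted query through 
unchanged.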

Just a few thoughts from my side...

Best Regards
Gilbert

Rosario Michael Piro wrote:
> Hi Xiaoyu,
> 
> (I add the mailing list in CC, I suppose you forgot it in your reply, 
> but the topic is surely interesting for the other members as well).
> 
> Xiaoyu Chen wrote:
>> hello, Rosario:
>>  
>>          thanks for your comments! Just a small argument on:
>> ------------------------------------------------------
>>         4). Batch Query
>>>                 Most applications have moved from XML databases to relational DBs for UR storage because of performance.
>> Actually nearly all implementations I know of so far use XML databases,
>> not relational DBs.
>> -------------------------------------------------------
>>  
>> MCS-RUS, developed by Manchester for gMarket, initially used Xindice and has now moved to Oracle because of unbearably low performance when processing huge amounts of usage data. NGS in the UK and LCG at RAL both store usage records in relational databases. In grid production, people care more about performance than about standard conformance. So here is the question: to what extent should a specification concern itself with implementations, given that the specification is designed for implementations?
>>  
> 
> Yes, I bet the performance is horrible when using native XML databases, 
> but that's an implementation problem, not a problem of the interface. 
> (And by the way, relational databases can be quite slow as well if the 
> queries involve many join operations on large amounts of data).
> 
> DGAS also stores its legacy records in a relational database, so I'm 
> perfectly aware that performance is a major concern in production 
> environments, but the most important thing to keep in mind when defining 
> the RUS specifications is _standardization_, and since we treat XML 
> documents we can achieve standardization only by applying standard XML 
> procedures. If, for example, we allowed query statements to be written in 
> SQL instead of XPath, we would give up all standardization efforts, 
> because SQL queries that work on your database will not work on mine and 
> vice versa. That means that if I wanted to implement a client capable of 
> querying your RUS server with SQL statements, I would have to know exactly 
> how your database is organized (table names, column names, joins between 
> tables, etc.), and the client would then work with your RUS server but not 
> with others that organize their data differently. The result would be 
> exactly the contrary of what we want to achieve: standardization and 
> interoperability. We would end up with clients that normally can talk 
> only with their own servers, but not with those of other implementations.
> That's why I see no easy way around using standard XML procedures (like 
> XPath) when working with XML documents. But again: your implementation 
> can still use relational databases; it just has to take care of 
> converting received XPath statements into SQL statements to retrieve the 
> data (which is definitely not an easy task).
> 
> Is there any way to support SQL without compromising the standardization 
> efforts? We cannot dictate to people how they have to organize their 
> relational database so that we would have uniform SQL statements for 
> all implementations (and we cannot force them to do so if they want to 
> use an XML database), so defining a mandatory table/column structure is 
> not an option. And even if we find a way of supporting SQL, we will force 
> _all_ developers to support it _even_ if they use XML databases (which 
> means they have to convert SQL statements into XPath, which is not easier, 
> I guess). A possible approach might be: allow a subset of SQL statements 
> that do not include table names (and joins) and use only the 
> properties defined in the OGF-UR specification, something like
> 
> SELECT * WHERE urf:ProjectName="myproject" AND urf:MachineName LIKE 
> "%infn.it";
> 
> How that is then mapped to the underlying database depends on the 
> implementation. It might be translated into:
> 
> SELECT * FROM BaseUR,ResourceExtensions WHERE 
> BaseUR.id=ResourceExtensions.id AND BaseUR.ProjectName="myproject" AND 
> BaseUR.MachineName LIKE "%infn.it";
> 
> But the result would still have to be converted into an XML document 
> (OGF-UR) in order to make sure the client understands it (that's 
> essential!), whatever the database structure may be.
> 
> But how would you say in a "standardized" SQL that you want only records 
> with <Resource description="VOName">infn</Resource>?
> 
> SELECT * WHERE urf:Resource_description="VOName" AND urf:Resource="infn"; ?
> 
> I think supporting some kind of "standardized RUS-SQL" is out of scope: 
> it would mean we first have to define such a RUS-SQL (and make sure it 
> fits all possible use cases) and then force developers to support it 
> even if they don't want to (why should somebody using an XML database 
> with UR XML documents be forced to accept SQL statements as queries?). The 
> point is that we are working with XML documents, so the extra 
> implementation effort should be made by those who do not want to work with 
> XML in their underlying implementation, not by those who want to use 
> native XML stuff.
> 
> Does anyone else have a comment/opinion to share?
> 
> Cheers,
> 
> Rosario.
> --
>   rus-wg mailing list
>   rus-wg at ogf.org
>   http://www.ogf.org/mailman/listinfo/rus-wg


