[RUS-WG] [UR-WG] OGF20 Session recommendations

Gilbert Netzer noname at pdc.kth.se
Fri May 4 10:26:04 CDT 2007


Hi Rosario, hi Xiaoyu and Everyone,

I am chopping up this email into smaller bits and pieces to allow for an 
easier discussion about different subjects; I hope you don't mind.

Rosario Michael Piro wrote:
> Hi,
> 
> Xiaoyu Chen wrote:
>> Hello, Gilbert and Everyone,
>>  ... cut here
>>  
>>            3). Core RUS Specification 
>>                Batch processing is preferable in most applications, but performance is a big challenge. Clients would like to have flexibility in processing as well as responsiveness. 
>>            
>>            4). Batch Query
>>                 Most applications have been moved from XML databases to relational DBs for UR storage because of performance. 
> 
> Actually nearly all implementations I know of so far use XML databases, 
> not relational DBs.
> 
>>                 However, the RUS service interface definition is not relational-DB friendly. 
> 
> It does not need to be, it needs to be only _XML_ friendly (not even 
> _XML database_ friendly, only XML friendly, since we are handling XML 
> documents). How URs are stored by the underlying application is not an 
> issue of the interface.
> 
>>                 In this sense, every simple query against relational UR storage has to explicitly transform the returned results into XML URs to form a well-formed query result.
> 
> Yes, but that is not a big problem. The problem when using a relational 
> database is that first the XPath query has to be translated into an SQL 
> statement. That's a major research issue and as far as I know there is no 
> general recipe for that.
> 
>>                 I don't know how WS-Enumeration reduces the 
>>                 returned usage records. But adding a parameter for the maximum request size is not a good idea: if the maximum number of returned usage records is set to 10, for example, how are the following usage records returned 
>>                 to the client? Each time the user queries for URs, he or she always gets the first 10 usage records. How do we put an anchor there? Besides that, usage repositories keep being updated, so setting an 
>>                 anchor for a query seems impossible. My suggestion here is to let the RUS query operation return either all matched URs or part of the matched URs to the client, even a huge amount of data, and leave it to the 
>>                 RUS client to restrict the number of usage records returned.
> 
> The restriction on how many records can be returned depends not only on 
> the client; if the client doesn't care, the server will get into trouble. 
> So a specified maximum number is basically ok, but: it would require an 
> additional method that allows the client to know it (something analogous to 
> retrieving the list of mandatory elements). And: the client often 
> cannot know how many URs will be selected by its query (If I ask for URs 
> for the last month for a specific user I can get anything from 0 to a 
> million ...)
> We might think about something like a "TooManyURsSelectedFault" that can 
> be returned by the server.
> But you are right that limiting the number of records is not a really 
> good approach and we should definitely make sure the client can easily 
> get everything it needs even if the number of records should exceed the 
> limit (by multiple queries, by a response that is divided into multiple 
> parts, something that is maybe inspired by TCP/IP to allow the client to 
> get multiple pieces and put them together in the right order, or 
> whatever ...)

Yes, WS-Enumeration should not reduce the total number of UsageRecords 
returned; it should still be the same query. But it will allow returning a 
large result set in small pieces to save resources (memory, bandwidth) on 
the server and client side. It does also allow a client to stop querying 
when it is satisfied (e.g. when the user does not want to see any more).
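
A minimal sketch of this pull-style interaction, in Python. The names
(EnumerationService, enumerate, pull, release, fetch) are hypothetical and only
loosely mirror the Enumerate/Pull/Release operations of WS-Enumeration; this is
an in-memory toy under those assumptions, not the SOAP protocol. It also assumes
the server takes a snapshot of the matching records when the enumeration starts.

```python
import itertools
import uuid


class EnumerationService:
    """Toy in-memory stand-in for a RUS offering an enumeration interface."""

    def __init__(self, records):
        self._records = list(records)
        self._contexts = {}  # context id -> iterator over the matching records

    def enumerate(self, predicate):
        """Start an enumeration; returns an opaque context id, no records yet."""
        context = str(uuid.uuid4())
        # Assumption: matches are snapshotted here, so later updates to the
        # repository do not shift the position inside this enumeration.
        matches = [r for r in self._records if predicate(r)]
        self._contexts[context] = iter(matches)
        return context

    def pull(self, context, max_elements):
        """Return up to max_elements records plus an end-of-sequence flag."""
        iterator = self._contexts[context]
        items = list(itertools.islice(iterator, max_elements))
        end_of_sequence = len(items) < max_elements
        if end_of_sequence:
            del self._contexts[context]  # server frees its state
        return items, end_of_sequence

    def release(self, context):
        """Client is satisfied and abandons the enumeration early."""
        self._contexts.pop(context, None)


def fetch(service, predicate, chunk_size, wanted=None):
    """Pull the result in pieces; stop early once `wanted` records arrived."""
    context = service.enumerate(predicate)
    received = []
    while True:
        items, done = service.pull(context, chunk_size)
        received.extend(items)
        if done:
            break
        if wanted is not None and len(received) >= wanted:
            service.release(context)
            break
    return received
```

The query itself is unchanged and the full result is still available; only the
transport is split up, and the client decides when to stop pulling.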

The idea with the maximum number is the following: A client does a query 
and it cannot know up front how many records this query will return. 
So if the query returns a small number of records that the client can 
handle, all is fine and all the records should be returned. But when the 
number is large, the client probably does not want to get one big response 
but rather try again using the enumeration (or just get back an enumeration 
directly). So the idea for this is more to allow some client-side control 
over the maximum return size.

For the server the situation is a little bit easier. Here I like the 
idea from Rosario of a TooManyURsSelectedFault (or MessageTooBigFault). In 
this case the client can again ask for an enumeration and have the result 
in smaller bits to not overload the server, or just give up. This would 
give the server some control over the message size.
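
As a rough illustration of that fallback, assuming a server-side limit and a
fault named after Rosario's suggestion. The Server class, its methods and the
query_with_fallback helper are made up for this sketch and are not part of any
RUS specification; query_in_pieces merely stands in for an enumeration.

```python
class TooManyURsSelectedFault(Exception):
    """Raised when a plain query would exceed the server's size limit."""


class Server:
    def __init__(self, records, max_records_per_response):
        self._records = list(records)
        self._limit = max_records_per_response

    def query(self, predicate):
        """Single-response query; refuses results above the server limit."""
        matches = [r for r in self._records if predicate(r)]
        if len(matches) > self._limit:
            raise TooManyURsSelectedFault(
                f"query selects more than {self._limit} records"
            )
        return matches

    def query_in_pieces(self, predicate):
        """Stand-in for an enumeration: yields lists of at most `limit` records."""
        matches = [r for r in self._records if predicate(r)]
        for start in range(0, len(matches), self._limit):
            yield matches[start:start + self._limit]


def query_with_fallback(server, predicate):
    """Try the plain query first; on a fault, fetch the result in pieces."""
    try:
        return server.query(predicate)
    except TooManyURsSelectedFault:
        result = []
        for piece in server.query_in_pieces(predicate):
            result.extend(piece)
        return result
```

A client that does not want the large result can simply let the fault propagate
instead of falling back.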

It also would be good if the server could abort an insert/modify/replace 
request in case the client keeps on streaming UsageRecords to it, before it 
breaks down. Here a MessageTooBigFault could come in handy too.
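
A small sketch of such an abort, assuming the server reads the incoming records
one at a time and counts them. The insert_records function, the max_records
limit and the all-or-nothing behaviour (nothing is stored when the request is
aborted) are assumptions for illustration only.

```python
class MessageTooBigFault(Exception):
    """Raised when a request carries more records than the server accepts."""


def insert_records(storage, incoming, max_records):
    """Consume records from an iterator and abort once the limit is exceeded.

    Records are buffered and only stored if the whole request fits, so an
    aborted request leaves `storage` untouched.
    """
    accepted = []
    for count, record in enumerate(incoming, start=1):
        if count > max_records:
            # Stop reading here: at most max_records + 1 records were consumed.
            raise MessageTooBigFault(
                f"more than {max_records} records in one request"
            )
        accepted.append(record)
    storage.extend(accepted)
    return len(accepted)


def endless_stream():
    """A misbehaving client that never stops sending records."""
    n = 0
    while True:
        yield {"id": n}
        n += 1
```

Because the check happens while reading, the server never has to hold the whole
oversized request in memory before rejecting it.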

About a method for the client to determine how many URs a query will 
return, that would be good to have too, but probably not that necessary 
since you can abort an enumeration at any time when you have enough. But a 
hint about how much more to expect would be good. Perhaps we should allow 
the server to return an estimated number of records in case of an 
enumeration attempt or a MessageTooBigFault.
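
One way a client could use such a hint, sketched under the assumption that the
fault carries an optional estimate. The estimated_records attribute, the
plan_after_fault helper and the client_budget parameter are hypothetical.

```python
import math


class MessageTooBigFault(Exception):
    """Hypothetical fault that may carry a server-side estimate."""

    def __init__(self, estimated_records=None):
        super().__init__("result too large for a single response")
        # None means the server gave no estimate.
        self.estimated_records = estimated_records


def plan_after_fault(fault, chunk_size, client_budget):
    """Decide what to do next; returns (action, expected_pulls or None)."""
    estimate = fault.estimated_records
    if estimate is None:
        # No hint: enumerate anyway and abort when satisfied.
        return ("enumerate", None)
    if estimate > client_budget:
        # More than the client is willing to fetch.
        return ("give_up", None)
    return ("enumerate", math.ceil(estimate / chunk_size))
```

Since the number is only an estimate, the client would still have to rely on the
end-of-sequence indication of the enumeration rather than on the count.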

Best Regards
Gilbert


