[RUS-WG] RUS implemented on relational database

Thu Aug 14 09:57:26 CDT 2008

Stephen wrote:

>I'm new to this list but I'm currently contemplating adding a RUS
>service front end to an existing system that uses a relational
>database back-end.
That is what we have implemented, in a fully RUS-compatible way.

>I'd therefore like to make some comments on what looks like a bit of a
>long standing issue which is the status of Xpath in the specification.

>The most recent documents in the tracker add the ability to specify
>alternate filter dialects which I think is a significant improvement.
>However they mandate Xpath-1.0 as a supported dialect which I'm not
>sure is a good idea.

>Mandating the use of Xpath-1.0 makes it very difficult
>to fully implement the specification using a relational database rather than
>a native XML database.

I could not agree more. XPath 1.0 is not fully featured; in particular, it lacks functions for the xsd:dateTime data type.
Here is the use case:

A user would like to get the usage records of this month. The only possible way to express this in XPath 1.0 is the following:

/urf:UsageRecord[urf:StartTime>'2008-08-01T00:00:00Z'][urf:EndTime<'2008-08-31T23:59:59Z']
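However, evaluating this XPath returns null. I presume the reason is that XPath 1.0 has no dateTime type: the XPath engine (of JDK 5) only sees the values of "urf:StartTime" and "urf:EndTime" as strings, and the < and > operators convert both operands to numbers, so the comparison against the specified values can never be true.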

>Actually I feel that Xpath-1.0 is not really sufficient anyway. There
>does not seem to be any easy way of selecting records by EndTime (e.g.
>records run in a specified month) without using features from Xpath-2.0.
>I think this is really important as I have over a million accounting
>records going back over 6 years in one of my databases and almost all
>operations I perform on it select a small subset of this by date range.
>Xpath-2.0/Xquery-1.0 have date functions and are fine in this respect.

XPath 2.0 is richer, with additional functions (for the dateTime data type, for example). However, for usage records persisted in a relational database, XPath does little good for RUS operations.
In our implementations (GRUS and WLCG-RUS) we developed a lightweight XPath2HQL tool (HQL is the Hibernate Query Language, which is SQL-like but object-oriented). Rather than providing a general-purpose transformation
between XPath and HQL, the tool applies a constrained set of rules; for example, it does not support XPath function calls. We also use the Hibernate API to access heterogeneous relational databases. On top of a high-level abstraction, the Data Access Object layer, custom implementations can be developed to support XML:DB, a file system, etc.
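
For illustration, a constrained translation of this kind could look like the Python sketch below. It is not the actual XPath2HQL code, which is not shown in this mail. The entity name UsageRecord, the alias r, the rule that a property name is the element name with its first letter lower-cased, and the function name xpath_to_hql are all assumptions. The sketch accepts only predicates of the form [urf:Element op 'literal'] and rejects everything else, including function calls.

```python
import re

# Assumed mapping from XPath comparison operators to HQL operators.
_OPS = {"=": "=", "!=": "<>", "<": "<", "<=": "<=", ">": ">", ">=": ">="}

# One predicate of the constrained form: [urf:Name op 'literal']
_PREDICATE = re.compile(
    r"\[\s*urf:(?P<name>[A-Za-z_]\w*)\s*"
    r"(?P<op>!=|<=|>=|=|<|>)\s*"
    r"'(?P<value>[^']*)'\s*\]"
)
_PREFIX = "/urf:UsageRecord"


def xpath_to_hql(xpath, entity="UsageRecord"):
    """Translate a constrained XPath filter into (hql, named_params).

    'entity' and the derived property names are hypothetical; a real
    mapping would come from the Hibernate configuration.
    """
    if not xpath.startswith(_PREFIX):
        raise ValueError("only /urf:UsageRecord[...] filters are supported")
    rest = xpath[len(_PREFIX):]
    clauses, params, pos = [], {}, 0
    while pos < len(rest):
        m = _PREDICATE.match(rest, pos)
        if m is None:
            # Function calls, nested paths, 'or' etc. end up here.
            raise ValueError("unsupported XPath construct: " + rest[pos:])
        key = "p%d" % len(params)
        name = m.group("name")
        prop = name[0].lower() + name[1:]      # StartTime -> startTime
        clauses.append("r.%s %s :%s" % (prop, _OPS[m.group("op")], key))
        params[key] = m.group("value")         # literal bound as parameter
        pos = m.end()
    hql = "from %s r" % entity
    if clauses:
        hql += " where " + " and ".join(clauses)
    return hql, params
```

Applied to a filter with one StartTime and one EndTime predicate, this yields "from UsageRecord r where r.startTime > :p0 and r.endTime < :p1", with the two literals bound as named parameters. Converting those literals to date objects before binding is left to the caller, so the date comparison is done by the database and not by string handling.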

>My own feeling is that rather than mandate any filter dialect it would
>be better to allow filters to be specified either by a search-string in a
>supported query dialect or alternatively by an XML element that encodes
>the minimal set of filters a RUS implementation needs to support.
>As this selection language would only exist as part of the RUS it might
>as well be written in XML and be part of the RUS schema rather than creating
>an additional specification and custom parsers for a subset of Xpath.
>Provided this additional selection language can be easily mapped to
>Xpath-2.0 predicates it should not add significantly to the difficulty of
>implementing the service on a native XML database.

>I think a sensible minimal filtering capability is a list of binary
>comparisons (== != < <= > >=) between
>leaf elements of the UR (identified in the same way as the mandatory
>elements) and literal values of the corresponding schema type.
>The selected records would be those where all the match conditions
>resolve true.
>The update function could be supported by supplying a set of element
>assignments along with the selector.

Totally agree. I don't know whether it is feasible for update, but it is definitely good for query. [see our extension below]

>I imagine this would look something like

><FilterList>
><MatchCondition match="lt">
><Target>EndTime</Target>
><Value>2008-08-01 08:00:00Z</Value>
></MatchCondition>
><MatchCondition match="gt">
><Target>EndTime</Target>
><Value>2008-07-01 08:00:00Z</Value>
></MatchCondition>
></FilterList>
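
To illustrate how directly such a filter list maps onto a relational query, here is a minimal Python sketch. The table name usage_records, the element-to-column mapping and the function name filter_list_to_sql are assumptions made for the example. Only the match codes "lt" and "gt" appear in the proposal above; the other codes are guessed by analogy.

```python
import xml.etree.ElementTree as ET

# Assumed match codes; only "lt" and "gt" appear in the proposal.
MATCH_OPS = {"eq": "=", "ne": "<>", "lt": "<", "le": "<=", "gt": ">", "ge": ">="}

# Hypothetical whitelist mapping UR leaf elements to SQL columns.
COLUMNS = {"StartTime": "start_time", "EndTime": "end_time"}


def filter_list_to_sql(xml_text, table="usage_records"):
    """Return (sql, params) for a <FilterList>; all conditions are ANDed."""
    root = ET.fromstring(xml_text)
    clauses, params = [], []
    for cond in root.findall("MatchCondition"):
        target = cond.findtext("Target", "").strip()
        op = MATCH_OPS[cond.get("match")]   # KeyError on unknown match code
        column = COLUMNS[target]            # KeyError on unsupported target
        clauses.append("%s %s ?" % (column, op))
        # Values are passed as bind parameters, never spliced into the SQL.
        params.append(cond.findtext("Value", "").strip())
    sql = "SELECT * FROM " + table
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

For the two conditions quoted above, this produces "SELECT * FROM usage_records WHERE end_time < ? AND end_time > ?" with the two date literals as parameters. Because targets and operators are looked up in fixed tables, an unsupported element or match code is rejected before any SQL is built.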

In GRUS, we defined three header elements, whose schemas are as follows:

a) wlcgrus:GroupBy

With this header, the user can interrogate a RUS service endpoint without using XPath, by explicitly identifying the usage metrics to be returned.

The GroupBy element can be used for both job-level and aggregate/summary usage queries by setting the //wlcgrus:GroupBy/@aggregate value.

<xsd:element name="GroupBy">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element ref="urf:StartTime" minOccurs="0" maxOccurs="1"/>
   <xsd:element ref="urf:EndTime" minOccurs="0" maxOccurs="1"/>
   <xsd:element name="usage" type="xsd:QName" minOccurs="0" maxOccurs="unbounded"/>
   <xsd:any namespace="##other"
        minOccurs="0"
        maxOccurs="unbounded"
        processContents="lax" />
  </xsd:sequence>
  <xsd:attribute name="aggregate" type="xsd:boolean" use="optional" default="false" />
 </xsd:complexType>
 </xsd:element>

b) wlcgrus:SortBy

This header orders the returned usage records or usage metrics by a particular usage metric.

<xsd:element name="SortBy">
 <xsd:complexType>
  <xsd:choice>
   <xsd:element name="usage" type="xsd:QName" minOccurs="0" maxOccurs="1" />
   <xsd:any namespace="##other"
        minOccurs="0"
        maxOccurs="1"
        processContents="lax" />
  </xsd:choice>
  <xsd:attribute name="order" type="wlcgrus:orderType" use="required" />
 </xsd:complexType>
</xsd:element>

<xsd:simpleType name="orderType">
 <xsd:restriction base="xsd:token">
  <xsd:enumeration value="asc"/>
  <xsd:enumeration value="desc"/>
 </xsd:restriction>
</xsd:simpleType>

c) wlcgrus:maxRecords

This header limits the maximum number of usage records or usage metrics returned for a specific request.

<xsd:element name="maxRecords" type="xsd:int" />

With the above header extensions, it is possible to query usage records through a RUS service without using XPath.

Example 1: get the top 10 job usage records of the 'Atlas' VO for this month, ranked by CPU usage.

The request message is as follows:

<env:Header ...>

<wlcgrus:maxRecords>10</wlcgrus:maxRecords>

<wlcgrus:GroupBy aggregate="false">
   <urf:StartTime>2008-08-01T00:00:00Z</urf:StartTime>
   <urf:EndTime>2008-08-31T23:59:59Z</urf:EndTime>
   <wlcgrus:usage>urf:CpuDuration</wlcgrus:usage>
   <urf:Resource description="VOName">Atlas</urf:Resource>
 </wlcgrus:GroupBy>

<wlcgrus:SortBy order="desc">
  <wlcgrus:usage>urf:CpuDuration</wlcgrus:usage>
 </wlcgrus:SortBy>

</env:Header>

<env:Body>

<rus:extractUsageRecordsRequest />

</env:Body>

....
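
As a rough sketch of how a relational back-end might serve a request like the one above, the following Python turns the three header elements into a single SQL statement. It is not the GRUS implementation, which goes through Hibernate. The namespace URIs are placeholders (the real ones are not given in this mail), and the table name, column names, metric mapping and function names are hypothetical. It also assumes that StartTime and EndTime in GroupBy bound the period in which the job ran, and it only covers the non-aggregate case.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; the real ones are not given in this mail.
NS = {"wlcgrus": "urn:example:wlcgrus", "urf": "urn:example:urf"}

# Hypothetical mapping from usage metric local names to SQL columns.
METRIC_COLUMNS = {"CpuDuration": "cpu_duration", "WallDuration": "wall_duration"}
ORDER = {"asc": "ASC", "desc": "DESC"}


def _metric_column(qname_text):
    """Map QName text such as 'urf:CpuDuration' to a whitelisted column."""
    return METRIC_COLUMNS[qname_text.strip().split(":")[-1]]


def headers_to_sql(header_xml, table="usage_records"):
    """Build (sql, params) from GroupBy, SortBy and maxRecords headers."""
    header = ET.fromstring(header_xml)
    columns, where, params = ["record_id"], [], []

    group = header.find("wlcgrus:GroupBy", NS)
    if group is not None:
        if group.get("aggregate", "false") in ("true", "1"):
            raise NotImplementedError("aggregate queries are not sketched here")
        for usage in group.findall("wlcgrus:usage", NS):
            columns.append(_metric_column(usage.text))
        # Assumption: StartTime/EndTime bound the period the job ran in.
        for tag, clause in (("urf:StartTime", "start_time >= ?"),
                            ("urf:EndTime", "end_time <= ?")):
            value = group.findtext(tag, None, NS)
            if value:
                where.append(clause)
                params.append(value.strip())
        for res in group.findall("urf:Resource", NS):
            if res.get("description") == "VOName":
                where.append("vo_name = ?")
                params.append(res.text.strip())

    sql = "SELECT %s FROM %s" % (", ".join(columns), table)
    if where:
        sql += " WHERE " + " AND ".join(where)

    sort = header.find("wlcgrus:SortBy", NS)
    if sort is not None:
        metric = sort.findtext("wlcgrus:usage", None, NS)
        sql += " ORDER BY %s %s" % (_metric_column(metric), ORDER[sort.get("order")])

    limit = header.findtext("wlcgrus:maxRecords", None, NS)
    if limit is not None:
        sql += " LIMIT ?"
        params.append(int(limit))
    return sql, params
```

With a header carrying maxRecords 10, a CpuDuration usage element, the month boundaries, the VOName resource 'Atlas' and a descending SortBy on CpuDuration, the sketch returns "SELECT record_id, cpu_duration FROM usage_records WHERE start_time >= ? AND end_time <= ? AND vo_name = ? ORDER BY cpu_duration DESC LIMIT ?", with the dates, the VO name and the limit as bind parameters.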

>The translation of this into predicates is straightforward as is the
>translation into a SQL select statement for elements that have been
>extracted into SQL fields, any remaining match conditions could be
>evaluated by regenerating the XML for the superset of the target records
>returned by the SQL and applying Xpath.
>Alternatively we could have a method that queries the permitted target
>elements for a selector.

If you are interested in more information, we can arrange a further meeting (maybe face-to-face).

X. Chen, A. Khan