[RUS-WG] Ideas for the RUS specification

Mon Feb 9 11:56:01 CST 2009

Hi Joshua, hi Gilbert, hi everybody!

First of all let me say that I appreciate the excellent comments and
suggestions that Joshua has made, I agree with most of them.

In the following I will give some of my views regarding those suggestions
(I use the numbering in Joshua's document so that it is clearer what
section I'm referring to):

3) I agree that more clarity is needed about when, why and which faults can
(and should) be returned, especially regarding the IllegalRequestFault
(which of course must not be confused with lacking authorization or
invalid records, etc.).

4) Operation names should of course be consistent. But before deciding the
exact names we should keep in mind that there might be future extensions
regarding other types of usage records (e.g. storage usage, that I believe
will be a big forthcoming issue).

Should a possible storage record be uploaded to/extracted from a RUS through
exactly the same operations? (Both job and storage usage records, maybe
also network usage records or whatever, can be extracted with
extractRecords and the request itself defines what exactly is wanted.)

Or should there be distinct operations for distinct record types?
Then I'd suggest extending the names to:
extractJob(Usage)Records, extractStorage(Usage)Records, ... (same for
insert, modify etc.), whether we remove the "usage" part or not.

I would opt for the latter because it distinguishes between different
usage types and leaves the possibility to eventually define distinct
interfaces for them. (Or do we want to limit our work to only job usage?
That of course is another option, but I would at least keep the option of
extending the work to other usage types :o)

5) Here we should decide whether we want to split the RUS spec into
different sub-interfaces, as proposed by Joshua and Gilbert in their last
mails, or whether to allow for an OperationNotSupportedFault as previously
suggested by Joshua in his document. I think both are valid suggestions,
but actually I would prefer to keep one interface +
OperationNotSupportedFault, even if not everybody will implement all parts
of it.

For the interoperability of my implementation it doesn't make much
difference whether I implement only a hypothetical "Resource Usage
Insertion Interface" (I prefer "Insertion", because "Publishing" is more
ambiguous and might also refer to the final publishing to the end-user,
i.e. extraction) or whether I return an OperationNotSupportedFault (or
NotAuthorizedFault) upon all attempts of extraction/modification.

Of course in any case no implementation should be forced to implement
every piece (most existing accounting systems don't implement modification,
for example, and the RUS spec already allows simply returning a
NotAuthorizedFault upon every attempt).
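
A minimal sketch of the "one interface + OperationNotSupportedFault" option,
written in Python purely for illustration. The class, the method names and
the exception are hypothetical stand-ins, not part of the RUS spec or of any
existing implementation:

```python
class OperationNotSupportedFault(Exception):
    """Hypothetical Python stand-in for the proposed fault."""


class InsertOnlyRUS:
    """Toy service that accepts insertions and rejects everything else."""

    def __init__(self):
        self.records = []

    def insert_usage_records(self, records):
        # Store the records and report how many were accepted.
        self.records.extend(records)
        return len(records)

    def extract_usage_records(self, query):
        # Extraction is part of the interface but not offered here.
        raise OperationNotSupportedFault("extraction is not implemented")

    def modify_usage_records(self, query, update):
        # Same for modification.
        raise OperationNotSupportedFault("modification is not implemented")
```

A client sees the full interface either way; it only has to be prepared to
catch the fault on the operations a given service does not offer.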

If we decide to split up the spec then I would say that it should
still be one WG that works on all of them (maybe subgroups of one WG),
otherwise there is a theoretical risk that we end up with interoperability
problems between the different specs :o)

5.1) It's true that the current specification doesn't permit listing
differentiated properties as mandatory elements. But actually there's
already a solution (at least I hope so). I'm currently working on an
update that, among other improvements, also provides a much more flexible
way of specifying mandatory elements. Here's an excerpt. This is not
necessarily the final form, but it will give you an idea of what I
had in mind:

   <xs:complexType name="MandatoryElementConstraintType">
      <xs:choice minOccurs="1" maxOccurs="1">
         <xs:sequence>
            <xs:element name="AllowedValue" type="xs:anyType"
                        minOccurs="1" maxOccurs="unbounded" />
         </xs:sequence>
         <xs:sequence>
            <xs:element name="MinValue" type="xs:anyType"
                        minOccurs="0" maxOccurs="1" />
            <xs:element name="MaxValue" type="xs:anyType"
                        minOccurs="0" maxOccurs="1" />
            <!-- Is anyType OK for MinValue and MaxValue? -->
         </xs:sequence>
      </xs:choice>
   </xs:complexType>

   <xs:complexType name="MandatoryElementType">
      <xs:sequence>
         <xs:element name="Constraint"
                     type="MandatoryElementConstraintType"
                     minOccurs="0" maxOccurs="1" />
         <xs:element name="Attribute"
                     minOccurs="0" maxOccurs="unbounded">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="Constraint"
                              type="MandatoryElementConstraintType"
                              minOccurs="0" maxOccurs="1" />
               </xs:sequence>
               <xs:attribute name="name" type="xs:token"
                             use="required" />
            </xs:complexType>
         </xs:element>
      </xs:sequence>
      <xs:attribute name="name" type="xs:QName" use="required" />
   </xs:complexType>

[Is this specification formally correct? Should we also allow for mandatory
attributes, e.g. to require UR documents to have something like <Resource
description="VOName">alice</Resource> or maybe <TimeInstant
type="timeSubmitted">...</TimeInstant>?
Are there other types of constraints we should consider?]

Example 1:

<MandatoryElement name="http://schema.ogf.org/2003/09/urf:ProjectName">
    <Constraint>
        <AllowedValue>LCG</AllowedValue>
        <AllowedValue>LHC Compute Grid</AllowedValue>
    </Constraint>
</MandatoryElement>

This example will require the UR property urf:ProjectName to be present
and to have either the value “LCG” or the value “LHC Compute Grid”.

Example 2:

<MandatoryElement name="http://schema.ogf.org/2003/09/urf:TimeInstant">
    <Attribute name="type">
        <Constraint>
            <AllowedValue>timeSubmitted</AllowedValue>
        </Constraint>
    </Attribute>
</MandatoryElement>

This example will require the presence of an element
<TimeInstant type="timeSubmitted">...</TimeInstant>
with a mandatory attribute (“type”) of a well-defined value
(“timeSubmitted”), but without restrictions on the content of TimeInstant
itself (i.e. any submission time is allowed).

This would also have the advantage that there is no longer a fixed list
of UR properties that can be specified. Instead, all UR properties
could be specified with a QName, and properties of future extensions
regarding other resource types (e.g. storage usage records) would be
included as well.
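
To make the intended semantics concrete, here is a rough Python sketch of how
a RUS could check one usage record against one MandatoryElement description.
It is only an illustration under several assumptions: the function names are
made up, the name is split at the last colon into namespace and local name,
only AllowedValue is evaluated (MinValue/MaxValue are ignored), and the
namespace is the one written in this mail:

```python
import xml.etree.ElementTree as ET

URF_NS = "http://schema.ogf.org/2003/09/urf"  # namespace as written in this mail


def to_tag(name):
    """Turn 'namespace:local' (split at the last colon) into an ElementTree tag."""
    namespace, _, local = name.rpartition(":")
    return "{%s}%s" % (namespace, local) if namespace else local


def allowed_values(parent):
    """Return the AllowedValue texts of a direct Constraint child, or None."""
    constraint = parent.find("Constraint")
    if constraint is None:
        return None
    # MinValue/MaxValue are ignored in this sketch.
    return [v.text for v in constraint.findall("AllowedValue")] or None


def matches(candidate, mandatory):
    """Check one UR element against a MandatoryElement description."""
    values = allowed_values(mandatory)
    if values is not None and (candidate.text or "").strip() not in values:
        return False
    for attr in mandatory.findall("Attribute"):
        actual = candidate.get(attr.get("name"))
        if actual is None:
            return False  # the mandatory attribute is missing
        values = allowed_values(attr)
        if values is not None and actual not in values:
            return False
    return True


def satisfies(record, mandatory):
    """True if at least one element of the record fulfils the requirement."""
    tag = to_tag(mandatory.get("name"))
    return any(matches(c, mandatory) for c in record.iter(tag))


record = ET.fromstring(
    '<UsageRecord xmlns="%s">'
    "<ProjectName>LCG</ProjectName>"
    '<TimeInstant type="timeSubmitted">2009-02-09T11:56:01Z</TimeInstant>'
    "</UsageRecord>" % URF_NS
)
rule = ET.fromstring(
    '<MandatoryElement name="%s:ProjectName">'
    "<Constraint><AllowedValue>LCG</AllowedValue>"
    "<AllowedValue>LHC Compute Grid</AllowedValue></Constraint>"
    "</MandatoryElement>" % URF_NS
)
print(satisfies(record, rule))
```

The record and the timestamp in it are invented for the example; the two
rules from Example 1 and Example 2 are both accepted by this record.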

Eventually "MaxValue", "MinValue" and "AllowedValue" could be replaced
with EQ (equals), LE (less or equal), LT (less than), GE (greater or
equal), GT (greater than) and NE (not equal) to make this even
more flexible.
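
As a small illustration of why the operator form is more general, the six
operators map directly onto ordinary comparisons, and a MinValue/MaxValue
pair becomes a combination of two of them. This Python sketch is hypothetical
(names are mine) and assumes that MinValue and MaxValue are meant to be
inclusive bounds:

```python
import operator

# Hypothetical mapping from the proposed operator names to Python comparisons.
OPERATORS = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "LT": operator.lt,
    "LE": operator.le,
    "GT": operator.gt,
    "GE": operator.ge,
}


def check(value, op_name, reference):
    """Apply one named comparison, e.g. check(5, "LE", 10)."""
    return OPERATORS[op_name](value, reference)


def in_range(value, min_value, max_value):
    """MinValue/MaxValue expressed as GE combined with LE (assumed inclusive)."""
    return check(value, "GE", min_value) and check(value, "LE", max_value)
```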

Would such a specification of mandatory elements make sense?

6.1) Regarding aggregate queries: this is a very important use case,
because very often the end user will just want some idea of how
many resources have been consumed within, let's say, a month or a year.
Currently that means that the user, if using the RUS interface, will have
to extract _all_ URs and then do the aggregation on the client side, which
causes a lot of unnecessary network traffic ... so there is a need for
this, but:

* the AUR, although proposed, has never really been discussed in the UR-WG
(maybe at some of the last OGFs but I'm not sure), so I fear it is far from
being standardized (although it would be great to have a standardized
version).

* we thought to have methods regarding aggregation in the advanced
specifications (i.e. as optional add-ons instead of as part of the core
specification). It's somewhat a matter of distinguishing between the RUS
being a "storage service" for URs (then handling single or multiple records,
without aggregation, is enough) or being a fully featured "usage
information" service (then aggregation is needed). Maybe the core could
cover the former, while the latter could be handled by advanced
functionalities.
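
For illustration, this is roughly what the client has to do today after
pulling all URs over the network. The sketch is in Python with invented
records and invented field names (user, end_time, cpu_seconds); a real
client would first have to parse these values out of the UR documents:

```python
from collections import defaultdict


def aggregate_cpu_seconds(records):
    """Sum CPU seconds per (user, month) over individual usage records."""
    totals = defaultdict(int)
    for rec in records:
        month = rec["end_time"][:7]  # "YYYY-MM" prefix of an ISO 8601 timestamp
        totals[(rec["user"], month)] += rec["cpu_seconds"]
    return dict(totals)


# Made-up records, standing in for URs already extracted from a RUS.
records = [
    {"user": "alice", "end_time": "2009-01-15T10:00:00Z", "cpu_seconds": 3600},
    {"user": "alice", "end_time": "2009-01-20T12:30:00Z", "cpu_seconds": 1800},
    {"user": "bob", "end_time": "2009-02-01T08:00:00Z", "cpu_seconds": 600},
]
print(aggregate_cpu_seconds(records))
```

With server-side aggregation only the few summed values would need to be
transferred instead of every single record.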

5.6, 6.2 and 7) Regarding using different renderings for URs and
decoupling the RUS spec from XML databases by defining a RUS specific
query language instead of requiring XPath/XQuery/XQuery Update:

Since so far the XML version of URs is the only official rendering
(although others may not be excluded by the UR spec) it seemed natural to
work on that. And given an XML representation XPath, XQuery, etc. seem
natural choices, too.

XPath/XQuery have another advantage that we wanted to exploit for the next
spec: they make it possible to return not only entire UR documents, but also
only parts of them (if the user just needs a list of job IDs to know which
jobs he/she submitted last month, why should the RUS return entire
documents?).
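
A small Python sketch of the "only parts" idea, using the limited XPath
subset that the standard xml.etree.ElementTree module supports. The two
records are invented, and the namespace is the one written in this mail:

```python
import xml.etree.ElementTree as ET

NS = {"urf": "http://schema.ogf.org/2003/09/urf"}  # namespace as written in this mail

# Two invented usage records, standing in for the content of a RUS.
DOC = """
<urf:UsageRecords xmlns:urf="http://schema.ogf.org/2003/09/urf">
  <urf:UsageRecord>
    <urf:JobIdentity><urf:LocalJobId>job-1</urf:LocalJobId></urf:JobIdentity>
    <urf:ProjectName>LCG</urf:ProjectName>
  </urf:UsageRecord>
  <urf:UsageRecord>
    <urf:JobIdentity><urf:LocalJobId>job-2</urf:LocalJobId></urf:JobIdentity>
    <urf:ProjectName>LCG</urf:ProjectName>
  </urf:UsageRecord>
</urf:UsageRecords>
"""

root = ET.fromstring(DOC.strip())
# Select only the job IDs instead of returning whole UR documents.
job_ids = [e.text for e in root.findall(".//urf:JobIdentity/urf:LocalJobId", NS)]
print(job_ids)
```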

But you are right that this leads to implementation problems because it
either means you have to use a native XML database (with a notoriously low
performance) or will get into trouble when trying to convert XPath
expressions into SQL. In case you're interested, I've looked into the
issue of XML2SQL translation vs native XML databases when planning a RUS
interface for DGAS, the accounting system I have been working on with my
colleagues:

http://personalpages.to.infn.it/~piro/pub/techrep/RUSandUR4DGAS-0_2.pdf

Knowing the difficulties I support your proposal to instead define a
simple RUS-specific query language, even if it is less flexible. That does
not exclude XPath or XQuery because using the listSupportedDialects
operation (or similar mechanisms) single implementations can still support
XPath etc. if they want to.

Maybe it would not even be that difficult to come up with a RUS-specific
query language. Maybe you have noticed the similarity of your proposal and
what I have developed for the specification of mandatory elements (see
above)? That could be extended with notions to allow for AND/OR in order
to have a simple query language that allows for many of the most important
use cases. Those who wish to support more flexible queries could then use
XPath/XQuery or come up with implementation-specific solutions. Only the
simple RUS-specific query language would need to be supported by all
implementations.
Maybe it would even be straightforward to do this for modifications (after
record selection as in extraction: add something like "Replacement" or
"Increment" or similar that specifies the UR properties to be updated with
QNames):

<Replacement
name="http://schema.ogf.org/2003/09/urf:ProjectName">newproject</Replacement>
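
A rough Python sketch of how such a "Replacement" could be applied once the
records have been selected. The function names are hypothetical, the name is
split at the last colon into namespace and local name, and the record is
invented for the example:

```python
import xml.etree.ElementTree as ET

URF_NS = "http://schema.ogf.org/2003/09/urf"  # namespace as written in this mail


def to_tag(name):
    """Turn 'namespace:local' (split at the last colon) into an ElementTree tag."""
    namespace, _, local = name.rpartition(":")
    return "{%s}%s" % (namespace, local) if namespace else local


def apply_replacement(records, replacement):
    """Overwrite the named property in every selected record.

    Returns the number of elements that were changed.
    """
    tag = to_tag(replacement.get("name"))
    changed = 0
    for record in records:
        for element in record.iter(tag):
            element.text = replacement.text
            changed += 1
    return changed


record = ET.fromstring(
    '<UsageRecord xmlns="%s"><ProjectName>oldproject</ProjectName></UsageRecord>'
    % URF_NS
)
replacement = ET.fromstring(
    '<Replacement name="%s:ProjectName">newproject</Replacement>' % URF_NS
)
print(apply_replacement([record], replacement))
```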

Of course the modification operations are more difficult to address with a
RUS-specific language, but they are also less urgent: most systems don't
need them anyway.

Actually I would even suggest removing modifyUsageRecords from the core
specification and instead considering it for further advanced functionalities
(why have it in the core if most accounting systems don't need it?). Or
alternatively leave it completely out of the spec and make it an
implementation-specific issue (interoperability between different grid
infrastructures/projects is likely to need common notions for insertion
and extraction, but I doubt it will need remote modifications).

If you agree I can try to come up with a simple RUS-specific query
language by extending what I already have for listing mandatory elements
and integrate some of Joshua's ideas (above all AND/OR!). Let's see what
that will look like and whether it can cover the most important use cases.
Maybe we could also use QNames to let the user specify which parts of the
URs should be returned (e.g. entire documents, or only job IDs, or job IDs
and job names, or ...).
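
Just to give a feeling for how little is needed, here is a Python sketch of
the evaluation side of such a language: a nested AND/OR tree over simple EQ
conditions, plus a list of property names that selects which parts of the
matching records are returned. The tuple-based query form, the function
names and the sample records are all hypothetical; an actual proposal would
use an XML rendering and QNames:

```python
def evaluate(query, record):
    """Evaluate a nested ("AND"|"OR", [subqueries]) / ("EQ", name, value) tree."""
    kind = query[0]
    if kind == "AND":
        return all(evaluate(q, record) for q in query[1])
    if kind == "OR":
        return any(evaluate(q, record) for q in query[1])
    if kind == "EQ":
        _, name, value = query
        return record.get(name) == value
    raise ValueError("unknown query node: %r" % (kind,))


def extract(records, query, wanted):
    """Return only the wanted properties of the records matching the query."""
    return [
        {name: rec[name] for name in wanted if name in rec}
        for rec in records
        if evaluate(query, rec)
    ]


# Invented records, flattened to property-name/value pairs for brevity.
records = [
    {"LocalJobId": "job-1", "ProjectName": "LCG", "LocalUserId": "alice"},
    {"LocalJobId": "job-2", "ProjectName": "LCG", "LocalUserId": "carol"},
    {"LocalJobId": "job-3", "ProjectName": "other", "LocalUserId": "bob"},
]
query = ("AND", [
    ("EQ", "ProjectName", "LCG"),
    ("OR", [("EQ", "LocalUserId", "alice"), ("EQ", "LocalUserId", "bob")]),
])
print(extract(records, query, ["LocalJobId"]))
```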

What do you think?

Cheers,
Rosario.

> Hi Gilbert, everybody,
>
> I am glad to hear my analysis of the current RUS draft was helpful.  I
> recently read through it and must apologise for the many grammatical
> mistakes I found there.  Apparently my proof reading isn't up to much!
>
> I strongly agree with factoring the publishing of usage record
> information out into another spec.  The team I work with are developing
> an accounting system that is expected to be RUS compliant.  When we
> consulted our stakeholders (namely NGS and various campus grids in the
> UK) and any possible interested users we could think of they were all
> very keen on us implementing the RUS.  However, after we questioned
> them in more detail we discovered they didn't really know what the RUS
> was.  When they said 'RUS' they really meant a 'usage record
> publishing specification'.  They assumed this is what RUS was.  Of
> course, they aren't wrong but they didn't seem to be very interested
> in other aspects of the RUS.
>
> These groups would greatly benefit from a 'Usage Record Publishing
> Interface'.  Such a standard could get finished off and finalised
> quite quickly which would be very helpful as this is the main feature
> users are crying out for right now.  The other aspects of the RUS,
> such as usage record and history querying may be useful to some, but
> their interface is a lot harder to get right, will take longer to tune
> and its target audience is less certain.  While I can imagine there are
> such groups out there who would like such features, we have not
> encountered them yet in our requirements gathering exercises (apart
> from the odd user saying 'ooh that would be nice, we might consider
> using that if the product you gave us had that').
>
>
> Hope this was helpful.
>
> Joshua Green
> EPCC Applications Developer