[RUS-WG] Ideas for the RUS query specification

Thu Feb 19 04:45:49 CST 2009

Hi Joshua,

(I added the mailing list in CC)

Thanks for the work done! I don't have much time to answer right now, but
I wanted to send at least a few comments and share your mail with the
mailing list, too, in case someone else has additional comments or
suggestions.

> Hi Rosario,
>
> Sorry for the long wait in replying.  Thank you for offering to come
> up with a RUS query language.  I am happy for you to do so and would
> like to help you in such an endeavor if that's OK with you.  I have
> been working on something myself over the past month or so.  As we
> seem to have been thinking along the same lines, I thought I should
> send you what I have come up with for your consideration.
>
> Since your email I've incorporated some of your design into what I
> already had to make the two designs more comparable.  I particularly
> like the idea of using QNames to specify UR elements.  While this does
> move away from the idea of considering UR documents in terms of
> properties and metaproperties as I suggested in my analysis of the RUS
> IDL, I suppose it is a more pragmatic solution as usage records are
> unlikely to be represented in any other way.  Besides, so long as one
> does not start treating element values as chunks of (complex content)
> XML, it is easy to map between element/attribute and
> property/metaproperty.

It's true that QNames are more closely tied to an XML representation of
the UR, but I also believe that they should be fairly easy to map to other
specifications as well. Honestly, considering that the work in the UR-WG
is proceeding slowly (absolutely no blame on them, the same is true for
the RUS-WG ;o) I don't believe that there will be other "official" UR
representations in the near future.

>
> The design shown here was targeted more at a RUS filter language to
> select usage records, although it could also be used to list
> mandatory elements.  This makes sense if we state that a usage record has the
> required mandatory elements if it passes the filter.  I believe it is
> important to keep the filtering of records separate from other
> operations (such as editing the records or selecting return values) as
> it allows implementors to use the information to check authorisation
> logs to make sure the query is permitted.  While one could run the
> query and determine this on a record-by-record basis, this could be
> very slow for a large result set.  Such an operation can often be
> avoided by comparison of the selection filter with an authorisation
> policy, thus avoiding the fetching of any records if the filter fails
> authorisation.
>
> Here is an example of what I was considering:
>
> <filter
>   xmlns="http://schemas.ogf.org/rus/2007/09/filter"
>   xmlns:urf="http://schema.ogf.org/urf/2003/09/urf">
>
>   <elementFilter name="urf:StartTime">
>     <constraints>
>       <andCombination>
>         <greaterThan>2003-08-13T12:00:00Z</greaterThan>
>         <lessThan>2003-08-14T12:00:00Z</lessThan>
>       </andCombination>
>     </constraints>
>   </elementFilter>
>
>   <elementFilter name="urf:Disk">
>     <constraints>
>       <lessThan storageUnit="MB">10</lessThan>
>     </constraints>
>   </elementFilter>
> </filter>
>
> This is fairly similar to the design you proposed.  The main
> difference is that a constraints element either takes one constraint
> or a constraint combination that can either be an andCombination or
> an orCombination.  Constraints can also be applied to attributes, e.g.:
>
> <filter
>   xmlns="http://schemas.ogf.org/rus/2007/09/filter"
>   xmlns:urf="http://schema.ogf.org/urf/2003/09/urf">
>
>   <elementFilter name="urf:Memory">
>     <constraints>
>       <lessThan storageUnit="MB" phaseUnit="PT1S">10</lessThan>
>     </constraints>
>     <attribute name="metric">
>       <constraints>
>         <equals>average</equals>
>       </constraints>
>     </attribute>
>   </elementFilter>
> </filter>
>
> I have attached the schema I have developed so far to the email.  An
> important feature of the design is the ability to express units.
> After all, a memory size of '10' is meaningless - '10MB' is the
> complete description.  I strongly suggest there is NOT a default unit.
> Default units in this situation are a classic way for bugs to creep
> in.  Of course, implementors could decide otherwise.

I agree, it's too dangerous. This, however, leaves open the problem of how
to define _when_ a unit _must_ be specified in the filter (e.g. Memory would
require it, otherwise the value is ambiguous). The simplest approach would be
to let the implementation decide and, if appropriate, return some fault
(IllegalRequest? Something more specific: IllegalFilter?
UndefinedFilterArgument?)
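A sketch of the problem case, using the element names from your examples:
the limit below has no storageUnit, so an implementation would have to
reject the filter rather than assume a default. The fault name in the
comment is only one of the candidate names above, nothing agreed.

```xml
<filter
  xmlns="http://schemas.ogf.org/rus/2007/09/filter"
  xmlns:urf="http://schema.ogf.org/urf/2003/09/urf">

  <elementFilter name="urf:Memory">
    <constraints>
      <!-- No storageUnit: "10" could mean KB, MB or GB.
           The implementation refuses the filter, e.g. with a
           (hypothetical) IllegalFilter fault, instead of
           guessing a default unit. -->
      <lessThan>10</lessThan>
    </constraints>
  </elementFilter>
</filter>
```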

Also we should think of additional use cases: for example, a filter might
require at least one of a set of properties to be present without requiring
any specific content. E.g. a record MUST have either a LocalUserId or a
GlobalUserId, but their content can be arbitrary. This is currently not
covered but might be a realistic use case.
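A rough sketch of how this could be written, under two assumptions that go
beyond your current schema: <orCombination> may wrap <elementFilter>s at the
<filter> level, and an <elementFilter> without a <constraints> child only
requires the element to be present. The element names are the ones used in
this mail, not checked against the UR schema.

```xml
<filter
  xmlns="http://schemas.ogf.org/rus/2007/09/filter"
  xmlns:urf="http://schema.ogf.org/urf/2003/09/urf">

  <!-- Hypothetical: orCombination used at filter level. -->
  <orCombination>
    <!-- Hypothetical: no <constraints> child means
         "element present, with any content". -->
    <elementFilter name="urf:LocalUserId"/>
    <elementFilter name="urf:GlobalUserId"/>
  </orCombination>
</filter>
```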

Can we also think of use cases where a specific property/attribute MUST
NOT be present? E.g. accept only records that have no fields identifying a
user (in case a RUS just wants anonymous info)?
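Assuming filter-level combinations and that an <elementFilter> without
<constraints> only tests for presence (neither is in the current schema),
absence could be expressed with an equally hypothetical <notCombination>
element. A real "anonymous only" filter would have to list every
identifying element, not just these two.

```xml
<filter
  xmlns="http://schemas.ogf.org/rus/2007/09/filter"
  xmlns:urf="http://schema.ogf.org/urf/2003/09/urf">

  <!-- Hypothetical notCombination: a record passes only if
       the wrapped filter does NOT match, i.e. neither
       identifying element is present. -->
  <notCombination>
    <orCombination>
      <elementFilter name="urf:LocalUserId"/>
      <elementFilter name="urf:GlobalUserId"/>
    </orCombination>
  </notCombination>
</filter>
```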

>
> As the schema was developed, several issues crept up.  I believe they
> need consideration before a filter language can be finalised:
>
> UNITS
> What should be done when units are expressed inappropriately?  For
> example, what if someone specifies a phase unit for <Memory> when all
> usage records measure memory in storage units?  The same logic applies
> to charge as it is unlikely that a RUS will know how to translate from
> one type of charge unit to one used at another site.  I have tried to
> develop a schema that takes all this into account but the schema
> became very complicated and was still abusable.  I think the best
> approach is to allow any type of unit to be specified for any element
> and allow the implementation to decide if it makes sense and complain
> if it doesn't.

Yes, it would be best to let the implementation decide whether a filter
makes sense. We could allow for a specific fault type for that case (as
mentioned above).
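A sketch of the kind of filter meant here, again with the element names from
your examples: it attaches only a phase unit to <Memory>, which records
express in storage units. Whether it can be evaluated, and which fault is
returned if not, is left to the implementation.

```xml
<filter
  xmlns="http://schemas.ogf.org/rus/2007/09/filter"
  xmlns:urf="http://schema.ogf.org/urf/2003/09/urf">

  <elementFilter name="urf:Memory">
    <constraints>
      <!-- Accepted by a schema that allows any unit on any
           element, but probably meaningless: a phase (time)
           unit with no storage unit. The implementation decides
           and, if needed, returns a fault (type still to be
           agreed). -->
      <lessThan phaseUnit="PT1S">10</lessThan>
    </constraints>
  </elementFilter>
</filter>
```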

>
> MANAGING DIFFERENTIATED PROPERTIES
> A question of policy arises when dealing with attributes of
> differentiated properties.  Are <Memory> and <Memory metric='total'>
> the same property?  I would suggest the answer is no as both can exist
> within a usage record together.  This means that if a filter specifies
> a differentiated property element without constraining one or more of
> its differentiating attributes, the attribute should be assumed to be
> undefined.  This decision could be left to implementors; however, this
> could create some nasty compatibility problems.

I think we should require that, by default, attributes are not considered
unless the filter constrains them. E.g. if a filter requires the Memory to
be below 10 MB (as in your example) without specifying whether an average
or maximum or whatever is intended, then all records having either an
average below 10 MB or a maximum below 10 MB or whatever below 10 MB will
match.
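To make the proposed default concrete (a sketch; the record fragments in the
comment are made up for illustration):

```xml
<filter
  xmlns="http://schemas.ogf.org/rus/2007/09/filter"
  xmlns:urf="http://schema.ogf.org/urf/2003/09/urf">

  <!-- No <attribute name="metric"> constraint, so under the
       proposed default the metric attribute is ignored and
       each of these record elements satisfies the filter:
         <urf:Memory storageUnit="MB" metric="average">8</urf:Memory>
         <urf:Memory storageUnit="MB" metric="max">9</urf:Memory>
         <urf:Memory storageUnit="MB">7</urf:Memory>
       Restricting the match to averages needs an explicit
       <attribute name="metric"> constraint, as in the
       earlier Memory example. -->
  <elementFilter name="urf:Memory">
    <constraints>
      <lessThan storageUnit="MB">10</lessThan>
    </constraints>
  </elementFilter>
</filter>
```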

>
> WHERE TO HAVE AND/OR ELEMENTS - at <constraints> level or <filter> level
> The examples above place and/or combination elements within the
> <constraints> element of an <elementFilter>.  This is fine for
> declaring a set of mandatory elements.  However, how to
> combine two elementFilter elements is ambiguous.  Example 1 above
> implies usage records must have a <urf:StartTime> element AND a
> <urf:Disk> element that match the specified constraints.  However, this
> is only implicit at present.  One could allow and/or
> combination elements to enclose <elementFilter>s.  This would make
> things clearer, but then the need for such elements inside <constraints>
> is somewhat reduced.  Having them in both places adds complexity to
> implementing an interpreter (although admittedly not much).  Having
> them outside <elementFilter>s means we must allow the same element to
> be filtered in multiple <elementFilter>s - one for each constraint.

I think there are important use cases that require them outside the
element filter as well (e.g. a record must have either LocalUserId or
GlobalUserId).

> For example, with and/or combination filters outside <elementFilter>
> elements, the constraints for <urf:StartTime> in example 1 become:
>
> <andCombination>
>   <elementFilter name="urf:StartTime">
>     <constraints>
>       <greaterThan>2003-08-13T12:00:00Z</greaterThan>
>     </constraints>
>   </elementFilter>
>
>   <elementFilter name="urf:StartTime">
>     <constraints>
>       <lessThan>2003-08-14T12:00:00Z</lessThan>
>     </constraints>
>   </elementFilter>
> </andCombination>
>
> This makes more sense logically but also makes the constraints much
> longer.  However, as these filters will probably be generated as
> opposed to manually written, this may not be a problem.  The most
> important thing to remember is the dialect must be simple to convert
> into the underlying dialect of any RUS implementation.  Whatever
> decision we take, we must make sure the implementation is as easy as
> possible in as many dialects as possible.
>
> Having and/or combination elements at the <filter> level may be
> desirable in the listMandatoryUsageRecordElements operation.

Yes, for the same use case I described above (Local or GlobalUserId), for
example, it would be important.

> As
> far as I understand, in the UR2 spec it is possible to have different
> types of usage records.  The listMandatoryUsageRecordElements
> operation could support this as follows:
>
> <filter>
> <orCombination>
>
> <andCombination>
> <!-- mandatory elements for storage usage records -->
> ...
> </andCombination>
>
> <andCombination>
> <!-- mandatory elements for network usage records -->
> ...
> </andCombination>
>
> </orCombination>
> </filter>
>
> This suggestion is not supported by my suggested schema at present.
> Unfortunately, this suggestion does not allow the specification of
> which <andCombination> goes with which type of usage record.  Perhaps
> it's insufficient.
>
> OTHER NON-FILTER RELATED STUFF
> Moving the modification stuff to an advanced spec is heading towards
> breaking the RUS into many specs.  While I don't think this is a bad
> idea I think we need to consider how many pieces we want the RUS to be
> in.  Upload, download, modification and query/aggregation are arguably
> all separate elements.  On that basis I count four levels at which an
> implementer may wish to stop because they don't need the more advanced
> functionality.  I would be inclined to tie upload and download
> together, however some implementers only want a uniform way of
> uploading usage records (the NGS for one).  This is probably something
> that should be discussed face to face at OGF25.

I don't know how far we should go with splitting the spec up. The risk
would be exactly what the benefit would be: that single implementations
just use a sub-part of the RUS. E.g. just allow for RUS-compliant upload,
but have download, modification and so on implementation-specific. I can
understand the idea behind it, but the result can also be interoperability
problems. It would be wiser to have implementations implement a more
complete minimum set (upload+download?) and allow them to have
implementation-specific alternatives. E.g. you can download via the RUS
interface and via a more efficient implementation-specific solution, but
RUS clients won't go unanswered if they try to access records. This does
not necessarily force implementations to allow for everything via the RUS
interface, but they must at least return a RUS-compliant answer (fault?).
For example, the current spec perfectly allows implementations to
ignore all modification requests and just return a NotAuthorizedFault.
That's perfectly RUS-compliant but at least a RUS client that tries a
modification gets a meaningful response (even though it isn't allowed to
make any modification).

I would favor such an approach: don't force implementations to allow
for every operation to be executed, but at least make sure it will reply
in a RUS-compliant manner if the operation is not permitted. In your
previous mail you proposed to have a specific fault for that case
(something like OperationNotPermitted), and I'd say that might be better
than splitting the spec in many pieces. The final result is nearly the
same, but we would have only one measure of compliance (a service is
either RUS-compliant, even if it doesn't allow for all operations, or it
isn't; instead of saying it is RUS-upload-compliant,
RUS-download-compliant, but not RUS-modification-compliant).
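A minimal sketch of such a reply, assuming a SOAP 1.1 binding. The fault
element follows the OperationNotPermitted idea from your previous mail; its
namespace (urn:example:rus-fault) and the operation name are placeholders,
since neither is defined in the current spec.

```xml
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Client</faultcode>
      <faultstring>Modification of usage records is not permitted by this service</faultstring>
      <detail>
        <!-- Placeholder namespace, element and operation names:
             not defined by the RUS spec. -->
        <rusf:OperationNotPermittedFault xmlns:rusf="urn:example:rus-fault">
          <rusf:Operation>modifyUsageRecords</rusf:Operation>
        </rusf:OperationNotPermittedFault>
      </detail>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
```

A client then gets a well-defined, RUS-level answer even from a service that
implements only part of the operations.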

>
> However, I see no reason why a RUS query language can't be extensible.
> For example, modification could be accomplished with a filter element
> followed by a modification element (in some dialect we haven't defined
> yet).  The same could be done for querying with a query element.  I
> would suggest keeping selection, modification, query and return format
> as separate elements.  This makes the language much more modular and
> easy to plug together as needs be (elements used in different
> combinations for different operations).  From an implementation
> perspective, they are different facets anyway.

I agree, we could have a filter language that is used for extraction,
modification, and listing mandatory elements.

- extraction would require an additional specification of what exactly
should be returned (entire records, or only a list of job IDs?). This
could be done by specifying a list of QNames pointing to the (complex)
elements that should be returned. The default would be a pointer to the
(Job)UsageRecord element, i.e. the entire record. But a user could also
specify that, of the matching records, (s)he wants only the job IDs and the
MachineName (or whatever).

- modification would additionally require specifying _what part_ of the
record should be modified (use QNames as well?) and _how_ (allow for
replacement or increment).
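A sketch of how the pieces could fit together in one request, keeping
selection, modification and return format as separate elements. Apart from
<filter>, <elementFilter>, <constraints> and the urf: names, every element
here is invented for illustration, and putting them in the filter namespace
is only for brevity.

```xml
<!-- Hypothetical request layout: modifyRequest, modification,
     replace, increment, return and element are illustrative
     names only. -->
<modifyRequest
  xmlns="http://schemas.ogf.org/rus/2007/09/filter"
  xmlns:urf="http://schema.ogf.org/urf/2003/09/urf">

  <!-- Selection: which records are affected. -->
  <filter>
    <elementFilter name="urf:MachineName">
      <constraints>
        <equals>cluster01.example.org</equals>
      </constraints>
    </elementFilter>
  </filter>

  <!-- What to change (by QName) and how. -->
  <modification>
    <replace element="urf:ProjectName">projectB</replace>
    <increment element="urf:Charge">5</increment>
  </modification>

  <!-- What to send back for each matching record. -->
  <return>
    <element name="urf:JobIdentity"/>
    <element name="urf:MachineName"/>
  </return>
</modifyRequest>
```

Because the <filter> part stands on its own, an implementation could still
check it against its authorisation policy before touching any record.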

>
> I haven't sent this email to the rus-wg mailing list as it's massive
> and I'm not sure of the politics of sending lots of design ideas
> through a mailing list where a lot of passive observers are
> registered.  Feel free to forward it if you think it's appropriate.

I favor sending design ideas through the mailing list. No one is
forced to read every bit, but those who want to can comment
and suggest different approaches.

>
> I hope you find this material useful.  See you at OGF25!
>
> Josh
>

Thanks for the work and the material, it is useful! I don't know yet
whether I can be at OGF25 since my institute is definitely not going to
pay for the trip ... :o(
Let's see ...

Rosario.