[RUS-WG] my comments to the proposals in "draft-19"

Thu Mar 29 13:25:49 CDT 2007

Hi all!

[attention: this is a monster email, but I hope it will help to get a 
fruitful discussion started :o)
By the way, the length of this mail is one of the reasons why we should 
try to have _single_ or at least _limited_ proposals, not the entire 
document at once]

Here are my comments on the specification changes proposed by Xiaoyu 
(sorry for addressing you by your family name in the last mails, I got 
mixed up :o) in "draft-19". I still feel unhappy with calling the 
proposals a draft specification, since this makes it seem as if they 
would replace the previously agreed upon version (draft-17) ...

I'll start by commenting on a few things in the minutes of the last 
phone conference that concern the proposals:

[quote]
* Minutes for the Joint RUS/UR WG Telephone Conference, March 23rd, 2007
...
*** Draft 19 of the RUS specification

     Gilbert noted that the specification needs clarification on how to
     proceed if a user tries to extract a UsageRecord where she only has
     access to some of the elements in the UsageRecord. He suggests that
     access should be denied in this case. He also notes that this
     problem does not occur on writes.
[/quote]

Are there use cases that require such fine-grained access rules, giving 
users access to only a part of a usage record but not all of it? If 
access is denied in this case, then the case is already handled by the 
RUS specification (both draft-17 and the proposed modifications), since 
it is enough to return a RUSUserNotAuthorizedFault (or the equivalent in 
the proposed modifications) ... We might add a note that in this case 
access to the entire record should be denied. But if we want to exploit 
the full possibilities of XPath (why return entire usage records if a 
user wants only a list of job IDs?), then this note wouldn't be 
necessary: the user would get her results if she asks only for 
accessible parts, and would get a UserNotAuthorized fault if the XPath 
query tries to retrieve a part that is not accessible.

[quote]
     Gilbert also suggests that a clarification should be inserted into the
     specification about how namespace mappings are passed to the XPath
     expressions used as search terms. He suggests that all namespace
     mappings that are in scope for the surrounding XML element should be
     available in the XPath expression.
[/quote]

I think they are automatically accessible with XPath (namespace mappings 
are simply attributes of the root element, aren't they?), but I might be 
wrong.
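
To make the question concrete, here is a minimal sketch of how one XPath implementation treats prefixes. It uses Python's xml.etree.ElementTree (which supports only a subset of XPath) purely as an illustration; the namespace URI and the sample document are assumptions, and other XPath engines may behave differently. In this library the prefix in the expression is resolved from a mapping passed alongside the expression, not from the declarations in the document being searched.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the OGF Usage Record format; adjust if it differs.
URF_NS = "http://schema.ogf.org/urf/2003/09/urf"

DOC = f"""<urf:UsageRecords xmlns:urf="{URF_NS}">
  <urf:UsageRecord>
    <urf:MachineName>ce01.example.org</urf:MachineName>
  </urf:UsageRecord>
</urf:UsageRecords>"""

root = ET.fromstring(DOC)


def machine_names(tree, namespaces=None):
    """Evaluate a prefixed path; the prefix is looked up in `namespaces`."""
    return [e.text for e in tree.findall(".//urf:MachineName", namespaces)]


# With an explicit prefix mapping the expression resolves.
with_mapping = machine_names(root, {"urf": URF_NS})

# Without a mapping ElementTree cannot resolve "urf", even though the
# document itself declares that prefix on its root element.
try:
    machine_names(root)
    resolved_without_mapping = True
except SyntaxError:
    resolved_without_mapping = False
```

If the RUS interface passes XPath expressions as strings, this suggests the specification would have to say where the prefix mapping comes from, which is the clarification Gilbert asked for.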

[quote]
     Michele pointed out that the RecordHistory element has been removed
     from the current version 19 of the draft.
     Xiaoyu explained that this was done to maintain compatibility with the
     published Usage Record Format that contains elements to record the
     create history, and that logging information about modification should
     be retrieved using implementation specific means.
     Gilbert suggested that a separate method should be used to extract
     audit data for a UsageRecord.
[/quote]

The current version with the wrapping RUS-UR is not incompatible with 
the UR format. If you receive a response document from a RUS server you 
have to extract the UR document from the response document anyway. What 
difference does it make whether you extract it directly from the 
response document or from the RUS-UR document within the response 
document? In both cases, what you get after extraction is a usage 
record compatible with the OGF-UR specification.
But generally the idea to remove the wrapping RUS-UR is good, since it 
removes the need to check that modifications are not made in the 
wrapping part, but only in the UR part. If we remove that, as Xiaoyu 
proposes, that will most probably simplify the implementation of a RUS, 
so I'm basically in favour of that.
But: I don't like the idea of putting the record history into the UR 
itself instead, for two reasons. First, the RUS should avoid wherever 
possible modifying the received UR. UR documents should - wherever 
possible - remain in their original format; the RUS is only supposed to 
_store_ them, not to change them (even if that just means adding 
something). Second, each entry in the history requires (at least) two 
basic pieces of information: WHO and WHEN, and better still a third: 
WHAT has been modified. This is difficult to accomplish with the 
urf:Resource extensions: how do you specify that user A uploaded the 
record at date D1 and user B changed the CpuDuration (change C1) at date 
D2, just by using urf:Resource properties?
I prefer Gilbert's idea of having a separate method for retrieving the 
record history. That would allow the record history to be kept separate 
from the UR in the database (that is, in the database there would be a 
distinct record history document, mapped by the unique recordId), which 
has both advantages: the RUS-UR is removed as Xiaoyu suggested, and the 
original UR document is not modified, as I would prefer. It would also 
allow us to be more detailed about WHAT has been changed in the separate 
record history without "overloading" the original UR with information 
most users are not interested in. And: it would allow clearly distinct 
access rules for records and modification histories. E.g. the record 
history might be accessible only to CE managers and VO managers, while 
the URs might also be accessible to the grid users that submitted the 
jobs.
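
A rough sketch of what such a separate history could look like on the service side, just to illustrate the WHO/WHEN/WHAT idea and the distinct access rule. All class names, role names and record IDs here are hypothetical and not part of any draft.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class HistoryEntry:
    who: str        # identity of the user or tool, e.g. a certificate subject
    when: datetime  # time of the upload or modification
    what: str       # description of the change, e.g. the update expression


@dataclass
class RecordHistoryStore:
    """Keeps histories apart from the UR documents, keyed by recordId."""
    _entries: dict = field(default_factory=dict)

    def log(self, record_id, who, what, when=None):
        when = when or datetime.now(timezone.utc)
        self._entries.setdefault(record_id, []).append(
            HistoryEntry(who, when, what))

    def history(self, record_id, role):
        # Hypothetical access rule: only managers may read the history.
        if role not in {"ce-manager", "vo-manager"}:
            raise PermissionError("record history not accessible for " + role)
        return list(self._entries.get(record_id, []))


store = RecordHistoryStore()
store.log("ur-0001", "user A", "uploaded record",
          datetime(2007, 3, 1, tzinfo=timezone.utc))
store.log("ur-0001", "user B", "changed CpuDuration",
          datetime(2007, 3, 2, tzinfo=timezone.utc))
```

The stored UR document is never touched by `log`, so it stays in its original format, and the history can carry its own permissions independently of the record.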

What do you think?

[quote]
     Xiaoyu asks whether the mapping of a possible
     <urf:Resource description="VOName" /> element should be specified
     in the RUS Core Specification and/or if it should be part of the
     advanced specification.
     Gilbert thinks that this property belongs into UR space and should be
     handled in the new URF version 2 and that a mapping should go into the
     advanced specification because the core specification does not concern
     itself with aggregation and therefore does not need to handle UR
     content.
[/quote]

I really badly want to be able to specify the VOName in a UR document, 
but as Gilbert pointed out this decision is up to the UR-WG. If we want 
to be 100% compatible with the OGF-UR format then we must not explicitly 
require any elements that aren't in there. Unfortunately.
But: There are two "workarounds":
a) we can still _recommend_ the use of <urf:Resource 
description="VOName" /> for specifying the VO (but not require it)
b) a more sophisticated solution: we might change the way mandatory UR 
properties are specified.
Currently the method that informs RUS clients about mandatory elements 
(listMandatoryUsageRecordElements), allows to specify only a few 
standard UR elements. We might remove this restriction and allow also 
for attributes. For example, if a client asks the RUS server for 
mandatory UR elements, it might get a response like:

[...]
<urf:MachineName/>
<urf:CpuDuration />
<urf:WallDuration/>
<urf:Memory urf:metric="max"/>
<urf:Swap/>
<urf:Resource description="VOName"/>

But, be aware that this might easily compromise the interoperability 
between different implementations! So it is definitely preferable to 
push for changes in the UR format itself!
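
To illustrate workaround b), here is a minimal sketch of how a client or server might check a record against such a list of mandatory elements with attributes. The namespace URI, the wrapping <Mandatory> element and the function name are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

URF_NS = "http://schema.ogf.org/urf/2003/09/urf"  # assumed namespace URI
NS_DECL = f'xmlns:urf="{URF_NS}"'

# Hypothetical response of listMandatoryUsageRecordElements, wrapped in a
# made-up root element so that it parses as one document.
MANDATORY = ET.fromstring(f"""<Mandatory {NS_DECL}>
  <urf:MachineName/>
  <urf:Memory urf:metric="max"/>
  <urf:Resource description="VOName"/>
</Mandatory>""")


def missing_mandatory(record, mandatory):
    """Return the mandatory templates that no child of `record` satisfies."""
    missing = []
    for template in mandatory:
        satisfied = any(
            child.tag == template.tag
            and all(child.get(k) == v for k, v in template.attrib.items())
            for child in record
        )
        if not satisfied:
            missing.append(template)
    return missing


RECORD = ET.fromstring(f"""<urf:UsageRecord {NS_DECL}>
  <urf:MachineName>ce01.example.org</urf:MachineName>
  <urf:Memory urf:metric="max">2048</urf:Memory>
  <urf:Resource description="VirtualOrganization">alice</urf:Resource>
</urf:UsageRecord>""")

missing = missing_mandatory(RECORD, MANDATORY)
missing_descriptions = [m.get("description") for m in missing]
```

The sample record is rejected only because it names the VO property "VirtualOrganization" instead of "VOName", which is exactly the kind of interoperability problem to expect if every RUS picks its own description strings.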

[quote]
     Xiaoyu asked also about the use of XUpdate in the
     RUS::modifyUsageRecords method.
     Gilbert suggested changing the XUpdate to XQuery update extensions
     since the XUpdate specification seems to not be maintained any more
     and never advanced beyond draft stage.
[/quote]

How do update extensions of XQuery work (sounds interesting)? Does 
anybody have a nice document in which I can read more about this?

Now some comments directly on Xiaoyu's proposals:

First of all, the document itself has to state _clearly_ that it 
contains proposals and does not replace the currently agreed version 
(draft-17)! Otherwise people will be really confused. It is therefore 
preferable not to modify directly the specifications, but to write 
dedicated proposal documents instead (documents that describe the 
proposals, such that they are documented and can be discussed, but do 
not look like a specification themselves).
We should not make proposals BY modifying the specification, but make 
proposals TO modify the specification. The specification should be 
modified only when we have agreed on a modification, otherwise it will 
change very frequently and we won't have any stability. The 
specification should contain only modifications that are likely to 
remain there in the future, and not things that still have to be 
discussed (and maybe are rejected).
But as I already said in a previous mail, by saying this I don't mean to 
scare anybody away, and actually I appreciate the time and energy that 
Xiaoyu is willing to invest in the RUS-WG!

In the following I discuss not only Xiaoyu's proposals, but also some 
things that have been in draft-17 before, so please read carefully.

[I label my comments with numbers, which makes it easier to refer to 
them later]

1) comments on modified Abstract:

1.1) "... to accomodate ... grid economic model."

I would distinguish between billing and a grid economy model. In our 
case the RUS should be interesting for billing, but whether there is 
some economic model to balance the workload or supply and demand is 
completely out of scope.

1.2) The last phrase of the Abstract in draft-17 is essential: "The RUS 
uploads (and provides) record of ...", maybe we can rephrase that to 
make it sound better, but we shouldn't remove it since the abstract 
should briefly discuss what a RUS actually does.

1.3) For the rest, the draft-17 abstract contained a lot of technical 
information that is out of scope for an abstract, and I think we should 
remove it as Xiaoyu intended.

2) comments on modified Section 1 - Introduction:

2.1) "... enables grid resource usage auditing and accounting as well as 
grid economic model.". The RUS doesn't enable grid economy models (see 
above), such models go far beyond that.

2.2) (in Section 1.1 Background): "The Resource Usage Service (RUS) is 
therefore being defined in this document to provide a basic 
infrastructure to support auditing, accounting and other high-level 
capabilities requiring usage information and to allow entities within 
the grid to extract information from the service on potentially 
aggregate resource use."

I wouldn't agree with that, we don't define the RUS as an infrastructure 
or with high-level capabilities that require usage records, those are 
issues of an implementation. A service that offers a RUS interface may 
be distributed, centralized, and might even switch on the coffee machine 
whenever a job for James T. Kirk arrives :o)
What we define is an _interface_ to a service, we shouldn't go further 
than that otherwise it gets unlikely that already existing tools will 
adopt the RUS interface, if that means they have to change their 
behaviour, infrastructure, etc.
The same is true when, as you did before, talking about data 
replication. How, when and if the service replicates accounting data is 
an implementation business and not a matter of the interface to the 
service. We should clearly distinguish between what is the service and 
what is the interface to it (and we should be concerned above all with 
the latter, and should leave the decisions of the first to the developer 
that better knows his needs).

I think there is some confusion about what is the purpose of the RUS-WG 
because of the name Resource Usage Service (RUS). Maybe it would be 
better to talk about a Resource Usage Service _Interface_ (RUS-I) to 
make sure we're talking about defining an interface to a service, not 
how the service itself has to work. We might also name the specification 
RUS-I instead of RUS, which is a little bit clearer. What do you all 
think about that?

Also, the usage scenarios described in Section 1.3 are implementation 
issues (and publicity that shouldn't really be in there; if you can all 
agree we might think about an additional document in which we describe 
implementation efforts and lessons learned, LCG-RUS, MCS, DGAS, SGAS, 
Unicore, etc.)

3) comments to Section 2 - Overview:

3.1) in Section 2.1 Architecture: "The Resource Usage Service’s primary 
purpose is to normalise operations upon usage records relating to the 
consumption of resources as described through the OGF Usage Records 
specification [OGF-UR]."

Definitely not. Normalizing resource usage may be your specific use case 
with LCG-RUS, but that's an implementation-specific feature and has not 
much to do with what the RUS-WG should define. We shouldn't force 
everybody who implements a RUS interface for his service to always 
normalize data. And even if we did so, we would then have to define 
_how_ it is normalized; you most probably know that there are huge 
discussions going on about normalization and that actually nobody has an 
all-fitting solution.

The primary purpose of RUS is what was written before (draft-17) in this 
Section: "... stores records relating the consumption of resources ...", 
whether a specific implementation normalizes that data or not is another 
issue. And actually I would even state that a RUS service _must not_ 
modify the original UR documents (except upon user request through the 
methods defined for modification). That doesn't mean that it cannot do 
normalization (e.g. for aggregation purpose), but it would have to do 
that without altering the original UR.

3.2) in Section 2.3 - Scope: "we describe the Web Service interface 
definitions and configuration requirements for implementation runtimes".

Actually I don't understand well what you mean by "implementation 
runtimes", but in any case our scope is not to define implementation 
issues (only as far as they concern the interface).

4) comments to Section 3 - Configuration:

4.1) in Section 3.1.1 - Resource Manager: "... Wildcard may be used to 
indicate pattern-matched usage properties, domain name for instance 
(“*.cfs.ac.uk” for machines in the “.cfs.ac.uk” domain) ..."

Whether to use wildcards or not is completely implementation-dependent. 
The interface to the service shouldn't be concerned with _how_ exactly 
the service decides whether a user is authorized to 
upload/retrieve/modify a record. Maybe I would declare the entire 
Section 3.1 (Users and Authorisation) as a recommendation, because we 
shouldn't restrict implementations to that. For example: Not only 
resource managers, but also the resources themselves should be allowed 
to upload records (i.e. the sensors installed on a CE using the CE's 
host certificate). It would be enough to specify that read/write access 
can and _should_ be restricted, but what roles (simple grid user, CE 
accounting sensors, VO Manager, VO group manager, Grid Operations 
Manager, Resource Manager, Site Manager, etc ...) are defined and how 
user identities are mapped to these roles is an implementation issue. 
The only thing the interface needs to define is the possibility to reply 
with a "permission denied" (the "RUSUserNotAuthorisedFault" is enough 
for all possible security models). We must _enable_ different security 
models, not restrict our potential developers to use a specific one 
whether they want that or not.

This is also true for Section 3.2 - Fine-granularity Access Control. We 
should mark that as a recommendation for how security issues _might_ be 
handled. For example, whether a simple grid user or a VO Manager or a VO 
group manager or whoever has only read or also write permission is not 
an issue of the interface to the service.

4.2) in Section 3.3 - Mandatory Usage Properties
(and Section 5.5.1 - listMandatoryUsageRecordElements)

As I already described above when I talked about the 
listMandatoryUsageRecordElements method, we might also allow specific 
urf:Resource extension elements to be declared mandatory (for example 
for VOName), but this may well undermine the interoperability of RUS 
implementations (what if you require <urf:Resource 
description="VOName"/> and I require <urf:Resource 
description="VirtualOrganization"/>?).
We might also think about allowing specific _values_ to be mandatory.
For example:
A RUS that wants only job records for VO "alice" might require:

<urf:Resource description="VOName">alice</urf:Resource>

I suggest we think very carefully about whether we want to extend the 
mandatory elements from the ordinary UR properties (as in draft-17) to 
Resource extensions.

5) comments to Section 4 - Usage Record Format in RUS

5.1) in Section 4: "The usage records that move in or out of the RUS 
MUST be in the exact format as defined by OGF Usage Record schema [OGF-UR]."

I generally agree that it is a good idea to strip off the wrapping 
RUS-UR that was used so far, not because it is incompatible with the UR 
format (it is enough to extract the compatible UR that is a part of the 
RUS-UR), but because it is not really necessary. The additional record 
history that was in the RUS-UR can be handled separately, but I would 
suggest handling it in a distinct document and not as a part of the 
OGF-UR, since most users won't be interested in it and will consult it 
only in the (hopefully) rare case of disputes about the authenticity of 
a record. Therefore I support what Gilbert suggested: a separate method 
that can be used to retrieve the record history (see my comments 
above).

Eventually we might think about whether the possibility to retrieve the 
record history should be in the core specification or if it is enough to 
have it in the advanced specification (maybe not everybody requires 
that, and some might be alright with having the info only in the log 
files of their RUS server instead of being able to provide it to remote 
RUS clients).

5.2) in Section 4.1 - Record History: "Each usage record retrieved or 
inserted into the RUS MUST have record history information represented 
by two properties defined in the Usage Record XML schema [OGF-UR]:
* The “createTime” property of usage record identity 
(urf:RecordIdentity#createTime) stating when the record is produced;
* The “keyInfo” property of usage record identity 
(urf:RecordIdentity#keyInfo) containing the “X509SubjectName” of the 
user entity that create the record"

This information is definitely not sufficient for a record history, for 
two reasons:
A) the UR might be uploaded by a third person/tool (e.g. the UR is 
created by an accounting sensor on the CE using the CE's host certificate 
-> keyInfo points to the CE's host certificate; and then uploaded by a 
resource manager RM1 (a person) a day later). In this example you would 
lose the info that RM1 uploaded the file a day later, since you have 
only info about the creation of the UR.
B) the modification history is very important as well, above all if more 
than one user/client has write access for a record.

5.3) in Section 4.1 - Record History: "Implementations that require 
other historic information (e.g. modification history) out of the scope 
of usage record representation may use Usage Record XML schema’s 
extension framework and declare those properties as mandatory usage 
properties (see 3.3) or obtain historic information from runtime 
environment (e.g. logging system)."

That doesn't make much sense. If you really want to store the record 
history within the UR document itself, then the corresponding elements 
should NOT be declared mandatory because that would mean that the record 
history must be present _before_ the UR can be stored in the RUS ... but 
the record history should record the storage and modification of the UR 
within the RUS ... actually, if you want to store the record history 
within the UR document you would have to make sure that it is NOT 
present when it is stored, which means that you have to restrict the use 
of the OGF-UR specification. Another reason why I would prefer to have 
the record history separated from the UR document, as suggested by Gilbert.

5.4) in Section 4.1 - Record History: "Considering unlimited number of 
usage records can be encapsulated within a single usage record file, the 
size of record instance may oversize the limitation of runtime system or 
database engine (e.g. less than 5MB per XML file for [Xindice]). It is 
the implementation’s responsibility to enforce the size limitation at 
runtime.  Implementations MAY alternatively restrict only one usage 
record per file but the usage record SHOULD also started with 
“urf:UsageRecords” element as a valid usage record instance."

This is a very difficult issue and, if done the way you propose, will 
easily lead to a lack of interoperability (how do I know I can upload 
only one record with each request???). The database is not a problem, 
since the RUS implementation can take the received file and then extract 
single records and store them. This would be completely transparent for 
the interface (the user can upload 1000 URs at once but the RUS puts 
them as single records in the DB). Database handling must be an 
implementation matter and the RUS interface should be completely 
independent of that.
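
A minimal sketch of that splitting step, assuming the upload arrives as one urf:UsageRecords document. The namespace URI, the function name and the in-memory dictionary standing in for the database are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

URF_NS = "http://schema.ogf.org/urf/2003/09/urf"  # assumed namespace URI
ET.register_namespace("urf", URF_NS)  # keep the "urf" prefix on output


def split_batch(xml_text):
    """Return one serialized document per record in a urf:UsageRecords batch."""
    root = ET.fromstring(xml_text)
    record_tags = {f"{{{URF_NS}}}UsageRecord", f"{{{URF_NS}}}JobUsageRecord"}
    return [ET.tostring(child, encoding="unicode")
            for child in root if child.tag in record_tags]


BATCH = f"""<urf:UsageRecords xmlns:urf="{URF_NS}">
  <urf:UsageRecord><urf:RecordIdentity urf:recordId="ur-0001"/></urf:UsageRecord>
  <urf:UsageRecord><urf:RecordIdentity urf:recordId="ur-0002"/></urf:UsageRecord>
</urf:UsageRecords>"""

database = {}  # stand-in for the real storage backend
for doc in split_batch(BATCH):
    identity = ET.fromstring(doc).find(f"{{{URF_NS}}}RecordIdentity")
    database[identity.get(f"{{{URF_NS}}}recordId")] = doc
```

Each stored document stays far below any per-file size limit of the database engine, while the client still uploads the whole batch in a single request.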

5.5) in Section 4.2 - Record Uniqueness: "For usage records stored in 
RUS, a record identifier SHOULD ensure the global uniqueness of a single 
usage record. This is realised by use of the mandatory attribute, 
“urf:RecordIdentity#recordId” defined in Usage Record XML schema [OGF-UR]."

Between these two sentences I would add: "The record identifier MUST 
ensure at least uniqueness within the RUS instance.". This is necessary 
since we allow the recordId to be used to retrieve unique records.

5.6) in Section 4.2 - Record Uniqueness: "Implementations MAY optional 
transform the data type into numeric data type at runtime in order to 
obtain efficient record matching at runtime."

Implementations may optionally do whatever they want as long as it 
doesn't interfere with the interface, I don't think there is the need to 
say that.

6) comments to Section 5:

6.1) in Section 5.1.1 - OperationResult: "2. An optional “Processed” 
element (xsd:unsigned-long).  The number of records successfully processed."

I would prefer to call that element "Accepted", since also rejected 
records have been processed by the RUS. Additionally we might think 
about having a "Rejected" element as well, although it may not be necessary.

6.2) in Section 5.1.1 - OperationResult: "A sequence of faults that 
indicates the reason of unprocessed individual usage record"

draft-17 explicitly says that this sequence has to have the same order 
as the received usage records, otherwise you won't know which Fault is 
for which UR if you sent many of them ... Either the single faults need 
to specify the recordId or the result list has to make it clear by 
respecting the right order. The latter would however mean that not only 
faults but also success messages will have to be returned for single 
records.

I see you propose to have the recordId in the fault types (which is 
perfectly ok, it is more precise than using the order), but there is 
quite some confusion: In the processing fault it is a part of the 
processing fault message child element (page 18), in the invalid fault 
it is part of the fault element itself (page 19), in the unauthorised 
fault it is completely missing (page 19), ... in some cases there can be 
at most one recordId, in others the number is unbounded ... For a RUS 
client this is difficult to implement; it is better to use the same 
approach for all fault types.
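
To show what "the same approach for all fault types" could mean for a client, here is a small sketch in which every per-record outcome carries exactly one recordId, whatever the fault type. The class and field names are made up for this illustration, and the fault names are only used as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordOutcome:
    record_id: str               # every outcome names its record
    fault: Optional[str] = None  # None means accepted, else a fault type name


def summarize(outcomes):
    """Build an OperationResult-like summary; assumes unique record IDs."""
    faults = {o.record_id: o.fault for o in outcomes if o.fault is not None}
    return {
        "Accepted": len(outcomes) - len(faults),
        "Rejected": len(faults),
        "Faults": faults,
    }


outcomes = [
    RecordOutcome("ur-0001"),
    RecordOutcome("ur-0002", "RUSUserNotAuthorisedFault"),
    RecordOutcome("ur-0003", "RUSProcessingFault"),
]
result = summarize(outcomes)
```

With one uniform shape the client needs a single code path to find out which record failed and why, and the order of the faults in the response no longer matters.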

6.3) in Section 5.1.1 - OperationResult: In the schema excerpt: We 
should use the UCC (Upper Camel Case) convention (see 
http://en.wikipedia.org/wiki/CamelCase) for all elements, except 
methods. For methods and attributes we should use the LCC (Lower Camel 
Case) convention:

ThisIsUpperCamelCase
thisIsLowerCamelCase

(the difference is in the first letter). Something similar is used in 
many programming languages to distinguish classes from methods/functions.

Then it would be "OperationResult", not "operationResult" (there are 
more cases where "draft-19" contains LCC names for elements, we should 
check them all and use always the same style).

We might even use UCC for method names/elements as well, since there 
is a general convention to use UCC for all XML elements and LCC for all 
XML attributes (at least in the US, but most others use the convention 
as well):

http://en.wikipedia.org/wiki/National_Information_Exchange_Model

6.4) in Section 5.1.1 - OperationResult: On page 16 you removed (among 
other things) the note that many fault types are "optional because a RUS 
may not want to say why something has failed". This note is not strictly 
required, but it's better to state that.

6.5) in Section 5.1.2 - Record Identity: "This specification is not 
intended to define further record identifier for usage records 
maintained in RUS."

This statement is confusing and can be understood only by those who know 
draft-17 and the wrapping RUS-UR with its RUSRecordId.

6.6)  in Section 5.2 - Faults (and throughout the document): I prefer 
element names starting with "RUS" over those starting with "Rus", since 
RUS is not a word, but an acronym.

Additionally we might allow for a RUSUnspecifiedFault.

By the way: I prefer UserNotAuthorised (draft-17) over Unauthorised, 
there is no need to change something that is perfectly clear. If you 
want to change it, then I would suggest the usual PermissionDenied.

6.7)  in Section 5.2 - Faults: We might allow for an optional error code 
in the RUSFaultType.

6.8)  in Section 5.2 - Faults: What is the additional "runtimeFaultMsg", 
"RUSProcessingFault" meant for?

6.9)  in Section 5.2 - Faults: the single fault types have a "total" 
attribute, what is it meant for (it's not described in the text)? I 
suppose you meant the number of faults. But if you have one fault per 
record this is redundant.

6.10)  in Section 5.2 - Faults: the unauthorised fault has a "user" and 
a "role" attribute. Why are they "required"?

6.11)  in Section 5.2.4: "The “RusUnauthorisedFault” is thrown by RUS 
permission model, which acts as a gateway to RUS service endpoint. The 
“RusUnauthorisedFault” therefore does not support fine-granularity fault 
notification on individual usage records. As the information model 
below, the “RusUnauthorisedFault” indicates user identity and the 
mismatched role the user is claimed to be."

First, if we return single faults and successes per record (as in 
draft-17), it is not true that this fault doesn't support 
fine-granularity fault notification on individual records.
Second, the "role" that a user "claims" is something very case-specific 
(VOMS certificates with userFqan extensions that allow determining the 
role of the user in the VO). That shouldn't go into the interface 
specification, at least not as "required". Of course implementations are 
free to use the user FQAN information if they have it.

6.12)  in Section 5.2.9: output of extractUsageRecords:
"Mandatory: The single usage file (urf:UsageRecords XML string) 
encapsulating matched usage records (urf:UsageRecord or 
urf:JobUsageRecord element)"

We haven't yet decided whether we should allow full XPath features 
(allowing the extraction of only parts of URs; why should we always 
provide full URs if a user, for example, just wants a list of recordIds?)

I would encapsulate whatever the query result is in a kind of 
"QueryResult" element. It doesn't matter if in there are only recordIds 
or entire URs.

Also: if we allow full XPath we don't need a specific extractRecordIds 
method since the user can simply determine by the XPath query what 
should be sent back.
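
A small sketch of the idea, assuming a hypothetical "QueryResult" wrapper element and an assumed namespace URI. Python's ElementTree supports only a subset of XPath and can return elements only, so the "IDs only" query below selects the urf:RecordIdentity elements rather than the bare attribute values; a full XPath engine could go further.

```python
import xml.etree.ElementTree as ET

URF_NS = "http://schema.ogf.org/urf/2003/09/urf"  # assumed namespace URI
NS = {"urf": URF_NS}

STORE = ET.fromstring(f"""<urf:UsageRecords xmlns:urf="{URF_NS}">
  <urf:UsageRecord>
    <urf:RecordIdentity urf:recordId="ur-0001"/>
    <urf:MachineName>ce01.example.org</urf:MachineName>
  </urf:UsageRecord>
  <urf:UsageRecord>
    <urf:RecordIdentity urf:recordId="ur-0002"/>
    <urf:MachineName>ce02.example.org</urf:MachineName>
  </urf:UsageRecord>
</urf:UsageRecords>""")


def query(root, path):
    """Wrap whatever elements `path` selects in one QueryResult element."""
    result = ET.Element("QueryResult")  # hypothetical wrapper element
    result.extend(root.findall(path, NS))
    return result


ids_only = query(STORE, ".//urf:RecordIdentity")  # just the identities
full_records = query(STORE, "urf:UsageRecord")    # entire usage records
record_ids = [e.get(f"{{{URF_NS}}}recordId") for e in ids_only]
```

The same method and the same response wrapper serve both cases; only the query expression decides how much of each record comes back.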

6.13) in Section 5.3.1: "This port type allows users to modify a set of 
usage records identified by record identity with XUpdate expression. The 
charge service, for example, could make use of this port type to insert 
charge information to a usage record as with usage information calculated."

This is an excellent example of a meaningful use of the modification 
feature! It requires that the record history contains not only who did 
an update and when, but also what the exact update query was (to know 
whether cost information was previously there and has been overwritten).

By the way, having a separate record history (not in the UR itself) also 
removes the necessity to ensure that a user can't overwrite the record 
history. Only the overwriting of the recordId itself would have to be 
prevented, and since this is one of the few mandatory things according 
to the OGF-UR spec, this is fully compliant.

Well, this is a lot for now ... already too much.
Let me have your opinions. If I see there is some interest in the things 
I have suggested, I can write them up more clearly in a proposal 
document that we can then take as a reference.

Cheers,

Rosario.