[glue-wg] When is data stale?
Paul Millar
paul.millar at desy.de
Mon Apr 20 06:38:21 EDT 2015
Hi all,
Thanks for all your replies.
Let me try to summarise people's replies:
CreationTime:
1. this is the instant of time when the data represented
in the object was collected.
2. It is unclear what value is to be used when the
attributes come from multiple sources that are
collected at different times.
3. In GLUE-based infrastructure that aggregates data from
multiple sources, the aggregating agent must not
update CreationTime.
This is pretty straightforward, but the second and third points are
quite interesting.
It is claimed that GLUE 2 is agnostic on aggregation. Quoting Florido:
"the GLUE2 model [..] does NOT discuss aggregation".
Stephen, Jens and Florido were very clear on point 3, "shooting down" a
logically self-consistent alternative interpretation of CreationTime,
where a site- or top-level BDII adds a CreationTime if it is missing.
Rejecting this interpretation is fine (I rejected it, too). However,
the arguments for doing so were (IMHO) not well thought out.
Stephen: "It would be incorrect, since it would not be the time at which
the information was created." -- circular argument fallacy: CreationTime
is the time information was created.
Jens: "No; that would make the value useless because it would not be
used consistently with those that set it." -- straw-man fallacy:
claiming inconsistency, but if GLUE 2 does not distinguish between
resource- and other BDIIs then there is no inconsistency.
Florido: "No. CreationTime refers to the record related to the entity
described. Only the resource provider that creates the datastructure can
know such time. BDII levels above resource just copy the data." Again,
straw-man fallacy as it requires GLUE 2 to distinguish between resource-
and higher-level BDIIs.
Again, let me state that I'm happy with rejecting the idea of site- and
top-level BDIIs adding CreationTime.
However, the difficulty in describing the exact semantics of
CreationTime suggests (to me) that GLUE 2 _does_ include the concept of
aggregation, at least because it has CreationTime with different
processing models depending on whether the agent is a primary data
source (e.g., Resource BDII) or an aggregating source (Site- or Top- BDII).
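To make the two processing models concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the
GlueObject class and the publish_at_source and aggregate functions are
hypothetical names, not part of GLUE 2 or of any BDII implementation.
It only shows point 3 above: the primary source stamps CreationTime,
and the aggregating agent copies objects without adding or updating it.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class GlueObject:
    """Hypothetical stand-in for a published GLUE 2 object."""
    object_id: str
    creation_time: Optional[datetime]  # None if the source did not publish it
    validity: Optional[int] = None     # assumed to be seconds; None if absent


def publish_at_source(object_id: str, collected_at: datetime,
                      validity: Optional[int] = None) -> GlueObject:
    """Primary data source: CreationTime is the instant the data was collected."""
    return GlueObject(object_id, creation_time=collected_at, validity=validity)


def aggregate(objects: List[GlueObject]) -> List[GlueObject]:
    """Aggregating agent: copy each object as-is.

    CreationTime is neither refreshed nor filled in when it is missing.
    """
    return [replace(obj) for obj in objects]


collected = datetime(2015, 4, 20, 10, 0, tzinfo=timezone.utc)
from_source = publish_at_source("urn:example:share-1", collected, validity=3600)
without_time = GlueObject("urn:example:share-2", creation_time=None)
aggregated = aggregate([from_source, without_time])
```

After aggregation, the first object still carries the source's
CreationTime and the second still has none.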
Validity:
4. Stephen: "an estimate of how long the information can
be reasonably trusted, irrespective of how the
system updates".
5. Jens: [I wasn't sure from his response]
6. Florido: "the expected time when the actual provider that
_generates_ the information will run again and update
that information."
So, Stephen and Florido seem to have opposite views of Validity.
Jens seemed to hold a view similar to Stephen's; at least, he suggested
that Florido's concept be published as a nextUpdate attribute rather
than as Validity.
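Both readings lead a client to the same arithmetic, CreationTime plus
Validity, and differ only in what the resulting instant means. A small
Python sketch, assuming Validity is a number of seconds and
CreationTime is a timestamp (the function names are mine, purely for
illustration):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def validity_end(creation_time: datetime, validity_seconds: int) -> datetime:
    """Instant at which the Validity period ends.

    Under Stephen's reading: the information can no longer be reasonably
    trusted after this instant. Under Florido's reading: the provider is
    expected to have run again by this instant.
    """
    return creation_time + timedelta(seconds=validity_seconds)


def is_stale(creation_time: Optional[datetime],
             validity_seconds: Optional[int],
             now: datetime) -> Optional[bool]:
    """True if 'now' is past the end of the Validity period.

    Returns None when either attribute is missing, since the client
    then has nothing to base a decision on.
    """
    if creation_time is None or validity_seconds is None:
        return None
    return now > validity_end(creation_time, validity_seconds)


created = datetime(2015, 4, 20, 10, 0, tzinfo=timezone.utc)
fresh = is_stale(created, 3600, created + timedelta(minutes=30))
stale = is_stale(created, 3600, created + timedelta(minutes=90))
unknown = is_stale(None, 3600, created)
```

With a Validity of 1 hour, the object is not stale 30 minutes after
CreationTime and is stale 90 minutes after it.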
My question to Stephen: different clients may tolerate different levels
of error/uncertainty (is 1% "good enough"? how about 5%, 10%, or 20%?).
Given that "reasonably trusted" depends on the client, how does one
know what value to publish?
My question to Florido: in ARC, how exactly is the Validity property set?
Is it hard-coded in the code, configured manually by the admin, passed
to the info-provider script, or overwritten by the cron/refresh job?
Just to add a little bit of "current usage", I did a quick survey using
lcg-bdii.cern.ch. Some 13% (13865 of 104853) of the GLUE2 objects
currently published have a Validity attribute. These objects have one
of three values. As a fraction of all objects: about 0.03% (34 objects)
have a Validity of 1 minute, 1% (1080 objects) have a Validity of 10
minutes, and 12% (12751 objects) have a Validity of 1 hour.
So, to a good approximation, only one Validity value is set: 1 hour.
This narrow distribution suggests that, when set, Validity is hard-coded
to some value.
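For anyone wanting to repeat this kind of tally, here is a rough sketch
in Python. It is not the procedure I used; it assumes the query result
has been saved as LDIF text, that entries are separated by blank lines,
and that Validity is published under the attribute name
GLUE2EntityValidity as an integer number of seconds. Folded or
base64-encoded LDIF lines are not handled.

```python
from collections import Counter
from typing import Tuple


def validity_histogram(ldif_text: str,
                       attribute: str = "GLUE2EntityValidity"
                       ) -> Tuple[int, Counter]:
    """Return (number of entries, Counter of Validity values in seconds)."""
    total = 0
    counts: Counter = Counter()
    prefix = attribute.lower() + ":"
    for entry in ldif_text.strip().split("\n\n"):
        lines = entry.splitlines()
        # Skip anything that is not an entry (e.g. comment-only blocks).
        if not any(line.lower().startswith("dn:") for line in lines):
            continue
        total += 1
        for line in lines:
            if line.lower().startswith(prefix):
                counts[int(line.split(":", 1)[1].strip())] += 1
                break
    return total, counts


def share(count: int, total: int) -> float:
    """Percentage of 'total' represented by 'count'."""
    return 100.0 * count / total


# Made-up sample data, for illustration only.
sample = """dn: GLUE2ShareID=a,o=glue
GLUE2EntityValidity: 3600

dn: GLUE2ShareID=b,o=glue
GLUE2EntityValidity: 600

dn: GLUE2ShareID=c,o=glue
GLUE2EntityName: no validity here
"""
total, counts = validity_histogram(sample)
```

Applying share() to the counts quoted above gives roughly 13.2% for
13865 of 104853 objects, and roughly 0.03% for the 34 objects with a
Validity of 1 minute.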
Here is a break-down of Validity by object type:

                        1min  10min  1hr
AccessPolicy            [X]   [X]    [X]
AdminDomain             [ ]   [X]    [ ]
ApplicationEnvironment  [ ]   [ ]    [X]
ComputingEndpoint       [X]   [X]    [X]
ComputingManager        [X]   [X]    [ ]
ComputingService        [X]   [X]    [ ]
ComputingShare          [X]   [X]    [ ]
Domain                  [ ]   [X]    [ ]
Endpoint                [X]   [X]    [X]
Entity                  [ ]   [ ]    [X]
ExecutionEnvironment    [X]   [X]    [ ]
Manager                 [X]   [X]    [ ]
MappingPolicy           [X]   [X]    [ ]
Policy                  [X]   [X]    [X]
Resource                [X]   [X]    [ ]
Service                 [X]   [X]    [X]
Share                   [X]   [X]    [ ]
StorageEndpoint         [ ]   [ ]    [X]

Cheers,
Paul.