[glue-wg] When is data stale?
Paul Millar
paul.millar at desy.de
Mon Apr 20 13:46:26 EDT 2015
Hi Stephen,
On 20/04/15 18:35, stephen.burke at stfc.ac.uk wrote:
> glue-wg-bounces at ogf.org [mailto:glue-wg-bounces at ogf.org] On
>> Behalf Of Paul Millar said: Sent: 20 April 2015 11:38 Stephen: "It
>> would be incorrect, since it would not be the time at which the
>> information was created." -- circular argument fallacy:
>> CreationTime is the time information was created.
>
> I don't see why it's circular. The BDII doesn't create anything,
> ergo it should not change the creation time, any more than
> translating it into XML.
It's circular because you define CreationTime in terms of the time
something is created. This is either a circular argument or a
semantically null sentence -- you choose :)
The underlying problem is that your definition uses the term "create"
without defining what it means. How do I know when information is
created: what is it like before? What is it like after? What has changed?
In part, the problem comes because GLUE-2 is completely mum on all the
machinery of maintaining information. There's no mention of
information-providers. There's no mention of information being added,
updated or removed.
This may be as intended; however, it makes defining CreationTime difficult.
My usual exercise: try defining CreationTime without using the words
"creation" and "time".
>> Florido: "No. CreationTime refers to the record related to the
>> entity described. Only the resource provider that creates the
>> datastructure can know such time. BDII levels above resource just
>> copy the data." Again, straw-man fallacy as it requires GLUE 2 to
>> distinguish between resource- and higher-level BDIIs.
>
> I don't understand your point at all here. Let's try again: the
> creation time is the time the *information* represented in a GLUE
> object is created.
Again, a circular (or semantically null) definition: you're defining
CreationTime using the phrase "the time [something] is created".
Put another way, 'information being created' is too loose a term: it
could mean almost anything, so it defines nothing.
> That information may be copied, translated or stored in many
> different formats, none of which has anything to do with the
> information itself, i.e. the values of the various attributes.
How do you distinguish between information being copied from some other
BDII and being copied from an info-provider?
In fact, these are basically the same. BDII even treats them the same:
they are just two potential sources of information.
The only distinction between being a resource-, site- or top-level BDII
is where it fetches its information.
>> However, the difficulty in describing the exact semantics of
>> CreationTime suggests (to me) that GLUE 2 _does_ include the
>> concept of aggregation, at least because it has CreationTime with
>> different processing models depending on whether the agent is a
>> primary data source (e.g., Resource BDII) or an aggregating source
>> (Site- or Top- BDII).
>
> The thing which sets the CreationTime is not the BDII at any level,
> it's the information provider.
"BDII" and "information provider" are not defined in GLUE 2 (except in
Appendix A). Therefore, they cannot contribute towards the definition of
CreationTime.
In case it isn't obvious, I agree that CreationTime should be set by the
info-provider and not modified by any BDII.
What is interesting is that (apparently) one cannot describe this
desired behaviour rigorously in GLUE-2, despite everyone being agreed
this is the desired behaviour and the other behaviour is plain wrong.
To me, this points to a deficiency in GLUE 2.
[...]
> As far as I know we have no clients which make use of it. The only
> use I'm currently aware of is the glue validator, where the goal is
> to spot services which are stuck or otherwise faulty.
Incidentally, I've noticed some old objects hanging around in
lcg-bdii.cern.ch, but only because I started publishing CreationTime and
was checking published values.
Do you know if the glue validator is being run against production
top-level BDII instances?
> If other use cases arise people would have to look at the details to
> decide what to do - bearing in mind the constraints of the system,
> e.g. that the top BDII can't manage freshness of much better than an
> hour.
One hour! Why doesn't someone fix this?
> I don't see that this is different to any attribute - what you
> publish needs to be driven by the use cases. It wouldn't be
> especially difficult to publish a different Validity for each object
> type, or even for e.g. different batch systems, but unless you have
> something to specify the use there's nothing to motivate such a
> varying choice.
My use-case was what you might expect: allowing detection of a
particular failure mode. Specifically, the information publishing "got
stuck" at one site. The details don't matter, but the result was old
("stale") data continued to be re-published.
What I'd like is for that to be detectable; even if that detection
doesn't come out-of-the-box.
GLUE-2 seems to support this, with CreationTime & Validity.
However, the devil's in the detail, and it seems Validity cannot be used
like this, without hard-coding some arbitrary numbers.
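To make that concrete, here is a minimal sketch (in Python) of the kind
of check I have in mind. It assumes CreationTime has already been parsed
into a timezone-aware datetime and that Validity is a number of seconds.
The function name is_stale and the example values are illustrative only;
they are not taken from GLUE 2 or from any BDII code.

```python
from datetime import datetime, timedelta, timezone


def is_stale(creation_time, validity_seconds, now=None):
    """Return True if 'now' is later than CreationTime + Validity.

    Hypothetical helper: assumes creation_time is timezone-aware and
    validity_seconds is the published Validity, in seconds.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now > creation_time + timedelta(seconds=validity_seconds)


# Illustrative values: an object created at 12:00 UTC, checked at 13:46 UTC.
created = datetime(2015, 4, 20, 12, 0, 0, tzinfo=timezone.utc)
check_at = datetime(2015, 4, 20, 13, 46, 26, tzinfo=timezone.utc)

print(is_stale(created, 3600, now=check_at))   # Validity of 1 hour -> True
print(is_stale(created, 86400, now=check_at))  # Validity of 1 day  -> False
```

The same object is "stale" or "fresh" depending purely on which Validity
was chosen, which is why the choice of number matters.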
> In my info provider it is indeed hard-coded to 1 hour - as I say it
> would be easy enough to change it, but there's no current demand.
OK, but why 1 hour and not 1 minute or 1 day?
Cheers,
Paul.