[glue-wg] When is data stale?

Paul Millar paul.millar at desy.de
Mon Apr 20 13:46:26 EDT 2015


Hi Stephen,

On 20/04/15 18:35, stephen.burke at stfc.ac.uk wrote:
> glue-wg-bounces at ogf.org [mailto:glue-wg-bounces at ogf.org] On
> Behalf Of Paul Millar said: Sent: 20 April 2015 11:38
>> Stephen: "It would be incorrect, since it would not be the time at
>> which the information was created." -- circular argument fallacy:
>> CreationTime is the time information was created.
>
> I don't see why it's circular. The BDII doesn't create anything,
> ergo it should not change the creation time, any more than
> translating it into XML.

It's circular because you define CreationTime in terms of the time
something is created.  This is either a circular argument or a
semantically null sentence -- you choose :)

The underlying problem is your definition uses the term "create",
without defining what this means.  How do I know when information is 
created: what is it like before? what is it like after?  what has changed?

In part, the problem comes because GLUE-2 is completely mum on all the 
machinery of maintaining information.  There's no mention of 
information-providers.  There's no mention of information being added, 
updated or removed.

This may be as intended; however, it makes defining CreationTime difficult.

My usual exercise: try defining CreationTime without using the words 
"creation" and "time".

>> Florido: "No. CreationTime refers to the record related to the
>> entity described. Only the resource provider that creates the
>> datastructure can know such time. BDII levels above resource just
>> copy the data." Again, straw-man fallacy as it requires GLUE 2 to
>> distinguish between resource- and higher-level BDIIs.
>
> I don't understand your point at all here. Let's try again: the
> creation time is the time the *information* represented in a GLUE
> object is created.

Again, a circular (or semantically null) definition: you're defining
CreationTime using the phrase "the time [something] is created".

Put another way, the concept of 'information being created' is too loose
a term: it could mean almost anything, so it defines nothing.

> That information may be copied, translated or stored in many
> different formats, none of which has anything to do with the
> information itself, i.e. the values of the various attributes.

How do you distinguish between information being copied from some other
BDII and being copied from an info-provider?

In fact, these are basically the same.  BDII even treats them the same: 
they are just two potential sources of information.

The only distinction between being a resource-, site- or top-level BDII 
is where it fetches its information.

>> However, the difficulty in describing the exact semantics of
>> CreationTime suggests (to me) that GLUE 2 _does_ include the
>> concept of aggregation, at least because it has CreationTime with
>> different processing models depending on whether the agent is a
>> primary data source (e.g., Resource BDII) or an aggregating source
>> (Site- or Top- BDII).
>
> The thing which sets the CreationTime is not the BDII at any level,
> it's the information provider.

"BDII" and "information provider" are not defined in GLUE 2 (except in
Appendix A).  Therefore, they cannot contribute towards the definition
of CreationTime.

In case it isn't obvious, I agree that CreationTime should be set by the
info-provider and not modified by any BDII.
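
A minimal sketch of that agreed behaviour, in Python.  The names
publish() and aggregate() and the dictionary layout are hypothetical,
chosen only for illustration; they are not taken from GLUE 2 or from
the BDII code:

```python
import copy


def publish(attributes, now):
    """Info-provider role: build a record and stamp CreationTime once."""
    record = dict(attributes)
    record["CreationTime"] = now
    return record


def aggregate(sources):
    """Aggregating role: merge records from several sources.

    Records are copied verbatim, so CreationTime is never rewritten,
    whether the source is an info-provider or another aggregator.
    """
    merged = {}
    for records in sources:
        for record in records:
            merged[record["ID"]] = copy.deepcopy(record)
    return merged


record = publish({"ID": "storage-1"}, "2015-04-20T13:46:26Z")
site = aggregate([[record]])        # fed by an info-provider
top = aggregate([site.values()])    # fed by another aggregator
print(top["storage-1"]["CreationTime"])
```

Note that aggregate() cannot tell which kind of source it is reading
from, which is the point made above: the two cases look the same.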

What is interesting is that (apparently) one cannot describe this 
desired behaviour rigorously in GLUE-2, despite everyone being agreed 
this is the desired behaviour and the other behaviour is plain wrong.

To me, this points to a deficiency in GLUE 2.

[...]


> As far as I know we have no clients which make use of it. The only
> use I'm currently aware of is the glue validator, where the goal is
> to spot services which are stuck or otherwise faulty.

Incidentally, I've noticed some old objects hanging around in
lcg-bdii.cern.ch, but only because I started publishing CreationTime and 
was checking published values.

Do you know if the glue validator is being run against production
top-level BDII instances?

> If other use cases arise people would have to look at the details to
> decide what to do - bearing in mind the constraints of the system,
> e.g. that the top BDII can't manage freshness of much better than an
> hour.

One hour!  Why doesn't someone fix this?

> I don't see that this is different to any attribute - what you
> publish needs to be driven by the use cases. It wouldn't be
> especially difficult to publish a different Validity for each object
> type, or even for e.g. different batch systems, but unless you have
> something to specify the use there's nothing to motivate such a
> varying choice.

My use-case was what you might expect: allowing detection of a 
particular failure mode.  Specifically, the information publishing "got 
stuck" at one site.  The details don't matter, but the result was old 
("stale") data continued to be re-published.

What I'd like is for that to be detectable; even if that detection 
doesn't come out-of-the-box.

GLUE-2 seems to support this, with CreationTime & Validity.

However, the devil's in the detail, and it seems Validity cannot be used
like this without hard-coding some arbitrary numbers.
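
To make the intended check concrete, here is a minimal sketch in Python.
It assumes CreationTime is published as a UTC timestamp of the form
2015-04-20T13:46:26Z and Validity as a number of seconds.  The function
names and the grace_seconds parameter are hypothetical and not part of
GLUE 2 or of any existing tool:

```python
from datetime import datetime, timedelta, timezone


def parse_glue_time(text):
    """Parse a UTC timestamp such as '2015-04-20T13:46:26Z'."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_stale(creation_time, validity_seconds, now, grace_seconds=0):
    """Return True if 'now' is later than CreationTime + Validity.

    grace_seconds is an assumed allowance for propagation delay
    between the publisher and whoever is reading the record.
    """
    lifetime = timedelta(seconds=validity_seconds + grace_seconds)
    return now > parse_glue_time(creation_time) + lifetime


now = datetime(2015, 4, 20, 17, 0, 0, tzinfo=timezone.utc)
print(is_stale("2015-04-20T13:46:26Z", 3600, now))  # True
print(is_stale("2015-04-20T16:30:00Z", 3600, now))  # False
```

The check is only as meaningful as the Validity value the publisher
chose (3600 here), which is exactly where the arbitrary numbers come in.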

> In my info provider it is indeed hard-coded to 1 hour - as I say it
> would be easy enough to change it, but there's no current demand.

OK, but why 1 hour and not 1 minute or 1 day?

Cheers,

Paul.

