[glue-wg] When is data stale?

Paul Millar paul.millar at desy.de
Tue Apr 21 11:17:49 EDT 2015


Hi Florido,

Thanks for your reply; my comments below.

On 21/04/15 11:53, Florido Paganelli wrote:
> I also have the feeling the discussion is becoming a bit sterile. We can
> make the GLUE2 spec better, but I hardly understand how Paul's definitions
> without using the actual terms we want to define could help.

Sorry, it was meant only as an aide towards writing good descriptions. 
It's certainly not a requirement.

> On 2015-04-20 19:46, Paul Millar wrote:
>> Put another way, the concept of 'information being created' is too loose
>> a term: it could mean almost anything, so defines nothing.
>>
>
> Well, this is a rhetorical game and not a scientific discussion anymore
> IMHO. I understand you want a definition out of the practical
> implementation, and since you seem to like riddles, I will avoid the
> words creation and time (at this point a mere exercise of wording)
> here it is:
>
> The CreationTime is the number of seconds elapsed since the Epoch
> (00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970)
> formatted as described in the GLUE2 document when BOTH these two are true:
> 1) the GLUE2 record for a GLUE2 entity is being generated
> 2) the data contained in the record, that is, the data that describes
> the entity the record refers to, is being collected.

Great, thanks for taking the time to define this.
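As a sketch, I read that definition as something like the following (Python; the function name is mine, and I'm assuming the GLUE2 DateTime_t rendering is the usual ISO 8601 UTC form):

```python
from datetime import datetime, timezone

def make_creation_time():
    """Illustrative only: take the timestamp at the moment the record
    is generated and its data collected (conditions 1 and 2 above),
    rendered in the ISO 8601 UTC form GLUE2 uses for DateTime_t."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

The point being: whichever process satisfies both conditions is the one that stamps the value.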

> I see no fallacy nor circularity. It's a definition. It does
> NOT require the knowledge of provider, resource- whatever-BDII
Yes, absolutely.

> Of course, if you want to be really picky there is a time drift between
> 1) and 2) because a Turing machine is sequential. But we can avoid this
> discussion I hope...

Certainly, despite evidence to the contrary, I don't want to nitpick.

Now, I believe your definition also applies to a site-level BDII.  When 
it refreshes information, it generates a new record and populates this 
with information it collects from the resource-level BDII.  Conditions 
1) and 2) are satisfied, so the site-level BDII may set CreationTime.

There's a (translational?) symmetry between a site-level BDII fetching 
information from resource-level BDIIs, and a resource-level BDII 
fetching information from info-providers.

Having said that, the problem only appears in hierarchical systems, like 
BDII.  So, perhaps having a hierarchical profile document would be a 
better way of solving this.


> I can provide a similar definition for Validity if you like... but I
> will shift to Stephen's suggestion that this is community-driven, but
> it's not because of the model, it's because what is "Valid" is community
> driven, and by experience I can tell it will be even if you try to
> define it otherwise!

I guess it's unclear to me what should happen if CreationTime+Validity 
is in the past.  From what others have said, it seems we make no claims 
about what this means; the client must decide.

My naïve thinking was that, if information is updated periodically and 
CreationTime+Validity is in the past, then the data should be considered 
"stale", as it should have been updated by now.
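In code, that naive reading is just the following (a sketch; `is_stale` is my own name, nothing from the spec):

```python
from datetime import datetime, timedelta, timezone

def is_stale(creation_time, validity_seconds, now=None):
    """Naive reading: data is "stale" once CreationTime + Validity
    lies in the past, since a refresh should have happened by now."""
    now = now or datetime.now(timezone.utc)
    return now > creation_time + timedelta(seconds=validity_seconds)
```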

> Maybe the only real outcome of this discussion is Jens' comment that
> 'Validity' was a bad name! :D

Yeah, I think that's true!

[..]
>> To me, this points to a deficiency in GLUE 2.
>
> I do not see the needs to describing it in the model. One describes that
> in an implementation of a hierarchical information system (today only
> BDII and maybe EMIR, which nobody uses)
>
> Otherwise we need a model that takes into account hierarchical
> propagation of information (as mentioned before, an aggregation model)
>
> But for me having the above in the GLUE2 model sounds like if physicist
> should describe the Standard Model in terms of the pieces of paper,
> emails, research papers, people, historical events needed to describe
> the physics in it...

:-D

OK, perhaps this could be in a separate document (a profile?) that 
describes a hierarchical GLUE system?  That could refine concepts, like 
CreationTime, describe how aggregation happens, etc.

This would avoid "polluting" the GLUE-2 base document with these 
hierarchy-specific issues.


>>> [...]
>>> I don't see that this is different to any attribute - what you
>>> publish needs to be driven by the use cases. It wouldn't be
>>> especially difficult to publish a different Validity for each object
>>> type, or even for e.g. different batch systems, but unless you have
>>> something to specify the use there's nothing to motivate such a
>>> varying choice.
>>
>> My use-case was what you might expect: allowing detection of a
>> particular failure mode.  Specifically, the information publishing "got
>> stuck" at one site.  The details don't matter, but the result was old
>> ("stale") data continued to be re-published.
>>
>
> In ARC, we decided a long time ago that the information system should NOT
> be used as a monitor for the information system itself. If one does that,
> one does so at one's own risk; the reason is that the
> information system is more like a business card. It presents services to
> users. It might fake some of the information to please the
> users'/communities' needs, or to hide faults
> in the system in a way that the overall system still works (and this is
> what actually happens!)
>
> Using the information system as a monitoring tool requires a different
> approach, namely, the information system itself must be able to
> self-diagnose. Apart from the philosophical question of whether this is
> even possible, for ARC this is difficult because the information system
> is part of/triggered by other parts of the middleware: if the middleware
> dies, the infosys dies with it. This is not up to GLUE2 to define, and is
> not part of most current architectures, and to me it indicates that
> proper monitoring should be done with third-party tools. As a matter of
> fact, that claim applies to most software.
>
> So if you want to know if the information publishing "got stuck", you'd
> better be a good sysadmin and use a decent process-monitoring tool, be
> it Nagios or a simple cronjob that sends emails...

As with all things: hindsight is 20-20 and failure modes oft choose the 
gaps in monitoring.

In this particular case, the "mechanical" refresh process was working 
correctly, with the site-level BDII fetching data correctly.  Direct 
monitoring of BDII/LDAP object creation time (the built-in 
'createTimestamp' attribute) would not have revealed any problem.

Publishing CreationTime and Validity (with the semantics of 
now()>CreationTime+Validity => problem) would have allowed a script to 
detect the problem.

This isn't to say this is the only way of achieving this, nor that it is 
necessarily the best way; however, it did seem to fit with the idea of 
CreationTime and Validity.

Publishing just the CreationTime allows a script to detect the problem, 
provided it happens to know the refresh period.  Although this is less 
ideal, it's probably the best I can do, given everyone else feels 
Validity has a different meaning.
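Such a script might look like this (a sketch only; the refresh period must be obtained out-of-band, e.g. from the BDII configuration, and the grace factor of 2 is an arbitrary choice of mine):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the refresh period is known out-of-band (e.g. from the
# BDII configuration); 5 minutes is just an illustrative value.
REFRESH_PERIOD = timedelta(minutes=5)

def publishing_stuck(creation_time, now=None):
    """Flag a record whose CreationTime is older than twice the
    expected refresh period -- the factor 2 is a grace margin."""
    now = now or datetime.now(timezone.utc)
    return now - creation_time > 2 * REFRESH_PERIOD
```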

Cheers,

Paul.

