[glue-wg] When is data stale?

Wed Mar 25 13:41:05 EDT 2015

Hi all,

I had a recent discussion with a site over the meaning of 
Entity.CreationTime and Entity.Validity.

I wanted to share my thoughts and see whether others have the same view 
and whether this might be worth documenting: either as part of an 
updated GLUE document or as some auxiliary document (a "GLUE Processing 
Model" perhaps).

The core question is this: how do we know when published data is stale?

Prima facie, this is easy: simply add Entity.Validity to 
Entity.CreationTime.  If the resulting time is in the past then the 
object is stale.
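
As a rough sketch of that check (assuming Entity.Validity is a number of 
seconds and Entity.CreationTime is a timezone-aware timestamp; the 
function name is_stale is my own, not anything from GLUE):

```python
from datetime import datetime, timedelta, timezone


def is_stale(creation_time, validity_seconds, now=None):
    """Return True if creation_time + validity lies in the past.

    Assumes validity is expressed in seconds and that both timestamps
    are timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    expiry = creation_time + timedelta(seconds=validity_seconds)
    return expiry < now


# Illustrative values only.
created = datetime(2015, 3, 25, 17, 41, 5, tzinfo=timezone.utc)
print(is_stale(created, 120, now=created + timedelta(seconds=60)))   # False
print(is_stale(created, 120, now=created + timedelta(seconds=180)))  # True
```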

It becomes slightly tricky when considering an object that passes through 
several caching agents, with each agent pulling updated information 
periodically.

[For those that don't know, this is how the EGEE information system 
currently works; there are three caching levels ("resource-level", 
"site-level", "top-level").  Each level has the same software, "BDII", 
with different configuration.]

First off, I assert that the current info-provider model (a script that 
provides the current up-to-date information) cannot publish 
Entity.Validity.  With a pull-update model, the info-provider cannot 
know when the next request will come.  (To illustrate this, consider two 
agents calling the info-provider with different schedules, or an 
info-provider that supplies information "on demand" whenever a user 
clicks on some web page.)  As the info-provider cannot know when the 
next request will come, it cannot set an Entity.Validity value.

However, the info-provider can publish an Entity.CreationTime.  Let's 
suppose it does.  Let's also suppose that the first agent (the 
resource-level BDII) queries every two minutes.  Ideally, the BDII will 
read the data and calculate what Entity.Validity is needed so that 
Entity.Validity+Entity.CreationTime is two minutes in the future.  It 
then records this Entity.Validity along with the original 
Entity.CreationTime.
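
To make the arithmetic concrete, here is a minimal sketch of what such 
an agent might compute.  It assumes Entity.Validity is in seconds; the 
function name and the ten-second delay are purely illustrative and not 
taken from the BDII code:

```python
import math
from datetime import datetime, timedelta, timezone


def validity_for_next_poll(creation_time, poll_interval_seconds, now=None):
    """Return the Validity (in seconds) such that
    creation_time + Validity == now + poll_interval_seconds.

    Rounded up to a whole second and never negative.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    expiry = now + timedelta(seconds=poll_interval_seconds)
    return max(0, math.ceil((expiry - creation_time).total_seconds()))


# The info-provider stamped the object 10 seconds before the agent read it;
# the agent polls every two minutes.
created = datetime(2015, 3, 25, 17, 41, 5, tzinfo=timezone.utc)
read_at = created + timedelta(seconds=10)
print(validity_for_next_poll(created, 120, now=read_at))  # 130
```

Note that the Validity ends up larger than the polling interval, since 
it is measured from the original CreationTime rather than from the 
moment the agent read the data.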

If (somehow) this data is observed after the two minutes have elapsed, it 
is clear that the data is stale.

Note that it is also possible (i.e., logically self-consistent) for the 
BDII to reset the Entity.CreationTime to the current time.  I suggest 
this isn't done, as knowing the original creation time could be quite useful.

The same procedure would happen for the higher levels: each one would 
calculate a (potentially new) Entity.Validity so that the object will 
not be valid after it anticipates fetching fresh data.  This calculated 
Entity.Validity would replace any that already exists in the supplied data.
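
A small sketch of how this might look across the three levels follows.  
The polling intervals and fetch times are made up for illustration, and 
representing the object as a dict is just my shorthand:

```python
from datetime import datetime, timedelta, timezone


def recache(record, poll_interval_seconds, now):
    """Return a copy of record with Validity recalculated so the object
    expires when this agent next expects to fetch fresh data.

    CreationTime is left untouched; any Validity already present is replaced.
    """
    expiry = now + timedelta(seconds=poll_interval_seconds)
    validity = max(0, int((expiry - record["CreationTime"]).total_seconds()))
    return {**record, "Validity": validity}


t0 = datetime(2015, 3, 25, 17, 41, 5, tzinfo=timezone.utc)
published = {"CreationTime": t0}  # info-provider sets no Validity

# Hypothetical fetch times and polling intervals for each level.
resource = recache(published, 120, now=t0 + timedelta(seconds=5))
site = recache(resource, 180, now=t0 + timedelta(seconds=60))
top = recache(site, 300, now=t0 + timedelta(seconds=200))

for name, rec in [("resource", resource), ("site", site), ("top", top)]:
    print(name, rec["Validity"])
# resource 125
# site 240
# top 500
```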

Assuming you agree with this approach, there are some open questions:

1. should the agent (BDII) reset Entity.CreationTime?  (I'd say "no".)

2. should the agent (BDII) add an Entity.CreationTime if the source does 
not provide one?

3. what should the agent (BDII) do when faced with stale data?  Should 
it simply log a warning or should it reject the data?

Any thoughts?

Cheers,

Paul.