[glue-wg] When is data stale?

Florido Paganelli florido.paganelli at hep.lu.se
Tue Apr 14 10:43:45 EDT 2015


Hi Paul,

Prologue: in the ARC information system model we have always considered
intermediate stages that hold a lot of information between the
generation of information and its propagation to be useless.
Time drifts are not negligible.
For this reason, nothing that doesn't come directly from the source is
trusted in ARC. One must query the resource level directly to be sure.
This might have performance drawbacks, but it keeps the freshness of
information consistent. Our own information index was based on
these assumptions; unfortunately it is not easy to maintain and was never
used outside the NorduGrid Consortium.

Hence, for me as an ARC developer, but also for Stephen and Jens for
other reasons, validity has only one meaning: the expected time when the
actual provider that _generates_ the information will run again and
update that information. The fact that you read it in a top-bdii doesn't
change this, since the whole BDII system is inherently asynchronous. It
was designed that way for performance and technology reasons.
If you want information about the time drift introduced by hierarchical
collection, then GLUE2 would need additional fields to represent the
aggregation hierarchy and its steps.
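Under this reading, a reader can judge staleness from what the generating
provider published alone: the record is expected to be refreshed at
CreationTime + Validity. A minimal sketch, assuming Entity.CreationTime has
already been parsed into a timezone-aware UTC datetime and Entity.Validity
is a duration in seconds; the function name is hypothetical and not part of
any BDII or ARC API:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(creation_time: datetime, validity_s: int,
             now: Optional[datetime] = None) -> bool:
    """Return True once the generating provider was expected to have
    refreshed the record, i.e. after CreationTime + Validity.

    Hypothetical helper: assumes creation_time is timezone-aware UTC and
    validity_s is the published Validity expressed in seconds.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # The deadline depends only on what the source published; the moment a
    # top-bdii happened to collect the record plays no role.
    expires_at = creation_time + timedelta(seconds=validity_s)
    return now > expires_at


# Example: a record created at 14:00 UTC with a 600 s validity.
created = datetime(2015, 4, 14, 14, 0, tzinfo=timezone.utc)
fresh = is_stale(created, 600, now=created + timedelta(minutes=5))
old = is_stale(created, 600, now=created + timedelta(minutes=11))
```

Reading the record through one or several BDII levels does not change the
result, which is the point being made above about asynchrony.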

As I have stated many times, the GLUE2 model only describes the
representation of information from a single entity/source; it does NOT
address aggregation. The three-level BDII architecture, also presented in
the LDAP realisation document, actually contains the first and only
existing description of how to do such aggregation.

Your arguments apply to information aggregation, which in my view
requires a completely different approach. This is the reason why in the
EMI project we tried to develop the EMIR service as an alternative to
top-bdii, with a different architecture and a data model that included
GLUE2 but was not limited to it. GLUE2 does not define aggregation
strategies, nor do its fields take aggregation into account.

In the light of the above, please read my comments on the three
questions below:


On 2015-03-25 18:41, Paul Millar wrote:
> Hi all,
> 
> I had a recent discussion with a site over the meaning of
> Entity.CreationTime and Entity.Validity.
> 
> I wanted to share my thoughts and see whether others have the same view
> and whether this might be worth documenting: either as part of an
> updated GLUE document or as some auxiliary document (a "GLUE Processing
> Model" perhaps).
> 
> The core question is this: how to know when published data is stale?
> 

When the client decides it is. This may vary depending on the
information and the use case.

> [...]
> Assuming you agree with this approach, there are some open questions:
> 
> 1. should the agent (BDII) reset Entity.CreationTime (I'd say "no"),
> 

No. CreationTime refers to the record describing the entity.
Only the resource provider that creates the data structure can know that
time. BDII levels above the resource level just copy the data.

> 2. should the agent (BDII) add an Entity.CreationTime if the source does
> not provide one?
> 

No. That makes no sense, for the same reason as above: CreationTime
refers to the creation time of the record for the described entity, and
the record existed BEFORE an intermediate BDII level collected it. Adding
such a value would be faking it, as we don't know when it was created.
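Taken together, the answers to questions 1 and 2 give a simple rule for an
intermediate level: pass CreationTime through untouched, and leave it absent
if the source did not publish it. A minimal sketch, modelling each record as
a plain dict keyed by attribute name; the dict model, the "CreationTime" key
and the aggregate() helper are assumptions for illustration, since a real
BDII works on LDAP entries:

```python
import copy
from typing import Dict, List

# Hypothetical record model: attribute name -> value.
Record = Dict[str, object]


def aggregate(sources: List[List[Record]]) -> List[Record]:
    """Merge records collected from lower BDII levels (hypothetical helper).

    CreationTime is copied verbatim: an intermediate level cannot know when
    a record was created, so it neither resets it nor invents a value.
    """
    merged: List[Record] = []
    for records in sources:
        for record in records:
            # Deep copy so the aggregate never aliases the source data;
            # no attribute is added, removed or rewritten.
            merged.append(copy.deepcopy(record))
    return merged
```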

> 3. what should the agent (BDII) do when faced with stale data?  Should
> it simply log a warning or should it reject the data?
> 

It depends on the use case and the nature of the data. Example: a
ComputingEndpoint is supposedly more persistent than the job statistics
contained in a ComputingShare. For the former a warning is enough, since
stale data only means the endpoint MAY NOT be reachable; the latter MUST
be rejected and retrieved anew, as outdated statistics could lead to an
incorrect brokering/selection of a resource.

Handling stale data depends on the nature of the data itself IMHO.
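A policy along these lines could be sketched as a per-class table. This is a
minimal sketch, not existing ARC or BDII behaviour: the Action names, the
STALE_POLICY table, the handle() function and the choice to reject unknown
classes are all hypothetical, and whether a record is stale is taken as an
input (decided by the client, e.g. from CreationTime + Validity):

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    ACCEPT = "accept"
    WARN = "warn"      # keep the record, but log that it may be outdated
    REJECT = "reject"  # drop the record and fetch it anew from the source


# Hypothetical policy table: how to treat *stale* records of each class.
STALE_POLICY: Dict[str, Action] = {
    # Endpoints change rarely; stale data only means it MAY NOT be reachable.
    "ComputingEndpoint": Action.WARN,
    # Job statistics change quickly; outdated values mislead brokering.
    "ComputingShare": Action.REJECT,
}


def handle(entity_class: str, stale: bool) -> Action:
    """Decide what to do with a record of the given GLUE2 class."""
    if not stale:
        return Action.ACCEPT
    # Assumption: classes without an explicit rule are treated conservatively.
    return STALE_POLICY.get(entity_class, Action.REJECT)
```

Keeping the table separate from the check lets each client tune it to its
own use case, in line with the view that staleness is the client's call.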

Cheers,
Florido

-- 
==================================================
 Florido Paganelli
   ARC Middleware Developer - NorduGrid Collaboration
   System Administrator
 Lund University
 Department of Physics
 Division of Particle Physics
 BOX118
 221 00 Lund
 Office Location: Fysikum, Hus B, Rum B313
 Office Tel: 046-2220272
 Email: florido.paganelli at REMOVE_THIShep.lu.se
 Homepage: http://www.hep.lu.se/staff/paganelli
==================================================

