[glue-wg] When is data stale?

Florido Paganelli florido.paganelli at hep.lu.se
Mon Apr 20 09:00:21 EDT 2015


Hi Paul,

Thanks for the nice summary. Comments inline.

On 2015-04-20 12:38, Paul Millar wrote:
> [...]
> Validity:
> 
>   4.    Stephen: "an estimate of how long the information can
>     be reasonably trusted, irrespective of how the
>     system updates".
> 
>   5.    Jens: [I wasn't sure from his response]
> 
>   6.    Florido: "the expected time when the actual provider that
>     _generates_ the information will run again and update
>     that information."
> 
> So, Stephen and Florido seem to have opposite views of Validity.
> 

They might look like different views, but in practice they amount to the
same thing. If you expect the data not to be valid, you will only find out
at the next update. The difference in ARC is that our clients always
contact and check the resource directly, hence my answer. ARC clients do
not care about time drifts in the BDII-family hierarchy; what counts for
us is which endpoint to contact to get the freshest information. Of
course, this does not apply to software that does not accept direct
requests at the resource level.
But Stephen's definition is better IMHO, because I have received tickets
from admins asking me to set that value. For example, a ComputingService
can have a Validity bound to its average uptime: even if the machine is
down, the information will still report the service's existence if it is
cached somewhere.
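A minimal sketch of how a consumer could apply Stephen's reading,
assuming (as in GLUE 2.0) that each entity publishes a CreationTime and a
Validity expressed in seconds counted from CreationTime. The function name
is_stale and the choice to treat a missing Validity as "not stale" are
hypothetical illustrations, not ARC or BDII behaviour.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(creation_time: datetime, validity_s: Optional[int], now: datetime) -> bool:
    """Return True if a GLUE2 object should no longer be trusted.

    Stephen's reading: Validity is how long, counted from CreationTime,
    the information can reasonably be trusted, irrespective of how or
    when the publishing system refreshes it.
    """
    if validity_s is None:
        # Hypothetical policy: no Validity published, so no expiry is implied.
        return False
    return now > creation_time + timedelta(seconds=validity_s)


if __name__ == "__main__":
    created = datetime(2015, 4, 20, 12, 0, tzinfo=timezone.utc)
    print(is_stale(created, 3600, created + timedelta(minutes=30)))  # within 1 hour
    print(is_stale(created, 3600, created + timedelta(minutes=90)))  # past 1 hour
```

Note that the check only uses the object's own timestamps, so it gives
the same answer whether the object came straight from the resource or
from a cached copy higher up the hierarchy.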

> It seemed that Jens had a similar view to Stephen, at least he suggested
> Florido's concept be published as a nextUpdate attribute rather than
> Validity.
> 
> My question to Stephen: different clients may tolerate different levels
> of error/uncertainty (is 1% "good enough"?  how about 5%, 10%, or 20%?).
>  Given that "reasonably trusted" depends on the client, how to know what
> value is to be published?
> 
> My question to Florido: in ARC, how exactly is Validity property set? Is
> it hard-coded in the code, configured manually by the admin, passed to
> the info-provider script, or overwritten by the cron/refresh job?
> 

It is hard-coded at the moment, so you may as well assume that
Stephen's definition applies. We plan to change this in the future for
ComputingActivities, but we have performance issues that we have not
managed to overcome, so it will stay like that until we find another
solution. In short, "static" data will always have a fixed timeout that
could even be configured by the admin, although this is not implemented.
Dynamic data might have a variable Validity that cannot be configured,
as it depends on the state of each object.

> Just to add a little bit of "current usage", I did a quick survey using
> lcg-bdii.cern.ch.  Some 13% (13865 of 104853) GLUE2 objects currently
> published have a Validity attribute.  These objects have one of three
> values: ~0% (34 objects) have Validity of 1 minute, 1% (1080 objects)
> have Validity of 10 minutes, and 12% (12751 objects) have Validity of 1
> hour.
> 
> So, to a good approximation, only one Validity value is set: 1 hour.
> This narrow distribution suggests that, when set, Validity is hard-coded
> to some value.
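The percentages in Paul's survey can be rechecked from the raw counts he
quotes. This small calculation uses only those counts; it also shows that
the 1-hour value accounts for roughly nine in ten of the objects that
publish a Validity at all.

```python
# Raw counts quoted from the lcg-bdii.cern.ch survey above.
TOTAL_OBJECTS = 104853
VALIDITY_COUNTS = {"1 minute": 34, "10 minutes": 1080, "1 hour": 12751}


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage."""
    return 100 * part / whole


with_validity = sum(VALIDITY_COUNTS.values())  # 13865, matching the survey

if __name__ == "__main__":
    print(f"objects with Validity: {percent(with_validity, TOTAL_OBJECTS):.1f}% of all")
    for value, count in VALIDITY_COUNTS.items():
        print(f"{value:>10}: {percent(count, TOTAL_OBJECTS):5.2f}% of all, "
              f"{percent(count, with_validity):5.1f}% of those with Validity")
```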
> 

That is the case for ARC. I can tell you I had to modify this value
because by the time BDII picked it up, Validity had already expired. We
never had this problem before because our old information system had
only one hierarchy level, and the information was so small that there
was no time drift between creation and aggregation.

If I ever had to redefine Validity with aggregation in mind, it would be
the time for which the data is to be considered valid, taking into
account any overhead/drift caused by aggregation. This is why I say
GLUE2 does not address it: these values must have a different definition
once aggregation is considered. This is also why Stephen's definition is
the most correct.
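A minimal sketch of the aggregation problem and of one possible reading
of the redefinition proposed above, under a hypothetical model in which
each hierarchy level adds a fixed worst-case delay before it republishes
an object. Hop, expired_on_arrival and aggregation_aware_validity are
illustrative names, and the delays in the usage example are made up
rather than measured BDII intervals.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    """One aggregation level in a hypothetical model of a BDII-style hierarchy."""
    name: str
    delay_s: int  # assumed worst-case delay before this level republishes the data


def age_after(hops: list[Hop]) -> int:
    """Age in seconds of the information once the last level has published it."""
    return sum(hop.delay_s for hop in hops)


def expired_on_arrival(validity_s: int, hops: list[Hop]) -> bool:
    """True if Validity has already run out when the top level publishes the object."""
    return age_after(hops) > validity_s


def aggregation_aware_validity(trust_window_s: int, hops: list[Hop]) -> int:
    """Validity a provider would need to publish so consumers at the top of the
    hierarchy still get `trust_window_s` seconds of usable data."""
    return trust_window_s + age_after(hops)


if __name__ == "__main__":
    hops = [Hop("site", 300), Hop("top", 600)]  # illustrative, not measured
    print(expired_on_arrival(600, hops))          # 900 s of drift > 600 s
    print(aggregation_aware_validity(600, hops))  # 600 + 900
```

With these illustrative delays, a 10-minute Validity (600 s) has already
run out when the top level publishes the object, while
aggregation_aware_validity(600, hops) returns 1500, i.e. the intended
trust window plus the accumulated drift.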

Cheers,
Florido
-- 
==================================================
 Florido Paganelli
   ARC Middleware Developer - NorduGrid Collaboration
   System Administrator
 Lund University
 Department of Physics
 Division of Particle Physics
 BOX118
 221 00 Lund
 Office Location: Fysikum, Hus B, Rum B313
 Office Tel: 046-2220272
 Email: florido.paganelli at REMOVE_THIShep.lu.se
 Homepage: http://www.hep.lu.se/staff/paganelli
==================================================

