[glue-wg] When is data stale?

Paul Millar paul.millar at desy.de
Mon Apr 20 06:38:21 EDT 2015


Hi all,

Thanks for all your replies.

Let me try to summarise people's replies:

CreationTime:

   1.	this is the instant of time when the data represented
	in the object was collected.

   2.	It is unclear what value is to be used when the
	attributes come from multiple sources that are
	collected at different times.

   3.	In GLUE-based infrastructure that aggregates data from
	multiple sources, the aggregating agent must not
	update CreationTime.

This is pretty straightforward, but the second and third points are
quite interesting.

It is claimed that GLUE 2 is agnostic on aggregation.  Quoting Florido: 
"the GLUE2 model [..] does NOT discuss aggregation".

Stephen, Jens and Florido were very clear on point 3, "shooting down" a
logically self-consistent alternative interpretation of CreationTime, 
where a site- or top-level BDII adds a CreationTime if it is missing.

Rejecting this interpretation is fine (I rejected it, too).  However,
the arguments for doing so were (IMHO) not well thought out.

Stephen: "It would be incorrect, since it would not be the time at which 
the information was created." -- circular argument fallacy: CreationTime 
is the time information was created.

Jens: "No; that would make the value useless because it would not be 
used consistently with those that set it." -- straw-man fallacy: 
claiming inconsistency, but if GLUE 2 does not distinguish between
resource- and other BDIIs then there is no inconsistency.

Florido: "No. CreationTime refers to the record related to the entity 
described. Only the resource provider that creates the datastructure can 
know such time. BDII levels above resource just copy the data." Again, 
straw-man fallacy as it requires GLUE 2 to distinguish between resource- 
and higher-level BDIIs.

Again, let me state I'm happy with rejecting the idea of site- and
top-level BDIIs adding CreationTime.

However, the difficulty in describing the exact semantics of 
CreationTime suggests (to me) that GLUE 2 _does_ include the concept of 
aggregation, at least because it has CreationTime with different 
processing models depending on whether the agent is a primary data 
source (e.g., Resource BDII) or an aggregating source (Site- or Top- BDII).
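
To make the "different processing models" concrete, here is a minimal
sketch of how I read the consensus.  The names (Role, publish) and the
dict-based record are purely illustrative assumptions; nothing like
this appears in the GLUE 2 specification or in any BDII code.

```python
from datetime import datetime, timezone
from enum import Enum


class Role(Enum):
    """Hypothetical agent roles, used only for this illustration."""
    PRIMARY = "primary"        # e.g. a resource-level BDII
    AGGREGATOR = "aggregator"  # e.g. a site- or top-level BDII


def publish(record: dict, role: Role, now: datetime) -> dict:
    """Return the record as an agent with the given role would publish it."""
    out = dict(record)  # work on a copy; never mutate the caller's record
    if role is Role.PRIMARY:
        # The primary source stamps the instant the data was collected.
        out["CreationTime"] = now
    # An aggregator copies the record unchanged: it neither updates an
    # existing CreationTime nor adds one when it is missing.
    return out


# Usage: collected at 10:00, aggregated five minutes later.
collected = datetime(2015, 4, 20, 10, 0, tzinfo=timezone.utc)
aggregated = datetime(2015, 4, 20, 10, 5, tzinfo=timezone.utc)
at_resource = publish({"ID": "urn:example:share"}, Role.PRIMARY, collected)
at_top_level = publish(at_resource, Role.AGGREGATOR, aggregated)
```

The point of the sketch is only that the function needs a role argument
at all: the behaviour differs between primary and aggregating agents.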

Validity:

   4.	Stephen: "an estimate of how long the information can
	be reasonably trusted, irrespective of how the
	system updates".

   5.	Jens: [I wasn't sure from his response]

   6.	Florido: "the expected time when the actual provider that
	_generates_ the information will run again and update
	that information."

So, Stephen and Florido seem to have opposite views of Validity.

It seemed that Jens had a similar view to Stephen; at least, he suggested
Florido's concept be published as a nextUpdate attribute rather than 
Validity.
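
As a rough illustration of why the distinction matters, the sketch
below computes the same instant (CreationTime + Validity) that both
readings rely on.  It assumes Validity is a number of seconds counted
from CreationTime; the function names are mine and hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiry(creation_time: datetime,
           validity_s: Optional[int]) -> Optional[datetime]:
    """CreationTime + Validity, or None when Validity is not published."""
    if validity_s is None:
        return None
    return creation_time + timedelta(seconds=validity_s)


def is_past_expiry(creation_time: datetime,
                   validity_s: Optional[int],
                   now: datetime) -> Optional[bool]:
    """True once 'now' is later than CreationTime + Validity.

    Returns None when Validity is absent, since nothing can be said.
    """
    limit = expiry(creation_time, validity_s)
    return None if limit is None else now > limit


# Usage: an object collected at 10:00 UTC with Validity of 1 hour.
created = datetime(2015, 4, 20, 10, 0, tzinfo=timezone.utc)
half_hour_later = created + timedelta(minutes=30)
ninety_min_later = created + timedelta(minutes=90)
fresh = is_past_expiry(created, 3600, half_hour_later)
overdue = is_past_expiry(created, 3600, ninety_min_later)
```

The arithmetic is identical in both cases; only the conclusion differs.
Under Stephen's reading a True result means "no longer trust this
data", whereas under Florido's it means "the provider should have
refreshed this by now", which says nothing directly about accuracy.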

My question to Stephen: different clients may tolerate different levels
of error/uncertainty (is 1% "good enough"?  how about 5%, 10%, or 20%?).
Given that "reasonably trusted" depends on the client, how does one know
what value to publish?

My question to Florido: in ARC, how exactly is the Validity property set?
Is it hard-coded in the code, configured manually by the admin, passed 
to the info-provider script, or overwritten by the cron/refresh job?

Just to add a little bit of "current usage", I did a quick survey using
lcg-bdii.cern.ch.  Some 13% (13865 of 104853) of the GLUE2 objects
currently published have a Validity attribute.  These objects have one
of three values: ~0% (34 objects) have Validity of 1 minute, 1% (1080
objects) have Validity of 10 minutes, and 12% (12751 objects) have
Validity of 1 hour.

So, to a good approximation, only one Validity value is set: 1 hour. 
This narrow distribution suggests that, when set, Validity is hard-coded 
to some value.
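
For anyone wanting to check the arithmetic, the percentages can be
recomputed from the counts quoted above.  This is just a sketch using
those figures; the variable names are illustrative.

```python
# Counts taken from the survey figures quoted in this message.
total_objects = 104853
validity_counts = {"1 minute": 34, "10 minutes": 1080, "1 hour": 12751}

# Number of objects that publish any Validity value at all.
with_validity = sum(validity_counts.values())

# Share of all published objects, per Validity value (in percent).
share_of_all = {
    value: 100 * count / total_objects
    for value, count in validity_counts.items()
}

# Share of the Validity-publishing objects that use 1 hour (in percent).
one_hour_share_of_set = 100 * validity_counts["1 hour"] / with_validity

for value, pct in share_of_all.items():
    print(f"{value:>10}: {pct:.2f}% of all objects")
print(f"1 hour accounts for {one_hour_share_of_set:.1f}% "
      "of objects that set Validity")
```

Among the objects that set Validity at all, roughly 92% use 1 hour,
which is what I mean by "to a good approximation".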

Here is a breakdown of Validity by object type:

                        1min   10min  1hr
AccessPolicy            [X]    [X]    [X]
AdminDomain             [ ]    [X]    [ ]
ApplicationEnvironment  [ ]    [ ]    [X]
ComputingEndpoint       [X]    [X]    [X]
ComputingManager        [X]    [X]    [ ]
ComputingService        [X]    [X]    [ ]
ComputingShare          [X]    [X]    [ ]
Domain                  [ ]    [X]    [ ]
Endpoint                [X]    [X]    [X]
Entity                  [ ]    [ ]    [X]
ExecutionEnvironment    [X]    [X]    [ ]
Manager                 [X]    [X]    [ ]
MappingPolicy           [X]    [X]    [ ]
Policy                  [X]    [X]    [X]
Resource                [X]    [X]    [ ]
Service                 [X]    [X]    [X]
Share                   [X]    [X]    [ ]
StorageEndpoint         [ ]    [ ]    [X]

Cheers,

Paul.

