[glue-wg] When is data stale?

Paul Millar paul.millar at desy.de
Tue Apr 21 14:07:20 EDT 2015


Hi Stephen,

First, I must apologise if you felt my emails were in any way abusive 
--- they were certainly not intended that way; rather, I would like the 
effort we have all invested in GLUE and the grid infrastructure to be 
used properly.

Currently, I see different groups developing their own information 
systems, running in parallel with GLUE+BDII, because of problems (both 
perceived and actual) with BDII.  I would like these problems addressed 
and find the very slow progress frustrating.

Onto the specific points...

On 21/04/15 13:00, stephen.burke at stfc.ac.uk wrote:
> Paul Millar [mailto:paul.millar at desy.de] said:
>> From your replies, you appear to have an internal definition of a
>> CreationTime that is to yourself clear, self-obvious and almost
>> axiomatic. Unfortunately, you cannot seem to express that idea in
>> terms defined within GLUE-2.
>
> OK, let's have one more try. The concept which you seem to think is
> missing is "entity instance". That may not be explicitly defined but
> it's a general computing concept, and I find it hard to see that you
> could make much sense of the schema without it. The schema defines
> entities as collections of attributes with types and definitions; an
> instance of that entity has specific values for the attributes. One
> of those attributes is CreationTime.  Instances are created in a way
> completely unspecified by the schema document, but whatever the
> method the CreationTime is the time at which that creation occurs
> (necessarily approximate since creation will take a finite time). If
> a new instance is created it gets a new CreationTime even if all the
> other attributes happen to be the same. However, if an instance is
> copied the copy preserves *all* the attribute values including
> CreationTime - if you change that it's a new instance and not a
> copy.

Thanks, that makes sense.

Just to confirm: you define two general mechanisms through which data is 
acquired: creating an entity instance and copying an entity instance.

In concrete terms, the resource-level BDII plus info-provider creates 
entity instances, while site- and top-level BDIIs copy entity instances. 
This breaks the symmetry: CreationTime is only ever set by resource-level 
BDIIs, and the higher levels simply preserve it.

Perhaps such a description is trivial or "well known", but it seems to 
me that GLUE-2 when used in a hierarchy (like the WLCG info system) 
would benefit from such a description.  This could go in GLUE-2 itself, 
or perhaps in a hierarchy profile document.

>> The validator should expose bugs, not hide them.  How else are
>> sites going to fix these bugs?
>
> The point is that sites can't fix middleware bugs [..]

What you say is correct.  I would also say that only sites can deploy 
the bug-fixes.

> and hence
> shouldn't get tickets for them. If tickets were raised for errors
> which would always occur and can't be fixed until a new middleware
> release is available the validator would have been rejected - sites
> must be able to clear alarms in a reasonably short time. That's also
> why only ERRORs generate alarms - ERRORs are always wrong, WARNINGs
> may be correct so a site may be unable to remove them. Of course, the
> validator can still be run outside the Nagios framework without the
> known issues mask.

Yes, it's always a bit fiddly dealing with a new test where the 
production instance currently fails.

>> It would be good if we could check this: I think there's a bug in
>> BDII where stale data is not being flushed.
>
> Maria has been on maternity leave for several months, so all this has
> been on hold. I think she should be back fairly soon, but no doubt it
> will take a while to catch up. A couple of years ago there was a bug
> where old data wasn't being deleted, but it should be out of the
> system by now. Also bear in mind that top BDIIs can cache data for up
> to four days.

Sure, I knew Maria was away; but I was hoping there would be someone 
covering for her, and that the process wasn't based on her heroic 
efforts alone.

>> If the validator is hiding bugs, and the policy is to do so
>> whenever bugs are found, then it is useless.
>
> The policy is to submit a ticket to the middleware developers and
> keep track of it. There's no point in repeatedly finding the same
> bug.

Yes, that is certainly a sound policy.

>> AFAIK, there's no intrinsic reason why there should be anything
>> beyond a 2--3 minute delay: the time taken to fetch the updated
>> information from a site-level BDII.
>
> The top BDII has to fetch information from several hundred site BDIIs
> and the total data volume is large. It takes several minutes to do
> that. And site BDIIs themselves have to collect information from the
> resource BDIIs at the site. Back in 2012 Laurence did some tests to
> see if the top BDII could scale to read from the resource BDIIs
> directly, but the answer was no, it can cope with O(1000) sources but
> not O(10000). Also the resource BDII runs on the service and loads it
> to some extent so it can't update too often - a particular issue for
> the CE, which is the service with the fastest-changing data.

I'm not sure I agree here.

First, the site-level BDII should cache information from resource-level 
BDIIs, just as resource-level BDIIs cache information from their 
info-providers.  This means the query load from top-level BDIIs is borne 
only by site-level BDIIs.

Taking a complete dump of a top-level BDII takes only a few seconds:

paul@celebrimbor:~$ /usr/bin/time -f %e ldapsearch -LLL -x \
    -H ldap://lcg-bdii.cern.ch:2170 -b o=glue > /dev/null
4.49

paul@celebrimbor:~$ /usr/bin/time -f %e ldapsearch -LLL -x \
    -H ldap://lcg-bdii.cern.ch:2170 -b o=grid > /dev/null
5.15


Let's say it takes about 10--15 seconds in total.

A top-level BDII updates by essentially this process (invoking the 
ldapsearch command against each site-level BDII).  Assuming the process 
is bandwidth limited, a full update should also take ~10--15 seconds, as 
the total amount of information sent over the network is about the same. 
(This doesn't take into account TCP slow-start, so it may be a slight 
underestimate, but see below for why I don't believe this is a real 
problem.)
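
As a rough sanity check on the volume involved (a sketch: it counts the 
bytes of a full dump rather than the actual on-the-wire traffic):

ldapsearch -LLL -x -H ldap://lcg-bdii.cern.ch:2170 -b o=glue | wc -c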

Let's assume instead that the problem isn't bandwidth limited, but that 
the update frequency is limited by the latency of the individual requests 
to site-level BDIIs.

I surveyed the currently registered site-level BDIIs:

# Build an OR filter of all registered site-level BDIIs, extract their
# ldap:// endpoint URLs, then time a full o=glue dump of each endpoint,
# appending the elapsed time to times.dat.
filter=$(ldapsearch -LLL -x -H ldap://lcg-bdii.cern.ch:2170 -b o=glue \
    '(GLUE2ServiceType=bdii_site)' GLUE2ServiceID | perl -p00e 's/\n //' | \
    awk 'BEGIN{printf "(|"}/^GLUE2ServiceID/{printf "(GLUE2EndpointServiceForeignKey="$2")"}END{print ")"}')
urls=$(ldapsearch -LLL -x -H ldap://lcg-bdii.cern.ch:2170 -b o=glue \
    "$filter" GLUE2EndpointURL | perl -p00e 's/\n //g' | \
    sed -n 's%^GLUE2EndpointURL: \(ldap://[^:]*:[0-9]*/\).*%\1%p')
for url in $urls; do
    /usr/bin/time -a -o times.dat -f %e \
        ldapsearch -LLL -x -H $url -o nettimeout=30 -b o=glue > /dev/null
done

This query covered some 318 sites.  The ldapsearch command failed for 5 
endpoints and the query timed out for 3 endpoints.

Of the remaining 310 sites, the maximum time for ldapsearch to complete 
was about 19.21 seconds and the median was 0.44 seconds.  For 82% of 
sites, ldapsearch completed within a second; for 92% it completed within 
two seconds.
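
For reference, one way to summarise times.dat (a sketch; it keeps only 
the plain numeric lines that /usr/bin/time writes on success):

grep -E '^[0-9]+(\.[0-9]+)?$' times.dat | sort -n > times-ok.dat
awk '{t[NR]=$1} END{print "median:", t[int((NR+1)/2)]}' times-ok.dat
awk '{if($1<=1)a++; if($1<=2)b++}
     END{printf "<=1s: %d%%  <=2s: %d%%\n", 100*a/NR, 100*b/NR}' times-ok.dat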

Repeating this for GLUE-1.3 showed similar statistics.

This suggests to me that information from responsive sites could be 
maintained with a lag of order ten seconds to a minute (depending on how 
the updates are scheduled); information from sites with badly performing 
site-level BDIIs would simply be updated less often.
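
A minimal sketch of the kind of update cycle this implies (urls.txt is 
assumed to hold one site-level BDII endpoint per line, for instance as 
produced by the survey above); sites are fetched in parallel with a 
short timeout, so a slow site delays only its own data:

n=0
while read -r url; do
    n=$((n+1))
    ldapsearch -LLL -x -o nettimeout=5 -H "$url" -b o=glue \
        > "site-$n.ldif" &    # one snapshot file per site
done < urls.txt
wait    # wait for all parallel fetches to finish or time out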

I haven't investigated injecting this information: BDII now generates an 
LDIF diff which is injected into the running slapd.  This is distinct 
from the original approach, which employed a "double-buffer" with two 
slapd instances.
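
Conceptually, injecting such a diff amounts to applying an LDIF changes 
file to the running slapd, along the lines of this sketch (the bind DN, 
password variable and file name are placeholders, not what BDII actually 
uses):

ldapmodify -x -D "cn=Manager,o=glue" -w "$SLAPD_PASSWORD" \
    -H ldap://localhost:2170 -f update.ldif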

Still, I currently don't see why a top-level BDII must lag by some 30 
minutes.

>> Yeah, typical grid middleware response: rewrite the software rather
>> than fix a bug.
>
> I could say that your response is typical: criticism without
> understanding.

Perhaps, but I have reviewed the BDII code-base in the past and I know 
roughly how it works.

My simple investigation suggests that maintaining a top-level BDII with 
sub-minute latency is possible for at least 80--90% of site-level BDIIs.

Of course I may be missing something here, but it certainly seems 
feasible to do much better than the current system achieves.

Cheers,

Paul.

