[UR-WG] Aggregated usage records in SGAS

Thu Sep 23 05:59:33 CDT 2010

Hi

Here is an actual use case for how aggregation can be done. In SGAS we
aggregate usage records into the following format:

  execution_date | date
  insert_date    | date
  machine_name   | string
  user_identity  | string
  vo_issuer      | string
  vo_name        | string
  vo_group       | string
  vo_role        | string
  n_jobs         | integer
  cputime        | numeric
  walltime       | numeric
  generate_time  | timestamp

The combination of the following fields is considered unique:

execution_date, insert_date, machine_name, user_identity, vo_issuer,
vo_name, vo_group, vo_role. Some of these can be null/non-existing
(vo_group and vo_role). The reason for separating insert and execution
date is that some records arrive late when the registration process fails
for some reason. Most admins care more about when records were registered,
whereas usage statistics are usually based on execution time.

The following three fields are aggregated:

number of jobs, summed cputime, summed walltime

The final field is when the record was generated.
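
To make the grouping concrete, here is a minimal Python sketch of this kind
of aggregation. It is not the SGAS implementation; the input record layout
(one dict per job), the function name and the sample values are assumptions
made up for illustration. Only the field names come from the format above.

```python
from collections import defaultdict
from datetime import date, datetime, timezone

# Fields whose combination identifies one aggregated record.
KEY_FIELDS = (
    "execution_date", "insert_date", "machine_name", "user_identity",
    "vo_issuer", "vo_name", "vo_group", "vo_role",
)


def aggregate(records):
    """Group job records by KEY_FIELDS; count jobs and sum cputime/walltime.

    Hypothetical helper: each input record is assumed to be a dict holding
    the key fields plus 'cputime' and 'walltime'.
    """
    totals = defaultdict(lambda: {"n_jobs": 0, "cputime": 0, "walltime": 0})
    for rec in records:
        # Missing optional fields (vo_group, vo_role) become None in the key.
        key = tuple(rec.get(field) for field in KEY_FIELDS)
        row = totals[key]
        row["n_jobs"] += 1
        row["cputime"] += rec["cputime"]
        row["walltime"] += rec["walltime"]

    generated = datetime.now(timezone.utc)
    result = []
    for key, row in totals.items():
        out = dict(zip(KEY_FIELDS, key))
        out.update(row)
        out["generate_time"] = generated
        result.append(out)
    return result


# Made-up sample data: two jobs registered on time, one registered late.
base = {
    "execution_date": date(2010, 9, 20),
    "machine_name": "ce01.example.org",
    "user_identity": "/O=Grid/CN=Example User",
    "vo_issuer": "/O=Grid/CN=voms.example.org",
    "vo_name": "atlas",
}
jobs = [
    dict(base, insert_date=date(2010, 9, 20), cputime=100, walltime=120),
    dict(base, insert_date=date(2010, 9, 20), cputime=250, walltime=300),
    dict(base, insert_date=date(2010, 9, 22), cputime=50, walltime=60),
]
rows = aggregate(jobs)
```

With this sample, the two jobs inserted on the execution date collapse into
one row, while the late job gets its own row because insert_date is part of
the key.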

This format aggregates quite well. 3.8M records aggregated into 18K 
records in the NDGF accounting database.

I'm in no way implying that all aggregations should look like that or
that all queries can be answered by such an aggregation. But for us, it
can answer most of the common queries, and time resolution per day is
typical enough.
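
As an illustration of the kind of query such rows can answer, here is a
small Python sketch that sums walltime per VO per execution day. The
function name, the dict-per-row layout and the sample values are
assumptions for the example, not how SGAS queries its database.

```python
from collections import defaultdict
from datetime import date


def walltime_by_vo_and_day(rows):
    """Sum walltime over aggregated rows per (execution_date, vo_name)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["execution_date"], row["vo_name"])] += row["walltime"]
    return dict(totals)


# Made-up aggregated rows, reduced to the fields this query needs.
aggregated = [
    {"execution_date": date(2010, 9, 20), "vo_name": "atlas",
     "machine_name": "ce01.example.org", "n_jobs": 2, "walltime": 7200},
    {"execution_date": date(2010, 9, 20), "vo_name": "atlas",
     "machine_name": "ce02.example.org", "n_jobs": 1, "walltime": 1800},
    {"execution_date": date(2010, 9, 21), "vo_name": "alice",
     "machine_name": "ce01.example.org", "n_jobs": 4, "walltime": 3600},
]
per_day = walltime_by_vo_and_day(aggregated)
```

Anything finer than a day, or anything per job, cannot be recovered from
rows like these, which is the trade-off of this format.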

Currently we don't use AUR in SGAS (we do use UR for all job records), but
might in the future (currently the methods for querying data are somewhat
limited).

     Best regards, Henrik

  Software Developer, Henrik Thostrup Jensen <htj at ndgf.org>
  Nordic Data Grid Facility. WWW: www.ndgf.org