[UR-WG] Aggregated usage records in SGAS

Thu Sep 23 06:20:16 CDT 2010

Hello Henrik,

How do you handle jobs that use multiple CPUs or cores (MPI jobs) in aggregated usage records?

For DEISA we currently use the following summary (aggregate) record. It was defined with AUR in mind, but we introduced JobTime, defined as the WallTime times the number of CPUs or cores, so that we can meaningfully add up jobs that use different numbers of CPUs. It was also renamed to SummaryRecord because that seemed more descriptive.

For these summary records we currently sum all the records for which StartTime <= (EndTime of UsageRecord) < EndTime. The StartTime of the period is inclusive and the EndTime is exclusive. I think it is important to define this.

<SummaryRecord>
<RecordIdentity recordId="SARA-b81ff3a539a32d3f2e55a3eb58e8b3ce7bf59edc" createTime="2010-09-23T13:11:47.000Z"/>
<StartTime>2010-08-01T00:00:00.000Z</StartTime>
<EndTime>2010-09-01T00:00:00.000Z</EndTime>
<NumberOfJobs>43</NumberOfJobs>
<UserIdentity>
<GlobalUserName>username</GlobalUserName>
<ProjectName>project</ProjectName>
</UserIdentity>
<ResourceIdentity>
<SiteName>SARA</SiteName>
<MachineName>HUYGENS</MachineName>
</ResourceIdentity>
<JobTime>P4893DT5H17M52S</JobTime>
<CpuDuration>P4907DT21H1M15S</CpuDuration>
</SummaryRecord>
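
As an illustration only, the summation rule above could be sketched in Python roughly as follows. The UsageRecord class, its field names, and the summarize and iso_duration helpers are hypothetical and are not taken from the DEISA tooling; the sketch only shows the half-open period test and JobTime as WallTime times the number of cores.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class UsageRecord:
    # Hypothetical minimal job record; field names are illustrative only.
    end_time: datetime
    wall_time: timedelta
    cpu_duration: timedelta
    processors: int


def summarize(records, start, end):
    """Sum records whose end_time falls in the half-open period [start, end)."""
    selected = [r for r in records if start <= r.end_time < end]
    return {
        "NumberOfJobs": len(selected),
        # JobTime = WallTime times the number of CPUs or cores.
        "JobTime": sum((r.wall_time * r.processors for r in selected), timedelta()),
        "CpuDuration": sum((r.cpu_duration for r in selected), timedelta()),
    }


def iso_duration(td):
    """Format a non-negative timedelta in the PnDTnHnMnS style used above."""
    total = int(td.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"P{days}DT{hours}H{minutes}M{seconds}S"
```

With this rule a job ending exactly at the period's EndTime is counted in the next period, so consecutive periods never count a job twice.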

Best regards,

Bart Heupers

-----Original Message-----
From: ur-wg-bounces at ogf.org [mailto:ur-wg-bounces at ogf.org] On Behalf Of Henrik Thostrup Jensen
Sent: donderdag 23 september 2010 13:00
To: ur-wg at ogf.org
Subject: [UR-WG] Aggregated usage records in SGAS

Hi

Here is an actual use case for how aggregation can be done. In SGAS we 
aggregate usage records into the following format:

  execution_date | date
  insert_date    | date
  machine_name   | string
  user_identity  | string
  vo_issuer      | string
  vo_name        | string
  vo_group       | string
  vo_role        | string
  n_jobs         | integer
  cputime        | numeric
  walltime       | numeric
  generate_time  | timestamp

The combination of the following fields is considered unique:

execution_date, insert_date, machine_name, user_identity, vo_issuer,
vo_name, vo_group, vo_role. Some of these can be null/non-existing
(vo_group and vo_role). The reason for separating insert and execution
date is that some records arrive late when the registration process fails
for some reason. Most admins care more about registrations, whereas usage
data usually uses execution time for statistics.

The three following fields are aggregated

number of jobs, summed cputime, summed walltime

The final field is when the record was generated.
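
As a rough sketch only (this is not SGAS code), the grouping described above could look like this in Python. It assumes each job record is a plain dict that already carries the key fields plus numeric cputime and walltime values; the aggregate function and KEY_FIELDS name are hypothetical, and generate_time is left out for brevity.

```python
from collections import defaultdict

# The fields whose combination is considered unique.
KEY_FIELDS = ("execution_date", "insert_date", "machine_name", "user_identity",
              "vo_issuer", "vo_name", "vo_group", "vo_role")


def aggregate(job_records):
    """Group job records (dicts) by KEY_FIELDS and sum the three aggregated fields."""
    totals = defaultdict(lambda: {"n_jobs": 0, "cputime": 0.0, "walltime": 0.0})
    for rec in job_records:
        # Missing vo_group / vo_role become None, so they still form a valid key.
        key = tuple(rec.get(f) for f in KEY_FIELDS)
        row = totals[key]
        row["n_jobs"] += 1
        row["cputime"] += rec["cputime"]
        row["walltime"] += rec["walltime"]
    return dict(totals)
```

Each distinct key yields one aggregated row, which is why many job records for the same user, machine and day collapse into a single record.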

This format aggregates quite well. 3.8M records aggregated into 18K 
records in the NDGF accounting database.

I'm in no way implying that all aggregations should look like that or
that all queries can be answered by such an aggregation. But for us, it
can answer most of the common queries, and time resolution per day is
typically enough.

Currently we don't use AUR in SGAS (we do use UR for all job records), but 
might in the future (currently the methods for querying data are somewhat
limited).

     Best regards, Henrik

  Software Developer, Henrik Thostrup Jensen <htj at ndgf.org>
  Nordic Data Grid Facility. WWW: www.ndgf.org
--
  ur-wg mailing list
  ur-wg at ogf.org
  http://www.ogf.org/mailman/listinfo/ur-wg