[UR-WG] Aggregated usage records in SGAS
Bart Heupers
bart at sara.nl
Thu Sep 23 06:20:16 CDT 2010
Hello Henrik,
How do you handle jobs that use more than one CPU or core (MPI jobs) in aggregated usage records?
For DEISA we currently use the following summary (aggregate) record, which was defined with AUR in mind. We introduced a JobTime field, defined as the WallTime multiplied by the number of CPUs or cores, so that we can meaningfully add up jobs that use different numbers of CPUs. We also renamed the record to SummaryRecord because that seemed more descriptive.
For these summary records we currently sum all usage records for which StartTime <= (EndTime of UsageRecord) < EndTime. The StartTime of the period is inclusive and the EndTime is exclusive. I think it is important to define this.
<SummaryRecord>
<RecordIdentity recordId="SARA-b81ff3a539a32d3f2e55a3eb58e8b3ce7bf59edc" createTime="2010-09-23T13:11:47.000Z"/>
<StartTime>2010-08-01T00:00:00.000Z</StartTime>
<EndTime>2010-09-01T00:00:00.000Z</EndTime>
<NumberOfJobs>43</NumberOfJobs>
<UserIdentity>
<GlobalUserName>username</GlobalUserName>
<ProjectName>project</ProjectName>
</UserIdentity>
<ResourceIdentity>
<SiteName>SARA</SiteName>
<MachineName>HUYGENS</MachineName>
</ResourceIdentity>
<JobTime>P4893DT5H17M52S</JobTime>
<CpuDuration>P4907DT21H1M15S</CpuDuration>
</SummaryRecord>
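As a rough illustration of the rule described above, here is a minimal Python sketch. It assumes the period is half-open (start inclusive, end exclusive) and that a job is assigned to a period by its own end time. The JobRecord class, its field names, and the summarize function are hypothetical and are not taken from the DEISA tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class JobRecord:
    # Hypothetical minimal job record; field names are illustrative only.
    end_time: datetime
    wall_time: timedelta
    cpu_time: timedelta
    cores: int


def summarize(jobs, period_start, period_end):
    """Sum the jobs whose end time lies in [period_start, period_end)."""
    n_jobs = 0
    job_time = timedelta(0)
    cpu_duration = timedelta(0)
    for job in jobs:
        # Start of period is inclusive, end of period is exclusive.
        if period_start <= job.end_time < period_end:
            n_jobs += 1
            # JobTime = WallTime times the number of cores.
            job_time += job.wall_time * job.cores
            cpu_duration += job.cpu_time
    return n_jobs, job_time, cpu_duration


utc = timezone.utc
jobs = [
    # Ends exactly at the period start: counted.
    JobRecord(datetime(2010, 8, 1, tzinfo=utc), timedelta(hours=2), timedelta(hours=7), 4),
    # Ends in the middle of the period: counted.
    JobRecord(datetime(2010, 8, 15, 12, tzinfo=utc), timedelta(hours=1), timedelta(minutes=50), 1),
    # Ends exactly at the period end: belongs to the next period.
    JobRecord(datetime(2010, 9, 1, tzinfo=utc), timedelta(hours=3), timedelta(hours=20), 8),
]
summary = summarize(
    jobs,
    datetime(2010, 8, 1, tzinfo=utc),
    datetime(2010, 9, 1, tzinfo=utc),
)
```

With a half-open interval, a job that ends exactly on a period boundary is counted in exactly one period, so monthly summaries can be added together without double counting.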
Best regards,
Bart Heupers
-----Original Message-----
From: ur-wg-bounces at ogf.org [mailto:ur-wg-bounces at ogf.org] On Behalf Of Henrik Thostrup Jensen
Sent: Thursday, 23 September 2010 13:00
To: ur-wg at ogf.org
Subject: [UR-WG] Aggregated usage records in SGAS
Hi
Here is an actual use case for how aggregation can be done. In SGAS we
aggregate usage records into the following format:
execution_date | date
insert_date | date
machine_name | string
user_identity | string
vo_issuer | string
vo_name | string
vo_group | string
vo_role | string
n_jobs | integer
cputime | numeric
walltime | numeric
generate_time | timestamp
The combination of the following fields is considered unique:
execution_date, insert_date, machine_name, user_identity, vo_issuer,
vo_name, vo_group, vo_role. Some of these can be null/non-existing
(vo_group and vo_role). The reason for separating insert and execution
dates is that some records arrive late when the registration process fails
for some reason. Most admins care more about registrations, whereas usage
data usually uses execution time for statistics.
The following three fields are aggregated:
number of jobs, summed cputime, summed walltime.
The final field is when the record was generated.
This format aggregates quite well. 3.8M records aggregated into 18K
records in the NDGF accounting database.
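The grouping described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the SGAS implementation: records are plain dictionaries, the key field names are copied from the format listed above, and the aggregate function and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Fields whose combination is considered unique (taken from the list above).
KEY_FIELDS = (
    "execution_date", "insert_date", "machine_name", "user_identity",
    "vo_issuer", "vo_name", "vo_group", "vo_role",
)


def aggregate(records):
    """Group job records by the unique key and sum the aggregated fields."""
    totals = defaultdict(lambda: {"n_jobs": 0, "cputime": 0, "walltime": 0})
    for rec in records:
        # Missing fields (e.g. vo_group, vo_role) become None in the key.
        key = tuple(rec.get(field) for field in KEY_FIELDS)
        row = totals[key]
        row["n_jobs"] += 1
        row["cputime"] += rec["cputime"]
        row["walltime"] += rec["walltime"]
    return dict(totals)


base = {
    "execution_date": date(2010, 9, 20),
    "insert_date": date(2010, 9, 20),
    "machine_name": "cluster.example.org",  # hypothetical value
    "user_identity": "/O=Example/CN=Some User",  # hypothetical value
    "vo_issuer": "/O=Example/CN=voms.example.org",  # hypothetical value
    "vo_name": "examplevo",  # hypothetical value
}
records = [
    {**base, "cputime": 100, "walltime": 120},
    {**base, "cputime": 50, "walltime": 60},
    # Same job day, but registered one day late: a separate aggregate row.
    {**base, "insert_date": date(2010, 9, 21), "cputime": 10, "walltime": 15},
]
result = aggregate(records)
```

Because insert_date is part of the key, a late-arriving record creates a new row rather than changing one that was already generated.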
I am in no way implying that all aggregations should look like that or
that all queries can be answered by such an aggregation. But for us, it
can answer most of the common queries, and a time resolution of one day is
usually sufficient.
Currently we don't use AUR in SGAS (we do use UR for all job records), but
we might in the future (currently the methods for querying data are somewhat
limited).
Best regards, Henrik
Software Developer, Henrik Thostrup Jensen <htj at ndgf.org>
Nordic Data Grid Facility. WWW: www.ndgf.org
--
ur-wg mailing list
ur-wg at ogf.org
http://www.ogf.org/mailman/listinfo/ur-wg