[glue-wg] ComputingService and Endpoints, a point of view
Florido Paganelli
florido.paganelli at hep.lu.se
Mon Aug 27 06:08:57 EDT 2012
Hi Stephen,
On 2012-08-25 12:12, stephen.burke at stfc.ac.uk wrote:
> Florido Paganelli [mailto:florido.paganelli at hep.lu.se] said:
>>> What would you propose to do with Share, Resource and Manager?
>>
>> Same approach. As I said, this depends if we want to override the
>> associations or not. This cannot be represented in UML, but makes
>> sense in realizations.
>
> And what about the relations between them? And the same for the
> storage classes? I think this would be quite a big change which would
> need a significant advantage to be worthwhile, and so far I don't
> think you've given one.
>
There are no changes. As I said, UML cannot express inheritance as well
as an implementation can, where it is straightforward.
But we have the opportunity to fix this in the realization documents,
which are not final yet.
I did not spend time reasoning about the other associations, but if we
agree on a composition-driven approach (every specification adds, does
not overload) rather than a bare inheritance-driven approach (every
specification overloads associations) I see no problem whatsoever. We're
still fully consistent with the model, everything works as expected.
>> In LDAP, I would scope the search for endpoints starting from the
>> ComputingService,
>
> You can do that if you have chosen one specific ComputingService, but
> in your own example of a delegation endpoint which could serve
> computing or other kinds of service, the current definition lets you
> search for all the endpoints which serve computing services but not
> the others.
>
Yes, I understand what you mean. But what if I have a delegation endpoint
that can be used both for computing and for storage? Should I replicate
such an endpoint in a ComputingService and in a StorageService? In that
case the same delegation endpoint would be both a ComputingEndpoint and
a StorageEndpoint, with two different IDs, but in the end it is the same
endpoint! How do we express that it is the same endpoint? With the same
ID? But then the records would have different objectclasses and
associations... It's kinda bad to have different records with the same ID.
I would rather call it Endpoint, add associations pointing to both the
StorageService and ComputingService it serves, give it the same ID and
place it in both Computing and Storage services.
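A minimal sketch of this idea, in Python rather than LDAP. The class names, the service_ids field and the example IDs are made up for illustration; they are not GLUE2 attribute names.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    # One record with one ID, however many services it serves.
    id: str
    interface: str
    service_ids: set = field(default_factory=set)  # hypothetical association field


@dataclass
class Service:
    id: str
    kind: str  # e.g. "computing" or "storage"
    endpoints: dict = field(default_factory=dict)  # Endpoint.id -> Endpoint

    def attach(self, endpoint):
        # Place the same Endpoint record in this service and
        # record the association back to the service.
        self.endpoints[endpoint.id] = endpoint
        endpoint.service_ids.add(self.id)


computing = Service("urn:example:cs1", "computing")
storage = Service("urn:example:ss1", "storage")
delegation = Endpoint("urn:example:ep:delegation", "delegation")

computing.attach(delegation)
storage.attach(delegation)
```

Both services then hold the very same record, and the record itself lists both services it serves.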
>> but then give me something to relate a local information service
>> and its endpoints (some OpenLDAP service), or an independent
>> delegation Service to the box where the ComputingService is,
>> otherwise I run the
>
> As I already said, I think an information endpoint should be a
> separate Service. For a delegation service I can't say, it would
> depend on how closely it's bound to the computing service and what
> the use cases are.
>
>> risk of querying the information system(s) twice for no reason, and
>> submit jobs twice to the same endpoint because I cannot
>> distinguish between them.
>
> Queries are normally very lightweight compared with real service
> interactions like job submission, unless you're doing a very large
> number of them - querying twice is not a problem. Being able to
> recognise that you have the same Endpoint multiple times obviously is
> important, but I don't see why it would be difficult to recognise
> duplicates.
>
Querying twice is a problem with big numbers. Say I have 20 information
endpoints and 40 submission endpoints in an index, such as EMIR, in
which every Endpoint record also has the Service.ID of the Service the
endpoint belongs to.
A client retrieves all 60 of them. Then, it might want to query the
information endpoints to scan for more submission endpoints.
Scenario 1)
I have Endpoints and ComputingEndpoints in a ComputingService.
I'll keep it simple here. A single box might have more than one
information/submission endpoint, which means deciding which of the
information/submission endpoints on the same box one doesn't want to
query. So, let's simplify the scenario and suppose that submission
endpoints all belong to different boxes, and that information endpoints
all belong to different boxes.
BUT there might be information endpoints on the same box as at least one
submission endpoint.
Then, since Endpoints and ComputingEndpoints are in the same
ComputingService, IF the information endpoint has the same Service.ID as
a submission endpoint, the client might decide not to query it.
Operation cost: at most one comparison for each pair of information
endpoint and submission endpoint, 20*40 = 800 ops.
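The count for Scenario 1 can be sketched as follows. This is only an illustration: the record layout (dicts with "id" and "service_id" keys) and the function name are assumptions, and the data is built so that no information endpoint shares a Service.ID with a submission endpoint, which is the worst case.

```python
def info_endpoints_to_query(info_endpoints, submission_endpoints):
    """Return the information endpoints worth querying and the comparison count."""
    comparisons = 0
    to_query = []
    for info in info_endpoints:
        shares_service = False
        for sub in submission_endpoints:
            comparisons += 1
            if info["service_id"] == sub["service_id"]:
                # Same Service.ID: the client may decide to skip this one.
                shares_service = True
                break
        if not shares_service:
            to_query.append(info)
    return to_query, comparisons


# Worst case: no information endpoint shares a Service.ID with a submission endpoint.
info = [{"id": f"info{i}", "service_id": f"is{i}"} for i in range(20)]
subs = [{"id": f"sub{i}", "service_id": f"cs{i}"} for i in range(40)]

to_query, comparisons = info_endpoints_to_query(info, subs)
print(len(to_query), comparisons)  # 20 800
```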
Scenario 2) Different services:
Endpoints in an Information Service and ComputingEndpoints in a
ComputingService.
We then have different Service.IDs for each endpoint, because
information endpoints belong to different services than submission
endpoints.
The client cannot know which relationship exists between the services,
so it must query the information endpoints.
Suppose every information endpoint outputs 10 submission endpoints, some
registered to the index (i.e. belonging to the set of 40 taken from the
index) and some not (i.e. not among those 40 present in the index), for
a total of ~200 endpoints.
As said, since there is no information on how information and submission
endpoints are coupled, I need to scan the information endpoints as I can
gather more submission endpoints there. A client cannot just suppose
that all the useful submission endpoints are in the index.
Hence I must check all the 40 submission endpoints in the index against
the 200 retrieved from the information endpoints, in order not to
submit twice to the same endpoint.
In the worst case that is 20 queries to information endpoints + 40*200 =
8000 comparison operations, 8020 operations in total, and we have gone
up an order of magnitude.
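The same kind of count for Scenario 2, again only as a sketch. The function names and the ID strings are hypothetical, and fake_query stands in for a real query to an information endpoint; it returns 10 endpoints that are never in the index, which gives the worst case.

```python
def scenario2_cost(index_submission_ids, info_endpoint_ids, query):
    """Count the work needed to avoid submitting twice to the same endpoint.

    `query` is a hypothetical callable returning the submission endpoint
    IDs advertised by one information endpoint.
    """
    queries = 0
    discovered = []
    for info_id in info_endpoint_ids:
        discovered.extend(query(info_id))
        queries += 1

    comparisons = 0
    for found in discovered:
        for known in index_submission_ids:
            comparisons += 1
            if found == known:
                break  # duplicate: already known from the index
    return queries, comparisons


def fake_query(info_id):
    # Stand-in for a real information endpoint: 10 endpoints, none in the index.
    return [f"{info_id}/sub{j}" for j in range(10)]


index_ids = [f"index/sub{i}" for i in range(40)]
info_ids = [f"info{i}" for i in range(20)]

queries, comparisons = scenario2_cost(index_ids, info_ids, fake_query)
print(queries, comparisons, queries + comparisons)  # 20 8000 8020
```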
The numbers are arbitrary, but I can tell you that ARC will have at
least 3 submission endpoints per box and you know what happens if you
take a site-bdii as an information endpoint (one might easily reach 10
there on big sites)
It is easy to see that as the number of job requests increases we might
incur an incredible amount of work just to submit a single job. Of
course clients can use fancy ranking algorithms and/or dynamic
programming to solve the problem better.
>> In my initial implementation I wanted to use the
>> service-to-service association described in GFD1.47 (page 7, page
>> 13); however I was told that this was not the purpose for it to be
>> there, but it was more to reflect some hierarchy between Services.
>
> I don't see how it could represent a hierarchy unless you had some
> other way to express it - Service-Service is a peer relation, there
> is no directionality (unlike e.g. Domain-Domain). In any case, as
> I've said repeatedly, the question is not what the purpose was when
> the schema was defined (none in particular as far as I remember)
> but whether it can be used to satisfy whatever requirements you have
> now in a specific case. For the things you're describing this may
> well be sufficient.
>
It might then be worth pushing these association records into an index.
Many developers underestimate these associations in their
implementations, and I tend not to consider them reliable.
I can see that they were meant as an approach to database integrity with
a relational DB in mind.
These things nowadays are better realized via graph databases. Maybe the
IDs in the associations might be used as a foundation to query and build
a graph database of relationships between services, but this is dreaming
of the future :)
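Just to make the idea concrete, here is a toy sketch that builds an in-memory graph from pairs of Service IDs and walks it. It is not a graph database, and the function names and IDs are invented for the example.

```python
from collections import defaultdict


def build_service_graph(associations):
    """Build an undirected graph from (service_id, service_id) pairs."""
    graph = defaultdict(set)
    for a, b in associations:
        # Service-Service is a peer relation, so store both directions.
        graph[a].add(b)
        graph[b].add(a)
    return graph


def related(graph, start):
    """Return every service reachable from `start`, excluding `start` itself."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    seen.discard(start)
    return seen


# Hypothetical IDs: an information service and a delegation service
# associated with a computing service, the latter also with a storage service.
associations = [("cs1", "is1"), ("cs1", "deleg1"), ("ss1", "deleg1")]
graph = build_service_graph(associations)
print(sorted(related(graph, "is1")))  # ['cs1', 'deleg1', 'ss1']
```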
>> I think the flaw in such an association based approach would be
>> that the unique ID might be wrong at a certain point in time (for
>> example because of ID renewal) and not refer anymore to the record
>> it points to.
>
> Persistency of IDs is a separate question, and a general one - IDs
> must be persistent for as long as necessary for all the possible
> uses. ServiceIDs in particular should probably change only when
> services are reconfigured in a major way. If references to IDs can't
> be followed the whole schema will be unusable!
>
I agree with both of these comments! We must push for those IDs to be
treated as crucial in implementations. Their value and importance for
distributed deployments to work has been underestimated, especially
regarding the rules regulating their persistence. I guess this is
already part of your EGI profile, Stephen.
Cheers,
--
Florido Paganelli
Lund University - Particle Physics
ARC Middleware
EMI Project