[ogsa-rss-wg] Re: [ogsa-wg] Teleconference minutes - 2 November 2005

Mon Dec 12 04:27:00 CST 2005

Karl Czajkowski wrote:
> Dave, I think the interesting point of collision with "candidate set
> generation" and "planning", e.g. traditional scheduling, is when you
> consider non-trivial QoS regimes.  For example, when services do not
> give best effort service to all callers, and you cannot just assume
> statistical averaging to predict future service based on past, etc.

I prefer to think of this in terms of resource selector services having
a non-trivial dependency on the security context, and in particular upon
the user's role. The key is that optimization from a user's perspective
is only possible within the context of the relevant user role(s), since
it is that which determines what resources are visible to those users.
Trying to use information given to one role with another isn't going to
be helpful at all.
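
To make the role dependency concrete, here is a minimal sketch in Python. Everything in it (the Resource class, the role names, the scores, the candidates_for function) is a hypothetical illustration and not part of any RSS interface; it only shows that the same selection logic returns different candidate sets for different roles, so a ranking computed for one role says nothing useful about another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    # All fields are assumptions made for illustration only.
    name: str
    allowed_roles: frozenset  # roles that may see this resource
    score: float              # hypothetical ranking value, higher is better


def candidates_for(role, resources):
    """Return the resources visible to `role`, best score first."""
    visible = [r for r in resources if role in r.allowed_roles]
    return sorted(visible, key=lambda r: r.score, reverse=True)


RESOURCES = [
    Resource("cluster-a", frozenset({"physicist", "admin"}), 0.9),
    Resource("cluster-b", frozenset({"student"}), 0.7),
    Resource("cluster-c", frozenset({"physicist", "student"}), 0.5),
]

# The same query yields different candidate sets depending on the role.
physicist_view = [r.name for r in candidates_for("physicist", RESOURCES)]
student_view = [r.name for r in candidates_for("student", RESOURCES)]
```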

> Then, there is a significant difference between a scheduler or broker
> who returns advisory information and one who returns authoritative
> information.

I don't see a major difference, or at least, not until you start having
nailed-down reservations attached. Without any committed reservations, a
candidate execution plan looks the same whether it was issued by an
advisory or an authoritative broker. Furthermore, it was
agreed at the last GGF that neither of the RSS services would cause any
reservations to be entered into. By that, I mean that no consumption of
resources would start that could be charged to the user; systems are
naturally free to try to pre-stage configurations in response to queries
for candidates if they so choose, but they will be doing the service
grid equivalent of speculative execution (and, for example, won't be
able to start staging files using the user's identity) and if things
fall through, it will be the provider who will have to swallow the
costs. (I foresee this sort of risk being the foundation of a viable
business model in some circumstances, but would not want to force anyone
to adopt it.)

> In the advisory case, he is essentially a discovery service helping
> you locate services with nice properties observed in the usefully
> recent past.  However, until you act on that information by attempting
> to acquire service, you have no idea whether the service is actually
> nice or not for your goals.  He may be suddenly congested, or have
> differentiated policies which make your request impossible to
> fulfill despite his overall nice status.
> 
> In the authoritative case, the scheduler actually has some control
> over those remote services, i.e. (part of) their capacity is reserved
> to be used at his discretion.  In this case, he can actually
> determine when and how much service you should obtain, and inform you
> of a stateful adjustment to his overall resource plans that includes
> you.

I see that as part of an advance-reservation protocol, and not as part
of the CSG/EPS. The reason for this is that different applications will
need different strategies for recovery from this sort of failure. Some
will want to just go straight back to the user and say "it can't be
done" but others will wish to try some number of execution plans first.
Because of the wide variety of possibilities involved, it looks like a
generalized workflow problem (not just a job workflow problem) and it is
therefore outside the scope of RSS and instead inside the "Job Manager"
wooly cloud^W^Wblack box.
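
As a rough sketch of why I put this in the Job Manager, here is the recovery loop in Python. The names (PlanFailed, run_with_fallback, the execute callback) are hypothetical and not drawn from any spec; the point is only that "how many plans to try before giving up" is a caller-side policy. Setting max_attempts to 1 gives the "go straight back to the user" behaviour, and larger values give the "try some number of execution plans first" behaviour.

```python
class PlanFailed(Exception):
    """Hypothetical error raised when an execution plan cannot be carried out."""


def run_with_fallback(plans, execute, max_attempts):
    """Try up to `max_attempts` plans in order and return (plan, result)
    for the first one that succeeds; raise PlanFailed if none does."""
    errors = []
    for plan in plans[:max_attempts]:
        try:
            return plan, execute(plan)
        except PlanFailed as exc:
            errors.append((plan, str(exc)))
    raise PlanFailed(f"it can't be done: {len(errors)} plan(s) tried")


def fake_execute(plan):
    # Stand-in for real job submission: the first site is "suddenly congested".
    if plan == "site-a":
        raise PlanFailed("congested")
    return f"ran on {plan}"


# A patient application tries several candidate plans before giving up.
chosen, result = run_with_fallback(["site-a", "site-b"], fake_execute, max_attempts=3)
```

An impatient application would call the same function with max_attempts=1 and report the PlanFailed error straight to the user.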

The other point is that there is no way to stop a resource from going
away unexpectedly or a client from failing to consume its allocation.
The resource might get hit by a disaster (natural or man-made), the
client might die, etc. But this isn't a new problem; it occurs in the
world of business every day and they deal with it. We should learn from
them (e.g. by describing the consequences of such failures in terms of
financial penalties) and not reinvent this particular wheel.

> There might be protocol differences to support these cases, e.g. an
> authoritative answer might carry some rights assertion that you can
> present to the service, or the broker may have to go update remote
> services behind the scenes but before you manage to contact them.
> Also, the authoritative broker may demand that you "decline"
> allocations he has issued you, while you can presumably walk away from
> an advisory information source since no stateful allocation has been
> made.

What you think of as brokering is more than what I think of as brokering,
it seems. In my view of things, what you call brokering is what I would
describe as a higher-level service built on top of brokering (i.e.
brokering plus reservation). While this is an interesting topic to work
on, I'm not convinced that enough of the answers exist in more than one
place for it to be worth pushing forward with this wider stuff at this
stage. Once we (as a community) have a bit more implementation
experience, the time for standardizing this stuff will be a lot riper.

> I do think you are on the right track to call a ranking function
> "policy", and in non-trivial scheduling regimes I think you will
> always have to present such policy rather than being able to obtain a
> sufficient picture of the environment to make decisions yourself.
> This is because of several things:
> 
>    1. You will not get an atomically consistent view of the environment
>       to act on, while the remote manager may have such a mechanism.

That's true. An atomically consistent view would require locking of
databases (or equivalent) across organizations, and is therefore a total
non-starter.

>    2. You will most likely not get a complete picture of the future
>       service allocation plans, in the event of advance reservation.
>       It would be too much data to exchange for each request; it might
>       include confidential information; and it might not be clearly
>       separable from the scheduling algorithm that might include
>       statistical approximations and/or other heuristics.

At the University of Manchester we're looking into these things in more
detail. At the moment, it looks like the brokering world is going to be
categorizable into multiple types of service depending on the quality of
information available. This work is still actively ongoing though, so it
is a bit too soon to report on it.

>    3. You will not get a complete picture of the differentiated policies
>       that different brokers or services apply to specific individuals,
>       because it may be confidential and is also meaningless without a
>       global view of other competing activities.

(I think I've covered this point already.)

> Thus, I think a significant step of ANY distributed planning exercise
> with non-trivial QoS will be the sort of handshake captured in
> WS-Agreement: make an "offer" bearing policy expressions that describe
> what you want; allow the authoritative resource manager to consider
> this offer among others and its stateful policy and resource
> availability models; obtain an answer of whether the manager can
> arrange services according to your offer; and (optionally) introspect
> to find out HOW the manager will provide you service.
> 
> All interface "refactorings" are not equivalent across protocol
> boundaries between parties with different authority and trust roles...

True, but I'm not at all convinced that WS-Ag is quite in the sweet spot
either. It would help a lot if it was easier to tackle the document
parts of the spec separately from the service parts because at the
moment, the sheer size of the overall spec and the feeling that you have
to read virtually all of it to understand it[*] is scaring some people off.

Donal.
[* FWIW, I found I had to read not just the spec but also some of the
    presentations about it to understand what was going on. To me,
    that indicates that the spec document itself has not yet captured all
    that you intend it to. ]