[glue-wg] Updated thoughts...

Paul Millar paul.millar at desy.de
Wed Apr 9 12:48:31 CDT 2008


Hi Stephen, all,

On Tuesday 08 April 2008 16:45:48 Burke, S (Stephen) wrote:
> I haven't made comments yet, but this seems like a good place to start
> ... sorry there are quite a lot but I think it's worth trying to nail
> things down as much as possible.

Absolutely, this is a "trying to nail things down" document :)

In case it isn't obvious, I'm trying to describe only the object classes, not 
the attributes (not always successful here).  The hope is, once the object 
classes are clearly defined, the attributes become more obvious.

> > UserDomain:
> >
> >   A collection of one or more end-users.  All end-users that interact
> >   with the physical storage are a member of a UserDomain.
>
> Perhaps opening a can of worms, but it may also be possible for a
> UserDomain to include services, i.e. you might have services registered
> in VOMS as well as users (even with delegated credentials you may want
> to give privileges to services which the users don't have).

As a proposal, if we decide to include non-carbon-based entities as members of 
a UserDomain, we could use "agents" instead of end-users, an end-user being 
one example of an agent.

For example, various production CAs are experimenting with issuing robot 
certificates.  These allow programs to achieve a certain amount of autonomy, 
but in an accountable (and, in some sense, controlled) way.  I believe these 
robot certificates are currently always tied to a specific person (or, at 
least, to that person's identity).

So, as a suggestion, we could replace "end-user" with "agent" and have an 
informative description saying something like:

	| End-users are one possible kind of agent.  A grid may choose to allow
	| only end-users as agents, or it may decide to allow other agents,
	| such as semi-autonomous programs that interact with the grid.

Alternatively, we could postpone this until 2.1.

> > StorageCapacity:
> >
> >   A StorageCapacity object describes the ability to store data within
> >   a homogeneous storage technology.  Each object provides a view of
> >   that physical storage medium with a common access latency.
>
> It isn't necessarily just the latency that matters, for example it may
> be useful to publish the Capacity of the disk cache in front of a tape
> system (see further comments below) - the latency is Online but the
> functionality is different from Disk1 Online storage.

I think we discussed this during the phone call.  The proposal was that
type be an open enumeration with "cache" as one option.

If that's OK with everyone, I'll try to update StorageCapacity accordingly.
	

> (Similarly a Disk1 
> storage system might make extra cache copies to help with load
> balancing.)

True, they might well do this (dCache certainly does under various 
conditions); but I'd say that this is purely an internal issue and shouldn't 
be published in GLUE.  I don't think we have any use-cases for publishing 
this information.


> I think the phraseology should be something like "a common 
> category of storage" (although maybe "category" still isn't the right
> word).

Well, maybe.  I personally find "category" is too vague and a somewhat 
circular definition.  I feel we need to be more precise than that here: these 
object classes must represent some clearly identifiable concepts if people 
are going to implement info-providers that are interoperable.

At the risk of sounding like a broken record: my current understanding of 
StorageCapacity is as a light-weight view of some homogeneous storage, 
providing only the minimal amount of information needed within a certain 
context (StorageShare, ...).  I guess everyone else views Capacities as 
a "hack" to get around UML.


>   I'd also like to go back to the question I posed in one of the
> meetings ... say that a site implements Custodial/Online by ensuring
> three distinct disk copies, how would we represent that? What about
> mirrored RAID, how much space do we record?

Err, I don't see the problem here; they should report the numbers that make 
sense, no?  I guess I'm missing something...

Using RAID storage as a specific example, any RAID system (0,1,5,6,1+0, etc) 
is (ultimately) just a block-device, albeit one with some slightly odd 
properties.  The RAID system stores data as one or more blocks each of a 
fixed size, so the total is just nBlocks * sizeOf(Block).  The filesystem 
will impose some overhead, so the totalSpace reported in the StorageCapacity 
will likely be a smidgen less than this, but the correct value is easily 
discoverable: just run "df" on the filesystem.
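For what it's worth, here's a minimal sketch (in Python, purely illustrative) of how an info-provider might discover these numbers, doing the equivalent of "df":

```python
import shutil

def capacity_report(mount_point):
    """Report totalSize/usedSize/freeSize (in GiB) for a filesystem,
    much as an info-provider might: the equivalent of running "df"."""
    usage = shutil.disk_usage(mount_point)  # (total, used, free), in bytes
    gib = 1024 ** 3
    return {
        "totalSize": usage.total // gib,
        "usedSize": usage.used // gib,
        "freeSize": usage.free // gib,
    }

print(capacity_report("/"))
```

The filesystem overhead mentioned above is precisely the gap between the raw block-device size and the totalSize this reports.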

For the case where three disks provide a RAID system with online latency and 
which a grid considers sufficient for custodial storage, I feel GLUE should 
report a single StorageCapacity for the three-disk-system.  The person in 
Timbuktu should neither know nor care that the online-custodial system is 
built from three disks in a RAID configuration or from some other technology.

>   Another thing is that I think there is some mission creep going on in
> the Capacity concept. When I suggested introducing it it was really as a
> complex data type, i.e. as an alternative to putting maybe 20 separate
> attributes into each object that can have a size you would effectively
> have one multivalued "attribute" with type "Capacity" rather than int.

Yes, but there is a slightly deeper question:  why do we find ourselves doing 
this?  Why do we have a StorageCapacity object class?

From the phone conversation, it seems, Stephen, that you view this as simply a 
work-around because UML doesn't support complex data types (is that a fair 
summary?)

Not wishing to be seen promoting or defending UML particularly, but I suspect 
this omission is deliberate: if one is modelling something that needs a 
complex data-type then that complex data-type *is* representing something.


> However, your descriptions suggest that you're thinking more in terms of
> a Capacity representing a real thing (a bunch of storage units) which
> indeed have sizes but may have other attributes too.

Yes, I'm currently thinking about this as a (light-weight) view of some 
physical storage.  It may be a view of all the storage those physical devices 
make available (e.g., under StorageEnvironment), or only a subset (e.g. under 
a StorageShare).

> That isn't necessarily a bad thing, but we should probably be clear in our
> minds about what we intend.

Yes, absolutely...  I agree this needs to be clear.

FWIW, I don't think there's any mission creep here; rather, what we've got is 
a more precise definition of what a StorageCapacity *is*.  The description is 
not extending the concept, but better defining it; so, rather than 
saying "it's a bunch of numbers we might want to record", the document offers 
an explanation of why the object class exists.


> >   The context is determined by an association between the
> >   StorageCapacity object and precisely one other higher-level object.
>
> What was the decision about Shares for different VOs which share the
> same physical space? (I haven't really read all the mails yet so this
> may already be answered ... actually there is more on this further
> down.)

Is this not supported by representing this as different StorageMappingPolicies 
pointing to the same StorageShare?

There is even support for per-VO space-utilisation information through the 
StorageCapacity attached to the StorageMappingPolicy object.


> > | The underlying storage technology may affect which of the
> > | context-specific attributes are available.  For example, tape storage
> > | may be considered semi-infinite, so the total and free attributes have
> > | no meaning.  If this is so, then it affects all StorageCapacity
> > | objects with the same underlying technology, independent of their
> > | context.
>
> I'm not quite sure what you're saying here. It seems to me that the
> schema itself should not be defining this

It doesn't define this: the (semi-) infinite tape is meant as an informative 
example where not publishing totalSize might make sense.

> I would still maintain that tape systems do in fact have a finite capacity
> at any given time so it isn't conceptually absurd (and "nearline" may not
> necessarily mean "tape" anyway)

Both are, in general, true.  However, from chatting with our tape people here:

	a. not all tape systems provide an easy (or sometimes, any) mechanism for 
discovering the current totalSize;

	b. some places have an operational practice of "just adding more tapes" when 
the space looks like it's filling up.

The argument for making totalSize optional is that a) sometimes it's 
impossible to discover, and b) sometimes it's a meaningless concept.
  
> Individual Grids may wish to make their own decisions 
> about what to publish, and equally it seems possible that, say, dcache
> may decide not to publish something but Castor may. All the schema
> should do is say that the attributes are optional, but *if* they are
> published the meaning should be well-defined and common across all
> Grids/implementations/...

Yes, absolutely!

> (and maybe we also want a special value to mean quasi-infinite?)

Yes, we could.  My preference would be simply not publishing totalSize.  I 
think this is more in keeping with the model adopted elsewhere in GLUE: not 
publishing information where it doesn't make sense.


> > | that the underlying storage technology be homogeneous. Homogeneous
> > | means that the underlying storage technology is either identical or
> > | sufficiently similar that the differences don't matter.
>
> I think the real point is more that it's treated uniformly by the SRM
> (or other storage manager) - even if the differences do matter there
> won't be anything you can do about it if the SRM gives you no control
> over it! (e.g. to put your file on RAID 6 rather than RAID 0.)

True; although, ideally, the definition should go beyond current SRM protocol 
definition.

The RAID-6 vs RAID-0 case is an interesting one.  Suspending disbelief for a 
moment, suppose that the SE is configured so it labels RAID-6 as suitable for 
Replica-Online and RAID-0 as suitable for Custodial-Online (whether anyone 
would do this is a separate question!).

Since the RAID-6 and RAID-0 storage have different management properties 
(Replica vs Custodial), they must be in different StorageEnvironments.  Since 
a StorageCapacity is associated with only one StorageEnvironment, the two 
RAID systems would be represented as two StorageCapacities --- both with 
Online latency (a property of their underlying storage) but as part of 
different StorageEnvironment.

If you like: because SRM can distinguish between them, they must be separated 
into two different StorageCapacity objects.

If both RAID devices were considered only good enough for Replica storage, 
both RAID systems could be represented within the same StorageEnvironment.  
They should be considered "sufficiently similar that the differences don't 
matter", so represented by a single StorageCapacity.
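To make this concrete, here is a toy sketch (Python; the class and attribute names are simplified stand-ins, not the draft schema):

```python
from dataclasses import dataclass, field

# Hypothetical, simplified rendering of the object classes discussed above;
# attribute names are illustrative, not taken from the draft schema.
@dataclass
class StorageCapacity:
    latency: str           # e.g. "online", "nearline"
    total_gib: int

@dataclass
class StorageEnvironment:
    retention_policy: str  # e.g. "custodial", "replica"
    capacities: list = field(default_factory=list)

# RAID-6 labelled suitable for Replica-Online, RAID-0 for Custodial-Online:
# different management properties, hence two StorageEnvironments, each with
# its own (online-latency) StorageCapacity.
replica_env = StorageEnvironment("replica", [StorageCapacity("online", 100)])
custodial_env = StorageEnvironment("custodial", [StorageCapacity("online", 50)])

# Had both been considered merely replica-grade ("sufficiently similar"),
# a single environment with a single merged capacity would do instead.
merged_env = StorageEnvironment("replica", [StorageCapacity("online", 150)])
```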


> >   A StorageEnvironment is a collection of one or more
> >   StorageCapacities with a set of associated (enforced) storage
> >   management policies.
>
> Hmm ... I could suggest that the Environment now also looks more like a
> data type than a real object (and is also rather SRM2-specific as it
> stands).

I guess all GLUE object classes are logical abstractions of something that 
(hopefully) makes sense.  So, a member of a UserDomain (an end-user) stores 
data in a StorageShare.  A StorageShare is some part of a StorageEnvironment 
(maybe all of it).  A StorageEnvironment is built from one or more 
StorageCapacities (yes, I'm considering StorageCapacities as real objects 
here) and functions due to one or more StorageResources.


> And why are the attributes optional, 

I believe, in the actual draft spec., these are optional because they may not 
make sense within all grids.

> i.e. what would it mean if  one or both is missing?

I would imagine the answer would be grid specific.

An alternative would be to make the Retention policy mandatory, but as an open 
enumeration.

> Should there be an OtherInfo attribute?

Perhaps, yes.  Providing for it (probably) wouldn't hurt.


> What would we do for classic SEs, or SRB, or for that 
> matter SRM 1? 

I believe a classic SE has a single StorageEnvironment.

I don't know SRB well enough (perhaps Jens could comment?).

SRM1 doesn't understand spaces, so one could publish a "default space" the 
same size as the StorageEnvironment and link the SRMv1 interface to that 
space.

>   [What actually seems to have happened here is that things have
> gradually turned inside out.  We started with the SA as the main 
> representation of a set of hardware, with size, policy and ACL
> information embedded in it and subsequently with the VOInfo added as a
> dependent object. Now the size (Capacity), ACL (MappingPolicy) and
> VOInfo (Share) are getting carved out as separate objects with an
> independent "life" and most of the policy attributes have been
> obsoleted, so we're left with something that carries almost no
> information and a role which, to me at least, is not totally clear. I'm
> not saying there's anything wrong with this, but it may lead to
> misconceptions derived from trying to relate the Glue 2 objects to their
> Glue 1 equivalents.]

I think I've a fairly concrete idea of what a StorageEnvironment is.  In 
plain(-ish) language, it represents the complete ability to store data with 
certain management policies.  In general, it is built by combining hardware 
with different access latencies.  It has some implicit and explicit 
management policies that result in it being described as "custodial" 
or "volatile".

Users may be allocated some portion of this StorageEnvironment as a 
StorageShare.  A StorageEnvironment has a lifetime equal to or greater than 
its StorageShares.

> > | Examples of these policies are Type (Volatile, Durable, Permanent)
> > | and RetentionPolicy (Custodial, Output, Replica).
>
> Except that Type (or ExpirationMode) doesn't seem to be an attribute in
> the current draft ... what about other policies, e.g. the old schema had
> MinFileSize - if we ever wanted to implement such a thing would it go
> here?

(BTW, is this a thought experiment, or an actual proposal?)

Most likely MinFileSize would go in the StorageCapacity, but it depends on the 
actual use-case for recording this information.

If MinFileSize comes from a limitation of the underlying storage, then it 
should go in the corresponding StorageCapacity object.  If it is 
an "arbitrary" management policy, then it should go in StorageEnvironment.


> Conversely Latency isn't a policy, it's a feature of the hardware. 

True.  This is why I felt it belongs in the StorageCapacity and not the 
StorageEnvironment: the StorageEnvironment would necessarily be "nearline" if 
it has an attached StorageCapacity with nearline latency.

But, this wasn't accepted during the phone conference, so I can't give an 
answer to this.


> If we really want a Policy object should we call it that rather than
> Environment?

Well, maybe.  The names don't bother me too much, provided the concepts, 
attributes and relationships are very precisely defined.

One could say that: "the data is stored in an environment defined by the 
management policies and the underlying hardware."  So, perhaps 
StorageEnvironment isn't perfect, but it isn't too bad.

> > | In general, a StorageEnvironment may have one or more
> > | RetentionPolicy values.
>
> Not what it says in the current draft (0..1).

True ... I thought we had agreed that StorageEnvironment had 0..* 
multiplicity, but the docs don't seem to reflect this.

> Does this correspond with 
> SRM usage, i.e. can you have spaces with multiple RPs?

From memory, I believe this was a request from Maarten: that a 
StorageEnvironment could have multiple RPs.  I'm not sure precisely why: 
perhaps indicating that a StorageShare may have different RPs, but this would 
be covered by the (potentially) one-to-many link between StorageShare and 
StorageEnvironment.


> > | GLUE does not record a default RetentionPolicy.
>
> Should it?

No.  Yes.  Who can say?  What use-cases result in us storing multiple RPs for 
a single StorageEnvironment?  Under those circumstances, do we need to record 
a primary/default?


> What about defaults for other things, e.g. ExpirationMode? 

I'm not sure having multiple ExpirationMode makes sense.  If, for some reason, 
two ExpirationModes need to be indicated, could this be done with two 
different StorageEnvironments?


> > | It is the associated StorageCapacities that allow a
> > | StorageEnvironment to store data with its advertised policies; for
> > | example, to act as (Permanent, Custodial) storage of data.
>
> But can you tell how that works, i.e. which Capacity serves which
> policy?
> This is another case where our mind tends to think Custodial -> 
> tape -> Nearline, but intrinsically it doesn't have to be like that.

Would dropping the "Permanent" from the above example (it shouldn't have been 
there) fix this problem?

If a StorageEnvironment is advertised as having a RetentionPolicy of Custodial 
(only) and has two StorageCapacities (a nearline one and an online one), would 
that be OK?


> > | Since a StorageEnvironment may contain multiple StorageCapacities,
> > | it may describe a heterogeneous environment.  An example of this is
> > | "tape storage", which has both tape back-end and disk front-end into
> > | which users can pin files.  Such a StorageEnvironment would have two
> > | associated StorageCapacities: one describing the disk storage and
> > | another describing the tape.
>
> But can you have more than one Capacity of the same type? (see the
> comments earlier).

This is currently an open question.  I believe most people feel the answer 
is "no" ("sufficiently similar").

> Anyway I think we removed the storage type from the 
> Capability so at the moment you can't really tell what it is.

Sorry, I think I may have suggested putting it back: I felt it didn't really 
sit well in the StorageEnvironment.  This comes back to the question of what 
is the StorageCapacity?  My personal feeling is "a hack to get around UML" is 
not a satisfactory answer ;-)

> Maybe we 
> should look back at the proposal for Storage Components made by Flavia,
> Maarten et al in the 1.3 discussion, or has someone already done that?

I'm not sure ... I don't think I've seen it.  Do you have a copy somewhere?


> > | StorageCapacities associated with a StorageEnvironment must be
> > | non-overlapping with any other such StorageCapacity and the set of
> > | all such StorageCapacities must represent the complete storage
> > | available to end-users.
>
> Conceptually that may be true, but there's no guarantee that all of them
> are actually published.

True.  But, no such guarantees are required in the doc.

A site (or an information provider) may choose to publish everything 
(including unallocated space), or it may choose to publish only allocated 
space.

This might be an operational decision made by a grid.

> You could also wonder about space which is installed but not currently
> allocated to any VO ... 

Yes, but do we have a use-case for this?


> > | Nevertheless, the StorageCapacities associated with
> > | StorageEnvironments may be incomplete as a site may deploy physical
> > | storage devices that are not directly under end-user control; for
> > | example, disk storage used to cache incoming transfers.  GLUE makes
> > | no effort to record information about such storage.
>
> Actually part of my reason to introduce Capacity objects is that they
> can do just that if people want them to (as they may since it can be
> useful to know about cache usage). For such cases the CapacityType would
> be Cache, or maybe something else if you wanted to distinguish more than
> one kind of cache. As always there's no compulsion to publish that if
> you don't want it, but the schema makes it possible.

OK, but I think there are two specific things here:

 a) cache storage specifically for a StorageEnvironment
	
 b) general cache storage available to multiple StorageEnvironments

An example of a) is the disks that "front" a D0T1-like storage; an example of 
b) is the general cache for storing all incoming WAN transfers as they are 
being written to tape.

Whilst one could represent a) as a StorageCapacity (e.g., with type "cache"), 
one could not do so for b), as it is not exclusively part of any one 
StorageEnvironment.

When I said GLUE makes no effort to record information... this is specifically 
part b) ... disk cache that is common between multiple StorageEnvironments.


> > | GLUE makes no attempt to record which physical storage (as
> > | represented by StorageCapacity objects) is under control of which
> > | StorageResource.
>
> Should it?

Dunno: what are the use-cases for it doing so?

> As it stands you might not care, but if you wanted to 
> consider monitoring use cases (whether the software is running at the
> most basic!)

True, and this is covered: software StorageResource(s) are available.

> it would probably be useful to know how that relates to the 
> actual storage.

Sure, I agree it might be interesting.  However, I don't think this is covered 
by any use-case / requirements.

> > StorageShare:
> >
> >   A StorageShare is a logical partitioning of one or more
> >   StorageEnvironments.
>
> Maybe I'm missing something, but how could you have more than one
> Environment for a single Share?

I think this isn't something useful to EGEE and it comes from one of the other 
grids.

> Certainly our current structure doesn't 
> allow it (one SA per many VOInfos but not vice versa), although as I
> said above that might be misleading.

Yes, I believe this was to allow a new, non-WLCG use-case.  I have a vague 
memory of Maarten mentioning this, but I could be wrong.


> > | The StorageCapacities within the StorageShare context need not
> > | describe all storage: the number of StorageCapacities associated
> > | with a StorageShare may be less than the sum of the number of
> > | StorageCapacities associated with each of the StorageShare's
> > | associated StorageEnvironments.
>
> Err, why? As always you may choose not to publish everything, but
> conceptually the space is all there somewhere ...

Well, as you say, you may choose not to publish everything, simply that.

A concrete example would be:

  Consider a StorageEnvironment (StorEnv1) that (using WLCG terminology) is 
D0T1.  When published, it has two associated StorageCapacities: one with 
type="nearline" and one with type="cache" (which has online access latency).

  A grid (not WLCG) decides that they are not publishing the cache information 
for StorageShares.  This is because they don't want to record that level of 
detail for D0T1, they only want to record the actual tape usage.

  In this example, all StorageShares associated with the StorEnv1 would have 
only one StorageCapacity.  Each one would have type="nearline" and describe 
the tape usage of that StorageShare.
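As a toy illustration of this selective publishing (Python; all names and numbers here are made up):

```python
# Hypothetical sketch of the D0T1 example: the environment carries both
# capacities, but the grid's info-provider chooses to publish only the
# nearline one for each share.  Nothing here is from the draft schema.
stor_env1 = {
    "name": "StorEnv1",  # D0T1 in WLCG terms
    "capacities": [
        {"type": "nearline", "latency": "nearline", "total_gib": 500},
        {"type": "cache",    "latency": "online",   "total_gib": 20},
    ],
}

def publish_share_capacities(env, publish_cache=False):
    """Filter the environment's capacities down to what a grid chooses
    to publish per StorageShare."""
    return [c for c in env["capacities"]
            if publish_cache or c["type"] != "cache"]

# Each StorageShare then carries a single, nearline-only capacity.
published = publish_share_capacities(stor_env1)
```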


> > | A pair of StorageShares may be partially shared, that is, they have
> > | at least one pair of StorageCapacities that are shared and at least
> > | one that is not.  Partially shared StorageCapacities could represent
> > | two UserDomains' access to a tape store, where they share a common
> > | set of disk pools but the tape storage is distinct.
>
> I'm not sure I like this bit. In general I would assume that storage
> (SAs in the current parlance) is either shared or not - allowing the
> disk part of a custodial/online space to be shared and the tape part not
> sounds rather weird to me, and I don't think that's how SRM works.

Perhaps, but custodial/nearline storage (D0T1) might have a shared diskpool 
for staged files.

> Do we 
> really have such cases? Bear in mind that the point is not about sharing
> the physical disks, but having a shared allocation (and for Disk1/Online
> permanent storage, not cache).

No, I'd say it's more about whether we describe the cache space.  If we do, 
then a site may choose to share its cache amongst all StorageShare objects.

With D1T1-like storage, it doesn't make sense to have a pair of partially 
shared StorageShares.

> If the system is guaranteeing to store, 
> say, 100 Tb on both disk and tape (custodial/online) there is no way it
> can do that if the disk part of the reservation is shared, and if it
> doesn't guarantee it overall then having a reserved tape pool is
> pointless, in general it would just mean that some tapes are unusable.

Yes, this is true for D1T1.  Partially shared StorageShares, if they have a 
place, are more for D0T1 with a shared staging area.

>   Another question, what do we do about hierarchical spaces? At the
> moment we at least have the case of the "base space" or whatever you
> call it from which the space tokens are reserved, and in future I
> believe we're considering being able to reserve spaces inside spaces.
> How could that be represented?

Currently I believe we can't store this information.

If we want to do this, it might be possible to do so by allowing a 
StorageShare to link to another StorageShare in place of a 
StorageEnvironment.  For example, one could create an abstract class 
StorageProvision (err, better name anyone?) that has two subclasses: 
StorageShare and StorageEnvironment.  A StorageShare is linked to a 
StorageProvision, providing a hierarchy of objects that may (should?) end 
with a StorageEnvironment.
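A toy sketch of this idea (Python; StorageProvision, the method name, and indeed the whole proposal are hypothetical):

```python
from abc import ABC

# Both subclasses can act as the target of a StorageShare's link, so shares
# may nest, and the chain can be walked down until it (hopefully) ends in
# a StorageEnvironment.
class StorageProvision(ABC):
    pass

class StorageEnvironment(StorageProvision):
    pass

class StorageShare(StorageProvision):
    def __init__(self, provision: StorageProvision):
        self.provision = provision  # parent: another share, or an environment

    def root_environment(self):
        node = self.provision
        while isinstance(node, StorageShare):
            node = node.provision
        return node  # a StorageEnvironment, if the chain is well-formed

env = StorageEnvironment()
base = StorageShare(env)   # the "base space"
sub = StorageShare(base)   # a space reserved inside a space
assert sub.root_environment() is env
```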

> (There are also questions we've discussed 
> in the past about things like dynamic spaces and default spaces which
> tend to produce more heat than light :)

From what I've heard, this is true!  :-)


> > StorageMappingPolicy:
> >
> >   The StorageMappingPolicy describes how a particular UserDomain is
> >   allowed to access a particular StorageShare.
>
> Should we say how this relates to the AccessPolicy? (which doesn't seem
> to appear explicitly in either the Computing or Storage diagrams but is
> presumably there anyway.)

Well, I thought the idea was that we could get by without the access policies 
being published.  For example:

	(0.  An end-user is a member of a UserDomain)

	1.a A UserDomain has access to a StorageShare (discovered via a 
StorageMappingPolicy)
	1.b A user already knows the ID of the StorageShare
	1.c The user asks the StorageEndpoint for the StorageShare ID.

	2.  The StorageShare may have an associated StorageEndpoint.

	3.a if so, ask the StorageEndpoint what protocols are available.
	3.b if not, try each advertised AccessProtocol in the user's order of 
preference.
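A rough sketch of steps 2 and 3 (Python; every name here is a hypothetical stand-in for real information-system queries, and step 1 is assumed to have happened already):

```python
def find_protocols(share, user_preference):
    """Steps 2-3: prefer the share's StorageEndpoint, if any; otherwise
    fall back to the advertised AccessProtocols in preference order."""
    endpoint = share.get("endpoint")          # step 2: may be absent
    if endpoint is not None:
        return endpoint["protocols"]          # step 3.a
    return sorted(share["access_protocols"],  # step 3.b
                  key=user_preference.index)

# No endpoint published, so fall back to the user's preference order.
share = {"access_protocols": ["dcap", "gsiftp"], "endpoint": None}
print(find_protocols(share, ["gsiftp", "dcap"]))  # ['gsiftp', 'dcap']
```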



> > | No member of a UserDomain may interact with a StorageShare except as
> > | described by a StorageMappingPolicy.
>
> As stated I don't think that can really be true, the SRM could
> potentially allow all kinds of things not explicitly published.

Sure, I've no problem changing this to a weaker statement, but...

> The things which should be true are that there is an agreed set of things
> (maybe per grid?) which are published, and that the published values
> should be a superset of the "real" permissions - i.e. the SRM may in
> fact not authorise me even if the published value says that it will, but
> the reverse shouldn't be true.

I think your example here doesn't contradict the above statement "no member of 
a UserDomain may interact with a StorageShare except as [...]".

If I've understood you correctly (with "published value should be a superset 
of the "real" permissions") then the SRM / TransferProtocols may choose not 
to honour the StorageMappingPolicy (e.g., ban certain users), but they won't 
allow some apparently random person in.



> > | The StorageMappingPolicies may contain information that is specific
> > | to that UserDomain, such as one or more associated
> > | StorageCapacities.  If provided, these provide a UserDomain-specific
> > | view of their usage of the underlying physical storage technology as
> > | a result of their usage within the StorageShare.
>
> I don't think I understand how this can be different from the Share to
> Capacity relation ...

We are presenting VO-specific information here.  The Share-Capacity relation 
provides a Share-centric view of storage.  The MappingPolicy-Capacity relation 
(if present) provides a Share-VO-centric view of storage, as needed by 
NorduGrid (iirc).

> if you are saying that the Share can be multi-VO 
> then I think something has gone wrong somewhere given that the Path and
> Tag can be VO-specific.

Yes, and the VO-specific Path and Tag *are* present in the 
StorageMappingPolicies for exactly this reason.

> In the 1.3 schema the whole point of the VOInfo 
> (which has become the Share) was to split out the information specific
> to each mapping policy (ACBR) from the generic information in the SA ...

Perhaps the assertion that VOInfo has become the Share is not correct in this 
instance?

> > | The access policies describing which users of a UserDomain may use
> > | the StorageEndpoint are not published.
>
> Are you sure? (see comment above)

Currently, yes it's true: they're not published.  One may deduce them, in some 
(most) cases, but not always.  If no StorageShares are published, then the 
mapping cannot be deduced.

> >   A StorageAccessProtocol describes one method by which end-users may
> >   sent data to be stored, received stored data, or undertake both
> >   operations.
>
> sent -> send, received -> retrieve

Ta!


> > | Access to the interface may be localised; that is, only available
> > | from certain computers.  It may also be restricted to specified
> > | UserDomains.
>
> It might also only apply to certain storage components ...

"storage components" == StorageShares, right?


> Phew .. I spent over two hours writing that, I hope someone reads it :)


(I'm reminded of a story about a student going for his (so the story goes) PhD 
viva / defence.  The student walks into the room, places a bottle of 
champagne on the table, and sits down.

After successfully completing the viva he stands up, picks up the bottle, and 
is about to leave the room when one of the examiners asks about it.  The 
student refers the examiner to a short paragraph approximately half way 
through the thesis, which says "if you mention this paragraph you get to keep 
the champagne".) :-)

Cheers,

Paul.



