[gridrel-rg] Meeting of the Reliability and Robustness Research Group at OGF21

Matti Hiltunen hiltunen at research.att.com
Thu Nov 1 14:52:16 CDT 2007


Chris,

Here are some comments on the report.

1. Grid computing evolved from computing, distributed computing, cluster 
computing, cycle scavenging (Condor), and meta/heterogeneous computing. 
Each of these grid predecessors developed reliability methods and 
techniques that are directly or indirectly still used today in grid 
computing. Therefore, I believe the document should point out the 
origins of each of the reliability techniques. I have attached a paper 
from 1984 that surveys fault tolerance techniques starting from the early
1960s.

I believe a document that points out the origins of the techniques would
not only be fair but also more useful for grid practitioners.

As a side note, we did start a discussion on whether there is anything
specific to grid computing reliability (compared to what has been done
before). Some of the candidates included scale, heterogeneity, multiple
administrative domains, etc. Such a discussion would be a useful addition
to the paper. Also, it would be interesting to investigate whether the
papers with "grid" in their title actually address the new issues
introduced by grid computing.

2. I would like to have a clearer separation between the different 
layers (hardware+OS, grid "middleware", and grid applications) carried 
throughout the paper. The general concept of "grid resource" makes it 
harder to talk about reliability techniques specifically.

3. There are different types of grid applications (relatively 
independent jobs, parallel MPI-type applications, interactive/batch, 
maybe "services"). Some of the specific reliability techniques only 
apply to one/some of these application types. The presentation could be 
clarified by structuring the techniques based on application types. Data 
grids already have their own section and I think that is good.

4. I'm not convinced about sections 5 and 6. In particular,
metrics/reliability analysis could be just one of the subsections (since
work has already been done in that area).

    Matti

Christopher Dabrowski wrote:
> Dear all,
>
> On October 17, there was a meeting of the Reliability and Robustness
> Research Group at OGF21. At this meeting we reviewed the draft OGF
> informational document titled *Reliability in Grid Computing Systems*,
> which is intended to be the primary output of the RG. The draft
> summarizes the state of current work on Grid system reliability and
> describes requirements for capabilities needed to ensure high levels of
> reliability in current and future large-scale grid systems. The draft is
> based on work presented at two earlier workshops (GGF16 in Athens and
> OGF19 in Chapel Hill) and includes a substantial amount of additional
> work that many of us have identified as being relevant.
>
> At this point, the informational document is scheduled for finalization
> by February of next year. A shorter version of the document is planned
> for submission, by the end of this year, to a special issue of a journal
> dedicated to OGF work. Since the time frame is short, we request that you
> provide comments on or a review of the posted draft informational
> document by November 2. (To obtain a copy, please see below.)
>
> At the meeting, different sections of the document were discussed. Matti
> Hiltunen volunteered to propose revisions to the definitions section and
> possibly portions of the introduction. Dominic Battre also agreed to
> provide a relevant reference for the document, which he has kindly
> provided. If anyone else has comments and/or contributions, they would be
> most welcome.
>
> The draft and a copy of the PowerPoint presentation given at the meeting
> are posted on the RG GridForge web site* or can be obtained upon request.
>
> Sincerely,
> Chris Dabrowski.
>
> *Please see https://forge.gridforum.org/sf/projects/gridrel-rg. For the
> draft informational document, go to the section labeled "documents". For
> the presentation, go to "Meeting Materials."
>
>
> Christopher Dabrowski
> National Institute of Standards and Technology
> 100 Bureau Drive, Stop 8970
> Gaithersburg, MD 20899-8970
> Phone: +1 301 975-3249
> FAX:      +1 301 948-6213
> cdabrowski at nist.gov
> ------------------------------------------------------------------------
>
> _______________________________________________
> gridrel-rg mailing list
> gridrel-rg at ogf.org
> http://www.ogf.org/mailman/listinfo/gridrel-rg
>   
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 01676390.pdf
Type: application/pdf
Size: 4719033 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/gridrel-rg/attachments/20071101/8bb56b64/attachment-0001.pdf 

