From hiltunen at research.att.com Thu Nov 1 13:54:26 2007
From: hiltunen at research.att.com (Matti Hiltunen)
Date: Thu, 01 Nov 2007 14:54:26 -0400
Subject: [gridrel-rg] Meeting of the Reliability and Robustness Research Group at OGF21
In-Reply-To: <4.3.1.2.20071024164031.01917158@email.nist.gov>
References: <4.3.1.2.20071024164031.01917158@email.nist.gov>
Message-ID: <472A20E2.1030105@research.att.com>

Chris,

Here are some papers on the analysis of grid reliability:

Y.S. Dai and M. Xie, "Hierarchical Markov Reward Model for Availability
Analysis and Optimization of the Grid Computing System", Fourth
International Conference on Mathematical Methods in Reliability (MMR04),
June 21-25, 2004, Santa Fe, USA.

Y.S. Dai, M. Xie and K.L. Poh, "A Fast Algorithm for Grid System
Reliability", Regional Inter-University Electrical & Electronic
Engineering Conference 2003 (RIUPEEEC 2003), Hong Kong.

Y.S. Dai, M. Xie, K.L. Poh, "Reliability Analysis of Grid Computing
Systems", 2002 Pacific Rim International Symposium on Dependable
Computing (PRDC2002), IEEE Computer Press, pp. 97-103, 2002, Japan.

Matti

Christopher Dabrowski wrote:
> Dear all,
>
> On October 17, there was a meeting of the Reliability and Robustness
> Research Group at OGF21. At this meeting we reviewed the draft OGF
> informational document titled *Reliability in Grid Computing Systems*,
> which is intended to be the primary output of the RG. The draft
> summarizes the state of current work on Grid system reliability and
> describes requirements for capabilities needed to ensure high levels
> of reliability in current and future large-scale grid systems. The
> draft is based on work presented at two earlier workshops (GGF16 in
> Athens and OGF19 in Chapel Hill) and includes a substantial amount of
> additional work that many of us have identified as being relevant.
>
> At this point, the informational document is scheduled for
> finalization by February of next year. A shorter version of the
> document is planned for submission to a special issue of a journal
> publication dedicated to OGF work--by the end of this year. Since the
> time frame is short, we request that you provide comments/review of
> the posted draft informational document by November 2. (To obtain a
> copy, please see below).
>
> At the meeting, different sections of the document were discussed.
> Matti Hiltunen volunteered to propose revisions to the deflations
> section and possibly portions of the introduction. Dominic Battre also
> agreed to provide a relevant reference for the document, which he has
> kindly provided. If anyone else has comments and/or contributions,
> they would be most welcome.
>
> The draft and copy of a powerpoint presentation given at the meeting
> are posted on the RG grid forge web site* or can be obtained upon
> request.
>
> Sincerely,
> Chris Dabrowski.
>
> *Please see https://forge.gridforum.org/sf/projects/gridrel-rg .
> For the draft informational document, go to section labeled
> "documents". For the presentation go to "Meeting Materials."
>
>
> Christopher Dabrowski
> National Institute of Standards and Technology
> 100 Bureau Drive, Stop 8970
> Gaithersburg, MD 20899-8970
> Phone: +1 301 975-3249
> FAX: +1 301 948-6213
> cdabrowski at nist.gov
> ------------------------------------------------------------------------
>
> _______________________________________________
> gridrel-rg mailing list
> gridrel-rg at ogf.org
> http://www.ogf.org/mailman/listinfo/gridrel-rg
>

From hiltunen at research.att.com Thu Nov 1 14:52:16 2007
From: hiltunen at research.att.com (Matti Hiltunen)
Date: Thu, 01 Nov 2007 15:52:16 -0400
Subject: [gridrel-rg] Meeting of the Reliability and Robustness Research Group at OGF21
In-Reply-To: <4.3.1.2.20071024164031.01917158@email.nist.gov>
References: <4.3.1.2.20071024164031.01917158@email.nist.gov>
Message-ID: <472A2E70.9000902@research.att.com>

Chris,

Here are some comments on the report.

1. Grid computing evolved from computing, distributed computing,
cluster computing, cycle scavenging (Condor), and meta/heterogeneous
computing. Each of these grid predecessors developed reliability
methods and techniques that are still used, directly or indirectly, in
grid computing today. Therefore, I believe the document should point
out the origins of each of the reliability techniques. I have attached
a paper from 1984 that surveys fault tolerance techniques starting from
the early 1960s. I believe a document that points out the origins of
the techniques would be not only fair but also more useful for grid
practitioners.

As a side note, we did start the discussion on whether there is
anything specific to grid computing reliability (compared to what has
been done before). Some of the candidates involved scale,
heterogeneity, multiple administrative domains, etc. Such a discussion
would be a useful addition to the paper. Also, it would be interesting
to investigate whether the papers with "grid" in their title actually
address the new issues introduced by grid computing.

2.
I would like to have a clearer separation between the different layers
(hardware+OS, grid "middleware", and grid applications) carried
throughout the paper. The general concept of "grid resource" makes it
harder to talk about reliability techniques specifically.

3. There are different types of grid applications (relatively
independent jobs, parallel MPI-type applications, interactive/batch,
maybe "services"). Some of the specific reliability techniques apply
to only one or some of these application types. The presentation could
be clarified by structuring the techniques based on application types.
Data grids already have their own section and I think that is good.

4. I'm not convinced about sections 5 and 6. In particular,
metrics/reliability analysis could be just one of the subsections
(since work has been done in the area).

Matti

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 01676390.pdf
Type: application/pdf
Size: 4719033 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/gridrel-rg/attachments/20071101/8bb56b64/attachment-0001.pdf