[communities] GGF Proposal Submission

adm35 at georgetown.edu adm35 at georgetown.edu
Fri Aug 12 12:31:52 CDT 2005


proposers_name: Arnie Miles 
 
affiliation: Georgetown University 

email: adm35 at georgetown.edu 

proposed_title: Globus and OGSA-DAI in practice: A report from Georgetown University on participation in the National Cancer Institute Center for Bioinformatics caBIG project Architecture workspace, with a live demonstration of the caBIG grid. 

session_type: Individual with Demo 

proposed_duration: 60 minutes 

target_audience: Managers and Technical Experts.  Discussion will be geared towards the life sciences, but the impact of the technologies used applies to a wider audience. 

num_attendees: 30? 

abstract: The caBIG project of the National Cancer Institute Center for Bioinformatics is an effort to establish a data and analytical grid between several dozen cancer centers around the country.  Georgetown’s participation in the project has been multifaceted, and this presentation will cover three things.  First, we will discuss our experience in the ‘cross-cutting’ architectural workspace generally.  Then, the architecture of the production reference will be covered, focusing on how and why ‘off-the-shelf’ products like Globus and OGSA-DAI were selected as infrastructure elements.  Finally, we will detail the work done to set up and run a grid node and, using the grid-enabled caArray application, submit live queries to the grid, demonstrating various ways it can be used by scientists right now. 

synopsis: The cancer Biomedical Informatics Grid, or caBIG, sponsored by the National Cancer Institute's Center for Bioinformatics, will enable the sharing of cancer research data and tools by connecting individuals and institutions in the grid space. The caBIG goal is to speed the delivery of innovative approaches for the prevention and treatment of cancer. The infrastructure and tools created by caBIG also have broad utility outside the cancer community. 

In this proposed one-hour presentation, we will discuss the current state of all caBIG project grid work, with an emphasis on, and a demonstration of, the caArray medical application. 

As a member of the caBIG Architectural Working Group, GU is the initial adopter of the caGRID tools for the microarray database caArray.  These tools add to the standard OGSA-DAI framework the ability to address objects.  Georgetown University has now successfully connected two instances of caArray together using the caGRID tools, and will demonstrate this technology, and discuss the path to this success and the lessons learned, to the Architectural Working Group on August 17.  

The GU team is one of the first adopters of the new National Cancer Institute caGRID 0.5, and has connected a local instance of the microarray database caArray to an instance at the National Cancer Institute. We have also connected the proteomics resource PIR to caGRID 0.5, and are working on connecting this microarray technology to clinical trials and proteomics databases on the caGRID.

GU’s Advanced Research Computing team has been selected to create the new grid portal, which will allow researchers to consume all the grid-enabled resources that NCI and all other caBIG members are creating and connecting to caGRID.

Arumani Manisundaram is the project manager for the Architectural Working Group, and will be working closely with Arnie Miles, project manager for Georgetown's participation in caBIG, and Colin Freas, database programmer, to present an overview of the current state of caGRID.  This discussion will briefly recap the presentation Arumani made at GGF14, to refresh the memories of those who were in attendance then, and build upon his presentation to bring the audience up to date with what is the premier grid project in the life sciences world today.  

We will discuss how Globus and OGSA-DAI have been employed and extended to allow the advertisement of objects to a grid framework.  Prior to NCI's involvement, OGSA-DAI was only able to expose relational databases and flat files; the sharing of objects on the grid is an important addition to the grid world.  All data sources in caBIG are based on object models.  

We will go on to discuss some of the more esoteric requirements of standing up a complex grid environment, including semantic and identifier issues, and how NCI has addressed these issues.  To be able to share data across such a diverse environment, it is necessary to agree upon a language to describe our objects, as well as a thesaurus to serve as a reference.  A mechanism also has to be in place to identify objects and to discover multiple instances of an object accurately, with associated provenance.

A demonstration of the state of caGRID at that moment in time will be given.  Development is steady and ongoing, so it's hard to predict exactly what will be demonstrated, but at a minimum we will show, for the first time outside the caBIG project, the grid-enabled caArray application for exposing microarray experiments.

No presentation of this nature would be complete without a summary of lessons learned in the process.  There have been many stumbling blocks along the way, and we will close our presentation with a discussion of some of the obstacles that would apply in the larger grid world.  

This presentation will be relevant to anyone working in medical research, but its importance extends far beyond medical research. The caBIG model used in the creation of this massive grid project should relate directly to any organization with widely disparate data sharing needs.

Our presentation will be accessible enough to hold the attention of non-technical managers who have data problems to solve, while having sufficient content to engage technical experts who want to satisfy data grid needs. 

tech_requirements: Internet access and data projector. 

prereq_participants: None 

advertise_suggestion: This will be openly discussed on the caBIG mailing lists, and could be posted to Bioinformatics lists. 




