[GIR-WG] GIR-WG @ OGF22: transition to RG?

Greg Newby newby at arsc.edu
Mon Feb 25 07:21:48 CST 2008


Greetings from Boston!  The Grid Information Retrieval
group (GIR) will meet at OGF22:
  
  Monday
  February 25
  1:45 - 2:30
  Crispus Attucks room at the Cambridge Hyatt Hotel

Agenda: 
  1. IP statement; introductions 
  2. Review of activity since OGF21 
  3. Discussion of rechartering GIR as a research
     group (RG) rather than working group (WG) 
  4. Document status and progress 
  5. Implementation status and progress 
  6. Any other business

The revised charter is not live at www.ogf.org yet, but
I have one drafted for our consideration.  Your input
is welcome!

* Group name:
  Grid Information Retrieval RG

* Abbreviation:
  GIR-RG

* Area:
  Applications

* Brief group summary:

The GIR RG is focused on search functionality based on computational
grids.  Search, also known as information retrieval, is a
data-intensive activity intended to match documents (of all types) to
human information needs.

* Group focus & scope:

The GIR RG is focused on search functionality based on
computational grids.  The ability to distribute search
functions across computational grids is commonly used
to handle very large datasets, intensive search
capabilities, or large quantities of concurrent data
operations (queries, indexing, or data movement).

Search, also known as information retrieval, is a
data-intensive activity intended to match documents (of
all types) to human information needs.  While search
functionality is a critical aspect of many datasets for
many organizations and individuals, standardization has
been elusive.

The RG will perform research and develop interoperable
applications for search on grid-enabled and grid-like
systems.  This work is soundly based on current and
near-future state-of-the-art information retrieval
systems, but with more emphasis on the relations among
different systems and techniques than is typical for
mainstream systems.

Drawing on GFD-I.027, GIR RG looks at grid-based search
as having three main elements: collection management,
indexing/searching, and query processing services to
grid users and applications.

* Exit strategy:

Either a transition to a WG to develop standards for GIR, or a decision
to terminate research activities in grid-enabled search.

* Upcoming documents (not on prior charter)

Grid Information Retrieval Environmental Scan (Informational)

Enumeration and analysis of software, systems,
standards, usage and data types for search.  The
extent to which these activities are amenable to
search on computational grids will be examined, as
well as the potential for grid standardization.


Grid Information Retrieval with OGSA-DAI (Experimental)

Experience with performing information retrieval
activities with OGSA-DAI version 2.2.  The Lucene
system was utilized with .gov data from TREC, using
OGSA-DAI as middleware and Tomcat as the web services
interface.


Peer to Peer Grid Information Retrieval (Experimental)

Implementation of search for lightweight systems (such
as cell phones) in a peer-to-peer environment.


* Seven questions: Answers to the questions posed in GFD-C.034 for
group formation.

1. Is the scope of the proposed group sufficiently focused?

Answer: Yes.  Focus is on information retrieval, which has ties
to research & application areas such as databases, algorithms, storage,
human-computer interaction, and data mining.  But IR is a separate
discipline, as evidenced by conferences and associations (such as
ACM's SIGIR), journals, textbooks and college courses.  IR is
a focused research area with a well-defined core deliverable
(relevant documents in response to user queries).


2. Are the topics that the group plans to address clear and relevant
for the Grid research, development, industrial, implementation, and/or
application user community?

Answer: Yes.  Focus is on three core functions of GIR, with attention
to security considerations (including access control) at each step.
First is query processing, including result set transport.  Second is
indexing and retrieval, which is the main IR system function.  Third
is collection management, which is the ingest of documents for
indexing.

These three areas are described in GFD-I.027, "Grid Information
Retrieval Requirements."


3. Will the formation of the group foster (consensus-based) work that
would not be done otherwise?

Answer: Yes.  There is no other logical gathering point for IR focused
on computational grids.  Attempts at IR forums such as SIGIR and TREC
to get more IR scholars working on grid-enabling IR systems have not
yielded many new GIR group members.  While IR is well understood
(though not a solved problem, by any means), grid-enabling IR systems
is challenging and requires focus on grid characteristics.


4. Do the group's activities overlap inappropriately with those of
another GGF group or to a group active in another organization such as
IETF or W3C?

Answer: No.  There is no standardization of IR in IETF, ISO, W3C
or elsewhere.  The main standardization of IR has happened historically
through the U.S. Library of Congress.  Z39.50 is an older standard,
which is cited in GFD-I.027.  Contemporary work centers on a newer
standard, called SRW, and its working group at the LoC.  The GIR chairs
are well connected to this work, and have built on it as appropriate.


5. Are there sufficient interest and expertise in the group's topic,
with at least several people willing to expend the effort that is
likely to produce significant results over time?

Answer: Yes.  The three chairs, along with some past chairs and
collaborators, have done significant work.  Attendance at GIR sessions
at GGF/OGF meetings from GGF5 through OGF21 has included many people
interested in GIR, though few have stayed to contribute to documents.


6. Does a base of interested consumers (e.g., application developers,
Grid system implementers, industry partners, end-users) appear to
exist for the planned work?

Answer: Yes.  The market for GIR is insatiable, though the need for
standards is not as clear.  Google's system is very much a GIR system,
and grid-based IR is implemented by numerous commercial systems, but
without following a standard.  A goal for the GIR group is to foster
an Apache module for IR, which would grid-IR-enable millions of Web
servers.  Search is, and will be, very important.  GIR is key to
enabling standards-based approaches to search on the computational grid.


7. Does the GGF have a reasonable role to play in the determination of
the technology?

Answer: Yes.  OGF offers a rallying point for sharing information about
grid-related work.  Publications and software related to GIR's efforts
will be well-served by being centrally located within OGF.


  -- Greg

Dr. Gregory Newby, Chief Scientist of the Arctic Region Supercomputing Center
Univ of Alaska Fairbanks-909 Koyukuk Dr-PO Box 756020-Fairbanks-AK 99775-6020
e: newby AT arsc.edu v: 907-450-8663 f: 907-450-8603 w: www.arsc.edu/~newby


