[communities] GGF Proposal Submission

pkovatch at sdsc.edu pkovatch at sdsc.edu
Wed Nov 30 19:06:52 CST 2005


proposers_name: Patricia Kovatch 
 
affiliation: San Diego Supercomputer Center 

email: pkovatch at sdsc.edu 

proposed_title: TeraGrid's Terabyte Moving Machines 

session_type: Tutorial 

proposed_duration: Half Day 

target_audience: Anyone who is interested in learning how to build a robust, high-performance data 

num_attendees: 30 

abstract: Sharing terabyte-sized data sets across the geographically distributed
resources of the TeraGrid posed certain challenges.  Using standards-based
grid tools, tuning and scripting expertise, and a grid-enabled wide area
file system, TeraGrid set up a rich environment for data movement.
This tutorial will explain the data infrastructure, user tools, performance
tuning, policy issues and future plans.  It will also present case studies of
applications making use of the environment. 

synopsis: Session Goals:

Attendees will learn how to build and tune a grid with a rich data infrastructure using standards-based grid tools.

Outline:

The TeraGrid Project (15 mins)
        History
        Sites
        Compute and Instrument Resources
        The TeraGrid Network
        Application motivation
        Data-oriented resource map of TG

Standards-based TeraGrid Data Resources and User Tools
GridFTP (1 hour)
        GridFTP dedicated data transfer nodes
        GridFTP performance capabilities (multiple threads, striped)
        TeraGrid CoPy (tgcp)
        Reliable File Transfer (RFT)
        Other data transfer tools
        Batch moving of data
        Bandwidth delay product
        Performance tuning
        Interoperability testing
        Lessons Learned
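
As a rough illustration of the "Bandwidth delay product" item above, the
following sketch computes how many bytes must be in flight to keep a
wide-area path full, and a per-stream TCP buffer size when a transfer is
split across parallel streams. The link speed, round-trip time and stream
count are hypothetical values chosen for the example, not TeraGrid
measurements, and dividing the product evenly across streams is a
simplifying assumption.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bytes in flight needed to fill the path: bandwidth * RTT / 8."""
    if bandwidth_bits_per_s < 0 or rtt_s < 0:
        raise ValueError("bandwidth and RTT must be non-negative")
    return bandwidth_bits_per_s * rtt_s / 8


def per_stream_buffer_bytes(bdp_bytes: float, parallel_streams: int) -> float:
    """Assumed rule of thumb: split the BDP evenly across parallel streams."""
    if parallel_streams < 1:
        raise ValueError("need at least one stream")
    return bdp_bytes / parallel_streams


if __name__ == "__main__":
    # Hypothetical path: 10 Gb/s link with a 60 ms round-trip time.
    bdp = bandwidth_delay_product_bytes(10e9, 0.060)
    print(f"BDP: {bdp / 1e6:.1f} MB")
    # Hypothetical transfer using 8 parallel streams.
    print(f"Per-stream buffer: {per_stream_buffer_bytes(bdp, 8) / 1e6:.2f} MB")
```

With these assumed numbers the product is about 75 MB, which is why
default TCP buffer sizes are usually too small for long, fast paths.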

Grid-enabled wide area parallel file systems (GPFS, PVFS, Lustre) (1 hour)
        Metadata and dedicated hardware considerations
        Cluster authentication and authorization
        Grid-enabled user identification and UID/GID Mapping
        Bandwidth delay product
        Performance tuning
        Co-scheduling of resources
        Interoperability testing
        Lessons Learned

Archival storage and data collection management (15 mins)
        Archival storage (Unitree, HPSS, SAMQFS, TSM, DXUL)
        Archival storage clients (uberftp, hsi, others)
        Performance tuning
        Data Collection Management Servers (SRB/RLS)
        Data Collection Management (scopy, etc.)
        GridFTP interface to SRB

Database offerings (15 mins)
        Database client access

User Considerations (15 mins)
        Common TeraGrid Software Stack (CTSS) and Environment Variables
        When should I use which tool or approach?

Monitoring (15 mins)
        System Tools
        Diagnosing problems
        Monitoring
        Inca Test Harness

Policy Issues (15 mins)
        Policies per site and how to use the policy command
        Grid-enabled wide area file systems - allocations, quotas, purging
        Data Allocation Committee
        CTSS modification procedures

Case Studies/Usage Scenarios (15 mins)
        BIRN
        ENZO
        NVO
        SCEC

The Future (15 mins)
        GridFTP Futures
        Grid-enabled wide area parallel file system enhancements
        Off-site backups for disaster recovery
        Batch transfer of data
        Grid Data Portals
        Metascheduling, co-scheduling and data workflow for data 

tech_requirements: None. 

prereq_participants: Basic understanding of grid technologies. 

advertise_suggestion: Email lists, web pages, word of mouth 




