[saga-rg] Two more use cases from LSU

Andre Merzky andre at merzky.net
Mon Jul 4 09:24:53 CDT 2005


Zhou, Andrei, 

thanks for the use cases.  I have included them in the UseCase
document.  We will consider them as we start on the
requirements doc (really soon now... :-P )

Thanks again, 

  Andre.


Quoting [Andrei Hutanu] (Jun 26 2005):
> 
> Hi!
> 
> For your consideration, two late (hopefully not too late)
> use cases for SAGA from CCT, one from the UCoMS project,
> and one for interactive visualization.
> 
> Andrei

> SAGA Use Case Template:
> =======================
> 
> Name of use case:   UCoMS Project
> Contact (name and address):    
> 	Zhou Lei (zlei at cct.lsu.edu)
> 	Andrei Hutanu (ahutanu at cct.lsu.edu)
> Authors (if different from contact) : 
> 	Zhou Lei (zlei at cct.lsu.edu)
> 
> 
> 1. General Information:
> -----------------------
> 
>   This section consists of check-boxes to provide some context in
>   which to evaluate the use case.
>   
>   1.1 Which best describes your organization:
>   
>     Industry                     [ ]
>     Academic                     [x]
>     Other                        [ ]
>        Please specify:           ...................................
>                        
>                        
>   1.2 Application area:    
>                        
>     Astronomy                    [ ]
>     Particle physics             [ ]
>     Bio-informatics              [ ]
>     Environmental Sc.            [ ]
>     Image analysis               [ ]
>     Other                        [x]
>        Please specify:           Petroleum engineering applications 
>   
>   
>   1.3 Which of the following apply to or best describe this use case
>       Multiple selections are possible, please prioritize with numbers
>       from 1 (low) to 5 (high):
>   
>     Database                     [ ]
>     Remote steering              [2]
>     Visualization                [4]
>     Security                     [4]
>     Resource discovery           [5]
>     Resource scheduling          [5]
>     Workflow                     [5]
>     Data movement                [5]
>     High Throughput Computing    [ ]
>     High Performance Computing   [5]
>     Other                        [ ]
>         Please specify:          ...................................
>   
>   
>   1.4 Are you an:
>   
>     Application user             [ ]
>     Application developer        [ ]
>     System administrator         [ ]
>     Service developer            [ ]
>     Computer science researcher  [x]
>     Other                        [x]
>         Please specify:          Middleware developer
> 
>         
> 2. Introduction:
> ----------------
>   
>   2.1 Provide a paragraph introduction to your use case.  Background
>       to the project is another alternative. (E.g. 100 words).
> 
>   The UCoMS project aims to develop and deploy a Ubiquitous Computing
>   and Monitoring System (UCoMS) for oil/gas exploitation and
>   management. As a nationally unique research cluster in IT for
>   energy, UCoMS addresses key research issues in wireless network
>   systems, grid computing, and application software. It comprises
>   three cohesive and interrelated project areas which together enable
>   the construction of a useful UCoMS prototype for the discovery and
>   management of energy resources. The proof-of-concept prototype will
>   be developed and deployed on several existing (possibly
>   decommissioned) well platforms in the Gulf of Mexico. The technical
>   solutions will be generally applicable to sensor networks, wireless
>   communications, and grid computing. They will facilitate drilling
>   and operational data logging and processing, on-platform
>   information distribution and display, infrastructure
>   monitoring/intrusion detection, seismic processing and inversion,
>   and management of complex surface facilities and pipelines.
>   Decommissioned well platforms can also be monitored and safeguarded
>   using UCoMS, which may foster new industries in the future.
> 
>   2.2 Is there a URL with more information about the project ?
> 
>   http://www.ucoms.org
>   
>   
> 3. Use Case to Motivate Functionality Within a Simple API:
> ----------------------------------------------------------
> 
> A general application in our use case includes:
> 	Computational resource brokering
> 	Data collection from widely distributed data sources
> 	Task migration and farming 
> 	Result retrieval
> 	Visualization
> We need a simple API for the following operations: staging executables, finding data locations, describing resources, and visualization. The simple API will be the interface to resource brokering, data collection, task migration, and visualization on devices. All of these processes are executed automatically. 
> 
> We may also need a simple API to manage workflow operations. 
> 
>   
> 4. Customers:
> -------------
> 
>   Describe customers of this use case and their needs. In particular,
>   where and how the use case occurs "in nature" and for whom it occurs.  
>   E.g. max 40 words
> 
>   Energy companies, energy analysts, and government agencies
>   
> 
>   
> 5. Involved Resources:
> ----------------------
> 
>   5.1 List all the resources needed: e.g. what hardware, data,
>       software might be involved.
> 
> 	Hardware:
> 	a. Large-scale compute facilities
> 	b. Data storage systems
> 	c. Visualization devices with good graphics support
> 	d. Wireless networks and sensor networks
> 
> 	Software:
> 	a. GAT
> 	b. Globus Toolkit
> 	c. Storage management, such as CCT Archive tool based on GAT or SRB.
> 	d. Petroleum engineering simulation software, such as UTChem, IPARS, VIP, and data processing software, such as Seismic Unix and Delivery.
> 	e. Task farming using CACTUS (or Condor)
> 	f. Triana for workflow
> 
>   
>   5.2 Are these resources geographically distributed?
> 
>   	Absolutely. All parts of the system, such as storage system, compute facilities, and visualization devices, are distributed.
> 
>   
>   5.3 How many resources are involved in the use case?  E.g. how
>       many remote tasks are executing at the same time?
> 
>   Usually, four kinds of resources are involved: compute facilities, sensor networks, data storage systems, and visualization devices. A large number of tasks could be executed simultaneously.
> 
>   
>   5.4 Describe your codes and tools: what sort of license is
>       available, e.g. open or closed source license; what sort 
>       of third party tools and libraries do you use, and what is 
>       their availability;  do you regularly work from source 
>       code, or use pre-compiled applications; what languages 
>       are your applications developed in (if relevant), e.g. 
>       Fortran, C, C++, Java, Perl, or Python.
> 
> In the project, C/C++/Fortran are involved in compute-intensive data analysis. XML is used for data description. Java/Perl are used for system management and monitoring. OpenGL and VRML will be used for visualization and rendering.
> 
> Some of the simulation software packages are open source, such as UTChem, while others require licenses, such as VIP. All middleware that will be used is open source.
> 
>   
>   5.5 What information sources do you require, e.g. certificate
>       authorities, or registries.
> 
> Because of spatial variability, GIS-based information management is needed. Seismic data and well logs are critical. A GSI-based simple CA is used. Since security of data and results is crucial, certificates are important.
> 
>   
>   5.6 Do you use any resources other than traditional compute 
>       or data resources, e.g. telescopes, microscopes, medical 
>       imaging instruments.
> 
>  Yes, because we include IBW, BGM for sensor/wireless devices. 
> 
>   
>   5.7 How often is your application used on the grid or grid-like 
>       systems?
> 
>       [x] Exclusively
>       [ ] Often (say 50-50)
>       [ ] Occasionally on the grid, but mostly stand-alone
>       [ ] Not at all yet, but the plan is to.
> 
> 
> 
>   
> 6. Environment:
> ---------------
> 
>   Provide a description of the environment your scenario runs in,
>   for example the languages used, the tool-sets used, and the user
>   environments (e.g. shell, scripting language, or portal).
> 
> a. C/C++/Fortran for reservoir and drilling simulation, for well logging and processing, and for seismic processing and inversion.
> b. Java/shell/Perl/XHTML/portal for information display in a browser. It should be web-service based.
> c. VRML and OpenGL for visualization.
> d. All kinds of computational Grid middleware, such as GAT, Condor-G, Globus Toolkit, and Triana.
> e. The development IDE may be Eclipse for Java and C/C++.
> f. Commercial petroleum engineering simulation software.
> 
> 
>   
> 7. How the resources are selected:
> ----------------------------------
> 
>   7.1 Which resources are selected by users, which are inherent 
>       in the application, and which are chosen by system
>       administrators, or by other means?  E.g. who is specifying 
>       the architecture and memory to run the remote tasks?  
> 
>  Some resources, such as the large-scale compute facilities and data storage systems, are inherent in our project. Others, such as those for system management, depend on the application.
>   
>   
>   7.2 How are the resources selected? E.g. by OS, by CPU power,
>       by memory, don't care, by cost, frequency of availability
>       of information, size of datasets?
> 
> CPU power, memory, and size of datasets are critical.
>   
>   
>   7.3 Are the resource requirements dynamic or static?
> 
>   Basically, resource requirements are static. It is decided in advance what resources a particular task will require.
>   
> 
>   
> 8. Security Considerations:
> ---------------------------
> 
>   8.1 What things are sensitive in this scenario: executable code, 
>       data, computer hardware?  I.e. at what level are security 
>       measures used to determine access, if any?
> 
>   	Commercial level security.
> 
>   
>   8.2 Do you have any existing security framework, e.g. Kerberos 5,
>       Unicore, GSI, SSH, smartcards?
>   
> 	GSI and SSH
>   
>   
>   8.3 What are your security needs: authentication, authorization,
>       message protection, data protection, anonymisation, audit 
>       trail, or others?
> 
> 	Authentication is our first concern. We also need authorization and data protection.
>   
>   8.4 What are the most important issues which would simplify your
>       security solution?  Simple API, simple deployment, integration
>       with commodity technologies.  
> 
> 	Simple API, deployment, and integration with commodity technologies.
>   
> 
>   
> 9. Scalability:
> ---------------
> 
>   What are the things which are important to scalability and to what
>   scale - compute resources, data, networks ?
> 
>   Compute resources, network, and data.
> 
> 
>   
> 10. Performance Considerations:
> -------------------------------
> 
>   Explain any relevant performance considerations of the use case.
> 
> High performance is very important for data processing such as seismic processing and inversion, reservoir simulation, and real-time drilling control. This requires high-performance compute facilities and high-speed access to data storage.
> 
> 
>   
> 11. Grid Technologies currently used:
> -------------------------------------
> 
>   If you are currently using or developing this scenario, which grid
>   technologies are you using or considering?
> 
>   Firstly, we use GAT over Globus Toolkit, Condor, and other data management services.
>   Secondly, we would like to use other grid middleware, such as VDT, SRB, DRMAA, IBP, and NWS/Ganglia, with GAT.
> 
>   
>   
> 12. What Would You Like an API to Look Like?
> --------------------------------------------  
>   
>   Suggest some functions and their prototypes which you would like
>   in an API which would support your scenario.
> 
>   Here is a scenario in pseudo code:
> 	// data collection
> 	const char *sourceData[] = {"protocol://hostname:000/data1",
> 				"protocol://hostname2:000/data2",
> 				"protocol://hostname3:000/data"
> 				};
> 	DataLocation dl = provisionData(sourceData);
> 	JobDescription job = createJobInstance(hardwareRequirement,
> 						softwareRequirement,
> 						executable_Location,
> 						dl,
> 						output_location,
> 						errorno);
> 	// resource brokering and data movement happen automatically
> 	SubmitJob(job);
> 
> 	// Job checkpointing and migration may happen in the background.
> 
> 	// optional, for visualization
> 	PostProcessForRender(output_location);  
> 
> 
>   Also, we may need workflow control through a simple API.
> 
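The UCoMS pseudo code above could be fleshed out roughly as follows. This is only a sketch under assumptions: `DataLocation`, `JobDescription`, `JobState`, `provisionData`, `createJobInstance`, `submitJob` and `runScenario` are hypothetical in-memory stand-ins that borrow their names from the scenario. They are not part of GAT, SAGA, or any existing library, and no brokering or data movement actually takes place. The hardware and software strings are placeholder values.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins mirroring the names used in the pseudo code above.
// Nothing here talks to a real grid service.
struct DataLocation {
    std::vector<std::string> urls;  // sources the job will read from
};

struct JobDescription {
    std::string hardwareRequirement;
    std::string softwareRequirement;
    std::string executableLocation;
    DataLocation input;
    std::string outputLocation;
};

enum class JobState { Rejected, Submitted };

// Bundles the source URLs into one handle. A real implementation would
// also locate replicas and stage the data.
DataLocation provisionData(const std::vector<std::string>& sources) {
    return DataLocation{sources};
}

JobDescription createJobInstance(const std::string& hardware,
                                 const std::string& software,
                                 const std::string& executable,
                                 const DataLocation& input,
                                 const std::string& output) {
    return JobDescription{hardware, software, executable, input, output};
}

// Stub broker: accepts any job that names an executable and some input.
JobState submitJob(const JobDescription& job) {
    if (job.executableLocation.empty() || job.input.urls.empty()) {
        return JobState::Rejected;
    }
    return JobState::Submitted;
}

// Walks through the scenario once with placeholder URLs and requirements.
JobState runScenario() {
    DataLocation dl = provisionData({"protocol://hostname:000/data1",
                                     "protocol://hostname2:000/data2"});
    JobDescription job = createJobInstance("64 CPUs", "UTChem",
                                           "protocol://hostname:000/bin/sim",
                                           dl, "protocol://hostname3:000/out");
    assert(job.input.urls.size() == 2);
    return submitJob(job);
}
```

Keeping the job description a plain value makes it easy to build once and hand to whichever broker ends up doing the work.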
>   
> 13. References:
> ---------------
> 
>   List references for further reading.
> 
>   	http://www.ucoms.org
> 	http://www.gridlab.org
> 	http://www.lgc.com
> 
> 
> 

>  SAGA Use Case Template:
>   =======================
>   
>   Name of use case:        Interactive Visualization Services
>   Contact (name and address):         
> 			   Andrei Hutanu        <ahutanu at cct.lsu.edu>
>   
>   
>   1. General Information:
>   -----------------------
>   
>     This section consists of check-boxes to provide some context in
>     which to evaluate the use case.
>     
>     1.1 Which best describes your organization:
>     
>       Industry                     [ ]
>       Academic                     [X]
>       Other                        [ ]
>          Please specify:           ................................
>                          
>                          
>     1.2 Application area:    
>                          
>       Astronomy                    [ ]
>       Particle physics             [ ]
>       Bio-informatics              [ ]
>       Environmental Sc.            [ ]
>       Image analysis               [ ]
>       Other                        [X]
>          Please specify:           Distributed visualization, Grid computing
>     
>     
>     1.3 Which of the following apply to or best describe this use
>         case? Multiple selections are possible; please prioritize 
>         with numbers from 1 (low) to 5 (high):
>     
>       Database                     [ ]
>       Remote steering              [4]
>       Visualization                [5]
>       Security                     [1]
>       Resource discovery           [4]
>       Resource scheduling          [4]
>       Workflow                     [ ]
>       Data movement                [3]
>       High Throughput Computing    [ ]
>       High Performance Computing   [ ]
>       Other                        [ ]
>           Please specify:          ................................
>     
>     
>     1.4 Are you an:
> 
>       Application user             [ ]
>       Application developer        [ ]
>       System administrator         [ ]
>       Service developer            [ ]
>       Computer science researcher  [X]
>       Other                        [ ]
>           Please specify:          ................................
>     
>   
>           
>   2. Introduction:
>   ----------------
>     
>     2.1 Provide a paragraph introduction to your use case.
>         Background to the project is another alternative. 
>         (E.g. 100 words).
> 	
> 	Interactive visualization of large datasets needs resources.
> 	The memory, CPU and GPU cycles, software licenses, 
> 	display and interaction resources needed to
> 	manage the ever-growing data sizes are not available
> 	on every researcher's desktop. 
> 	
> 	Advances in Grid computing technologies through
> 	projects like Globus, Condor, Unicore and Avaki,
> 	to name just a few, have made the sharing
> 	and aggregation of computational resources possible.
> 	Compute clusters connected by fast optical networks
> 	can be used together to solve larger and more complicated problems
> 	than the ones that can be solved by the largest supercomputer
> 	in the world.
> 	
> 	Unfortunately, the visualization component
> 	needed to analyse the generated output is lagging behind.
> 	Part of the problem is that there are many very complex grid 
> 	tools that can be used to distribute a visualization application.
> 	Both the complexity and diversity of these tools
> 	are barriers in the way of getting them adopted by
> 	developers of visualization tools.
> 	
> 	Collaborative visualization involving any type of sharing 
> 	between two or more users adds new requirements
> 	to the visualization of large data sets. 
> 	
>   
>     2.2 Is there a URL with more information about the project ?
>   
> 	Currently only : http://www.cct.lsu.edu/Visualization/VizServiceDemo/
>         
>   3. Use Case to Motivate Functionality Within a Simple API:
>   ----------------------------------------------------------  
>     
>     Provide a scenario description to explain customers' needs.
>     E.g.  "move a file from A to B," "start a job."
>     
>     Please include figures if possible.
>   
>     If your use case requires multiple components of functionality,
>     please  provide separate descriptions for each component,
>     bullet points of 50 words per functionality are acceptable.
>     
>     * A simulation has produced a large dataset ( > 500 Gb)
>       and stored it in a high-performance data archive on a remote machine A.
>       We have two users, B and C, using mid-range visualization hardware
>       (dual-processor, high-end graphics card, etc.),
>       large displays and fast network connections (e.g. 10 Gbit/s).
>       The users and the data are in three distinct locations,
>       and a high-end visualization machine D is 
>       accessible to both of them. The users want to start and share
>       a visualization session on the above-mentioned data.
>       A, B, C and D are interconnected by optical networks.
> 
>       The proposed scenario looks as follows:
>       One of the users decides to start the visualization software on D.
>       The visualization software will need to spawn a process on a 
>       machine E "near" the data archive. 
>       This will be a data service used to load selected data of interest for the 
>       visualization session as requested by the users. On D (the 
>       machine where the visualization software is running),
>       data flowing from A through E is transformed into images and 
>       a video signal, and using
>       specialized hardware and software (to avoid read-back
>       from the graphics card) the video signal is transformed into 
>       HD digital video streams.
>       The users consume the video streams and share the interaction
>       with the visualization service.
>       The networks between A, B, C, D, and E
>       have to be provisioned before or at the time 
>       the visualization job is started. The networks, compute
>       and visualization resources are part of a complex
>       resource description that is used to submit the visualization job.
> 
>       Simple APIs are required for the following operations:
> 	* Scheduling and submitting the visualization job
> 	  on machines D and E and on the networks interconnecting
> 	  all the machines
> 	* Accessing the data from the data source A on the data service E
> 	* Asynchronous interprocess communication between D and E, and between D and each of B and C
>     
>     * A very similar scenario is valid for one user only. User B
>       would like to use the high-end visualization machine D
>       to interactively render more data than he is able to 
>       render on his local machine. The data service running on machine E
>       performs compute- or memory-intensive operations
>       that cannot be performed on the local machine.
> 
>   4. Customers:
>   -------------
>   
>     Describe customers of this use case and their needs. In
>     particular, where and how the use case occurs "in nature" and
>     for whom it occurs.  E.g. max 40 words
>   
>     Users that want to collaboratively or individually visualize
>     data so large that it cannot be interactively visualized on their
>     local machines.
>   
>     
>   5. Involved Resources:
>   ----------------------
>   
> 
>     5.1 List all the resources needed: e.g. what hardware, data,
>         software might be involved.
> 
> 	The data source is a large-capacity storage system;
> 	the data service potentially runs on multiple
> 	machines. In the simplest case, the data service serves as
> 	a cache for the data; in more complex scenarios it
> 	uses CPU power to compute visualization data 
> 	on the fly. The visualization service
> 	runs on a visualization machine with at least
> 	one graphics pipe (potentially multiple cards).
> 	  
>     
>     5.2 Are these resources geographically distributed?
>   
> 	Yes, all the components are on separate machines
> 	interconnected by high-speed networks.
>   
>     
>     5.3 How many resources are involved in the use case?  E.g. how
>         many remote tasks are executing at the same time?
>   
> 	At least two (data service and visualization service)
> 	but these can use multiple processors.  
>     
>     5.4 Describe your codes and tools: what sort of license is
>         available, e.g. open or closed source license; what sort of
>         third party tools and libraries do you use, and what is
>         their availability;  do you regularly work from source
>         code, or use pre-compiled applications; what languages are
>         your applications developed in (if relevant), e.g.
>         Fortran, C, C++, Java, Perl, or Python.
>   
> 	C/C++ visualization applications. We are looking
> 	for a general framework that can be applied to 
> 	a wide range of applications. The framework 
> 	will be open source, and we work from source code.
> 	We use a SOAP engine for inter-task communication.  
>     
>     5.5 What information sources do you require, e.g. certificate
>         authorities, or registries.
>   
> 	Need information about the resources.
>   
>     
>     5.6 Do you use any resources other than traditional compute 
>         or data resources, e.g. telescopes, microscopes, medical 
>         imaging instruments.
>   
>     	Not currently.
>   
>     
>     5.7 How often is your application used on the grid or grid-like
>         systems?
>   
>         [X] Exclusively
>         [ ] Often (say 50-50)
>         [ ] Occasionally on the grid, but mostly stand-alone
>         [ ] Not at all yet, but the plan is to.
>   
>   
>   
>     
>   6. Environment:
>   ---------------
>   
>     Provide a description of the environment your scenario runs in,
>     for example the languages used, the tool-sets used, and the
>     user environments (e.g. shell, scripting language, or portal).
>   
>     Shell. Portal interaction is possible to a limited extent.
>   
>     
>   7. How the resources are selected:
>   ----------------------------------
>   
>     7.1 Which resources are selected by users, which are inherent
>        in the application, and which are chosen by system
>        administrators, or by other means?  E.g. who is specifying 
>        the architecture and memory to run the remote tasks?  
>   
>     	The task of this service framework is
> 	to automatically select a set of resources
> 	that makes the solving of the task possible.
> 	If there are many choices, user selection
> 	is possible.
>     
>     
>     7.2 How are the resources selected? E.g. by OS, by CPU power, 
>         by memory, don't care, by cost, frequency of availability 
>         of information, size of datasets?
>   
> 	Network connectivity, memory and CPU.
>     
>     
>     7.3 Are the resource requirements dynamic or static?
>   	
> 	Once the resources are selected they shouldn't need
> 	to be changed.
>   
>     
>   8. Security Considerations:
>   ---------------------------
>   
>     8.1 What things are sensitive in this scenario: executable
>         code, data, computer hardware?  I.e. at what level are
>         security measures used to determine access, if any?
>   
> 	Data might be sensitive, as well as software resources
> 	(licenses).  Hardware resources are always sensitive.
> 	
>   
>     8.2 Do you have any existing security framework, e.g.
>         Kerberos 5, Unicore, GSI, SSH, smartcards?
>     
> 	Using GSI in various components.
> 
>     
>     8.3 What are your security needs: authentication,
>         authorisation, message protection, data protection,
>         anonymisation, audit trail, or others?
>   	
> 	Authentication, authorization, and an audit trail are needed
> 	for all the components. Data protection
> 	is possibly needed.   
>     
>     8.4 What are the most important issues which would simplify
>         your security solution?  Simple API, simple deployment,
>         integration with commodity technologies.  
>   	
> 	User-friendliness. User should only input a password
> 	requested by the application when needed. An
> 	operation should not fail until it explicitly asks the user
> 	for the required credential information.
>   
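The requirement in 8.4 (prompt for a credential only when an operation needs one, and fail only after asking) could look roughly like the sketch below. Everything in it is an assumption made for illustration: `CredentialPrompt`, `Session`, `openRemoteFile` and `promptsNeededForTwoOperations` are hypothetical names, the passphrase is a placeholder, and no real GSI call is made.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical prompt: fills in the secret and returns true, or returns
// false if the user declines to answer.
using CredentialPrompt = std::function<bool(std::string& secret)>;

class Session {
public:
    explicit Session(CredentialPrompt prompt) : prompt_(std::move(prompt)) {}

    // Stand-in for any protected operation (file access, job submission).
    bool openRemoteFile(const std::string& url) {
        if (!ensureCredential()) return false;  // fails only after asking
        lastOpened_ = url;
        return true;
    }

    const std::string& lastOpened() const { return lastOpened_; }

private:
    // Prompts only when no secret is cached; a successful answer is reused
    // by every later operation.
    bool ensureCredential() {
        if (!secret_.empty()) return true;
        std::string entered;
        if (!prompt_ || !prompt_(entered) || entered.empty()) return false;
        secret_ = std::move(entered);
        return true;
    }

    CredentialPrompt prompt_;
    std::string secret_;
    std::string lastOpened_;
};

// Runs two protected operations and reports how often the user was asked.
int promptsNeededForTwoOperations() {
    int prompts = 0;
    Session s([&prompts](std::string& secret) {
        ++prompts;
        secret = "example-passphrase";  // placeholder, not a real credential
        return true;
    });
    bool ok = s.openRemoteFile("protocol://hostname:000/data1") &&
              s.openRemoteFile("protocol://hostname2:000/data2");
    assert(ok);
    return prompts;
}
```

Passing the prompt in as a callback keeps the session independent of whether the question is asked on a shell or through a portal.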
>     
>   9. Scalability:
>   ---------------
>   
>     What are the things which are important to scalability and to
>     what scale - compute resources, data, networks ?
>   
>     	Data size can scale indefinitely. Network capacity
> 	up to the current optical network capacities -- tens of
> 	Gbit/s. Compute resources -- up to 5 components
> 	in the entire scenario.
>   
>   
>     
>   10. Performance Considerations:
>   -------------------------------
>   
>     Explain any relevant performance considerations of the use
>     case.
>   
> 	Asynchronous transfer is required for efficient
> 	bandwidth usage (we need the ability to pipeline
> 	multiple transport operations). 
> 	The network transfer mechanism must scale
> 	up to the fastest networks currently available.
> 	It should be possible to trade security for
> 	performance. 
> 	The API for remote file access should not limit
> 	the performance of the storage system.
>   
>     
>   11. Grid Technologies currently used:
>   -------------------------------------
>   
>     If you are currently using or developing this scenario, which
>     grid technologies are you using or considering?
>   
>     	Web (Grid) services, grid job submission.
>     
>     
>   12. What Would You Like an API to Look Like?
>   --------------------------------------------  
>     
>     Suggest some functions and their prototypes which you would
>     like in an API which would support your scenario.
> 
>     For data transport:
>   
> 	// Starts a send operation. The callback is called at
> 	// the end of the operation.
> 	AsyncResult beginSend (void* buf, int bufsize, AsyncResultCallback cb, void* userData);
> 
> 	// Releases resources; blocks if called before the callback.
> 	void endSend (AsyncResult handle);
> 
> 	AsyncResult beginReceive (AsyncResultCallback cb, void* userData);
> 
> 	Block* endReceive (AsyncResult handle);
> 
> 	void releaseBlock (Block* block);
> 
> 	AsyncResult beginConnect (AsyncResultCallback cb, void* userData);
> 
> 	void endConnect (AsyncResult handle);
> 
> 	class AsyncResult {
> 	public:
> 		// cancel the operation
> 		void cancel ();
> 	};
> 
> 	typedef struct Block {
> 		const void* const buffer;
> 		size_t size;
> 	} Block;
> 
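To make the begin/end contract above more concrete, here is a minimal single-threaded sketch. It rests on assumptions: `LoopbackChannel`, `AsyncHandle`, `poll`, `bytesOnWire` and `sendPipelined` are hypothetical names, the "wire" is just a byte vector in memory, and operations run from a queue rather than a network stack. Only the shape of `beginSend`/`endSend` and `cancel` follows the prototypes in the use case.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical completion record shared by the caller and the channel.
struct AsyncResult {
    bool done = false;
    bool cancelled = false;
    std::size_t bytes = 0;  // bytes actually transferred
    void cancel() { if (!done) cancelled = true; }
};
using AsyncHandle = std::shared_ptr<AsyncResult>;
using AsyncResultCallback = std::function<void(const AsyncResult&, void* userData)>;

// In-memory stand-in for a connection.
class LoopbackChannel {
public:
    // Queues the send and returns at once. The buffer must stay valid
    // until the operation completes.
    AsyncHandle beginSend(const void* buf, std::size_t size,
                          AsyncResultCallback cb, void* userData) {
        AsyncHandle handle = std::make_shared<AsyncResult>();
        pending_.push_back([this, handle, buf, size, cb, userData] {
            if (!handle->cancelled) {
                const char* p = static_cast<const char*>(buf);
                wire_.insert(wire_.end(), p, p + size);
                handle->bytes = size;
            }
            handle->done = true;
            if (cb) cb(*handle, userData);  // also called after a cancel
        });
        return handle;
    }

    // Runs one queued operation; returns false if none was waiting.
    bool poll() {
        if (pending_.empty()) return false;
        std::function<void()> op = std::move(pending_.front());
        pending_.pop_front();
        op();
        return true;
    }

    // Blocks (by draining the queue) until this operation is done.
    std::size_t endSend(const AsyncHandle& handle) {
        while (!handle->done && poll()) {}
        return handle->bytes;
    }

    std::size_t bytesOnWire() const { return wire_.size(); }

private:
    std::deque<std::function<void()>> pending_;
    std::vector<char> wire_;
};

// Pipelining: start every send before waiting for any of them.
std::size_t sendPipelined(LoopbackChannel& ch,
                          const std::vector<std::vector<char>>& blocks) {
    std::vector<AsyncHandle> handles;
    for (const auto& b : blocks) {
        handles.push_back(ch.beginSend(b.data(), b.size(), nullptr, nullptr));
    }
    std::size_t total = 0;
    for (const auto& h : handles) {
        total += ch.endSend(h);
        assert(h->done);
    }
    return total;
}
```

A real transport would complete operations from the network layer instead of from `poll`, but the caller-side pattern (several `beginSend` calls in flight, then `endSend` on each handle) would stay the same.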
> 
>     For job submission:
> 	
> 	No detailed API but should be able to specify
> 	something like:
> 	//Unspecified resource -- only requirements
> 	ResourceDescription A = ResourceDescription(10 CPU, 40 Gbyte RAM, X Gflops);
> 	ResourceDescription B = ResourceDescription(4 GraphicPipes, 4 Gbyte Video Ram) + A;
> 	//semi-specified resource : One of the machines that
> 	//holds a copy of my data file
> 	ResourceDescription C = ResourceDescription(LogicalFile("my_interesting_file"));
> 	//specified resource : the two users
> 	ResourceDescription D = ResourceDescription("client1.lsu.edu");
> 	ResourceDescription E = ResourceDescription("client2.lsu.edu");
> 	//hybrid resource .. some specified, some unspecified resources
> 	//that need to be connected by a 10Gbit/s network
> 	ResourceDescription F = ResourceDescription(Network(A, B, C, D, E, F, 10 Gbit/s)) +
> 				
> 	JobSubmit(myJob, F);    
> 
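One possible way to model the composable resource descriptions above is sketched here. It is an illustration under assumptions, not a proposed API: `NodeRequirement`, `NetworkRequest`, `computeNode`, `graphicsNode`, `holderOf`, `namedHost`, `connectAll` and `buildRequest` are hypothetical names, and treating "+" as "both sets of requirements on one machine, numeric fields added" is my own reading of the pseudo code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical requirements for a single machine; field names are
// illustrative only.
struct NodeRequirement {
    int cpus = 0;
    int ramGB = 0;
    int graphicsPipes = 0;
    std::string host;         // set only for a fully specified machine
    std::string logicalFile;  // set when the machine must hold this file
};

NodeRequirement computeNode(int cpus, int ramGB) {
    NodeRequirement r;
    r.cpus = cpus;
    r.ramGB = ramGB;
    return r;
}

NodeRequirement graphicsNode(int pipes) {
    NodeRequirement r;
    r.graphicsPipes = pipes;
    return r;
}

NodeRequirement holderOf(const std::string& logicalFile) {
    NodeRequirement r;
    r.logicalFile = logicalFile;
    return r;
}

NodeRequirement namedHost(const std::string& host) {
    NodeRequirement r;
    r.host = host;
    return r;
}

// Assumption: "+" combines two sets of requirements for the same machine.
// Numeric fields are added; the first non-empty host and file win.
NodeRequirement operator+(NodeRequirement a, const NodeRequirement& b) {
    a.cpus += b.cpus;
    a.ramGB += b.ramGB;
    a.graphicsPipes += b.graphicsPipes;
    if (a.host.empty()) a.host = b.host;
    if (a.logicalFile.empty()) a.logicalFile = b.logicalFile;
    return a;
}

// Several machines that must be linked at a given bandwidth.
struct NetworkRequest {
    std::vector<NodeRequirement> endpoints;
    int gbps = 0;
};

NetworkRequest connectAll(std::vector<NodeRequirement> endpoints, int gbps) {
    NetworkRequest n;
    n.endpoints = std::move(endpoints);
    n.gbps = gbps;
    return n;
}

// Rebuilds the request from the use case: unspecified, semi-specified and
// fully specified machines joined by a 10 Gbit/s network.
NetworkRequest buildRequest() {
    NodeRequirement a = computeNode(10, 40);
    NodeRequirement b = graphicsNode(4) + a;
    NodeRequirement c = holderOf("my_interesting_file");
    NodeRequirement d = namedHost("client1.lsu.edu");
    NodeRequirement e = namedHost("client2.lsu.edu");
    NetworkRequest f = connectAll({a, b, c, d, e}, 10);
    assert(f.endpoints.size() == 5);
    return f;
}
```

Separating per-machine requirements from the network request keeps "what each machine must offer" apart from "which machines must be connected", which is how the scenario describes them.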
>   13. References:
>   ---------------
>   
>     List references for further reading.
>   
>     	http://wiki.cct.lsu.edu/wiki/space/Andrei+Hutanu/Prohaska_et_al_MardiGras2005.pdf
>   	http://www.cct.lsu.edu/Visualization/VizServiceDemo/
> 	http://www.zib.de/visual/projects/gridlab/hdf5/
> 	more to come..
> 




-- 
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky at cs.vu.nl       |
| De Boelelaan 1083a                | www:  http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+




