[saga-rg] Use Case Mapping...
Andre Merzky
andre at merzky.net
Thu Sep 8 03:09:22 CDT 2005
Hi all,
below are some notes for mapping the current SAGA API spec
to one of the use cases, i.e. the GridLab use case for
application migration. I attach the use case for reference
as well.
Cheers, Andre.
+-----------------------------------------------------------------+
The SAGA API allows migrating any job it can handle with the
job class, using the migrate method. That provides an easy
solution for the GridLab migration use case if supported by the
implementation/middleware/backend:
--------------------------------------------------------------
#include <saga.hpp>

#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main ()
{
  // start the application on the original resource
  saga::job_server js;
  saga::job        j = js.run_job ("remote.host.net", "my_app");

  // fetch the job description, then point it to the new host
  // and to the files which are to be staged there
  saga::job_definition jd = j.get_job_definition ();

  vector <string> hosts;
  vector <string> files;

  hosts.push_back (string ("near.host.net"));
  files.push_back (string
    ("http://remote.host.net/file > http://near.host.net/file"));

  jd.set_vector_attribute ("SAGA_HostList",     hosts);
  jd.set_vector_attribute ("SAGA_FileTransfer", files);

  // migrate the running job according to the new description
  j.migrate (jd);

  cout << "Heureka!" << endl;

  return (0);
}
--------------------------------------------------------------
(Question: does the SAGA migrate call move checkpoint files
automatically, or do they need to be specified in the new
job description as above?)
However, for the complete use case to be implemented on
application level, a number of steps cannot be implemented in
SAGA. The call sequence would be:
In the application instance which performs the migration on
the other job:
- trigger migration for the remote job
- discover new resource
+ move checkpoint data to new resource
+ schedule application on new resource
+ continue computation (and discontinue old job)
In the application instance which gets migrated
- get triggered for checkpointing
= perform application level checkpointing
- report checkpoint file location(s)
Items marked with + are possible to implement in SAGA, items
marked with - aren't. The item marked with '=' is (currently)
not related to SAGA.
For the complete implementation of the use case, SAGA misses:
1) means to communicate with the remote application instance
2) means to discover new resources
Notes:
1) means of communication are actually given, but not per se
usable for this use case. E.g. streams are a definite
overkill for signalling checkpointing requests. Signals
(as in job.signal (int signal)) would work, but only if the
remote job uses signal handling as a checkpoint trigger.
That also might be difficult to use if the job is running
in a wrapper script, or in a virtual machine etc - that
might not be transparent to SAGA, and would require direct
communications. Also, the signalling method misses
feedback about success of the operation, and cannot return
information such as the location of checkpoint files.
2) the current SAGA API covers job submission to specific
hosts, or lets the middleware choose a suitable host for
submission. However, the brokering result is not exposed
on API level, as would be necessary for this specific use
case, and possibly for other dynamically active Grid
applications.
One way to implement that is to provide a direct interface
to Grid information systems, and in that way expose
information about available resources. That would actually
be more flexible, as it e.g. also allows the discovery of
specific services, but would also require additional
semantic knowledge on application level.
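
For illustration, the signal based trigger from note 1) could
look roughly like the following on the side of the migrated job.
This is a minimal sketch under assumptions, not part of SAGA:
the function names are made up for this example, and SIGUSR1 is
merely assumed to be the signal the migrating instance sends
via job.signal(). It also shows the limitation named above: the
handler can only record the request, it cannot report success
or checkpoint file locations back to the sender.

```cpp
#include <csignal>

namespace
{
  // Set by the signal handler, read by the main loop.
  // sig_atomic_t is the type which is safe to write in a handler.
  volatile std::sig_atomic_t checkpoint_requested = 0;
}

// Signal handler: only records the request. The checkpoint itself
// has to be written from the main loop, not from within the handler.
extern "C" void on_checkpoint_signal (int)
{
  checkpoint_requested = 1;
}

// Register the handler. SIGUSR1 (POSIX) is an assumption here, any
// signal agreed upon by both application instances would do.
void install_checkpoint_trigger ()
{
  std::signal (SIGUSR1, on_checkpoint_signal);
}

// To be polled by the simulation, e.g. once per iteration. Returns
// true exactly once for each recorded checkpoint request.
bool consume_checkpoint_request ()
{
  if ( ! checkpoint_requested )
  {
    return false;
  }

  checkpoint_requested = 0;
  return true;
}
```

The application would call install_checkpoint_trigger() once at
startup, and would write its checkpoint whenever
consume_checkpoint_request() returns true.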
+-----------------------------------------------------------------+
--
+-----------------------------------------------------------------+
| Andre Merzky | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science | mail: merzky at cs.vu.nl |
| De Boelelaan 1083a | www: http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands | |
+-----------------------------------------------------------------+
-------------- next part --------------
SAGA Use Case Template:
=======================
Name of use case: Application Migration
Contact: Andre Merzky <merzky at cs.vu.nl>
1. General Information:
-----------------------
This section consists of check-boxes to provide some context in
which to evaluate the use case.
1.1 Which best describes your organisation:
Industry [ ]
Academic [x]
Other [ ]
Please specify: ................................
1.2 Application area:
Astronomy [ ]
Particle physics [ ]
Bio-informatics [ ]
Environmental Sc. [ ]
Image analysis [ ]
Other [ ]
Please specify: astrophysics, but the use case is generic
1.3 Which of the following apply to or best describe this use
case? Multiple selections are possible; please prioritize
with numbers from 1 (low) to 5 (high):
Database [ ]
Remote steering [3]
Visualization [1]
Security [1]
Resource discovery [5]
Resource scheduling [5]
Workflow [3]
Data movement [5]
High Throughput Computing [ ]
High Performance Computing [1]
Other [ ]
Please specify: ................................
1.4 Are you an:
Application user [ ]
Application developer [ ]
System administrator [ ]
Service developer [ ]
Computer science researcher [ ]
Other [ ]
Please specify: Middleware Developer (higher levels)
2. Introduction:
----------------
2.1 Provide a paragraph introduction to your use case.
Background to the project is another alternative.
(E.g. 100 words).
One of the major scenarios targeted by the GridLab project is
the ability to migrate a running application in a VO. The
migration process may get triggered by various means:
- running out of time on the original resource
- a more powerful resource becomes available
- a resource with more memory or local disk space is needed
- user prefers a different resource and triggers migration
- migration as part of a larger work flow scenario
The migration includes the following well-defined steps:
- trigger migration
- discover new resource
- perform application level checkpointing
- move checkpoint data to new resource
- schedule application on new resource
- continue computation (and discontinue old job)
Several of these operations need to be done on application
level - the use case specifically describes those operations
with respect to a Grid API.
2.2 Is there a URL with more information about the project ?
http://www.gridlab.org/
3. Use Case to Motivate Functionality Within a Simple API:
----------------------------------------------------------
Provide a scenario description to explain customers' needs.
E.g. "move a file from A to B," "start a job."
Please include figures if possible.
If your use case requires multiple components of functionality,
please provide separate descriptions for each component,
bullet points of 50 words per functionality are acceptable.
Following the list from 2.1:
- trigger migration
If the application triggers the migration process itself,
it needs means to communicate with the resource management
system it got started with, or with any other one which
knows about its execution environment requirements (exe,
input files, output files... -> job description). The
request basically is:
rms = Grid.getResourceManagementSystem ();
self = rms.getMyJobDescription ();
// perform checkpoint
// save state
If the application migration gets triggered from outside
the application, the application needs to have means to
get notified about this - it needs to know when to perform
checkpointing and to shut down. There are many ways to do
that - application steering like mechanisms seem the most
convenient ones:
sub mycallback (userdata) {
// perform checkpoint
// save state
}
result = Grid.announceCheckpointCallback (mycallback,
userdata);
- discover new resource
If that operation is not performed by the resource
management system itself, the application needs to
discover new resources on which it can run. It
needs to provide its own job description.
host = GridResourceManager.discoverNewHost (self);
- perform application level checkpointing
The checkpointing process itself does not need Grid
support per se, but the application needs to be able to
announce the location of its checkpoint files. These
could be put into a replica catalog, or onto a global
file system - but the resource manager needs to know
about them, in order to make them available on the new
resource:
app.checkpoint (filename);
grid.replicaCatalog.addFile (replicaname, filename);
rms.announceCheckpointFile (replicaname);
- move checkpoint data to new resource
If that operation is not performed by the resource
management system itself, the application needs to be
able to migrate its checkpoint files to the new resource:
grid.copyFiles (filename, host);
or
grid.replicate (replicaname, host);
- schedule application on new resource
If that operation is not performed by the resource
management system itself, the application needs to be
able to start a copy of itself on the remote resource:
copy = GridResourceManager.runJobOnHost (self, host);
- continue computation (and discontinue old job)
Both are straightforward.
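
The steps above, for the case that the application drives the
migration itself, can be sketched in C++ as follows. This is an
illustration under assumptions only: grid_backend, fake_backend,
migration_result and migrate() are hypothetical names which
loosely follow the pseudo code of this section, they do not
belong to GAT, SAGA, or any other existing API. Triggering and
checkpointing are assumed to have happened already, and to have
produced the list of checkpoint files.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical Grid-facing interface, covering the three steps
// which need Grid support. Not part of any real API.
struct grid_backend
{
  virtual ~grid_backend () {}

  // returns a host matching the job description, or "" if none
  virtual std::string discover_new_host (const std::string & job_description) = 0;

  virtual bool copy_file       (const std::string & file,
                                const std::string & host) = 0;
  virtual bool run_job_on_host (const std::string & job_description,
                                const std::string & host) = 0;
};

// Outcome of one migration attempt.
enum migration_result { MIGRATED, NO_HOST, COPY_FAILED, START_FAILED };

// Discover a new resource, move the checkpoint data there, and
// schedule the application on it. Stops at the first failing step,
// so that the old job can simply continue to run.
migration_result migrate (grid_backend                    & grid,
                          const std::string               & job_description,
                          const std::vector <std::string> & checkpoint_files)
{
  const std::string host = grid.discover_new_host (job_description);

  if ( host.empty () )
  {
    return NO_HOST;
  }

  for ( std::size_t i = 0; i < checkpoint_files.size (); ++i )
  {
    if ( ! grid.copy_file (checkpoint_files[i], host) )
    {
      return COPY_FAILED;
    }
  }

  if ( ! grid.run_job_on_host (job_description, host) )
  {
    return START_FAILED;
  }

  return MIGRATED;  // the caller may now discontinue the old job
}

// In-memory stand-in for a real backend, for trying out the sketch.
struct fake_backend : grid_backend
{
  std::string               free_host;   // host to report, "" = none
  std::vector <std::string> copied;      // files copied so far
  std::string               started_on;  // host the job was started on

  std::string discover_new_host (const std::string &)
  {
    return free_host;
  }

  bool copy_file (const std::string & file, const std::string &)
  {
    copied.push_back (file);
    return true;
  }

  bool run_job_on_host (const std::string &, const std::string & host)
  {
    started_on = host;
    return true;
  }
};
```

With a fake_backend whose free_host is set to "near.host.net",
migrate() returns MIGRATED after copying all checkpoint files.
With an empty free_host it returns NO_HOST and copies nothing.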
4. Customers:
-------------
Describe customers of this use case and their needs. In
particular, where and how the use case occurs "in nature" and
for whom it occurs. E.g. max 40 words
The customers of the use case are scientific communities
with jobs
a) running for a very long time (~weeks)
b) with varying computing demands (peaks requiring more
powerful resource, or more disk space)
c) which are part of larger dynamic systems
Grand Challenge Simulations are specific target applications
for that use case.
5. Involved Resources:
----------------------
5.1 List all the resources needed: e.g. what hardware, data,
software might be involved.
- compute resources
- data storage systems
- resource management systems
- data replication/movement systems
- remote steering or monitoring systems
5.2 Are these resources geographically distributed?
potentially yes.
5.3 How many resources are involved in the use case? E.g. how
many remote tasks are executing at the same time?
minimum: 2, maximum: unlimited, only one compute resource
at the same time.
5.4 Describe your codes and tools: what sort of license is
available, e.g. open or closed source license; what sort of
third party tools and libraries do you use, and what is
their availability; do you regularly work from source
code, or use pre-compiled applications; what languages are
your applications developed in (if relevant), e.g.
Fortran, C, C++, Java, Perl, or Python.
Application: C/Fortran code, open source
http://www.cactuscode.org
API: C api binding to Grid Services, open source
http://www.gridlab.org/gat/
Services: C and Java Services, open source, mostly based
on Globus
http://www.gridlab.org/
5.5 What information sources do you require, e.g. certificate
authorities, or registries.
Resource Discovery and state preservation (replica systems
or similar) are the main requirements to information
management.
5.6 Do you use any resources other than traditional compute or
data resources, e.g. telescopes, microscopes, medical
imaging instruments.
No.
5.7 Please link all the above back to the functionalities
described in the use case section where possible.
...
5.8 How often is your application used on the grid or grid-like
systems?
[ ] Exclusively
[ ] Often (say 50-50)
[x] Occasionally on the grid, but mostly stand-alone
[ ] Not at all yet, but the plan is to.
The application is actually used in Grids, but does not make
full use of Grid capabilities (such as the one described here).
6. Environment:
---------------
Provide a description of the environment your scenario runs in,
for example the languages used, the tool-sets used, and the
user environments (e.g. shell, scripting language, or portal).
Users work mostly on shells, portals are under development.
Programmers work on open source solutions, unix only, C, C++,
Fortran.
7. How the resources are selected:
----------------------------------
7.1 Which resources are selected by users, which are inherent
in the application, and which are chosen by system
administrators, or by other means? E.g. who is specifying
the architecture and memory to run the remote tasks?
Compute Resources are selected manually or automatically (job
description by users).
7.2 How are the resources selected? E.g. by OS, by CPU power,
by memory, don't care, by cost, frequency of availability
of information, size of datasets?
OS, Architecture, Memory, disk space, runtime (when, how
long)
7.3 Are the resource requirements dynamic or static?
Vary from run to run, but mostly static, sometimes dynamic.
In the future more dynamic.
8. Security Considerations:
---------------------------
8.1 What things are sensitive in this scenario: executable
code, data, computer hardware? I.e. at what level are
security measures used to determine access, if any?
Data should get only accessed by owner or group. Resources
are not to be compromised of course. --> standard academic
security requirements.
8.2 Do you have any existing security framework, e.g. Kerberos
5, Unicore, GSI, SSH, smartcards?
GSI for all communication and resource access.
8.3 What are your security needs: authentication,
authorisation, message protection, data protection,
anonymisation, audit trail, or others?
authentication, authorisation, basic data protection
8.4 What are the most important issues which would simplify
your security solution? Simple API, simple deployment,
integration with commodity technologies.
simple deployment
9. Scalability:
---------------
What are the things which are important to scalability and to
what scale - compute resources, data, networks ?
The scenario is not bound by scalability (the application of
course is).
10. Performance Considerations:
-------------------------------
Explain any relevant performance considerations of the use
case.
The full time to migrate to a better resource must result in a
benefit compared to having the computation simply continue on
the old resource. However, on occasions where simple
continuation is not possible, performance penalties are
acceptable.
In general: performance requirements depend on specific
application/simulation.
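
The criterion above can be written down as a small check. This
is only a sketch: the function name and its parameters are
invented for this example, and the time estimates would have to
come from the application or from the resource management
system.

```cpp
// Decide whether a migration is worthwhile. All times are in
// seconds and are estimates supplied by the caller.
bool migration_pays_off (double remaining_on_old,    // runtime left on old resource
                         double remaining_on_new,    // runtime left on new resource
                         double migration_overhead,  // checkpoint + transfer + restart
                         bool   old_can_continue)    // false, e.g., if time runs out
{
  // If the job cannot continue where it is, any penalty is acceptable.
  if ( ! old_can_continue )
  {
    return true;
  }

  // Otherwise the migration has to finish the job earlier than
  // staying on the old resource would.
  return migration_overhead + remaining_on_new < remaining_on_old;
}
```

For example, with 10 hours left on the old resource, 4 hours on
the new one and 30 minutes of overhead, the migration pays off.
With 60 minutes left on the old resource, 50 minutes on the new
one and the same overhead, it does not.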
11. Grid Technologies currently used:
-------------------------------------
If you are currently using or developing this scenario, which
grid technologies are you using or considering?
- globus based services from the GridLab project
- Grid Application Toolkit from the GridLab project
12. What Would You Like an API to Look Like?
--------------------------------------------
Suggest some functions and their prototypes which you would
like in an API which would support your scenario.
An example of a migration in GAT is included in the GAT
release.
13. References:
---------------
List references for further reading.
http://www.gridlab.org/gat/