[glue-wg] Problems in using CSV for enumerations, some solutions

Florido Paganelli florido.paganelli at hep.lu.se
Fri Apr 4 09:58:12 EDT 2014


Hi JP,

Thanks for reading my comments!

On 2014-04-04 15:31, Navarro, John-Paul F. wrote:
> Florido,
> 
> The solution to Problem 1 makes sense to me.
> 
> However, for Problem 2 I think we need to keep the process of
> maintaining enumerations as simple as possible, with as little effort
> as possible (since we are all volunteers). Rather than add
> enumeration update steps that publish the same data in multiple
> formats, I would recommend that people volunteer scripts that
> transform the information and put those in our shared repository.
> These will require little maintenance and won't add overhead to the
> daily updating of enumerations. It's trivial to write a script that
> downloads all the enumerations from the official location and merges
> them while adding the InterfaceName_t column to the merge. That's my
> recommendation.
> 

You might be right, in the sense that I foresee new fields being added
to the CSVs in the long term, depending on the enumeration (see the
discussion about Capabilities in "New Endpoint and Service types").

But this just means that we should define the merge of all files as a big
join of all CSV tables, i.e. conceptually a big sparse matrix. Then it is
easy to synchronize all CSV files so that they have the same headers. This
will allow the merge to be a simple file concatenation, and gives a
consistent representation across all files.
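
A minimal sketch of that "big join", assuming comma-separated files
with a header row. The function names and the sample rows are only
illustrative; the column name EnumerationName is taken from the merged
format discussed further down in this thread.

```python
import csv
import io


def union_header(headers):
    """Return all column names in first-seen order, without duplicates."""
    merged = []
    for header in headers:
        for name in header:
            if name not in merged:
                merged.append(name)
    return merged


def merge_csv_texts(csv_texts):
    """Merge several CSV documents into one sparse table.

    Columns that an input does not have are left empty in its rows.
    """
    readers = [csv.DictReader(io.StringIO(text)) for text in csv_texts]
    header = union_header([reader.fieldnames or [] for reader in readers])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="",
                            lineterminator="\n")
    writer.writeheader()
    for reader in readers:
        for row in reader:
            writer.writerow(row)
    return out.getvalue()


# Illustrative inputs: the second file lacks the Recommended column.
first = "EnumerationName,Status,Recommended\nGRAM5,Deprecated,org.globus.gram\n"
second = "EnumerationName,Status\negi.GRIDVIEW,Recommended\n"
print(merge_csv_texts([first, second]))
```

Once every file already carries the full header, the same result can be
had by plain concatenation, keeping only the first header line.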

Of course one can do better by using JSON or some database
representation that is more efficient, but maybe that will come later.

Otherwise we have to maintain a separate description of each CSV in some
external document, which I think is more work than needed. Spotting an
inconsistency in a CSV header is easier than maintaining external
information.
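
Spotting such an inconsistency could even be automated. A rough sketch,
assuming comma-separated files; the file names below are hypothetical
and the function name is mine:

```python
import csv
import io


def header_mismatches(named_csv_texts):
    """Return {name: header} for every CSV whose header differs from the first.

    named_csv_texts maps a file name to the CSV text of that file.
    """
    headers = {
        name: next(csv.reader(io.StringIO(text)), [])
        for name, text in named_csv_texts.items()
    }
    if not headers:
        return {}
    # The first file is taken as the reference header.
    reference = next(iter(headers.values()))
    return {name: h for name, h in headers.items() if h != reference}


# Hypothetical file names, used only for illustration.
files = {
    "InterfaceName_t.csv": "EnumerationName,Status,Recommended\n",
    "ServiceType_t.csv": "EnumerationName,Status\n",
}
print(header_mismatches(files))
```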

So I vote for adding fields incrementally on the right side of each
table/CSV as the need for additional information arises.

> JP
> 
> On Apr 4, 2014, at 3:08 AM, Florido Paganelli
> <florido.paganelli at hep.lu.se> wrote:
> 
>> Hi all,
>> 
>> Today, while fixing the enumerations as decided last time, I found
>> myself facing two problems:
>> 
>> 
>> Problem 1) How to communicate what-obsoletes-what or
>> what-is-recommended-instead. In short, how does a consumer find out
>> what to use if a string she's using is deprecated?
>> 
>> I'll give you an example with these records:
>> 
>> InterfaceName_t | Description                               | Status
>> =========================================================================
>> org.globus.gram | job submission service for Globus         | Recommended
>> -------------------------------------------------------------------------
>> GRAM5           | job submission service for Globus version | Deprecated
>>                 | 5.x (GRAM5)                               |
>> 
>> Problem: I search for GRAM5, I see it is deprecated, how do I find
>> out what to use instead?
>> 
>> Solution: My solution will be to enrich the CSV this way:
>> 
>> InterfaceName_t | Description                               | Status      | Recommended     | Deprecates
>> ========================================================================================================
>> org.globus.gram | job submission service for Globus         | Recommended |                 | GRAM5
>> --------------------------------------------------------------------------------------------------------
>> GRAM5           | job submission service for Globus version | Deprecated  | org.globus.gram |
>>                 | 5.x (GRAM5)                               |             |                 |
>> 
>> that is, adding two fields (which can be multi-line).
>> 
>> This will also allow me to speed up the process of reviewing
>> existing InterfaceName_t overlaps, as I will send to the list a
>> list of services with multiple names with a proposed
>> deprecated/recommended set.
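
The lookup that the Recommended column makes possible can be sketched
as follows. This is only an illustration: it assumes the table is
stored comma-separated, and that a multi-line field holds one value
per line. The function name is made up.

```python
import csv
import io

# The two records from Problem 1, written as comma-separated text.
SAMPLE = """InterfaceName_t,Description,Status,Recommended,Deprecates
org.globus.gram,job submission service for Globus,Recommended,,GRAM5
GRAM5,job submission service for Globus version 5.x (GRAM5),Deprecated,org.globus.gram,
"""


def replacements_for(csv_text, name, key_column="InterfaceName_t"):
    """Return the values recommended in place of a deprecated `name`.

    An empty list means the entry is missing, is not deprecated, or has
    no recommendation recorded.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[key_column] == name and row["Status"] == "Deprecated":
            # Assumption: a multi-line field holds one value per line.
            return [v.strip() for v in row["Recommended"].splitlines()
                    if v.strip()]
    return []


print(replacements_for(SAMPLE, "GRAM5"))  # ['org.globus.gram']
```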
>> 
>> Problem 2) Warren once asked me to provide a single file
>> containing all the enumerations. The problem with the current CSV
>> is that, if I just merge them, one will lose the reference to which
>> Enumeration Type each merged record refers to. Thus, I need to add
>> an additional field with the enumeration type name for that.
>> 
>> Two ways:
>> 
>> Solution a) Keep the single files as they are, and generate a
>> merged file. If you consider InterfaceName in Problem 1 above, an
>> example of a merged file including ServiceType_t would be:
>> 
>> EnumerationType | EnumerationName | Status      | Recommended     | Deprecates
>> ==============================================================================
>> InterfaceName_t | GRAM5           | Deprecated  | org.globus.gram |
>> ------------------------------------------------------------------------------
>> ServiceType_t   | egi.GRIDVIEW    | Recommended |                 |
>> 
>> 
>> Solution b) Change all existing files to the above format.
>> 
>> I would prefer solution b), as it makes it possible to use the same
>> parser for either the big file or a single file. There would be no
>> structural difference between the partial files and the big file.
>> This will also require a change in the document I wrote on
>> enumerations.
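
Generating the merged file of Solution a) could look roughly like the
sketch below. It assumes comma-separated per-type files whose first
column is named after the enumeration type (as InterfaceName_t is in
Problem 1); the function name is made up, and Description is left out
as in the merged example above.

```python
import csv
import io

# Columns carried over as-is into the merged file.
COMMON = ["Status", "Recommended", "Deprecates"]


def merge_with_type(per_type_csv):
    """Build one CSV with EnumerationType and EnumerationName in front.

    per_type_csv maps an enumeration type name (e.g. "InterfaceName_t")
    to the CSV text of that type's file.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["EnumerationType", "EnumerationName"] + COMMON)
    for enum_type, text in per_type_csv.items():
        for row in csv.DictReader(io.StringIO(text)):
            # Assumption: the value column is named after the type.
            # Columns missing from a file are written as empty fields.
            writer.writerow([enum_type, row[enum_type]]
                            + [row.get(col) or "" for col in COMMON])
    return out.getvalue()


files = {
    "InterfaceName_t": "InterfaceName_t,Status,Recommended,Deprecates\n"
                       "GRAM5,Deprecated,org.globus.gram,\n",
    "ServiceType_t": "ServiceType_t,Status\negi.GRIDVIEW,Recommended\n",
}
print(merge_with_type(files))
```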
>> 
>> Your opinions are very welcome. If no objections I will follow the
>> solutions shown up here and document all of them in the changelog
>> file.
>> 
>> Cheers, Florido
>> 
>> 
> 


-- 
==================================================
 Florido Paganelli
   ARC Middleware Developer - NorduGrid Collaboration
   System Administrator
 Lund University
 Department of Physics
 Division of Particle Physics
 BOX118
 221 00 Lund
 Office Location: Fysikum, Hus B, Rum B313
 Office Tel: 046-2220272
 Email: florido.paganelli at REMOVE_THIShep.lu.se
 Homepage: http://www.hep.lu.se/staff/paganelli
==================================================

