[glue-wg] Problems in using CSV for enumerations, some solutions

Fri Apr 4 10:03:21 EDT 2014

Hi GLUE-ers,

I am not sure if this is relevant, but OGF would also be willing to
set up a simple database of your choice to maintain enum information
or similar, which would make retrieving those data in a structured way
and different formats simple.  Let me know if this is interesting to
you, and I'd be happy to do the deed, and to contribute some scripts
for data retrieval.

Cheers, Andre.

On Fri, Apr 4, 2014 at 3:58 PM, Florido Paganelli
<florido.paganelli at hep.lu.se> wrote:
> Hi JP,
>
> Thanks for reading my comments!
>
> On 2014-04-04 15:31, Navarro, John-Paul F. wrote:
>> Florido,
>>
>> The solution to Problem 1 makes sense to me.
>>
>> However, for Problem 2 I think we need to keep the process of
>> maintaining enumerations as simple as possible, with as little effort
>> as possible (since we are all volunteers). Rather than adding
>> enumeration update steps that publish the same data in multiple
>> formats, I would recommend that people volunteer scripts that transform
>> the information and put them in our shared repository. These will
>> require little maintenance and won't add overhead to the daily
>> updating of enumerations. It's trivial to write a script that
>> downloads all the enumerations from the official location and merges
>> them while adding the InterfaceName_t column to the merge. That's my
>> recommendation.
>>
>
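A minimal sketch of the kind of transform script JP describes, assuming each enumeration lives in its own comma-separated file with a header row. The official download location is not given in this thread, so the sketch takes already-fetched CSV text as input. The function name `merge_enumerations`, the `Name` column, and the sample rows are hypothetical; `EnumerationType` is the column name used in the merged-file example of the original message.

```python
import csv
import io


def merge_enumerations(sources):
    """Merge per-enumeration CSV texts into one list of rows.

    `sources` maps an enumeration type name (e.g. "InterfaceName_t")
    to the CSV text of that enumeration. Each output row gains an
    "EnumerationType" key so the origin is not lost in the merge.
    """
    merged = []
    for enum_type, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            merged.append({"EnumerationType": enum_type, **row})
    return merged


# Hypothetical sample data, loosely based on the records in this thread.
sources = {
    "InterfaceName_t": "Name,Status\nGRAM5,Deprecated\norg.globus.gram,Recommended\n",
    "ServiceType_t": "Name,Status\negi.GRIDVIEW,Recommended\n",
}
merged = merge_enumerations(sources)
for row in merged:
    print(row["EnumerationType"], row["Name"], row["Status"])
```

Because the type column is added at merge time, the per-enumeration files need no extra maintenance step.
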
> You might be right, in the sense that I foresee new fields being added
> to the CSVs in the long term, depending on the enumeration (see the
> discussion about Capabilities in "New Endpoint and Service types").
>
> But this just means that we should define a merge of all files as a big
> join of all CSV tables, i.e. conceptually a big sparse matrix. Then it's
> easy to synchronize all CSV files so that they have the same headers.
> This will allow a merge to be a simple file concatenation, and gives a
> consistent representation across all files.
>
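To illustrate the "big sparse matrix" idea: a sketch, assuming comma-separated files with header rows, that computes the union of all headers and rewrites each table with that full header, leaving missing cells empty. Once every file shares the same header, a merge reduces to concatenating the data lines. The helper `synchronize_headers`, the `Name` and `Capability` columns, and the sample rows are hypothetical.

```python
import csv
import io


def synchronize_headers(tables):
    """Rewrite every CSV text in `tables` so that all share one header.

    The common header is the union of all columns, in order of first
    appearance; cells a table does not define are left empty.
    """
    header = []
    parsed = {}
    for name, text in tables.items():
        reader = csv.DictReader(io.StringIO(text))
        parsed[name] = list(reader)
        for column in reader.fieldnames or []:
            if column not in header:
                header.append(column)

    result = {}
    for name, rows in parsed.items():
        out = io.StringIO()
        writer = csv.DictWriter(
            out, fieldnames=header, restval="", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        result[name] = out.getvalue()
    return header, result


tables = {
    "InterfaceName_t": "Name,Status,Recommended\nGRAM5,Deprecated,org.globus.gram\n",
    "ServiceType_t": "Name,Status,Capability\negi.GRIDVIEW,Recommended,\n",
}
header, synced = synchronize_headers(tables)

# With identical headers, a merge is plain concatenation of the data lines.
merged_lines = [",".join(header)]
for text in synced.values():
    merged_lines.extend(text.splitlines()[1:])
merged = "\n".join(merged_lines) + "\n"
print(merged)
```

The line-based concatenation assumes no cell contains an embedded newline; files with multi-line fields would need to be merged through a CSV parser instead.
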
> Of course one can do better by using JSON or another database
> representation that is more efficient, but maybe that will come later.
>
> Otherwise we have to maintain a separate description of the CSV in some
> external document, which I think is more work than needed. Spotting an
> inconsistency in a CSV header is easier than maintaining external
> information.
>
> So I vote for adding fields incrementally on the right side of each
> table/CSV as the need for additional information arises.
>
>> JP
>>
>> On Apr 4, 2014, at 3:08 AM, Florido Paganelli
>> <florido.paganelli at hep.lu.se> wrote:
>>
>>> Hi all,
>>>
>>> Today, while fixing the enumerations as decided last time, I found
>>> myself facing two problems:
>>>
>>>
>>> Problem 1) how to communicate what-obsoletes-what or
>>> what-is-recommended-instead. In short, how does a consumer find out
>>> what to use if a string she's using is deprecated?
>>>
>>> I'll give you an example with these records:
>>>
>>> InterfaceName_t | Description                                | Status      |
>>> ============================================================================
>>> org.globus.gram | job submission service for Globus          | Recommended |
>>> ----------------------------------------------------------------------------
>>> GRAM5           | job submission service for Globus version  | Deprecated  |
>>>                 | 5.x (GRAM5)                                |             |
>>>
>>> Problem: I search for GRAM5 and see that it is deprecated. How do I
>>> find out what to use instead?
>>>
>>> Solution: My solution will be to enrich the CSV this way:
>>>
>>> InterfaceName_t | Description                                | Status      | Recommended     | Deprecates |
>>> ===========================================================================================================
>>> org.globus.gram | job submission service for Globus          | Recommended |                 | GRAM5      |
>>> -----------------------------------------------------------------------------------------------------------
>>> GRAM5           | job submission service for Globus version  | Deprecated  | org.globus.gram |            |
>>>                 | 5.x (GRAM5)                                |             |                 |            |
>>>
>>> that is, adding two fields (which can span multiple lines).
>>>
>>> This will also allow me to speed up the process of reviewing
>>> existing InterfaceName_t overlaps, as I will send to the list a
>>> list of services with multiple names, with a proposed
>>> deprecated/recommended set.
>>>
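A sketch of how a consumer might use the proposed Recommended column to resolve a deprecated string, assuming a comma-separated rendering of the enriched table above. The function name `find_replacement` is hypothetical, and cells holding several values are not handled.

```python
import csv
import io

# Comma-separated rendering of the enriched table (an assumption; the
# thread only shows the records as a pipe-aligned text table).
CSV_TEXT = """InterfaceName_t,Description,Status,Recommended,Deprecates
org.globus.gram,job submission service for Globus,Recommended,,GRAM5
GRAM5,job submission service for Globus version 5.x (GRAM5),Deprecated,org.globus.gram,
"""


def find_replacement(csv_text, name, key_column="InterfaceName_t"):
    """Return the name to use in place of `name`.

    Returns `name` itself when it is not deprecated, the value of the
    Recommended column when it is (None if that cell is empty), and
    None when `name` is not listed at all.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[key_column] != name:
            continue
        if row["Status"] == "Deprecated":
            return row["Recommended"] or None
        return name
    return None


print(find_replacement(CSV_TEXT, "GRAM5"))
print(find_replacement(CSV_TEXT, "org.globus.gram"))
```
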
>>> Problem 2) Warren once asked me to provide a single file
>>> containing all the enumerations. The problem with the current CSV
>>> is that, if I just merge them, one will lose the reference to which
>>> Enumeration Type each row of the merged document refers to. Thus, I
>>> need to add an additional field with the enumeration type name.
>>>
>>> Two ways:
>>>
>>> Solution a) Keep the single files as they are, and generate a
>>> merged file. If you consider InterfaceName in Problem 1 above, an
>>> example of a merged file including ServiceType_t would be:
>>>
>>> EnumerationType | EnumerationName | Status      | Recommended     | Deprecates |
>>> ================================================================================
>>> InterfaceName_t | GRAM5           | Deprecated  | org.globus.gram |            |
>>> --------------------------------------------------------------------------------
>>> ServiceType_t   | egi.GRIDVIEW    | Recommended |                 |            |
>>>
>>>
>>> Solution b) Change all existing files to the above format.
>>>
>>> I would prefer solution b), as it makes it consistent to use either
>>> the big file or a single file with the same parser. There is no
>>> structural difference between the partial files and the big file.
>>> This will also require a change in the document I wrote on
>>> enumerations.
>>>
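The appeal of solution b) can be sketched as follows: if every file carries the EnumerationType column, one parser handles both a single-enumeration file and the merged file. This assumes comma-separated files with the header from the merged-file example; `load_enumerations` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def load_enumerations(csv_text):
    """Group rows by EnumerationType; works on partial or merged files."""
    by_type = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_type[row["EnumerationType"]].append(row)
    return dict(by_type)


HEADER = "EnumerationType,EnumerationName,Status,Recommended,Deprecates\n"
# A file holding one enumeration, and a merged file holding two.
single = HEADER + "InterfaceName_t,GRAM5,Deprecated,org.globus.gram,\n"
merged = single + "ServiceType_t,egi.GRIDVIEW,Recommended,,\n"

print(sorted(load_enumerations(single)))
print(sorted(load_enumerations(merged)))
```
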
>>> Your opinions are very welcome. If there are no objections, I will
>>> follow the solutions shown above and document all of them in the
>>> changelog file.
>>>
>>> Cheers, Florido
>>>
>>>
>>> --
>>> ==================================================
>>>  Florido Paganelli
>>>    ARC Middleware Developer - NorduGrid Collaboration
>>>    System Administrator
>>>  Lund University
>>>  Department of Physics
>>>  Division of Particle Physics
>>>  BOX118
>>>  221 00 Lund
>>>  Office Location: Fysikum, Hus B, Rum B313
>>>  Office Tel: 046-2220272
>>>  Email: florido.paganelli at REMOVE_THIShep.lu.se
>>>  Homepage: http://www.hep.lu.se/staff/paganelli
>>> ==================================================
>>> _______________________________________________
>>> glue-wg mailing list
>>> glue-wg at ogf.org
>>> https://www.ogf.org/mailman/listinfo/glue-wg
>>
>
>

-- 
Nothing is really difficult.