[SAGA-RG] Python bindings: Buffer class issue

Mon Nov 9 16:10:10 CST 2009

Quoting [Manuel Franceschini] (Nov 09 2009):
> 
> Hi all,
> 
> Quick summary from GFD.90: the SAGA I/O Buffer encapsulates a sequence
> of bytes to be used for I/O operations, e.g. read()/write() on files
> and streams, and call() on rpc instances. The recent removal of the
> buffer class from the Python bindings of the C++ SAGA implementation
> led us to think again about this issue. The GFD is C/C++ oriented 

Well, it should not be C/C++ oriented, but the bias of the authors
probably shows :-)  The intent was to support binary I/O on any
language, as that was mentioned in many use cases.

> and therefore the Python implementation is anything but clear in this regard.
> 
> Given that memory management is automatic in Python, the notion
> of application-managed and implementation-managed Buffer disappears.

From what I learned during the discussion in Banff, this is not
really true: one *can* allocate an array in user space and pass it
to an API by-reference, which actually makes it an application-
managed memory segment.  The point in python seems to be that nobody
is doing that...
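
To make that concrete, here is a minimal sketch of what "application
managed" can look like in plain Python. It assumes Python 2.6 or later
(for bytearray and the io module) and uses io.BytesIO purely as a
stand-in for a SAGA file or stream; nothing here is SAGA API.

```python
import io

# Stand-in for a SAGA file or stream; io.BytesIO is used only for illustration.
source = io.BytesIO(b"0123456789abcdef")

# The application allocates the memory and keeps ownership of it.
app_buffer = bytearray(8)

# readinto() fills the caller's buffer in place and returns the byte count.
count = source.readinto(app_buffer)
first_chunk = bytes(app_buffer[:count])

# A second call reuses the same memory; no new buffer object is allocated.
count = source.readinto(app_buffer)
second_chunk = bytes(app_buffer[:count])
```

The caller decides the size and lifetime of app_buffer, which is roughly
what the spec means by an application-managed buffer.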

> There is no need for a Python SAGA user to tell the bindings who
> manages the Buffer, since it is managed by the underlying Python VM.
> 
> Another more critical issue is the data type used to hold binary data
> in Python. In Python 2.x the immutable 'str' type is used whereas
> Python 3.x has a newly introduced immutable 'bytes' type. Let's forget
> about 3.x for a moment, since 2.x will be around for at least a couple
> more years. In order to manipulate large binary datasets, the mmap
> class [0] could be used, which basically transforms an immutable 'str'
> into a mutable mmap object. In other words it provides the ability to
> efficiently modify binary data.
> 
> In the VU Python bindings the buffer class is still present, while, as
> previously said, in the C++ Python bindings it was removed recently. I
> do not see any issues with the removal of the Buffer class in the
> Python bindings. However, I'm not sure whether I am forgetting some
> corner cases (e.g. async) that would require a dedicated Buffer class.
> When removing the Buffer class, the user would simply deal with 'str'
> type data to pass data back and forth to a SAGA file, stream or rpc.

If the bindings decide to go for strings, then that should pose no
problem for the async calls, as far as I can tell: the semantics of sync
and async calls are identical (apart from synchronization obviously).

> Now, I identified the following crucial questions:
> 1) Can the Buffer class be safely removed from the Python bindings?

According to the original SAGA use cases: no
According to current SAGA users: yes

So, tough call ;-)

> 2) Is handling of large binary datasets a primary concern? If yes, how
> to handle them?

See above.  How to handle: dunno - that is the question, innit?
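
For reference, the mmap route Manuel mentions could look roughly like
the sketch below. It only shows the standard library mechanism on a
local scratch file; how (or whether) a SAGA binding would expose
something similar is an open question, and the file contents are made
up for the example.

```python
import mmap
import os
import tempfile

# Create a small scratch file to stand in for a large binary dataset.
fd, path = tempfile.mkstemp()
os.write(fd, b"AAAABBBBCCCC")
os.close(fd)

with open(path, "r+b") as f:
    # Length 0 maps the whole file; the default access mode allows writes.
    mapped = mmap.mmap(f.fileno(), 0)
    # In-place update: the replacement must have the same length as the slice.
    mapped[4:8] = b"xxxx"
    mapped.flush()
    mapped.close()

# Read the file back through the normal file API to see the change.
with open(path, "rb") as f:
    result = f.read()

os.remove(path)
```

Only the touched pages need to be loaded, which is the appeal for large
datasets; the limitation is that this works on local files, not on
arbitrary remote streams.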

> 3) Is compatibility with Python 3.x a concern right now? In other words,
> should the eventual migration to 3.x be taken into consideration?

If 3.x makes something easier, it might be good to be aware of it at
least.  I think all agree that 2.x will be around for a long time,
and that limiting the bindings to 3.x is not an option.  OTOH, it
should be possible to have slightly differing bindings for 2.x and
3.x, depending on the changes in the language itself.

One proposal which came up a couple of times, and which I find
appealing, is to have support for strings (simple, solves many use
cases, pythonesque), and to add binary buffers for python-3.x
(natively supported, covers the remaining use cases, stays close to
spec).  Personally, I don't see the need for jumping through hoops
for python-2.x.
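
A rough sketch of what such a two-level interface might look like is
below. The class SketchFile and the method name read_into are
hypothetical, invented for this illustration; they are not part of any
existing SAGA binding, and io.BytesIO again stands in for the real
backend.

```python
import io


class SketchFile(object):
    """Hypothetical stand-in for a bound SAGA file; not a real SAGA class."""

    def __init__(self, data):
        self._raw = io.BytesIO(data)

    def read(self, size):
        # Simple path: return a new immutable byte string
        # (str on 2.x, bytes on 3.x).
        return self._raw.read(size)

    def read_into(self, buf):
        # Binary path: fill a caller-supplied mutable buffer and
        # return the number of bytes actually read.
        return self._raw.readinto(buf)


f = SketchFile(b"headerPAYLOAD")
header = f.read(6)        # string-style call
payload = bytearray(7)    # caller-owned buffer
n = f.read_into(payload)  # buffer-style call
```

The string-style call covers the common case, while the buffer-style
call stays closer to the spec for the remaining use cases.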

Cheers, Andre.

> Cheers,
> /Manuel
> 
> [0] http://docs.python.org/library/mmap.html
-- 
Nothing is ever easy.