[SAGA-RG] Python bindings: Buffer class issue

Wed Nov 11 14:11:55 CST 2009

Quoting [Manuel Franceschini] (Nov 11 2009):
> 
> On Mon, Nov 9, 2009 at 11:10 PM, Andre Merzky <andre at merzky.net> wrote:
> > Quoting [Manuel Franceschini] (Nov 09 2009):
> >>
> >> Hi all,
> >>
> >> Quick summary from GFD.90: the SAGA I/O Buffer encapsulates a sequence
> >> of bytes to be used for I/O operations, e.g. read()/write() on files
> >> and streams, and call() on rpc instances. The recent removal of the
> >> buffer class from the Python bindings of the C++ SAGA implementation
> >> led us to think again about this issue. The GFD is C/C++ oriented
> >
> > Well, it should not be C/C++ oriented, but the bias of the authors
> > probably shows :-)  The intent was to support binary I/O on any
> > language, as that was mentioned in many use cases.
> >
> >
> >> and therefore the Python implementation is anything but clear in this regard.
> >>
> >> Given that memory management is automatic in Python, the notion
> >> of application-managed and implementation-managed Buffer disappears.
> >
> > From what I learned during the discussion in Banff, this is not
> > really true: one *can* allocate an array in user space and pass it
> > to an API by-reference, which actually makes it an application-managed
> > memory segment.  The point in Python seems to be that nobody
> > is doing that...
> 
> Well, in Python there is *only* by-reference parameter passing,
> references to objects, that is. Version 2.6 introduced an io module
> that makes it possible to do what you describe. One problem with this is that our
> JySAGA bindings can't support this new feature as Jython just reached
> version 2.5.1 and it looks like there is quite a long way to go to
> 2.6.

That is an implementation problem, and should not influence the
Python bindings, right? ;-)

> I did some memory profiling with large chunks of data copied from one
> file to another and the automatic memory management in Python seemed
> to be very efficient. In my tests the garbage collection was
> instantaneous. In other words, as soon as there were no more
> references to a data chunk, memory was deallocated. So when shuffling
> 1MB chunks 10000 times from one file to another, the memory
> consumption of the test program never exceeded 2.5 MB. If somebody can
> come up with a test program that shows the advantage of using the new
> io module in relevant use cases, we could think about using it in the
> C++ bindings. Otherwise, why optimize when there's no real problem?

Fair point.  
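
For reference, the kind of chunked copy described above might look
roughly like the sketch below. This is not the actual test program
(which was not posted); the function name copy_in_chunks and the
in-memory io.BytesIO objects standing in for real files are assumptions
made for illustration, and the code is written for present-day Python 3.
In CPython, rebinding 'chunk' on each iteration drops the last reference
to the previous chunk, which would explain why memory use stays close to
one or two chunks.

```python
import io


def copy_in_chunks(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst one chunk at a time; return the number of bytes copied.

    Hypothetical helper: each read() returns a new immutable bytes object,
    and the previous one becomes unreferenced as soon as 'chunk' is rebound.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)  # fresh bytes object on every iteration
        if not chunk:                 # empty result signals end of data
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for the source and destination files (assumption).
src = io.BytesIO(b"x" * (3 * 1024 * 1024 + 5))
dst = io.BytesIO()
copied = copy_in_chunks(src, dst)
```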

But, BTW, I don't see app-managed buffers as a way to optimize memory
consumption, but as a way to reduce latency, since you save memcpy calls.
In theory at least...
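
To make that concrete, here is a minimal sketch of what an
application-managed buffer could look like with the io module in
present-day Python 3: the caller allocates one bytearray and the stream
fills it in place via readinto(). The helper name copy_with_app_buffer
and the BytesIO stand-ins are assumptions for illustration, not part of
any SAGA binding. Whether this is actually faster than plain read() in a
relevant use case would still need to be measured.

```python
import io


def copy_with_app_buffer(src, dst, buf):
    """Copy src to dst through a caller-allocated buffer; return bytes copied.

    Hypothetical helper: 'buf' is allocated once by the application and
    reused for every read, so no new bytes object is created per chunk.
    """
    view = memoryview(buf)        # slicing a memoryview does not copy the data
    total = 0
    while True:
        n = src.readinto(buf)     # fills buf in place, returns bytes read
        if not n:                 # 0 means end of data
            break
        dst.write(view[:n])       # write only the part that was filled
        total += n
    return total


buf = bytearray(64 * 1024)                   # the application-managed buffer
src = io.BytesIO(bytes(range(256)) * 1000)   # in-memory stand-in (assumption)
dst = io.BytesIO()
copied = copy_with_app_buffer(src, dst, buf)
```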

> >> There is no need for a Python SAGA user to tell the bindings who
> >> manages the Buffer, since it is managed by the underlying Python VM.
> >>
> >> Another more critical issue is the data type used to hold binary data
> >> in Python. In Python 2.x the immutable 'str' type is used whereas
> >> Python 3.x has a newly introduced immutable 'bytes' type. Let's forget
> >> about 3.x for a moment, since 2.x will be around for at least a couple
> >> more years. In order to manipulate large binary datasets, the mmap
> >> class [0] could be used, which basically transforms an immutable 'str'
> >> into a mutable mmap object. In other words it provides the ability to
> >> efficiently modify binary data.
> >>
> >> In the VU Python bindings the buffer class is still present, while, as
> >> previously said, in the C++ Python bindings it was removed recently. I
> >> do not see any issues with the removal of the Buffer class in the
> >> Python bindings. However, I'm not sure whether I am forgetting some
> >> corner cases (e.g. async) that would require a dedicated Buffer class.
> >> When removing the Buffer class, the user would simply deal with 'str'
> >> type data to pass data back and forth to a SAGA file, stream or rpc.
> >
> > If the bindings decide to go for strings, then that should pose no
> > problem for the async calls, as far as I can tell: the semantics of sync
> > and async calls are identical (apart from synchronization obviously).
> >
> >
> >> Now, I identified the following crucial questions:
> >> 1) Can the Buffer class be safely removed from the Python bindings?
> >
> > According to the original SAGA use cases: no
> > According to current SAGA users: yes
> >
> > So, tough call ;-)
> 
> What do other people think?

anybody??

> >> 3) Is compliance with Python 3.x a concern right now? In other words,
> >> should the eventual migration to 3.x be taken into consideration?
> >
> > If 3.x makes something easier, it might be good to be aware of it at
> > least.  I think all agree that 2.x will be around for a long time,
> > and that limiting the bindings to 3.x is not an option.  OTOH, it
> > should be possible to have slightly differing bindings for 2.x and
> > 3.x, depending on the changes in the language itself.
> 
> Yeah, I don't think we should think too much about that now. But for
> the future it will bring several benefits to the Python bindings.

agree.

Cheers, Andre.

-- 
Nothing is ever easy.