[SAGA-RG] Python bindings: Buffer class issue

Mathijs den Burger mathijs at cs.vu.nl
Thu Nov 12 04:45:35 CST 2009


On Wed, 2009-11-11 at 21:11 +0100, Andre Merzky wrote:
> Quoting [Manuel Franceschini] (Nov 11 2009):
> > 
> > On Mon, Nov 9, 2009 at 11:10 PM, Andre Merzky <andre at merzky.net> wrote:
> > > Quoting [Manuel Franceschini] (Nov 09 2009):
> > >>
> > >> Hi all,
> > >>
> > >> Quick summary from GFD.90: the SAGA I/O Buffer encapsulates a sequence
> > >> of bytes to be used for I/O operations, e.g. read()/write() on files
> > >> and streams, and call() on rpc instances. The recent removal of the
> > >> buffer class from the Python bindings of the C++ SAGA implementation
> > >> led us to think again about this issue. The GFD is C/C++ oriented
> > >
> > > Well, it should not be C/C++ oriented, but the bias of the authors
> > > probably shows :-)  The intent was to support binary I/O in any
> > > language, as that was mentioned in many use cases.
> > >
> > >
> > >> and therefore the Python implementation is anything but clear in this regard.
> > >>
> > >> Given that memory management is automatic in Python, the notion
> > >> of application-managed and implementation-managed Buffer disappears.
> > >
> > > From what I learned during the discussion in Banff, this is not
> > > really true: one *can* allocate an array in user space and pass it
> > > to an API by-reference, which actually makes it an application-managed
> > > memory segment.  The point in python seems to be that nobody
> > > is doing that...
> > 
> > Well, in Python there is *only* by-reference parameter passing,
> > references to objects that is. Version 2.6 introduced an io module
> > that allows you to do what you describe. One problem with this is that our
> > JySAGA bindings can't support this new feature as Jython just reached
> > version 2.5.1 and it looks like there is quite a long way to go to
> > 2.6.
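A minimal sketch of the by-reference pattern under discussion, assuming Python 3 (bytearray and readinto() only appeared in 2.6, so neither is available under the 2.5.1 Jython mentioned above): the caller allocates the buffer and the file object fills it in place, which is the closest Python analogue to an application-managed SAGA buffer. The helper `read_into_app_buffer` is hypothetical, and a plain local file stands in for a SAGA file.

```python
import os
import tempfile


def read_into_app_buffer(path, size):
    """Fill a caller-allocated buffer from a file, in place.

    The bytearray is allocated by the application and handed to
    readinto() by reference; readinto() returns the byte count.
    """
    buf = bytearray(size)            # application-owned memory
    with open(path, "rb") as f:
        n = f.readinto(buf)          # fills buf, returns bytes read
    return buf, n


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"binary payload")
    buf, n = read_into_app_buffer(path, 64)
    print(n, bytes(buf[:n]))         # 14 b'binary payload'
    os.remove(path)
```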
> 
> That is an implementation problem, and should not influence the
> python bindings, right? ;-)

Well, defining bindings that break all current implementations and their
usage won't work either. The C++ wrapper now also requires Python >=
2.2. Would all current users be willing/able to upgrade to >= 2.6?

The bindings will have to define which Python version is required. It is
not only a matter of 2.x or 3.x; each 2.x release also adds
functionality that is relevant for the bindings.

We can either opt for something low (e.g. >= 2.2) to increase
acceptance, or something high (e.g. >= 2.6 or >= 3.0) if these contain
features that are essential for the bindings. A third option is to
specify optional additional functionality for an implementation that's
only targeted at newer versions of Python, but that will probably
generate a lot of confusion.

I'd say we stick to >= 2.2; widely used, and supported by all current
implementations.
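If the bindings settle on a minimum version as proposed, an implementation could enforce it once at import time. A minimal sketch; the constant `MIN_PYTHON` and the function `check_python_version` are hypothetical names, and the check relies only on `sys.version_info`, which has existed since Python 2.0.

```python
import sys

# Hypothetical minimum version agreed on by the bindings (>= 2.2).
MIN_PYTHON = (2, 2)


def check_python_version(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Raise ImportError if the interpreter is older than the minimum."""
    if tuple(version_info[:2]) < minimum:
        raise ImportError("SAGA Python bindings require Python >= %d.%d"
                          % minimum)
    return True


check_python_version()
```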

> > I did some memory profiling with large chunks of data copied from one
> > file to another and the automatic memory management in Python seemed
> > to be very efficient. In my tests the garbage collection was
> > instantaneous. In other words, as soon as there were no more
> > references to a data chunk, memory was deallocated. So when shuffling
> > 1 MB chunks 10000 times from one file to another, the memory
> > consumption of the test program never exceeded 2.5 MB. If somebody can
> > come up with a test program that shows the advantage of using the new
> > io module in relevant use cases, we could think about using it in the
> > C++ bindings. Otherwise, why optimize when there's no real problem?
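The copy test described above can be sketched along these lines, assuming Python 3 and plain local files; the function name and the 1 MB default are illustrative, and the sketch does not reproduce the memory figures quoted. Each read() returns a fresh immutable bytes object; in CPython, reference counting frees the previous chunk as soon as `chunk` is rebound, which fits the "instantaneous" behavior observed. Under Jython, objects are reclaimed by the JVM garbage collector instead, so the timing may differ.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB, as in the test described above


def copy_in_chunks(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src to dst one chunk at a time; return bytes copied."""
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)   # new immutable bytes object
            if not chunk:                  # empty result means EOF
                break
            dst.write(chunk)
            total += len(chunk)
    return total
```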
> 
> Fair point.  
> 
> But, BTW, I don't see app-managed buffers as a way to optimize memory
> consumption, but latency, as you save memcopy calls.
> In theory at least...
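For contrast, a sketch of the latency-oriented variant: one buffer is allocated up front and reused for every chunk, so no new object is created per read. It assumes Python 3 (bytearray, memoryview and readinto() are not available in 2.2); the function name and default buffer size are illustrative. Whether this saves measurable time in Python is exactly the open question here, so no timing claim is made.

```python
def copy_with_reused_buffer(src_path, dst_path, buf_size=1024 * 1024):
    """Copy src to dst through a single preallocated buffer."""
    buf = bytearray(buf_size)          # allocated once, reused
    view = memoryview(buf)             # slicing a view copies nothing
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            n = src.readinto(buf)      # fill the existing buffer in place
            if not n:                  # 0 at EOF
                break
            dst.write(view[:n])        # write only the bytes just read
            total += n
    return total
```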
> 
> 
> > >> There is no need for a Python SAGA user to tell the bindings who
> > >> manages the Buffer, since it is managed by the underlying Python VM.
> > >>
> > >> Another more critical issue is the data type used to hold binary data
> > >> in Python. In Python 2.x the immutable 'str' type is used whereas
> > >> Python 3.x has a newly introduced immutable 'bytes' type. Let's forget
> > >> about 3.x for a moment, since 2.x will be around for at least a couple
> > >> more years. In order to manipulate large binary datasets, the mmap
> > >> class [0] could be used, which basically transforms an immutable 'str'
> > >> into a mutable mmap object. In other words it provides the ability to
> > >> efficiently modify binary data.

Not really; it memory-maps a file, not an arbitrary string. However, you
can easily convert a string to a list or array and manipulate that in
place.
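To illustrate: an immutable binary string cannot be changed in place, but a copy in a mutable container can, and mmap operates on a file rather than on an existing string. A minimal sketch assuming Python 3.2+, where the immutable type is bytes (in 2.x it is str, and array's tobytes() is called tostring()); the variable names are illustrative.

```python
import array
import mmap
import tempfile

data = b"hello saga"

# Immutable: item assignment on bytes raises TypeError.
try:
    data[0] = ord("H")
    immutable = False
except TypeError:
    immutable = True

# Mutable copies can be edited in place.
as_bytearray = bytearray(data)
as_bytearray[0] = ord("H")

as_array = array.array("B", data)      # array of unsigned bytes
as_array[6] = ord("S")

# mmap maps a file, not an arbitrary string: write the data to a
# temporary file first, then edit the mapping in place.
with tempfile.TemporaryFile() as f:
    f.write(data)
    f.flush()                          # make sure the file has content
    with mmap.mmap(f.fileno(), 0) as mm:
        mm[0:5] = b"HELLO"
        mapped = mm[:]                 # slicing returns a bytes copy
```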

The real question is: which use cases are we trying to optimize? What
will SAGA Python apps do with binary data?

> > >> In the VU Python bindings the buffer class is still present, while, as
> > >> previously said, in the C++ Python bindings it was removed recently. I
> > >> do not see any issues with the removal of the Buffer class in the
> > >> Python bindings. However, I'm not sure whether I am forgetting some
> > >> corner cases (e.g. async) that would require a dedicated Buffer class.
> > >> When removing the Buffer class, the user would simply deal with 'str'
> > >> type data to pass data back and forth to a SAGA file, stream or rpc.
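A minimal sketch of what "simply dealing with 'str'" could look like, assuming Python 3 (where the type is bytes). The class `BinaryFile` and its methods are hypothetical and only mirror the shape of ordinary file read()/write(); they are not the SAGA API, and a local file stands in for a SAGA file.

```python
class BinaryFile(object):
    """Hypothetical Buffer-less file wrapper (not the SAGA API).

    read() returns a new immutable bytes object and write() accepts
    a bytes-like object, so no separate Buffer class is involved.
    """

    def __init__(self, path, mode="r+b"):
        self._f = open(path, mode)

    def read(self, size=-1):
        return self._f.read(size)      # new bytes object

    def write(self, data):
        return self._f.write(data)     # number of bytes written

    def seek(self, offset, whence=0):
        return self._f.seek(offset, whence)

    def close(self):
        self._f.close()
```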
> > >
> > > If the bindings decide to go for strings, then that should pose no
> > > problem for the async calls, as far as I can tell: semantics of sync
> > > and async calls are identical (apart from synchronization, obviously).
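With plain strings, the async variant just delivers the same value later. A minimal sketch assuming Python 3.2+; `concurrent.futures` merely stands in for the SAGA task model, and both function names are hypothetical. Because the result is a new immutable object, the caller holds no buffer it could touch while the task is still running.

```python
from concurrent.futures import ThreadPoolExecutor


def read_sync(path, size):
    """Synchronous read: returns a bytes object."""
    with open(path, "rb") as f:
        return f.read(size)


def read_async(executor, path, size):
    """Asynchronous read: returns a Future whose result() is the
    same bytes value read_sync() would have returned."""
    return executor.submit(read_sync, path, size)
```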
> > >
> > >
> > >> Now, I identified the following crucial questions:
> > >> 1) Can the Buffer class be safely removed from the Python bindings?
> > >
> > > According to the original SAGA use cases: no
> > > According to current SAGA users: yes

What were the original use cases that required a Buffer class?

> > >
> > > So, tough call ;-)
> > 
> > What do other people think?
> 
> anybody??
> 
> 
> > >> 3) Is compliance with Python 3.x a concern right now? In other words,
> > >> should the eventual migration to 3.x be taken into consideration?
> > >
> > > If 3.x makes something easier, it might be good to be aware of it at
> > > least.  I think all agree that 2.x will be around for a long time,
> > > and that limiting the bindings to 3.x is not an option.  OTOH, it
> > > should be possible to have slightly differing bindings for 2.x and
> > > 3.x, depending on the changes in the language itself.
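If the bindings do diverge slightly, one common way for a single code base to cover both lines is to pick the binary type at import time. A minimal sketch; `binary_type` and `ensure_binary` are hypothetical names, not part of any SAGA binding.

```python
import sys

# The immutable binary type differs between major versions:
# str in Python 2.x (also aliased as bytes from 2.6), bytes in 3.x.
if sys.version_info[0] >= 3:
    binary_type = bytes
else:
    binary_type = str


def ensure_binary(data):
    """Accept only the immutable binary type; reject text and others."""
    if not isinstance(data, binary_type):
        raise TypeError("expected %s, got %s"
                        % (binary_type.__name__, type(data).__name__))
    return data
```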
> > 
> > Yeah, I don't think we should worry too much about that now. But in
> > the future it will bring several benefits to the Python bindings.
>
> agree.
> 
> Cheers, Andre.

-Mathijs



