how to best organise a large community "important" books library?

Wed May 17 03:50:51 PDT 2017

On Wed, May 17, 2017 at 09:14:41AM +0100, Ben Tasker wrote:
> On Wed, May 17, 2017 at 8:08 AM, Zenaan Harkness <zen at freedbms.net> wrote:
> > "Formal" library software (e.g. Evergreen) may be the answer for repo
> > wide searchability of meta data, but when others share interest, they
> > may only be able to store, and or may be only particularly interested
> > in a sub-category or two such as health and agriculture/gardening, so
> > it might be best anyway to use a handful of top level "general
> > categories" to reduce our maximums down from 50K books per dir, to 10
> > fold fewer at least.
>
> I guess your best bet would probably be to approach it like a
> physical library does.

Ack.

> Divide by broad category/genre (so separate fiction and
> non-fiction, subdivide that into health etc)

Personally not interested in fiction, but of course some will be, and
"the library" should work for everyone, including those who want to
store all books in all languages :)

> Then divide further by Author's name, perhaps dividing that further
> by the first two chars of the author's name

Sounds good - many authors write more than one book, and it's good to
group by author, even though that will result in some authors with
more than one directory (those who write books existing in more than
1 category).

> But it'd get complicated quite quickly (particularly if you don't
> know the author's name)

And some books don't have an author - especially some of the older
historical books, but I guess they could be grouped under
"author: anonymous".
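
A rough sketch of that layout in Python, in case it helps the
discussion. Everything named here is my own assumption for
illustration and nothing is settled: the function names `slug` and
`book_dir`, the "library" root, the lower-casing/hyphenating rule, and
the made-up example author and titles.

```python
import re
from pathlib import PurePosixPath


def slug(text: str) -> str:
    """Lower-case the text and collapse runs of non-letters/digits into '-'."""
    # \W is Unicode-aware in Python 3, so non-Latin names keep their letters.
    return re.sub(r"[\W_]+", "-", text.lower()).strip("-")


def book_dir(root, category, author, title):
    """Build root/category/prefix/author/title for one book."""
    # Books with no known author are grouped under "anonymous".
    author_slug = slug(author or "") or "anonymous"
    prefix = author_slug[:2]  # first two chars of the author's name
    return PurePosixPath(root, slug(category), prefix, author_slug, slug(title))


print(book_dir("library", "Health", "Smith, Jane", "Herbal Remedies"))
print(book_dir("library", "History", None, "An Old Chronicle"))
```

An author who writes in two categories simply ends up with one
directory under each category, as noted above.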

> so you'd want some sort of index available

Definitely.

> to do metadata based searching too. I think that's probably going
> to be hard to avoid with a substantial number of books though.

One principle is to have each item be self-contained, in its own
directory, so different editions/versions and associated assets
(covers, imagery, audio) can all be sanely grouped for that book -
different editions should probably each have their own directory.

Rather than avoid an index, and based on the principle above, each
"content item" will eventually have its own meta data file. Like every
other file associated with a "content item", it should be physically
associated with the item, i.e. live in the item's canonical directory.

Index creation can then be automated by processing the meta data files
and supplementing them with information gathered directly from the
filesystem: file sizes, PDF page counts, file types (image, text, PDF
etc.) and everything else that can be auto-gathered. DRY (don't repeat
yourself): don't "write" meta data that is already in the filesystem,
at the very least filename and file size.
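
Something like the following is what I have in mind for the automated
index, as a minimal sketch only. Assumptions on my part: the meta data
file is called "meta.yaml" in each item directory, and it holds flat
"key: value" lines, which the toy `read_meta` below parses by hand
(a real YAML parser would replace it). PDF page counts are left out,
since they need a PDF library. The sample title and author are made up.

```python
import json
import tempfile
from pathlib import Path

META_NAME = "meta.yaml"  # hypothetical per-item meta data file name


def read_meta(path):
    """Parse flat 'key: value' lines only; not a full YAML parser."""
    meta = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


def build_index(root):
    """Return one index entry per directory that holds a meta data file."""
    index = []
    for meta_path in sorted(Path(root).rglob(META_NAME)):
        item_dir = meta_path.parent
        files = [
            # Name, size and type come from the filesystem, not the meta file.
            {"name": f.name, "bytes": f.stat().st_size,
             "type": f.suffix.lstrip(".").lower()}
            for f in sorted(item_dir.iterdir())
            if f.is_file() and f.name != META_NAME
        ]
        index.append({
            "dir": item_dir.relative_to(root).as_posix(),
            "meta": read_meta(meta_path),
            "files": files,
        })
    return index


if __name__ == "__main__":
    # Build a throwaway one-book library and print its index as JSON.
    with tempfile.TemporaryDirectory() as tmp:
        item = Path(tmp, "health", "sm", "smith-jane", "herbal-remedies")
        item.mkdir(parents=True)
        (item / META_NAME).write_text(
            "title: Herbal Remedies\nauthor: Jane Smith\n", encoding="utf-8")
        (item / "book.pdf").write_bytes(b"%PDF-1.4 placeholder")
        print(json.dumps(build_index(tmp), indent=2))
```

The meta data file only carries what a human has to supply (title,
author and so on); file names, sizes and types are read from the
directory each time the index is rebuilt.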

YAML, or something YAML-like, seems to be the nicest format for meta
data files: easy for humans to read and write, and still very good for
automated processing. I began this for my software repo years ago, but
need to rewrite it.

Thanks,