On Wed, May 17, 2017 at 09:14:41AM +0100, Ben Tasker wrote:
On Wed, May 17, 2017 at 8:08 AM, Zenaan Harkness <zen@freedbms.net> wrote:
"Formal" library software (e.g. Evergreen) may be the answer for repo wide searchability of meta data, but when others share interest, they may only be able to store, and or may be only particularly interested in a sub-category or two such as health and agriculture/gardening, so it might be best anyway to use a handful of top level "general categories" to reduce our maximums down from 50K books per dir, to 10 fold fewer at least.
I guess your best bet would probably be to approach it like a physical library does.
Ack.
Divide by broad category/genre (so separate fiction and non-fiction, subdivide that into health etc)
Personally not interested in fiction, but of course some will be, and "the library" should work for everyone, including those who want to store all books in all languages :)
Then divide further by Author's name, perhaps dividing that further by the first two chars of the author's name
Sounds good - many authors write more than one book, and it's good to group by author, even though some authors will end up with more than one directory (those whose books fall into more than one category).
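The layout above (category, then the first two characters of the author's name, then author, then title) can be sketched as a small path-building helper. This is only an illustrative sketch - the function name, slug rules and the "anonymous" fallback are my assumptions, not anything agreed in this thread:

```python
import re

def shelf_path(category, author, title):
    """Return a shard path like 'non-fiction/ta/tasker-ben/some-title'.
    Authorless works fall back to 'anonymous' (an assumed convention)."""
    def slug(s):
        # Lowercase and collapse runs of non-alphanumerics to single dashes.
        return re.sub(r"[^a-z0-9]+", "-", s.lower()).strip("-")
    a = slug(author) or "anonymous"
    # First two characters of the author slug give the shard directory.
    return f"{slug(category)}/{a[:2]}/{a}/{slug(title)}"
```

So "Tasker, Ben" lands under non-fiction/ta/tasker-ben/, and an authorless historical work under <category>/an/anonymous/, which also covers the "author: anonymous" case mentioned below.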
But it'd get complicated quite quickly (particularly if you don't know the author's name)
And some books don't have an author - especially some of the older historical works - but I guess they could be grouped under "author: anonymous".
so you'd want some sort of index available
Definitely.
to do metadata based searching too. I think that's probably going to be hard to avoid with a substantial number of books though.
A principle is to have each item be self-contained in its own directory, so that different editions/versions and associated assets (covers, imagery, audio) can all be sanely grouped for that book - different editions should probably each have their own directory.

Rather than avoid an index, and based on the principle above, each "content item" will eventually have its own metadata file. Like every other file associated with a "content item", it should be physically associated with the item - i.e., kept in the item's canonical directory.

Then automate the index creation by processing the metadata files, supplementing them with information gathered directly from the filesystem: file sizes, PDF page counts, file types (image, text, PDF etc.) and everything else that can be auto-gathered. DRY - don't repeat yourself - so don't hand-write metadata that is already in the filesystem, at the least filename and file size.

YAML (or something YAML-like) seems to be the nicest metadata format for humans to read and write, and is still very good for automated processing. I began this for my software repo years ago, but need to rewrite it.

Thanks,
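The index-automation idea above could look something like the following sketch: walk the tree, treat any directory containing a metadata file as a content item, and merge the hand-written metadata with facts read straight from the filesystem. The filename "meta.yaml" and the flat key: value parsing are my assumptions (a full YAML parser such as PyYAML would replace read_meta in practice):

```python
import os

def read_meta(path):
    """Parse a minimal 'key: value' metadata file (a YAML-like subset)."""
    meta = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if ":" in line and not line.lstrip().startswith("#"):
                key, _, val = line.partition(":")
                meta[key.strip()] = val.strip()
    return meta

def build_index(root):
    """Walk item directories, merging each meta.yaml with facts gathered
    from the filesystem (DRY: sizes and filenames are never hand-written)."""
    index = []
    for dirpath, _dirs, files in os.walk(root):
        if "meta.yaml" not in files:
            continue  # not a content-item directory
        entry = read_meta(os.path.join(dirpath, "meta.yaml"))
        entry["_dir"] = dirpath
        # Auto-gathered facts: every sibling file and its size in bytes.
        entry["_files"] = {
            f: os.path.getsize(os.path.join(dirpath, f))
            for f in files if f != "meta.yaml"
        }
        index.append(entry)
    return index
```

The resulting list of dicts is then trivially searchable by any metadata field, which is the "index available to do metadata based searching" Ben mentions above.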