Processing of Cypherpunks Archives

Available archives of the Cypherpunks email list are incomplete, and in fact there is evidence they have been tampered with and/or redacted over the years.

The goal of this project was to do some basic clean-up of the available archives, which are in mbox format, so that they could be ingested into Mailman and viewed there. The archives contain many poorly formed messages, and Mailman defaults to the current date (December 2019, at the time of this writing) when it encounters a problem with a message's date. So, a separate 'cypherpunks-legacy' list was deployed to make the archives available without overlapping with the current active 'cypherpunks' list, which goes back to July 20 2013. Otherwise, the legacy archive messages would have been peppered into the current archives, in ways that would be difficult to predict or undo.

There were especially many anomalies in the larger source, spanning 1999-2015. In addition to many poorly formed messages (i.e., messages that, in one way or another, could not be cleanly ingested with the Mailman 'arch' command), there were invalid dates, and lines that had an errant "From " at the start.

To ready the sources for ingestion into Mailman, two automated tools were used, followed by some ad hoc edits and changes:

1. 'sortmbox.py' uses a Python library to put messages in by-date order. This proved to be less confusing to Mailman (i.e., fewer messages were inserted into the current month).

2. 'cleanarch' (/var/lib/mailman/bin/cleanarch) is part of the Mailman package. It fixes errant "From " entries at the start of lines.

3. I also used 'sed' to replace invalid dates with valid ones that were in the same ballpark day+time as the message. I found these either when 'arch' complained (such as for dates before the Unix epoch), or when Mailman was showing messages in the future:

   's/ 0101 / 1999 /' | sed 's/ 0102 / 1999 /'
   's/Thu Dec 31 22:40:39 1903/Thu Jul 5 22:40:39 2018/g'
   's/Jan 1904/Jul 2018/'
   's/Date: Sun, 1 Apr 2029 03:07:16 +0200/Date: Sat, 31 Mar 2001 15:59:46 -0800/'
   's/Date: Fri, 3457 Jan 4 61400:2064:61300 +0200/Wed May 29 15:00:02 2013/'

   I may have made a few other small edits within the files, which I didn't record, simply to help Mailman do a better job of creating browsable archives.

4. I then concatenated all the mbox files (one each for the years 1992-1998, plus one larger file for 1999-2015), and reran sortmbox.py and cleanarch.

5. I edited the resulting file to remove everything dated after July 20 2013, when the new list archives were set up.

To do the work above, I created a temporary Mailman list, and repeatedly used the 'arch' command to ingest the archives and fix problems. This was an iterative process.

Once 'arch' was giving sane output, I created a new Mailman list, 'cypherpunks-legacy'. I put the single unified + fixed mbox file where the Mailman tools would find it:

   /var/lib/mailman/archives/public/cypherpunks-legacy.mbox/cypherpunks-legacy.mbox

At this point, I could use Mailman to slurp the mbox file in and create the browsable structure. The 'arch' command:

   /var/lib/mailman/bin/arch --wipe cypherpunks-legacy

This served to populate the list archives, which are browsable here:

   https://lists.cpunks.org/pipermail/cypherpunks-legacy

The single large mbox file resulting from the steps above is linked at the top of the Archives page. Here is a direct link:

   https://lists.cpunks.org/pipermail/cypherpunks-legacy.mbox/cypherpunks-legacy.mbox
   (615MB, containing approximately 180149 messages)

Please be aware that Mailman's placement of messages by author, date, subject and thread - including the correct by-month folder - is not perfect. Messages sometimes end up in the wrong place, and sometimes threads are split across different years.

Also note that these mbox files did not include attachments. The current Mailman archive does include attachments, but this legacy archive does not. It seems they were not included with the archive input sources (though it's possible some messages have MIME-encoded attachments within them).

Anyone interested in doing a serious dig into the archive should also consult the original mbox files. These can be ingested into any capable email client program, and viewed as separate messages. They may be sorted and searched, just like any other email folder. The same sorts of issues as those described above will likely be evident in any email client, and different clients will even show different total numbers of messages. The types of sorting, editing, and displaying described above would give somewhat different results if a different toolset were used.

These archives are freely available, and the effort to make them available via Mailman is freely given.

- gbn
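
For readers who want to repeat the by-date sort (step 1) and the July 20 2013 cutoff (step 5) on their own copy of the mbox files, here is a minimal sketch using Python's standard 'mailbox' module. It is not the sortmbox.py used above, whose source is not reproduced here. The names message_date, sort_mbox, FALLBACK and the cutoff parameter are hypothetical, and sorting messages with unparseable dates to the front is an arbitrary choice made for illustration.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumption for this sketch: messages whose Date header is missing or
# unparseable are given the Unix epoch, so they sort to the front where
# they are easy to find and fix by hand.
FALLBACK = datetime(1970, 1, 1, tzinfo=timezone.utc)


def message_date(msg):
    """Return the message's Date header as a timezone-aware UTC datetime."""
    raw = msg.get("Date")
    if raw is None:
        return FALLBACK
    try:
        parsed = parsedate_to_datetime(str(raw))
        if parsed.tzinfo is None:
            # A "-0000" zone yields a naive datetime; treat it as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    except (TypeError, ValueError, OverflowError):
        # Raised for garbage such as "Fri, 3457 Jan 4 61400:2064:61300 +0200".
        return FALLBACK


def sort_mbox(src_path, dst_path, cutoff=None):
    """Copy src_path to dst_path in date order; return the number kept.

    If cutoff (a timezone-aware datetime) is given, messages dated at or
    after it are dropped.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    kept = 0
    try:
        # Sort the keys rather than the messages, so the whole archive
        # does not have to be held in memory at once.
        keys = sorted(src.keys(), key=lambda k: message_date(src[k]))
        for key in keys:
            msg = src[key]
            if cutoff is not None and message_date(msg) >= cutoff:
                continue
            dst.add(msg)
            kept += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return kept
```

A call such as sort_mbox("in.mbox", "out.mbox", cutoff=datetime(2013, 7, 20, tzinfo=timezone.utc)) would write a sorted copy that stops before the current list's archives begin (the file names here are placeholders). The output would still need a pass through cleanarch, and any bad dates would still need to be fixed by hand, as described above.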