Those of you might not know, but for some years I worked at the National Archives of Australia working on, at the time, their leading digital preservation platform. It was awesome, opensource, and they paid me to hack on it.
The most important parts of the platform was Xena and Digital Preservation Recorder (DPR). Xena was, and hopefully still is amazing. It takes in a file, guesses the format. If it’s a closed proprietary format and it had the right xena plugin it would convert it to an open standard and optionally turned it into a .xena file ready to be ingested into the digital repository for long term storage.
We did this knowing that proprietary formats change so quickly and if you want to store a file format long term (20, 40, 100 years) you won’t be able to open it. An open format on the other hand, even if there is no software that can read it any more is open, so you can get your data back.
Once a file had passed through Xena, we’d use DPR to ingest it into the archive. Once in the archive, we had other opensource daemons we wrote which ensured we didn’t lose things to bitrot, we’d keep things duplicated and separated. It was a lot of work, and the size of the space required kept growing.
Anyway, now I’m an OpenStack Swift core developer, and wow, I wish Swift was around back then, because it’s exactly what is required for the DPR side. It duplicates, infinitely scales, it checks checksums, quarantines and corrects. Keeps everything replicated and separated and does it all automatically. Swift is also highly customise-able. You can create your own middleware and insert it in the proxy pipeline or in any of the storage node’s pipelines, and do what ever you need it to do. Add metadata, do something to the object on ingest, or whenever the object is read, updating some other system.. really you can do what ever you want. Maybe even wrap Xena into some middleware.
Going one step further, IBM have been working on a thing called storlets which uses swift and docker to do some work on objects and is now in the OpenStack namespace. Currently storlets are written in Java, and so is Xena.. so this might also be a perfect fit.
Anyway, I got talking with Chris Smart, a mate who also used to work in the same team at NAA, so it got my mind thinking about all this and so I thought I’d place my rambling thoughts somewhere in case other archives or libraries are interested in digital preservation and needs some ideas.. best part, the software is open source and also free!