Protecting the world's digital heritage for future generations

Ready to archive in the cloud?

Top 5 questions to ask before outsourcing your archives to a cloud vendor

Curate, Preserve, Archive: you’ll probably agree that these words are neither used nor understood by the major cloud providers. There has been a gaping lack of understanding of what archivists and librarians need in order to feel safe outsourcing their digital archives to the cloud. To be viable as a place to preserve digital content forever (or at least for many decades), cloud storage needs to do more than protect the bits over the long term. It must also address data ownership, location, authenticity and provenance, among other considerations.
Here are the first five questions I recommend asking a cloud storage vendor, and what to look for in their replies, to help you determine whether they are the right choice for hosting and preserving your archives. (More questions, including security and geographic separation, will follow in my next blog post.)

Q1. What happens to my digital assets if the vendor goes out of business or ends the service?

A. Look for service level agreements that guarantee timely movement of your digital assets out of their cloud and back to you – or to another cloud provider of your choice.
B. Make sure that they guarantee adequate notice of end-of-service so you have time to plan for data migration out of that cloud.
C. If the vendor is subcontracting the cloud storage and acting as a broker, ask what promises, if any, they make about getting your data back.
D. Seek a vendor whose cloud storage service is based on open source and open APIs. These are a must-have for digital repositories and data storage solutions that will manage data for the long term (or forever), because they protect you against vendor and data lock-in.

Q2. Does the cloud vendor understand the requirements of digital preservation and archiving versus digital storage?

A. Make sure the vendor understands what matters for the digital stewardship and preservation of your assets. If the terms OAIS, PREMIS, AXF or BagIt are met with a glazed look, walk away!
B. The service should support several methods of file authentication (verification or fixity checking), such as accepting hash values you supply with each file, computing hashes in a way you can verify was actually performed, and running fixity checks on a schedule you determine.
C. Digital content cannot be considered preserved without meaningful access, so you’ll need to know your options for access, both for today’s users and for the longer term, when formats and viewing technologies have evolved or become obsolete. If you’re using a front-end application to access the storage cloud, this functionality may be handled at the application layer; however, if you deposit and access content directly in the cloud, assess your options for access via APIs, scripts, file system gateways or a web UI. Review the admin functions for managing user access. Ask how you can use metadata to help prepare for future rendering and format transformation requirements.
D. Ask to see system logs or reports containing information that can support your preservation planning and compliance audits. Ask whether actions taken by different users of the system are captured, such as who did what to your assets, and when. Also ask whether the storage system records the details of any repair it makes to corrupted data.
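To make point B more concrete, here is a minimal fixity-checking sketch in Python, assuming you keep your own SHA-256 checksums for the files you deposit. The manifest layout (one `checksum  relative/path` line per file, saved as `manifest-sha256.txt`) is loosely modeled on a BagIt payload manifest but is not a conforming bag, and the function names are illustrative. Scheduling the check (for example with cron) is not shown.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large masters never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(payload_dir: Path, manifest: Path) -> None:
    """Write '<sha256>  <relative path>' for every file under payload_dir (BagIt-like, not BagIt)."""
    lines = []
    for p in sorted(payload_dir.rglob("*")):
        if p.is_file():
            rel = p.relative_to(payload_dir).as_posix()
            lines.append(f"{sha256_of(p)}  {rel}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(payload_dir: Path, manifest: Path) -> dict[str, str]:
    """Return {relative path: 'ok' | 'mismatch' | 'missing'} for each manifest entry."""
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        target = payload_dir / rel
        if not target.is_file():
            results[rel] = "missing"
        elif sha256_of(target) == expected:
            results[rel] = "ok"
        else:
            results[rel] = "mismatch"
    return results


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        payload = Path(tmp) / "data"
        payload.mkdir()
        (payload / "scan_001.tif").write_bytes(b"original bytes")
        manifest = Path(tmp) / "manifest-sha256.txt"  # kept outside the payload
        write_manifest(payload, manifest)
        (payload / "scan_001.tif").write_bytes(b"bit rot!")  # simulate silent corruption
        print(verify_manifest(payload, manifest))  # {'scan_001.tif': 'mismatch'}
```

Reporting "missing" separately from "mismatch" helps an audit distinguish a deleted file from silent corruption, and checksums you computed yourself give you something independent to compare against whatever the vendor reports.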

Q3. How much preservation management and workflow is handled or aided?

A. If this question also draws a glazed look, walk away! Your vendor should be able to explain how you can use their cloud storage service to store and access descriptive, technical or preservation metadata about each stored file or object, and how to use that metadata to control access, aid format transformations or rendering over time, relate objects to each other, assist with digital rights management, and so on.
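The metadata uses listed above can be sketched with a small, hypothetical per-object record in Python. The `ObjectRecord` fields, the access labels and the idea of an institution-maintained list of at-risk formats are all assumptions for illustration; a real deployment would more likely map these to a schema such as PREMIS and to whatever metadata interface the vendor actually exposes.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    """Hypothetical per-object metadata record; field names are illustrative only."""
    object_id: str
    title: str                       # descriptive metadata
    mime_type: str                   # technical metadata
    sha256: str                      # preservation metadata (fixity value)
    access: str = "public"           # e.g. "public", "staff-only", "embargoed"
    related_ids: list[str] = field(default_factory=list)  # e.g. master -> access copy


def needs_migration(records: list[ObjectRecord], at_risk_types: set[str]) -> list[str]:
    """IDs of objects whose format appears on an (assumed) at-risk format list."""
    return [r.object_id for r in records if r.mime_type in at_risk_types]


def visible_to(records: list[ObjectRecord], role: str) -> list[str]:
    """Toy access rule: staff see everything, everyone else sees only 'public' objects."""
    if role == "staff":
        return [r.object_id for r in records]
    return [r.object_id for r in records if r.access == "public"]


if __name__ == "__main__":
    records = [
        # "0" * 64 etc. are placeholder hashes, not real checksums
        ObjectRecord("obj-1", "Oral history, tape 1 (master)", "audio/x-wav", "0" * 64,
                     related_ids=["obj-2"]),
        ObjectRecord("obj-2", "Oral history, tape 1 (access copy)", "audio/mpeg", "1" * 64),
        ObjectRecord("obj-3", "Donor correspondence", "application/msword", "2" * 64,
                     access="embargoed"),
    ]
    print(needs_migration(records, {"application/msword"}))  # ['obj-3']
    print(visible_to(records, "public"))                     # ['obj-1', 'obj-2']
```

Keeping access rules, relationships and format information in the same record is what lets one metadata query drive both day-to-day access and long-term migration planning.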

Q4. Where will my archive collection be stored, and who owns my data and its copyrights?

A. Ask where the data center is located; state or national law may require you to keep data within certain geographic boundaries. Determine whether the vendor’s terms allow your company’s data to be transferred elsewhere, where to, and whether your company will be notified beforehand.
B. Make sure you retain data ownership and all copyrights.

Q5. How do I (easily) get my thousands of terabytes of digital content into – and out of – the cloud?

A. Relocating assets from your data center to the cloud (or back out of the cloud) at this scale is complex enough that your vendor should offer bulk import/export services on physical media.
B. Ask whether your vendor can offer direct network connectivity between your data center and their storage cloud data center.
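A rough back-of-the-envelope calculation shows why physical media matter at this scale. The Python sketch below assumes decimal units (1 PB = 1,000 TB) and a hypothetical 80% sustained link utilization; real throughput also depends on protocol overhead, storage speed and contention.

```python
def transfer_days(terabytes: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate days needed to move `terabytes` (decimal TB) over a `link_gbps` link.

    `utilization` is an assumed fraction of nominal bandwidth actually sustained.
    """
    bits = terabytes * 1e12 * 8                       # TB -> bytes -> bits
    seconds = bits / (link_gbps * 1e9 * utilization)  # effective bits per second
    return seconds / 86_400                           # seconds per day


if __name__ == "__main__":
    for gbps in (1, 10):
        # 1,000 TB (1 PB) is the low end of "thousands of terabytes"
        print(f"1 PB over {gbps} Gbit/s: ~{transfer_days(1000, gbps):.0f} days")
    # prints ~116 days for 1 Gbit/s and ~12 days for 10 Gbit/s
```

Even at 10 Gbit/s, a collection of several petabytes ties up the link for many weeks in each direction, which is why both bulk import/export on physical media and direct connectivity are worth raising with the vendor.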

To wrap this up: until now I’ve doggedly steered my museum and library archive clients away from using the cloud for anything beyond small collections (under 10 terabytes), and away from it altogether for digital assets that cannot be replaced from another source. I felt the cloud was suitable and “safe enough” for sharing copies of digital collections between institutions, for example, or for holding a disaster recovery copy, but definitely not viable as primary preservation storage for large archive collections.

Over the past year I’ve had the opportunity to work with the EVault Long-Term Storage Service (LTS2) teams to roll out the first major storage cloud focused on the unique needs of the preservation and archivist community. Preservation is a key tenet of the EVault LTS2 cloud offering. The EVault LTS2 architecture and engineering group, product management team and service organization all include preservation practitioners who work alongside archivists and professional societies to help ensure that EVault LTS2 delivers features that address these (and other) issues.

Email me to be considered for a free trial account

Watch for my next blog post (or contact me) for five more important questions.


How would you propose we work with an archival cloud if we are heavily invested in tape (petabytes)?

I think you could use the cloud:
* as a second (or nth) copy, for disaster recovery
* as the deepest storage tier, for rarely used or unused items
* if you need additional online services provided with the cloud (not just storage, but also processing, distribution, sales, etc.)

These issues are among those that keep the 'fear of cloud' alive. Great to see them addressed head-on!

There are commitments that the repository makes to the donor which, ethically, can't be handed off to a third party...

Good point about donor commitments - thanks!
