I've often thought that somewhere in the Amazon pricing team is a former Wall Street collateralized debt obligation (CDO) designer with a grudge against humanity. As excited as I am about the new Amazon Glacier offering, I can't help but think its pricing is... well, a little complicated. On the surface, it's simple: $0.01 per GB per month. If you plan to store your data in Glacier and never retrieve it, it really is that simple. But if you plan to actually pull data out of your vaults, it quickly gets a lot more complicated.
To calculate your costs, you must account for five components, four of which are fairly to moderately easy to understand:
- Storage - This is a simple flat rate of $0.01 per GB per month. Easy, right?
- API - You will be charged $0.05 per 1,000 upload or retrieval API requests. Unless you plan on storing large volumes of very small objects, or retrieving very frequently, this charge will be inconsequential to your bill, and is merely Amazon socially engineering proper usage of their service.
- Data transfer - Putting data into Glacier is free; taking data out is not. To compute this, you use the tiered pricing table. For example, retrieving 1 TB of data in a month will cost you a little under $1K.
- Early deletion - If you delete data within 90 days of uploading, you will be charged a pro-rated $0.03 per GB fee.
- Retrieval overage - The service is designed for retrieving no more than 5% of a vault in any given month. This 5% allowance is pro-rated daily, and any retrieval above it is charged based on a formula that uses the peak hourly retrieval for the month. I'd try to explain further, but just can't do justice to the Amazon explanation (read twice, reaffirm you have a college degree, try again).
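To make the components above a little more concrete, here is a minimal Python sketch of the per-component arithmetic. The storage, request, and early-deletion rates and the 5% allowance come straight from the list; everything else is an assumption on my part: the function names are hypothetical, the early-deletion fee is assumed to shrink linearly over the 90 days, and the overage function reflects my reading of Amazon's explanation (billable peak hourly rate times $0.01 per GB times 720 hours), not an authoritative formula. Data transfer is left out because it depends on the tiered table.

```python
# Rates quoted in the list above (USD).
STORAGE_PER_GB_MONTH = 0.01
REQUEST_FEE_PER_1000 = 0.05
EARLY_DELETE_PER_GB = 0.03
EARLY_DELETE_WINDOW_DAYS = 90
FREE_RETRIEVAL_FRACTION = 0.05  # 5% of the vault per month


def storage_cost(stored_gb):
    """Flat monthly storage charge."""
    return stored_gb * STORAGE_PER_GB_MONTH


def request_cost(num_requests):
    """Charge for upload or retrieval API requests."""
    return num_requests / 1000 * REQUEST_FEE_PER_1000


def early_deletion_cost(deleted_gb, days_stored):
    """Early-deletion fee.

    ASSUMPTION: the $0.03 fee is pro-rated linearly over the days
    remaining in the 90-day window.
    """
    remaining = max(0, EARLY_DELETE_WINDOW_DAYS - days_stored)
    return deleted_gb * EARLY_DELETE_PER_GB * remaining / EARLY_DELETE_WINDOW_DAYS


def daily_free_retrieval_gb(stored_gb, days_in_month=30):
    """The 5% monthly allowance, pro-rated daily."""
    return stored_gb * FREE_RETRIEVAL_FRACTION / days_in_month


def retrieval_overage_cost(peak_hourly_gb, stored_gb,
                           days_in_month=30, hours_in_month=720,
                           fee_per_gb=0.01):
    """Overage fee driven by the peak hourly retrieval rate.

    ASSUMPTION: the daily allowance is spread evenly over 24 hours and
    the billable peak rate is charged for every hour of the month. This
    is my interpretation, not a formula stated in this post.
    """
    free_hourly_gb = daily_free_retrieval_gb(stored_gb, days_in_month) / 24
    billable_gb_per_hour = max(0.0, peak_hourly_gb - free_hourly_gb)
    return billable_gb_per_hour * fee_per_gb * hours_in_month
```

Under these assumptions, a 75 TB vault (76,800 GB) gets a free allowance of 128 GB per day. Note how sensitive the overage is to the peak hour: the same total retrieval costs far less spread over a day than squeezed into a few hours.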
Amazon support is fortunately busy trying to help customers make sense of all this. But I decided to build my own model, based on my understanding of the pricing, using a couple of common use cases.
The table below shows the cost for the two use cases. The first is a customer with 100 TB of data in their vault, adding another 10 TB per month, and retrieving about 5% of the data over the month. The second is the same as the first, but the customer retrieves only 0.5% of their vault per month.
[table id=16 /]
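For anyone who wants to rebuild the inputs behind these two use cases, the sketch below projects vault size, retrieved volume, and the flat storage charge month by month. It is only an illustration under stated assumptions: 1 TB is taken as 1,024 GB, the 10 TB of new data is assumed to land at the end of each month, and the function name `use_case_inputs` is hypothetical. It does not reproduce the full model in the table, which also covers transfer, request, and overage charges.

```python
GB_PER_TB = 1024  # ASSUMPTION: binary terabytes
STORAGE_PER_GB_MONTH = 0.01  # USD, the flat storage rate


def use_case_inputs(initial_tb, added_tb_per_month, retrieval_fraction, months):
    """Return one dict per month with stored GB, retrieved GB, storage cost.

    ASSUMPTION: new data is added at the end of each month, so month 1
    is billed on the initial vault size.
    """
    rows = []
    stored_tb = initial_tb
    for month in range(1, months + 1):
        stored_gb = stored_tb * GB_PER_TB
        rows.append({
            "month": month,
            "stored_gb": stored_gb,
            "retrieved_gb": stored_gb * retrieval_fraction,
            "storage_cost": stored_gb * STORAGE_PER_GB_MONTH,
        })
        stored_tb += added_tb_per_month
    return rows


# Use case #1: 100 TB vault, +10 TB per month, 5% retrieved per month.
case_1 = use_case_inputs(100, 10, 0.05, months=12)
# Use case #2: same vault, but only 0.5% retrieved per month.
case_2 = use_case_inputs(100, 10, 0.005, months=12)
```

With these assumptions, the first month of use case #1 involves roughly 5,120 GB retrieved against 102,400 GB stored, versus about 512 GB for use case #2.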
In both cases, the economics are pretty compelling. But while use case #2 works out to a little more than $0.01 per gigabyte per month, use case #1 costs slightly more than $0.03 per gigabyte per month.
I probably should close by explaining my title: what does Amazon Glacier have in common with Wall Street CDOs? But do I really need to? ;)
Drop me an email if you'd like to see the full spreadsheet model, including its detailed assumptions.