Saturday, April 23, 2011

Cloud storage revisited

The cheapest reliable cloud storage is Amazon S3.  It's 14c per GB-month for standard storage and about 9c for reduced-redundancy storage, which is still far more reliable than your single hard drive.

The other providers can be seen as value-added services, which is why they cost more, often way more.  But for me, sync on Linux is trivial, so S3 will be my standard.

S3 is cheap, but there's no way it can compete on price with your own external USB drive.

My main consideration is photos and videos.  Does it justify storing them on S3 permanently, with the bill growing every year?  It doesn't matter if you lose a picture or two, or a clip or two.  But does it matter if you lose the whole album from spring break?  S3 is relatively expensive compared with a huge local drive, which is pretty safe nowadays.  On the other hand, those files are priceless.

There are online storage services much cheaper than S3.  They work like a hard drive, but attached remotely.  The problem is that there's no guarantee of how reliable they are, how they are managed, or whether they will still be there next month or next year.

The only exception is Google.  You can back up anything to Google Doc at $0.25/GB per year.  That is competitive with a hard drive, but it's very slow, as I have reported.  You can upload a large file to Google Doc and time it yourself.  I think Google has to throttle so that all the people in the world can share the service.

I thought I had found a solution: use S3 for backup and move permanent archives over to Google Doc.  But GD is an order of magnitude slower than S3, which in turn is an order of magnitude (or more) slower than an external drive.

The software by SMEStorage supports many clouds and behaves like Dropbox, or a file manager.  You can also sync directories automatically, as often as you like.  Speed aside, you can use S3 or GD as if it were your own hard drive.

The catch is that the free SMEStorage account has only 2GB of storage and 1GB of bandwidth per month.  Beyond that, it's like any other paid solution.  For 1 or 2 GB, there are plenty of free services catering to different needs.

SMEStorage isn't needed for S3 because the free software is good enough.  The one good thing is that if SME is gone, clouds like S3 or Google will still be there.

I still can't find anything to replace SMEStorage for Google Doc.  But there are plenty of developer tools and APIs out there.  Maybe no one is interested because GD is so slow.  And I probably won't pay $40 for a lifetime license, because I could buy a hard drive instead.

Using S3:

I have been using S3 to back up my things.  The only command I use is:

s3cmd sync /fullpath/ s3://bucketname/fullpath/

It's in a script, and I haven't even bothered to parameterize it.  I just edit it for different archives.
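Spelled out, the script is just a wrapper around that one line.  The bucket and paths are the same placeholders as above, and the `command -v` guard is only there so the sketch doesn't error on a machine without s3cmd installed:

```shell
#!/bin/sh
# backup.sh -- one-way sync of one archive tree up to S3.
# Edit SRC/DEST by hand for each archive; deliberately not parameterized.
SRC="/fullpath/"
DEST="s3://bucketname/fullpath/"

if command -v s3cmd >/dev/null 2>&1; then
    # Only files not already on S3 (compared by size/MD5) get uploaded.
    s3cmd sync "$SRC" "$DEST"
else
    echo "s3cmd not installed; would run: s3cmd sync $SRC $DEST"
fi
```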

It's pretty slow, about 100 KB/s, maybe twice that overnight.  I do not advise packing the files into larger archives.  S3 typically fails mid-upload on large files.  s3cmd will retry the upload until it succeeds.  So small files are good: you can see the progress, and any failure only requires re-uploading one small file.

It surprises many people that the upload failure rate is pretty high, unlike any upload you have encountered elsewhere.  It was that way a couple of years ago and it's the same now, at least for me.  So freeware like S3 Fox [my bad, I mistakenly wrote S3 explorer], a Firefox extension, is totally useless.  If you upload a large file such as a movie, or many files such as all your pictures, you will always miss some.

But there are also many free tools on Windows and Linux that retry until the file succeeds.  s3cmd is one of them.  Still, if you have a large file and upload after upload fails, the whole backup process may stall, and you miss one overnight backup window.
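s3cmd does its own per-file retries, but a whole run can still exit with a failure.  A crude outer loop keeps the overnight window from being wasted.  This is my own sketch, not an s3cmd feature, and `RETRY_DELAY` is a made-up knob (in practice you'd also want a retry cap):

```shell
#!/bin/sh
# retry CMD... -- rerun CMD until it exits 0, pausing between attempts.
# RETRY_DELAY is our own variable, defaulting to 60 seconds.
retry() {
    until "$@"; do
        echo "failed: $*; retrying in ${RETRY_DELAY:-60}s" >&2
        sleep "${RETRY_DELAY:-60}"
    done
}

# Usage (placeholders as before):
# retry s3cmd sync /fullpath/ s3://bucketname/fullpath/
```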

I don't know why S3 doesn't behave like ftp, where you can resume from the point of failure.  The ftp protocol moves files in chunks, while S3 doesn't, at least not in a way that lets you retry just the failed bits.  Perhaps it's a cloud feature: S3 has to handle all the uploads in the world, while ftp is end to end.

So the sync feature of s3cmd is important.  I don't even look at the logs.  In the morning I just run the script again, and only the files not yet up there get uploaded.
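To make the morning rerun automatic, the same script could go on a nightly cron schedule.  The script and log paths here are made up for illustration:

```
# crontab entry: run the backup script at 2:30 every night,
# appending its output to a log file.
30 2 * * * /home/me/bin/backup.sh >> /home/me/backup.log 2>&1
```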

Normally you would download the whole directory, at a much faster speed, and compare the files to verify integrity.  (I don't know whether the S3 protocol uses any checksums to guard against errors.)  But for pictures and movie clips I don't bother, as long as the file is up there.  At most you lose one picture or clip.  So you save half the bandwidth, which is the expensive part; downloading costs a lot more than uploading.
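If you do want to verify a round trip, a small helper can compare the original tree with a downloaded copy by checksum.  This is my sketch, not part of s3cmd; `compare_trees` is a made-up name, and the commented `s3cmd get` line marks where the (expensive) download would happen:

```shell
#!/bin/sh
# compare_trees DIR1 DIR2 -- list files whose MD5 differs or which exist
# on only one side; exits 0 only if the two trees match exactly.
compare_trees() {
    (cd "$1" && find . -type f -exec md5sum {} + | sort) > /tmp/a.md5
    (cd "$2" && find . -type f -exec md5sum {} + | sort) > /tmp/b.md5
    diff /tmp/a.md5 /tmp/b.md5
}

# After:  s3cmd get --recursive s3://bucketname/fullpath/ /tmp/restore/
# run:    compare_trees /fullpath/ /tmp/restore/
```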

Small files have a disadvantage.  The number of put, get, and list requests is huge when you upload, download, and list the directories up on S3.  One large zip file can save you hundreds or thousands of requests.  But these requests are cheap, like pennies per million.  For me, storage cost is what matters.
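A back-of-envelope check, assuming round numbers of roughly $0.01 per 1,000 PUT requests and the 14c/GB-month storage rate; the file count and archive size here are my own assumptions, not figures from the price list:

```shell
# 10,000 small files uploaded once vs. a 20 GB archive stored every month.
awk 'BEGIN {
    put_cost = 10000 * 0.01 / 1000     # one-time request cost in dollars
    storage  = 20 * 0.14               # recurring monthly storage in dollars
    printf "PUTs: $%.2f once; storage: $%.2f per month\n", put_cost, storage
}'
# prints: PUTs: $0.10 once; storage: $2.80 per month
```

The request cost is a one-off rounding error next to the recurring storage bill, which is the point.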

Unarchived files have another advantage.  When I clean up my drives, I may discover a picture or an album belonging to 2008, even though I have already uploaded that whole year to S3.  With sync, I only need to move the newly discovered files into the directory where they should have been, then run the sync command again, and only the new files are uploaded.
