s3cmd is good for Amazon S3. It's written in Python, so you can also use it on Windows or wrap it in your own scripts. Two things are critically missing, though - multipart upload and parallel upload.
There is a branch of s3cmd for "parallel" upload. I downloaded it, only to find out that it doesn't improve anything. I think if you are uploading to the same account via the same IP, the limiting speed is the same. That's probably why the patch wasn't merged into the main s3cmd.
Multipart upload is implemented in boto. The problem is that boto is just a Python wrapper for the S3 (and all AWS) APIs. I downloaded some user scripts that use boto. They work, but only if you just want to upload a large file to the root of a bucket.
Searching for something that builds on boto, I was surprised to find that it is gsutil - the tool for Google Storage. That made a few things clear to me.
Now that the giant is closing in, I wonder how much incentive there is to keep updating the S3 tools. Google Storage is slightly more expensive for storage and about the same for bandwidth, but the promotion is 100 GB free per month! The problem is that I have no idea whether they will give me an account after I submitted an almost blank form with nothing to say, or how long I have to wait.
I understand why Amazon has the one-year free promotion, and why they recently added the multipart upload and resumable download (?) APIs. Without that, everybody would switch to GS.
Go to gsutil on Google Code to download and install it. They have good instructions for that. But it wouldn't work for me - because the developers' Ubuntu isn't the same Ubuntu as yours and mine!
After installing gsutil in your home directory, cd into gsutil, edit the gsutil file and search for the line:
def SanityCheckXmlParser(cmd):
Then, right below that line, add a new line with two spaces of indentation and the single word
return
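After the edit, the top of the function should look roughly like this (the rest of the original body stays below the early return; the comment is mine):
def SanityCheckXmlParser(cmd):
  return  # skip the XML parser sanity check that fails on a stock Ubuntu install
  # ... the original body of the function continues here, now never reached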
That will make it work. You also need to set up the ~/.boto file like this:
[Credentials]
aws_access_key_id = your id key
aws_secret_access_key = your secret key
[Boto]
debug = 0
num_retries = 10
I think if you don't have a ~/.boto file, gsutil will prompt you for credentials, but perhaps only for GS - I'm not sure.
The documentation (or lack of it) only mentions gs://, but it also works for s3://, in exactly the same way (more or less). The syntax is similar to s3cmd but closer to ordinary Linux commands.
To copy the whole directory (and subs):
gsutil cp -R mydir s3://myuniquebucket
Large files are probably uploaded in 1 MB chunks (or not). To copy subdirectories to S3, the -R option is necessary.
Unfortunately gsutil doesn't sync. But since gsutil is rather reliable with multipart upload, you don't need to sync that often. When you have added something, or just want to check, you can use s3cmd:
s3cmd sync mydir/ s3://myuniquebucket/mydir/
Note the slashes at the end, without which s3cmd will check more than you intend. If s3cmd shows that a large file needs uploading, you are better off uploading that file yourself with gsutil.
So that's the complete solution - three packages to do something simple. You may want to pay up for commercial software instead.
Now I can back up my large encrypted containers.
Wednesday, April 27, 2011
Monday, April 25, 2011
Multipart Parallel upload to Amazon Simple Storage Service using boto
The comment I received on uploading to S3 triggered me to do some research (or some googling). Life is short and we always need something to save time.
Yes, in Windows I tried one of the free uploaders. It's like using a file explorer, after it asks for your ID key and secret key. It's fast and error free. I don't remember whether I had large files in there, and I would think retries are hidden from view.
What to look for is multipart upload (a relatively new part of the S3 protocol) and parallel upload, and of course automatic retry. Basically it's more like FTP, where you can see the files being split into chunks and transferred in many parallel threads.
Multipart upload allows "resume"-like behavior. If you lose one chunk, you just need to retry that chunk. The transfer speed depends on both ends of each connection, but the number of connections is not limited to one.
Most Windows uploaders have both in their commercial versions. You may or may not find them in the free versions. The problem is, I haven't booted into Windows for weeks now, and I have no intention to unless necessary.
In Linux it's a different story. Some good uploaders don't have Linux versions. s3cmd is good and free, but it supports neither multipart nor parallel upload. You can see that it uploads one file at a time, with retries on errors.
After googling I came across the one and only boto package. I also copied the one and only complete script I could find that demonstrates multipart and parallel upload.
First, boto is written in Python, so you can also try it in Windows. Of course Python was already on my Linux box, and I didn't know whether I was the one who installed it. (I'm the only administrator.) Boto was also already installed, and again I didn't know if it was me.
But the boto on my Linux was old. You need the latest boto, 2.x. You have to uninstall the current version using any of the usual methods, then download boto from the official site. I unpacked the .gz file with the archive manager in the GUI; I don't even know the command to extract it.
After some guessing, I figured out how to "install" the new version. Change directory to where the setup file is, then use the command:
$python setup.py install
To try multipart upload, create myuniquebucket using the S3 console, at the reduced redundancy rate, the cheaper rate. I think the rate depends on your bucket, determined at creation.
Copy the "bio" script, or any other script you want to try. I removed the reduced redundancy option in the script because Python complained about it.
To run boto, first you need to edit the ~/.boto file to declare your ID key and secret key. See the official boto site. Download and modify the upload script somewhere, then:
$python s3multi.py bigfilename myuniquebucket
Maybe I'll post my config file and the script later. The script works as intended, but it is far from an end-user product.
It is not obvious how you specify the size of the parts. If the file is over 10 MB you may think that the script hangs - I am only getting a 100 KB/s upload rate. You need some sign of life, like in s3cmd, when the parts are large.
The script uses all your cores, or threads on a multithreaded CPU. For me that's two. I don't know whether more parallelism could max out my 10M cable modem connection.
The script uploads one file, whole or in parts. You still need to call it from other directory-sync scripts. I do not know whether it will upload to subdirectories.
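For reference, here is a minimal sketch of what a multipart upload looks like with the boto 2.x API. This is not the script I ran - it just shows the bare calls, uploading the parts one after another; a parallel version would hand each part to a multiprocessing pool. The 5 MB part size is my own assumption, and it is also the minimum S3 allows for every part except the last.

import math
import os
from io import BytesIO

import boto


def multipart_upload(filename, bucket_name, part_size=5 * 1024 * 1024):
    conn = boto.connect_s3()                      # credentials come from ~/.boto
    bucket = conn.get_bucket(bucket_name)
    mp = bucket.initiate_multipart_upload(os.path.basename(filename))
    parts = int(math.ceil(os.path.getsize(filename) / float(part_size)))
    with open(filename, 'rb') as f:
        for i in range(parts):
            data = f.read(part_size)
            # each part is a separate request, so a failed part can be retried on its own
            mp.upload_part_from_file(BytesIO(data), part_num=i + 1)
    mp.complete_upload()                          # S3 stitches the parts back together

Something like multipart_upload("bigfilename", "myuniquebucket") would then do the same job as the s3multi.py command above.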
Saturday, April 23, 2011
Cloud storage revisited
The cheapest reliable cloud storage is Amazon S3. It's 14 cents per GB per month for normal storage and 9 cents for reduced redundancy, which is still way more reliable than a single hard drive.
The others can be seen as value-added services; that's why they are more expensive, often way more expensive. But for me sync in Linux is trivial, so S3 will be my standard.
S3 is cheap, but it is no match for your own external USB drive.
My main consideration is all the photos and videos. Is it justified to store them on S3 permanently, with the volume increasing every year? It doesn't matter if you lose a picture or two, a clip or two. But does it matter if you lose a whole album from spring break? S3 is expensive relative to a huge local drive, which is pretty safe nowadays. On the other hand, those files are priceless.
There are online storage services much cheaper than S3, more like a hard drive attached remotely. The problem is that there's no guarantee how reliable they are, how they are managed, or whether they will still be there next month or next year.
The only exception is Google. You can back up anything to Google Docs at $0.25/GB per year. That's competitive with a hard drive, but it's very slow, as I have reported. Upload a large file to Google Docs and time it yourself. I think Google has to throttle so that all the people in the world can share it.
I thought I had found a solution: use S3 for backup and move permanent archives over to Google Docs. But GD is an order of magnitude slower than S3, which in turn is an order of magnitude (or more) slower than an external drive.
The software from SMEStorage supports many clouds and behaves like a Dropbox or a file manager. You can also sync directories automatically as often as you like. Other than speed, you can use S3 or GD as if it were your own hard drive.
The catch is that the free SMEStorage account has only 2 GB of storage and 1 GB of bandwidth per month. Over that, it's like any other paid solution. For 1 or 2 GB, there are plenty of free services catering to different needs.
SMEStorage isn't needed for S3 because the free tools are good enough. The one reassuring thing is that if SME is gone, clouds like S3 or Google will still be there.
I still can't find anything to replace SMEStorage for Google Docs. A lot of developer tools and APIs are out there; maybe no one is interested because GD is so slow. But I probably won't pay the $40 lifetime price for the software, because I can buy a hard drive instead.
Using S3:
I have been using S3 to back up my things. The only command I use is:
s3cmd sync /fullpath/ s3://bucketname/fullpath/
It's in a script and I don't even bother to parameterize it. I just edit it for different archives.
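If you ever do want to parameterize it, a tiny wrapper is enough. A sketch, assuming s3cmd is already configured, with made-up paths and bucket name:

import subprocess

BUCKET = "s3://myuniquebucket"                        # made-up bucket name
ARCHIVES = ["/home/me/photos", "/home/me/videos"]     # made-up archive paths

for path in ARCHIVES:
    # trailing slashes matter to s3cmd, so add them explicitly
    subprocess.call(["s3cmd", "sync", path + "/", BUCKET + path + "/"])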
It's pretty slow, 100 KB/s, maybe twice that overnight. I do not advise packing the files into larger archives. S3 uploads typically fail in the middle for large files. s3cmd will retry the upload until it succeeds, so small files are good: you can see progress, and any failure only means uploading a small file again.
It surprises many people that the upload failure rate is pretty high, unlike any other upload you have encountered. It was that way a couple of years ago and it's the same now, at least for me. So freeware like S3Fox [my bad, I mistakenly wrote S3 explorer], a Firefox extension, is totally useless. If you upload a large file, a movie, or many files, say all your pictures, you will always miss some.
But there are also many free tools on Windows and Linux that retry until the file succeeds; s3cmd is one of them. Still, if you have a large file and the upload fails repeatedly, the whole backup process may stall, and you miss an overnight backup window.
I don't know why S3 doesn't behave like FTP, where you can resume from the point of failure. The FTP protocol moves files in chunks, while plain S3 uploads don't - not in a way that lets you retry just the failed bits. Perhaps it's a cloud thing: S3 has to handle all the uploads in the world, while FTP is end to end.
So the sync feature of s3cmd is important. I don't even look at the logs; in the morning I just run the script again, and only the files not yet up there will be uploaded.
Normally you would download the whole directory, at a much faster speed, and compare the files to verify their integrity. (I don't know whether the S3 protocol uses any checksums to guard against errors.) But for pictures and movie clips I don't bother, as long as the file is up there; at most you lose one picture or clip. That way you save half the bandwidth, which is the expensive part - and downloading costs a lot more than uploading.
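On the checksum question: for files uploaded in one piece, the ETag that S3 reports is the file's MD5, so you can spot-check uploads without downloading anything. A rough sketch with boto (the bucket name and prefix are placeholders; files uploaded in multiple parts have a different ETag format and would need to be skipped):

import hashlib
import os

import boto


def check_bucket(local_root, bucket_name, prefix=""):
    conn = boto.connect_s3()                      # keys come from ~/.boto
    bucket = conn.get_bucket(bucket_name)
    for key in bucket.list(prefix):
        local_path = os.path.join(local_root, key.name[len(prefix):])
        if not os.path.isfile(local_path):
            print("missing locally: " + key.name)
            continue
        with open(local_path, 'rb') as f:
            md5 = hashlib.md5(f.read()).hexdigest()
        if key.etag.strip('"') != md5:
            print("checksum mismatch: " + key.name)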
Small files have a disadvantage: the number of PUT, GET and LIST requests is huge when you upload, download and list the directories up on S3. One large zip file can save you hundreds or thousands of requests. But these requests are cheap, like pennies per million. For me storage space is what matters.
Unarchived files have another advantage. When I clean up my drives, I may discover a picture or an album belonging to 2008, when I have already uploaded that whole year to S3. With sync, I only need to move the newly discovered stuff into the directory where it should have been and run the sync command again; only the new stuff will be uploaded.
Seedboxes using Amazon Elastic Compute Cloud
A seedbox usually refers to a virtual private server (VPS) dedicated to BitTorrent. Without one, you can't get into private trackers, which are like private clubs for file sharing.
Even the most elite private trackers are short of uploaders, and they are paranoid about downloaders who are spies or content stealers. It's bullshit that you need to pay good money to get into one - save your money for good HD cable service. Not that it's not worth it, but applying, interviewing, and begging is not my style. I think those running private trackers make money by other means, such as auctioning off memberships and selling seedboxes.
Using AWS EC2 as a seedbox:
Availability:
A micro instance with 8 GB of storage is free for one calendar year. You can leave it on 24/7. The normal price is pennies per hour - in which time you may be able to download a file the size of an HD movie or two, or even a whole season of TV.
Storage:
10 GB of virtual hard disk space is free for a year. Additional storage is cheap, and you don't need much more than that, because it doesn't make sense to store all your files in the cloud unless you want to host everything for a long time.
You can get a bigger drive, or move content into S3 storage without paying for bandwidth (I think).
Bandwidth:
Amazon gets you on the bandwidth. Currently 15 GB in and 15 GB out of EC2 are free per month. Over that, it's 10 cents/GB in and 15 cents/GB out. For downloading only, then copying to your local drive, that's $0.25/GB. But for private trackers a common requirement is that you also seed to a 1:1 ratio, which makes it $0.40/GB. So it's up to $4 for 10 GB, plus overheads like encryption.
Speed:
It's amazingly fast. I see speeds up to 13 Mb/s on private trackers for hot torrents, and several Mb/s is usual. Typically for a public torrent on a broadband ISP connection, 400 Kb/s is pretty fast and 100 Kb/s is decent. So instead of hours, it's almost on demand - a 10 or 20 minute download for a movie; make some coffee or pour some wine and you are there.
Verdict:
$4 for 10 GB doesn't seem to make a lot of sense compared to conventional seedboxes, which usually offer something like 100 GB of storage with bandwidth included. But EC2 is for people who go to the cinema once a week, and sometimes not at all for weeks or months. For occasional users this means several dollars per month or much less. And for now it's totally free.
Decent seedboxes cost about $20/month. You need to be a heavy user to justify that - or you could just add premium channels to your TV package.
EC2 should be welcomed by private trackers. The speed is as fast as anybody else's, and you can leave it seeding forever for free (though that ties up disk space you could otherwise use to download other stuff).
The problem is, the community is still hostile to EC2. Admins don't like Amazon IPs because they are as good as anonymous. And there are plenty of EC2 abusers, even using high-powered instances to crack passwords; my own EC2 instance gets probed non-stop. Some trackers ban Amazon IPs altogether.
Many in the community sell services like seedboxes, VPSes, and so on. They are naturally hostile to Amazon, which is cutting into their business.
Other users look at EC2 as a poor man's seedbox. Actually, for the same money there were worse bargain-basement seedboxes, but they are gone. The specs of EC2 are actually quite respectable - it's just that the bandwidth is rather expensive.
The usage:
The BitTorrent client Transmission is particularly easy to use and set up. There's no need for any setup on your desktop; everything is controlled from your favorite browser. All you need is to browse to
http://your-ec2-url:9091
Download the torrent file you want, or just paste its URL into Transmission. There's not much else to do. You can also choose which files in the torrent to download, instead of the whole 4 seasons.
Once done, you can download the files to your local drive in your favorite way - FTP, secure FTP, HTTP download. In Linux you have additional simple commands like remote copy or secure copy over ssh.
The setup:
It's also pretty simple. Starting an EC2 instance has been covered many, many times by many people; you can do it all through the GUI in the browser. On Ubuntu, you just need one apt-get command to install the Transmission package, and it even starts itself after installation. That's it.
Normally you need to do a little editing of the config file. You need to add a password, or whitelist your own IP or network, to access the EC2 instance. And you want the downloads to go to your home directory instead of a system directory with restricted access.
For some private trackers, you need to disable DHT and PEX. After searching the manuals, the settings are:
"dht-enabled": false,
"pex-enabled": false,
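These settings live in Transmission's settings.json (stop the daemon before editing, or it will overwrite your changes when it exits). A minimal fragment with placeholder values, covering the password, whitelist and download directory mentioned above - the real file has many more keys:

{
  "rpc-port": 9091,
  "rpc-whitelist": "your.home.ip.address",
  "rpc-whitelist-enabled": true,
  "rpc-authentication-required": true,
  "rpc-username": "me",
  "rpc-password": "choose-a-password",
  "download-dir": "/home/ubuntu/Downloads",
  "peer-port": 51413,
  "dht-enabled": false,
  "pex-enabled": false
}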
Again the rest is here:
EC2 Micro Instance as a Remote Bittorrent Client
Wednesday, April 20, 2011
Running bittorrent on Amazon EC2
EC2 Micro Instance as a Remote Bittorrent Client
I knew it would be easy in Linux, but not this easy. The motivation is that it's private: nobody knows who is downloading. Amazon doesn't provide any service other than a computer, so I doubt they maintain any logs.
The other thing is that it's fast. You can download anytime without interfering with your other work. It goes up to 700 kb/s, and a 1 GB file downloads in less than an hour, probably much less. Downloading from EC2 to your local computer runs at about 600 kb/s. It all varies.
Price-wise, storage is free unless you keep the files up there. You get some 15 GB in and about the same out free per month; more than that costs pennies. A 1 GB file can turn into a few GB of bandwidth, but because it's so fast, the seeding upload is only about 10% of the file size. So the total bandwidth is about 2.2 GB per 1 GB file, plus encryption/packaging overheads.
Even when it's not free, it costs less than a penny to turn on the EC2 instance, and 10/15 cents per GB in/out. So it beats any VPN / BitTorrent / Usenet / file download plan, unless you are a heavy user.
There's a good step-by-step guide to starting the EC2 service at the link. The steps are up to date except for the Security Group. Now there's no from/to port, just Port, so you just need to open port 9091 and the range 49152-65535 (plus ssh on port 22). The range is for random peer ports; you need at least 51413 open.
Now you can control the BitTorrent client from your own browser. Neat!
For Linux/Ubuntu, the copy command comes with ssh:
scp -i yourkeyfile.pem ubuntu@yourpublicdns.com:Downloads/yourfile .
The key file and public DNS name are the same ones you use to start the ssh session. Downloads is the directory you set up for Transmission in the linked instructions, under your home directory. You can also use the public IP address of your EC2 instance instead of the public DNS name, in the browser as well.
Saturday, April 16, 2011
Compare secure online and offline storages for your digital life
I'm talking about storing the videos (and pictures) permanently for your whole life.
For a decade now, hard drives have been so reliable that mirrored RAID drives aren't really necessary. What you need, at the least, is location diversity. I looked at all the online storage options, and they are no match for an external hard drive.
A 1 TB (1024 GB? a Terrible Blob anyway) external USB drive costs about $60 for the bulkier models, whereas Amazon S3 costs 14 cents per GB per month. For the same money you can only keep 36 GB in S3 for a year!
The speed of a USB 2 drive is 480 Mb/s, while upload to S3 is about 100 KB/s - a thousand times slower?
So the first line of defense is to get a Terrible Blob and keep it somewhere other than next to your computer. For your home computer, keep the TB at work, and vice versa. I already have a fireproof (30 minutes or an hour) box for computer media. Those media are now obsolete (except for memory cards), so I can put in an external drive with space to spare. Then I'll wrap it in plastic so it survives both the fire and the putting out of the fire. You may not find a reliably airtight container to hold the drive, but making the whole thing float is easier. If you go that far, you can put a GPS inside too!
This is a fairly permanent solution, or part of one. When the time comes, your fireproof box can hold many TBs, plus solid-state drives/cards to replace your current TB.
The other solution is similar, but very close to online backup. You buddy up with someone, or buddy up your home computer with your work computer. Get external drives, or even cheaper internal drives. There is free software to back up between the two sites as if they were locally attached drives, and there is encryption so your buddy can't see what's on your drive.
I still like the idea of online cloud backup. But you can't get away with less than 14 cents/GB/month. If it's not your only copy, you can opt for 9 cents/GB, which Amazon claims is still 400 times more reliable than your own disk drive.
The big three - the others being Google and Microsoft - are competing for Amazon's business, so all the other cloud services will be value-added providers, that is, more expensive, usually a lot more. ADrive will give you 50 GB for free, but how long will that last? I can tell you that having uploaded 50 GB, you are as good as married to those guys, just to avoid having to upload it all again.
A surprise contender is Google Docs! It seems more than perfect, except for the upload speed.
First, Google Picasa Web Albums is useless. You cannot have nested folders or albums, so you just can't put your whole life into it.
The Picasa software running on your computer is a rather good file management system, if only for your pictures. The Linux version works fine but does not handle videos yet! You download it via your browser and Linux will install it for you, with less hassle than in Windows. Picasa is very good at locating deeply nested pictures in copies of all your old hard drive images, so you won't miss anything when you delete those useless drive images. Otherwise it works like any file manager.
The problem with Picasa is that it does not sync with your file system on demand - it syncs at its own will. If you understand that and are not confused by it, then it's a good additional tool.
Picasa only sees images and nothing else. One good thing to come out of that is that it's trivial to separate the images into their own folder and move that folder somewhere else. Then you can sync the pure image folders with albums in Picasa Web Albums. Of course you can still sync your mixed folders, but having to back up the same folder to PWA and to something else is a nightmare.
Now back to Google Docs. You can actually "copy" over your nested directories, or your whole hard drive for that matter. It's video and picture aware, making you wonder why PWA is still around. You can view pictures, stream videos, and share like YouTube.
Google Docs provides 1 GB of free storage. Additional storage is 25 cents/GB/year - not per month!!! So this is more like a permanent archive for your digital life. They say nothing about reliability, in contrast to Amazon. But everybody uses Google's online services and I have never heard of any data loss.
The fine print is even more surprising. You can increase storage anytime and your rate is prorated - that part is not surprising. But if you reduce storage, or unsubscribe altogether, it looks like Google still keeps all your data permanently; you just cannot edit or add to it. So it's more like paying $5 once and getting 20 GB permanently! Of course Google can terminate or change the terms of service, but these are the terms at the moment. There's no guarantee, but obviously Google is not aiming to make money from storage. They want to get you using their services and sell advertising.
Even more surprising is that pictures smaller than 800x800, and videos shorter than 15 minutes, are always free - they do not count against your account. It appears too good to be true. Forget about the pictures; cutting videos into 15-minute pieces is trivial, and nowadays everybody only takes interesting one-minute clips anyway. But it seems to be true: I uploaded a video and it doesn't show up in the total usage. And you can download the original, which you cannot do in PWA.
There's no need to worry about backup or upload/download software. It's much easier than what I described in the last few posts. For Linux there is the SMEStorage software, for free. You download it as you would in Windows, and it installs itself. You register, log in and start the software via the GUI. Then you have a 1 TB drive attached to your PC. You can treat it like a local drive, and everything goes to Google Docs, which you can also access online. You can use the Linux file manager to do whatever you like to the files and folders, just as in Windows, and use any Linux tool that works on ordinary local files. Actually the file manager is much better than Windows' because it can have tabs - and you know how long it took IE to catch up with the competition.
The SME software includes a way to automatically sync the remote directories to your local directories, as frequently as you want. I'm pretty sure any backup software will work on the remote directories the same as on local ones.
SMEStorage supports not only Google Docs but many storage services - Amazon S3, Microsoft Azure, Google Storage for developers, and many other add-on providers. Windows is everybody's bread and butter, so anyone can use those. I'm sure there are plenty of other developers as good as SME; they just don't bother with Linux.
With the free SME service, you can pick two storage providers, and multiple accounts on each, I think. For Amazon you would need different payment cards and perhaps phone numbers, but for Google I wonder how many free accounts you can have.
As I said at the start, Google Docs is very slow. But everybody knows that from uploading and working in Google services - the same as attaching a large file to your email. It turns out to be about 10 KB/s (under 20), an order of magnitude slower than Amazon's roughly 100 KB/s (under 150).
So you need an overnight run to upload 1 GB to Google Docs, and probably 3 months to upload 100 GB. That's why PWA doesn't support many fancy things, and why Google never worried that users would cheat them out of disk space. If you manage to upload 100 GB to Google, you are likely to be chained to them for life.
If you have 10 to 100 GB of critical data to back up frequently, Amazon S3 is a no-brainer. For videos and pictures, Google Docs is as permanent as it gets, and close to free if you ever need to pay at all. But if you can spare $60, an extra 1 TB drive may be all that some people need.
Motherboard compatibility with Linux
I always failed to read my CPU temp from the motherboard in Linux.
The OS has little to do with how the motherboard is designed in hardware - there is the memory and there is the CPU. But on personal computers there is some communication between the OS and the motherboard. Reading the CPU temperature from the motherboard is one example; others are power management features, such as letting the OS control CPU and fan speeds.
These are the lowest-level hardware drivers. It's the manufacturer's job to provide drivers and to pick some standards. However, even though there are more Linux users now, they are still less than 1%, and most of them aren't typical Windows users. So most manufacturers just don't provide drivers for Linux; the Linux developers write them for free from the manufacturers' documentation. There are also some de facto standards, thanks to the monopoly of a few critical chip manufacturers.
Device drivers, for things like printers, used to be the problem with Linux - even Vista couldn't stay compatible with XP. But nowadays most devices should work ... to some extent. Take my printer: it works fine on Linux, until the ink runs out. Then I realized that the ink monitor and cleaning program aren't available on Linux, so I have to go back to Windows to see which ink cartridge is empty!
My problem has been out there for years; my motherboard sits between old and new. The old way of reading the sensors in Linux is gone and the new way doesn't work yet. So the problem remains, and nobody is going to do anything for my motherboard. There is a fix that works for some people, but I don't know how they do it.
I could always change to a new motherboard or a new nettop. But the whole point is that my PC is still way more powerful than a nettop, and moving away from Windows was like a major speed upgrade. The current setup will last a couple more years, that's certain. The only catch is that my CPU may overheat someday without my knowing.
So when switching to Linux, a dual boot is a very good idea. If you are worried about your hardware, or buying something new, check the compatibility lists. For portable computers, check the power management; I'm sure the standards are well established, or you can just get a machine with Linux preinstalled. Chrome OS is coming out - it may be good for portables, but you don't want to do things differently on your desktop and your notebook.
Thursday, April 14, 2011
The switch to Linux
I was wrong to say that I wouldn't blog here again. For things that can't possibly identify me personally, I don't see what would stop me from blogging about something else. Let's see how it goes.
I've always been a sort of unwilling pioneer. I was exposed to email long before most people; anybody could see it was the future. I should have found a job doing that by whatever means, instead of complaining that I had to write a letter on paper every time. The difference is, Bill made sure it got to everybody's desktop. But it will still take decades for the post office to disappear in its present form.
I'm sure they already had cable TV, but the cable company did a lot of digging and other work for me when they first delivered Internet and phone to my street. They still call me a decade later to ask my opinion when they launch new services.
I was under a lot of pressure to buy my last watch. What a relief when I heard that people with a smartphone don't wear watches! I was ahead of the smartphone era by a few years at least.
Now I'm one of the 1% who have switched to a Linux desktop. Actually, I could be in the 1% of the 1% if I wanted to be a pioneer. For example, the last reason I didn't switch was Hotspot Shield. Anyone with a notebook computer should use a VPN when doing anything in hotels or cafes. HSS is fast and free, and it's trivial to get rid of the ads, but it didn't support Linux. For this non-reason I put up with Windows longer. A paid VPN only costs a few dollars per month, or even per year, but I stuck with the free service. Not anymore.
I could give a lot of good reasons to switch, but not right now. Today's topic is why I switched because I like free - and Amazon Web Services is free (for a calendar year).
You have to understand that the trivial things necessary on a computer, perfected decades ago, became an integrated part of any real operating system, like Unix. But when Windows came along, all hell broke loose. Those tiny "computers" had only 256K of memory! And a floppy disk!
Now Windows has become the dinosaur, while Linux can do what Unix minicomputers could do, in a fraction of the resources Windows needs, and faster.
Take backup. To back up to Amazon S3, you only need one command:
s3cmd sync yourfolder s3://yourbucket
Only changed or new files will be uploaded to S3 - that's synchronization. You may prefer a GUI, but you will immediately see why a GUI is an obstacle here. The command itself is flexible, taking any folder you want, and for a different destination you just add another command. You can put all your customized commands in one single text file and name it "syn3", like a shortened URL. And if you copy that file into the daily jobs directory (/etc/cron.daily on Ubuntu), you will never have to worry about losing a file more than one day old.
Nowadays installing software is easier than on Windows. You don't even need to go through the browser to "download"! If you try to run s3cmd, Ubuntu will tell you that you need to install it first with the command:
sudo apt-get install s3cmd
It's a single step to download and install, and then you can run it right away. (There's an interactive setup to run first - s3cmd --configure - to enter your S3 access keys.) Whereas in Windows you have to start the browser, find the download URL, wait for the download manager, and then run the installer.
For me this is (one of) the killer applications, and it's the only sane way to upload to S3, because S3 is actually very slow (~100 KB/s) and unreliable - you always fail in the middle of uploading a large backup file. If sync fails, it only fails on the current file, and if you keep syncing, all the files will be there eventually. There are similar tools in Windows, but you have to pay for them, or they aren't as convenient, and because of the GUI they can't be customized as easily.
And for a more conventional task - backing up your critical files to a separate drive daily - I struggled with this for years in Windows. In Linux, it's trivial:
rsync -a sourcefolder destinationfolder
Only new and modified files will be copied (the -a option recurses into subfolders and preserves file attributes). This was part of the OS 10 to 30 years ago! You can see the problem with Microsoft. First they had to strip out everything to fit in the tiny IBM box. They could have done a better job providing backup software, but they can't kill off third parties, for commercial reasons. You can pay, but companies in that sort of business are desperate, and amateur developers come and go.
The other thing is, if you search, there are lots of backup programs for Windows, many from first-year computer science students, and some may crash your disk. In Linux, the developers have some standards, even though the software is free. The bad ones don't get recommended or advertised, and the few good ones find their way into the repositories, where you can easily download and install them - some even end up installed by default, or as part of the OS.