How to download public data from Amazon EC2

Amazon EC2 hosts a wealth of public data. However, there is a list of things we need to do in order to download it. In this post, we describe how to download M-Lab’s NPAD dataset.

  1. Create an EC2 account if you do not have one.
  2. Log in to EC2.
  3. Select the Amazon EC2 tab on the Management Console page.
  4. Create a new instance (by clicking on the Launch Instance button) and select a Linux image. Create a key pair for authentication, save the pem file in a local directory, and change its permissions to 400. (You can skip the key pair creation if you already have one.)
  5. After the instance is created, note its instance ID
  6. Point your browser to here and search for “m-lab” to locate M-Lab’s NPAD dataset. Note the snapshot ID (for Linux).
  7. Go back to the EC2 Management Console and select Snapshots (on the left panel under ELASTIC BLOCK STORES). In the viewing drop-down list, select Public Snapshots and search for the Linux snapshot ID of M-Lab’s NPAD dataset.
  8. Click on the Create Volume button to create a volume from this snapshot, and attach it to the instance that you created earlier (e.g., as /dev/sdz).
  9. Select the Instances option (on the left panel under INSTANCES), select the checkbox of the instance that you created earlier, click on Instance Actions and select the Connect option. This will tell you how to connect to this instance using ssh.
  10. On your local machine, in the directory where your pem file is stored, enter the ssh command accordingly.
  11. Once the ssh connection is established, run: mkdir data; mount /dev/sdz data
  12. The data directory now contains M-Lab’s NPAD data. (You can use scp to copy it to your local machine.)
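
For reference, steps 10 to 12 can be sketched as a small Python helper that only builds the command lines and does not run anything. The key file name (mykey.pem), the login user (ec2-user) and the host name (ec2-host.example.com) are placeholders I made up; use the values shown by the Connect option. The sudo in front of mount is also my assumption, since mounting usually needs root.

```python
import shlex


def build_commands(pem_file, user, host, device="/dev/sdz", mount_dir="data"):
    """Return the shell commands for steps 10-12 as strings (nothing is executed)."""
    remote = f"{user}@{host}"
    return {
        # Step 4 / 10: the key file must not be readable by others.
        "protect_key": shlex.join(["chmod", "400", pem_file]),
        # Step 10: connect from the directory holding the pem file.
        "connect": shlex.join(["ssh", "-i", pem_file, remote]),
        # Step 11: run on the instance; "sudo" is an assumption.
        "mount": "mkdir -p {d} && sudo mount {dev} {d}".format(
            d=shlex.quote(mount_dir), dev=shlex.quote(device)
        ),
        # Step 12: run locally to copy the mounted data directory.
        "copy": shlex.join(["scp", "-i", pem_file, "-r", f"{remote}:{mount_dir}", "."]),
    }


if __name__ == "__main__":
    # Placeholder values, replace with your own key file, user and host.
    for name, cmd in build_commands("mykey.pem", "ec2-user", "ec2-host.example.com").items():
        print(f"{name}: {cmd}")
```

Printing the commands first makes it easy to check the device name and host before typing them into a terminal.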

Addendum

Because I was using scp and wanted multiple clients to download the massive dataset, I was forced to distribute the pem file. To avoid this (I got the idea from my boss), we installed an Apache server on the Amazon EC2 instance and mounted the data volume under the web server's document root. Now we can use wget to download the data without sharing the key.
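
One way to split the work among several clients is sketched below in Python. This is only an illustration, not what we actually ran: the base URL and the file names are hypothetical, and the round-robin split is my own choice. The script only builds wget command lines (with -c, so an interrupted download can resume); it does not download anything.

```python
import shlex


def partition(files, n_clients):
    """Assign files to clients round-robin so each client gets a disjoint subset."""
    if n_clients < 1:
        raise ValueError("n_clients must be at least 1")
    return [files[i::n_clients] for i in range(n_clients)]


def wget_commands(base_url, files):
    """Build one 'wget -c' command per file served under base_url."""
    base = base_url.rstrip("/")
    return [shlex.join(["wget", "-c", f"{base}/{name}"]) for name in files]


if __name__ == "__main__":
    # Hypothetical file names and host, for illustration only.
    files = ["a.tgz", "b.tgz", "c.tgz", "d.tgz", "e.tgz"]
    for client, subset in enumerate(partition(files, 2)):
        print(f"client {client}:")
        for cmd in wget_commands("http://ec2-host.example.com/", subset):
            print("  " + cmd)
```

Each client then runs only its own list of commands, so no file is fetched twice and no one needs the pem file.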
