How to download public data from Amazon EC2

Amazon EC2 hosts a wealth of public data. However, there is a list of things that we need to do in order to download it. In this blog post, we shall describe how to download M-Lab’s NPAD dataset.

  1. Create an EC2 account if you do not have one.
  2. Login to EC2
  3. Select the Amazon EC2 tab in the Management Console page
  4. Create a new instance (by clicking on the Launch Instance button) and select a Linux image. Create a key pair for authentication. Save the pem file in a local directory and change its permissions to 400. (You can skip the key pair creation if you already have one.)
  5. After the instance is created, note its instance ID
  6. Point your browser to here and search for “m-lab” to locate M-Lab’s NPAD dataset. Note the snapshot ID (for Linux).
  7. Go back to the EC2 Management Console and select Snapshots (on the left panel under ELASTIC BLOCK STORE). In the Viewing drop-down list, select Public Snapshots and search for the Linux snapshot ID of M-Lab’s NPAD dataset.
  8. Click on the Create Volume button to create a volume from this snapshot, and attach it to the instance that you created earlier (e.g. as /dev/sdz).
  9. Select the Instances option (on the left panel under INSTANCES), select the checkbox of the instance that you created earlier, click on Instance Actions and select the Connect option. This will tell you how to connect to the instance using ssh.
  10. On your local machine, in the directory where your pem file is stored, enter the ssh command accordingly.
  11. Once you are logged in over ssh, run: mkdir data; mount /dev/sdz data
  12. The data directory now contains M-Lab’s NPAD data. (You can use scp to copy it accordingly.)

Addendum

I was using scp, but I wanted multiple clients to download this massive dataset, which would have forced me to distribute the pem file. Instead (I got this idea from my boss), we installed an Apache server on the Amazon EC2 instance and mounted the data volume under the document root of the web server. Now we can use wget to download the data.

Chrome Extension

The recent magnitude 7.0 earthquake in Haiti has turned my “earthquake alert mode” on. I visit this web page pretty often to check for the major ones.

I also started using Google Chrome once its Linux release came out, and I have always wanted to create a “hello world” Chrome extension. So tonight (Friday night), I found some quiet time to create a Google Chrome extension that shows recent earthquakes in CA and NV.

This extension consists of 3 files (place them in one directory).

  1. manifest.json
  2. icon.png
  3. popup.html

This is the manifest.json

{
  "name": "Recent Earthquakes in CA and NV",
  "version": "1.0",
  "description": "Recent Earthquakes (> 3.0) in California-Nevada.",
  "browser_action": {
    "default_icon": "icon.png",
    "popup": "popup.html"
  },
  "permissions": [
    "http://quake.usgs.gov/", "tabs"
  ]
}

Here is my icon.png

And here is the popup.html

<style>
body {
    min-width:500px;
    background-color: #FFFFCC;
}
div {
    font-size: 13px;
    font-weight: bold;
}

span.mag {
    font-size: 10px;
    color: red;
}
span {
    font-size: 10px;
    color: black;
}

a {
    font-size: 10px;
}
</style>

<script>
var hostUrl = 'http://quake.usgs.gov';
var req = new XMLHttpRequest();
req.open(
    "GET",
    "http://quake.usgs.gov/recenteqs/Quakes/quakes0.htm",
    true);
req.onload = showTable;
req.send(null);

function showUrl(anchor) {
    var urlLocation = '' + anchor.href;
    chrome.tabs.create({url:urlLocation});
}

function showTable() {
    var title = document.createElement("div");
    title.innerText = 'Recent Earthquakes in CA and NV';
    document.body.appendChild(title);
    var hr = document.createElement("hr");
    document.body.appendChild(hr);

    var body = req.responseText;
    var i = body.indexOf('<A HREF="/recenteqs/Maps/');
    body = body.substring(i);
    i = body.lastIndexOf('</PRE>');
    body = body.substring(0, i);

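    // Each quake entry starts with <STRONG>; walk through them one by one.
    var j; // declared here so it does not leak as an implicit global
    i = body.indexOf('<STRONG>');
    while (i != -1) {
        body = body.substring(i);
        // The first link in the entry points to the map page.
        i = body.indexOf('<A HREF="');
        j = body.indexOf('">');
        var map = hostUrl + body.substring(i+9, j);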

        j = body.indexOf('<FONT COLOR="#CC0000"');
        i = body.indexOf('>', j);
        body = body.substring(i+2);
        i = body.indexOf('<A ');
        var mag = body.substring(0, i-1);

        i = body.indexOf('<A HREF="');
        j = body.indexOf('">');
        var details = hostUrl + body.substring(i+9, j);

        body = body.substring(j+2);
        i = body.indexOf('</A>');
        var detailsLoc = body.substring(0, i);

        body = body.substring(i+5);
        i = body.indexOf('</FONT>');
        var mapLoc = body.substring(0, i);

        i = body.indexOf('</STRONG>');
        body = body.substring(i+9);
        i = body.indexOf('<STRONG>');

        var span = document.createElement("span");
        span.setAttribute('class', 'mag');
        span.innerText = 'Mag ' + mag;

        var detailLink = document.createElement("a");
        detailLink.setAttribute('href', details);
        detailLink.setAttribute('onClick', 'showUrl(this)');
        detailLink.appendChild(span);
        document.body.appendChild(detailLink);

        // Separator between the magnitude link and the map link.
        var sep = document.createElement("span");
        sep.innerText = ' . . ';
        document.body.appendChild(sep);

        var mapLink = document.createElement("a");
        mapLink.setAttribute('href', map);
        mapLink.setAttribute('onClick', 'showUrl(this)');
        mapLink.innerText = mapLoc;
        document.body.appendChild(mapLink);

        var br = document.createElement("br");
        document.body.appendChild(br);
    }
}
</script>
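
The popup script above repeats one pattern many times: find a start marker with indexOf, find an end marker, and take the substring in between. As a minimal sketch of that pattern (not part of the extension itself), the helper below wraps it in one function. The name `extractBetween` and the sample markup are made up for illustration; the real USGS page layout may differ.

```javascript
// Hypothetical helper: returns the text between `open` and the next `close`,
// searching from position `from`, plus the position where scanning can resume.
// Returns null when either marker is missing.
function extractBetween(text, open, close, from) {
    var start = text.indexOf(open, from);
    if (start === -1) {
        return null;
    }
    start += open.length;
    var end = text.indexOf(close, start);
    if (end === -1) {
        return null;
    }
    return { value: text.substring(start, end), next: end + close.length };
}

// Made-up sample markup that only mimics the shape of the entries parsed above.
var sample =
    '<STRONG><A HREF="/recenteqs/Maps/example-1.htm">MAP</A></STRONG>\n' +
    '<STRONG><A HREF="/recenteqs/Maps/example-2.htm">MAP</A></STRONG>';

// Collect every link target by resuming the scan after each hit.
var links = [];
var hit = extractBetween(sample, '<A HREF="', '">', 0);
while (hit !== null) {
    links.push(hit.value);
    hit = extractBetween(sample, '<A HREF="', '">', hit.next);
}
console.log(links);
```

Scanning with a moving position instead of repeatedly cutting the string keeps each step independent, which makes it easier to see which marker a given index refers to.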

Next, go to the Chrome extensions page, switch to developer mode, and load the unpacked extension. Basically, just point it to the directory that I created. And this is what it looks like.

Entitlements Service

I am one of a team of 4 engineers who have spent the past year designing and implementing the entitlements service for the OpenSSO project. We are happy to see that the entitlements service is the key feature in OpenSSO Express 9 (see our press release).

The following are the key things that we have done (I wrote about 80% of the SDK, so it is OK to blame me if things are not working):

  1. Improve scalability.
    We use indexes in the LDAP server to locate policies for evaluation. This fast, heuristic approach helps us eliminate most of the unrelated policies and retrieve the related ones rapidly. Our latest test shows that we can handle over 1 million policies.
  2. Improve performance.
    We use multi-threading, reentrant read-write locks (from the Java concurrency package) and LDAP indexes to speed up policy evaluation. Performance has improved significantly when benchmarked against the previous policy evaluation engine. We are in the process of tuning the caching system, and we expect better results.
  3. REST interfaces
    We have REST interfaces for policy evaluation and management. This means that non-Java clients (such as PHP and Python) can make policy evaluation requests. We use Jersey, which is state of the art, for our REST implementation, and JSON is used too.
  4. User-friendly UI
    My co-worker has developed a nice set of entitlements service UIs using ICEfaces.
  5. XACML support
    We are able to import and export XACML.
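
To illustrate the idea behind item 1, here is a toy sketch of index-based policy lookup. It is not the OpenSSO implementation, which relies on LDAP indexes; this version uses an in-memory Map keyed by host name. The policy shape and the names `buildHostIndex` and `findCandidates` are assumptions made up for this example.

```javascript
// Toy illustration only: OpenSSO itself relies on LDAP indexes, not an in-memory Map.
// The policy shape ({name, resource}) and the function names are hypothetical.

// Build an index from host name to the policies whose resource lives on that host.
function buildHostIndex(policies) {
    var index = new Map();
    policies.forEach(function (policy) {
        var host = new URL(policy.resource).host;
        if (!index.has(host)) {
            index.set(host, []);
        }
        index.get(host).push(policy);
    });
    return index;
}

// Return only the policies indexed under the requested host. The more expensive
// full evaluation can then run on this short list instead of on every policy.
function findCandidates(index, requestUrl) {
    return index.get(new URL(requestUrl).host) || [];
}

var policies = [
    { name: 'p1', resource: 'http://www.example.com/hr/*' },
    { name: 'p2', resource: 'http://www.example.com/finance/*' },
    { name: 'p3', resource: 'http://intranet.example.org/*' }
];
var index = buildHostIndex(policies);
var candidates = findCandidates(index, 'http://www.example.com/hr/payroll');
console.log(candidates.map(function (p) { return p.name; }));
```

The lookup is a heuristic filter: it may still return policies that do not apply (p2 here), but it cheaply discards the ones that cannot apply (p3).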

Currently, we are working with technical writers on documenting the entitlements service. You should be able to see these documents early next year when OpenSSO Express 9 ships.

Well, 2009 is coming to an end. It has been a fun year working on the entitlements service (among my other doings). Sleepless nights, long meetings, ranting and banging my head on the keyboard (just kidding) are part of the fun :-). I hope that 2010 will be even more exciting.

My OpenSSO’s activity

Markmail is kind of cool as it archives emails. I was looking at it today to see OpenSSO’s archive and found this.

Gee, I have sent over 9000 emails to OpenSSO’s email aliases over the past 4 years.
That’s 2381 emails per year, or about 9 emails per day (excluding weekends). So, 1 email per hour (in an 8-9 hour work day)! :-)
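
The arithmetic can be checked with a short sketch. It starts from the 2381 emails per year quoted above and assumes 52 weeks of 5 working days with no holidays, and a 9-hour work day (the upper end of the range mentioned).

```javascript
// Figure taken from the post: 2381 emails per year.
var emailsPerYear = 2381;
// Assumption: 52 weeks of 5 working days, with no holidays subtracted.
var workingDaysPerYear = 52 * 5;
var emailsPerDay = emailsPerYear / workingDaysPerYear;
// Assumption: a 9-hour work day, the upper end of the 8-9 hour range.
var emailsPerHour = emailsPerDay / 9;

console.log(emailsPerDay.toFixed(1));   // 9.2
console.log(emailsPerHour.toFixed(1));  // 1.0
```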

Entitlements Service in OpenSSO

Finally, we have moved the entitlements service (an OpenSSO extension) to the products directory, i.e. from opensso/extensions/entitlements to opensso/products/ (if you are familiar with the OpenSSO workspace).

Here is where you can find the source code for the entitlements service.

  • console resources: opensso/products/federation/openfm/web
  • console source: opensso/products/federation/openfm/source
  • entitlement APIs and backend implementation: opensso/products/amserver/com/sun/identity/entitlement
  • CLI: opensso/products/amserver/com/sun/identity/cli/entitlement

The Entitlements Service is released as part of OpenSSO Express 8.0.

Update OpenSSO’s Configuration store password

There was a question posted to our internal (Sun) alias, and I think I should share the answer for the benefit of the OpenSSO community. The question was “How to update OpenSSO configuration store password?”

There are two types of datastore in the OpenSSO server, namely the configuration datastore and the user datastore. As the names suggest, the former stores the configuration data that the OpenSSO server requires to operate properly. The latter stores user-related information, such as role, group and user entries.

The password can be updated through the Command Line Interface or the Administration Console.

The Command Line Interface way.

  1. Output the current server configuration XML
    ./ssoadm get-svrcfg-xml -u amadmin -f /tmp/fampass -s \
    http://owen1.red.iplanet.com:8080/opensso -o /tmp/serverconfig.xml

  2. Encrypt new password
    ./ampassword -e /tmp/newpassword

  3. Edit /tmp/serverconfig.xml. Replace the admin password with the new encrypted password.
  4. Upload the modified server configuration XML
    ./ssoadm set-svrcfg-xml -u amadmin -f /tmp/fampass -s \
    http://owen1.red.iplanet.com:8080/opensso -X /tmp/serverconfig.xml

The Administration Console Interface way.

  1. Log in as amadmin
  2. Select the Configuration tab
  3. Select the Sites and Servers tab
  4. Choose the server
  5. Select the Directory Configuration tab
  6. Set the password

OpenSSO Java Runtime >= 1.5

We have recently changed our Java build target to 1.5. Hence, you need Java Runtime version 1.5 or above to run the OpenSSO client. The Java runtime version requirement for the OpenSSO server remains unchanged, i.e. 1.5.

This new client runtime requirement will take effect in our next official release, i.e. OpenSSO Express 8, which is scheduled to be released in a couple of months from now.
