EC2 at XaoP

At XaoP, we have recently started experimenting with Amazon’s EC2.
Although virtualization is hardly new in the hosting world, Amazon’s
take on it offers developers extra flexibility to exploit.

The principle

The principle behind virtual hosting is simple: you get what looks
like a machine of your own to do with as you please, but in reality
that machine runs inside another machine that you share with other
people. This technique allows hosting companies to offer you fully
customizable servers at a very affordable price, because they need
less hardware to support the same number of servers.

Amazon EC2 takes this one step further and gives the idea an
interesting twist. Instead of getting one virtual host, you get an
image (an Amazon Machine Image, or AMI) that you can customize just
like a virtual host, and that you can instantiate as many times as you
like. In classic setups, whether on physical or virtual hosts, adding
a server is always something of a chore: you need to set it up from
scratch, taking care to configure it as closely as possible to your
other servers so that deployment problems are as unlikely as
possible. With EC2, you can add instances easily and be certain that
they are all identical.

Each instance gets most of its state, such as the installed software,
from the AMI. The partitions holding that state are limited in size,
though. For your data, you get a much larger partition; this is where
you would usually store your database files and the like. The AMIs are
stored on S3, and instances are launched from there. The instances
themselves are ephemeral, however: unless an instance is bundled as a
new AMI before shutdown, all of its state is lost. This applies not
only to the data that came from the booted AMI but also to everything
on the data partition.

Opportunities

The obvious advantage of EC2 is that you can scale up your server
cloud when, for instance, your web application gains traction. A less
obvious observation is that you can also scale it down. It is an
elastic cloud that can expand and shrink to fit your current needs
exactly, and it can do so quickly.

Many applications have highly varying needs for computing power. A
simple example is a document management system used only in a local
office: it is used heavily during office hours and only sporadically
outside them. You can easily shave 50% off your hosting costs by
running only a fifth of your EC2 instances outside business hours.
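The arithmetic behind that estimate can be checked with a few lines of Ruby. This is a sketch under assumptions of our own: office hours are taken to be 9 hours a day, 5 days a week, and hosting cost is assumed to be proportional to instance-hours.

```ruby
# Fraction of a full-time fleet's cost that remains when only a share of
# the instances runs outside business hours. Assumes (for this sketch)
# that cost is proportional to instance-hours.
def remaining_cost_fraction(business_hours_per_week, off_hours_share)
  total_hours = 7 * 24
  off_hours = total_hours - business_hours_per_week
  (business_hours_per_week + off_hours * off_hours_share) / total_hours.to_f
end

# Assumed office hours: 9 hours a day, 5 days a week; a fifth of the
# instances keeps running the rest of the time.
fraction = remaining_cost_fraction(5 * 9, 1.0 / 5)
savings = 1 - fraction
puts format("cost: %.0f%% of full-time, savings: %.0f%%", fraction * 100, savings * 100)
```

Under these assumptions the savings come out at roughly 59%. With 10-hour office days they drop to about 56%, so "easily 50%" holds up.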

A more pronounced example is our own DRP system. Most of the time, the
DRP application is consulted for shipment plans. Periodically, the
planner needs to run the heavy tasks that calculate these shipment
plans and all the data they require. This typically happens once a
week, on Friday or over the weekend. The rest of the time, the
firepower needed for these heavy tasks is wasted, so again it pays to
shut down the hosts that run them.

Document migrations are part of our core business. While some
migrations consist of just a dump and an import, most require some
transformation or enrichment of the attached metadata. Because this
process is still largely manual, at least when it comes to reviewing
the metadata after enrichment and after import, it is usually
performed in batches, and we want to throw computing power at it only
when needed. EC2 is ideal for this.
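For workloads with a predictable rhythm like these, the desired number of instances can simply be derived from the clock. The sketch below is illustrative only. The role names (`web`, `tasks`), the instance counts and the office hours are assumptions, not our actual configuration, and actually starting or stopping instances is left out.

```ruby
# Hypothetical hard-coded schedule: how many instances each role should
# have at a given moment. Counts and hours are illustrative assumptions.
FULL_WEB_INSTANCES = 10
HEAVY_TASK_INSTANCES = 4

def office_hours?(time)
  (1..5).cover?(time.wday) && (8...18).cover?(time.hour)
end

def heavy_task_window?(time)
  # Shipment plans are recalculated on Friday or over the weekend.
  [5, 6, 0].include?(time.wday)
end

def desired_instances(time)
  {
    web:   office_hours?(time) ? FULL_WEB_INSTANCES : FULL_WEB_INSTANCES / 5,
    tasks: heavy_task_window?(time) ? HEAVY_TASK_INSTANCES : 0
  }
end

p desired_instances(Time.new(2008, 6, 2, 10, 0, 0)) # a Monday morning
p desired_instances(Time.new(2008, 6, 7, 22, 0, 0)) # a Saturday night
```

A cron job could compare this target with the instances currently running and start or stop the difference.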

These examples are all fairly predictable in their periodicity, so
starting up and shutting down extra services can be hard-coded or
initiated manually. This is not always the case. For instance, a
document management system that stores monthly income statements will
usually see heavy traffic when the statements are published and
shortly afterwards, but because of weekends and holidays, the exact
publication day can vary somewhat. An intelligent system can adapt to
the extra traffic dynamically, starting extra hosts when needed and
shutting them down afterwards.

Starting extra hosts still takes some time, so the ultimate system
would try to predict the load and start extra hosts in
anticipation. It doesn’t even have to start enough of them to match
the expected load, just enough to avoid going down completely and to
react quickly enough to recover.
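A reactive policy with a bit of anticipation could look like the sketch below. We do not run this. The per-instance capacity, the `head_start` fraction and the idea of feeding in a predicted load are assumptions made for illustration, and starting or stopping the instances is not shown.

```ruby
# Hypothetical scaling policy. Loads are in requests per second; each
# instance is assumed to handle CAPACITY_PER_INSTANCE of them.
CAPACITY_PER_INSTANCE = 50.0
MIN_INSTANCES = 1

# Provision fully for the current load, but only for a fraction
# (head_start) of the predicted load: enough to stay up while more
# instances boot, not enough to cover the whole expected peak.
def target_instances(current_load, predicted_load, head_start: 0.5)
  needed_now = (current_load / CAPACITY_PER_INSTANCE).ceil
  anticipated = (predicted_load * head_start / CAPACITY_PER_INSTANCE).ceil
  [needed_now, anticipated, MIN_INSTANCES].max
end

# Positive result: start that many instances; negative: stop that many.
def scaling_delta(running, current_load, predicted_load)
  target_instances(current_load, predicted_load) - running
end

puts scaling_delta(2, 80, 80)   # steady load
puts scaling_delta(2, 80, 600)  # statements about to be published
puts scaling_delta(8, 40, 40)   # quiet again
```

Once the extra instances are up, the reactive part (`needed_now`) takes over if the real load exceeds the head start.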

Challenges

Working with EC2 poses its own set of challenges. The most obvious one
is the lack of persistent storage, although Amazon is working on
that. It means that if your server goes down due to unforeseen
circumstances such as hardware failure, you stand to lose all your
data unless you have taken extra precautions. The usual solution is to
back up the data regularly to S3 or to your own storage. Even with
hourly backups, you can still lose up to an hour’s worth of data,
which can be detrimental if that data is important.

A better solution seems to be redundancy. Amazon EC2 supports
availability zones, which let you force hosts to start in physically
separate and isolated locations, greatly reducing the chance of a
total failure. If the data is placed on two or more hosts in different
zones and one of them goes down, the others still have the data. This
does assume that at least one host is running at all times, or that
we save the data to S3 before taking all hosts down.
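Spreading the copies so that no single zone holds all of them amounts to a round-robin assignment. The sketch below assumes hypothetical database host labels and a list of zone names. EC2 lets you request an availability zone when starting an instance, but that call is not shown here.

```ruby
# Assign each replica to an availability zone in round-robin fashion so
# that the copies end up in as many different zones as possible.
# Zone names and host labels are illustrative assumptions.
def place_replicas(labels, zones)
  raise ArgumentError, "need at least one zone" if zones.empty?
  labels.each_with_index.map { |label, i| [label, zones[i % zones.size]] }.to_h
end

# True if losing any single zone still leaves at least one copy running.
def survives_zone_failure?(placement)
  placement.values.uniq.size > 1
end

placement = place_replicas(%w[db1 db2 db3], %w[us-east-1a us-east-1b])
placement.each { |label, zone| puts "#{label} -> #{zone}" }
puts survives_zone_failure?(placement)
```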

An important choice is how to split the full data set of a running
instance between the corresponding AMI and the data partition. More
precisely, should the AMI carry an installation of your application or
not? On the one hand, you can create an AMI for each of your
applications, or even for each role within an application, such as
application server, database server, or a server for running heavy
tasks. On the other hand, you can create one generic AMI and deploy
your specific application to it after instantiation.

Currently, we are leaning towards a single generic AMI because we
believe it is the more flexible option. It relies entirely on your
deployment, but whichever option you choose, you should have scripted
your deployment with a tool like Capistrano anyway. The deployment can
then be made as flexible as you could ever want, allowing each
instance you start to be configured individually. Another advantage
is that you can put multiple applications on a single host. This is
useful if you have a rarely used application that can “piggyback” on
another application’s host, or if you have applications whose usage
doesn’t overlap (much) in time, e.g., one for Europe, one for the US
and one for China. The downside of a generic AMI is, of course, that
starting a new instance takes longer because we need to do a full
deployment. If we need to be able to start instances quickly, we can
still complement this scheme by taking “AMI snapshots” of deployed
instances and booting those.

Implementation

With one generic AMI, everything boils down to deployment. The more
flexible we make it, the more flexibility we gain in maintaining our
applications. Our requirements were these:

  • Be able to put the database, the web application and the heavy tasks
    on separate EC2 instances, with the possibility of starting multiple
    instances for each of these parts of our application.
  • Start the EC2 instances and install the right version of our
    application automatically on deployment, so that we get “one-click”
    deployment. Basically, we want to run one Capistrano command, pass it
    a single config file, and have everything set up automatically.
  • Make it easy to change the setup of the deployment. This includes
    changing the number of mongrels, the number of processes for running
    the heavy tasks, and how many EC2 instances we dedicate to each
    part of our application.
  • Because EC2 assigns addresses to instances dynamically (unless we
    assign one of our Elastic IP addresses), we need a more
    user-friendly way to refer to the EC2 instances than by address.
  • Deploy multiple applications to the same set of EC2 instances.
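To give an idea of what such a single config file could contain, here is a hypothetical example, loaded the way a recipe might load it. The keys (`application`, `revision`, `instances`, `role`, `instance_type`, `mongrels`, `task_processes`) are made up for this sketch and do not describe our actual format.

```ruby
require 'yaml'

# Hypothetical deployment config: one entry per labelled EC2 instance.
# The structure is an illustration, not the format used by our recipes.
CONFIG_YAML = <<~YAML
  application: drp
  revision: "1234"
  instances:
    - label: db-1
      role: db
      instance_type: m1.large
    - label: web-1
      role: app
      instance_type: m1.small
      mongrels: 3
    - label: tasks-1
      role: tasks
      instance_type: c1.medium
      task_processes: 4
YAML

# symbolize_names turns the hash keys into symbols (config[:instances]).
config = YAML.safe_load(CONFIG_YAML, symbolize_names: true)

# Group the labels per role, e.g. to see how many instances each part gets.
by_role = config[:instances].group_by { |i| i[:role] }
by_role.each do |role, instances|
  puts "#{role}: #{instances.map { |i| i[:label] }.join(', ')}"
end
```

Changing the setup then means editing this file, for example adding a second `app` entry or raising `mongrels`, and rerunning the same command.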

We use a slightly modified version of the AMIs provided by the EC2 on
Rails project. For future reference, the data partition is mounted on
/mnt. For deployment we use Capistrano, which is a great tool for the
job, although we need a few tricks to get some of the behaviour we
want.

To refer to EC2 instances, we use labels. Each EC2 instance we deploy
gets a unique label, which is stored in the file /mnt/LABEL on that
instance and lets us identify it. We don’t want to keep fetching those
files from all our running EC2 instances, so we cache the labels in a
file, keyed by the ID Amazon assigned to the instance together with
its address. When we read the cache, we cross-reference it against the
output of the “ec2-describe-instances” tool. If that output shows that
an instance has disappeared or changed address, we invalidate the
entry for that instance. We fetch labels only for new instances and
add them to the cache. The only case that can trip us up is a new
instance of ours that has both the same identifier and the same
address as a cached one, which is highly unlikely.
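The cache logic can be sketched in plain Ruby. This is an illustration under assumptions. The running instances are passed in as a hash from instance ID to address; in practice this would be parsed from the ec2-describe-instances output, which is not shown. `fetch_label` is a hypothetical stand-in for whatever reads /mnt/LABEL over SSH. Saving and loading the cache file is also left out.

```ruby
# Label cache keyed by [instance_id, address]. `running` maps instance IDs
# to their current addresses; `fetch_label` is a hypothetical callable
# that reads /mnt/LABEL from the given address.
def refresh_label_cache(cache, running, fetch_label)
  # Keep only entries whose instance still runs at the same address;
  # everything else is invalidated.
  fresh = cache.select { |(id, address), _label| running[id] == address }

  # Fetch labels only for instances we have not seen yet.
  running.each do |id, address|
    key = [id, address]
    fresh[key] = fetch_label.call(address) unless fresh.key?(key)
  end
  fresh
end

cache = { ["i-1a2b3c4d", "10.0.0.1"] => "web-1", ["i-5e6f7a8b", "10.0.0.2"] => "db-1" }
running = { "i-1a2b3c4d" => "10.0.0.1", "i-9c0d1e2f" => "10.0.0.3" }
fetched = []
fetch_label = lambda { |address| fetched << address; "tasks-1" }

cache = refresh_label_cache(cache, running, fetch_label)
p cache
p fetched
```

In this example, the entry for `i-5e6f7a8b` is dropped because that instance is no longer running. Only the new instance at 10.0.0.3 is contacted.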

Next, we start instances for the labels that have no instance assigned
according to the procedure above. Because we want one-click
deployment, we would like to deploy to an instance right after
starting it. This doesn’t work, though: the instance doesn’t get an
address right away, and it still needs to boot before its SSH server
is started. The output of “ec2-run-instances” gives us the ID of the
instance we started, and we then poll by periodically running
“ec2-describe-instances” on that ID until an address shows up. We then
use that address to contact the instance and put the label file on
it. The connection may fail because the SSH server hasn’t started yet,
so we rescue the error and retry, like this:

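```ruby
# Upload the label, retrying until the freshly booted instance accepts
# SSH connections. The limit of 30 attempts, 10 seconds apart, is arbitrary.
attempts = 0
begin
  put label, "/mnt/LABEL"
rescue StandardError => e # not Exception, so Ctrl-C still works
  attempts += 1
  raise if attempts >= 30
  warn "Instance not reachable yet (#{e.class}: #{e.message}), retrying"
  sleep 10
  retry
end
```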

Capistrano’s put method uploads a string as a file to the target
servers. We only need this retry loop here: once the label file has
been written, we know the EC2 instance is reachable.
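The polling step that waits for an address can be written as a small generic helper. This is a sketch. `wait_for` is a hypothetical name, and the interval and timeout are arbitrary. In practice, the block would run ec2-describe-instances for the instance ID and extract the address, which is not shown. Here the block is stubbed.

```ruby
# Call the block every `interval` seconds until it returns a non-nil
# value, and return that value. Raise if nothing shows up within
# `timeout` seconds.
def wait_for(timeout: 600, interval: 5)
  deadline = Time.now + timeout
  loop do
    result = yield
    return result if result
    raise "timed out after #{timeout}s" if Time.now >= deadline
    sleep interval
  end
end

# Example with a stubbed lookup that finds an address on the third try.
attempts = 0
address = wait_for(interval: 0) do
  attempts += 1
  attempts >= 3 ? "192.0.2.10" : nil
end
puts "#{address} after #{attempts} attempts"
```

Once this returns an address, the retrying upload of the label file takes care of the remaining wait for the SSH server.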

This immediately raises the question of how to connect to only the
newly started server. A Capistrano task is typically executed on
several predetermined servers at once, with commands such as run and
put being run on each of the target servers. This makes it impossible
to include if statements that are evaluated on a per-server basis,
which is exactly what we need: for each server, we need to check
whether it is running and, if not, start it. To make things worse, we
can’t specify the servers up front because we don’t know their
addresses in advance.

To solve this, we generate the Capistrano tasks dynamically once the
server addresses are known. It would help if Capistrano’s roles could
be dynamically scoped, but as far as we know, they can’t. For each
conceptual task, we generate an actual task for each of the
addresses. This allows us to take the actual addresses from the
config file or get them from EC2. The template we use looks something
like this:

<code class="ruby">
  def configurated_task(name, &blk)
    task name do
      configure
      instance_eval &blk
    end
  end
  def configure
    config = YAML.load(File.read(conf))
    task_for_hosts :do_setup, config[:hosts] do
      setup_host unless host_is_setup
    end
  end
  task_for_hosts name, hosts, &blk
    hosts.each do |h|
      task "#{name}_for_#{h}", :hosts => [h] do
      end
    end
    task name do
      hosts.each do |h|
        send("#{name}_for_#{h}")
      end
    end
  end

  configurated_task :setup do
    do_setup
  end

This defines a setup task, which is invoked like this:

  cap setup -s conf=conf_file

The configured_task method generates a task with the given name,
which first calls configure and then runs the block. configure
extracts the hosts from the config file and generates tasks for
them. The actual generation is done by task_for_hosts: for each host,
it generates a task that runs the block against that host only and
encodes the host in its name. It then generates one additional task
that simply calls all the host-specific tasks it just generated.

This setup lets us give each host individual treatment. We can check
per host whether it has already been set up, and if not, set it
up. We can check per host whether a certain revision of our
application has already been deployed, and if not, deploy it. We can
start a different number of mongrels or tasks per host, depending on
the instance type we’ve chosen.
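These per-host decisions come down to a small amount of logic. The sketch below assumes that we already know each host’s instance type and its currently deployed revision; collecting these from the hosts is not shown. The mongrel counts per instance type and the helper name `plan_for` are made up for illustration.

```ruby
# Hypothetical number of mongrels to run per EC2 instance type.
MONGRELS_PER_TYPE = { "m1.small" => 2, "m1.large" => 6, "m1.xlarge" => 12 }.freeze

# Decide what to do on each host: deploy only where the target revision
# is missing, and size the mongrel cluster to the instance type
# (falling back to a single mongrel for unknown types).
def plan_for(hosts, target_revision)
  hosts.map do |h|
    {
      host: h[:host],
      deploy: h[:deployed_revision] != target_revision,
      mongrels: MONGRELS_PER_TYPE.fetch(h[:instance_type], 1)
    }
  end
end

hosts = [
  { host: "web-1", instance_type: "m1.small", deployed_revision: "1234" },
  { host: "web-2", instance_type: "m1.large", deployed_revision: nil }
]
plan_for(hosts, "1234").each { |step| p step }
```

Each per-host task would then act on its own entry only, which is exactly what the generated per-host tasks make possible.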

The two techniques outlined above give us all the tools we need to
satisfy our requirements. There may be better ways to do this,
especially the per-host task hack, but it works well.

Conclusion

For our business, it certainly looks like EC2 could be a valuable
asset. It not only gives us the computing power we need at will, but
also lets us reduce that computing power when possible to save
costs. The result is a cost-effective solution with a flexibility that
classic hosting solutions cannot match. Deploying to EC2 does have its
own set of challenges, but some of these will be eased by the
persistent storage that is due to be introduced to EC2 later this
year.

To make deployment even easier, we will be building a web interface to
control it. This web interface can simply call the Capistrano recipes
and let them do all the hard work.
