Zero downtime deployments with Chef, Nginx and Unicorn

Rails, Nginx and Unicorn are a common trifecta in the world of web applications. In this blog post, we’ll show you how to use Chef to create a zero downtime deployment for an application built on this stack.

This article assumes you already have a Chef node with all the necessary packages installed to run your app. If you’re not that far along yet, check out how to bootstrap a node.

Deploying code with Chef’s deploy resource

Chef has a handy deploy resource which will grab code from a repository and deploy it to a Chef node. It has a lot of other functionality, such as the ability to run migrations and create symlinks. Check out the deploy resource documentation for more information.

The fine folks at Chef recommend using deploy_revision, which works like deploy, but also ensures that the name of a release subdirectory is based on git’s SHA checksum revision identifier.

Here is an example of a deploy_revision recipe:

deploy_revision "#{node['app']}" do
  deploy_to "/data/#{node['app']}"
  user 'deploy_user'
  group 'deploy_user'
  repo 'git@github.com:user/app.git'
  migrate false
  ssh_wrapper node['ssh-key-wrapper-path']
  branch node['branch']

  symlinks({
    'config/secrets.yml' => 'config/secrets.yml',
    'log' => 'log',
    'system' => 'public/system'
  })

  before_symlink do
    execute 'bundle install' do
      command "/path/to/ruby-#{node['ruby']['version']}/bin/bundle install"
      user 'deploy_user'
      cwd "#{release_path}"
    end

    node.default['release_path'] = release_path
    run_context.include_recipe 'deploy_assets'
  end

  notifies :run, "execute[restart #{node['app']}]", :delayed
end

execute "restart #{node['app']}" do
  command "/usr/local/bin/#{node['app']}_app deploy"
  user 'deploy_user'
  action :nothing
end
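
The recipe above leans on a handful of node attributes. For reference, an attributes file along these lines would cover them; the values below are placeholders rather than the ones we actually use:

# attributes/default.rb (placeholder values for the attributes used above)
default['app']                  = 'myapp'
default['branch']               = 'master'
default['ssh-key-wrapper-path'] = '/home/deploy_user/deploy_ssh_wrapper'
default['ruby']['version']      = '2.1.2'
# node['release_path'] is set at deploy time by the before_symlink block above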

Points of interest:

  • deploy_to: This specifies the directory to deploy code to. Three subfolders will be created here: current, shared and releases.
  • migrate false: We don’t want to run migrations during a zero downtime deploy. If they’re destructive, this can result in downtime.
  • ssh_wrapper: Our code is hosted in a private repository. In order to allow our Chef node to clone it, we need to:
    • Generate a public/private key pair
    • Copy the private key to the node
    • Upload the public key to GitHub (or wherever the repository is hosted)
    • Use an ssh wrapper script to specify the path to the private key (a Chef sketch for deploying this wrapper follows this list):
      /usr/bin/env ssh -i /home/deploy_user/deploy_user_key "$@"
    • Copy the GitHub host key to the node’s known_hosts file, or better yet use the ssh_known_hosts cookbook to do it
  • symlinks: This helper creates symlinks. The keys of the hash are the target files located in the shared directory; the values are the links that will be created in the release directory.
  • The restart script, which restarts Unicorn gracefully (see next section)
  • The deploy_assets recipe – which we’ll address later. Note that in order to use the release_path variable (set by Chef) we defined it as an attribute:
    node.default['release_path'] = release_path
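
As promised above, here’s a rough sketch of how that ssh wrapper script could be laid down with Chef. The wrapper lives wherever node['ssh-key-wrapper-path'] points, and the key path matches the one shown in the list (copying the private key itself onto the node is not shown):

# Create the ssh wrapper referenced by the deploy resource's ssh_wrapper option.
# The "$@" forwards the arguments git passes to the wrapper on to ssh.
file node['ssh-key-wrapper-path'] do
  owner 'deploy_user'
  group 'deploy_user'
  mode '0755'
  content "#!/bin/sh\nexec /usr/bin/env ssh -i /home/deploy_user/deploy_user_key \"$@\"\n"
end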

Restarting Unicorn

A zero downtime deploy requires a graceful restart of the Unicorn workers. This is done by sending a USR2 signal to the Unicorn master in the app restart script:

kill -USR2 $unicorn_master_pid

When we send a USR2 signal to the Unicorn master, it re-executes the running Unicorn binary. It suffixes its pidfile with .oldbin and starts up a new master pointed at the new release directory. Each time the new master forks a worker, the before_fork hook checks whether an .oldbin pidfile exists. If it does, it winds down the old master’s workers one at a time and, once the last new worker is up, shuts the old master down with a QUIT signal. Here’s the code required to make this happen (in config/unicorn.rb):

before_fork do |server, worker|
  # Close connections on the master process if ActiveRecord is loaded
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end

  old_pid = "#{server.config[:pid]}.oldbin"

  if File.exists?(old_pid) && server.pid != old_pid
    begin
      sig = (worker.nr + 1) >= server.worker_processes ? :QUIT : :TTOU
      Process.kill(sig, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # It’s already gone
    end
  end
end
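
We won’t reproduce the full restart script invoked by the execute resource earlier, but the zero downtime part of it boils down to the kill shown above. Here’s a minimal Ruby sketch, assuming the Unicorn master writes its pid to a pidfile under the shared directory (and ignoring the deploy argument and error handling):

#!/usr/bin/env ruby
# Minimal graceful-restart sketch; the pidfile location is an assumption.
pidfile = '/data/app/shared/pids/unicorn.pid'
master_pid = File.read(pidfile).to_i

# Ask the running master to re-exec itself against the new release.
# The before_fork hook above then winds down and reaps the old master.
Process.kill(:USR2, master_pid)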

Asset management

At this point we ran into a tricky problem with the assets. During the zero downtime deployment, they would disappear for a split second, causing the website to render without any CSS or JavaScript. What was happening? After digging through some logs, we realized the old Unicorn workers were looking for the new assets in the old release directory just before being reaped.

To achieve a seamless zero downtime deploy, the Unicorn workers must have access to both the new and the old assets at all times. To accomplish this, we moved the compiled assets into /data/app/shared/assets. The previous release’s assets are kept in /data/app/shared/last_assets/assets (Rails automatically prefixes asset URLs with /assets, which is the reason for the subfolder). The release directories then contain symlinks to both these folders. This is similar to the strategy Capistrano uses.

(Diagram: the asset strategy, with current and previous assets under shared/ and symlinks from each release)
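
In other words, after a deploy the relevant paths end up looking roughly like this:

/data/app/shared/assets                       # freshly compiled assets
/data/app/shared/last_assets/assets           # previous release's assets
/data/app/releases/<sha>/public/assets      -> /data/app/shared/assets
/data/app/releases/<sha>/public/last_assets -> /data/app/shared/last_assets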

Here is an example recipe:

# Precompile the assets
execute 'assets:precompile' do
  command "/home/#{node[‘app’]}/ruby-#{node['ruby']['version']}/bin/rake assets:precompile"
  user 'deploy_user'
  cwd "#{node['release_path']}"
end

# Create required directories if they don't already exist
directory "/data/#{node['app']}/shared/assets" do
  owner "deploy_user"
  group "deploy_user"
end

directory "/data/#{node['app']}/shared/last_assets" do
  owner "deploy_user"
  group "deploy_user"
end

directory "/data/#{node['app']}/shared/last_assets/assets" do
  owner "deploy_user"
  group "deploy_user"
end

# Empty out the last_assets folder
# shopt -s dotglob ensures hidden files like the Rails sprockets manifest are also matched by the glob
execute "clear out last_assets" do
  command "/bin/bash -c 'shopt -s dotglob && rm -rf /data/#{node['app']}/shared/last_assets/assets/*'"
  user "deploy_user"
end

# Move shared/assets to shared/last_assets/assets.
# Note the not_if: we don’t want to do this if shared/assets is empty
# eg. on the first deploy
execute "move shared/assets to shared/last_assets/assets" do
  command "/bin/bash -c 'shopt -s dotglob && mv /data/#{node['app']}/shared/assets/* /data/#{node['app']}/shared/last_assets/assets'"
  user "deploy_user"
  not_if {`ls -A /data/#{node['app']}/shared/assets`.to_s.chomp == ""}
end

# Move all the freshly precompiled assets into shared/assets
execute "move current/assets to shared/assets" do
  command "/bin/bash -c 'shopt -s dotglob && mv #{node['release_path']}/public/assets/* /data/# {node['app']}/shared/assets' && rmdir #{node['release_path']}/public/assets"
  user "deploy_user"
end

# Create a symlink from current/assets -> shared/assets
link "#{node['release_path']}/public/assets" do
  to "/data/#{node['app']}/shared/assets"
  user "deploy_user"
end

# Create a symlink from current/last_assets -> shared/last_assets
link "#{node['release_path']}/public/last_assets" do
  to "/data/#{node['app']}/shared/last_assets"
  user "deploy_user"
end

The file move and symlinking operations done above should be nearly instantaneous, which keeps the zero in our ZDT.

Finally, nginx needs to be configured to serve assets from both /assets and /last_assets/assets. Since Rails automatically prefixes asset paths with /assets, /assets is already part of the $uri. Add this to your /etc/nginx.conf (or the Chef .erb template that deploys it):

root /data/app/current/public;

location ~ ^/(images|assets|javascripts|stylesheets)/ {
  try_files $uri $uri/index.html /last_assets/$uri /last_assets/$uri.html @app;
  expires 10y;
}

The try_files directive checks for the existence of files in the specified order and serves the first one it finds. For example, a request for /assets/application.css is first looked up under public/assets, then under public/last_assets/assets, before finally falling back to the @app location.

That solved the problem. No more disappearing assets during deployment!

Conclusion

With that completed, we had our zero downtime deployment working. To deploy, we simply run the following command in a Jenkins job:

knife ssh 'chef_environment:app AND name:app' \
--attribute ec2.public_hostname \
--ssh-user user "sudo chef-client -o 'recipe[deploy_zdt]'"

Happy chef’ing!