Understanding how Ruby stores objects in memory – the Ruby Heap

October 29th, 2009

Ruby has its own heap management, which actually consists of several ‘Ruby Heaps’ to manage objects created during the execution of a Ruby program; this is separate from the System Heap for your Operating System. Each individual Ruby Heap contains Slots, with each Slot able to reference one object.

The entire space that an object takes up in memory ***is not stored inside the Slot***. Rather, each Slot is a small fixed-size space which can be thought of as the Ruby interpreter’s handle to a location in memory. This location exists outside of the Ruby Heap itself and contains the real ‘meat’ of the object. To be clear, if you have a 50MB string – the 50MB of data is stored outside of Ruby’s Heap. If you really want to know the story of the 50MB, the space for it is actually allocated by something like the malloc function in C (as good ol’ Ruby is written in C) and so lives on the System Heap. The Slot in the Ruby Heap simply contains a reference to that memory location on the System Heap which contains the 50MB of data.

Here’s an example. Let’s say that a Ruby program creates a single string of 50MB:
* A single free Slot in a Ruby Heap becomes filled
* Memory to store the 50MB of data that makes up the string itself is allocated in memory and put on the System Heap (outside the Ruby Heap!) and a reference to this location is stored in the Filled Slot on the Ruby Heap
* There comes a point in time when this string is no longer needed. This slot is garbage collected on the next GC iteration
* The Filled Slot is turned into a free slot. The 50MB of data in memory referred to by the slot is also freed and returned to the Operating System
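
The split between the Slot and the data it refers to can be glimpsed on a present-day MRI Ruby. This is only a rough sketch: it assumes MRI with the `objspace` standard library (which did not exist in the Ruby of 2009), `slot_vs_payload` is a name made up for illustration, and a 5MB string stands in for the 50MB one to keep things quick.

```ruby
require 'objspace'

# Hypothetical helper: compares the memory MRI reports for a tiny string
# with that of a large one. Assumes MRI; other interpreters may report 0.
def slot_vs_payload(bytes)
  small = 'hi'          # a tiny string, for comparison
  big   = 'x' * bytes   # the payload is allocated outside the slot
  { small: ObjectSpace.memsize_of(small), big: ObjectSpace.memsize_of(big) }
end

sizes = slot_vs_payload(5 * 1024 * 1024)
puts "small string: #{sizes[:small]} bytes, big string: #{sizes[:big]} bytes"
```

Both strings occupy a single Slot each, yet the reported sizes differ by millions of bytes, because the large payload lives outside the Ruby Heap.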

Ruby starts off with a minimal set of Ruby Heaps. These are managed by a Ruby Heap list. Ruby creates Ruby Heaps when needed and frees Ruby Heaps back to the OS when no longer needed (the latter is done in a sub-optimal manner – more on this later). Each Ruby Heap created will be 1.8 times the size of the previous heap. In other words, it will contain 1.8 times the number of slots of the previous heap. Ruby’s Garbage Collector periodically iterates through the Ruby Heaps and frees up any Slots as appropriate, also releasing the memory that an object really occupies and which is referenced by the Slot (ie. the 50MB of data of the String) back to the system. Once a GC iteration is complete, some of the Slots that were filled will now be empty – known as Free Slots. Remember that we said that Ruby’s Heap management actually consists of many Ruby Heaps. Well, if one of these Ruby Heaps consists of only Free Slots then the Ruby Heap itself will be freed back to the Operating System.
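
To get a feel for the 1.8 times growth rule described above, here is a small sketch. The starting size of 10,000 slots is an assumption chosen purely for illustration, and `heap_slot_sizes` is a made-up helper name.

```ruby
# Sketch of the growth rule described above: each new Ruby Heap holds
# 1.8 times the slots of the previous one. The starting size of 10,000
# slots is an assumption for illustration only.
def heap_slot_sizes(first_heap_slots, heap_count)
  sizes = [first_heap_slots]
  (heap_count - 1).times { sizes << sizes.last * 18 / 10 } # integer maths, x1.8
  sizes
end

p heap_slot_sizes(10_000, 4) # => [10000, 18000, 32400, 58320]
```

Under these assumptions the fourth heap alone holds more slots than the first three put together, which is why a late spike in object creation can leave a lot of slots lying around.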

There is a problem with this last statement however – if a Ruby Heap contains mostly Free Slots and one Filled Slot then it will not be freed. You could have many Ruby Heaps in this state. As long as a Ruby Heap contains even one Filled Slot it will not be returned to the Operating System. It just takes one bad apple to spoil everything! What would be nice is if some sort of Heap Compaction (kind of like disk defragmentation) took place where all Filled Slots were pushed together into completely filled Ruby Heaps. This would leave you with completely filled Ruby Heaps, one semi-filled Ruby Heap and then a bunch of completely empty Ruby Heaps. The completely empty Ruby Heaps could then be freed, releasing precious memory back to the Operating System. Alas, the current mainstream Ruby interpreter does not do this.
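
The ‘one bad apple’ problem, and what compaction would buy you, can be shown with a toy model. This is not how the interpreter is implemented: heaps are modelled as plain arrays of `:filled` and `:free` symbols, and `releasable_heaps` and `compact_heaps` are hypothetical helpers.

```ruby
# Toy model only: each Ruby Heap is an array of slots, :filled or :free.

# A heap can go back to the OS only when every slot in it is free.
def releasable_heaps(heaps)
  heaps.count { |heap| heap.all? { |slot| slot == :free } }
end

# Hypothetical compaction: pack all filled slots into the earliest heaps.
def compact_heaps(heaps)
  filled = heaps.flatten.count(:filled)
  heaps.map do |heap|
    take = [filled, heap.size].min
    filled -= take
    Array.new(take, :filled) + Array.new(heap.size - take, :free)
  end
end

heaps = [
  [:filled, :free, :free, :free],
  [:free, :filled, :free, :free],
  [:free, :free, :free, :filled]
]
puts releasable_heaps(heaps)                # 0: one bad apple in every heap
puts releasable_heaps(compact_heaps(heaps)) # 2: three filled slots fit in one heap
```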

References
* How the Ruby Heap is Implemented: Phusion Passenger’s Hongli Lai gives a great explanation of the Ruby Heap – the banner may not be quite suitable for work. Fortunately, there’s a censor button :-)

* Fine tuning your garbage collector: Chris Heald explains some of the settings around garbage collection

* Ruby’s Garbage Collections effect on Ruby on Rails: Pluron Inc’s blog discusses some of the knock-on effects of Ruby GC on Rails and importantly mentions the 8 MB memory allocation trigger for the garbage collector

Bleak House – A Tool for measuring Objects in Memory for a Ruby Program

October 28th, 2009

Bleak House is a tool that tells us
- How many Slots there are in total at a point in time in a Ruby program
- How many Slots in total are filled
- How many Slots in total are empty (free)
- How many Filled Slots can be attributed to a particular line of code

Bleak House can be used to tell you if a program is holding on to objects that it should be relinquishing. But it doesn’t tell you how much data is stored in memory for the ‘meat’ of the object (ie. that 50MB of data in a 50MB String). Just because you know that a Filled Slot exists, you don’t know if the data in memory that correlates back to that Slot is 1MB, 10MB or 100MB.

However, suppose you repeat a specific set of operations a small number of times while measuring with Bleak House, then restart the server with Bleak House and repeat the same operations a large number of times. If you see a big difference in the number of filled slots between the two runs, you can tell that your program is holding onto objects (references) that it should not. Of course, if your program is supposed to keep hold of an increasing number of references (such as a global variable or a singleton that keeps accumulating references for the duration of your program) then this would be expected, though you might want to double check your design. You will be able to see the cause of the problem from the detailed breakdown of which lines of code were the biggest offenders in terms of creating objects. If you see a large number of free slots (relative to the number of filled slots) then this means that at some point in your program a lot of objects existed (possibly due to a spike in application usage) but the number then dropped.
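
The ‘few repetitions versus many repetitions’ comparison can be approximated without Bleak House on a present-day MRI. This sketch assumes MRI’s `ObjectSpace.count_objects`; `leaky_operation` is a deliberately leaky, made-up stand-in for whatever your application does, and unlike Bleak House this tells you nothing about which line of code is responsible.

```ruby
# Filled slots right now, after forcing a full GC so that garbage does not count.
# Assumes MRI, where ObjectSpace.count_objects reports :TOTAL and :FREE slots.
def filled_slots
  GC.start
  counts = ObjectSpace.count_objects
  counts[:TOTAL] - counts[:FREE]
end

# Hypothetical leaky operation: every call parks another string in a global list.
LEAKED = []
def leaky_operation
  LEAKED << "request #{LEAKED.size}"
end

# Filled-slot growth caused by running the operation a number of times.
def slot_growth(repetitions)
  before = filled_slots
  repetitions.times { leaky_operation }
  filled_slots - before
end

puts "growth after 10 runs:     #{slot_growth(10)}"
puts "growth after 10,000 runs: #{slot_growth(10_000)}"
```

If the growth scales with the number of repetitions, as it does here, something is holding on to references.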

Does the free slots count matter? Well, yes, because there is a memory overhead for each free slot that exists – how much depends on your particular system. If your system has a slot size of 20 bytes then every one million free slots costs you an additional 20MB that is not being utilised. This becomes a problem if your application is subject to large but infrequent spikes in the number of objects that exist within your program at a particular moment in time, because the free slots are taking up significant amounts of memory even when your application is twiddling its thumbs between the spikes.
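
The arithmetic is simple enough to put in a helper. `free_slot_overhead_mb` is a made-up name, and the slot size is a parameter because it varies between systems; 20 bytes is just the figure used above.

```ruby
# Memory held by free slots, in (decimal) megabytes.
# slot_size_bytes varies by system; 20 is the figure used in the text.
def free_slot_overhead_mb(free_slots, slot_size_bytes = 20)
  free_slots * slot_size_bytes / 1_000_000.0
end

puts free_slot_overhead_mb(1_000_000)   # => 20.0
puts free_slot_overhead_mb(250_000, 40) # => 10.0
```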

Getting Drupal 7 (development snapshot) running on Ubuntu

September 28th, 2009

Notes before starting

  • At time of writing, only experimental snapshots of Drupal were available with the actual Drupal 7 release being some way off. The Drupal 5.x or Drupal 6.x series is recommended if you want to use Drupal in production
  • This guide was written against Ubuntu 8.04
  • This guide assumes
    • You have apache, mysql and php happily installed on your machine
    • That you are serving websites out of /var/www/
    • That sites under /var/www/ are accessible in your browser via http://localhost/

Instructions

  • Download Drupal 7 from the development release section at http://drupal.org/project/drupal
  • Extract the folder and rename the extracted folder to drupal7
  • Move the folder to an appropriate place on your webserver, such as under /var/www/
  • Create a fresh settings file for your new Drupal 7 site. You should use the supplied settings file as a basis, eg.
    • cp sites/default/default.settings.php sites/default/settings.php
  • Make the settings file writeable, so that the installer can edit it, using
    • chmod a+w sites/default/settings.php
  • Create a database for your Drupal 7 site
    • Log into mysql, eg.
      • mysql -uroot -p
    • Create a database once logged into mysql from the ‘mysql>’ prompt, eg.
      • create database drupal7;
    • Create a user for the database and grant permissions from the ‘mysql>’ prompt, eg.
      • GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, LOCK TABLES, CREATE TEMPORARY TABLES ON drupal7.* TO 'drupal7user'@'localhost' IDENTIFIED BY 'drupal7password';
  • Run the install script as follows
    • Browse to your new Drupal site at http://localhost/drupal7/install.php
    • Elect to proceed with a normal Drupal install (ie. not minimal) in your chosen language
    • As you proceed through the wizard, the installer may encounter issues creating certain directories for Drupal on the file system. If so, do the following
      • cd to the root of your drupal7 directory
      • mkdir sites/default/files
      • mkdir -p sites/default/private/files
      • mkdir sites/default/private/temp
      • Change the ownership of the created directories so that they belong to the user account under which the web server runs (www-data)
        • chown -R www-data:www-data sites/default/private
        • chown -R www-data:www-data sites/default/files
    • The wizard now asks you to configure the database settings. In the form presented by the wizard, replace the user ‘admin’ with the ‘drupal7user’ you created earlier and point at the drupal7 database. Remember to put in the correct password for ‘drupal7user’
    • If you get a 500 error in the last step, then ignore it and elect to continue through to the error page. This will actually let you continue the install
    • If you get an error starting with something like “Fatal error: Allowed memory size of 16777216 bytes exhausted” then allocate Drupal more memory in the settings file (see http://drupal.org/node/90605 for more information), eg. add the line
      • ini_set('memory_limit', '30M');
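
If you find yourself setting up Drupal snapshots repeatedly, the file system steps above (the cp, chmod and mkdir commands) could be scripted. The following is only a sketch in Ruby using the standard `fileutils` library; `prepare_drupal_site` is a made-up name, the path in the usage comment is an assumption based on the layout used in this guide, and the chown step is deliberately left out as it needs root privileges.

```ruby
require 'fileutils'

# Mirrors the cp, chmod and mkdir steps above. drupal_root is wherever you
# extracted Drupal, for example /var/www/drupal7 (adjust to taste).
# The chown to www-data is left out because it needs root privileges.
def prepare_drupal_site(drupal_root)
  default_dir = File.join(drupal_root, 'sites', 'default')
  settings    = File.join(default_dir, 'settings.php')

  FileUtils.cp(File.join(default_dir, 'default.settings.php'), settings)
  FileUtils.chmod('a+w', settings) # the installer must be able to write to it

  %w[files private/files private/temp].each do |dir|
    FileUtils.mkdir_p(File.join(default_dir, dir))
  end
  settings
end

# prepare_drupal_site('/var/www/drupal7')
```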

Hopefully now you should be good to go!

Audacity Tip of the Day – How Not to Lose Data!

September 19th, 2009

One issue when editing audio is copying and pasting a section of a track from one open Audacity project to another. To save space, Audacity does not copy the underlying track completely to the new project; rather, it links to it. This means that your second project (the one you are pasting into) is not completely self-contained as it depends on external files. This can be a quickfire way for the unsuspecting podcaster to lose a whole bunch of data (yes, me!). This problem is particularly nasty as you don’t realise something has gone wrong until you close and reopen the project – finding that a long stretch of audio containing your beautiful voice is missing. To avoid this, adhere to the following workflow when copying and pasting from one project to another.

* In the first project, select and copy the audio you wish to duplicate
* Paste the audio into the second project
* In the File menu of the second project, click ‘Check Dependencies’
* Click ‘Copy All Audio into Project (Safer)’
* Just to stress the previous point, you really do want to use the Safer of the two copying options. I’ve found that using the other option results in some of the audio I wanted to copy being truncated

* Save the project

By following this workflow you should hopefully avoid seeing what I call the ‘dreaded blue flat-line of death’ where, on reopening of a project, you find that the middle of a track has been lost. If you adhere to these instructions you should also be able to select ‘Delete orphaned files’ when it appears from time to time while reopening an Audacity project. However, due to the frustrating and sensitive nature of audio loss issues, I accept no responsibility for anything that goes wrong! Best of luck and happy editing!

Open Letter to the Irish Government on Open Source Driven Innovation

September 18th, 2009

“Recent years show that openness and collaboration is essential to the generation of innovation in the software sector. Technology increasingly means software. In Ireland, we can see that the production of hardware technology in many, but not all, cases is providing ever diminishing returns. Here we outline some key policy recommendations that are crucial to the fulfilling the vision of making the Irish Smart Economy a reality for the software industry through the adoption and encouragement of Open Source technologies.”

The above extract is from a paper we are submitting to the Innovation Taskforce as requested by the Department of the Taoiseach. The draft paper is available at Positioning Ireland as an International Innovation Hub

Please note, we are submitting the paper ahead of the deadline which is Friday the 18th of September. We appreciate any feedback, support or criticisms you may have. Please post them as comments below.

Simple straight up caching for pages served by Heroku

September 16th, 2009

So you’ve got an app that’s ticking along nicely; being served up like a good steak in a 5 star restaurant – but you’d like to boost its performance with some caching. For those who develop their apps on the Heroku platform, a great way to do this is to cache a dynamic page using Varnish. This means that your page is served up super fast without hitting Rails/Sinatra/whatever. And best of all it requires no extra gems or anything, just a well placed one-liner in your controller.

Firstly, you can only use this technique if all users that visit this page expect to see the exact same content – in other words you have no ‘per user’ customised content on a page. To help understand how this type of caching works, imagine that the first time your page (let’s say an Events index page) is hit it is turned into a static html page for a pre-defined amount of time (let’s say 60 seconds). Anyone else who visits this page (ie. anyone else who visits this particular controller action) during the next 60 seconds gets that static html page. After the 60 seconds the static html page is removed from the cache. Thus the next hit will cause your underlying dynamic page to be invoked; then the caching process kicks off again, lasting another 60 seconds. And so on and so forth.
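
The 60 second cycle described above can be modelled in a few lines. This is a toy illustration of the behaviour only; Varnish itself is not involved, `TimedPageCache` is a made-up class, and the current time is passed in as a plain number of seconds so that the example is easy to follow.

```ruby
# Toy model of the behaviour described above; Varnish itself is not involved.
class TimedPageCache
  def initialize(max_age_seconds)
    @max_age = max_age_seconds
    @html = nil
    @stored_at = nil
  end

  # Returns the cached page while it is fresh; otherwise runs the block
  # (standing in for the dynamic action) and caches its result.
  def fetch(now)
    if @stored_at.nil? || now - @stored_at >= @max_age
      @html = yield
      @stored_at = now
    end
    @html
  end
end

renders = 0
cache = TimedPageCache.new(60)
[0, 10, 59, 60, 61].each do |second|
  cache.fetch(second) { renders += 1; "<h1>Events (render #{renders})</h1>" }
end
puts renders # 2: once at second 0, again at second 60
```

Five hits, but the dynamic page was only rendered twice; every other visitor got the stored copy.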

With the increasing number of web applications that call APIs, such as Twitter’s API, this is a really easy way to ensure that you do not end up spamming a service provider with an unreasonable number of calls per hour. This is the technique we use on www.thelisbontweety.com to keep our API overhead down.

So how do you do this? Simply put something along the lines of

response.headers['Cache-Control'] = 'public, max-age=60'

as the first line of your action for the page you wish to cache. The max-age setting means that this will be cached for 60 seconds. After you put this in your application and redeploy to Heroku, you can see if it’s working by using http://hurl.it

Just enter the URL for your action and click Send. You should see something like “Cache-Control: max-age=60, public” in the output if it’s working.
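
If you are not on Rails, the same header can be set from anything that speaks the Rack convention. Below is a minimal sketch, not Heroku’s own example: `EVENTS_APP` and `max_age` are made-up names, and the app is invoked directly so that no gems or running server are needed.

```ruby
# A bare Rack-style app: any object that responds to call(env) and returns
# [status, headers, body]. EVENTS_APP is a made-up name for illustration.
EVENTS_APP = lambda do |_env|
  headers = {
    'Content-Type'  => 'text/html',
    'Cache-Control' => 'public, max-age=60' # let a shared cache keep it for 60s
  }
  [200, headers, ['<h1>Events</h1>']]
end

# Pulls the max-age value (in seconds) out of a Cache-Control header, or nil.
def max_age(cache_control)
  match = cache_control.to_s.match(/max-age=(\d+)/)
  match && match[1].to_i
end

_status, headers, _body = EVENTS_APP.call({})
puts max_age(headers['Cache-Control']) # 60
```

The `max_age` helper also copes with the directives in the other order, as in the output quoted above.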

And that’s it! No need to install anything. Just cache your little heart out with Varnish. Top marks to the chaps at Heroku for making this so easy to use out of the box. For more on this technique check out their HTTP caching docs at http://docs.heroku.com/http-caching

Packaging Ruby Apps for Ubuntu: Dissecting an existing Ruby Ubuntu Package

September 9th, 2009

One of the best ways to learn how an Ubuntu package is put together is to reverse engineer the package into its constituent components. We are going to take a look at how the chef application and its related libchef library are packaged as Debian packages.

* Visit the page http://packages.ubuntu.com/karmic/ruby/chef
* Under the Download chef section, download the package via the ‘All’ link into a directory called chef
* Visit the page http://packages.ubuntu.com/karmic/ruby/libchef-ruby1.8
* Under the Download libchef-ruby1.8 section, download the package via the ‘All’ link into a directory called libchef1.8

From the following guide (http://www.g-loaded.eu/2008/01/28/how-to-extract-rpm-or-deb-packages) you can learn how to ‘unzip’ a Debian package. This is easy as they are pure ar archives. Here’s what we need to do

* In the chef directory, run the commands

ar vx chef_0.7.8-0ubuntu2_all.deb
tar -zxvf data.tar.gz

* In the libchef1.8 directory, run the commands

ar vx libchef-ruby_0.7.8-0ubuntu2_all.deb
tar -zxvf data.tar.gz

Now you can study the layout of the data payload of the package (this is where to look in order to study the anatomy of the application as it was being packaged). This layout is what will be of most interest to you.
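
Since a .deb is a plain ar archive, its members can also be listed with a few lines of Ruby. This is a sketch based on the common ar layout (an 8 byte magic string followed by 60 byte entry headers, with entry data padded to an even length); `ar_member_names` is a made-up helper and it does no more than read names and sizes.

```ruby
AR_MAGIC = "!<arch>\n".b

# Lists the member names of an ar archive held in a string of bytes.
# Each entry header is 60 bytes: the name is in the first 16 and the
# size (as decimal text) is in the 10 bytes starting at offset 48.
def ar_member_names(bytes)
  raise ArgumentError, 'not an ar archive' unless bytes.start_with?(AR_MAGIC)

  names = []
  offset = AR_MAGIC.bytesize
  while offset + 60 <= bytes.bytesize
    header = bytes.byteslice(offset, 60)
    names << header.byteslice(0, 16).strip.chomp('/') # some tools append a slash
    size = header.byteslice(48, 10).strip.to_i
    offset += 60 + size + (size.odd? ? 1 : 0)         # data is padded to even length
  end
  names
end

# Usage, assuming the file downloaded earlier is in the current directory:
# p ar_member_names(File.binread('chef_0.7.8-0ubuntu2_all.deb'))
```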

If you have an application in a particular programming language that you wish to package, pick a similar application for which a package already exists and dissect it as shown above. Then bend your app into a similar shape in terms of directory layout before attempting to package it. To find out more about how to create your own Ubuntu packages, check out this great video by Horst Jens, ‘Ubuntu: Making a .deb package out of a python program’. It’s worth the effort of watching it to the end!

Happy packaging!

Bringing Back the Spirit of the Amateur Programmer

August 26th, 2009

In a blog post this month, Richard Dale (the man behind Qt/KDE’s Smoke bindings) eloquently phrased a noble goal,

“In the 1980s there were lots of computer magazines that used to publish programming articles with BASIC code, that everyone could input and run on their own computers. However, in the 1990s such large scale end user computer programming pretty much died out – tweaking the odd web page isn’t quite the same thing. One of the assumptions that the Free Software movement makes is that every user is also a programmer of some sort, who is able to tweak the software on their computers. I hope we can get back to that spirit, and change the way that people think about KDE programming, because at the moment there is a tendency to think it is hard and that only the ‘C++ gods’ like David Faure or Thiago Macieira can do it. In fact it is pretty easy to write small Python and Ruby apps and plasmoids, or to write a little script to message an app over DBus. We just need to get communities of like minded people together who write tutorials on TechBase, create blog entries with code (like the 1980s BASIC articles), and help beginners get started. These ubiquitous end user programming environments in Kubuntu (and other distributions I hope) will make it possible to do that.”

This really sums up something that would be fantastic to see over the next few years. There are so many gadget lovers and technology geeks out there – the type of people who would’ve probably punched those BASIC tutorials into a Commodore 64, an Amstrad CPC464 or ZX Spectrum back in the good old days – that feel left behind as they perceive professional programmers to have blazed so far ahead that they cannot be caught. But in many ways nothing could be further from the truth. For any programmer, there’s always some guy or gal that’s coding something more challenging or doing cleverer(er) stuff on the next machine. It’s all relative. And since software turned into a mainstream industry over the last couple of decades, it’s been the programmers doing the simplest tasks that have made the megabucks whilst the hardcore wizards of machine code and assembly have seen their demand diminish.

So next time you think there’s no point in picking up a few programming skills, give a language like Ruby or Python a shot. Hopefully, with the continuing progress of Kubuntu and other distros to make programming more accessible, you’ll have the perfect environment to do so!

HOWTO: Add a secondary hard drive for Windows via VirtualBox 2.1.4 OSE

July 22nd, 2009

Sometimes one disk just isn’t enough. In fact most times! Here’s how to add an E Drive to your Windows Guest OS

  • In VirtualBox, go to the File->Virtual Media Manager
  • Click the New button to create a new hard disk and create a new hard drive file via the wizard (a .vdi file)
  • Ensure Windows Guest OS has been shut down
  • Click on the Settings button for your Windows Guest OS, and choose the Hard Disks tab
  • Add the .vdi file you just created as a hard drive for the Guest OS by clicking the Add Attachment icon on the right and selecting your .vdi file from the drop down list that appears. Here is the typical order of drives on a system
    • IDE Primary Master: this is the main drive onto which your Guest OS is installed (very important!)
    • IDE Primary Slave: this is the first additional drive you add on (ie. the E: drive)
    • IDE Secondary Slave: this is the second additional drive you add on
  • Once you’re done, start up your Virtual Machine Guest OS
  • Visit the Control Panel in Windows
  • If you’re using XP, elect to ‘switch to classic view’ on the left of Windows Explorer
  • Visit Administrative Tools->Computer Management
  • Under the storage node on the left, there should be a Disk Management node
  • A wizard should instantly popup to help you add the drive as soon as you click on this, agree to its demands
  • After the wizard, you should see a drive displayed as ‘unallocated’ as you scroll down the list of drives
  • Right click, select New Volume and before you know it you should be done!
  • Now go and open Windows Explorer and experience the joy that is an extra drive!

A Breath of Fresh Air – The Well Grounded Rubyist

June 9th, 2009

After a tough day in the office you want to catch up on the news, so you look at the ticker on a TV channel or tune in your car radio. Other days you’ll want to sit down with a meaty broadsheet and really take in the detail of what lies behind the headlines. This is a book about Ruby which triumphs at walking the line between these two styles. The Well Grounded Rubyist aims to appeal to a developer that has been exposed to some Ruby coding and take their knowledge to the next level. And it succeeds brilliantly.

This is not a book about Rails or any other web frameworks; purely Ruby. Though much of the material will also apply to the 1.8.x series of Ruby, this is a book about the 1.9 version of the Ruby language. It’s broken into three parts – Ruby foundations, Built-in classes and modules and finally Ruby dynamics. But don’t let the title of Ruby foundations fool you for part one – this is not some remedial rush through the basics of Ruby in six chapters. Rather, after a couple of warm-up chapters, it moves quickly to clarify the key aspects of how classes and modules inter-relate, as well as things such as crystallising what ‘self’ really means in different contexts in a Ruby program. The author sets out his stall early – what makes Ruby different from other languages is its focus on objects rather than classes. Everything else stems from this and by the end of the section you feel like you have an understanding of Ruby’s design and focus.

Part two of the book is Built-in classes and modules. Now that you know what makes Ruby tick, it’s time to get steeped in all aspects of the core library that ships with it. One of the problems when learning a language is that becoming familiar with all methods of a particular core class is a tedious task. It’s much more interesting to learn about concepts such as meta programming than memorising lists of methods by rote. But if you don’t take the time to familiarise yourself with the dusty corners of a language’s API then you’re less likely to think of those handy methods when a problem they would elegantly solve presents itself. At this point the book shifts gear to a more reference style of text. However, it still gives the reader an interesting story to follow as it documents arrays, hashes and other classes – throwing in the occasional golden nugget of information that will be a valuable addition to the toolbox of even experienced Rubyists. One side effect of the change in style is that this section is probably the most accessible to beginners. Again it’s broken down into six chapters. In addition to collections, it also covers topics such as regular expressions and file handling. Each topic takes a zero-to-hero approach meaning that you can bring little regular expression knowledge to the table yet still walk away learning an immense amount about the subject.

The final part of the book, Ruby dynamics, returns to the book’s roots from part one – a focus on imparting a deep knowledge of Ruby’s design. Before you even pick up this book you have an inkling that procs and lambdas are going to make a guest appearance at this late stage. And they do not disappoint. Extending the behaviour of objects takes centre-stage and meta-programming based techniques move quickly to the fore. Any block/proc/lambda confusion you may have will be a distant memory by the time you finish this section. Threading is also covered here – though a detailed discussion of 1.9’s new native OS threads vs green threads is left to one side to focus on the usage of threads regardless of which underlying type you use. Lots of material is also provided on querying objects, which is not only useful for program design but also invaluable as a debugging aid. The book really shines in this section because a lot of other texts make the mistake of going into ‘super-boffin’ mode at this point, leaving the reader lost, whereas the author here continues to provide patience and context to get you round that final lap on your way to becoming a Well Grounded Rubyist!

This book cannot be all things to all people. Because it is catering for a wide range of intermediate to advanced Rubyists, it will feel like it’s moving a little too slowly for some. By the author’s own admission, this is in order to make it accessible to a wider audience and no doubt it will make it easier for developers of all levels to digest – your humble reviewer very much included! Ruby first-timers would be best off having some straight-to-the-point tutorials or entry-level text to hand in order to get some instant gratification – as part one of the book, by its very nature, is a little more abstract than a complete beginner would expect. But all in all, this book is a great way to learn just how Ruby crams so much expressiveness into such a simple clean framework. Whenever I read a book like this I keep a list of new things learned along the way. For The Well Grounded Rubyist it is a very long list! Well done to David A. Black and Manning for producing a book that fills those gaps in many Rubyists’ understanding of the language while at the same time delivering an absorbing readable book that would sit proudly on any Ruby programmer’s bookshelf.