Libxml-ruby memory leaks

Some time ago, we pointed out a serious memory problem in libxml-ruby. We found a workaround back then, and recently Ben Lam told us about another way to solve it. Details below.

First, let's take a look at the evil code:

require 'rubygems'
require 'xml/libxml'

parser = XML::Parser.new
# Reuse one parser for 10,000 parses of a tiny document
10000.times do
  parser.string = "<?xml version='1.0'?><test><test2/></test>"
  parser.parse
end

This adds 90 MB of memory to the process in 0.2 seconds.
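The post doesn't say how the 90 MB figure was measured. One way to observe growth like this, assuming Linux (with a `ps` fallback on other Unix-like systems), is to compare the process's resident set size (RSS) before and after the loop. `rss_kb` and `measure_memory` are hypothetical helpers, not part of libxml-ruby, and RSS is only a rough indicator because the allocator may keep freed memory instead of returning it to the OS.

```ruby
# Hypothetical helper: resident set size of the current process in kilobytes.
# Reads /proc on Linux and falls back to `ps` on other Unix-like systems.
def rss_kb
  status = "/proc/self/status"
  if File.exist?(status)
    File.read(status)[/^VmRSS:\s+(\d+)/, 1].to_i
  else
    `ps -o rss= -p #{Process.pid}`.to_i
  end
end

# Hypothetical helper: runs the block, prints how much RSS grew and how long
# it took, and returns [kilobytes_grown, seconds].
def measure_memory
  before = rss_kb
  started = Time.now
  yield
  elapsed = Time.now - started
  grown = rss_kb - before
  puts "RSS grew by #{grown} KB in #{format('%.2f', elapsed)} s"
  [grown, elapsed]
end

# Possible use with the loop above:
#   measure_memory do
#     parser = XML::Parser.new
#     10000.times do
#       parser.string = "<?xml version='1.0'?><test><test2/></test>"
#       parser.parse
#     end
#   end
```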

Solution 1: Fork

Our solution back then was to do the parse in a forked child process, so the memory is released when the child exits. This works fine as long as you don't need the XML::Document object in the parent process, because it can't be serialized with Marshal. Other objects can be sent back to the parent by dumping them with Marshal and writing them to a pipe.

Example

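require 'rubygems'
require 'xml/libxml'

parser = XML::Parser.new
rd, wr = IO.pipe
pid = fork do
  # Child: parse, then send a Marshal-able result back through the pipe
  rd.close
  begin
    parser.string = string_to_parse
    doc = parser.parse
    # do something with doc; the XML::Document itself can't be dumped
    result = doc.root.name # placeholder processing
    wr.write Marshal.dump(result)
  rescue => e
    warn "parse failed: #{e.message}"
  ensure
    wr.close
    exit! # skip at_exit handlers; the OS reclaims the child's memory
  end
end
# Parent: read everything before waiting, so a large result can't block the child
wr.close
data = rd.read
rd.close
Process.wait(pid)
result = data.empty? ? nil : Marshal.load(data)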
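The same fork-and-pipe pattern can be wrapped in a reusable helper. This is a minimal sketch, not part of libxml-ruby: `in_child` is a hypothetical name, it assumes a platform where `fork` is available, and it returns whatever the block returns, provided that value can be dumped with Marshal. Unlike an empty `rescue`, it reports failures in the child (including an un-dumpable result) back to the parent.

```ruby
# Hypothetical helper (not part of libxml-ruby): runs the block in a forked
# child and returns its value to the parent through a pipe. The value must be
# Marshal-able; memory the block allocates is freed when the child exits.
def in_child
  rd, wr = IO.pipe
  pid = fork do
    rd.close
    payload =
      begin
        Marshal.dump([:ok, yield])
      rescue StandardError => e
        # Report the failure (including an un-dumpable result) instead of hiding it
        Marshal.dump([:error, "#{e.class}: #{e.message}"])
      end
    wr.write(payload)
    wr.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  wr.close
  data = rd.read # read to EOF before waiting so a large payload can't block the child
  rd.close
  Process.wait(pid)
  raise "child exited without sending a result" if data.empty?
  status, value = Marshal.load(data)
  raise "child process failed: #{value}" if status == :error
  value
end
```

With libxml-ruby, the block would hold the parse itself and return something dumpable, for example `root_name = in_child { p = XML::Parser.new; p.string = string_to_parse; p.parse.root.name }` (assuming `XML::Document#root` and `XML::Node#name`). Any state the block changes, and any memory it leaks, stays in the child.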

Solution 2: GC

Another solution, pointed out by Ben Lam, uses Ruby's garbage collector. After parsing the XML, invoke the GC with GC.start and, voilà, memory usage stays flat. This avoids the overhead of creating a child process and returning the data through a pipe.

Code

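require 'rubygems'
require 'xml/libxml'

parser = XML::Parser.new
parser.string = string_to_parse
parser.parse
GC.start # force a full collection so the parser's memory is released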
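Calling GC.start after every parse is what makes this approach slow in the benchmark. A possible middle ground, not tested in this post, is to collect once per batch of parses. The sketch below assumes that growth is then bounded by roughly one batch's worth of documents; `each_with_periodic_gc` and its `every` parameter are hypothetical names, and the batch size would need tuning against your own memory measurements.

```ruby
# Hypothetical helper (not part of libxml-ruby): yields each item to the
# block and forces a full collection every `every` items instead of after
# each one, so the GC cost is paid once per batch. Returns the item count.
def each_with_periodic_gc(items, every = 100)
  raise ArgumentError, "every must be positive" unless every > 0
  items.each_with_index do |item, i|
    yield item
    GC.start if (i + 1) % every == 0
  end
  GC.start # one last collection for any items since the previous GC
  items.size
end

# Possible use with libxml-ruby (assumes xml_strings is an Array of documents):
#   parser = XML::Parser.new
#   each_with_periodic_gc(xml_strings, 100) do |s|
#     parser.string = s
#     parser.parse
#   end
```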

Comparison

Depending on your needs, you will want either solution 1 or solution 2. The GC solution is much cleaner in code and doesn't rely on UNIX-only tricks like fork, yet when it comes to performance the fork version wins: in the benchmark below it is about 3.5 times faster (1.57 s versus 5.60 s for 1000 parses).

Benchmark

# regular_parse, gc_parse and fork_parse each take (xml_string, iterations);
# their definitions are not shown in this post
[:regular_parse, :gc_parse, :fork_parse].each do |method|
  start = Time.now
  # parse this string 1000 times
  send method, "<?xml version='1.0'?><test><test2/></test>", 1000
  finish = Time.now
  puts "#{method.to_s.capitalize}: #{finish - start} seconds"
end

Output
Regular_parse: 0.035903 seconds
Gc_parse: 5.601423 seconds
Fork_parse: 1.574503 seconds

Environment

  • Ruby 1.8.6
  • libxml-ruby 0.5.2.0
  • XML::Parser 0.5.2.0 (found by running: ruby -e 'require "rubygems"; require "xml/libxml"; puts XML::Parser::VERSION')
