Libxml-ruby memory leaks
Posted by Tim Brys on Apr 01, 2008
Some time ago, we pointed out a serious memory problem when using libxml-ruby. We found a solution, but recently Ben Lam told us about another way to solve this. Details inside.
First let’s take a look at the evil code:
parser = XML::Parser.new
10000.times do |time|
  parser.string = "<?xml version='1.0'?><test><test2/></test>"
  parser.parse
end
This adds 90 MB of memory to the process in 0.2 seconds.
Solution 1 : Fork
Our solution back then was to perform the parse in a fork, so that the memory is released when the child process ends. This works fine as long as you don't need the XML::Document object in your parent process (it can't be dumped). Other objects can be returned by dumping them with Marshal and passing them through a pipe.
Example
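parser = XML::Parser.new
rd, wr = IO.pipe
begin
  pid = fork do
    # child: parse, then send a dumpable result back through the pipe
    rd.close
    begin
      parser.string = string_to_parse
      parsed = parser.parse
      # do something with parsed and assign a dumpable value to result
      wr.write Marshal.dump(result)
      wr.close
    rescue
      # errors in the child are swallowed; the parent then reads no data
    ensure
      exit!
    end
  end
  # parent: close the unused write end, then read until the child closes its end
  wr.close
  result = Marshal.load(rd.read)
  rd.close
  Process.wait(pid)
end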
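The same pattern can be wrapped in a small reusable helper. The following is only a minimal sketch, not the code we run in production: the method name `run_in_child` is made up for illustration, and a simple regular expression scan stands in for the libxml parse so that the snippet runs without libxml-ruby installed. It assumes a platform where `fork` is available.

```ruby
# Hypothetical helper: runs the given block in a forked child process and
# returns the block's result to the parent via Marshal and a pipe.
def run_in_child
  rd, wr = IO.pipe
  [rd, wr].each(&:binmode) # Marshal data is binary

  pid = fork do
    rd.close
    begin
      wr.write Marshal.dump(yield)
    ensure
      wr.close
      exit!(0) # leave the child immediately, skipping at_exit handlers
    end
  end

  wr.close              # the parent only reads
  data = rd.read        # returns once the child has closed its write end
  rd.close
  Process.wait(pid)
  raise "child process returned no data" if data.empty?
  Marshal.load(data)
end

# Stand-in for "parse the XML, then extract something dumpable".
# With libxml-ruby, the block would use XML::Parser instead.
tag_names = run_in_child do
  "<?xml version='1.0'?><test><test2/></test>".scan(/<(\w+)/).flatten
end
p tag_names # prints ["test", "test2"]
```

Anything the block allocates lives only in the child, so it disappears when the child exits. Unlike the example above, this sketch raises in the parent when the child fails, instead of handing empty data to Marshal.load.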
Solution 2 : GC
Another solution, pointed out by Ben Lam, uses the Ruby garbage collector. After parsing the XML, invoke the GC and voilà, memory usage stays flat. This removes the overhead of creating a child process and returning the data through a pipe.
Code
parser = XML::Parser.new
parser.string = string_to_parse
parser.parse
GC.start
Comparison
Which solution to use depends on your needs. The GC solution looks a lot cleaner in code and doesn't need any UNIX tricks to work, but it is slower: in the benchmark below, the fork is about 3.5 times faster (1.57 versus 5.60 seconds for 1000 parses). Both are far slower than a plain parse without any cleanup (0.04 seconds).
Benchmark
[:regular_parse, :gc_parse, :fork_parse].each do |method|
  start = Time.now
  # parse this string 1000 times
  send method, "<?xml version='1.0'?><test><test2/></test>", 1000
  finish = Time.now
  puts "#{method.to_s.capitalize}: #{finish-start} seconds"
end
Output
Regular_parse: 0.035903 seconds
Gc_parse: 5.601423 seconds
Fork_parse: 1.574503 seconds
Environment
- ruby 1.8.6
- libxml-ruby 0.5.2.0
- XML::Parser 0.5.2.0 (found by running ruby -e 'require "rubygems"; require "xml/libxml"; puts XML::Parser::VERSION')