There’s this Ruby problem I revisit from time to time. The built-in hash array has a really handy map method. To the uninitiated, the map method is used to modify all elements in an array with the function specified.
> [1,2,3].map{ |i| i+1 }
=> [2, 3, 4]
Each element in the array is modified by the block passed into the map method. Similarly, Ruby hashes also have a map method - it does more or less the same thing.
> { "x" => 1, "y" => 2, "z" => 3 }.map{ |k,v| k.to_sym => v }
SyntaxError: (irb):2: syntax error, unexpected tASSOC, expecting '}'
{ "x" => 1, "y" => 2, "z" => 3 }.map{ |k,v| k.to_sym => v }
from /home/smeagol/.rvm/rubies/ruby-1.9.3-p429/bin/irb:16:in <main>
Wait what? Unlike function parameters, you have to wrap your key-value pairs in braces when executing a block like this.
> { "x" => 1, "y" => 2, "z" => 3 }.map{ |k,v| { k.to_sym => v } }
=> [{:x=>1}, {:y=>2}, {:z=>3}]
Hmmm, marginally better - no error… but this isn’t really what we’re looking for. The output is an array instead of a hash - the map method signature calls for an array return type. If we want to modify key-value pairs in a hash based on a common pattern, we’ll have to look elsewhere. Here’s a common idiom:
> { "x" => 1, "y" => 2, "z" => 3 }.\
> inject({}){ |hash, (k, v)| hash.merge( k.to_sym => v ) }
=> {:x=>1, :y=>2, :z=>3}
> # if you prefer big-data parlance, the preferred term is reduce, not inject.
> # Either way, Ruby doesn't care - they do the same thing.
> { "x" => 1, "y" => 2, "z" => 3 }.\
> reduce({}){ |hash, (k, v)| hash.merge( k.to_sym => v ) }
=> {:x=>1, :y=>2, :z=>3}
That works. Our output looks right, but the syntax is really verbose and just looks kind of nasty. Let’s throw caution to the wind and monkey-patch the Hash class with a method to simplify the syntax a bit.
class Hash
def hmap(&block)
self.inject({}){ |hash,(k,v)| hash.merge( block.call(k,v) ) }
end
end
x = { "x" => 1, "y" => 2 }
x.hmap{ |k,v| { k.to_sym => v.to_s } }
=> {:x => "1", :y => "2"}
Much cleaner. Let’s see how it performs.
x = {}
(1..10000).each do |i|
x["key#{i}"] = i
end
t=Time.now
puts x.hmap{ |k,v| { k.to_sym => v.to_s } }
puts "#{Time.now - t} seconds"
=> 31.950418765 seconds
Ouch. This is probably fine for many cases - maybe your hash only has a few key-value pairs. Perhaps you’re just symbolizing some string-based keys… but we can do better. It turns out, you can map the hash into an array, then pass that array back into a new hash:
Hash[x.map {|k, v| [k.to_sym, v.to_s] }]
=> 0.028062438 seconds
That’s more like it! Let’s wrap this into the hmap() method.
class Hash
def hmap(&block)
Hash[self.map {|k, v| block.call(k,v) }]
end
end
x.hmap{ |k,v| [ k.to_sym, v ] }
=> 0.029331384 seconds
Awesome. The syntax is slightly different than the inject version - we pass in an array with two cells - one for the key and the other for the value. I think this is a touch more clunky, but the speed improvement is well worth it.
Keep in mind, this is a non-destructive method - you get a new hash when you run it. If we don’t want to create a new hash, we’ll have to enumerate every key in the hash, create a new key and value based on the block provided and then delete the old key.
class Hash
def hmap!(&block)
self.keys.each do |key|
hash = block.call(key, self[key])
self[hash.keys.first] = hash[hash.keys.first]
self.delete(key)
end
self
end
end
x = { "x" => 1, "y" => 2, "z" => 3 }
x.hmap!{ |k,v| { k.to_sym => v.to_s } }
=> {:x=>"1", :y=>"2", :z=>"3"}
If we run this against the 10,000 key data set from earlier, we decent performance. Not as good, but still not too bad.
x = {}
(1..10000).each do |i|
x["key#{i}"] = i
end
t=Time.now
x.hmap!{ |k,v| { k.to_sym => v.to_s } }
puts Time.now - t
=> 0.061217351
Depending on the amount of data in your hash, your mileage will vary, but I find this a pretty serviceable addition to the hash class. If you’re concerned about monkey-patching, consider including and/or extending this functionality into one of your own classes or instances.
Let me know if you have any other approaches to this problem. I revisit it from time to time and would love to find a way to make the destructive version of hmap!() run faster.