when JavaScript beat C

This is a post about my journey through the depths of the V8 Javascript engine’s source code, specifically trying to explain how a micro benchmark in string to number conversion in Node would grossly outperform the equivalent in C, even when cranking up the optimization level to -O3 (feel free to check out the github repo)

js_vs_c_tonumber

No, that is not a typo, JS was parsing positive integers used in the benchmark about 8.5x faster than C – how can this be? (NOTE: the C code uses atof to parse the strings since JS does not really have integer types)

Since we’re going to use string-to-number conversion quite a bit in our product, I decided to take a look under the covers to see how the conversion happens in JS.  Having used C/C++ for more than a decade before my current venture in JS, I felt confident I’d be able to get to the root of this quickly. However, being a complete n00b to the V8 source code it took quite some time to narrow down where the conversion’s entry point is, that journey took me to builtins the CodeStubAssembler. After hours of reading the source code and a few articles explaining their purpose, I did finally get to the source code dealing with the conversion of a string to a number String::ToNumber. Hmm that was fun I thought to myself, but little did I know that while digging through the code I would come across a neat trick that the V8 developers have used to speed up the conversion of certain numeric strings to numbers.

Before looking at the trick though, it is important to quickly review two things:
a) String representation in V8 and
b) how Array elements are accessed in Javascript.

V8 has various String representations (dealing with, substrings, concatenations etc out of the scope for this post), each String has a field to store the hash of the string. This hash value is computed lazily (by IteratingStringHasher::Hash) and used (mostly) when working with hash maps.

In Javascript you can access the elements of an Array using numeric indexes, no surprise there, however being a dynamically typed language it also allows for accessing of Array elements using the string representation of those numbers, ie

array[2] === array['2']

The V8 engine would have to convert the index string into a number, however since Strings are immutable and array operations tend to occur over loops it would be pretty expensive to do that conversion every time. Caching the value into the String object would be great, but would add to the memory overhead of Strings (even the ones not ever used to index to arrays).

Here’s where the V8 developers use a neat trick, similar to tagged pointers used to represent small integers, but this time the target is String::hash. For a class of strings, those that can be used as array indexes, (30 bit unsigned integer, with 2 bits needed for “tagging”), the hash is set to the number represented by the string.  (There’s also another unrelated hashing optimization used; for very long strings, longer than 16KB, the length of the string is used as the hash value.)

Therefore, in V8 strings that represent small positive integers are converted to numbers only once during their lifetime and that value is cached (in the hash field). String::ToNumber takes advantage of this whenever asked to convert a string to number (as you can see in the snippet below).

 // Fast array index case.
 uint32_t index;
 if (subject->AsArrayIndex(&index)) {
    return isolate->factory()->NewNumberFromUint(index);
 }

 

Now, we can see that our microbenchmark was not fair to C as in Javascript the conversion would happen only once and be reused, as we do reuse the string (remember we’re interested in string to number conversion time, not new object creation/copy time)

Adjusting for skipping this optimization (by using integers > 7 digits) shows that the conversion in V8 is roughly the same as that of C. My (untested) hypothesis for this is that for longer numbers (either larger or more significant digit) multiplication by 10 becomes the dominating factor that can’t be optimized away.

js_vs_c_tonumber_lg

Partying thoughts:

A) Kudos to all V8 developers for building a kick ass engine!!!
B) It’s interesting how V8 really falls behind with garbage / empty string conversions (digging into this another time)
C) In the current version of Node (10.4.1) converting to a number using +’String’ is generally faster than Number(‘String’) – I’ve raised an issue with Node around a performance regression in this area
C) In micro benchmarks there’s always a risk of testing a highly optimized (or non-optimized) path, so whenever results look too good/bad to be true be prepared to dig in and understand what’s going on.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s