Wednesday, October 8, 2008

Performance Optimizations for High Speed JavaScript

Description

In this article we analyze several important JavaScript optimizations: using local function variables, avoiding references to objects and object properties, avoiding adding short strings to long strings, and finally, using buffering to process data in optimal sizes. These are general-purpose techniques designed for JavaScript on all browsers. Detailed graphs of the performance results are given after each optimization. You will be amazed at the incredible speed improvements in JavaScript!
Introduction

Optimization of JavaScript computer code deserves attention, since JavaScript has a large impact on Web page performance. In this article we develop two high performance JavaScript algorithms using several performance optimization techniques.

You can easily use the same optimization methods in your own code. Each optimization can reduce the execution time of your program by 10-95% or more. Used in combination, these optimizations can mean the difference between a program that runs as slow as molasses and problem-crunching software that runs at high speed.
Important JavaScript optimizations

Optimizations in a broad sense will involve simplifying code, precomputing results which are repeatedly reused, and organizing code so more results can be reused. From the standpoint of computer programming purity, optimizations should increase the simplicity, clarity and generality of a computer program. (See The Practice of Programming by Brian Kernighan and Rob Pike.)

Adding simplicity, clarity and generality is what these optimizations will do. In fact, one of the optimizations adds support for Unicode and multibyte characters such as &nbsp; and &mdash;, and still improves performance drastically compared to the slower, unoptimized version!

We analyze four optimizations in this article. The easiest-to-use optimizations come first; the more complicated ones, and those with the biggest payoffs, follow.

* Use local function variables
(82% improvement, 63 milliseconds versus 359).
* Avoid references to objects or object properties
(41% improvement, 27.68 microseconds versus 47.297).
* Avoid adding short strings to long strings
(93% improvement, 4.608 microseconds versus 62.54).
* Use buffering to process data in optimal sizes
(96% improvement, 2.0 seconds versus 50.0).

Technique 1: Use local function variables

When I implemented the MD5 message digest algorithm, I wrote the code first in C and then ported it to JavaScript. To make my JavaScript MD5 code faster than everyone else's, I had to take advantage of local function variables and optimized buffer sizes, two of the techniques we address in this article.

Using local function variables is simple. If you have code which uses variables repetitively, it's worthwhile to make the code into a function to take advantage of the higher performance of local function variables.

Global variables have slow performance because they live in a highly populated namespace. Not only are they stored along with many other user-defined quantities and JavaScript variables, but the browser must also distinguish between global variables and properties of objects that are in the current context. Many objects in the current context can be referred to by a variable name rather than as an object property, such as alert() being synonymous with window.alert(). The downside is that this convenience slows down code that uses global variables.

Sometimes global variables can approach the performance of local function variables if you declare them explicitly with the var keyword, for example var d = document instead of d = document, but the effect is not reliable. Mysterious behavior that works sometimes but not always is a danger sign, and I feel more comfortable using local function variables.

On the other hand, it makes sense that local function variables should have better performance. There are few local function variables in most functions, and references to local function variables can be converted to efficient executable instructions by the JavaScript compiler.
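
To make the idea concrete, here is a minimal sketch (my own illustration, not code from this article) of wrapping repetitive work in a function and caching a global object in a local variable, so that repeated lookups stay in the fast local scope:

function writeRows(rows) {
    var d = document;                /* one lookup of the global document object */
    var i, n = rows.length;          /* cache the length property in a local as well */
    for (i = 0; i < n; i++) {
        d.write(rows[i] + '<br />'); /* each iteration touches only local variables */
    }
}
writeRows(['first', 'second', 'third']);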

It's amazing how few people are aware of simple JavaScript optimizations like this, and how many simply don't care. For example, no one else has taken the time to re-optimize their MD5 JavaScript code. When I wrote my md5.js script, I had no intention of competing for the top position, and yet the code has been unbeaten since 2003.

Let's figure out why local function variables make a difference by counting to a million, first without local function variables, and then with local function variables.
Counting to one million without local function variables

Tip: subtracting one new Date() object from another gives the time difference in milliseconds, which provides a great way to time your scripts.
t0 = new Date();
var i;
for (i = 0; i < 1000000; i++);
t1 = new Date();
Counting to one million with local function variables
function count() {
    var i;
    for (i = 0; i < 1000000; i++);
}
count();
t2 = new Date();
Code that gives us the results of the timing
d = document;
d.write('Without local variables = ', t1 - t0, '<br />');
d.write('With local variables = ', t2 - t1, '<br />');

(See Figure 1)

The result is 359 milliseconds (thousandths of a second) when not using local function variables compared to 63 milliseconds when local function variables are used. This improvement is worth taking the extra time to convert code into a function.


Technique 2: Avoid references to objects or object properties


To illustrate how this technique works, we use a real-life JavaScript function which creates strings of whatever length is needed. And as we'll see, more optimizations can be added!

A function like the one used here can create padding to align columns of text, format money amounts, or fill block data up to a boundary. A text generation function also allows variable-length input for testing any other function that operates on text. This function is one of the important components of the JavaScript text processing module.
Original code for creating strings stringFill1()

Here, we cover two more of the most important optimization techniques while developing the original code into an optimized algorithm for creating strings. The result is an industrial-strength, high-performance function that I've used everywhere: aligning item prices and totals in JavaScript order forms, data formatting, email and text message formatting, and many other uses.
function stringFill1(x, n) {
    var s = '';
    while (s.length < n) s += x;
    return s;
}
/* Example of output: stringFill1('x', 3) == 'xxx' */

The syntax here is clear. As you can see, we've used local function variables already, before going on to more optimizations.

Be aware that there's one innocent reference to an object property s.length in the code that hurts its performance. Even worse, the use of this object property reduces the simplicity of the program by making the assumption that the reader knows about the properties of JavaScript string objects.

The use of this object property destroys the generality of the computer program. The program assumes that x must be a string of length one, which limits the stringFill1() function to nothing but the repetition of single characters. Even single characters cannot be used if they contain multiple bytes, like the HTML entity &nbsp;.

The worst problem caused by this unnecessary use of an object property is that the function creates an infinite loop if tested on an empty input string x. To check generality, apply a program to the smallest possible amount of input. A program which crashes when asked to exceed the amount of available memory has an excuse. A program like this one which crashes when asked to produce nothing is unacceptable. Sometimes pretty code is poisonous code.

Simplicity may be an ambiguous goal of computer programming, but generality is not. When a program lacks any reasonable level of generality, it's not valid to say, "The program is good enough as far as it goes." As you can see, using the string.length property prevents this program from working in a general setting, and in fact, the incorrect program is ready to cause a browser or system crash.
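
To see these problems concretely, here is a short sketch (my own illustration, not code from the article):

/* stringFill1('', 10);  -- left commented out: s.length stays 0, so the loop never ends */
var padded = stringFill1('ab', 3);  /* returns 'abab' (length 4), not 'ab' repeated three times */

The empty-string call would lock up the page, and the two-character pad shows how the s.length test confuses "repeat x a certain number of times" with "build a string of at least n characters."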

Is there a way to improve the performance of this JavaScript as well as take care of these two serious problems?

Of course. Just use integers.
Optimized code for creating strings stringFill2()
function stringFill2(x, n) {
    var s = '';
    while (n-- > 0) s += x;
    return s;
}
Timing code to compare stringFill1() and stringFill2()
function testFill(functionToBeTested, outputSize) {
    var t, i = 0, t0 = new Date();
    do {
        functionToBeTested('x', outputSize);
        t = new Date() - t0;
        i++;
    } while (t < 2000);
    return t / i / 1000; /* average seconds per call */
}
seconds1 = testFill(stringFill1, 100);
seconds2 = testFill(stringFill2, 100);
The success so far of stringFill2()

stringFill1() takes 47.297 microseconds (millionths of a second) to fill a 100-byte string, and stringFill2() takes 27.68 microseconds to do the same thing. That's almost a doubling in performance by avoiding a reference to an object property.

Technique 3: Avoid adding short strings to long strings

Our previous result looked good--very good, in fact. The improved function stringFill2() is much faster due to the use of our first two optimizations. Would you believe it if I told you that it can be improved to be many times faster than it is now?

Yes, we can accomplish that goal. Right now we need to explain how we avoid appending short strings to long strings.

The short-term behavior appears to be quite good, in comparison to our original function. Computer scientists like to analyze the "asymptotic behavior" of a function or computer program algorithm, which means to study its long-term behavior by testing it with larger inputs. Sometimes without doing further tests, one never becomes aware of ways that a computer program could be improved. To see what will happen, we're going to create a 200-byte string.
The problem that shows up with stringFill2()

Using our timing function, we find that the time increases to 62.54 microseconds for a 200-byte string, compared to 27.68 for a 100-byte string. It seems like the time should only double for doing twice as much work, but instead it more than doubles. From programming experience, this result seems strange, because if anything, the function should be slightly faster, since work is being done more efficiently (200 bytes per function call rather than 100 bytes per function call). This issue has to do with an insidious property of JavaScript strings: JavaScript strings are "immutable."

Immutable means that a string cannot be changed once it is created. When we add one byte at a time, we're not spending just one more byte of effort; we're actually recreating the entire string plus one more byte.
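
A quick sketch of what immutability means in practice (my own illustration, not from the article):

var a = 'abc';
var b = a;       /* b refers to the same three-character string */
a += 'd';        /* a brand-new four-character string 'abcd' is built for a */
/* b is still 'abc': the original string was never modified in place */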

In effect, to add one more byte to a 100-byte string, it takes 101 bytes worth of work. Let's briefly analyze the computational cost for creating a string of N bytes. The cost of adding the first byte is 1 unit of computational effort. The cost of adding the second byte isn't one unit but 2 units (copying the first byte to a new string object as well as adding the second byte). The third byte requires a cost of 3 units, etc.

C(N) = 1 + 2 + 3 + ... + N = N(N+1)/2 = O(N^2). The symbol O(N^2) is pronounced "Big O of N squared," and it means that the computational cost in the long run is proportional to the square of the string length. Under this approximation, creating 100 characters takes about 10,000 units of work, and creating 200 characters takes about 40,000 units of work.

This is why it took more than twice as long to create 200 characters as it did to create 100. In fact, based on the formula it should have taken four times as long. Our programming experience was correct in that the work is done slightly more efficiently per call for longer strings, which is why the measured ratio falls short of four. Once the overhead of each function call becomes negligible compared to the length of the string being created, it really will take four times as much time to create a string twice as long.

(Historical note: This analysis doesn't necessarily apply to strings in source code, such as html = 'abcd\n' + 'efgh\n' + ... + 'xyz.\n', since the JavaScript source code compiler can join the strings together before making them into a JavaScript string object. Just a few years ago, the KJS implementation of JavaScript would freeze or crash when loading long strings of source code joined by plus signs. Since the computational time was O(N^2) it wasn't difficult to make Web pages which overloaded the Konqueror Web browser or Safari, which used the KJS JavaScript engine core. I first came across this issue when I was developing a markup language and JavaScript markup language parser, and then I discovered what was causing the problem when I wrote my script for JavaScript Includes.)

Clearly this rapid degradation of performance is a huge problem. How can we deal with it, given that we cannot change JavaScript's way of handling strings as immutable objects? The solution is to use an algorithm which recreates the string as few times as possible.

To clarify, our goal is to avoid adding short strings to long strings, since in order to add the short string, the entire long string also must be duplicated.
How the algorithm works to avoid adding short strings to long strings

Here's a good way to reduce the number of times new string objects are created. Concatenate longer lengths of string together so that more than one byte at a time is added to the output.

For instance, to make a string of length N = 9:
x = 'x';
s = '';
s += x; /* Now s = 'x' */
x += x; /* Now x = 'xx' */
x += x; /* Now x = 'xxxx' */
x += x; /* Now x = 'xxxxxxxx' */
s += x; /* Now s = 'xxxxxxxxx' as desired */

Doing this required creating a string of length 1, creating a string of length 2, creating a string of length 4, creating a string of length 8, and finally, creating a string of length 9. How much cost have we saved?

Old cost C(9) = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = 45.

New cost C(9) = 1 + 2 + 4 + 8 + 9 = 24.

Note that we had to add a string of length 1 to a string of length 0, then a string of length 1 to a string of length 1, then a string of length 2 to a string of length 2, then a string of length 4 to a string of length 4, then a string of length 8 to a string of length 1, in order to obtain a string of length 9. What we're doing can be summarized as avoiding adding short strings to long strings, or in other words, trying to concatenate strings together that are of equal or nearly equal length.

For the old computational cost we found a formula N(N+1)/2. Is there a formula for the new cost? Yes, but it's complicated. The important thing is that it is O(N), and so doubling the string length will approximately double the amount of work rather than quadrupling it.
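
The complicated formula aside, the new cost is easy to tally directly. Here is a small sketch (my own illustration, not from the article) that counts the characters copied by each approach; doublingCost() mirrors the doubling algorithm that stringFill3() implements below:

function naiveCost(n) {
    return n * (n + 1) / 2;  /* 1 + 2 + ... + n, one byte appended at a time */
}
function doublingCost(n) {
    var cost = 0, sLen = 0, xLen = 1;
    for (;;) {
        if (n & 1) { sLen += xLen; cost += sLen; }  /* appending x to s rebuilds all of s */
        n >>= 1;
        if (n) { xLen += xLen; cost += xLen; }      /* doubling x builds a new, longer x */
        else break;
    }
    return cost;
}
/* naiveCost(9) == 45 and doublingCost(9) == 24, matching the hand counts above */

For a 100-character string, doublingCost(100) comes to 266 units of copying versus naiveCost(100) = 5050, and the gap keeps widening as N grows.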

The code that implements this new idea is nearly as complicated as the formula for the computational cost. When you read it, remember that >>= 1 means to shift right by 1 bit. So if n = 10011 is a binary number, then n >>= 1 results in the value n = 1001.

The other part of the code you might not recognize is the bitwise AND operator, written &. The expression n & 1 evaluates to true if the last binary digit of n is 1, and to false if the last binary digit of n is 0.
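
For example (my own quick illustration, with the binary values shown in the comments):

var n = 19;  /* binary 10011 */
n & 1;       /* 1 (true): the last binary digit is 1 */
n >>= 1;     /* n is now 9, binary 1001 */
n & 1;       /* 1 (true) */
n >>= 1;     /* n is now 4, binary 100 */
n & 1;       /* 0 (false): the last binary digit is 0 */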
New highly-efficient stringFill3() function
function stringFill3(x, n) {
    var s = '';
    for (;;) {
        if (n & 1) s += x;
        n >>= 1;
        if (n) x += x;
        else break;
    }
    return s;
}

It looks ugly to the untrained eye, but its performance is nothing less than lovely.

Let's see just how well this function performs. After seeing the results, it's likely that you'll never forget the difference between an O(N^2) algorithm and an O(N) algorithm.

(See Figure 3)

stringFill1() takes 88.7 microseconds (millionths of a second) to create a 200-byte string, stringFill2() takes 62.54, and stringFill3() takes only 4.608. What made this algorithm so much better? All of the functions took advantage of local function variables, but taking advantage of the second and third optimization techniques added a twenty-fold improvement to the performance of stringFill3().
Deeper analysis

What makes this particular function blow the competition out of the water?

As I've mentioned, the reason that both of these functions, stringFill1() and stringFill2(), run so slowly is that JavaScript strings are immutable. Memory cannot be reallocated to allow one more byte at a time to be appended to the string data stored by JavaScript. Every time one more byte is added to the end of the string, the entire string is regenerated from beginning to end.

Thus, in order to improve the script's performance, one must precompute longer length strings by concatenating two strings together ahead of time, and then recursively building up the desired string length.

For instance, to create a 16-byte string, first a two-byte string would be precomputed. Then the two-byte string would be reused to precompute a four-byte string. Then the four-byte string would be reused to precompute an eight-byte string. Finally, two eight-byte strings would be reused to create the desired new string of 16 bytes. Altogether four new strings had to be created: one of length 2, one of length 4, one of length 8 and one of length 16. The total cost is 2 + 4 + 8 + 16 = 30.

In the long run this efficiency can be computed by adding the costs in reverse order and using a geometric series with first term a1 = N and common ratio r = 1/2. The sum of the geometric series is a1 / (1 - r) = 2N, so for N = 16 the cost is bounded by 32, close to the 30 units counted above.

This is more efficient than adding one character to create a new string of length 2, creating a new string of length 3, 4, 5, and so on, until 16. The previous algorithm used that process of adding a single byte at a time, and the total cost of it would be n (n + 1) / 2 = 16 (17) / 2 = 8 (17) = 136.

Obviously, 136 is a much greater number than 30, and so the previous algorithm takes much, much more time to build up a string.

To compare the two methods you can see how much faster the recursive algorithm (also called "divide and conquer") is on a string of length 123,457. On my FreeBSD computer this algorithm, implemented in the stringFill3() function, creates the string in 0.001058 seconds, while the original stringFill1() function creates the string in 0.0808 seconds. The new function is 76 times faster.
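
Timings like these vary by machine and browser, but the comparison is easy to reproduce. One way to do it (my own sketch, reusing the testFill() helper defined earlier) is:

seconds1 = testFill(stringFill1, 123457);
seconds3 = testFill(stringFill3, 123457);
d = document;
d.write('stringFill1: ', seconds1, ' seconds<br />');
d.write('stringFill3: ', seconds3, ' seconds<br />');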

The difference in performance grows as the length of the string becomes larger. In the limit, as larger and larger strings are created, the original function behaves roughly like C1 (a constant) times N^2, and the new function behaves like C2 (a constant) times N.

From our experiment we can determine the value of C1 to be C1 = 0.0808 / (123457)^2 = .00000000000530126997, and the value of C2 to be C2 = 0.001058 / 123457 = .00000000856978543136. In 10 seconds, the new function could create a string containing 1,166,890,359 characters. In order to create this same string, the old function would need 7,218,384 seconds of time.

This is almost three months compared to ten seconds!
