Scala Performance Basic (part II)

This post is a continuation of the previous post: (the title was changed to fit the purpose better)

In previous post, we have examined how JVM handles Scala code and found out that it is not smart enough to optimise out a loop invariant. That means we can’t rely on the compiler to figure out how to run our code quickly but to manage that ourselves. So that is one rule for ensuring performance, what about the other rules? We shall reuse our example to check what else can we learn.

The code:

Our previous example was a very simple one, finding intersection between two list of integer. Now let’s try a straight forward solution: We use a temporary variable to store the result while traversing through two lists using nested loops.

def forIntersect1(s1: IndexedSeq[Int], s2: IndexedSeq[Int]): Seq[Int] = {
  val result: ListBuffer[Int] = ListBuffer()
    for (x <- s1.indices) {
    for (y <- s2.indices) {
      if (s1(x) == s2(y))

Note that the type of the parameters in this case is IndexedSeq, not plain Seq. This will ensure that seeking an element by its index will take constant time. Default class for Seq is List which has O(n) access time by index as it is a linked list implementation. If you use this list then the runtime will grow exponentially!

And the second one is an idiomatic Scala code using for comprehension, which is shorter and clearer:

def forIntersect2(s1: Seq[Int], s2: Seq[Int]): Seq[Int] = {
  for {
    x <- s1
    y <- s2 if x == y
  } yield y

For this code snipper, Scala will generate two closures, pass to monadic filter to collect the result and build the result. Scala’s choice of using closure to implement monadic for notation is considered a trade off, slower speed but more expressive/powerful. So it is expected that this solution will run slower than the one above.

The result:

1. Traditional looping: 1.37508446E8
2. For comprehension: 7.8403791E7
3. Naive mapping: 1.469294276E9
4. Mapping (with toSet moveout): 3341655.0

Much to our surprise, the second solution runs twice as fast as the first one, despite the usage of closure. This may due to several reasons, one of them is that the first solution requires object unboxing every iteration for both arrays, while the second one only needs one. Aside from that, our previous example still runs the fastest as it has lower computational complexity: O(n*log(n)) instead of O(n^2) as other solutions.

The conclusions:

From the results above, we can draw a few simple rules for writing fast Scala program, or at least how to avoid writing slow ones:
– Know the big O of your algorithm, nothing can save your performance if you use the wrong one. This may seem obvious but there are plenty of cases performance can be increased by an order of magnitude just by switching the algorithm.
– Use the standard library and existing mechanism in language first before writing your own. They are well tested and should be able to fulfill most of your needs. It also makes your code simpler and easier to understand by other developers.
– Understand your code, try not to rely on compiler much to run code fast.
– Verify your understanding above with measurement by profiling, there are always surprises in how your code will perform.

Add a Comment

Scroll Up