Tuesday, December 27, 2011

Using Scala API Documentation

One link that was missing from my post about Scala on the Web was the link to the Scala API documentation, often called "scaladoc" for short, after the tool that generates it. I tried to put it in, but it kind of broke the narrative and, to be honest, I have a lot to talk about the subject. So I decided to do a whole blog just about it.

First of all, there are two main links that you can use to access it:

Most of the time, I prefer the second link: it contains improvements to the Scaladoc tool, and the documentation often contains improvements. On the other hand, being a nightly release, it is subject to regresssions in the Scaladoc tool, and the documentation contains information that is not correct for the latest release. For that reason, I keep both links in my bookmarks.

So, what else is there to say? Well, if you are new to Scala, a lot. Those familiar with the Java equivalent probably find it both familiar and strange: the screen is divided into left and right sections, the right section having description of classes and packages, while the left contains a list of them. Whereas Java splits package and class lists, however, Scala does not. And, of course, the look is completely different. These, however, are skin-deep differences, and I want to get to the bones of the matter.


One Doc, Two Frames




The first thing I want you to notice are the two search boxes: one on the left, at the very top of the page, and one on the right, a bit lower. Scaladoc is search-oriented! That means you usually don't browse it, looking for stuff, but just type what you want. You can still browse, of course, which is useful when you don't know what you are looking for.

On the left side you have the package hierarchy and classes, traits and objects belonging to the packages. Note that classes, traits and objects can be members of other classes traits and objects, in which case they won't appear on the left.

The right side contains information about a selected package, trait, class or object. The URL will change to match what is being displayed on the right side of the screen, so that you can easily bookmark or share links to specific class or object. At the present, there's no way to further refer to a member of a class, such as a specific method. This improvement will be added at some point in the future.

Typically, you'll search or browse for a class on the left side of the screen, and then browse its contents on the left, or further search for a particular member of that class.

The Left Frame

Let's drill down the left side, then, to get a better understanding of what's there and how to use it. At the top of the left frame, you'll see this:


Topmost is the search box, with a little "x" icon on the right hand to clear the search. Searching for something will hide any traits, objects and classes that don't match, as well as any packages that don't have any matching members.

Right below that is an index of all existing methods. Click on a letter and you'll get all methods starting with that letter, and the places they are defined at. Click on the hash symbol (#), and you'll get a page with all methods starting with a symbol instead of a letter. For example:


Finally, you have a small caption saying "display packages only". Surprisingly, that's a clickable option. In fact, ScalaDoc is full of clickable parts, so if you are getting familiar with it, my advice is to try to click stuff, just to see what happens. Back to that caption, though, it switches the package hierarchy from displaying all entities to displaying packages only. Clicking it will give you something like this:


If you click on "show" on one of these packages, it will open up that particular package. If you click on "display all entities", it will revert to the initial mode of display.

Speaking of "show", let's now look at the various parts of the entities list:


The darker background indicates package or package objects, and the entities on the lighter background right below it are the classes, objects and traits belonging to that package, as indicated by the icons on their left. Note that, though package objects have other members such as methods, they do not appear on the left frame of Scaladoc.

The icons for "o", "c" and "t", in dark blue, green and light blue respectively, indicate objects, classes and traits. A name can be shared between an object and a trait or class, as seen for Regex above. Traits and classes cannot share the same name, however, so each name will have at most two icons beside it.

If you are not familiar with Scala, an object is a singleton, containing what, in Java, would be represented as static.

One thing to realize here is that everything in that image is clickable, except whitespace. You can click on "scala.util.matching" and "scala.util.parsing.combinator", on "hide" and "focus", on "Regex" and "RegexParsers", and on the icons.

Clicking on a package name will show its traits, classes and objects on the right frame, unless the package is a package object, in which case the right frame will show the other members it might have. This is important because many package objects contain implicit definitions used as helpers with that package. Check, for instance, scala.sys.process.

Clicking on "hide" will hide the entities belonging to that particular package, but not any subpackage that might exist. The text will then change to "show", which will revert this action upon being clicked.

Clicking on "focus" will hide all other packages from view, like this:


Clicking on the "x" icon that appears will revert this action.

Clicking on an icon for object, class or trait will show information for it on the right frame, as will clicking on the text itself. If an object shares a name with a trait or class, however, clicking on the text will show not the object, but the class or trait, following the assumption that this is what people want most of the time.

On recent ScalaDoc versions, clicking on an entity will also move the focus to the search box on the right frame, so that you can instantly type

The Right Frame

To begin the discussion of the right, I picked GenSeq, which is rich enough in ScalaDoc UI components, but (relatively) poor enough in actual content to fit in here.


You may have wondered why the search box is not at the top on the right frame, like it is on the left. The reason for it is our starting point in explaining the right frame.

The right frame is divided in three parts, the topmost two being shown above. The first part contains general information about the selected entity, and it comprises everything from the top until right before the search box. The second part starts with the search box, and is comprised of everything in that gray background. This part contains display options that affect the third part of the right frame. The third part contains all members of the selected entity, with individual information for each one.

From the top, then, we have a green or blue background (the former for traits and classes, the latter for objects, packages and package objects) on which the name of the entity is prominently displayed. A big icon beside it indicates what kind of entity it is: t for traits, c for classes, o for objects and p for packages and package objects alike.

If the icon is slightly folded on the bottom, like the one in the example, clicking on it (or the entity name) will switch the right frame to the companion of that entity. If you are not familiar with Scala, objects, traits and classes that share a name are said to be companions to each other. Clicking on the GenSeq trait above, then, will display the GenSeq object.

In a smaller font above the entity name is the full path to that entity. Clicking on a path component will display that component.

After that, in a gray background, comes a description of all classes, traits and type parameters used in the definition of that entity. It does not list classes or traits that are inherited -- that is available further below. Not to sound too repetitive, but clicking on any of the classes and traits will display it... Also, moving the mouse over one of these names will show the full path for that name.

What follows is the full description and attributes associated with that entity. I deliberately choose one extremely poor in those, so that I could concentrate on what ScalaDoc provides. One of those things is the link to the source code in which that entity is defined. Not all Scala projects provide that attribute, but Scala API itself does. The link currently points to the web interface to the old Subversion repository -- I presume this will be switched to the new repository on Github in the near future. At any rate, all niceties of source version control systems are available on that link. For the Scala API, anyway.

"Linear Supertypes" and "Known Subclasses" at the bottom of this part can be expanded upon click, to display the exact linearization of an entity's supertypes -- the inheritance precedence -- and all known subclasses. For example, for List it will show this:


Also shown above is the tool tip indicating the full path of one of the supertypes being pointed at, just like mentioned above for entity declaration.

Let's now look at the second part of the right frame:


The search box works pretty much like the one on the left frame, hiding anything that isn't a match, but it includes descriptions on the search as well. Search for a verb, therefore, will often yield good results, unless the verb is too general.

One particularly nice feature, available on recent versions, is that typing multiple words will search for matches of any one of the words, which makes it easy to display two methods close together on the screen.

All other options below the search box also change the way things are shown in the third part. The default Ordering mode, Alphabetic, will display all non-private members of an entity, separated in categories which we'll shown below, in alphabetical order. This is different than Java, which only shows full information for members defined or overridden on that class/interface.

Clicking "By inheritance" will change the display mode to separate the members according to where it was last defined/overridden. Full information will still be displayed, and the members will still be shown in alphabetical order in their own section.

The Inherited options let you easily filter out inherited methods. Clicking on "Hide All" will toggle off all supertypes, leaving only the entity itself selected. This will hide methods that are not abstract, defined or overridden on the entity. For GenSeq, for example only two methods will be displayed: seq and companion.

Note that clicking on "Show all" will not select AnyValAnyRef or Any. Because the methods defined on these are available on pretty much everything, one rarely needs to see these methods.

Because many times you might be interested in a particular aspect of an entity, you can also toggle each supertype individually. You can do so to display the methods on Any, for example, or you could look into what Function1 has to offer to List.

The last option, Visibility, will toggle between displaying only public members and everything except private members.

The last part of the right frame, as mentioned, contains all members of that entity. These are divided into type members and value members. A type member is a trait, a class or a type, and value members are everything else. For example, the object Regex contains this:


Note that anything that shows up as a type member of anything but a package will not be displayed on the left frame, even though you can display it on the right frame by clicking on a link to it.

Value members are actually divided in three separate sections: abstract value members, concrete value members and deprecated value members. These are shown in that precise order, so that one can easily see all members that must be implemented to make a concrete class, and don't get the screen polluted by methods they shouldn't be using -- deprecated methods -- when browsing.

Let's look at some important points of value member definitions. The snapshot below was taken from a List with members filtered by "map":


By default, only a member's definition and the first sentence of its description are shown, like method groupBy above. Clicking on either the small arrow on the left or on the definition itself will show the full information for that member, as seen in the two definitions for map.

Since Scala supports method overload, methods can have multiple definitions. In this particular case, however, the first definition is not a real one -- this is indicated by the [use case] tag. It is important to understand what use cases are, for two reasons: first, they represent the most common way to use the method, and, second, they are lies. Well-meaning lies, but lies.

Compare the definition of the two map methods shown. Clearly, the second definition has a lot going on, whereas the first definition is pretty clear: on a List[A] (see definition for List), the map method takes a function that converts an A into a B, and returns a List[B].

Though that definition works very fine for List, the map method is not defined on List, but much  higher on the hierarchy. And what works for List won't work for a BitSet, for example: since a BitSet is a Set[Int], if you map those Int into String, you won't be able to return a BitSet! After all, a BitSet  cannot be a Set[String]. The same thing happens in other cases: a WrappedString is a Seq[Char], a Map[A, B] is a Iterable[Tuple2[A, B]] (aka Iterable[(A, B)]), etc. In any of these cases, a map definition like in the use case won't work.

So the actual definition of map is the second one, which can handle all of these cases. If you need precise information about map -- for example, if you are extended Scala collections -- you can look it up. On the other hand, if you just want to know how a method is used, the use case should be much easier to understand.

Most of the remaining information is pretty obvious: a description follows, and then various attributes describing how parameters are supposed to be used, what the return type is, etc.

The final interesting thing here is the Definition Classes shown below each method. This only appears when a method has been inherited from elsewhere, and it indicates both where it was originally declared (which might have been as abstract), and all places where it was overridden, ie, the implementation has been changed.

And this concludes the tutorial on using the Scala API documentation, as well as the docs for any other library written in Scala. Looking back, it is much longer (and took much more time to write) than I first thought. Yet, rest assured that using ScalaDoc becomes second nature very fast!

Friday, December 23, 2011

Scala on the Web

I was wondering how to approach Scala on a dojo about it, and I kept returning to the idea that I should first introduce Scala on the Web. So, how would I go about it? What sites would I mention? Scala has grown such a rich ecosystem around it since I first started using it that I'm actually feeling a bit lost nowadays. And, let me tell you, that's a nice change! :-)

Nevertheless, the problem remains. As I started compiling the sites in my head, I quickly realized I'd have to write it down somewhere, and where better than in my own, dusty blog? So, here it is: the DCS guide to Scala on the Web!

For a long time, the starting point and central hub to Scala resources has been the Scala-Lang site. From there you can download Scala, find documentation, guides, support information, news... and it has become dated. Mind you, it still is a great resource, but some alternatives -- some official alternatives! -- have emerged since then.

One of them is Typesafe. Typesafe is a startup created around supporting Scala and related technologies. While this might not sound like a starter resource, they provide the Typesafe Stack, a set of tools including not only the language, but also Eclipse support, a build tool, a distributed system framework and a web framework. I think they plan to add more elements to it, but, anyway, go to their DOWNLOAD link and you'll have Scala up-and-running in no time.

Another important resource is the Documentation Site. A newly designed site to replace the old one on scala-lang. Scala-lang even has a link to it, though it still keeps the old one as well. Here you'll find guides, tutorials, cheatsheets, FAQ, and even a glossary of Scala terms and a Scala Style Guide. Oh, and a link to the Scala Wiki as well.

While on the wiki, please pay close attention to the Tools and Libraries page. You might not find everything there, and you might find things that have not been kept up-to-date, but even so it is an invaluable resource.

And speaking of FAQs, some brave souls took the time to compile Stack Overflow Scala Tutorial! The thing is nothing short of amazing: dozens, maybe more than a hundred of the most important questions categorized in 31 different topics, from "Introduction to Scala" to "Functional Scala", with a side dish of five additional topics for further learning.

By the way, if you don't know Stack Overflow, it is a question&answer site about programming questions -- only it actually works and is enjoyable to use, as opposed to their competition. More importantly, the Scala community is well represented on Stack Overflow, and you can use it to get questions about Scala answered in no time. In fact, it has probably been asked and answered already. :-)

The only problem with Stack Overflow is that it doesn't search for symbols, and Scala has a fair share of them. If you have a question about a symbol, use the Symbol Hound site to search for it.

What else? Well, there's Planet Scala, a blog feed aggregator, and Implicit.ly, a feed of Scala projects that is automatically fed by a plugin on the build system. It doesn't contain all Scala projects, of course, but it goes a long way.

And if you want to try Scala without ever installing it, there's Simply Scala, a Scala tutorial that let you execute the sample code, as well as try out code of your own.

If you want to get more dee involved with Scala itself, look at its github account. Scala, compiler and library, is there, as well as the documentation site I mentioned earlier.

Ok, I'm sure I left a bunch of sites out (like some news aggregators -- I don't follow them, but if you do, send me the link and I'll put them up), but let's address now some of the basic tools most people will be searching for.

I spoke earlier about a build tool, so let's discuss it a bit. The build tool of preference for Scala is SBT. If you search for it on the Internet, you might come up on the link to the old version (up to 0.7.7), which is hosted on Google Code. The newer versions (0.10.0+) are hosted on Github, and are incompatible with the older ones. Make sure you get the new version.

Alternatively, you may opt to install the SBT Extras script instead. It's the same SBT, but with a starter script that provides a richer set of options. There's also a couple of alternate starter scripts on the SBT site, called "screpl" and "scalas", which uses SBT to load dependencies while starting Scala's REPL (that's an "interpreter" console, so to speak) or starting Scala shell scripts.

You don't actually need to download Scala at all if you install SBT: SBT will download whatever version of Scala you tell it your project uses, requiring only that you have a JVM installed.

What about testing, what should you use? Scala is in the unfortunate position of having two excellent testing frameworks: ScalaTest and Specs2. They are both mature, fully-featured, actively developed and supported, and with big communities. There's no way to recommend one over the other: it really comes down to personal preference.

Both of them also support ScalaCheck, a testing framework that test stuff by automatically generating input and verifying if the specified conditions hold true. ScalaCheck is great to find boundary conditions, and, as mentioned, you can use its checker under both ScalaTest and Specs2.

You can use existing Java mocking frameworks (Specs2 has special support for Mockito, though Specs, the older version, supported JMock and EasyMock as well), but there's a Scala mocking framework available as well: ScalaMock. ScalaMock has advanced features, such as mocking Scala objects (equivalent to Java static members) and even constructors! That means you can actually mock the behavior of a "new" statement without replacing it with factories or dependency injection.

To wrap it up, web frameworks and database access.

Scala has many web frameworks available, and there's even a healthy reuse of components between these frameworks. For instance, Scalate, a templating engine that can even serve as web server on a stretch, seems to be used by pretty much everyone nowadays. But let's look into the main alternatives.

Lift is the oldest one (at least among the more serious frameworks), and has a strong community. David Pollack, It's author looked into a number of successful frameworks and decided to pick the best of the best, add a few twists of his own, and use Scala's power to provide an incredible piece of software. Among Lift's strengths, there's a strong separation of concerns (web page design and code are strictly separated), a seamless AJAX/Comet integration, powerful wizards for common concerns, and a design that took security considerations into account right from the beginning. If you do go with Lift, however, please make use of their mailing list as support -- that is their main support channel, and they prefer to concentrate their help there.

To those who are familiar with Sinatra, there's Scalatra and Bowler, that latter being built on top of the former. And I might well be mistaken, but Circumflex seems to go the same way as well.

If providing a web service is just a small part of your application, you might want to opt for Unfiltered instead. With Unfiltered, the web server is a library in your application, instead of your application being a servlet inside a framework.

And if what you really need are web services to interconnect systems, try Blue Eyes.

Now, there are a lot of other web frameworks, I'll mention just one other. Typesafe has decided to integrate the Play! Framework into its stack. Play! offers the ease of web development that PHP has, but with all the advantages Scala has to offer.

Which leaves us, at last, with databases. Again, there are many options to choose from. If you are doing web development, the framework of your choice probably already has some recommended libraries to deal with it -- their own or others (did I mention that there's a healthy reuse of components? :). I suggest you go with that.

If not, I can make some suggestions. I don't have personal experience with this, so I'm mostly recommending based on what I perceive to be preferred choices with active communities. There's ScalaQuery, which has always been a favorite of mine (just waiting for a project where I can put it to use :). Querulous and Squeryl also get a lot of traffic, but once you get to NoSQL, the main choice seems to be MongoDB's Casbah driver. I have played a little bit with it, and it is certainly quite easy to get up and running for small projects or experiments.

Do look into the existing choices, however, as there might be something better suited to your needs.

And if you haven't payed attention the first time I linked to it, the Wiki of Tools and Libraries is a very good resource to get started on specialized, well, tools and libraries. :-)

If you are new to Scala, I hope this can get you going. If you are experienced, I hope you can use this when helping others. Of course, this post will get old, the links will get outdated, and new cool stuff will come up.

For now, enjoy!

Wednesday, October 12, 2011

String Interpolation on 2.10?

Happiness is made of small things...

Welcome to Scala version 2.10.0.r25815-b20111010230908 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_23).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val x = 1
x: Int = 1

scala> "\{x}"
res0: String = 1

Requires the -Xexperimental flag. This test case reveals even more interesting things:

scala> val x = 1.1
x: Double = 1.1

scala> println("We have a \{ x ;2.2f}% chance of success")
We have a 1,10% chance of success

Mind you, there's a lot of things that have been hidden behind -Xexperimental, some of them for a long time now, and some of them ended up canned.
Edit: The proposal for this extension can be found here.

Tuesday, August 30, 2011

A quick detour through Grub and choosing default operating system

I'm going to do a quick detour to talk about grub.

Grub is the boot manager of choice for Linux systems, and the one installed by default by Ubuntu, among others. My desktop at home dual boots between Ubuntu and Windows (I have gaming needs, after all), and Windows is the default operating system, so my wife doesn't have to do anything.

Setting that default was not exactly trivial. The web abounds with instructions on how to choose the default, most of which refer to the previous version of grub. About the newer version, not so much information.

So, in the end I used Ubuntu's Startup Manager to set this up. By the way, can anyone explain to me why you use Startup Manager to choose which OS to boot, and Bootup Manager to choose which processes will start? Sorry, I digress...

That worked nicely until the day I upgraded the system, at which point a new entry was created, changing the relative position of the Windows boot. That was disagreeable, so I decided to take a closer look.

The configuration used by grub during boot is located at /boot/grub, in particular /boot/grub/grub.cfg. But you shouldn't edit this file directly -- it is generated from the scripts located at /etc/grub, plus configuration on /etc/default/grub. Usually, it is this latter file you should edit.

To change the default operating system, you edit /etc/default/grub, change the setting GRUB_DEFAULT, and then run update-grub. Alas, the Startup Manager will do all this for you, but there is one thing the Startup Manager doesn't do...

So, here's the trick. The GRUB_DEFAULT is usually set to a number, indicating the relative position of the entry you want, but it can also be set to a name! To see what the entry names are, you can do a "grep menuentry /boot/grub/grub.cfg" -- the names are the strings between single or double-quotes right after "menuentry". For example:


$ grep menuentry /boot/grub/grub.cfg
menuentry 'Ubuntu, with Linux 2.6.38-11-generic' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry 'Ubuntu, with Linux 2.6.38-11-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry 'Ubuntu, with Linux 2.6.38-10-generic' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry 'Ubuntu, with Linux 2.6.38-10-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry 'Ubuntu, with Linux 2.6.38-8-generic' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry 'Ubuntu, with Linux 2.6.38-8-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
menuentry "Memory test (memtest86+)" {
menuentry "Memory test (memtest86+, serial console 115200)" {

This also shows why Startup Manager uses numbers instead of names: so that the latest Linux is always chosen. This seems a poor choice for me, for anything except the first entry, but there you go.

Anyway, once you got the name, all you have to do is edit /etc/default/grub, change the setting GRUB_DEFAULT, so that it is set to the name of the entry you want (don't forget the quotes), and then run update-grub.

Tuesday, July 5, 2011

Build a binary tree from pre-order traversal and in-order traversal

This post comes from i has 1337 code by way of An Algorithm a Day. I'm really just posting this to show the power of Scala collections.

The idea is that you have two sequential representations of a tree: one for in-order traversal, and the order for pre-order traversal. Just one of these representations are not enough to reconstruct the tree, but the two of them, in the absence of duplicate elements, is. See the links above for an example.

The problem lends itself to recursive solutions, but there's some list manipulation required to build the input for the recursive steps. The key point of the algorithm is the realization that the first element of pre-order is the root, and every element to the left of said root in the in-order belongs to the left of the tree, and every element to the right belongs to the right of the tree. The rest is pretty trivial.

Even so, Scala makes the trivial, well, trivial. Here's the full code:

case class Tree[T](el: T, left: Option[Tree[T]], right: Option[Tree[T]])

def mkTree[T](preorder: List[T], inorder: List[T]): Option[Tree[T]] = preorder.headOption map { head =>
    val (left, _ :: right) = inorder span (head !=)
    Tree(head, 
         mkTree(preorder filter (left contains), left), 
         mkTree(preorder filter (right contains), right))
}

Note: I'm using List instead of Seq so that I can use the :: extractor.

Monday, June 27, 2011

A very quick guide to project creation on SBT 0.10.0

SBT 0.10.0 is out, and it is a very different beast at first. Aside from the need for retooling and relearning, it is a very big improvement.

However, these minor differences can slow down things a bit. For one thing, previous versions of SBT asked if you wanted to create a project if run on an empty directory, and that does not happen anymore. Now it creates some stuff automatically, but other stuff -- project name, version, scala version, directory layout -- it doesn't. So, let's take a quick look on how to accomplish (roughly) the same tasks.

In the example below, I named my SBT 0.10.0 script xsbt, so that I could use the older 0.7.7 with projects that are not yet migrated.

dcs@ayanami:~/github$ mkdir TestProject
dcs@ayanami:~/github$ cd TestProject
dcs@ayanami:~/github/TestProject$ xsbt
Getting net.java.dev.jna jna 3.2.3 ...
:: retrieving :: org.scala-tools.sbt#boot-app
 confs: [default]
 1 artifacts copied, 0 already retrieved (838kB/35ms)
Getting Scala 2.8.1 (for sbt)...
:: retrieving :: org.scala-tools.sbt#boot-scala
 confs: [default]
 4 artifacts copied, 0 already retrieved (15296kB/232ms)
Getting org.scala-tools.sbt sbt_2.8.1 0.10.0 ...
:: retrieving :: org.scala-tools.sbt#boot-app
 confs: [default]
 34 artifacts copied, 0 already retrieved (6012kB/215ms)
[info] Set current project to root (in build file:/home/dcs/.sbt/plugins/)
[info] Set current project to default (in build file:/home/dcs/github/TestProject/)
> set name := "TestProject"
[info] Reapplying settings...
[info] Set current project to default (in build file:/home/dcs/github/TestProject/)
> set version := "1.0"
[info] Reapplying settings...
[info] Set current project to default (in build file:/home/dcs/github/TestProject/)
> set scalaVersion := "2.9.0-1"
[info] Reapplying settings...
Getting Scala 2.9.0-1 ...
:: retrieving :: org.scala-tools.sbt#boot-scala
 confs: [default]
 4 artifacts copied, 0 already retrieved (20447kB/186ms)
[info] Set current project to default (in build file:/home/dcs/github/TestProject/)
> session save
[info] Reapplying settings...
[info] Set current project to default (in build file:/home/dcs/github/TestProject/)
> exit
dcs@ayanami:~/github/TestProject$ find . -type d
.
./project
./project/target
./project/target/config-classes
./project/target/scala_2.8.1
./project/boot
./project/boot/other
./project/boot/other/net.java.dev.jna
./project/boot/other/net.java.dev.jna/jna
./project/boot/other/net.java.dev.jna/jna/3.2.3
./project/boot/scala-2.9.0-1
./project/boot/scala-2.9.0-1/lib
./project/boot/scala-2.8.1
./project/boot/scala-2.8.1/lib
./project/boot/scala-2.8.1/org.scala-tools.sbt
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0/compiler-interface-bin_2.7.7.final
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0/compiler-interface-src
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0/compiler-interface-bin_2.9.0.final
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0/compiler-interface-bin_2.8.1.final
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0/xsbti
./target
./target/streams
./target/streams/$global
./target/streams/$global/$global
./target/streams/$global/$global/streams
./target/streams/$global/$global/streams/$global
dcs@ayanami:~/github/TestProject$ ls
build.sbt  project  target
dcs@ayanami:~/github/TestProject$ cat build.sbt


name := "TestProject"

version := "1.0"

scalaVersion := "2.9.0-1"

So far so good, but note that the directories for source are not created. The new version of SBT expects your IDE to do that (which is rather annoying for us vim users), or so it seems. However, the eclipse plugin can do that at the same time it creates the eclipse project. Here's an example:

dcs@ayanami:~/github/TestProject$ cat ~/.sbt/plugins/build.sbt 
resolvers += {
  val typesafeRepoUrl = new java.net.URL("http://repo.typesafe.com/typesafe/releases")
  val pattern = Patterns(false, "[organisation]/[module]/[sbtversion]/[revision]/[type]s/[module](-[classifier])-[revision].[ext]")
  Resolver.url("Typesafe Repository", typesafeRepoUrl)(pattern)
}

libraryDependencies <<= (libraryDependencies, sbtVersion) { (deps, version) => 
  deps :+ ("com.typesafe.sbteclipse" %% "sbteclipse" % "1.1" extra("sbtversion" -> version))
}
dcs@ayanami:~/github/TestProject$ xsbt
[info] Compiling 1 Scala source to /home/dcs/.sbt/plugins/project/target/scala_2.8.1/classes...
[info] Set current project to root (in build file:/home/dcs/.sbt/plugins/)
[info] Compiling 8 Scala sources to /home/dcs/.sbt/staging/a69240767cc8e721757e/target/scala-2.8.1.final/classes...
[info] Set current project to default (in build file:/home/dcs/github/TestProject/)
> eclipse create-src   
[info] Updating...
[info] Done updating.
[info] Successfully created Eclipse project files. Please select the appropriate Eclipse plugin for Scala 2.9.0-1!
> exit
dcs@ayanami:~/github/TestProject$ find . -type d
.
./project
./project/target
./project/target/config-classes
./project/target/scala_2.8.1
./project/boot
./project/boot/other
./project/boot/other/net.java.dev.jna
./project/boot/other/net.java.dev.jna/jna
./project/boot/other/net.java.dev.jna/jna/3.2.3
./project/boot/scala-2.9.0-1
./project/boot/scala-2.9.0-1/lib
./project/boot/scala-2.8.1
./project/boot/scala-2.8.1/lib
./project/boot/scala-2.8.1/org.scala-tools.sbt
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0/compiler-interface-bin_2.7.7.final
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0/compiler-interface-src
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0/compiler-interface-bin_2.9.0.final
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0/compiler-interface-bin_2.8.1.final
./project/boot/scala-2.8.1/org.scala-tools.sbt/sbt/0.10.0/xsbti
./target
./target/streams
./target/streams/$global
./target/streams/$global/ivy-sbt
./target/streams/$global/ivy-sbt/$global
./target/streams/$global/ivy-configuration
./target/streams/$global/ivy-configuration/$global
./target/streams/$global/update
./target/streams/$global/update/$global
./target/streams/$global/project-descriptors
./target/streams/$global/project-descriptors/$global
./target/streams/$global/$global
./target/streams/$global/$global/streams
./target/streams/$global/$global/streams/$global
./target/scala-2.9.0.1
./target/scala-2.9.0.1/cache
./target/scala-2.9.0.1/cache/update
./src
./src/main
./src/main/resources
./src/main/java
./src/main/scala
./src/test
./src/test/resources
./src/test/java
./src/test/scala

That's it! I strongly recommend reading the wiki linked at the beginning of this post, but this will get you going for small stuff.

Saturday, May 21, 2011

Scala 2.9 optimizes for comprehensions way better!

Ok, I completely missed this. For comprehensions in Scala 2.9 was way better optimized with the parameter -optimize than they were before! Take this code:

class OptEx {
    def sum(l: Array[Int]) = {
        var acc = 0
        for (i <- 0 until l.length) acc += l(i)
        acc
    }
}
This is the java bytecode generated with Scala 2.8.1 for the method sum:
public int sum(int[]);
  Code:
   0:   new     #7; //class scala/runtime/IntRef
   3:   dup
   4:   iconst_0
   5:   invokespecial   #12; //Method scala/runtime/IntRef."":(I)V
   8:   astore_2
   9:   new     #14; //class scala/runtime/RichInt
   12:  dup
   13:  iconst_0
   14:  invokespecial   #15; //Method scala/runtime/RichInt."":(I)V
   17:  aload_1
   18:  arraylength
   19:  invokevirtual   #19; //Method scala/runtime/RichInt.until:(I)Lscala/collection/immutable/Range$ByOne;
   22:  new     #21; //class OptEx$$anonfun$sum$1
   25:  dup
   26:  aload_0
   27:  aload_1
   28:  aload_2
   29:  invokespecial   #24; //Method OptEx$$anonfun$sum$1."":(LOptEx;[ILscala/runtime/IntRef;)V
   32:  invokeinterface #30,  2; //InterfaceMethod scala/collection/immutable/Range$ByOne.foreach$mVc$sp:(Lscala/Functio
n1;)V
   37:  aload_2
   38:  getfield        #34; //Field scala/runtime/IntRef.elem:I
   41:  ireturn
And this is what Scala 2.9.0 does:
public int sum(int[]);
  Code:
   0:   new     #7; //class scala/runtime/IntRef
   3:   dup
   4:   iconst_0
   5:   invokespecial   #12; //Method scala/runtime/IntRef."":(I)V
   8:   astore  6
   10:  new     #14; //class scala/runtime/RichInt
   13:  dup
   14:  iconst_0
   15:  invokespecial   #15; //Method scala/runtime/RichInt."":(I)V
   18:  aload_1
   19:  arraylength
   20:  istore_3
   21:  astore_2
   22:  getstatic       #21; //Field scala/collection/immutable/Range$.MODULE$:Lscala/collection/immutable/Range$;
   25:  aload_2
   26:  invokevirtual   #25; //Method scala/runtime/RichInt.self:()I
   29:  iload_3
   30:  invokevirtual   #29; //Method scala/collection/immutable/Range$.apply:(II)Lscala/collection/immutable/Range;
   33:  dup
   34:  astore  8
   36:  invokevirtual   #34; //Method scala/collection/immutable/Range.length:()I
   39:  iconst_0
   40:  if_icmple       83
   43:  aload   8
   45:  invokevirtual   #37; //Method scala/collection/immutable/Range.last:()I
   48:  istore  4
   50:  aload   8
   52:  invokevirtual   #40; //Method scala/collection/immutable/Range.start:()I
   55:  istore  9
   57:  iload   9
   59:  iload   4
   61:  if_icmpne       89
   64:  iload   9
   66:  istore  5
   68:  aload   6
   70:  aload   6
   72:  getfield        #44; //Field scala/runtime/IntRef.elem:I
   75:  aload_1
   76:  iload   5
   78:  iaload
   79:  iadd
   80:  putfield        #44; //Field scala/runtime/IntRef.elem:I
   83:  aload   6
   85:  getfield        #44; //Field scala/runtime/IntRef.elem:I
   88:  ireturn
   89:  iload   9
   91:  istore  7
   93:  aload   6
   95:  aload   6
   97:  getfield        #44; //Field scala/runtime/IntRef.elem:I
   100: aload_1
   101: iload   7
   103: iaload
   104: iadd
   105: putfield        #44; //Field scala/runtime/IntRef.elem:I
   108: iload   9
   110: aload   8
   112: invokevirtual   #47; //Method scala/collection/immutable/Range.step:()I
   115: iadd
   116: istore  9
   118: goto    57

Time to take your old benchmarks out of the closet, people!

Thursday, May 19, 2011

Regex Again

I have been thinking about regex lately. I have never felt comfortable with how Scala regex works, but I could never settle on what should be done about. Recently, I have started more and more of thinking of regex like this:

class RegexF(pattern: String) extends String => Option[Seq[String]]

or, perhaps,

class RegexPF(pattern: String) extends PartialFunction[String, Seq[String]]

In fact, RegexPF.lift would (could) yield a RegexF. It then caught my attention that RegexF.apply has the same signature as Regex.unapplySeq, which is the standard way of handling regex in Scala!

Might this be what has been bugging me about Scala's regex all along? Should we translate

val YYYYMMDD = """(\d{4})-(\d{2})-(\d{2})""".r
val MMDDYYYY = """(\d{2})/(\d{2})/(\d{4})""".r

def getYear(s: String) = s match {
    case YYYYMMDD(year, _, _) => year
    case MMDDYYYY(_, _, year) => year
}

into

val YYYYMMDD = """(\d{4})-(\d{2})-(\d{2})""".r
val MMDDYYYY = """(\d{2})/(\d{2})/(\d{4})""".r andThen (fields => fields.last +: fields.init)

def getYear(s: String) = ((YYYYMMDD orElse MMDDYYYY) andThen (_.head))(s)

I can certainly see the advantages of pattern matching, but... it doesn't compose very well. And it has some performance issues, which is a big deal for most regex usages. And being a PartialFunction would not prevent a Regex from having extractors as well.

Saturday, May 14, 2011

A Cute Algorithm

These days I read about an algorithm challenge: given two sorted arrays, find the k-th minimum element from their merge.

Well, if you do merge them, you can just get the element at index k, and the merge can be done in O(n + m), where n and m are the respective size of each array.

The solution given is O(k) and pretty simple: keep an index into each array, and increase one or other until you reach k. It can be done in O(log k), though, and, fortunately for me, my first idea on how to solve it in O(k) is more easily adaptable.

My own O(k) version is like this: point an index at the k-th element on the first array, and another at the first element of the second array. If the element on the first array is smaller than the element on the first array, return that. Otherwise, as long as the element on the first array is bigger than the element on the second array, decrease the first index and increase the second. After doing that, you'll have the elements on each array that make up the k smallest elements, the k-th being the bigger between the top one in each array.

In code, something like this:

def kMin(a1: Array[Int], a2: Array[Int], k: Int) = {
    def recurse(k2: Int): Int =
        if (a1(k - k2 - 1) < a2(k2)) recurse(k2 + 1)
        else k2

    if (a1(k - 1) < a2(0)) a1(k - 1)
    else {
      val k2 = recurse(1)
      a1(k - k2 - 1) max a2(k2 - 1)
    }
}

Now, that code isn't particularly good, as there are some conditions that can break it. For instance, if the first array's size is smaller than k, you'll get an array index out of bounds exception. However, it gives the basis for explaining the O(k) algorithm.

Here we search linearly for the k smallest elements of both arrays together, but we know these arrays are sorted. So, instead of going one by one, we can use binary search instead, and turn it into O(log k)!

The concept is simple. We are looking into the k smallest elements of the two array together, so we know beforehand that the maximum number of elements we need to look into either array is k.

We'll search one array for the biggest element that is smaller than or equal to the k-th minimum, with the upper bound being the k-th element of that array, and the lower bound being 0 (meaning the k smallest elements are all in the other array).

To check if the number x-th is among the k-smallest ones, we see if that number is smaller than the (k - x)-th element on the other array. If it is, then x is among the k smallest. The intuitive explanation for that is that, if you take (k - x) elements from the one array and x elements from the other, you get exactly k elements. No element y > x in x's array will be smaller than x, since the array is sorted. And since the (k - x)-th element in the other array is also bigger than it, then no other element in the other array can be smaller either.

So, as long as we find an element that belongs in the k-th smallest, we move the lower bound. If we find an element that does not belong in the k-th smallest, we move the upper bound below it.

Once we find how many elements in one array belong in the k-smallest, we also know how many elements we must take from the other array. Pick the biggest among the biggest in each array, and you have the k-th smallest.

Here's the code below, which is much more concise than the above explanation. It finds the k-th smallest element, with k=1 being the smallest element of all. It assumes there are at least k elements overall on the arrays, though k may be bigger than the number of elements on one array. In fact, either array may be empty (but not both). One can find this code at my github repository, along with an sbt project and a Scalacheck test case.

    def kMin(a1: Array[Int], a2: Array[Int], k: Int): Int = {
      def select(k2: Int) = k2 match {
        case `k` => a2(k - 1)
        case 0   => a1(k - 1)
        case _   => a1(k - k2 - 1) max a2(k2 - 1)
      }
      
      def recurse(top: Int, bottom: Int): Int = {
        if (top == bottom) select(top)
        else {
          val x = (bottom + top) / 2 max bottom + 1
          if (a1(k - x) <= a2(x - 1)) recurse (x - 1, bottom)
          else recurse(top, x)
        }
      }
      
      recurse(k min a2.size, 0 max k - a1.size)
    }

Thursday, May 12, 2011

Scala 2.9 and Parallel collections

So, Scala 2.9.0 is out. Also, the Typesafe Stack is also out, which brings together Scala, Akka, and a few other things to get one up-and-running quickly. Much fun.

On the collection side of things, one of the first questions I saw was: do parallel collections share a common interface with standard collections. The answer is yes, they do, but not one that existed in 2.8.1.

You see, a trouble with parallel collections is that, now that they are available, people will probably be passing them around. If they could be passed to old code -- as it was briefly contemplated -- that old code could crash in mysterious ways. In fact, it happens with REPL itself.

For that reason, ALL of your code comes with a guarantee that it will only accept sequential collections. In other words, Iterable, Seq, Set, etc, they all now share a guarantee to be sequential, which means you cannot pass a parallel sequence to a method expecting Seq.

The parallel collections start with Par: ParIterable, ParSeq, ParSet and ParMap. No ParTraversable for now. These are guaranteed to be parallel. They can be found inside scala.collection.parallel, scala.collection.parallel.immutable, etc.

You can also get a parallel collection just by calling the ".par" method on it, and, similarly, the ".seq" method will return a sequential collection.

Now, if you want your code to not care whether it receives a parallel or sequential collection, you should prefix it with Gen: GenTraversable, GenIterable, GenSeq, etc. These can be either parallel or sequential.

And, now, something fun to try out:

def p[T](coll: collection.GenIterable[T]) = coll foreach println; p(1 to 20); p((1 to 20).par)

Saturday, April 30, 2011

Expressive Code and the Alternative Vote

One of the joys of writing code in Scala is how expressive it looks. Instead of dealing with the minutia of handling data, I can concentrate on what the code is actually doing. With experienced Scala programmers, that goes without saying. Newcomers have a harder time, because they are still a bit confused by grammar and vocabulary to pay proper attention to what is being said, so to speak.

The most striking example of that comes from the use of Scala's collection. There are many powerful collection libraries out there, but you are usually made very aware that you are handling a collection -- code to be hidden inside a class, to avoid contaminating business logic with it. Scala's collections, on the other hand, can easily fit in the highest abstraction levels of the code.

Let me take an example from the upcoming British referendum about the adoption (or not) of Alternative Vote. In this voting system, each voter ranks the candidates in order of preference. Depending on the specifics of its implementation, it may be possible or not to leave candidates unranked. The winner is the candidate that manages to get 50% of the votes, with the the candidate with the least votes being removed and his or her votes being reassigned according to the voter's preferences until some candidate gets elected.

So, let's consider how one could implement the algorithm that decides who the winner is. Let's say we have a set of candidates, identified by their names, and a list of votes. Each vote is a list of candidates ranked by preference. From that, we have to produce the name of the winner. In other words:

def elect(candidates: Set[String], votes: Seq[Seq[String]]): String = {

So, where do we begin? There are three essential tasks: we need to tally the votes for every candidate, we need to see if the candidate with most votes has at least 50% of all votes, and we need to discover who's the candidate the the least amount of votes.

Let's say we did these tasks, and now we know who the candidate with least and most votes are, and we have a tally of votes for all candidates. In this case, we can return the winner very easily, if there's one:

    if (votesByCandidate(mostVotesCandidate) >= votes.size / 2) mostVotesCandidate

Which still leave us the problem of how to handle the (expected) usual case of no candidate reaching 50% on first tally. There's an easy solution for that, though: just remove the candidate with least votes from the pool of candidates, and try to get a winner out of that. It's easy, because we already have a function that does that:

    else elect(candidates - leastVotesCandidate, votes)

Of the three tasks posed earlier, two are pretty simple as well: deciding who's the candidate with least and most votes. We could sort all candidates by vote and take the first and last candidate, but we don't actually need to sort anything: just knowing top and bottom is enough. We can do that like this:


    val mostVotesCandidate = candidates maxBy votesByCandidate
    val leastVotesCandidate = candidates minBy votesByCandidate

Now all that's left is finding how many votes each candidate has. We could pick the first preference in each vote and do a tally on that, but some candidates may have been removed from the pool. Instead let's say the valid candidates of a vote are the ranked list of candidates for a vote that are still in the running. We can compute that for a vote by doing:

    def validCandidates(vote: Seq[String]): Seq[String] = vote filter candidates

This doesn't read right, actually. A minor complain some people have about collection methods is that filter seems to do exactly the opposite of what's wanted: if you say filter X (those for which X is true), then it will keep X instead of discarding it. So, when we say "filter candidates", it will keep these candidates in the vote, and discard the rest.

The other non-obvious thing about this line is "candidates" itself. What does it mean to say "filter candidates"? Well, "filter" takes a function which, given a value of the collection, will return true or false depending on whether that value must be kept or discarded. That means "candidates" must be a function which, given the name of a candidate, returns true or false.

However, "candidates" is a set! We declared it so in the very first line of code presented, didn't we? Well, in Scala a set is also a function that tests whether a value is present in the set or not, returning true or false accordingly. In fact, sequences and maps are also functions in Scala, the former from indices to values, and the latter from keys to values.

Well, enough about that. We can now just take the first candidate in the list of valid candidates and tally the votes... as long as all candidates are ranked in each vote. If that is not the case, then a vote may well not contain any candidates still in the running, in which case the list of valid candidates will be empty.

Since this example comes from the British referendum, and the AV proposed in that referendum does not require votes to rank all candidates, we'll deal with that. Let's say the first valid candidate of a vote may be some candidate or no one. That is,

    def firstValidCandidate(vote: Seq[String]): Option[String] = validCandidates(vote) headOption

We can then use this to get a list of first choices for all votes with a valid first candidate:

    val firstChoices = votes flatMap firstValidCandidate

The votes for a candidate are the number of first choices for that candidate. We'll make a map out of it to avoid recounting that every time.


    def votesFor(candidate: String) = firstChoices count (candidate ==)
    val votesByCandidate = candidates map (candidate => candidate -> votesFor(candidate)) toMap;

Finally, we have to do something about the possibility of no candidate reaching 50%, which is possible in a system where not all candidates are ranked. I don't know how the proposed system will do in that case, but I'll just choose the most voted candidate if there aren't more than two.

With that fix in, this is what the whole code looks like:

def elect(candidates: Set[String], votes: Seq[Seq[String]]): String = {
    def validCandidates(vote: Seq[String]): Seq[String] = vote filter candidates
    def firstValidCandidate(vote: Seq[String]): Option[String] = validCandidates(vote) headOption
    val firstChoices = votes flatMap firstValidCandidate
    def votesFor(candidate: String) = firstChoices count (candidate ==)
    val votesByCandidate = candidates map (candidate => candidate -> votesFor(candidate)) toMap;

    val mostVotesCandidate = candidates maxBy votesByCandidate
    val leastVotesCandidate = candidates minBy votesByCandidate

    if (votesByCandidate(mostVotesCandidate) >= votes.size / 2 || candidates.size <= 2) mostVotesCandidate
    else elect(candidates - leastVotesCandidate, votes)
}

While there's a Scala oddity here and there, the code is pretty clear for all that it is doing.

Tuesday, March 22, 2011

Scala popularity

I had 5 posts in this blog throughout 2010 -- two in January, two in June. One post January of this year. Given that, I'm pretty sure no one follows my blog except, perhaps, as a forgotten automatic tracker of some sort.

Well... I decided to blog about something back on Sunday, which is a terrible day to blog something if you want hits. Late Sunday. Looking at statistics for Monday, though, I see that I have hit three times more hits than my previous record in a single day.

It is a heartening indication of how much interest Scala is attracting nowadays.

Sunday, March 20, 2011

On Scala 2.9's road...

I suspect a lot of people are eagerly waiting for the parallel collections on Scala 2.9. The thing is... it's just not my thing. I like that it is being made available, but it's just not a pervasive feature for my small daily needs.

So, while I was somewhat bored by Scala 2.9, after the huge jump 2.8 was, there has been some nice improvements. For one thing, the jLine library used by REPL was replaced with one based on this (the canonical repository to the jLine actually used in Scala is here) giving a much superior experience. Now one can edit input that spans multiple lines (longer than the number of columns in the screen) without trouble, search the history, etc. There's even something to show its key-bindings: just type :keybindings.

And speaking of REPL, it doesn't stop there, by a long margin! There's :javap, which will happily decompile a class or file, :type which will show an expression's type without evaluating it, and :implicits to show what implicits are in scope. Add -v to that last one, and it will show those that come with Predef by default.

Those of you pasting code into REPL, or wanting to define companion objects, or pretty much any other feature that depends on the content of the next line, you'll be happy to know there's now :paste. Instead of instantly evaluating each line, it will wait until you hit ^D.

More recently, a few features came up that will help those that like to do Scala scripting. The -absolute-cp parameter will ensure relative paths on classpaths will be made absolute based on where the script is being run from, not where the compilation daemon was started at. If you don't even know what I'm talking about, then trust me: that will save you a lot of pain.

Another option, -max-idle, will let you specify how long the compilation daemon will stay up when idle, and even disable its auto-shutdown.

And just to make scripting even nicer, SBT's Process library is now available in Scala, as sys.process! Now we can do stuff like this:

import sys.process._

Process cat new URL("http://databinder.net/dispatch/About") !
"find src -name *.scala -exec grep null {} ;"  #|  "xargs test -z"  #&&  "echo null-free"  #||  "echo null detected"  !

Another interesting scripting feature is that not only will files containing a single object with a main method be runnable as if it were a script, but jar files produced by scala when it compiles scripts will be runnable like programs. For example:

% cat script.scala
object FiddleDeeMain {
  def main(args: Array[String]): Unit = {
    println(args mkString " ")
  }
}
% scala -nocompdaemon script.scala a b c
a b c

And, conversely,

% cat script2.scala
println(args mkString " ")

% scala -save script2.scala arg1 arg2
arg1 arg2

% scala script2.jar arg1 arg2
arg1 arg2


On the library side, a few things have happened too. Some changes where made to view, which made it much faster than it used to be. Arguably, its previous lack of performance resulted from a bug, so if you had performances issues with it, you might want to check it out again. Also on the performance front, another change with lots of potential is the introduction of a new hash code algorithm -- murmur3.

To those of you who like writing methods and classes with Numeric and Ordering, you'll probably like to know that you can now add "import Numeric.Implicits._" and "import Ordering.Implicits._" and avoid all that messy implicit parameter handling. For example:

import Numeric.Implicits._

def sum[N: Numeric](lst: List[N]) = lst reduceLeft (_ + _)

It might not seem much for a single method like that, but if you use this stuff often, I'm sure you see the advantages.

There are plenty of small changes that will make life easier, or more intuitive. For example, -5.abs will now work. Scaladoc is much faster. Many bugs have been fixed, even on places you'd never guess there were bugs (scary thought).

An interesting improvement is the introduction of DelayedInit. While most people won't ever hear of it -- unless it starts getting used in DSLs --, it enabled the rehabilitation of a much criticized feature: the Application trait. It is not that Application was a bad idea, but it was badly implemented. Given all that has been written about staying away from it, its new implementation was also given a new name, which we can now start using in our blogs everywhere: App.

On the realm of experimental stuff that may never make 2.9 at all, I'm pretty fond of -Yrich-exceptions. It adds some capabilities to exceptions on REPL. One example is lastException.show, which will display the source code location of the exception, if available (to use it, point SOURCEPATH to the source code).

There are plenty of other improvements, way too many to talk about: better docs, better error messages, better performance, bug fixes, more methods, new traits and classes, not to mention improvements made for people creating compiler plugins or working on Scala itself. If you want to get more dibs into Scala 2.9, there's a japanese site tracking the changes, though only looking through the commit log one can truly feel the scope of what Scala 2.9 is. Kudos to Scala's development team!

Monday, January 24, 2011

Testing with Scala for Fun and Profit

Sometimes I like Scala, and sometimes I really like Scala.

So, I was writing some algorithms to compute the median of a sequence of values, just for the fun of it, adapting from pseudo-code description of an algorithm. The problem with pseudo-code is that it is pseudo-precise, meaning my algorithm was pseudo-correct, so I wanted to test it to ensure it really worked.

As it happens, the median of values has a trivial implementation that could be used to test against. So, one import and one one-liner afterwards, I had a quick way to test it from the REPL, where I could then experiment with the failing test case to understand what went wrong:

import org.scalacheck.Prop._
forAll((lst: List[Double]) => lst.nonEmpty ==> (myAlgorithm(lst) == lst.sorted.apply((lst.size - 1) / 2))).check

That uses the Scalacheck library, the best way to test algorithms in my opinion. What I'm doing with forAll is saying that for all inputs (List[Double]) of that function, the conditions must hold true. Specifically, if the input is not empty, then the result of my algorithm must be equal to the result of the trivial implementation of median. That will result in a property (class Prop).

I then tell it to check that property, which will be done by automatically generating input, with some heuristics to improve the chance of catching boundary cases, and testing the condition for each such input. If 100 tests pass, it finishes by saying:

+ OK, passed 100 tests.

Or, if it fail at some point, it will say something like this:

! Falsified after 0 passed tests.                                             
> ARG_0: List("-1.0") (orig arg: List("-1.0", "8.988465674311579E307"))

To be honest, my one-liner was slightly longer, because I was using arrays and arrays in Java do not have a useful toString method, so I had to tell Scalacheck how to print the array. Both Scalatest and Specs support integration with Scalacheck too, so this can be easily turned into part of a test suite.

Now this helped me achieve correctness, but I was also interested in how fast the code could run. I had three different algorithms (including the trivial implementation), and two of them could be further refined by abstracting over the selection of a pivot (quicksort-style). At that point, I decided to build a small framework which would help me test everything automatically.

I wanted to test each algorithm three times for different input sizes. The basic measurement algorithm was done by one of the methods in my toolkit:

import java.lang.System.currentTimeMillis

def bench[A](n: Int)(body: => A): Long = {
  val start = currentTimeMillis()
  1 to n foreach { _ => body }
  currentTimeMillis() - start
}

This should be familiar to anyone who ever did microbenchmarking with Scala. With that in hand, another one liner got the results I wanted:

println(bench(50000)(myAlgorithm(Array.fill(500)(nextDouble))))

Which worked well enough for a while, but really didn't scale as I got more algorithm variations and tested them with different settings. So, to get the results for each algorithm, I wrote this:

import scala.util.Random.nextDouble

def benchmark(algorithm: Array[Double] => Double,
              arraySizes: List[Int]): List[Iterable[Long]] = 
    for (size <- arraySizes)
    yield for (iteration <- 1 to 3)
        yield bench(50000)(algorithm(Array.fill(size)(nextDouble)))

Which let me pass a list of sizes I wanted to test the stuff at, and run each benchmark three times, to give me a feel for the variation in the results. Next, I made a list of the algorithms I wanted tested:

val algorithms = sortingAlgorithm :: immutableAlgorithms

That's the list I started with, but it grew as I added other algorithms. As for the immutable algorithms, they were all the same method call, but passing different pivot selection as parameters. As it got a bit verbose, I decided to apply a bit of DRY. First, I made a list of my pivot selection algorithms:

val immutablePivotSelection: List[(String, Array[Double] => Double)] = List(
    "Random Pivot"      -> chooseRandomPivot,
    "Median of Medians" -> medianOfMedians,
    "Midpoint"          -> ((arr: Array[Double]) => arr((arr.size - 1) / 2))
)

Next, I used that list to produce a list of median algorithms using each of these pivot selections:

val immutableAlgorithms = for ((name, pivotSelection) <- immutablePivotSelection)
        yield name -> (findMedian(_: Array[Double])(pivotSelection))

With the list of algorithms in hand, it was a simple for comprehension to produce a list of results:

val benchmarkResults: List[String] = for {
    (name, algorithm) <- algorithms
    results <- benchmark(algorithm, arraySizes).transpose
} yield formattingString format (name, formatResults(results))

The transposition let me see each size on a different column, making it easier to compare the algorithms when displayed together. Anyway, once I had that ready, I could also easily use the list of algorithms to test them:

def test(algorithm: Array[Double] => Double, 
         reference: Array[Double] => Double): String = {
    def prettyPrintArray(arr: Array[Double]) = arr mkString ("Array(", ", ", ")")
    val resultEqualsReference = forAll { (arr: Array[Double]) => 
        arr.nonEmpty ==> (algorithm(arr) == reference(arr)) :| prettyPrintArray(arr)
    }
    Test.check(Test.Params(), resultEqualsReference)(Pretty.Params(verbosity = 0))
}

val testResults = for ((name, algorithm) <- algorithms)
    yield formattingString format (name, test(algorithm, sortingAlgorithm._2))

I could even make the test method more general, but parameterizing the type of the algorithm, and adding a parameter for a no-parameter function generating the input. This could be easily done, but it was not needed by the time I was finished.

I find the final result of a certain elegance (then again, I'm obviously biased), that's not the reason I really like Scala. What I really, really like about it, is how I could start very small, with a few easy commands on the REPL, and then use that as the basis for an increasingly more flexible framework to do the tests I wanted.

If anyone is interested in seeing the full code, it can be found here.