Extractors in Scala

Extraction and reporting are intertwined. Data arrives in some kind of structured source, and you must pull it into meaningful program variables before you can manipulate it and return a result. Some of the best language designs make this easy. Perl (Practical Extraction and Report Language) recognized this use case and built a neat little feature into the language that unpacks list data into variables in a single line of code.

The form looks similar to this:

($var1, $var2, $var3) = some_sequential_data_source

So, if I create an array reference in Perl:

$a = [1,2,3]

I can extract data from that array into meaningful variables with one line of code:

($one, $two, $three) = @$a;

This has a big payoff when reading all the metadata for a UNIX file:

# from man perlfunc and [perldoc](http://perldoc.perl.org/functions/stat.html)

($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) = stat($filename);

It is definitely a convenience. Try doing that in C++ in a single line!

Scala has a mechanism that resembles this Perl-ism. It's called an extractor. It takes a lot more than one line of code to build, and a lot happens implicitly under the hood, but you can still employ a

sink = source

kind of pattern in your own code. With extractors you can transform a data source (even with variable arguments) into a set of coherent, meaningful variables in one line. Yes, you can kind of imitate Perl in Scala!

Just like the left hand side of the Perl-ism, out of the box you can write something like:

scala> val Array(one, two, three) = Array(1,2,3)
one: Int = 1
two: Int = 2
three: Int = 3
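The same shape works with other built-in patterns. The sketch below is only an illustration: the object name and the sample values are made up, and the tuple form is probably the closest analogue to Perl's list assignment.

```scala
object BuiltInPatterns {
  // Tuple pattern: three names on the left, three values on the right.
  val (dev, ino, mode) = (2049, 131, "0644")

  // List pattern with a varargs tail: the first two elements get names,
  // and whatever is left over is collected into `rest`.
  val List(first, second, rest @ _*) = List("a", "b", "c", "d")
}
```

If the right-hand side does not have the shape the pattern expects (say, a one-element list in the second case), the definition fails at runtime with a scala.MatchError.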

As you may have noticed, the left-hand side names some variables to receive data, and the right-hand side supplies it. Source to sink. Furthermore, you can create your own objects with their very own transformation rules. This is done by adding an unapply() method to an object. The compiler calls it implicitly, much like a constructor, whenever the object is used as a pattern, for example on the left-hand side of a val definition.

The unapply() method is roughly the opposite of a mock constructor for an object. Since an object is a singleton that is never instantiated with new, it has no constructor you can call. But a mock constructor of a kind can be created by adding an apply() method.

object SillyPerson {
   def apply(name: List[String]): String = name.mkString(" ")
}

The apply() construct can only be used in a right-hand-side capacity (aka: as a Source, or supplier of data):

scala> val s = SillyPerson("Elmer"::"J."::"Fudpucker"::Nil)
s: String = Elmer J. Fudpucker
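To make the "mock constructor" idea concrete: writing SillyPerson(...) is shorthand for SillyPerson.apply(...). A small sketch follows, where ApplyDemo is a hypothetical wrapper object added only to hold the two values.

```scala
object SillyPerson {
  def apply(name: List[String]): String = name.mkString(" ")
}

object ApplyDemo {
  val parts = List("Elmer", "J.", "Fudpucker")

  // The call-like syntax...
  val sugared: String = SillyPerson(parts)
  // ...is rewritten by the compiler into an ordinary method call.
  val explicit: String = SillyPerson.apply(parts)
}
```

Both values hold the same string, so nothing is being constructed here in the usual sense. The object is simply being called like a function.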

The unapply() method is like a deconstructor. It can be designed to undo everything that an apply() method does. If we add one, we can "deconstruct" a structured data source, even the output of the apply() method itself, which makes a good test.

// based on examples from "Programming in Scala", Odersky, Spoon & Venners

object SillyPerson {
   def apply(name: List[String]): String = name.mkString(" ")

   // Split the string back into its three parts; anything else does not match.
   def unapply(name: String): Option[(String, String, String)] = {
      val bits = name.split(" ")
      if (bits.length == 3) Some((bits(0), bits(1), bits(2)))
      else None
   }
}

scala> val SillyPerson(first, middle, last) = SillyPerson("Elmer"::"J."::"Fudpucker"::Nil)
first: String = Elmer
middle: String = J.
last: String = Fudpucker

Above, the left-hand side of = represents an implicit call to the unapply() method: the object's pseudo-deconstructor and data sink. The right-hand side represents a call to the object's pseudo-constructor and data source. Assigning the source to the sink should give back the same data we supplied to the source. For the pair to be considered correctly implemented, the two methods should compose and decompose the data symmetrically.
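The implicit call can also be written out by hand, which makes the symmetry easy to check. This is a sketch rather than anything from the book: RoundTrip and its field names are hypothetical, and SillyPerson is repeated so the block stands alone.

```scala
object SillyPerson {
  def apply(name: List[String]): String = name.mkString(" ")

  def unapply(name: String): Option[(String, String, String)] = {
    val bits = name.split(" ")
    if (bits.length == 3) Some((bits(0), bits(1), bits(2))) else None
  }
}

object RoundTrip {
  val parts = List("Elmer", "J.", "Fudpucker")

  // Roughly what the pattern on the left-hand side does behind the scenes.
  val extracted: Option[(String, String, String)] =
    SillyPerson.unapply(SillyPerson(parts))

  // The extractor is symmetric if we get back exactly what we put in.
  val symmetric: Boolean = extracted == Some(("Elmer", "J.", "Fudpucker"))
}
```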

Note the unapply() method provides a return type of

Option[(String, String, String)]

This gives structure and types to the receiving variables first, middle and last. Some and None are wrappers around the data that tell the caller whether something or nothing was returned, which is safer than returning null.
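The None branch matters as soon as the input does not have three parts. A val pattern has no way to handle that case and throws a scala.MatchError, so a match expression with a fallback is the safer form. A minimal sketch, where NameParts and Describe are hypothetical names and the unapply() body is the same as above:

```scala
object NameParts {
  def unapply(name: String): Option[(String, String, String)] = {
    val bits = name.split(" ")
    if (bits.length == 3) Some((bits(0), bits(1), bits(2))) else None
  }
}

object Describe {
  def describe(name: String): String = name match {
    // Taken when unapply returns Some: the tuple is unpacked into three Strings.
    case NameParts(first, middle, last) =>
      s"first=$first, middle=$middle, last=$last"
    // Taken when unapply returns None.
    case other =>
      s"not a three-part name: $other"
  }
}
```

Here Describe.describe("Bugs Bunny") falls through to the second case instead of blowing up.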

Below is an example of a varargs extractor, an unapply() that allows variable-length lists to be processed and transformed. A sequence is used in place of a fixed-size tuple, allowing more flexibility in handling parameters. This calls for a special variant of unapply, named unapplySeq(), whose Option return type may end in a Seq that the caller can iterate over. The pattern that invokes unapplySeq() also uses special syntax: a trailing name @ _* binds all the remaining elements.

The program goes like this: a royal court (TheCourt) has been assembled. The Lord, the Lady, an advisor, a general and some servants are all on a list of approved court attendees, ranked from most important to least important. The list starts off in a sloppy form, so it is first cleaned up. Then someone in the court gets hold of it and runs apply(), which messes up the list by adding an enemy spy and reversing the order. Last, unapplySeq() is run and sets it all right again, removing the spy from the list and putting the ranks back in the right order.

// based on sect. 24.5 of "Programming in Scala"
// varargs extractor example.

object TheCourt {
   // apply() puts the list in the wrong order and adds an unwanted guest.
   def apply(cleanList: String): String =
      (cleanList.split(",") :+ "EnemySpy").reverse.mkString(",")

   // unapplySeq() dissects the list and undoes the damage.
   def unapplySeq(wronglyOrderedList: String): Option[(String, String, Seq[String])] = {
      val rightList = wronglyOrderedList.split(",").reverse
      // We need at least a lord, a lady and the trailing spy to remove.
      if (rightList.length > 2)
         Some((rightList(0), rightList(1), rightList.drop(2).dropRight(1).toSeq))
      else
         None
   }
}

val sloppyList = "Olaf the Magnificent, Helga the Great," +
   "Advisor Sollop,  General Cranston      ,       Jeeves,  Seppings  ," +
   "Mrs. North,Grell,    and whatsisface";

val cleanList = sloppyList.split(",").map(_.trim).mkString(",")

// feed the output of apply() into unapplySeq(); combined, the two should "undo" each other

val TheCourt(lord, lady, theRest @ _*) = TheCourt(cleanList)

println("Original List: (cleaned up a bit)")
println("--------------")
println(cleanList)
println()
println("Modified List: (apply)")
println("----------------------")
println(TheCourt(cleanList))
println()
println ("Fixed and Restored List: (unapply)")
println ("----------------------------------")
println ("The Lord: [" + lord + "]")
println ("The Lady: [" + lady + "]")
for (name <- theRest)
   println ("  The court: [" + name + "]")
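
To isolate the unapplySeq() mechanics from the story above, here is a smaller sketch. It is not from the book: Csv and CsvDemo are hypothetical names, and the extractor returns a plain Option[Seq[String]] so that a pattern can bind any number of fields.

```scala
object Csv {
  // Some(fields) for a non-blank line, None otherwise.
  def unapplySeq(line: String): Option[Seq[String]] =
    if (line.trim.isEmpty) None
    else Some(line.split(",").map(_.trim).toSeq)
}

object CsvDemo {
  def summarize(line: String): String = line match {
    // Matches only when the sequence has exactly one element.
    case Csv(only)            => s"one field: $only"
    // Names the first element and collects the rest with @ _*.
    case Csv(head, tail @ _*) => s"head=$head, ${tail.length} more"
    // unapplySeq returned None.
    case _                    => "empty line"
  }
}
```

The number of names in the pattern decides which sequence lengths match, and only the @ _* form accepts a tail of any length.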