Working with Anorm and hierarchical data

Play Framework’s Anorm library offers Scala applications a way to access relational data. Anorm’s design is minimalist, and so Anorm-reliant applications end up dealing with some of the nuances of relational databases.

One of those nuances concerns SQL joins: joining tables in a SQL query tends to blur the hierarchical, many-to-one relationship they may have.

For example, when the following tables are joined on country ID to combine city data with country names…

city
id  country_id  name       population
1   1           Berlin     3462000
2   1           Hamburg    1796000
3   1           Cologne    1006000
4   2           Lyon       1488000
5   2           Marseille  1489000
6   2           Paris      10620000
7   3           Barcelona  5570000
8   3           Madrid     6574000

country
id  name
1   Germany
2   France
3   Spain

Population values from urban agglomeration data, United Nations’ 2011 World Urbanization Prospects.

…the hierarchy between countries and cities is not immediately evident in the result set.

But what if that hierarchy is critical to the desired presentation of the data? This blog post offers a solution. It walks through play-hierarchical-data, a simple application written with Play, Anorm, and an in-memory H2 database. The application retrieves the table data from above and displays it, placing cities in descending order of population by country.

play-hierarchical-data application

Code from the application is interspersed below. The full application is available on GitHub.

Case classes and parsers

The application follows the convention of the computer database sample application, which uses Anorm. In particular, it establishes Country and City as case classes corresponding to the database tables. Using Anorm’s parser API, it then builds up to a list method that joins the tables with SQL and returns a List[(Country, City)].
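The shapes involved can be sketched without a database. What follows is a minimal sketch rather than the application's code: the field names are assumptions inferred from the table columns above, and an in-memory for-comprehension stands in for the SQL join and Anorm parsers behind the real list method.

```scala
// Assumed shapes: field names mirror the table columns shown earlier.
case class Country(id: Long, name: String)
case class City(id: Long, countryId: Long, name: String, population: Long)

val countries = List(
  Country(1, "Germany"),
  Country(2, "France"),
  Country(3, "Spain")
)

// A subset of the city rows, to keep the sketch short.
val cities = List(
  City(1, 1, "Berlin", 3462000L),
  City(4, 2, "Lyon", 1488000L),
  City(6, 2, "Paris", 10620000L),
  City(8, 3, "Madrid", 6574000L)
)

// In-memory stand-in for the SQL join on country ID: one flat
// (Country, City) pair per city, much like a joined result set.
val joined: List[(Country, City)] =
  for {
    city    <- cities
    country <- countries if country.id == city.countryId
  } yield (country, city)
```

As with the SQL result set, nothing in the flat list of pairs marks where one country's cities end and the next country's begin.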

Sorting

After invoking the list() method…


val countriesAndCitiesUnsorted: List[(Country, City)] = City.list

…the application sorts the resulting data based on the desired presentation. It applies a primary sort on country name; where the country names are the same, it applies a secondary sort on population:


val countriesAndCities = countriesAndCitiesUnsorted.sortWith {

  case ((country0, city0), (country1, city1)) =>

    country0.name < country1.name ||
      (country0.name == country1.name &&
       city0.population > city1.population)
}
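The same two-level ordering can also be written with sortBy and a tuple key, which may be easier to check than a hand-written comparator. This is a sketch on sample data, not the application's code; the case-class field names are assumptions made for illustration.

```scala
// Assumed shapes, for illustration only.
case class Country(id: Long, name: String)
case class City(id: Long, countryId: Long, name: String, population: Long)

val germany = Country(1, "Germany")
val france  = Country(2, "France")

val unsorted: List[(Country, City)] = List(
  (germany, City(2, 1, "Hamburg", 1796000L)),
  (france,  City(4, 2, "Lyon", 1488000L)),
  (germany, City(1, 1, "Berlin", 3462000L)),
  (france,  City(6, 2, "Paris", 10620000L))
)

// Tuple keys compare element by element: country name ascending first,
// then negated population, which gives descending population.
val sorted = unsorted.sortBy { case (country, city) =>
  (country.name, -city.population)
}

val sortedNames = sorted.map { case (_, city) => city.name }
```

Both forms express the same ordering; the explicit comparator in sortWith simply spells out the primary and secondary conditions.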

Restoring the hierarchy

To restore the data’s hierarchical relationship, the application relies on a foldLeft() operation. The operation loops over the List’s (Country, City) tuples and accumulates a Map[Country, Seq[City]]. On each loop iteration, if a given element’s Country does not yet appear in the Map, the Country becomes a new key, with the City as its value (packaged as a single-item Seq). If, on the other hand, the Country does appear in the Map, the Country’s Seq is supplemented with the City:


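import scala.collection.immutable.ListMap

// Accumulate a Country -> cities map, visiting the pairs in sorted order
val countriesToCities =
  countriesAndCities.foldLeft(ListMap.empty[Country, Seq[City]]) {

  case (theMap, (country, newCity)) => {

    // The first city for a country starts a new Seq; later ones are appended
    val cities = theMap.get(country) match {
      case None => Seq(newCity)
      case Some(existingCities) => existingCities :+ newCity
    }

    theMap + ((country, cities))
  }
}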

Of the various immutable Map types, the application relies on a ListMap, as it’s guaranteed to maintain insertion order, and therefore the List[(Country, City)]’s sorting.

Displaying the result

With a simple template that iterates over the Map’s Country keys and their Seq[City] values, the application produces the desired output:

Screenshot of play-hierarchical-data

Closing thoughts: Why not groupBy()?

The List.foldLeft() operation described above groups data. So, why not work with List.groupBy() instead, which also produces a Map? That method, and the Map it produces, do not attempt to maintain the List’s order, which is critical to achieving the desired presentation.
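To make the difference concrete, here is a small sketch that groups the same sorted pairs both ways. It uses plain country and city names (strings) instead of the application's case classes, an assumption made to keep the example short.

```scala
import scala.collection.immutable.ListMap

// Already sorted: country name ascending, then population descending.
val sortedPairs: List[(String, String)] = List(
  ("France", "Paris"), ("France", "Marseille"), ("France", "Lyon"),
  ("Germany", "Berlin"), ("Germany", "Hamburg"), ("Germany", "Cologne"),
  ("Spain", "Madrid"), ("Spain", "Barcelona")
)

// groupBy returns a plain Map, whose key iteration order is unspecified.
val grouped: Map[String, List[String]] =
  sortedPairs.groupBy(_._1).map { case (country, pairs) =>
    (country, pairs.map(_._2))
  }

// Folding into a ListMap keeps the keys in first-insertion order.
val folded: ListMap[String, Seq[String]] =
  sortedPairs.foldLeft(ListMap.empty[String, Seq[String]]) {
    case (acc, (country, city)) =>
      acc + (country -> (acc.getOrElse(country, Seq.empty[String]) :+ city))
  }
```

Both results hold the same groups, and each group keeps its cities in list order. Only the ListMap, though, promises that iterating over its keys yields France, Germany, Spain in that order.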

Working Around Bad Dependency Declarations with sbt

When I noticed the Apache team had released a 1.1 version of FOP, I was excited to try it out. In a Play 2 application that already uses FOP 1.0, bringing in the update should’ve been an easy change: in the application’s Build.scala, simply locate FOP in the appDependencies

"org.apache.xmlgraphics" % "fop" % "1.0"

…and update the version number. But doing so leads to strange sbt errors:

::::::::::::::::::::::::::::::::::::::::::::::                       
::          UNRESOLVED DEPENDENCIES         ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.apache.avalon.framework#avalon-framework-api;4.2.0: not found
:: org.apache.avalon.framework#avalon-framework-impl;4.2.0: not found
::::::::::::::::::::::::::::::::::::::::::::::

What’s going on?

A Bad Dependency Declaration

One of FOP’s dependencies is Avalon Framework. The version used by FOP 1.0 had a group ID of org.apache.avalon.framework, while version 4.2, used by FOP 1.1, has a group ID of avalon-framework. Per Apache issue FOP-2151, though, the pom.xml for FOP 1.1 still references org.apache.avalon.framework. This bad dependency declaration prevents sbt from finding Avalon Framework and leads to the errors above.

The Workaround

My first thought was to add FOP 1.1 to my local repository and correct its pom.xml. But that solution would not have been portable: anybody with whom I would want to share the application would need to make the same addition.

A better solution lies in sbt’s “explicit URLs” feature (official documentation; Stack Overflow question). It’s intended for dependencies not available in any repository, but we can also use it to map a bad dependency declaration to the correct repository URL. We simply add appDependencies lines for Avalon Framework, placing them before the one for FOP:

"org.apache.avalon.framework" % "avalon-framework-api" % "4.2.0" from "http://repo1.maven.org/maven2/avalon-framework/avalon-framework-api/4.2.0/avalon-framework-api-4.2.0.jar",
"org.apache.avalon.framework" % "avalon-framework-impl" % "4.2.0" from "http://repo1.maven.org/maven2/avalon-framework/avalon-framework-impl/4.2.0/avalon-framework-impl-4.2.0.jar",
"org.apache.xmlgraphics" % "fop" % "1.1"

With that change in place, sbt can find Avalon Framework 4.2, and we’re in business with PDF generation via FOP 1.1!

Setting Up Samba

The steps we’ve looked at so far around setting up a home server are, in a sense, just a warmup. The ultimate step that brings the server to life is to set up Samba, the Linux software suite for serving files to Windows clients.

Choosing a Security Mode

Samba offers several security modes. The default is basic user-level security. Under this mode, you allow certain Linux users to connect to a Samba instance by creating passwords for them with smbpasswd.

I had no reason to deviate from this default, and I like that it associates actions taken within a share with a specific Linux user (rather than a general “share user”).

Selecting Share Directories

While I wanted to associate actions, and files themselves, with Linux users, I didn’t want to enforce security within the circle of Samba-enabled Linux users. Rather, any Samba user was to have access to any Samba-accessible file, whether it was her own, someone else’s, or a public file.

This desire led me away from setting up sharing within the /home hierarchy and toward new top-level directories–two, to be exact.

User-Originated Data vs. Other Stuff

Why two top-level directories? Well, user data on a PC is typically a mix of user-originated files and files sourced from elsewhere, digital music being a prime example. Traditional Windows directory structures combine the two on a single volume, which is unfortunate: it’s easiest to do backups at the volume level, but they’re much less important for stuff we could simply re-download than for items we create.

To promote better backup practices, then, I created one top-level directory (creatively, /data1) for data that is user-originated and should be frequently backed up, and another (/data2) for sourced-elsewhere files that could be backed up less often.

Since /home data is definitely worth backing up frequently, I took the step of moving it into /data1 as linux-home (and implementing a /home > /data1/linux-home symlink). I didn’t want to provide Samba access to that data, so a /data1/all-os directory became the top-level /data1 share, with a /data2/all-os counterpart on /data2.

Putting Linux Permissions to Work

With a security mode chosen and share directories in place, the next step was to negotiate with the Linux permissions system and put it to work for me. I had the concept of the “core users” of the server being the only ones having Samba access, so I created a core-users Linux group and made it the primary group for that subset of users. I also made it the group of /data1/all-os and /data2/all-os.

Making Future Files and Directories Group-Writeable

A user’s “umask” determines the default permissions on files and directories he creates, by masking out permission bits from what a program requests. Users’ umask on my system, 0022, leaves read privileges for a file’s group and read-execute privileges for a directory’s group. So far, so good: those permissions enable a file or directory created on the Samba share by one user to be viewed by other users. However, that umask denies write privileges to groups, which isn’t so hot in this context: it would prevent users from changing each other’s files.

Luckily, Samba offers force create mode (for files) and force directory mode, which override the permissions set based on umask. Setting those options to 0774 and 0775, respectively, makes files and directories created through Samba group-writeable.
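The bit arithmetic behind this can be sketched in a few lines. This is a simplification under stated assumptions: it takes 0666 and 0777 as the modes typically requested for new files and directories, and it models only the umask and the two force options, which the smb.conf man page describes as being ORed onto the mode. Samba's real calculation has further inputs (such as create mask) that are ignored here.

```scala
// Helpers for reading and printing three-digit octal permission modes.
def octal(s: String): Int = Integer.parseInt(s, 8)
def show(mode: Int): String = "0" + Integer.toOctalString(mode)

val umask = octal("0022")

// A umask removes bits from the mode requested at creation time.
val fileMode = octal("0666") & ~umask // assumed request for new files
val dirMode  = octal("0777") & ~umask // assumed request for new directories

// The force options are ORed onto the resulting mode.
val forcedFileMode = fileMode | octal("0774")
val forcedDirMode  = dirMode | octal("0775")

// The group-write bit, for checking the outcome.
val groupWrite = octal("0020")
```

Under these assumptions, show(fileMode) is 0644 and show(dirMode) is 0755, neither of which has the group-write bit set; after the force options are applied, the modes become 0774 and 0775, and the bit is set in both.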

Getting SELinux Types Right

There’s one final, critical piece to the puzzle. Originally, I didn’t realize this step was needed, and leaving it out had brought this project to its knees–as in, my Samba share just did not work! …And I was dumbfounded as to why.

The critical step was to get the types right in my SELinux contexts.

By default, new top-level directories, like /data1 and /data2, as well as directories created under them, are given type etc_runtime_t. In a typical SELinux configuration, Samba cannot access the etc_runtime_t type.

To get things working, I had to set up file specifications to apply the samba_share_t type to /data1/all-os, to /data2/all-os, and to any files and directories that would get created under them. The syntax, which was taken from the samba_selinux man page (snapshot available on Dan Walsh’s blog), was as follows:

semanage fcontext -a -t samba_share_t "/data1/all-os(/.*)?"
semanage fcontext -a -t samba_share_t "/data2/all-os(/.*)?"
restorecon -R -v /data1/all-os
restorecon -R -v /data2/all-os

Samba shares have samba_share_t type

It’s Alive!

Once this range of steps was addressed, our Samba server came to life!

From Windows: Mapping drive letter to Samba share, copying up a file, reading it back

The end product works so seamlessly with Windows clients, one could easily forget there’s a Linux machine at the other end of the connection.

Appendix: Relevant Sections of smb.conf, as Written Out by SWAT

[global]
        workgroup = MYGROUP
        server string = Samba Server Version %v
        log file = /var/log/samba/log.%m
        max log size = 50
        force create mode = 0774
        force directory mode = 0775
        cups options = raw

[data1]
        comment = Frequently backed-up data
        path = /data1/all-os
        read only = No

[data2]
        comment = Infrequently backed-up data
        path = /data2/all-os
        read only = No

Battling “new-host,” Round Two

With the home server’s hostname problem apparently resolved, I was able to access its Samba shares from Windows machines using the desired syntax (\\jtown\...). (Configuring the shares will be the topic of the next posting.) However, I had difficulties connecting to the server by name with SSH.

Router identifies server as “new-host”

A check of the administration GUI on my Verizon FiOS router, which also functions as a DNS and DHCP server, uncovered the issue: The router was still calling the server “new-host”! As a result, DNS lookups on “jtown,” the desired hostname, were failing.

DHCP request omits hostname

Wireshark review of traffic between the server and the router confirmed the underlying problem: the server was not including its hostname in its DHCP request for an IP address. So, the router was choosing to continue calling it “new-host.”

The Red Hat documentation looked like it would guide me to a solution. It includes a section on DHCP client configuration, which talks about a promising DHCP_HOSTNAME option. The only problem was, the option belongs in a device configuration file in /etc/sysconfig/network-scripts, but with NetworkManager handling my networking, there was no such file! How, then, to proceed?

The Fix

The information in Red Hat bug #723374 put me on the path to a fix. If you’re facing the same problem I did, hopefully this approach will work for you, too.

If you’re using a networking device other than wired Ethernet on eth0, you’ll want to change the filename in step #1 accordingly.

  1. Create an /etc/dhclient-eth0.conf file. (Putting the file in /etc/dhcp works, too, but unlike /etc, non-root users can’t list that directory’s contents.) Creating the file while su’ed to root will give non-root users privileges to read it but not write to it, which is desirable.
  2. Setting up the file per the dhclient.conf man page, write out a send host-name statement with the desired hostname (comments are supported with #):
    # Added by D. Manchester, 2 May 2012
    send host-name "jtown";
  3. Restart the NetworkManager service (sudo service NetworkManager restart).

DHCP request includes hostname

With that additional configuration file in place, Wireshark indicates that the server includes its hostname in its DHCP request. And indeed, the router now identifies the server correctly, and DNS lookups on “jtown” work!

Router identifies server as “jtown”

Shares Accessible Even Without Fix?

The fact that the home server’s Samba shares were accessible via “jtown” even without this fix offers a useful reminder: While it apparently can work with DNS, SMB/CIFS, the protocol implemented by Samba, does not require DNS for name resolution. Rather, a client can discover a server via NetBIOS broadcast, and indeed, responding to such broadcasts is the primary purpose of Samba’s nmbd daemon.

Next: Configuring Samba!

Installing the Operating System and Battling “new-host”

With the home server’s BIOS up to date, it was time to install the operating system. Samba, which would be the server’s cornerstone, is available for most any variant of Linux, so I had a lot of leeway in picking a distribution.

My experiences with Ubuntu had been very good, so I could’ve gone with that. But, I wanted to get more exposure to other branches of the Linux family tree, so I chose version 6.2 of Scientific Linux, a Red Hat derivative.

The Scientific Linux installation generally came off smoothly. The one thing that behaved oddly afterwards was the system’s hostname.

Hostname Weirdness

Even though I had specified a meaningful hostname in the Scientific Linux installer, the system initially identified itself as “new-host”. Even odder, the name would sometimes change upon reboot: “new-host” became “new-host-2”, and then “new-host-3”; and after booting the system from an Ubuntu Live CD (but rebooting under Scientific Linux), it even called itself “ubuntu”!

Some poking around and Googling confirmed what was happening. The hostname I had entered into the installer hadn’t stuck. In its place, the system was accepting a hostname from my Verizon FiOS router (an Actiontec MI424WR)–the origin of the “new-host” names.

The Fix

Entering hostname

The fix was to use the system-config-network utility. On its DNS configuration screen, I entered my desired name into the Hostname field. To ensure the other parameters available on that screen were pulled down from my router, I blanked them out.

Upon rebooting, the login screen appeared with the correct hostname!

Update, 4 May 2012: While this fix got the home server identifying itself by the correct hostname, an additional fix was necessary to get our broadband router (and DNS server) to do the same. See “Battling ‘new-host,’ Round Two” for the details.

Some Notes on Configuring Linux Networking

  • Red Hat’s documentation asserts that, as of version 6.2, system-config-network is no longer needed, with its functionality subsumed by NetworkManager and its Network Connections GUI utility (nm-connection-editor). That assertion appears to be incorrect, though, as that GUI provides no ability to set the hostname. (Watch this space, though.)
  • Based on what find tells me (sudo find /etc -mmin -1 -print), files are updated as follows when I alter values on system-config-network’s DNS configuration screen and save changes (default networking profile located in /etc/sysconfig/networking/profiles/default):
    File updated in default profile    Updates pushed to
    hosts                              /etc/hosts
    network                            /etc/sysconfig/network
    resolv.conf                        /etc/resolv.conf
  • system-config-network updates to the hosts files are purely additive; system-config-network will not remove old hostnames! So, if you name a system host1, but later change its name to host2, the relevant hosts lines will look something like:

    127.0.0.1     localhost.localdomain   localhost   host1  host2

    I don’t imagine this detritus would generally be harmful. Mostly, it just allows the host to ping/connect to itself via old hostnames–a capability of dubious value. I guess it could be problematic if an old hostname were assigned to a different machine and the hosts file were charged with name resolution for that machine, too. The file would then associate two different IP addresses with a single name, with unpredictable results.
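That last scenario is easy to check for mechanically. Below is a minimal sketch that scans hosts-style lines for names mapped to more than one address. The file content is hypothetical (the 192.168.1.50 entry is invented for illustration), and the parsing handles only the simple "address name name ..." layout plus # comments.

```scala
// Hypothetical hosts-file content: host1 lingers on the loopback line
// after a rename and is also listed for a different machine.
val hostsLines = List(
  "127.0.0.1     localhost.localdomain   localhost   host1  host2",
  "192.168.1.50  host1",
  "# comments and blank lines are skipped",
  ""
)

// Build a name -> set-of-addresses map from the non-comment lines.
val addressesByName: Map[String, Set[String]] =
  hostsLines
    .map(_.takeWhile(_ != '#').trim)
    .filter(_.nonEmpty)
    .flatMap { line =>
      val fields = line.split("\\s+").toList
      fields.tail.map(name => (name, fields.head))
    }
    .groupBy(_._1)
    .map { case (name, pairs) => (name, pairs.map(_._2).toSet) }

// Names associated with more than one address are the ambiguous ones.
val ambiguous = addressesByName.filter { case (_, addrs) => addrs.size > 1 }
```

With the sample lines above, only host1 ends up in ambiguous, paired with both the loopback address and the invented LAN address.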

BIOS Updates, the Linux Way

Once the home server was assembled, I checked its BIOS against the latest version available on Intel’s Web site. The installed version was several revisions out of date, so I decided to update it before proceeding with other setup tasks.

Intel offers ISO images you can burn to CD and run to update the BIOS of a system not yet possessing an OS installation. I had read elsewhere, though, about setting up a USB drive/thumb drive from Linux that boots to old-school DOS, with the BIOS update then run from there. Intrigued, I elected to use that approach, relying on an existing Ubuntu machine we have to do the setup work.

The Approach

To be clear, I cannot claim credit for coming up with this approach. I read about it in several places on the Web, particularly this post from blogger Daniel. Daniel, in turn, indicates that the information was “[l]ifted entirely”(!) from this post by Michael. What I lay out here is simply a variation on what Daniel and Michael present, with a little more detail for those who may be less familiar with the tools and techniques in the mix.

As is always the case with this stuff, no warranty is expressed or implied. You use the information supplied here at your own risk!

Collecting the Right Files

You’ll need three things to use this approach to update your BIOS:

  • QEMU emulator. May already be installed on your PC. If not, you can probably download it with your package manager.
  • DOS image. Daniel and Michael went with Balder, and it worked fine for me, too.
  • BIOS update files. For the home server’s Johnstown motherboard, Intel makes the files available here > Download Type: BIOS > Result with Status: Latest > Download for Iflash BIOS Update.

Setting Up the USB Drive

The instructions that follow assume that, when you plug in your USB drive, your Linux system assigns a device name of /dev/sdc to it (as indicated by sudo fdisk -l) and auto-mounts it. It’s actually quite likely that a different device name gets assigned; in that case, references below to /dev/sdc should be updated accordingly.

Creating FAT16 partition

  1. Plug the USB drive into your PC. After it auto-mounts, unmount it with umount.
  2. Start the cfdisk partition editor against the drive:
    sudo cfdisk /dev/sdc
  3. In the cfdisk UI, create a primary bootable partition using FAT16 (filesystem type 06).
  4. Start QEMU, associating the Balder image with A: and booting it while assigning C: to the USB drive (more specifically, to the partition on the drive created in the previous step):
    sudo qemu -fda balder10.img -boot a -hda /dev/sdc
  5. In QEMU, format the USB drive and make it bootable:
    FORMAT /S C:
  6. Close QEMU.
  7. Remount the USB drive by unplugging it and plugging it back in.
    You could also do this at the command line.
  8. Copy over the BIOS update files.
  9. Unmount the USB drive with umount.

With that, your USB drive should be ready! To check it out within your Linux session, boot it in QEMU:

sudo qemu -hda /dev/sdc

Booted USB drive in QEMU

Doing the Update

Boot your PC from your USB drive. Follow your manufacturer’s guidance on using the BIOS update files. (If it’s an Intel update, you’ll likely run IFLASH2 with parameters found in an Intel README file.)

Putting It Together

Parts before assembly

The orders from the home server’s shopping list arrived to great anticipation. As I unpacked them, I found myself looking at an impressive array of parts. One thing I didn’t find myself looking at, though, was a set of instructions on how to put it all together. “Would my good friends at Logic Supply leave me high and dry like that?” I wondered.

A quick check of their Web site assuaged my concerns, turning up an assembly guide. With the guide and Intel’s motherboard diagram in hand, I was ready to build.

Assembling the system was pretty straightforward, mostly a matter of snapping, screwing, and sticking components together. One part that required more effort, but that was also fascinating, was the system’s cooling scheme.

Fanless Cooling

Finned heat sinks still in place; heat pipes alongside

As shipped from Intel, the Johnstown motherboard has finned heat sinks on the CPU and the northbridge. This would seem to reflect an assumption that cooling will occur via airflow over the motherboard, assisted by a fan when needed. But Logic Supply’s case for this motherboard is a fanless enclosure. How, then, is heat to be dissipated?

By ditching the finned heat sinks and turning the case itself into a big heat sink. In place of the finned sinks, Logic Supply has you install a part with two vertical heat pipes, placing heat paste on the pipes’ tops and bottoms. The pipes are sized such that they contact the underside of the case’s cover when it’s in place. So, when everything is put together and the system is running, heat flows from the motherboard, through the heat pipes, and into the case, where it dissipates into the ambient air!

Powering Up

Assembled system without cover

Once the system was assembled, but before the cover was on, I was tempted to power things up. With the cover being a key piece of the cooling system, though, and having read horror stories of how fast a processor can self-destruct without proper cooling, I thought better of it and dutifully screwed on the cover.

With the cover in place and the power cord and video-out cables hooked up, hardware assembly was complete, so I tried turning the system on. Success!

Powering up

Next: Updating the BIOS; doing the software setup.