Yong Ouyang: 2013

Tuesday 18 June 2013

Use Teamcity as the Continuous Integration and Build Management Server

I've been using Teamcity as the continuous integration and build management server for the last few years. Not until recently I tried out Hudson (or Jenkins) as the alternative solution. It is quite frustrated to learn that Hudson, being an open-sourced project, doesn't even support remote build (where I can submit my local changes to Teamcity and run a selected builds, and commit once all the builds are successful).

In the scenario where the project has quite a number of builds for various tests, e.g. unit tests, integration tests, external tests, smoke tests, etc., running these tests with my local changes to check whether I break any existing tests or not is rather crucial in agile development approach, where I always want to have a feedback in the shortest amount of time. Of course, without being able to do a remote run with multiple test suites running in parallel, you may argue that I can still run the test suites locally one by one, and only check in when all the tests pass locally, in which case, Hudson is sufficient enough and it is FREE!

I don't intend to devote my blog to compare Teamcity and Hudson, but given the experiences of using both, I can conclude that Teamcity is no doubt the first choice in terms of CI and build management, as its feature is a perfect match for a developer with TDD (Test-Driven Development) and agile mind set. Having said that, I am gonna demonstrate how easy it is to setup the Teamcity for my project hosted on Git. JetBrains does offer a Teamcity professional version which is FREE and comes with a limit of 20 build configurations and 3 build agents.

Installation

Download the latest version for Linux (or whichever suits your OS)

Unzip the downloaded file, e.g. tar -zxvf TeamCity-7.1.5.tar.gz

youyang@monkey-dev-01:/app> ll
total 12
drwxr-xr-x  6 youyang users 4096 Mar 22 02:31 monkey
drwxr-xr-x 12 youyang users 4096 Jun 14 09:31 TeamCity
drwxr-xr-x 13 youyang users 4096 Jun 15 09:11 TeamCity-BuildAgent
youyang@monkey-dev-01:/app>

start up teamcity, e.g. ./app/TeamCity/bin/startup.sh , on port 8111 by default

Configuration

browse to http://localhost:8111 to verify that Teamcity has started up successfully, which will take a minute or two before it asks you to setup an admin account

once the admin account is created and logged in, go to the Administration menu to create a new Project. Specify name and description on the General tab, shown as below

create a new VCS root on the VCS Roots tab, and choose Git (or whatever VCS you use), then fill in the information as shown below

go back to the project config, and create a build configuration

complete a minimum 3-step configuration for the build. Fill in the build name and description for Step 1; Choose the VCS root specified before in Step 2; Specify the build step with Ant (or other build runners the project is using), as shown below

this is the minimum configuration for a project with only one build configuration, which is good enough for the demo

Teamcity Agent

With a project setup on Teamcity server, we still need to install agents so that we can run the builds on them. Go Administration->Install Build Agents, notice that the link for the zip file distribution is http://monkey-dev-01:8111/update/buildAgent.zip

On the machine that you want to install your build agent, wget the zip file from, e.g. http://monkey-dev-01:8111/update/buildAgent.zip, unzip to, e.g. /app/TeamCity-BuildAgent folder

create the buildAgent.properties from the template file buildAgent.dist.properties under, e.g. /app/TeamCity-BuildAgent/conf folder

setup the following properties in the buildAgent.properties file

# name of the agent appearing on Teamcity server
name=Monkey-Dev-01
# setup the JDK
env.JAVA_HOME=/app/monkey/java

start up the agent in the background by typing "/app/TeamCity-BuildAgent/bin/agent.sh start"

Wait a few minutes before your agent shown under the Agents->Connected tab on Teamcity server, as shown below

Once the agent is connected, you are ready to run your first build on Teamcity:

If you are using Intellij IDE, install the Teamcity plugin so that you can send a remote run to the Teamcity server by selecting which builds to run.

Friday 10 May 2013

Bitwise Operations in Java

The bitwise operations in Java are rarely discussed, and you can sense that by reading the official tutorials on this topic. However, there are interesting usages that you should probably be aware of.

The bitwise operators operates on integral types (int and long). Before explaining how bitwise operators work, you should have an idea of how integers are represented in the binary form. As mentioned by the official primitive data type tutorials, the int data type is a 32-bit signed 2's complement integer, whose value ranges from -2^32 to 2^32-1, inclusively. On the other hand, the long data type is a 64-bit signed 2's complement integer, whose value ranges from -2^64 to 2^64-1, inclusively. In both cases, the highest order bit of int and long shows the sign of the value and is not part of the numeric value, where 0 being positive and 1 being negative.

For example, an int value of 1 in binary form looks like this

00000000000000000000000000000001

while an int value of -1 in binary form looks like this

11111111111111111111111111111111

The negative value is transformed in 2 steps:

Step 1: bitwise complement of +1, such that
00000000000000000000000000000001 
becomes
11111111111111111111111111111110

Step 2: add 1, such that
11111111111111111111111111111110 
becomes
11111111111111111111111111111111

Given a negative value, where the highest (leftmost) bit is 1, you can work out what value the binary form represents in 2 steps:

Step 1: bitwise complement of the binary value, such that
11111111111111111111111111111011
becomes
00000000000000000000000000000100

Step 2: add 1, such that
00000000000000000000000000000100
becomes
00000000000000000000000000000101

Therefore, the original binary form represents -5.

The bitwise operators

Operator	Name	Example	Result	Description
`a & b`	and	6 & 9	0	1 if both bits are 1, or 0 if either bit is 0
`a \| b`	or	6 \| 9	15	1 if either bit is 1, or 0 if both bits are 0
`a ^ b`	xor	6 ^ 9	15	1 if both bits are different, 0 if both bits are the same
`~a`	not	~9	-10	Inverts the bits. (equivalent to +1 and then change sign)
`n << p`	left shift	60 << 2	240	Shifts the bits of n left p positions. Zero bits are shifted into the low-order positions.
`n >> p`	right shift	60 >> 2	15	Shifts the bits of n right p positions. If n is a 2's complement signed number, the sign bit is shifted into the high-order positions.
`n >>> p`	unsigned right shift	-60 >>> 20	4095	Shifts the bits of n right p positions. Zeros are shifted into the high-order positions.

Packing and Unpacking

Multiple int values of limited range can be "packed" into one int, by using the bitwise operators. For example, we have the following integer variables age (range 0~255, or 8-bit), gender (0~1, or 1-bit), weight (range 0~255, or 8-bit), and height (range 0-255, or 8-bit). These integers can be packed and unpacked into/from a 32-bit integer like this:

Represent boolean values in a byte

Suppose we have 8 boolean values that are true or false, and we want to store them in a single byte, and sent it across the network, and then unpack it and use it to toggle 8 different things:

Multiply and Divide

When using x << 3, it is equivalent to multiply x by 8 (or 2^3); similarly, when using x >> 2, it is equivalent to divide x by 4 (or 2^2). In fact, a smart enough compiler will replace your multiplication or division expressions using bit shift operators such that it is more compact and faster. So there is no need to explicitly use bit shift operator if you want multiply or divide by a power of 2!

Thursday 18 April 2013

I/O Redirection in Bash

Since Bash is a superset of Bourne shell, the IO redirection is the same across the Bourne shell family. To find out exactly which shell you are using in Unix or Linux, typing in command

echo $SHELL

shall display an executable command with its path, for example

Path	Shell
/bin/sh	Bourne shell
/bin/bash	Bourne Again SHell
/bin/csh	C SHell
/bin/ksh	Korn SHell

Bash takes input from standard input (stdin), writes output to standard output (stdout), and writes error output to standard error (stderr). Standard input is connected to the terminal keyboard, while standard output and error are connected to the terminal screen, by default. Redirection of I/O is accomplished by using a redirection metacharacter followed by the desired destination (stdout & stderr) or source (stdin). Bash uses file descriptor numbers to refer to the standard I/O, where 0 is standard input, 1 is standard output, and 2 is standard error. Below are the common redirection for the Bourne shell family:

Meta Character	Action
>	Redirect standard output
>>	Append to standard output
2>	Redirect standard error
2>&1	Redirect standard error to standard ouput
2>&1\|	Redirect standard error to standard ouput, and pipe them to another command
<	Redirect standard input
\|	Pipe standard output to another command

Please note that because < and > are referring to standard input and output respectively, the numbers 0 and 1 are not required in this case.

Thursday 4 April 2013

Sorting Algorithm Performance Comparisons with Caliper's Micro Benchmarking

After inspired by Martin Thompson's presentation on Performance Testing Java Application, I decided to try out the micro benchmarking framework Caliper on the sorting algorithms that I recently wrote in order to understand better how different sorting algorithms behaves given an array of randomly distributed values.

It was actually a bit involving to find out where I could download the caliper JAR file, as it was not available on their google code base. You can download it from the maven repository, or grab it from my git described at the bottom of this article.

I created a SortingAlgorithmsBenchmark class that extends SimpleBenchmark, and some public methods, whose names have prefix 'time'. Then use the caliper Runner to run the benchmarks on the SortingAlgorithmsBenchmark class, as shown below

import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;
import sorting.*;

import java.util.Arrays;
import java.util.Random;

public class SortingAlgorithmsBenchmark extends SimpleBenchmark {

    private static final int SIZE = 10000;
    private static final int MAX_VALUE = 10000;
    private int[] values;

    @Override
    protected void setUp() throws Exception {
        values = new int[SIZE];
        Random generator = new Random();
        for (int i = 0; i < values.length; i++) {
            values[i] = generator.nextInt(MAX_VALUE);
        }
    }

    public void timeBubbleSort(int reps) {
        for (int i = 0; i < reps; i++) {
            BubbleSort.sort(values);
        }
    }

    public void timeInsertionSort(int reps) {
        for (int i = 0; i < reps; i++) {
            InsertionSort.sort(values);
        }
    }

    public void timeSelectionSort(int reps) {
        for (int i = 0; i < reps; i++) {
            SelectionSort.sort(values);
        }
    }

    public void timeMergeSort(int reps) {
        for (int i = 0; i < reps; i++) {
            MergeSort.sort(values);
        }
    }

    public void timeShellSort(int reps) {
        for (int i = 0; i < reps; i++) {
            ShellSort.sort(values);
        }
    }

    public void timeArraysSort(int reps) {
        for (int i = 0; i < reps; i++) {
            Arrays.sort(values);
        }
    }

    public static void main(String[] args) {
        Runner.main(SortingAlgorithmsBenchmark.class, args);
    }

}

Here are the results when running the benchmarks with a random array of size 10,000. Apparently, bubble sort and selection sort are the slowest of all, with selection sort behaving rather unstable (you will notice the selection sort benchmark is quite different across different runs. It is pleased to see that the sorting algorithm (Dual-Pivot Quick Sort) used by Arrays performs fairly well with a medium size array.

0% Scenario{vm=java, trial=0, benchmark=BubbleSort} 75703105.85 ns; σ=3428843.80 ns @ 10 trials
17% Scenario{vm=java, trial=0, benchmark=InsertionSort} 73305.66 ns; σ=712.37 ns @ 6 trials
33% Scenario{vm=java, trial=0, benchmark=SelectionSort} 49624130.05 ns; σ=345774.06 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=MergeSort} 478932.15 ns; σ=683.99 ns @ 3 trials
67% Scenario{vm=java, trial=0, benchmark=ShellSort} 262061.27 ns; σ=11605.29 ns @ 10 trials
83% Scenario{vm=java, trial=0, benchmark=ArraysSort} 8063.53 ns; σ=20167.07 ns @ 10 trials

    benchmark       us linear runtime
   BubbleSort 75703.11 ==============================
InsertionSort    73.31 =
SelectionSort 49624.13 ===================
    MergeSort   478.93 =
    ShellSort   262.06 =
   ArraysSort     8.06 =

vm: java
trial: 0

Increasing the array size to 100,000, and excluding the bubble sort and selection sort in the benckmarks, we have another set of results, where insertion sort starts to behave terribly in this case, and the dual-pivot quick sort is still the quickest.

0% Scenario{vm=java, trial=0, benchmark=InsertionSort} 2031873339.00 ns; σ=3247182.51 ns @ 3 trials
25% Scenario{vm=java, trial=0, benchmark=MergeSort} 5701257.43 ns; σ=37183.00 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=ShellSort} 2277851.07 ns; σ=21077.62 ns @ 3 trials
75% Scenario{vm=java, trial=0, benchmark=ArraysSort} 210179.59 ns; σ=164843.22 ns @ 10 trials

    benchmark      us linear runtime
InsertionSort 2031873 ==============================
    MergeSort    5701 =
    ShellSort    2278 =
   ArraysSort     210 =

vm: java
trial: 0

You can find the source code for the sorting algorithms here, and the benchmarks runner here, all of which are part of my github project DataStructureAndAlgorithms

Wednesday 3 April 2013

A Different Builder Pattern Example in Java

Builder design pattern is often used to separate the construction of a complex object from its various representations. I will show you a nicer way of writing a Builder pattern, rather than the example shown in the wiki.

First of all, I created a POJO called Vehicle, which has some simple properties about its model, its make, etc. Then, nested within this POJO, I created a Builder that holds some default properties which can be used to build a vehicle. The builder can be created by a static method newBuilder(), and we can override the vehicle properties by chaining the with* methods, and finally return the constructed vehicle using the build() method.

public class Vehicle {

    private String model;
    private String make;
    private int numberOfWheels;
    private double height;
    private double width;
    private double length;
    private int numberOfDoors;

    private Vehicle(String model, String make, int numberOfWheels, double height, double width, double length, int numberOfDoors) {
        this.model = model;
        this.make = make;
        this.numberOfWheels = numberOfWheels;
        this.height = height;
        this.width = width;
        this.length = length;
        this.numberOfDoors = numberOfDoors;
    }

    public static class Builder {

        private String model = "M5";
        private String make = "BMW";
        private int numberOfWheels = 4;
        private double height = 1.45;
        private double width = 1.9;
        private double length = 4.62;
        private int numberOfDoors = 5;

        public static Builder newBuilder() {
            return new Builder();
        }

        public Builder withModel(String model) {
            this.model = model;
            return this;
        }

        public Builder withMake(String make) {
            this.make = make;
            return this;
        }

        public Vehicle build() {
            return new Vehicle(model, make, numberOfWheels, height, width,length, numberOfDoors);
        }
    }
}

Let me use a unit test, written in Groovy, to demonstrate exactly how we use the builder. I created a test to show how we can build a default vehicle without providing any overrides to the default vehicle properties, and another test to show how we can override the default vehicle properties to create a different vehicle.

import org.junit.Test

class VehicleTest {

    @Test
    public void canBuildADefaultVehicle() {
        def defaultVehicle = Vehicle.Builder.newBuilder().build()
        assert defaultVehicle.model == "M5"
        assert defaultVehicle.make == "BMW"
        assert defaultVehicle.numberOfWheels == 4
        assert defaultVehicle.height == 1.45
        assert defaultVehicle.width == 1.9
        assert defaultVehicle.length == 4.62
        assert defaultVehicle.numberOfDoors == 5
    }

    @Test
    public void canBuildADifferentVehicle() {
        def defaultVehicle = Vehicle.Builder.newBuilder().withModel("M3").build()
        assert defaultVehicle.model == "M3"
        assert defaultVehicle.make == "BMW"
        assert defaultVehicle.numberOfWheels == 4
        assert defaultVehicle.height == 1.45
        assert defaultVehicle.width == 1.9
        assert defaultVehicle.length == 4.62
        assert defaultVehicle.numberOfDoors == 5
    }
}

Some says the chaining syntax makes the building block easier to read. :)

Tuesday 2 April 2013

Solving Farmer-Wolf-Goat-Cabbage riddle with Groovy/Java

The Riddle - Farmer Wolf Goat Cabbage

A farmer is on the west bank of a river with a wolf, a goat and a cabbage in his care. Without his presence the wolf would eat the goat or the goat would eat the cabbage. The wolf is not interested in the cabbage. The farmer wishes to bring his three charges across the river. However the boat available to him can only carry one of the wolf, goat and cabbage besides himself. The puzzle is: is there a sequence of river crossings so that the farmer can transfer himself and the other three all intact to the east bank?

The Solution

It is a task that given an initial state, a final state, and a set of rules, one will start from the initial state and try to reach to the final state by passing through some intermediate states. Each move will transform from one state to the other, and there might be multiple valid moves from a given state. Such a collection of interconnected states can be represented by State Space Graph.
Let's use 'f', 'c', 'g', 'w' to denote the farmer, the cabbage, the goat, and the wolf, and use '|' to separate the river where left of the '|' denotes west bank and right of the '|' denotes east bank. Initially, they are all at the west bank of the river, which is represented as 'fcgw |' as shown below. We can solve the riddle by figuring out what the possible and valid moves are, using either Breadth-First Search or Depth-First Search, on a state space graph shown below

Depth First Search

I use DFS to find the first possible solution to the riddle, where it looks like this

or with a possibility of backtracking like this

Breadth First Search

I can build a complete state space graph using BFS, excluding the red states as shown below

Sample Source Code

Implemented in a mix of Groovy and Java, where Java is mainly used for the POJOs, the sample source code, including the unit test, demonstrates how to build a complete state space graph (which is also a directed graph) and how to find all possible solutions based on the state space graph.
DirectedGraph.java

import java.util.*;

public class DirectedGraph<T> implements Iterable<T> {

    // key is a Node, value is a set of Nodes connected by outgoing edges from the key
    private final Map<T, Set<T>> graph = new HashMap<T, Set<T>>();

    public boolean addNode(T node) {
        if (graph.containsKey(node)) {
            return false;
        }

        graph.put(node, new HashSet<T>());
        return true;
    }

    public void addNodes(Collection<T> nodes) {
        for (T node : nodes) {
            addNode(node);
        }
    }

    public void addEdge(T src, T dest) {
        validateSourceAndDestinationNodes(src, dest);

        // Add the edge by adding the dest node into the outgoing edges
        graph.get(src).add(dest);
    }

    public void removeEdge(T src, T dest) {
        validateSourceAndDestinationNodes(src, dest);

        graph.get(src).remove(dest);
    }

    public boolean edgeExists(T src, T dest) {
        validateSourceAndDestinationNodes(src, dest);

        return graph.get(src).contains(dest);
    }

    public Set<T> edgesFrom(T node) {
        // Check that the node exists.
        Set<T> edges = graph.get(node);
        if (edges == null)
            throw new NoSuchElementException("Source node does not exist.");

        return Collections.unmodifiableSet(edges);
    }

    public Iterator<T> iterator() {
        return graph.keySet().iterator();
    }

    public int size() {
        return graph.size();
    }

    public boolean isEmpty() {
        return graph.isEmpty();
    }

    private void validateSourceAndDestinationNodes(T src, T dest) {
        // Confirm both endpoints exist
        if (!graph.containsKey(src) || !graph.containsKey(dest))
            throw new NoSuchElementException("Both nodes must be in the graph.");
    }

}

BoatLocation.java

public enum BoatLocation {
    WestBank, EastBank
}

RiverRole.java, implemented Comparable interface in order to be used in a SortedSet

public class RiverRole implements Comparable<RiverRole> {

    public final String name;
    public final boolean canSailTheBoat;

    public RiverRole(String name, boolean canSailTheBoat) {
        this.name = name;
        this.canSailTheBoat = canSailTheBoat;
    }

    @Override
    public int compareTo(RiverRole o) {
        return this.name.compareTo(o.name);
    }

    @Override
    public String toString() {
        return name;
    }
}

RiverState.java

import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

public class RiverState {

    private SortedSet<RiverRole> westBank;
    private SortedSet<RiverRole> eastBank;
    private BoatLocation boatLocation;

    public RiverState(Collection<RiverRole> westBank, Collection<RiverRole> eastBank, BoatLocation boatLocation) {
        this.westBank = new TreeSet<RiverRole>(westBank);
        this.eastBank = new TreeSet<RiverRole>(eastBank);
        this.boatLocation = boatLocation;
    }

    public SortedSet<RiverRole> getWestBank() {
        return new TreeSet<RiverRole>(westBank);
    }

    public SortedSet<RiverRole> getEastBank() {
        return new TreeSet<RiverRole>(eastBank);
    }

    public BoatLocation getBoatLocation() {
        return boatLocation;
    }

    @Override
    public String toString() {
        return "State{westBank=" + westBank + ", eastBank=" + eastBank + ", boatLocation=" + boatLocation + "}";
    }

    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RiverState)) return false;

        RiverState riverState = (RiverState) o;

        return eastBank.equals(riverState.eastBank) &&
               westBank.equals(riverState.westBank) &&
               boatLocation.equals(riverState.boatLocation);
    }

    public int hashCode() {
        int result;
        result = westBank.hashCode();
        result = 31 * result + eastBank.hashCode();
        result = 31 * result + boatLocation.hashCode();
        return result;
    }
}

FarmerWolfGoatRiddle.groovy

import graph.DirectedGraph
import org.codehaus.groovy.runtime.MethodClosure

import static fun.rivercrossing.BoatLocation.EastBank
import static fun.rivercrossing.BoatLocation.WestBank

public class FarmerWolfGoatRiddle {

    /*  Farmer Wolf Goat Puzzle
        A farmer is on the east bank of a river with a wolf, goat and cabbage in his care. Without his presence the wolf would eat
        the goat or the goat would eat the cabbage. The wolf is not interested in the cabbage. The farmer wishes to bring his three
        charges across the river. However the boat available to him can only carry one of the wolf, goat and cabbage besides
        himself. The puzzle is: is there a sequence of river crossings so that the farmer can transfer himself and the other three all
        intact to the west bank?
    */

    RiverRole farmer = new RiverRole("farmer", true);
    RiverRole wolf = new RiverRole("wolf", false);
    RiverRole goat = new RiverRole("goat", false);
    RiverRole cabbage = new RiverRole("cabbage", false);

    RiverState initialState = new RiverState([farmer, wolf, goat, cabbage], [], BoatLocation.WestBank)
    RiverState finalState = new RiverState([], [farmer, wolf, goat, cabbage], EastBank)

    Closure<Boolean> areRolesHarmonisedOnRiverBank = { Collection<RiverRole> roles ->
        // describe clearly what the rules are
        if (!roles.contains(farmer)) {
            if (roles.contains(wolf) && roles.contains(goat)) return false
            if (roles.contains(goat) && roles.contains(cabbage)) return false
        }
        return true
    }


    public Collection<Deque<RiverState>> findAllPossibleSolutions() {
        DirectedGraph<RiverState> graph = buildCompleteGraph()

        Collection<Deque<RiverState>> solutions = []
        Deque<RiverState> currentSolution = new ArrayDeque<RiverState>()

        explore(graph, initialState, currentSolution, solutions)

        return solutions
    }

    // using Depth First Search (Stack Implementation) to identify a solution
    private void explore(DirectedGraph<RiverState> graph, RiverState currentNode, Deque<RiverState> currentSolution, Collection<Deque<RiverState>> solutions) {
        currentSolution.push(currentNode)
        graph.edgesFrom(currentNode).each { RiverState node ->
            if (node == finalState) {
                // add the current solution, including the final node, to the solutions
                currentSolution.push(node)
                solutions.add(new ArrayDeque<RiverState>(currentSolution))
                currentSolution.pop()
            } else {
                // recursively explore
                explore(graph, node, currentSolution, solutions)
            }
        }
        currentSolution.pop()
    }

    // using Breath First Search to build the complete graph
    private DirectedGraph<RiverState> buildCompleteGraph() {
        DirectedGraph<RiverState> graph = new DirectedGraph<RiverState>()
        graph.addNode(initialState)

        Set<RiverState> visitedNodes = new HashSet<RiverState>()

        // use a queue to keep the states to be visited in sequence
        Deque<RiverState> currentStates = new ArrayDeque<RiverState>()
        currentStates.add(initialState)

        while (!currentStates.isEmpty()) {
            if (currentStates.peek() != finalState) {
                def currentState = currentStates.removeFirst()
                if (!visitedNodes.contains(currentState)) {
                    visitedNodes.add(currentState)
                    def possibleStates = findNextPossibleStates(currentState)
                    def validStates = possibleStates.findAll { RiverState possibleState ->
                        // matched state should not be one of the visited states, and it should pass the validations
                        return !visitedNodes.contains(possibleState) && passRuleValidations(possibleState)
                    }
                    // add successors and edges
                    graph.addNodes(validStates)
                    validStates.each { graph.addEdge(currentState, it) }

                    // add to current states so that we can visit them next
                    currentStates.addAll(validStates)
                }
            } else {
                currentStates.removeFirst() // remove the final state from the currentStates queue
            }
        }

        return graph
    }

    private boolean passRuleValidations(RiverState state) {
        return areRolesHarmonisedOnRiverBank(state.eastBank) && areRolesHarmonisedOnRiverBank(state.westBank)
    }

    private List<RiverState> findNextPossibleStates(RiverState currentState) {
        // from the last visited state, find out who can sail the boat, and where is the sailor
        List<RiverState> possibleStates = []

        transitFrom(currentState, possibleStates)

        return possibleStates
    }

    private void transitFrom(RiverState currentState, List<RiverState> possibleStates) {
        // check where the boat is
        BoatLocation boatOrigin = currentState.boatLocation
        BoatLocation boatDestination

        // we use method closure to get a copy of west bank or east bank roles
        MethodClosure origin, destination

        // multiple assignments
        (origin, destination, boatDestination) = boatOrigin == WestBank ?
            [currentState.&getWestBank, currentState.&getEastBank, EastBank] :
            [currentState.&getEastBank, currentState.&getWestBank, WestBank]

        def possibleSailors = origin().findAll { RiverRole role -> role.canSailTheBoat }

        possibleSailors?.each { RiverRole sailor ->
            // pick 0 passengers onto the boat
            Collection<RiverRole> onBoat = [sailor]
            possibleStates.add(createNextPossibleState(origin, destination, onBoat, boatDestination))

            Collection<RiverRole> possiblePassengers = origin() as Collection<RiverRole>
            possiblePassengers.remove(sailor)

            // pick 1 passenger onto the boat
            possiblePassengers.each { RiverRole passenger ->
                onBoat = [passenger, sailor]
                possibleStates.add(createNextPossibleState(origin, destination, onBoat, boatDestination))
            }
        }
    }

    private RiverState createNextPossibleState(MethodClosure origin, MethodClosure destination, Collection<RiverRole> onBoat, BoatLocation boatDestination) {
        Collection<RiverRole> newDestination = destination() as Collection<RiverRole>
        Collection<RiverRole> newOrigin = origin() as Collection<RiverRole>

        // boat transits from origin to destination
        newOrigin.removeAll(onBoat)
        newDestination.addAll(onBoat)

        return boatDestination == WestBank ? new RiverState(newDestination, newOrigin, boatDestination) : new RiverState(newOrigin, newDestination, boatDestination)
    }


}

FarmerWolfGoatRiddleTest.groovy

import org.junit.Test

import static fun.rivercrossing.BoatLocation.EastBank
import static fun.rivercrossing.BoatLocation.WestBank


class FarmerWolfGoatRiddleTest {

    FarmerWolfGoatRiddle riddle = new FarmerWolfGoatRiddle()

    @Test
    public void findAllPossibleSolutions() {
        def solutions = riddle.findAllPossibleSolutions()
        // the ordering of solutions will be random
        assert solutions.size() == 2
        assert solutions[0].size() == 8
        assert solutions[1].size() == 8

        def assertAllSolutions = { Deque<RiverState> solution1, Deque<RiverState> solution2 ->
            assertFirstSolution(solution1)
            assertSecondSolution(solution2)
        }

        // assert with the correct expectations regardless of which is the first solution
        if (solutions[0].contains(new RiverState([riddle.wolf], [riddle.farmer, riddle.goat, riddle.cabbage], EastBank))) {
            assertAllSolutions(solutions[0], solutions[1])
        } else {
            assertAllSolutions(solutions[1], solutions[0])
        }
    }

    private void assertFirstSolution(Deque<RiverState> solution) {
        assert solution.poll() == new RiverState([riddle.farmer, riddle.wolf, riddle.goat, riddle.cabbage], [], WestBank)
        assert solution.poll() == new RiverState([riddle.wolf, riddle.cabbage], [riddle.farmer, riddle.goat], EastBank)
        assert solution.poll() == new RiverState([riddle.farmer, riddle.wolf, riddle.cabbage], [riddle.goat], WestBank)
        assert solution.poll() == new RiverState([riddle.wolf], [riddle.farmer, riddle.goat, riddle.cabbage], EastBank)
        assert solution.poll() == new RiverState([riddle.farmer, riddle.wolf, riddle.goat], [riddle.cabbage], WestBank)
        assert solution.poll() == new RiverState([riddle.goat], [riddle.farmer, riddle.wolf, riddle.cabbage], EastBank)
        assert solution.poll() == new RiverState([riddle.farmer, riddle.goat], [riddle.wolf, riddle.cabbage], WestBank)
        assert solution.poll() == new RiverState([], [riddle.farmer, riddle.wolf, riddle.goat, riddle.cabbage], EastBank)
        assert solution.poll() == null
    }

    private void assertSecondSolution(Deque<RiverState> solution) {
        assert solution.poll() == new RiverState([riddle.farmer, riddle.wolf, riddle.goat, riddle.cabbage], [], WestBank)
        assert solution.poll() == new RiverState([riddle.wolf, riddle.cabbage], [riddle.farmer, riddle.goat], EastBank)
        assert solution.poll() == new RiverState([riddle.farmer, riddle.wolf, riddle.cabbage], [riddle.goat], WestBank)
        assert solution.poll() == new RiverState([riddle.cabbage], [riddle.farmer, riddle.goat, riddle.wolf], EastBank)
        assert solution.poll() == new RiverState([riddle.farmer, riddle.cabbage, riddle.goat], [riddle.wolf], WestBank)
        assert solution.poll() == new RiverState([riddle.goat], [riddle.farmer, riddle.wolf, riddle.cabbage], EastBank)
        assert solution.poll() == new RiverState([riddle.farmer, riddle.goat], [riddle.wolf, riddle.cabbage], WestBank)
        assert solution.poll() == new RiverState([], [riddle.farmer, riddle.wolf, riddle.goat, riddle.cabbage], EastBank)
        assert solution.poll() == null
    }
}

Sunday 31 March 2013

Fibonacci numbers in Python and Java

Fibonacci numbers, introduced by Leonardo Pisano Bigollo nearly a thousand years ago, were used to solve a fascinating puzzle about the growth of an idealised rabbit population. Below are the first 11 Fibonacci numbers F_n where n=0, 1, 2,..., 10:

sequence	F₀	F₁	F₂	F₃	F₄	F₅	F₆	F₇	F₈	F₉	F₁₀
number	0	1	1	2	3	5	8	13	21	34	55

or we can describe the Fibonacci numbers as

F_n = F_n-1 + F_n-2 (when n>=2)

To write a program to return a Fibonacci number given a sequence, there are many possible implementations. I shall present 2 solutions implemented in both Python and Java, where the first one is using recursion, and the second one is using a while loop, however, the performances of these 2 solutions are drastically different when the sequence is bigger.
Below are the results when running a Python test script. The recursion way is already showing very poor performance comparing to the while loop way.

> python test_fib.py 
.Working out Fibonacci(40)=102334155 recursively took 67.310534 seconds
.Working out Fibonacci(40)=102334155 with while loop took 1.8e-05 seconds
Working out Fibonacci(50)=12586269025 with while loop took 1.6e-05 seconds
Working out Fibonacci(60)=1548008755920 with while loop took 1.8e-05 seconds
..
----------------------------------------------------------------------
Ran 4 tests in 67.312s

OK

If we look at the results when running a similar timed test in Java, the same recursion is not as bad as in Python, though still far poorer than the while loop implementation.

> /somewhere/jdk1.6.0_33/bin/java -classpath ".:" fun.Fib
Working out Fibonacci(40)=102334155 recursively took 408 ms
Working out Fibonacci(40)=102334155 with while loop took 2033 ns
Working out Fibonacci(50)=12586269025 with while loop took 2115 ns
Working out Fibonacci(60)=1548008755920 with while loop took 2368 ns

The reason why recursion is slow in this case is because there are a lot of duplicated operations as the sequence grows. For example, to solve F(5), we need to solve F(4) once, F(3) twice, F(2) three times, and F(1) twice, as shown below

or the number of recursion required given a sequence n is

[1+(n-2)]*(n-2)/2 + (n-3) or n²/2-n/2-2  (when n>=4)

The complexity of using recursion is then actually O(n²), where, on the other hand, the while loop is O(n).
Below are the source code that are used in the test.
fib.py

import datetime


class Fib:
    def recursively(self, n):
        if n == 0:
            return 0
        elif n <= 2:
            return 1
        else:
            return self.recursively(n - 1) + self.recursively(n - 2)

    def whileLoop(self, n):
        if n == 0:
            return 0
        previous, fib, currentIndex = 0, 1, 1
        while currentIndex < n:
            previous, fib = fib, previous + fib
            currentIndex += 1

        return fib

    def timedRecursively(self, n):
        start = datetime.datetime.now()
        result = self.recursively(n)
        end = datetime.datetime.now()
        elapsed = end - start
        print "Working out Fibonacci({})={} recursively took {} seconds".format(str(n), str(result), str(elapsed.total_seconds()))
        return result

    def timedWhileLoop(self, n):
        start = datetime.datetime.now()
        result = self.whileLoop(n)
        end = datetime.datetime.now()
        elapsed = end - start
        print "Working out Fibonacci({})={} with while loop took {} seconds".format(str(n), str(result), str(elapsed.total_seconds()))
        return result

test_fib.py

import unittest
from fib import Fib


class TestFib(unittest.TestCase):
    def setUp(self):
        self.fib = Fib()

    def test_recursively(self):
        self.assertEqual(self.fib.recursively(0), 0)
        self.assertEqual(self.fib.recursively(1), 1)
        self.assertEqual(self.fib.recursively(2), 1)
        self.assertEqual(self.fib.recursively(3), 2)
        self.assertEqual(self.fib.recursively(4), 3)
        self.assertEqual(self.fib.recursively(5), 5)
        self.assertEqual(self.fib.recursively(6), 8)
        self.assertEqual(self.fib.recursively(7), 13)
        self.assertEqual(self.fib.recursively(8), 21)
        self.assertEqual(self.fib.recursively(9), 34)
        self.assertEqual(self.fib.recursively(10), 55)

    def test_whileLoop(self):
        self.assertEqual(self.fib.whileLoop(0), 0)
        self.assertEqual(self.fib.whileLoop(1), 1)
        self.assertEqual(self.fib.whileLoop(2), 1)
        self.assertEqual(self.fib.whileLoop(3), 2)
        self.assertEqual(self.fib.whileLoop(4), 3)
        self.assertEqual(self.fib.whileLoop(5), 5)
        self.assertEqual(self.fib.whileLoop(6), 8)
        self.assertEqual(self.fib.whileLoop(7), 13)
        self.assertEqual(self.fib.whileLoop(8), 21)
        self.assertEqual(self.fib.whileLoop(9), 34)
        self.assertEqual(self.fib.whileLoop(10), 55)

    def test_timedRecursively(self):
        self.assertEqual(self.fib.timedRecursively(40), 102334155)

    def test_timedWhileLoop(self):
        self.assertEqual(self.fib.timedWhileLoop(40), 102334155)
        self.assertEqual(self.fib.timedWhileLoop(50), 12586269025)
        self.assertEqual(self.fib.timedWhileLoop(60), 1548008755920)

unittest.main()

Fib.java

package fun;

public class Fib {


    public static void main(String[] args) {
        recursively(1);
        whileLoop(1);

        long start = System.nanoTime();

        long result = recursively(40);
        long elapsed = (System.nanoTime() - start) / 1000000;
        System.out.println(String.format("Working out Fibonacci(%s)=%s recursively took %s ms", 40, result, elapsed));

        start = System.nanoTime();
        result = whileLoop(40);
        elapsed = (System.nanoTime() - start);
        System.out.println(String.format("Working out Fibonacci(%s)=%s with while loop took %s ns", 40, result, elapsed));

        start = System.nanoTime();
        result = whileLoop(50);
        elapsed = (System.nanoTime() - start);
        System.out.println(String.format("Working out Fibonacci(%s)=%s with while loop took %s ns", 50, result, elapsed));

        start = System.nanoTime();
        result = whileLoop(60);
        elapsed = (System.nanoTime() - start);
        System.out.println(String.format("Working out Fibonacci(%s)=%s with while loop took %s ns", 60, result, elapsed));
    }

    static long recursively(long i) {
        if (i == 0) return 0;
        if (i <= 2) return 1;
        else return recursively(i - 1) + recursively(i - 2);
    }

    static long whileLoop(long i) {
        long previous = 0, fib = 1, currentIndex = 1;
        while (currentIndex < i) {
            long newFib = previous + fib;
            previous = fib;
            fib = newFib;
            currentIndex++;
        }
        return fib;
    }
}

Saturday 30 March 2013

First impression of Python

Born some two decades ago, Python has certainly accumulated enough interests to become one of the most popular programming languages used in the financial industry.

Designed to be a write-less-do-more language, Python has certain aspects that differ from Java:

It is an interpreted language, which means no compilation is required.
It is not a strongly typed language
Its statement grouping is done by indentation instead of open and close braces
It does not require declaration of variable or argument
It supports complex number, for example (using Python interactive interpreter)

>>> 1j + 2J
(-2+0j)
>>> (1+2j) * complex(2,3)
(-4+7j)

Strings can be subscripted. For example, the first character of a string has index 0.

>>> word="HelloWorld"
>>> word[4]
'o'
>>> word[1:3]   # using the slice notation, returning characters from index 1 to 2
'el'

It supports multiple assignment, for example, 'a,b=b,a+b', where the expression on the right-hand side 'b,a+b' are evaluated first before any of the assignments take place
Not surprisingly, it supports lamda forms

>>> def incresedBy(n):
...     return lambda x: x + n
...
>>> f = incresedBy(42)
>>> f(1)
43

It supports map and reduce functions as part of its functional programming offerings
It supports Tuples, which are immutable and usually contain an heterogeneous sequence of elements

>>> t=123,321,'hello'
>>> t
(123, 321, 'hello')

It allows multiple inheritance, for example

class DerivedClassName(Base1, Base2, Base3):
    <statement-1>
    .
    .
    .
    <statement-N>

Useful Sed command examples in UNIX

Substitute a regular expression into a new value

sed 's|old|new|' < file1 > file2

echo "old, and some other old stuffs" | sed 's|old|new|'

echo "old, and some other old stuffs" | sed 's|old|new|g'

Pick any delimiter you like

echo "old, and some other old stuffs" | sed 's/old/new/g'
echo "old, and some other old stuffs" | sed 's:old:new:g'

Use & as the matched string

echo "old, and some other old stuffs" | sed 's|[a-z]*|(&)|'

echo "old, 123 and 456 some other old stuffs" | sed -r 's|[0-9]+|& &|'

Friday 29 March 2013

Useful Find command examples in UNIX

Find command is one of the most useful commands in UNIX, as there are many options out there to accomplish various search tasks.

Find files which have been modified for more than or less than a certain time

find . -mtime +30

find . -mtime -15

find . -mtime 10

-mmin n, which refers to file's data was last modified n minutes ago
-amin n, which refers to file's data was last accessed n minutes ago
-atime n, which refers to file's data was last accessed n days ago
-type f, which searches for regular file
-type d, which searches for directory,
-exec command, which executes command by passing in the find results,
-perm /mode, which searches files with given permission

Remove files which are older than n days

find /some-dir/* -mtimes +30 -type f -exec rm {} \;

find /some-dir/* -mtimes +30 -type f -print0 | xargs -0 rm

Find files which contain particular strings

find . -name "*.txt" -print | xargs grep "some-string"

Using Python SimpleHTTPServer to share files

There are cases when I want to share some files within the intranet to other users.

Python comes with a simple built-in module called SimpleHTTPServer that can turn any directory in the system into the web server directory. I literally just need a single command line to accomplish this task, given Python is shipped with openSUSE!

Assuming I am running linux, and I want to share my home directory ~

cd ~

Start up the http server on port 10001

youyang@monkey-dev-01:~> python -m SimpleHTTPServer 10001

Now the http server is running on port 10001. Open a browser and type the following address, where monkey-dev-01 happens to be my hostname

http://monkey-dev-01:10001

Since there is no index.html under my home directory, the files in the home directory will be listed. Now my purpose is served, but there is one slight problem -- the program is running on the foreground, meaning I can't do anything on that open terminal as it continues to display the http requests when I browse the directories. To change the running process from foreground to background, I need to press CTRL+Z to pause the process, as shown below

^Z
[1]+  Stopped                 python -m SimpleHTTPServer 10001

and type 'bg' to move the process to run in the background

youyang@monkey-dev-01:~> bg
[1]+ python -m SimpleHTTPServer 10001 &

We can also use 'jobs -l' to display the jobs in current session

youyang@monkey-dev-01:~> jobs -l
[1]+  4127 Running                 python -m SimpleHTTPServer 10001 &

We can also start up the http server running in the background directly, by appending a '&' at the end of the command

youyang@monkey-dev-01:~> python -m SimpleHTTPServer 10001&

Unix Cheat Sheet


List a directory		Alternatively, type 'ls --help' or 'man ls' for more information
	ls [path]	list given path without any options. Without the path, it is listing current directory
	ls -l [path]	list given path in long listing format
	ls -h [path]	list given path in human readable format
	ls -a [path]	list given path, including hidden files (which has prefix dot)
	ls -lrt [path]	list given path, combining different options. long listing format, sort by modification time in reverse order
	ls [path]	list given path, redirect listing results into a file
	ls [path] \| more	list given path, showing listing results on one screen at a time

Change to a directory
	cd <directory>	change to the given directory
	cd ~	change to user's home directory
	cd /	change to root directory
	cd ..	go to parent directory
	cd ../<directory>	go to a sibling directory

Print current working directory
	pwd	display the current working directory

Create directory(ies)
	mkdir <directory>	create the directory under current working directory
	mkdir -p <dir1>/<dir2>	create directory2 under directory1; create parent directory2 as needed

Remove files or directory(ies)
	rm <file>	remove the file
	rm -rf <directory>	remove the contents of given directory recursively without prompt

Move a file or a directory
	mv <source> <dest>	rename source as dest, and move source to dest

Copy a file or a directory
	cp <source> <dest>	make a copy from source to dest
	cp -r <source> <dest>	copy recursively from source directory to dest directory

Download a file from http website
	wget <url>	download the resource to current directory
	curl <url>	download the resource to current directory

View a text file
	less <file>	view file page by page, with a lot of features
	more <file>	view file with screen length
	cat <file>	view file, but it will scroll to the end of file
	cat <file> \| more	view file, and piped the file content to more command

Create a text file
	touch <file>	create an empty file
	vi <file>	use vi text editor to create a file (need to save the file though)
	echo "foo" >> <file>	create a file with content 'foo'
	cat > <file>	enter multi-line texts and press control-d to save the content to a file

Change a file's modification or access time
	stat <file1>	display the stat of given file
	touch -m -d '1 Mar 2013 02:53' file.txt	change the modification time of file.txt to be '1 Mar 2013 02:53', and create the file if it doesn't exist
	touch -a -d '1 Mar 2013 02:53' file.txt	change the access time of file.txt to be '1 Mar 2013 02:53', and create the file if it doesn't exist

Compare two files
	diff <file1> <file2>	show the differences between file1 and file2
	sdiff <file1> <file2>	show file1 and file2 side by side

Pipes and redirections
	<command> > <file>	redirect command output to a file, e.g. ll > file.txt writes current directory listings to file.txt
	<command> >> <file>	appends command output to the end of a file
	<command> < <file>	redirect the standard input from the file
	<command> < <file1> > <file2>	redirect the standard input from the file1, and then redirect the standard output to file2
	<command1> \| <command2>	pipe the output of command1 as the input of command2

File permissions		read=4, write=2, execute=1
	chmod 700 <file(s)>	file(s) owner can read, write and execute
	chmod 600 <file(s)>	file(s) owner can read and write, but not execute
	chmod 400 <file(s)>	file(s) owner can read only
	chmod 755 <file(s)>	file(s) owner can read, write and execute; both group owner and others can read and write
	chmod 644 <file(s)>	file(s) owner can read and write; both group owner and others can read only
	chmod u+x <file(s)>	grant file(s) owner execute permission
	chmod u-x <file(s)>	remove execute permission from file(s) owner
	chmod a+rw <file(s)>	grant everyone read and write permission to the file(s)

Other useful comands
	df -h	report file system disk space usage in human readable format
	du -h	estimate file space usage in human readable format
	date	display or set the current system date and time
	top	display linux tasks, as well as CPU and memory stats
	ps -aef	display a snapshot of current processes
	ln -s <target> <link_name>	create a symbolic link to the target with link_name
	alias	display current defined aliases or create an alias
	!<command>	invoke previous run of the given command

Wednesday 20 March 2013

Most import aspects of software development

After a few years of software development, it is a good time to recap what are the most important aspects that I should be aware of when working any project.

The code base should be well tested that it has unit-tests, integration/acceptance-tests, external/smoke-tests. Unit test is about testing a single class by mocking its dependencies if necessary. Integration or acceptance tests should be testing the APIs provided by your application, particular front-to-back workflows, story-based features (as someone, I should be able to do something, etc.). External or smoke tests are verifying that the contracts between your application and your external dependencies are intact. There are a few reasons why automated tests are important:
1. developers have the confidence to refactor or make functional changes as they can run the automated tests to check whether there are any breaking changes
2. no more painful and error prone manual tests conducted by the QA team -- cost savings, although UAT (User-Acceptance Test) is still required
3. greatly shortens the release cycle because you spend a lot less time doing the manual tests
Modeling of the business objects is important because it not only affects how easy it is to query your business objects from the persistent layer, but also the performance of querying. One of the project that I worked on was storing a map of data into an oracle database as a CLOB data type (imagine a map is marshalled as XML string and stored as a DB column). It turned out to be rather inefficient to query any field inside that CLOB column, as you have to load the whole CLOB to your application!
Availability and scalability are also crucial to any production system. Even Facebook does not guarantee that their query engine is available 100% of time (as they still have to do the manual failover of their name nodes, part of their Hadoop DFS cluster, from time to time). I won't treat Facebook as a mission critical application that has to be available all the time, but you can sense that to achieve a 100% availability, you will have to really think about your failover or DR (Disaster-Recovery) plan. Scalability is a tricky topic, as you can either scale horizontally by adding more nodes, or scale vertically by adding more cores/memories to the existing nodes (further reading on Amdahl's law). Depends on where the bottleneck of growth is, you might have database scalability, application scalability, caching scalability, etc.

Tuesday 19 March 2013

An efficient Stack implementation with getMin() and getMax() methods

When we use stack to keep a collection of numbers, we might want to know what are the minimum and maximum numbers in the stack.

Suppose we are pushing {72,12,55,78,22,34} into a stack, this is what happens

Action	Stack Elements (Top ... Bottom)	Min	Max
push(72)	72	72	72
push(12)	12,72	12	72
push(55)	55,12,72	12	72
push(78)	78,55,12,72	12	78
push(22)	22,78,55,12,72	12	78
push(34)	34,22,78,55,12,72	12	78

If we pop the stack above 3 times, the max should change to 72; if we pop 2 more times, the min should change to 72. It sounds like we need a data structure that keeps the current min and max when we push or pop the stack. Indeed, a stack is a perfect choice here. When pushing a number to the stack, we also push the current min and max values into the min and max stacks respectively. Similarly, when poping out a number from the stack, we also pop out from the min and max stacks, such that the top of the min and max stacks will always have the min and max values of the current stack respectively.

The complexity of the getMin() and getMax() methods are O(1), regardless of how many items are in the stack, same as the push(item) and pop() methods.

Here is an implementation of the Stack with getMin() and getMax() methods

import java.util.Stack;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SmartStack<E extends Comparable<E>> {

    private ReentrantReadWriteLock readWriteLock = new ReentrantReadWriteLock();
    private ReentrantReadWriteLock.ReadLock readLock = readWriteLock.readLock();
    private ReentrantReadWriteLock.WriteLock writeLock = readWriteLock.writeLock();

    private Stack<E> mins = new Stack<E>();
    private Stack<E> maxs = new Stack<E>();
    private Stack<E> delegate = new Stack<E>();

    public E push(E item) {
        writeLock.lock();
        try {
            // stores the current min to the mins stack
            if (mins.isEmpty() || (!mins.isEmpty() && item.compareTo(mins.peek()) < 0)) {
                mins.push(item);
            } else {
                mins.push(mins.peek());
            }

            // stores the current max to the maxs stack
            if (maxs.isEmpty() || (!maxs.isEmpty() && item.compareTo(maxs.peek()) > 0)) {
                maxs.push(item);
            } else {
                maxs.push(maxs.peek());
            }

            return delegate.push(item);
        } finally {
            writeLock.unlock();
        }
    }

    public E pop() {
        writeLock.lock();
        try {
            E item = delegate.pop();
            // pop both mins and maxs stacks, such that peeking on them will tell us the current min or max
            mins.pop();
            maxs.pop();
            return item;
        } finally {
            writeLock.unlock();
        }
    }

    public E peek() {
        readLock.lock();
        try {
            return delegate.peek();
        } finally {
            readLock.unlock();
        }
    }

    public E getMin() {
        readLock.lock();
        try {
            return mins.peek();
        } finally {
            readLock.unlock();
        }
    }

    public E getMax() {
        readLock.lock();
        try {
            return maxs.peek();
        } finally {
            readLock.unlock();
        }
    }

    public boolean empty() {
        readLock.lock();
        try {
            return delegate.empty();
        } finally {
            readLock.unlock();
        }
    }

    public int search(E item) {
        readLock.lock();
        try {
            return delegate.search(item);
        } finally {
            readLock.unlock();
        }
    }
    
}

Here is the unit test of the above class written in Groovy

import org.junit.Before
import org.junit.Test

class SmartStackTest {

    SmartStack stack

    @Before
    public void before() {
        stack = new SmartStack()
        stack.push(72)
        stack.push(12)
        stack.push(55)
        stack.push(78)
        stack.push(22)
        stack.push(34)
    }

    @Test
    public void canGetCurrentMinValue() {
        assert stack.min == 12

        5.times { stack.pop() }

        assert stack.min == 72
    }

    @Test
    public void canGetCurrentMaxValue() {
        assert stack.max == 78

        4.times { stack.pop() }

        assert stack.max == 72
    }

    @Test
    public void canPeek() {
        assert stack.peek() == 34
    }

    @Test
    public void isEmpty() {
        assert !stack.empty()
    }

    @Test
    public void canSearch() {
        // the 1-based position from the top of the stack where the object is located
        assert stack.search(34) == 1
        assert stack.search(22) == 2
        assert stack.search(78) == 3
        assert stack.search(55) == 4
        assert stack.search(12) == 5
        assert stack.search(72) == 6
        // not in the stack should return -1
        assert stack.search(11112) == -1
    }
    
}

Sorting objects with one-way dependency using Topological Sort with Directed Graphs

Suppose there are a bunch of objects to be evaluated, each of which might depend on a number of other objects being evaluated first before it can be evaluated. The objects being depended on might depend on other objects. There is no cyclic dependency that object A depends on object B, and object B depends on object C, and object C depends on object A.

To solve this problem, we want to sort these objects such that when traversing through these objects, they should be ready for evaluation and their dependencies, if applicable, should have been evaluated. The sorting algorithm and data structure that are applicable for this problem is Topological Sort with Directed Graphs.

If the edges in a graph have a direction, the graph is called a directed graph, as shown above. The arrows in the figure show the direction of the edges. We also include two vertices here to represent isolated objects that don't depend on other objects.

The arrows in the figure represent evaluation direction. For example, to evaluate E, we must first evaluate A and B. To evaluate H, we might want a sorted collection like this:

ADBECFDGH or ADGBECFH

There are many other possible ordering, depending on the approach you take to implement the topological sort.

DirectedGraph.java

import java.util.*;

public class DirectedGraph<T> implements Iterable<T> {

    // key is a Node, value is a set of Nodes connected by outgoing edges from the key
    private final Map<T, Set<T>> graph = new HashMap<T, Set<T>>();

    public boolean addNode(T node) {
        if (graph.containsKey(node)) {
            return false;
        }

        graph.put(node, new HashSet<T>());
        return true;
    }

    public void addNodes(Collection<T> nodes) {
        for (T node : nodes) {
            addNode(node);
        }
    }

    public void addEdge(T src, T dest) {
        validateSourceAndDestinationNodes(src, dest);

        // Add the edge by adding the dest node into the outgoing edges
        graph.get(src).add(dest);
    }

    public void removeEdge(T src, T dest) {
        validateSourceAndDestinationNodes(src, dest);

        graph.get(src).remove(dest);
    }

    public boolean edgeExists(T src, T dest) {
        validateSourceAndDestinationNodes(src, dest);

        return graph.get(src).contains(dest);
    }

    public Set<T> edgesFrom(T node) {
        // Check that the node exists.
        Set<T> edges = graph.get(node);
        if (edges == null)
            throw new NoSuchElementException("Source node does not exist.");

        return Collections.unmodifiableSet(edges);
    }

    public Iterator<T> iterator() {
        return graph.keySet().iterator();
    }

    public int size() {
        return graph.size();
    }

    public boolean isEmpty() {
        return graph.isEmpty();
    }

    private void validateSourceAndDestinationNodes(T src, T dest) {
        // Confirm both endpoints exist
        if (!graph.containsKey(src) || !graph.containsKey(dest))
            throw new NoSuchElementException("Both nodes must be in the graph.");
    }

}

TolopogicalSort.java

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TopologicalSort {

    public static <T> List<T> sort(DirectedGraph<T> graph) {
        DirectedGraph<T> reversedGraph = reverseGraph(graph);

        List<T> result = new ArrayList<T>();
        Set<T> visited = new HashSet<T>();

        /* We'll also maintain a third set consisting of all nodes that have
         * been fully expanded.  If the graph contains a cycle, then we can
         * detect this by noting that a node has been explored but not fully
         * expanded.
         */
        Set<T> expanded = new HashSet<T>();

        // Fire off a Depth-First Search from each node in the graph
        for (T node : reversedGraph)
            explore(node, reversedGraph, result, visited, expanded);

        return result;
    }


    /**
     * Recursively performs a Depth-First Search from the specified node, marking all nodes
     * encountered by the search.
     *
     * @param node     The node to begin the search from.
     * @param graph    The graph in which to perform the search.
     * @param result   A list holding the topological sort of the graph.
     * @param visited  A set of nodes that have already been visited.
     * @param expanded A set of nodes that have been fully expanded.
     */
    private static <T> void explore(T node, DirectedGraph<T> graph, List<T> result, Set<T> visited, Set<T> expanded) {
        if (visited.contains(node)) {
            // if this node has already been expanded, then it's already been assigned a
            // position in the final topological sort and we don't need to explore it again.
            if (expanded.contains(node)) return;

            // if it hasn't been expanded, it means that we've just found a node that is currently being explored,
            // and therefore is part of a cycle.  In that case, we should report an error.
            throw new IllegalArgumentException("A cycle was detected within the Graph when exploring node " + node.toString());
        }

        visited.add(node);

        // recursively explore all predecessors of this node
        for (T predecessor : graph.edgesFrom(node))
            explore(predecessor, graph, result, visited, expanded);

        result.add(node);
        expanded.add(node);
    }

    private static <T> DirectedGraph<T> reverseGraph(DirectedGraph<T> graph) {
        DirectedGraph<T> result = new DirectedGraph<T>();

        // Add all the nodes from the original graph
        for (T node : graph) {
            result.addNode(node);
        }

        // Scan over all the edges in the graph, adding their reverse to the reverse graph.
        for (T node : graph) {
            for (T endpoint : graph.edgesFrom(node)) {
                result.addEdge(endpoint, node);
            }
        }

        return result;
    }

}

Below demonstrate the setup of the directed graph and will print out the sorted result.

    public static void main(String[] args) {
        DirectedGraph graph = new DirectedGraph()
        graph.addNode("A");
        graph.addNode("B");
        graph.addNode("C");
        graph.addNode("D");
        graph.addNode("E");
        graph.addNode("F");
        graph.addNode("G");
        graph.addNode("H");
        graph.addNode("I");
        graph.addNode("J");

        graph.addEdge("A", "D");
        graph.addEdge("A", "E");
        graph.addEdge("B", "E");
        graph.addEdge("C", "F");
        graph.addEdge("D", "G");
        graph.addEdge("E", "H");
        graph.addEdge("F", "H");
        graph.addEdge("G", "H");

        List sorted = TopologicalSort.sort(graph)

        System.out.println(sorted)
    }

Merge Sort in Java

The merge sort is a much more efficient sorting algorithm than those simple ones, for example, bubble sort, selection sort, and insertion sort. While the simple sorting algorithms have O(N²) complexity, the merge sort is O(N*logN).

The merge sort routine tries to break down an array into 2 sub-arrays, sort these sub-arrays, and merge them back, in a recursive way, as you shall see in the implementation below, as well as the graph below.

import java.util.Arrays;
import java.util.Random;

public class MergeSortTest {

    static int numberOfElements = 10;
    static int maxValue = 100;

    public static void main(String[] args) {
        int[] original = createArrayWithRandomValues();
        // workspace array is used to keep the partial sorted results
        int[] workspace = new int[original.length];
        mergeSort(original, workspace, 0, numberOfElements - 1);

        System.out.println("The sorted array is " + Arrays.toString(original));
    }

    private static int[] createArrayWithRandomValues() {
        Random random = new Random();
        int[] original = new int[numberOfElements];
        for (int i = 0; i < numberOfElements; i++) {
            original[i] = random.nextInt(maxValue);
        }
        System.out.println("The original array is " + Arrays.toString(original));
        return original;
    }

    private static void mergeSort(int[] original, int[] workSpace, int lowerBound, int upperBound) {
        if (lowerBound != upperBound) {
            // find midpoint
            int mid = (lowerBound + upperBound) / 2;
            // sort lower half
            mergeSort(original, workSpace, lowerBound, mid);
            // sort upper half
            mergeSort(original, workSpace, mid + 1, upperBound);
            // merge them
            merge(original, workSpace, lowerBound, mid, upperBound);
        }
    }

    private static void merge(int[] original, int[] workSpace, int lowerBound, int middle, int upperBound) {
        // Copy both parts into the workspace array
        System.arraycopy(original, lowerBound, workSpace, lowerBound, upperBound + 1 - lowerBound);

        int leftPointer = lowerBound;
        int rightPointer = middle + 1;
        int originalPointer = lowerBound;

        // Copy the smallest values from either the left or the right side back to the original array
        while (leftPointer <= middle && rightPointer <= upperBound) {
            if (workSpace[leftPointer] <= workSpace[rightPointer]) {
                // copy and move the pointer on workspace to the next smallest value on left (if there is any)
                original[originalPointer++] = workSpace[leftPointer++];
            } else {
                // copy and move the pointer on workspace to the next smallest value on right (if there is any)
                original[originalPointer++] = workSpace[rightPointer++];
            }
        }
        // Copy the rest of the left side of the array into the target array
        while (leftPointer <= middle) {
            original[originalPointer++] = workSpace[leftPointer++];
        }

        // if there are elements left in the right side of the workspace, those
        // are the largest values of the merged array, so no action is required
    }
    
}

Monday 18 March 2013

Insertion Sort in Java

Insertion sort is about twice as fast as the bubble sort, and faster than selection sort.

In an insertion sort routine, we will pick a marker (starting from index=1), store the marked number, and compare the marked number with its neighbours on the left. Shift all the neighbours, which are larger than the marked number, one position to the right. After the shift, insert the marked number to the empty slot. Repeat this procedure until the marker index points to the last number on the right.

Below demonstrates one of the implementations of the insertion sort algorithm.

import java.util.Arrays;
import java.util.Random;

public class InsertionSortTest {

    static int numberOfElements = 10;
    static int maxValue = 100;

    public static void main(String[] args) {
        int[] original = createArrayWithRandomValues();

        insertionSort(original);

        System.out.println("The sorted array is " + Arrays.toString(original));
    }

    private static int[] createArrayWithRandomValues() {
        Random random = new Random();
        int[] original = new int[numberOfElements];
        for (int i = 0; i < numberOfElements; i++) {
            original[i] = random.nextInt(maxValue);
        }
        System.out.println("The original array is " + Arrays.toString(original));
        return original;
    }

    private static void insertionSort(int[] original) {
        int markerIndex, emptySlotIndex;
        for (markerIndex = 1; markerIndex < original.length; markerIndex++) {
            int marked = original[markerIndex]; // use a temp variable to store the marked number
            emptySlotIndex = markerIndex;
            while (emptySlotIndex > 0 && original[emptySlotIndex - 1] >= marked) {
                original[emptySlotIndex] = original[emptySlotIndex - 1]; // shift element right
                emptySlotIndex--; // moves the empty slot index one position to the left
            }
            original[emptySlotIndex] = marked; // insert marked element
        }
    }
    
}

Selection Sort in Java

Selection sort is slightly faster than bubble sort because the number of required swaps is reduced from O(N²) to O(N), while the number of comparisons remains O(N²).

Below demonstrates one of the implementations of the selection sort algorithm.

import java.util.Arrays;
import java.util.Random;

public class SelectionSortTest {

    static int numberOfElements = 10;
    static int maxValue = 100;

    public static void main(String[] args) {
        int[] original = createArrayWithRandomValues();

        selectionSort(original);

        System.out.println("The sorted array is " + Arrays.toString(original));
    }

    private static int[] createArrayWithRandomValues() {
        Random random = new Random();
        int[] original = new int[numberOfElements];
        for (int start = 1; start <= numberOfElements; start++) {
            original[start - 1] = random.nextInt(maxValue);
        }
        System.out.println("The original array is " + Arrays.toString(original));
        return original;
    }


    private static void selectionSort(int[] original) {
        int outer, inner, minIndex;
        for (outer = 0; outer < original.length - 1; outer++) { // from 1st element to next-to-last element
            minIndex = outer; // each round the min index starts from the outer index
            for (inner = outer + 1; inner < original.length; inner++) { // from outer + 1 element to the last element
                if (original[inner] < original[minIndex]) { // remember the index for the new minimum
                    minIndex = inner;
                }
            }

            // swap the elements between outer index and min index
            int temp = original[outer];
            original[outer] = original[minIndex];
            original[minIndex] = temp;

        }
    }
    
}

Bubble Sort in Java

Given an array of n (n>1) numbers, we want to sort them such that the smallest number is on the left, and the largest number is on the right. This is where all the sorting fun comes from.

Bubble sort is one of the slowest of all sorting algorithms, but it is conceptually the simplest.

In bubble sort routine, we start by comparing two numbers in the array, say the 1st (array[0]) and 2nd (array[1]) numbers. If the 1st number is larger than the 2nd number, we swap them, otherwise don't do anything. Then we move over one position and compare the 2nd and 3rd numbers, and so on, until we've completed the comparison of the second last and the last number. At this point, the largest number has been bubbled up all the way to the right (array[n-1]). On the 2nd round, we only need to do the comparison and swap from array[0] to array[n-2], which will ensure array[n-2] has the 2nd largest value in the array. Continue like this until the (n-1)th round.

The algorithm complexity, when expressed in Big O notation, is O(N²). This is because, given an array of 10 numbers, you will need 9 comparisons on the first round to bubble the largest number to the right, and 8 comparisons on the second round, and so on, while the last round requires 1 comparison. That is

9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 45, or n*(n-1)/2,

where n is the number of elements in the array. On the other hand, a swap is only required when the number on the left is larger than the one on the right, although in the worst case scenario, when the initial array is sorted such that the largest number is on the left and the smallest number is on the right, a swap is necessary for every comparison. Therefore, the number of swaps and comparisons are proportional to n².This is how we work out bubble sort's complexity O(N²).

Below demonstrates one of the implementations of the bubble sort algorithm.

import java.util.Arrays;
import java.util.Random;

public class BubbleSortTest {

    static int numberOfElements = 10;
    static int maxValue = 100;

    public static void main(String[] args) {
        int[] original = createArrayWithRandomValues();

        bubbleSort(original);

        System.out.println("The sorted array is " + Arrays.toString(original));
    }

    private static int[] createArrayWithRandomValues() {
        Random random = new Random();
        int[] original = new int[numberOfElements];
        for (int start = 1; start <= numberOfElements; start++) {
            original[start - 1] = random.nextInt(maxValue);
        }
        System.out.println("The original array is " + Arrays.toString(original));
        return original;
    }

    private static void bubbleSort(int[] original) {
        int outer, inner;
        for (outer = original.length - 1; outer > 1; outer--) { // from last element to the 2nd element
            for (inner = 0; inner < outer; inner++) { // from the 1st element to outer index - 1
                if (original[inner] > original[inner + 1]) { // bubble up the bigger value to the right
                    int temp = original[inner];
                    original[inner] = original[inner + 1];
                    original[inner + 1] = temp;
                }
            }
        }
    }

}