Tuesday, 18 June 2013

Use Teamcity as the Continuous Integration and Build Management Server

I've been using Teamcity as the continuous integration and build management server for the last few years. Not until recently I tried out Hudson (or Jenkins) as the alternative solution. It is quite frustrated to learn that Hudson, being an open-sourced project, doesn't even support remote build (where I can submit my local changes to Teamcity and run a selected builds, and commit once all the builds are successful).

In the scenario where the project has quite a number of builds for various tests, e.g. unit tests, integration tests, external tests, smoke tests, etc., running these tests with my local changes to check whether I break any existing tests or not is rather crucial in agile development approach, where I always want to have a feedback in the shortest amount of time. Of course, without being able to do a remote run with multiple test suites running in parallel, you may argue that I can still run the test suites locally one by one, and only check in when all the tests pass locally, in which case, Hudson is sufficient enough and it is FREE!

I don't intend to devote my blog to compare Teamcity and Hudson, but given the experiences of using both, I can conclude that Teamcity is no doubt the first choice in terms of CI and build management, as its feature is a perfect match for a developer with TDD (Test-Driven Development) and agile mind set. Having said that, I am gonna demonstrate how easy it is to setup the Teamcity for my project hosted on Git. JetBrains does offer a Teamcity professional version which is FREE and comes with a limit of 20 build configurations and 3 build agents.

Installation

  • Download the latest version for Linux (or whichever suits your OS)

  • Unzip the downloaded file, e.g. tar -zxvf TeamCity-7.1.5.tar.gz
    youyang@monkey-dev-01:/app> ll
    total 12
    drwxr-xr-x  6 youyang users 4096 Mar 22 02:31 monkey
    drwxr-xr-x 12 youyang users 4096 Jun 14 09:31 TeamCity
    drwxr-xr-x 13 youyang users 4096 Jun 15 09:11 TeamCity-BuildAgent
    youyang@monkey-dev-01:/app> 
    

  • start up teamcity, e.g. ./app/TeamCity/bin/startup.sh , on port 8111 by default


  • Configuration

  • browse to http://localhost:8111 to verify that Teamcity has started up successfully, which will take a minute or two before it asks you to setup an admin account

  • once the admin account is created and logged in, go to the Administration menu to create a new Project. Specify name and description on the General tab, shown as below

  • create a new VCS root on the VCS Roots tab, and choose Git (or whatever VCS you use), then fill in the information as shown below

  • go back to the project config, and create a build configuration

  • complete a minimum 3-step configuration for the build. Fill in the build name and description for Step 1; Choose the VCS root specified before in Step 2; Specify the build step with Ant (or other build runners the project is using), as shown below

  • this is the minimum configuration for a project with only one build configuration, which is good enough for the demo

  • Teamcity Agent

  • With a project setup on Teamcity server, we still need to install agents so that we can run the builds on them. Go Administration->Install Build Agents, notice that the link for the zip file distribution is http://monkey-dev-01:8111/update/buildAgent.zip

  • On the machine that you want to install your build agent, wget the zip file from, e.g. http://monkey-dev-01:8111/update/buildAgent.zip, unzip to, e.g. /app/TeamCity-BuildAgent folder

  • create the buildAgent.properties from the template file buildAgent.dist.properties under, e.g. /app/TeamCity-BuildAgent/conf folder

  • setup the following properties in the buildAgent.properties file

  • # name of the agent appearing on Teamcity server
    name=Monkey-Dev-01
    # setup the JDK
    env.JAVA_HOME=/app/monkey/java
    
  • start up the agent in the background by typing "/app/TeamCity-BuildAgent/bin/agent.sh start"

  • Wait a few minutes before your agent shown under the Agents->Connected tab on Teamcity server, as shown below

  • Once the agent is connected, you are ready to run your first build on Teamcity:

  • If you are using Intellij IDE, install the Teamcity plugin so that you can send a remote run to the Teamcity server by selecting which builds to run.


  • Friday, 10 May 2013

    Bitwise Operations in Java

    The bitwise operations in Java are rarely discussed, and you can sense that by reading the official tutorials on this topic. However, there are interesting usages that you should probably be aware of.

    The bitwise operators operates on integral types (int and long). Before explaining how bitwise operators work, you should have an idea of how integers are represented in the binary form. As mentioned by the official primitive data type tutorials, the int data type is a 32-bit signed 2's complement integer, whose value ranges from -2^32 to 2^32-1, inclusively. On the other hand, the long data type is a 64-bit signed 2's complement integer, whose value ranges from -2^64 to 2^64-1, inclusively. In both cases, the highest order bit of int and long shows the sign of the value and is not part of the numeric value, where 0 being positive and 1 being negative.

    For example, an int value of 1 in binary form looks like this
    00000000000000000000000000000001
    
    while an int value of -1 in binary form looks like this
    11111111111111111111111111111111
    
    The negative value is transformed in 2 steps:
    Step 1: bitwise complement of +1, such that
    00000000000000000000000000000001 
    becomes
    11111111111111111111111111111110
    
    Step 2: add 1, such that
    11111111111111111111111111111110 
    becomes
    11111111111111111111111111111111
    
    Given a negative value, where the highest (leftmost) bit is 1, you can work out what value the binary form represents in 2 steps:
    Step 1: bitwise complement of the binary value, such that
    11111111111111111111111111111011
    becomes
    00000000000000000000000000000100
    
    Step 2: add 1, such that
    00000000000000000000000000000100
    becomes
    00000000000000000000000000000101
    
    Therefore, the original binary form represents -5.

    The bitwise operators

    OperatorNameExampleResultDescription
    a & band6 & 901 if both bits are 1, or 0 if either bit is 0
    a | bor 6 | 9151 if either bit is 1, or 0 if both bits are 0
    a ^ bxor 6 ^ 9151 if both bits are different, 0 if both bits are the same
    ~anot ~9 -10Inverts the bits. (equivalent to +1 and then change sign)
    n << pleft shift60 << 2240Shifts the bits of n left p positions. Zero bits are shifted into the low-order positions.
    n >> pright shift60 >> 215Shifts the bits of n right p positions. If n is a 2's complement signed number, the sign bit is shifted into the high-order positions.
    n >>> punsigned right shift-60 >>> 204095Shifts the bits of n right p positions. Zeros are shifted into the high-order positions.

    Packing and Unpacking

    Multiple int values of limited range can be "packed" into one int, by using the bitwise operators. For example, we have the following integer variables age (range 0~255, or 8-bit), gender (0~1, or 1-bit), weight (range 0~255, or 8-bit), and height (range 0-255, or 8-bit). These integers can be packed and unpacked into/from a 32-bit integer like this:

    Represent boolean values in a byte

    Suppose we have 8 boolean values that are true or false, and we want to store them in a single byte, and sent it across the network, and then unpack it and use it to toggle 8 different things:


    Multiply and Divide

    When using x << 3, it is equivalent to multiply x by 8 (or 2^3); similarly, when using x >> 2, it is equivalent to divide x by 4 (or 2^2). In fact, a smart enough compiler will replace your multiplication or division expressions using bit shift operators such that it is more compact and faster. So there is no need to explicitly use bit shift operator if you want multiply or divide by a power of 2!

    Thursday, 18 April 2013

    I/O Redirection in Bash

    Since Bash is a superset of Bourne shell, the IO redirection is the same across the Bourne shell family. To find out exactly which shell you are using in Unix or Linux, typing in command
    echo $SHELL
    
    shall display an executable command with its path, for example
    PathShell
    /bin/shBourne shell
    /bin/bashBourne Again SHell
    /bin/cshC SHell
    /bin/kshKorn SHell

    Bash takes input from standard input (stdin), writes output to standard output (stdout), and writes error output to standard error (stderr). Standard input is connected to the terminal keyboard, while standard output and error are connected to the terminal screen, by default. Redirection of I/O is accomplished by using a redirection metacharacter followed by the desired destination (stdout & stderr) or source (stdin). Bash uses file descriptor numbers to refer to the standard I/O, where 0 is standard input, 1 is standard output, and 2 is standard error. Below are the common redirection for the Bourne shell family:

    Meta CharacterAction
    >Redirect standard output
    >>Append to standard output
    2>Redirect standard error
    2>&1Redirect standard error to standard ouput
    2>&1|Redirect standard error to standard ouput, and pipe them to another command
    <Redirect standard input
    |Pipe standard output to another command

    Please note that because < and > are referring to standard input and output respectively, the numbers 0 and 1 are not required in this case.

    Thursday, 4 April 2013

    Sorting Algorithm Performance Comparisons with Caliper's Micro Benchmarking

    After inspired by Martin Thompson's presentation on Performance Testing Java Application, I decided to try out the micro benchmarking framework Caliper on the sorting algorithms that I recently wrote in order to understand better how different sorting algorithms behaves given an array of randomly distributed values.

    It was actually a bit involving to find out where I could download the caliper JAR file, as it was not available on their google code base. You can download it from the maven repository, or grab it from my git described at the bottom of this article.

    I created a SortingAlgorithmsBenchmark class that extends SimpleBenchmark, and some public methods, whose names have prefix 'time'. Then use the caliper Runner to run the benchmarks on the SortingAlgorithmsBenchmark class, as shown below
    import com.google.caliper.Runner;
    import com.google.caliper.SimpleBenchmark;
    import sorting.*;
    
    import java.util.Arrays;
    import java.util.Random;
    
    public class SortingAlgorithmsBenchmark extends SimpleBenchmark {
    
        private static final int SIZE = 10000;
        private static final int MAX_VALUE = 10000;
        private int[] values;
    
        @Override
        protected void setUp() throws Exception {
            values = new int[SIZE];
            Random generator = new Random();
            for (int i = 0; i < values.length; i++) {
                values[i] = generator.nextInt(MAX_VALUE);
            }
        }
    
        public void timeBubbleSort(int reps) {
            for (int i = 0; i < reps; i++) {
                BubbleSort.sort(values);
            }
        }
    
        public void timeInsertionSort(int reps) {
            for (int i = 0; i < reps; i++) {
                InsertionSort.sort(values);
            }
        }
    
        public void timeSelectionSort(int reps) {
            for (int i = 0; i < reps; i++) {
                SelectionSort.sort(values);
            }
        }
    
        public void timeMergeSort(int reps) {
            for (int i = 0; i < reps; i++) {
                MergeSort.sort(values);
            }
        }
    
        public void timeShellSort(int reps) {
            for (int i = 0; i < reps; i++) {
                ShellSort.sort(values);
            }
        }
    
        public void timeArraysSort(int reps) {
            for (int i = 0; i < reps; i++) {
                Arrays.sort(values);
            }
        }
    
        public static void main(String[] args) {
            Runner.main(SortingAlgorithmsBenchmark.class, args);
        }
    
    }
    

    Here are the results when running the benchmarks with a random array of size 10,000. Apparently, bubble sort and selection sort are the slowest of all, with selection sort behaving rather unstable (you will notice the selection sort benchmark is quite different across different runs. It is pleased to see that the sorting algorithm (Dual-Pivot Quick Sort) used by Arrays performs fairly well with a medium size array.

    0% Scenario{vm=java, trial=0, benchmark=BubbleSort} 75703105.85 ns; σ=3428843.80 ns @ 10 trials
    17% Scenario{vm=java, trial=0, benchmark=InsertionSort} 73305.66 ns; σ=712.37 ns @ 6 trials
    33% Scenario{vm=java, trial=0, benchmark=SelectionSort} 49624130.05 ns; σ=345774.06 ns @ 3 trials
    50% Scenario{vm=java, trial=0, benchmark=MergeSort} 478932.15 ns; σ=683.99 ns @ 3 trials
    67% Scenario{vm=java, trial=0, benchmark=ShellSort} 262061.27 ns; σ=11605.29 ns @ 10 trials
    83% Scenario{vm=java, trial=0, benchmark=ArraysSort} 8063.53 ns; σ=20167.07 ns @ 10 trials
    
        benchmark       us linear runtime
       BubbleSort 75703.11 ==============================
    InsertionSort    73.31 =
    SelectionSort 49624.13 ===================
        MergeSort   478.93 =
        ShellSort   262.06 =
       ArraysSort     8.06 =
    
    vm: java
    trial: 0
    

    Increasing the array size to 100,000, and excluding the bubble sort and selection sort in the benckmarks, we have another set of results, where insertion sort starts to behave terribly in this case, and the dual-pivot quick sort is still the quickest.

    0% Scenario{vm=java, trial=0, benchmark=InsertionSort} 2031873339.00 ns; σ=3247182.51 ns @ 3 trials
    25% Scenario{vm=java, trial=0, benchmark=MergeSort} 5701257.43 ns; σ=37183.00 ns @ 3 trials
    50% Scenario{vm=java, trial=0, benchmark=ShellSort} 2277851.07 ns; σ=21077.62 ns @ 3 trials
    75% Scenario{vm=java, trial=0, benchmark=ArraysSort} 210179.59 ns; σ=164843.22 ns @ 10 trials
    
        benchmark      us linear runtime
    InsertionSort 2031873 ==============================
        MergeSort    5701 =
        ShellSort    2278 =
       ArraysSort     210 =
    
    vm: java
    trial: 0
    

    You can find the source code for the sorting algorithms here, and the benchmarks runner here, all of which are part of my github project DataStructureAndAlgorithms

    Wednesday, 3 April 2013

    A Different Builder Pattern Example in Java

    Builder design pattern is often used to separate the construction of a complex object from its various representations. I will show you a nicer way of writing a Builder pattern, rather than the example shown in the wiki.

    First of all, I created a POJO called Vehicle, which has some simple properties about its model, its make, etc. Then, nested within this POJO, I created a Builder that holds some default properties which can be used to build a vehicle. The builder can be created by a static method newBuilder(), and we can override the vehicle properties by chaining the with* methods, and finally return the constructed vehicle using the build() method.

    public class Vehicle {
    
        private String model;
        private String make;
        private int numberOfWheels;
        private double height;
        private double width;
        private double length;
        private int numberOfDoors;
    
        private Vehicle(String model, String make, int numberOfWheels, double height, double width, double length, int numberOfDoors) {
            this.model = model;
            this.make = make;
            this.numberOfWheels = numberOfWheels;
            this.height = height;
            this.width = width;
            this.length = length;
            this.numberOfDoors = numberOfDoors;
        }
    
        public static class Builder {
    
            private String model = "M5";
            private String make = "BMW";
            private int numberOfWheels = 4;
            private double height = 1.45;
            private double width = 1.9;
            private double length = 4.62;
            private int numberOfDoors = 5;
    
            public static Builder newBuilder() {
                return new Builder();
            }
    
            public Builder withModel(String model) {
                this.model = model;
                return this;
            }
    
            public Builder withMake(String make) {
                this.make = make;
                return this;
            }
    
            public Vehicle build() {
                return new Vehicle(model, make, numberOfWheels, height, width,length, numberOfDoors);
            }
        }
    }
    

    Let me use a unit test, written in Groovy, to demonstrate exactly how we use the builder. I created a test to show how we can build a default vehicle without providing any overrides to the default vehicle properties, and another test to show how we can override the default vehicle properties to create a different vehicle.

    import org.junit.Test
    
    class VehicleTest {
    
        @Test
        public void canBuildADefaultVehicle() {
            def defaultVehicle = Vehicle.Builder.newBuilder().build()
            assert defaultVehicle.model == "M5"
            assert defaultVehicle.make == "BMW"
            assert defaultVehicle.numberOfWheels == 4
            assert defaultVehicle.height == 1.45
            assert defaultVehicle.width == 1.9
            assert defaultVehicle.length == 4.62
            assert defaultVehicle.numberOfDoors == 5
        }
    
        @Test
        public void canBuildADifferentVehicle() {
            def defaultVehicle = Vehicle.Builder.newBuilder().withModel("M3").build()
            assert defaultVehicle.model == "M3"
            assert defaultVehicle.make == "BMW"
            assert defaultVehicle.numberOfWheels == 4
            assert defaultVehicle.height == 1.45
            assert defaultVehicle.width == 1.9
            assert defaultVehicle.length == 4.62
            assert defaultVehicle.numberOfDoors == 5
        }
    }
    

    Some says the chaining syntax makes the building block easier to read. :)

    Tuesday, 2 April 2013

    Solving Farmer-Wolf-Goat-Cabbage riddle with Groovy/Java


  • The Riddle - Farmer Wolf Goat Cabbage
  • A farmer is on the west bank of a river with a wolf, a goat and a cabbage in his care. Without his presence the wolf would eat the goat or the goat would eat the cabbage. The wolf is not interested in the cabbage. The farmer wishes to bring his three charges across the river. However the boat available to him can only carry one of the wolf, goat and cabbage besides himself. The puzzle is: is there a sequence of river crossings so that the farmer can transfer himself and the other three all intact to the east bank?
  • The Solution
  • It is a task that given an initial state, a final state, and a set of rules, one will start from the initial state and try to reach to the final state by passing through some intermediate states. Each move will transform from one state to the other, and there might be multiple valid moves from a given state. Such a collection of interconnected states can be represented by State Space Graph.
    Let's use 'f', 'c', 'g', 'w' to denote the farmer, the cabbage, the goat, and the wolf, and use '|' to separate the river where left of the '|' denotes west bank and right of the '|' denotes east bank. Initially, they are all at the west bank of the river, which is represented as 'fcgw |' as shown below. We can solve the riddle by figuring out what the possible and valid moves are, using either Breadth-First Search or Depth-First Search, on a state space graph shown below
       

  • Depth First Search
  • I use DFS to find the first possible solution to the riddle, where it looks like this

    or with a possibility of backtracking like this

  • Breadth First Search
  • I can build a complete state space graph using BFS, excluding the red states as shown below

  • Sample Source Code
  • Implemented in a mix of Groovy and Java, where Java is mainly used for the POJOs, the sample source code, including the unit test, demonstrates how to build a complete state space graph (which is also a directed graph) and how to find all possible solutions based on the state space graph.
    DirectedGraph.java
    import java.util.*;
    
    public class DirectedGraph<T> implements Iterable<T> {
    
        // key is a Node, value is a set of Nodes connected by outgoing edges from the key
        private final Map<T, Set<T>> graph = new HashMap<T, Set<T>>();
    
        public boolean addNode(T node) {
            if (graph.containsKey(node)) {
                return false;
            }
    
            graph.put(node, new HashSet<T>());
            return true;
        }
    
        public void addNodes(Collection<T> nodes) {
            for (T node : nodes) {
                addNode(node);
            }
        }
    
        public void addEdge(T src, T dest) {
            validateSourceAndDestinationNodes(src, dest);
    
            // Add the edge by adding the dest node into the outgoing edges
            graph.get(src).add(dest);
        }
    
        public void removeEdge(T src, T dest) {
            validateSourceAndDestinationNodes(src, dest);
    
            graph.get(src).remove(dest);
        }
    
        public boolean edgeExists(T src, T dest) {
            validateSourceAndDestinationNodes(src, dest);
    
            return graph.get(src).contains(dest);
        }
    
        public Set<T> edgesFrom(T node) {
            // Check that the node exists.
            Set<T> edges = graph.get(node);
            if (edges == null)
                throw new NoSuchElementException("Source node does not exist.");
    
            return Collections.unmodifiableSet(edges);
        }
    
        public Iterator<T> iterator() {
            return graph.keySet().iterator();
        }
    
        public int size() {
            return graph.size();
        }
    
        public boolean isEmpty() {
            return graph.isEmpty();
        }
    
        private void validateSourceAndDestinationNodes(T src, T dest) {
            // Confirm both endpoints exist
            if (!graph.containsKey(src) || !graph.containsKey(dest))
                throw new NoSuchElementException("Both nodes must be in the graph.");
        }
    
    }
    
    BoatLocation.java
    public enum BoatLocation {
        WestBank, EastBank
    }
    
    RiverRole.java, implemented Comparable interface in order to be used in a SortedSet
    public class RiverRole implements Comparable<RiverRole> {
    
        public final String name;
        public final boolean canSailTheBoat;
    
        public RiverRole(String name, boolean canSailTheBoat) {
            this.name = name;
            this.canSailTheBoat = canSailTheBoat;
        }
    
        @Override
        public int compareTo(RiverRole o) {
            return this.name.compareTo(o.name);
        }
    
        @Override
        public String toString() {
            return name;
        }
    }
    
    RiverState.java
    import java.util.Collection;
    import java.util.SortedSet;
    import java.util.TreeSet;
    
    public class RiverState {
    
        private SortedSet<RiverRole> westBank;
        private SortedSet<RiverRole> eastBank;
        private BoatLocation boatLocation;
    
        public RiverState(Collection<RiverRole> westBank, Collection<RiverRole> eastBank, BoatLocation boatLocation) {
            this.westBank = new TreeSet<RiverRole>(westBank);
            this.eastBank = new TreeSet<RiverRole>(eastBank);
            this.boatLocation = boatLocation;
        }
    
        public SortedSet<RiverRole> getWestBank() {
            return new TreeSet<RiverRole>(westBank);
        }
    
        public SortedSet<RiverRole> getEastBank() {
            return new TreeSet<RiverRole>(eastBank);
        }
    
        public BoatLocation getBoatLocation() {
            return boatLocation;
        }
    
        @Override
        public String toString() {
            return "State{westBank=" + westBank + ", eastBank=" + eastBank + ", boatLocation=" + boatLocation + "}";
        }
    
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof RiverState)) return false;
    
            RiverState riverState = (RiverState) o;
    
            return eastBank.equals(riverState.eastBank) &&
                   westBank.equals(riverState.westBank) &&
                   boatLocation.equals(riverState.boatLocation);
        }
    
        public int hashCode() {
            int result;
            result = westBank.hashCode();
            result = 31 * result + eastBank.hashCode();
            result = 31 * result + boatLocation.hashCode();
            return result;
        }
    }
    
    FarmerWolfGoatRiddle.groovy
    import graph.DirectedGraph
    import org.codehaus.groovy.runtime.MethodClosure
    
    import static fun.rivercrossing.BoatLocation.EastBank
    import static fun.rivercrossing.BoatLocation.WestBank
    
    public class FarmerWolfGoatRiddle {
    
        /*  Farmer Wolf Goat Puzzle
            A farmer is on the east bank of a river with a wolf, goat and cabbage in his care. Without his presence the wolf would eat
            the goat or the goat would eat the cabbage. The wolf is not interested in the cabbage. The farmer wishes to bring his three
            charges across the river. However the boat available to him can only carry one of the wolf, goat and cabbage besides
            himself. The puzzle is: is there a sequence of river crossings so that the farmer can transfer himself and the other three all
            intact to the west bank?
        */
    
        RiverRole farmer = new RiverRole("farmer", true);
        RiverRole wolf = new RiverRole("wolf", false);
        RiverRole goat = new RiverRole("goat", false);
        RiverRole cabbage = new RiverRole("cabbage", false);
    
        RiverState initialState = new RiverState([farmer, wolf, goat, cabbage], [], BoatLocation.WestBank)
        RiverState finalState = new RiverState([], [farmer, wolf, goat, cabbage], EastBank)
    
        Closure<Boolean> areRolesHarmonisedOnRiverBank = { Collection<RiverRole> roles ->
            // describe clearly what the rules are
            if (!roles.contains(farmer)) {
                if (roles.contains(wolf) && roles.contains(goat)) return false
                if (roles.contains(goat) && roles.contains(cabbage)) return false
            }
            return true
        }
    
    
        public Collection<Deque<RiverState>> findAllPossibleSolutions() {
            DirectedGraph<RiverState> graph = buildCompleteGraph()
    
            Collection<Deque<RiverState>> solutions = []
            Deque<RiverState> currentSolution = new ArrayDeque<RiverState>()
    
            explore(graph, initialState, currentSolution, solutions)
    
            return solutions
        }
    
        // using Depth First Search (Stack Implementation) to identify a solution
        private void explore(DirectedGraph<RiverState> graph, RiverState currentNode, Deque<RiverState> currentSolution, Collection<Deque<RiverState>> solutions) {
            currentSolution.push(currentNode)
            graph.edgesFrom(currentNode).each { RiverState node ->
                if (node == finalState) {
                    // add the current solution, including the final node, to the solutions
                    currentSolution.push(node)
                    solutions.add(new ArrayDeque<RiverState>(currentSolution))
                    currentSolution.pop()
                } else {
                    // recursively explore
                    explore(graph, node, currentSolution, solutions)
                }
            }
            currentSolution.pop()
        }
    
        // using Breath First Search to build the complete graph
        private DirectedGraph<RiverState> buildCompleteGraph() {
            DirectedGraph<RiverState> graph = new DirectedGraph<RiverState>()
            graph.addNode(initialState)
    
            Set<RiverState> visitedNodes = new HashSet<RiverState>()
    
            // use a queue to keep the states to be visited in sequence
            Deque<RiverState> currentStates = new ArrayDeque<RiverState>()
            currentStates.add(initialState)
    
            while (!currentStates.isEmpty()) {
                if (currentStates.peek() != finalState) {
                    def currentState = currentStates.removeFirst()
                    if (!visitedNodes.contains(currentState)) {
                        visitedNodes.add(currentState)
                        def possibleStates = findNextPossibleStates(currentState)
                        def validStates = possibleStates.findAll { RiverState possibleState ->
                            // matched state should not be one of the visited states, and it should pass the validations
                            return !visitedNodes.contains(possibleState) && passRuleValidations(possibleState)
                        }
                        // add successors and edges
                        graph.addNodes(validStates)
                        validStates.each { graph.addEdge(currentState, it) }
    
                        // add to current states so that we can visit them next
                        currentStates.addAll(validStates)
                    }
                } else {
                    currentStates.removeFirst() // remove the final state from the currentStates queue
                }
            }
    
            return graph
        }
    
        private boolean passRuleValidations(RiverState state) {
            return areRolesHarmonisedOnRiverBank(state.eastBank) && areRolesHarmonisedOnRiverBank(state.westBank)
        }
    
        private List<RiverState> findNextPossibleStates(RiverState currentState) {
            // from the last visited state, find out who can sail the boat, and where is the sailor
            List<RiverState> possibleStates = []
    
            transitFrom(currentState, possibleStates)
    
            return possibleStates
        }
    
        private void transitFrom(RiverState currentState, List<RiverState> possibleStates) {
            // check where the boat is
            BoatLocation boatOrigin = currentState.boatLocation
            BoatLocation boatDestination
    
            // we use method closure to get a copy of west bank or east bank roles
            MethodClosure origin, destination
    
            // multiple assignments
            (origin, destination, boatDestination) = boatOrigin == WestBank ?
                [currentState.&getWestBank, currentState.&getEastBank, EastBank] :
                [currentState.&getEastBank, currentState.&getWestBank, WestBank]
    
            def possibleSailors = origin().findAll { RiverRole role -> role.canSailTheBoat }
    
            possibleSailors?.each { RiverRole sailor ->
                // pick 0 passengers onto the boat
                Collection<RiverRole> onBoat = [sailor]
                possibleStates.add(createNextPossibleState(origin, destination, onBoat, boatDestination))
    
                Collection<RiverRole> possiblePassengers = origin() as Collection<RiverRole>
                possiblePassengers.remove(sailor)
    
                // pick 1 passenger onto the boat
                possiblePassengers.each { RiverRole passenger ->
                    onBoat = [passenger, sailor]
                    possibleStates.add(createNextPossibleState(origin, destination, onBoat, boatDestination))
                }
            }
        }
    
        private RiverState createNextPossibleState(MethodClosure origin, MethodClosure destination, Collection<RiverRole> onBoat, BoatLocation boatDestination) {
            Collection<RiverRole> newDestination = destination() as Collection<RiverRole>
            Collection<RiverRole> newOrigin = origin() as Collection<RiverRole>
    
            // boat transits from origin to destination
            newOrigin.removeAll(onBoat)
            newDestination.addAll(onBoat)
    
            return boatDestination == WestBank ? new RiverState(newDestination, newOrigin, boatDestination) : new RiverState(newOrigin, newDestination, boatDestination)
        }
    
    
    }
    
    FarmerWolfGoatRiddleTest.groovy
    import org.junit.Test
    
    import static fun.rivercrossing.BoatLocation.EastBank
    import static fun.rivercrossing.BoatLocation.WestBank
    
    
    class FarmerWolfGoatRiddleTest {
    
        FarmerWolfGoatRiddle riddle = new FarmerWolfGoatRiddle()
    
        @Test
        public void findAllPossibleSolutions() {
            def solutions = riddle.findAllPossibleSolutions()
            // the ordering of solutions will be random
            assert solutions.size() == 2
            assert solutions[0].size() == 8
            assert solutions[1].size() == 8
    
            def assertAllSolutions = { Deque<RiverState> solution1, Deque<RiverState> solution2 ->
                assertFirstSolution(solution1)
                assertSecondSolution(solution2)
            }
    
            // assert with the correct expectations regardless of which is the first solution
            if (solutions[0].contains(new RiverState([riddle.wolf], [riddle.farmer, riddle.goat, riddle.cabbage], EastBank))) {
                assertAllSolutions(solutions[0], solutions[1])
            } else {
                assertAllSolutions(solutions[1], solutions[0])
            }
        }
    
        private void assertFirstSolution(Deque<RiverState> solution) {
            assert solution.poll() == new RiverState([riddle.farmer, riddle.wolf, riddle.goat, riddle.cabbage], [], WestBank)
            assert solution.poll() == new RiverState([riddle.wolf, riddle.cabbage], [riddle.farmer, riddle.goat], EastBank)
            assert solution.poll() == new RiverState([riddle.farmer, riddle.wolf, riddle.cabbage], [riddle.goat], WestBank)
            assert solution.poll() == new RiverState([riddle.wolf], [riddle.farmer, riddle.goat, riddle.cabbage], EastBank)
            assert solution.poll() == new RiverState([riddle.farmer, riddle.wolf, riddle.goat], [riddle.cabbage], WestBank)
            assert solution.poll() == new RiverState([riddle.goat], [riddle.farmer, riddle.wolf, riddle.cabbage], EastBank)
            assert solution.poll() == new RiverState([riddle.farmer, riddle.goat], [riddle.wolf, riddle.cabbage], WestBank)
            assert solution.poll() == new RiverState([], [riddle.farmer, riddle.wolf, riddle.goat, riddle.cabbage], EastBank)
            assert solution.poll() == null
        }
    
        private void assertSecondSolution(Deque<RiverState> solution) {
            assert solution.poll() == new RiverState([riddle.farmer, riddle.wolf, riddle.goat, riddle.cabbage], [], WestBank)
            assert solution.poll() == new RiverState([riddle.wolf, riddle.cabbage], [riddle.farmer, riddle.goat], EastBank)
            assert solution.poll() == new RiverState([riddle.farmer, riddle.wolf, riddle.cabbage], [riddle.goat], WestBank)
            assert solution.poll() == new RiverState([riddle.cabbage], [riddle.farmer, riddle.goat, riddle.wolf], EastBank)
            assert solution.poll() == new RiverState([riddle.farmer, riddle.cabbage, riddle.goat], [riddle.wolf], WestBank)
            assert solution.poll() == new RiverState([riddle.goat], [riddle.farmer, riddle.wolf, riddle.cabbage], EastBank)
            assert solution.poll() == new RiverState([riddle.farmer, riddle.goat], [riddle.wolf, riddle.cabbage], WestBank)
            assert solution.poll() == new RiverState([], [riddle.farmer, riddle.wolf, riddle.goat, riddle.cabbage], EastBank)
            assert solution.poll() == null
        }
    }
    

    Sunday, 31 March 2013

    Fibonacci numbers in Python and Java

    Fibonacci numbers,  introduced by Leonardo Pisano Bigollo nearly a thousand years ago, were used to solve a fascinating puzzle about the growth of an idealised rabbit population. Below are the first 11 Fibonacci numbers Fn where n=0, 1, 2,..., 10:

    sequenceF0F1F2F3F4F5F6F7F8F9F10
    number011235813213455

    or we can describe the Fibonacci numbers as
    Fn = Fn-1 + Fn-2 (when n>=2)
    
    To write a program to return a Fibonacci number given a sequence, there are many possible implementations. I shall present 2 solutions implemented in both Python and Java, where the first one is using recursion, and the second one is using a while loop, however, the performances of these 2 solutions are drastically different when the sequence is bigger.
    Below are the results when running a Python test script. The recursion way is already showing very poor performance comparing to the while loop way.
    > python test_fib.py 
    .Working out Fibonacci(40)=102334155 recursively took 67.310534 seconds
    .Working out Fibonacci(40)=102334155 with while loop took 1.8e-05 seconds
    Working out Fibonacci(50)=12586269025 with while loop took 1.6e-05 seconds
    Working out Fibonacci(60)=1548008755920 with while loop took 1.8e-05 seconds
    ..
    ----------------------------------------------------------------------
    Ran 4 tests in 67.312s
    
    OK
    
    If we look at the results when running a similar timed test in Java, the same recursion is not as bad as in Python, though still far poorer than the while loop implementation.
    > /somewhere/jdk1.6.0_33/bin/java -classpath ".:" fun.Fib
    Working out Fibonacci(40)=102334155 recursively took 408 ms
    Working out Fibonacci(40)=102334155 with while loop took 2033 ns
    Working out Fibonacci(50)=12586269025 with while loop took 2115 ns
    Working out Fibonacci(60)=1548008755920 with while loop took 2368 ns
    
    The reason why recursion is slow in this case is because there are a lot of duplicated operations as the sequence grows. For example, to solve F(5), we need to solve F(4) once, F(3) twice, F(2) three times, and F(1) twice, as shown below

    or the number of recursion required given a sequence n is
    [1+(n-2)]*(n-2)/2 + (n-3) or n2/2-n/2-2  (when n>=4)
    
    The complexity of using recursion is then actually O(n2), where, on the other hand, the while loop is O(n).
    Below are the source code that are used in the test.
    fib.py
    import datetime
    
    
    class Fib:
        def recursively(self, n):
            if n == 0:
                return 0
            elif n <= 2:
                return 1
            else:
                return self.recursively(n - 1) + self.recursively(n - 2)
    
        def whileLoop(self, n):
            if n == 0:
                return 0
            previous, fib, currentIndex = 0, 1, 1
            while currentIndex < n:
                previous, fib = fib, previous + fib
                currentIndex += 1
    
            return fib
    
        def timedRecursively(self, n):
            start = datetime.datetime.now()
            result = self.recursively(n)
            end = datetime.datetime.now()
            elapsed = end - start
            print "Working out Fibonacci({})={} recursively took {} seconds".format(str(n), str(result), str(elapsed.total_seconds()))
            return result
    
        def timedWhileLoop(self, n):
            start = datetime.datetime.now()
            result = self.whileLoop(n)
            end = datetime.datetime.now()
            elapsed = end - start
            print "Working out Fibonacci({})={} with while loop took {} seconds".format(str(n), str(result), str(elapsed.total_seconds()))
            return result
    

    test_fib.py
    import unittest
    from fib import Fib
    
    
    class TestFib(unittest.TestCase):
        def setUp(self):
            self.fib = Fib()
    
        def test_recursively(self):
            self.assertEqual(self.fib.recursively(0), 0)
            self.assertEqual(self.fib.recursively(1), 1)
            self.assertEqual(self.fib.recursively(2), 1)
            self.assertEqual(self.fib.recursively(3), 2)
            self.assertEqual(self.fib.recursively(4), 3)
            self.assertEqual(self.fib.recursively(5), 5)
            self.assertEqual(self.fib.recursively(6), 8)
            self.assertEqual(self.fib.recursively(7), 13)
            self.assertEqual(self.fib.recursively(8), 21)
            self.assertEqual(self.fib.recursively(9), 34)
            self.assertEqual(self.fib.recursively(10), 55)
    
        def test_whileLoop(self):
            self.assertEqual(self.fib.whileLoop(0), 0)
            self.assertEqual(self.fib.whileLoop(1), 1)
            self.assertEqual(self.fib.whileLoop(2), 1)
            self.assertEqual(self.fib.whileLoop(3), 2)
            self.assertEqual(self.fib.whileLoop(4), 3)
            self.assertEqual(self.fib.whileLoop(5), 5)
            self.assertEqual(self.fib.whileLoop(6), 8)
            self.assertEqual(self.fib.whileLoop(7), 13)
            self.assertEqual(self.fib.whileLoop(8), 21)
            self.assertEqual(self.fib.whileLoop(9), 34)
            self.assertEqual(self.fib.whileLoop(10), 55)
    
        def test_timedRecursively(self):
            self.assertEqual(self.fib.timedRecursively(40), 102334155)
    
        def test_timedWhileLoop(self):
            self.assertEqual(self.fib.timedWhileLoop(40), 102334155)
            self.assertEqual(self.fib.timedWhileLoop(50), 12586269025)
            self.assertEqual(self.fib.timedWhileLoop(60), 1548008755920)
    
    unittest.main()
    

    Fib.java
    package fun;
    
    public class Fib {
    
    
        public static void main(String[] args) {
            recursively(1);
            whileLoop(1);
    
            long start = System.nanoTime();
    
            long result = recursively(40);
            long elapsed = (System.nanoTime() - start) / 1000000;
            System.out.println(String.format("Working out Fibonacci(%s)=%s recursively took %s ms", 40, result, elapsed));
    
            start = System.nanoTime();
            result = whileLoop(40);
            elapsed = (System.nanoTime() - start);
            System.out.println(String.format("Working out Fibonacci(%s)=%s with while loop took %s ns", 40, result, elapsed));
    
            start = System.nanoTime();
            result = whileLoop(50);
            elapsed = (System.nanoTime() - start);
            System.out.println(String.format("Working out Fibonacci(%s)=%s with while loop took %s ns", 50, result, elapsed));
    
            start = System.nanoTime();
            result = whileLoop(60);
            elapsed = (System.nanoTime() - start);
            System.out.println(String.format("Working out Fibonacci(%s)=%s with while loop took %s ns", 60, result, elapsed));
        }
    
        static long recursively(long i) {
            if (i == 0) return 0;
            if (i <= 2) return 1;
            else return recursively(i - 1) + recursively(i - 2);
        }
    
        static long whileLoop(long i) {
            long previous = 0, fib = 1, currentIndex = 1;
            while (currentIndex < i) {
                long newFib = previous + fib;
                previous = fib;
                fib = newFib;
                currentIndex++;
            }
            return fib;
        }
    }
    

    Saturday, 30 March 2013

    First impression of Python

    Born some two decades ago, Python has certainly accumulated enough interests to become one of the most popular programming languages used in the financial industry.

    Designed to be a write-less-do-more language, Python has certain aspects that differ from Java:
    1. It is an interpreted language, which means no compilation is required. 
    2. It is not a strongly typed language
    3. Its statement grouping is done by indentation instead of open and close braces
    4. It does not require declaration of variable or argument
    5. It supports complex number, for example (using Python interactive interpreter)
    6. >>> 1j + 2J
      (-2+0j)
      >>> (1+2j) * complex(2,3)
      (-4+7j)
      
    7. Strings can be subscripted. For example, the first character of a string has index 0.
    8. >>> word="HelloWorld"
      >>> word[4]
      'o'
      >>> word[1:3]   # using the slice notation, returning characters from index 1 to 2
      'el'
      
    9. It supports multiple assignment, for example, 'a,b=b,a+b', where the expression on the right-hand side 'b,a+b' are evaluated first before any of the assignments take place
    10. Not surprisingly, it supports lamda forms
    11. >>> def incresedBy(n):
      ...     return lambda x: x + n
      ...
      >>> f = incresedBy(42)
      >>> f(1)
      43
      
    12. It supports map and reduce functions as part of its functional programming offerings
    13. It supports Tuples, which are immutable and usually contain an heterogeneous sequence of elements
    14. >>> t=123,321,'hello'
      >>> t
      (123, 321, 'hello')
      
    15. It allows multiple inheritance, for example
    16. class DerivedClassName(Base1, Base2, Base3):
          <statement-1>
          .
          .
          .
          <statement-N>
      

    Useful Sed command examples in UNIX

    1. Substitute a regular expression into a new value
    2. A simple example is to change the first occurrence of 'old' on each line from file1 to 'new' and save as file2
      sed 's|old|new|' < file1 > file2
      where 's' is the substitute command, '|' is the delimiter, 'old' is the search pattern in regular expression, and 'new' is the replacement string. We can also test string substitution using 'echo' like this
      echo "old, and some other old stuffs" | sed 's|old|new|'
      which will output 'new, and some other old stuffs'. Note that in this case it only replace the first occurrence of 'old' with 'new'. If we want to replace all 'old' with 'new', we could append a global replacement symbol 'g' after the last delimiter '|'
      echo "old, and some other old stuffs" | sed 's|old|new|g'
      which will output 'new, and some other new stuffs'
    3. Pick any delimiter you like
    4. We were using '|' as the delimiter in the above examples. In fact, we can pick any character as the delimiter, as long as it is not in the string we are looking for. For example,
      echo "old, and some other old stuffs" | sed 's/old/new/g'
      echo "old, and some other old stuffs" | sed 's:old:new:g'
      
      have the same effect.
    5. Use & as the matched string
    6. We could search for a pattern and add some characters on top of the original string like this
      echo "old, and some other old stuffs" | sed 's|[a-z]*|(&)|'
      which will output '(old), and some other old stuffs'. The special character '&' represents the matching pattern. We could have any number of '&' in the replacement string
      echo "old, 123 and 456 some other old stuffs" | sed -r 's|[0-9]+|& &|'
      which will output 'old, 123 123 and 456 some other old stuffs'. Note that we use '-r' here to indicate we are using extended regular expression such that sed will support the use of '+'.

    Friday, 29 March 2013

    Useful Find command examples in UNIX

    Find command is one of the most useful commands in UNIX, as there are many options out there to accomplish various search tasks.

    • Find files which have been modified for more than or less than a certain time
    • We can use 'find * -mtime n' to search files under current directory based on modification time. The numeric argument n can be specified as +n, meaning greater than n days; -n, meaning less than n days; or n, meaning exactly n days. For example,
      to find files which were last modified for more than 30 days ago
      find . -mtime +30
      to find files which were last modified for less than 15 days ago
      find . -mtime -15
      to find files which were last modified for exactly 10 days ago
      find . -mtime 10
      There are other options like
      -mmin n, which refers to file's data was last modified n minutes ago
      -amin n, which refers to file's data was last accessed n minutes ago
      -atime n, which refers to file's data was last accessed n days ago
      -type f, which searches for regular file
      -type d, which searches for directory,
      -exec command, which executes command by passing in the find results,
      -perm /mode, which searches files with given permission 
      
    • Remove files which are older than n days
    • Since we understand how to search for files which were last modified for more than n days, we can then use the rm command to remove them
      find /some-dir/* -mtimes +30 -type f -exec rm {} \;
      
      '-exec' is used to pass in commands like 'rm'; '{}' is replaced by the file name that 'find' returns; and the '-exec' command has to end with '\;'.
      Another way will be
      find /some-dir/* -mtimes +30 -type f -print0 | xargs -0 rm
      where 'xargs' reads 'find' results and execute 'rm' on each of the 'find' results
    • Find files which contain particular strings
    • Below is an example to search all txt files starting from current directory for string "some-string"
      find . -name "*.txt" -print | xargs grep "some-string"
      

    Using Python SimpleHTTPServer to share files

    There are cases when I want to share some files within the intranet to other users.

    Python comes with a simple built-in module called SimpleHTTPServer that can turn any directory in the system into the web server directory. I literally just need a single command line to accomplish this task, given Python is shipped with openSUSE!

    Assuming I am running linux, and I want to share my home directory ~
    cd ~
    Start up the http server on port 10001
    youyang@monkey-dev-01:~> python -m SimpleHTTPServer 10001
    Now the http server is running on port 10001. Open a browser and type the following address, where monkey-dev-01 happens to be my hostname
    http://monkey-dev-01:10001
    Since there is no index.html under my home directory, the files in the home directory will be listed. Now my purpose is served, but there is one slight problem -- the program is running on the foreground, meaning I can't do anything on that open terminal as it continues to display the http requests when I browse the directories. To change the running process from foreground to background, I need to press CTRL+Z to pause the process, as shown below
    ^Z
    [1]+  Stopped                 python -m SimpleHTTPServer 10001
    
    and type 'bg' to move the process to run in the background
    youyang@monkey-dev-01:~> bg
    [1]+ python -m SimpleHTTPServer 10001 &
    
    We can also use 'jobs -l' to display the jobs in current session
    youyang@monkey-dev-01:~> jobs -l
    [1]+  4127 Running                 python -m SimpleHTTPServer 10001 &
    
    We can also start up the http server running in the background directly, by appending a '&' at the end of the command
    youyang@monkey-dev-01:~> python -m SimpleHTTPServer 10001&

    Unix Cheat Sheet


    List a directoryAlternatively, type 'ls --help' or 'man ls' for more information
    ls [path]list given path without any options. Without the path, it is listing current directory
    ls -l [path]list given path in long listing format
    ls -h [path]list given path in human readable format
    ls -a [path]list given path, including hidden files (which has prefix dot)
    ls -lrt [path]list given path, combining different options. long listing format, sort by modification time in reverse order
    ls [path] list given path, redirect listing results into a file
    ls [path] | morelist given path, showing listing results on one screen at a time

    Change to a directory
    cd <directory>change to the given directory
    cd ~change to user's home directory
    cd /change to root directory
    cd ..go to parent directory
    cd ../<directory>go to a sibling directory

    Print current working directory
    pwddisplay the current working directory

    Create directory(ies)
    mkdir <directory>create the directory under current working directory
    mkdir -p <dir1>/<dir2>create directory2 under directory1; create parent directory2 as needed

    Remove files or directory(ies)
    rm <file>remove the file
    rm -rf <directory>remove the contents of given directory recursively without prompt

    Move a file or a directory
    mv <source> <dest>rename source as dest, and move source to dest

    Copy a file or a directory
    cp <source> <dest>make a copy from source to dest
    cp -r <source> <dest>copy recursively from source directory to dest directory

    Download a file from http website
    wget <url>download the resource to current directory
    curl <url>download the resource to current directory

    View a text file
    less <file>view file page by page, with a lot of features
    more <file>view file with screen length
    cat <file>view file, but it will scroll to the end of file
    cat <file> | moreview file, and piped the file content to more command

    Create a text file
    touch <file>create an empty file
    vi <file>use vi text editor to create a file (need to save the file though)
    echo "foo" >> <file>create a file with content 'foo'
    cat > <file>enter multi-line texts and press control-d to save the content to a file

    Change a file's modification or access time
    stat <file1>display the stat of given file
    touch -m -d '1 Mar 2013 02:53' file.txtchange the modification time of file.txt to be '1 Mar 2013 02:53', and create the file if it doesn't exist
    touch -a -d '1 Mar 2013 02:53' file.txtchange the access time of file.txt to be '1 Mar 2013 02:53', and create the file if it doesn't exist

    Compare two files
    diff <file1> <file2>show the differences between file1 and file2
    sdiff <file1> <file2>show file1 and file2 side by side

    Pipes and redirections
    <command> > <file>redirect command output to a file, e.g. ll > file.txt writes current directory listings to file.txt
    <command> >> <file>appends command output to the end of a file
    <command> < <file>redirect the standard input from the file
    <command> < <file1> > <file2>redirect the standard input from the file1, and then redirect the standard output to file2
    <command1> | <command2>pipe the output of command1 as the input of command2

    File permissionsread=4, write=2, execute=1
    chmod 700 <file(s)>file(s) owner can read, write and execute
    chmod 600 <file(s)>file(s) owner can read and write, but not execute
    chmod 400 <file(s)>file(s) owner can read only
    chmod 755 <file(s)>file(s) owner can read, write and execute; both group owner and others can read and write
    chmod 644 <file(s)>file(s) owner can read and write; both group owner and others can read only
    chmod u+x <file(s)>grant file(s) owner execute permission
    chmod u-x <file(s)>remove execute permission from file(s) owner
    chmod a+rw <file(s)>grant everyone read and write permission to the file(s)

    Other useful comands
    df -hreport file system disk space usage in human readable format
    du -hestimate file space usage in human readable format
    datedisplay or set the current system date and time
    topdisplay linux tasks, as well as CPU and memory stats
    ps -aefdisplay a snapshot of current processes
    ln -s <target> <link_name>create a symbolic link to the target with link_name
    aliasdisplay current defined aliases or create an alias
    !<command>invoke previous run of the given command

    Wednesday, 20 March 2013

    Most import aspects of software development

    After a few years of software development, it is a good time to recap what are the most important aspects that I should be aware of when working any project.
    1. The code base should be well tested that it has unit-tests, integration/acceptance-tests, external/smoke-tests. Unit test is about testing a single class by mocking its dependencies if necessary. Integration or acceptance tests should be testing the APIs provided by your application, particular front-to-back workflows, story-based features (as someone, I should be able to do something, etc.). External or smoke tests are verifying that the contracts between your application and your external dependencies are intact. There are a few reasons why automated tests are important:
      1. developers have the confidence to refactor or make functional changes as they can run the automated tests to check whether there are any breaking changes
      2. no more painful and error prone manual tests conducted by the QA team -- cost savings, although UAT (User-Acceptance Test) is still required
      3. greatly shortens the release cycle because you spend a lot less time doing the manual tests
    2. Modeling of the business objects is important because it not only affects how easy it is to query your business objects from the persistent layer, but also the performance of querying. One of the project that I worked on was storing a map of data into an oracle database as a CLOB data type (imagine a map is marshalled as XML string and stored as a DB column). It turned out to be rather inefficient to query any field inside that CLOB column, as you have to load the whole CLOB to your application!
    3. Availability and scalability are also crucial to any production system. Even Facebook does not guarantee that their query engine is available 100% of time (as they still have to do the manual failover of their name nodes, part of their Hadoop DFS cluster, from time to time). I won't treat Facebook as a mission critical application that has to be available all the time, but you can sense that to achieve a 100% availability, you will have to really think about your failover or DR (Disaster-Recovery) plan. Scalability is a tricky topic, as you can either scale horizontally by adding more nodes, or scale vertically by adding more cores/memories to the existing nodes (further reading on Amdahl's law). Depends on where the bottleneck of growth is, you might have database scalability, application scalability, caching scalability, etc.

    Tuesday, 19 March 2013

    An efficient Stack implementation with getMin() and getMax() methods

    When we use stack to keep a collection of numbers, we might want to know what are the minimum and maximum numbers in the stack. 

    Suppose we are pushing {72,12,55,78,22,34} into a stack, this is what happens

    Action Stack Elements (Top ... Bottom) Min Max
    push(72)727272
    push(12)12,721272
    push(55)55,12,721272
    push(78)78,55,12,721278
    push(22)22,78,55,12,721278
    push(34)34,22,78,55,12,721278

    If we pop the stack above 3 times, the max should change to 72; if we pop 2 more times, the min should change to 72. It sounds like we need a data structure that keeps the current min and max when we push or pop the stack. Indeed, a stack is a perfect choice here. When pushing a number to the stack, we also push the current min and max values into the min and max stacks respectively. Similarly, when poping out a number from the stack, we also pop out from the min and max stacks, such that the top of the min and max stacks will always have the min and max values of the current stack respectively.

    The complexity of the getMin() and getMax() methods are O(1), regardless of how many items are in the stack, same as the push(item) and pop() methods.

    Here is an implementation of the Stack with getMin() and getMax() methods
    import java.util.Stack;
    import java.util.concurrent.locks.ReentrantReadWriteLock;
    
    public class SmartStack<E extends Comparable<E>> {
    
        private ReentrantReadWriteLock readWriteLock = new ReentrantReadWriteLock();
        private ReentrantReadWriteLock.ReadLock readLock = readWriteLock.readLock();
        private ReentrantReadWriteLock.WriteLock writeLock = readWriteLock.writeLock();
    
        private Stack<E> mins = new Stack<E>();
        private Stack<E> maxs = new Stack<E>();
        private Stack<E> delegate = new Stack<E>();
    
        public E push(E item) {
            writeLock.lock();
            try {
                // stores the current min to the mins stack
                if (mins.isEmpty() || (!mins.isEmpty() && item.compareTo(mins.peek()) < 0)) {
                    mins.push(item);
                } else {
                    mins.push(mins.peek());
                }
    
                // stores the current max to the maxs stack
                if (maxs.isEmpty() || (!maxs.isEmpty() && item.compareTo(maxs.peek()) > 0)) {
                    maxs.push(item);
                } else {
                    maxs.push(maxs.peek());
                }
    
                return delegate.push(item);
            } finally {
                writeLock.unlock();
            }
        }
    
        public E pop() {
            writeLock.lock();
            try {
                E item = delegate.pop();
                // pop both mins and maxs stacks, such that peeking on them will tell us the current min or max
                mins.pop();
                maxs.pop();
                return item;
            } finally {
                writeLock.unlock();
            }
        }
    
        public E peek() {
            readLock.lock();
            try {
                return delegate.peek();
            } finally {
                readLock.unlock();
            }
        }
    
        public E getMin() {
            readLock.lock();
            try {
                return mins.peek();
            } finally {
                readLock.unlock();
            }
        }
    
        public E getMax() {
            readLock.lock();
            try {
                return maxs.peek();
            } finally {
                readLock.unlock();
            }
        }
    
        public boolean empty() {
            readLock.lock();
            try {
                return delegate.empty();
            } finally {
                readLock.unlock();
            }
        }
    
        public int search(E item) {
            readLock.lock();
            try {
                return delegate.search(item);
            } finally {
                readLock.unlock();
            }
        }
        
    }

    Here is the unit test of the above class written in Groovy
    import org.junit.Before
    import org.junit.Test
    
    class SmartStackTest {
    
        SmartStack stack
    
        @Before
        public void before() {
            stack = new SmartStack()
            stack.push(72)
            stack.push(12)
            stack.push(55)
            stack.push(78)
            stack.push(22)
            stack.push(34)
        }
    
        @Test
        public void canGetCurrentMinValue() {
            assert stack.min == 12
    
            5.times { stack.pop() }
    
            assert stack.min == 72
        }
    
        @Test
        public void canGetCurrentMaxValue() {
            assert stack.max == 78
    
            4.times { stack.pop() }
    
            assert stack.max == 72
        }
    
        @Test
        public void canPeek() {
            assert stack.peek() == 34
        }
    
        @Test
        public void isEmpty() {
            assert !stack.empty()
        }
    
        @Test
        public void canSearch() {
            // the 1-based position from the top of the stack where the object is located
            assert stack.search(34) == 1
            assert stack.search(22) == 2
            assert stack.search(78) == 3
            assert stack.search(55) == 4
            assert stack.search(12) == 5
            assert stack.search(72) == 6
            // not in the stack should return -1
            assert stack.search(11112) == -1
        }
        
    }
    

    Sorting objects with one-way dependency using Topological Sort with Directed Graphs

    Suppose there are a bunch of objects to be evaluated, each of which might depend on a number of other objects being evaluated first before it can be evaluated. The objects being depended on might depend on other objects. There is no cyclic dependency that object A depends on object B, and object B depends on object C, and object C depends on object A.

    To solve this problem, we want to sort these objects such that when traversing through these objects, they should be ready for evaluation and their dependencies, if applicable, should have been evaluated. The sorting algorithm and data structure that are applicable for this problem is Topological Sort with Directed Graphs.




    If the edges in a graph have a direction, the graph is called a directed graph, as shown above. The arrows in the figure show the direction of the edges. We also include two vertices here to represent isolated objects that don't depend on other objects.

    The arrows in the figure represent evaluation direction. For example, to evaluate E, we must first evaluate A and B. To evaluate H, we might want a sorted collection like this:

    ADBECFDGH or ADGBECFH

    There are many other possible ordering, depending on the approach you take to implement the topological sort.

    DirectedGraph.java
    import java.util.*;
    
    public class DirectedGraph<T> implements Iterable<T> {
    
        // key is a Node, value is a set of Nodes connected by outgoing edges from the key
        private final Map<T, Set<T>> graph = new HashMap<T, Set<T>>();
    
        public boolean addNode(T node) {
            if (graph.containsKey(node)) {
                return false;
            }
    
            graph.put(node, new HashSet<T>());
            return true;
        }
    
        public void addNodes(Collection<T> nodes) {
            for (T node : nodes) {
                addNode(node);
            }
        }
    
        public void addEdge(T src, T dest) {
            validateSourceAndDestinationNodes(src, dest);
    
            // Add the edge by adding the dest node into the outgoing edges
            graph.get(src).add(dest);
        }
    
        public void removeEdge(T src, T dest) {
            validateSourceAndDestinationNodes(src, dest);
    
            graph.get(src).remove(dest);
        }
    
        public boolean edgeExists(T src, T dest) {
            validateSourceAndDestinationNodes(src, dest);
    
            return graph.get(src).contains(dest);
        }
    
        public Set<T> edgesFrom(T node) {
            // Check that the node exists.
            Set<T> edges = graph.get(node);
            if (edges == null)
                throw new NoSuchElementException("Source node does not exist.");
    
            return Collections.unmodifiableSet(edges);
        }
    
        public Iterator<T> iterator() {
            return graph.keySet().iterator();
        }
    
        public int size() {
            return graph.size();
        }
    
        public boolean isEmpty() {
            return graph.isEmpty();
        }
    
        private void validateSourceAndDestinationNodes(T src, T dest) {
            // Confirm both endpoints exist
            if (!graph.containsKey(src) || !graph.containsKey(dest))
                throw new NoSuchElementException("Both nodes must be in the graph.");
        }
    
    }
    

    TolopogicalSort.java
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    
    public class TopologicalSort {
    
        public static <T> List<T> sort(DirectedGraph<T> graph) {
            DirectedGraph<T> reversedGraph = reverseGraph(graph);
    
            List<T> result = new ArrayList<T>();
            Set<T> visited = new HashSet<T>();
    
            /* We'll also maintain a third set consisting of all nodes that have
             * been fully expanded.  If the graph contains a cycle, then we can
             * detect this by noting that a node has been explored but not fully
             * expanded.
             */
            Set<T> expanded = new HashSet<T>();
    
            // Fire off a Depth-First Search from each node in the graph
            for (T node : reversedGraph)
                explore(node, reversedGraph, result, visited, expanded);
    
            return result;
        }
    
    
        /**
         * Recursively performs a Depth-First Search from the specified node, marking all nodes
         * encountered by the search.
         *
         * @param node     The node to begin the search from.
         * @param graph    The graph in which to perform the search.
         * @param result   A list holding the topological sort of the graph.
         * @param visited  A set of nodes that have already been visited.
         * @param expanded A set of nodes that have been fully expanded.
         */
        private static <T> void explore(T node, DirectedGraph<T> graph, List<T> result, Set<T> visited, Set<T> expanded) {
            if (visited.contains(node)) {
                // if this node has already been expanded, then it's already been assigned a
                // position in the final topological sort and we don't need to explore it again.
                if (expanded.contains(node)) return;
    
                // if it hasn't been expanded, it means that we've just found a node that is currently being explored,
                // and therefore is part of a cycle.  In that case, we should report an error.
                throw new IllegalArgumentException("A cycle was detected within the Graph when exploring node " + node.toString());
            }
    
            visited.add(node);
    
            // recursively explore all predecessors of this node
            for (T predecessor : graph.edgesFrom(node))
                explore(predecessor, graph, result, visited, expanded);
    
            result.add(node);
            expanded.add(node);
        }
    
        private static <T> DirectedGraph<T> reverseGraph(DirectedGraph<T> graph) {
            DirectedGraph<T> result = new DirectedGraph<T>();
    
            // Add all the nodes from the original graph
            for (T node : graph) {
                result.addNode(node);
            }
    
            // Scan over all the edges in the graph, adding their reverse to the reverse graph.
            for (T node : graph) {
                for (T endpoint : graph.edgesFrom(node)) {
                    result.addEdge(endpoint, node);
                }
            }
    
            return result;
        }
    
    }
    

    Below demonstrate the setup of the directed graph and will print out the sorted result.
        public static void main(String[] args) {
            DirectedGraph graph = new DirectedGraph()
            graph.addNode("A");
            graph.addNode("B");
            graph.addNode("C");
            graph.addNode("D");
            graph.addNode("E");
            graph.addNode("F");
            graph.addNode("G");
            graph.addNode("H");
            graph.addNode("I");
            graph.addNode("J");
    
            graph.addEdge("A", "D");
            graph.addEdge("A", "E");
            graph.addEdge("B", "E");
            graph.addEdge("C", "F");
            graph.addEdge("D", "G");
            graph.addEdge("E", "H");
            graph.addEdge("F", "H");
            graph.addEdge("G", "H");
    
            List sorted = TopologicalSort.sort(graph)
    
            System.out.println(sorted)
        }
    

    Merge Sort in Java

    The merge sort is a much more efficient sorting algorithm than those simple ones, for example, bubble sort, selection sort, and insertion sort. While the simple sorting algorithms have O(N2) complexity, the merge sort is O(N*logN). 

    The merge sort routine tries to break down an array into 2 sub-arrays, sort these sub-arrays, and merge them back, in a recursive way, as you shall see in the implementation below, as well as the graph below.




     
    import java.util.Arrays;
    import java.util.Random;
    
    public class MergeSortTest {
    
        static int numberOfElements = 10;
        static int maxValue = 100;
    
        public static void main(String[] args) {
            int[] original = createArrayWithRandomValues();
            // workspace array is used to keep the partial sorted results
            int[] workspace = new int[original.length];
            mergeSort(original, workspace, 0, numberOfElements - 1);
    
            System.out.println("The sorted array is " + Arrays.toString(original));
        }
    
        private static int[] createArrayWithRandomValues() {
            Random random = new Random();
            int[] original = new int[numberOfElements];
            for (int i = 0; i < numberOfElements; i++) {
                original[i] = random.nextInt(maxValue);
            }
            System.out.println("The original array is " + Arrays.toString(original));
            return original;
        }
    
        private static void mergeSort(int[] original, int[] workSpace, int lowerBound, int upperBound) {
            if (lowerBound != upperBound) {
                // find midpoint
                int mid = (lowerBound + upperBound) / 2;
                // sort lower half
                mergeSort(original, workSpace, lowerBound, mid);
                // sort upper half
                mergeSort(original, workSpace, mid + 1, upperBound);
                // merge them
                merge(original, workSpace, lowerBound, mid, upperBound);
            }
        }
    
        private static void merge(int[] original, int[] workSpace, int lowerBound, int middle, int upperBound) {
            // Copy both parts into the workspace array
            System.arraycopy(original, lowerBound, workSpace, lowerBound, upperBound + 1 - lowerBound);
    
            int leftPointer = lowerBound;
            int rightPointer = middle + 1;
            int originalPointer = lowerBound;
    
            // Copy the smallest values from either the left or the right side back to the original array
            while (leftPointer <= middle && rightPointer <= upperBound) {
                if (workSpace[leftPointer] <= workSpace[rightPointer]) {
                    // copy and move the pointer on workspace to the next smallest value on left (if there is any)
                    original[originalPointer++] = workSpace[leftPointer++];
                } else {
                    // copy and move the pointer on workspace to the next smallest value on right (if there is any)
                    original[originalPointer++] = workSpace[rightPointer++];
                }
            }
            // Copy the rest of the left side of the array into the target array
            while (leftPointer <= middle) {
                original[originalPointer++] = workSpace[leftPointer++];
            }
    
            // if there are elements left in the right side of the workspace, those
            // are the largest values of the merged array, so no action is required
        }
        
    }
    

    Monday, 18 March 2013

    Insertion Sort in Java

    Insertion sort is about twice as fast as the bubble sort, and faster than selection sort.

    In an insertion sort routine, we will pick a marker (starting from index=1), store the marked number, and compare the marked number with its neighbours on the left. Shift all the neighbours, which are larger than the marked number, one position to the right. After the shift, insert the marked number to the empty slot. Repeat this procedure until the marker index points to the last number on the right.

    Below demonstrates one of the implementations of the insertion sort algorithm.

    import java.util.Arrays;
    import java.util.Random;
    
    public class InsertionSortTest {
    
        static int numberOfElements = 10;
        static int maxValue = 100;
    
        public static void main(String[] args) {
            int[] original = createArrayWithRandomValues();
    
            insertionSort(original);
    
            System.out.println("The sorted array is " + Arrays.toString(original));
        }
    
        private static int[] createArrayWithRandomValues() {
            Random random = new Random();
            int[] original = new int[numberOfElements];
            for (int i = 0; i < numberOfElements; i++) {
                original[i] = random.nextInt(maxValue);
            }
            System.out.println("The original array is " + Arrays.toString(original));
            return original;
        }
    
        private static void insertionSort(int[] original) {
            int markerIndex, emptySlotIndex;
            for (markerIndex = 1; markerIndex < original.length; markerIndex++) {
                int marked = original[markerIndex]; // use a temp variable to store the marked number
                emptySlotIndex = markerIndex;
                while (emptySlotIndex > 0 && original[emptySlotIndex - 1] >= marked) {
                    original[emptySlotIndex] = original[emptySlotIndex - 1]; // shift element right
                    emptySlotIndex--; // moves the empty slot index one position to the left
                }
                original[emptySlotIndex] = marked; // insert marked element
            }
        }
        
    }
    

    Selection Sort in Java

    Selection sort is slightly faster than bubble sort because the number of required swaps is reduced from O(N2) to O(N), while the number of comparisons remains O(N2).

    Below demonstrates one of the implementations of the selection sort algorithm.
     
    import java.util.Arrays;
    import java.util.Random;
    
    public class SelectionSortTest {
    
        static int numberOfElements = 10;
        static int maxValue = 100;
    
        public static void main(String[] args) {
            int[] original = createArrayWithRandomValues();
    
            selectionSort(original);
    
            System.out.println("The sorted array is " + Arrays.toString(original));
        }
    
        private static int[] createArrayWithRandomValues() {
            Random random = new Random();
            int[] original = new int[numberOfElements];
            for (int start = 1; start <= numberOfElements; start++) {
                original[start - 1] = random.nextInt(maxValue);
            }
            System.out.println("The original array is " + Arrays.toString(original));
            return original;
        }
    
    
        private static void selectionSort(int[] original) {
            int outer, inner, minIndex;
            for (outer = 0; outer < original.length - 1; outer++) { // from 1st element to next-to-last element
                minIndex = outer; // each round the min index starts from the outer index
                for (inner = outer + 1; inner < original.length; inner++) { // from outer + 1 element to the last element
                    if (original[inner] < original[minIndex]) { // remember the index for the new minimum
                        minIndex = inner;
                    }
                }
    
                // swap the elements between outer index and min index
                int temp = original[outer];
                original[outer] = original[minIndex];
                original[minIndex] = temp;
    
            }
        }
        
    }
    

    Bubble Sort in Java

    Given an array of n (n>1) numbers, we want to sort them such that the smallest number is on the left, and the largest number is on the right. This is where all the sorting fun comes from. 

    Bubble sort is one of the slowest of all sorting algorithms, but it is conceptually the simplest. 

    In bubble sort routine, we start by comparing two numbers in the array, say the 1st (array[0]) and 2nd (array[1]) numbers. If the 1st number is larger than the 2nd number, we swap them, otherwise don't do anything. Then we move over one position and compare the 2nd and 3rd numbers, and so on, until we've completed the comparison of the second last and the last number. At this point, the largest number has been bubbled up all the way to the right (array[n-1]). On the 2nd round, we only need to do the comparison and swap from array[0] to array[n-2], which will ensure array[n-2] has the 2nd largest value in the array. Continue like this until the (n-1)th round.

    The algorithm complexity, when expressed in Big O notation, is O(N2). This is because, given an array of 10 numbers, you will need 9 comparisons on the first round to bubble the largest number to the right, and 8 comparisons on the second round, and so on, while the last round requires 1 comparison. That is

    9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 45, or n*(n-1)/2, 

    where n is the number of elements in the array. On the other hand, a swap is only required when the number on the left is larger than the one on the right, although in the worst case scenario, when the initial array is sorted such that the largest number is on the left and the smallest number is on the right, a swap is necessary for every comparison. Therefore, the number of swaps and comparisons are proportional to n2.This is how we work out bubble sort's complexity O(N2).

    Below demonstrates one of the implementations of the bubble sort algorithm.

    import java.util.Arrays;
    import java.util.Random;
    
    public class BubbleSortTest {
    
        static int numberOfElements = 10;
        static int maxValue = 100;
    
        public static void main(String[] args) {
            int[] original = createArrayWithRandomValues();
    
            bubbleSort(original);
    
            System.out.println("The sorted array is " + Arrays.toString(original));
        }
    
        private static int[] createArrayWithRandomValues() {
            Random random = new Random();
            int[] original = new int[numberOfElements];
            for (int start = 1; start <= numberOfElements; start++) {
                original[start - 1] = random.nextInt(maxValue);
            }
            System.out.println("The original array is " + Arrays.toString(original));
            return original;
        }
    
        private static void bubbleSort(int[] original) {
            int outer, inner;
            for (outer = original.length - 1; outer > 1; outer--) { // from last element to the 2nd element
                for (inner = 0; inner < outer; inner++) { // from the 1st element to outer index - 1
                    if (original[inner] > original[inner + 1]) { // bubble up the bigger value to the right
                        int temp = original[inner];
                        original[inner] = original[inner + 1];
                        original[inner + 1] = temp;
                    }
                }
            }
        }
    
    }