Executor that maintains execution order by group

I need a class that helps me to run set of tasks in the same order of submission (like serialized execution). Yet I don’t want to use a single-thread executor as I have a large collection of different set of tasks that tasks from different set can be run in parallel. If I use one single-thread executor, it loses efficiency that tasks from different sets no longer be run in parallel. If I use one single-thread executor per task set, then I’ll end up with lots of threads, which is not ideal.

Out of necessity, I created the following class, OrderedExecutor.

import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/**
 * An executor that make sure tasks submitted with the same key
 * will be executed in the same order as task submission
 * (order of calling the {@link #submit(Object, Runnable)} method).
 *
 * Tasks submitted will be run in the given {@link Executor}.
 * There is no restriction on how many threads in the given {@link Executor}
 * needs to have (it can be single thread executor as well as a cached thread pool).
 *
 * If there are more than one thread in the given {@link Executor}, tasks
 * submitted with different keys may be executed in parallel, but never
 * for tasks submitted with the same key.
 * 
 * * @param <K> type of keys.
 */
public class OrderedExecutor<K> {

    private final Executor executor;
    private final Map<K, Task> tasks;

    /**
     * Constructs a {@code OrderedExecutor}.
     *
     * @param executor tasks will be run in this executor.
     */
    public OrderedExecutor(Executor executor) {
        this.executor = executor;
        this.tasks = new HashMap<K, Task>();
    }

    /**
     * Adds a new task to run for the given key.
     *
     * @param key the key for applying tasks ordering.
     * @param runnable the task to run.
     */
    public synchronized void submit(K key, Runnable runnable) {
        Task task = tasks.get(key);
        if (task == null) {
            task = new Task();
            tasks.put(key, task);
        }
        task.add(runnable);
    }

    /**
     * Private inner class for running tasks for each key.
     * Each key submitted will have one instance of this class.
     */
    private class Task implements Runnable {

        private final Lock lock;
        private final Queue<Runnable> queue;

        Task() {
            this.lock = new ReentrantLock();
            this.queue = new LinkedList<Runnable>();
        }

        public void add(Runnable runnable) {
            boolean runTask;
            lock.lock();
            try {
                // Run only if no job is running.
                runTask = queue.isEmpty();
                queue.offer(runnable);
            } finally {
                lock.unlock();
            }
            if (runTask) {
                executor.execute(this);
            }
        }

        @Override
        public void run() {
            // Pick a task to run.
            Runnable runnable;
            lock.lock();
            try {
                runnable = queue.peek();
            } finally {
                lock.unlock();
            }
            try {
                runnable.run();
            } catch (Exception ex) {
                ex.printStackTrace();
            }
            // Check to see if there are queued task, if yes, submit for execution.
            lock.lock();
            try {
                queue.poll();
                if (!queue.isEmpty()) {
                    executor.execute(this);
                }
            } finally {
                lock.unlock();
            }
        }
    }
}

Please feel free to use it if you find yourself in the same situation as me.

Efficient data structure for finding median

Just come up with a relatively simple implementation for a data structure that can report median value in constant time, and insertion/remove median in O(log(n)) time. It uses a max heap and min heap to store elements that are smaller and larger then the median respectively, and maintains the property that the differences between the two heaps are always less than or equal to 1.

Feel free to use/enhance/comment it.

import java.util.Collections;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;

/**
 * Data structure for storing collection of elements which can report
 * median in constant time, while {@linkplain FastMedian#add(java.lang.Object) adding} 
 * new element and {@linkplain FastMedian#removeMedian() remove median} takes O(log(n)) time. 
 * If there are even number of elements, it always
 * uses the smaller of the two median numbers as the median value.
 */
public class FastMedian<T> {
    private final Comparator<? super T> comparator;
    private final Queue<T> maxHeap;
    private final Queue<T> minHeap;
    private T median;

    /**
     * Creates a {@code FastMedian} which order elements according to their
     * {@linkplain Comparable natural ordering}.
     */
    public FastMedian() {
        this(null);
    }

    /**
     * Creates a {@code FastMedian} which order elements according to the given
     * {@link Comparator}.
     * 
     * @param comparator the comparator for ordering elements. If {@code null},
     *        the {@linkplain Comparable natural ordering} will be used.
     */
    public FastMedian(Comparator<? super T> comparator) {
        this.comparator = comparator;
        if (comparator == null) {
            this.maxHeap = new PriorityQueue<T>(11, Collections.reverseOrder());
            this.minHeap = new PriorityQueue<T>();
        } else {
            this.maxHeap = new PriorityQueue<T>(11, Collections.reverseOrder(comparator));
            this.minHeap = new PriorityQueue<T>(11, comparator);
        }
    }

    /**
     * Inserts the given element.
     *
     * @param e the element to add.
     */
    public void add(T e) {
        if (median == null) {
            median = e;
            return;
        }
        int cmp = (comparator == null) ? (((Comparable<? super T>) e).compareTo(median)) : comparator.compare(e, median);
        if (cmp < 0) {
            maxHeap.add(e);
        } else {
            minHeap.add(e);
        }
        // Rebalance
        int sizeDiff = maxHeap.size() - minHeap.size();
        if (sizeDiff > 0) {
            minHeap.add(median);
            median = maxHeap.remove();
        } else if (sizeDiff < -1) {
            maxHeap.add(median);
            median = minHeap.remove();
        }
    }

    /**
     * Returns the current median value.
     *
     * @return current median value.
     */
    public T getMedian() {
        return median;
    }

    /**
     * Removes the current median and returns it.
     *
     * @return value of the median removed.
     */
    public T removeMedian() {
        T result = median;
        if (maxHeap.isEmpty() && minHeap.isEmpty()) {
            median = null;
            return result;
        }
        if (maxHeap.size() >= minHeap.size()) {
            median = maxHeap.remove();
        } else {
            median = minHeap.remove();
        }
        return result;
    }    
}

A very simple example on using it:


import java.util.TreeSet;

public class MedianTest {
    
    public void test() {
        TreeSet<Integer> numbers = new TreeSet<Integer>();
        FastMedian<Integer> fastMedian = new FastMedian<Integer>();
        
        for (int i = 1; i < 10; i += 2) {
            numbers.add(i);
            fastMedian.add(i);
            System.out.println(numbers + " median: " + fastMedian.getMedian());
        }
        
        for (int i = 10; i >= 2; i -= 2) {
            numbers.add(i);
            fastMedian.add(i);
            System.out.println(numbers + " median: " + fastMedian.getMedian());
        }
        
        System.out.println();

        Integer median = fastMedian.removeMedian();
        while (median != null) {
            numbers.remove(median);
            System.out.println(numbers + " removed: " + median + " new median: " + fastMedian.getMedian());
            median = fastMedian.removeMedian();
        }
    }
}

Building a Great Team (via Ship Software OnTime!)

Can’t help to reblog this old post.

Building a Great Team The centerpiece of any successful development project is the team that builds it. There is no other single most important contributing factor to building great products. No tools, no development methods, no amount of money and no amount of time can substitute for the importance of an exceptional team if you plan to create an exceptional product. Some in our industry operate under the assumption "with a good system and a fine-tuned set of processe … Read More

via Ship Software OnTime!

Checks for whitespaces only string

Someone tries to have a method in Java that checks if a string contains only whitespace characters, and ended up with an implementation like this:

boolean isWhitespaceOnly(String str) {
    // DON'T DO THIS, EXPLAINED BELOW
    return str.trim().equals("");
}

So, what’s wrong with it?

  1. The trim() method will create a new String instance (only exception is an empty string or the first and last characters are not whitespaces. Check Java API doc.).
  2. Always use isEmpty() rather than .equals("") to check if a string is empty, as it is always faster.

In fact we should just inspect characters in the string and return true when we only see whitespaces. It would be faster (as we can terminate whenever we see a non-whitespace) and no new instance being created.

boolean isWhitespaceOnly(String str) {
    int pos = str.length() - 1;
    while (pos >= 0 && str.charAt(pos) <= ' ') {   // checking of <= ' ' is the same contract as the trim() does
        pos--;
    }        
    return pos < 0;
}

Stupid programming II

I came across a segment of Java code like this during a code review:

String str = "" + 10;

Sometimes I really don’t know what’s in people head. Isn’t it a lot more simpler/clearer/better/faster to do it as:

String str = "10";

If it is up to me, we should banish people writing code like this to write single line of code again.

Javascript UTF-8 codec that supports supplementary code points

I am working on a project that involves encode/decode UTF-8 bytes from/to Javascript String. I’ve been searching through the web and there are couple of UTF-8 coder implementations. Unfortunately none of them handle supplementary code points correctly, meaning all of them would produce invalid UTF-8 bytes/JS strings when it encounter characters belong to the supplementary code points.

In Javascript, characters are encoded using UCS-2, which is a subset of UTF-16, and is encoded the same way as UTF-16 for characters in Basic Multilingual Plane (BMP). In UCS-2, however, doesn’t have surrogate pair supports, which is how UTF-16 using two 16bits to represent one character in the supplementary code points. We can emulate the surrogate pair in Javascript by outputting two characters, correspond to the high and low surrogates. Most browsers would be able to print out the string correctly, having a side effect that if you do a String.length, it will return 2 instead of 1 for a surrogate pair.

I come up with the following code:

// Namespace UTF8
var UTF8 = (function() {
    return {
        // Encodes UCS2 into UTF8
        // Returns an array of numbers (bytes)
        encode : function(str) {
            var len = str.length;
            var result = [];
            var code;
            var i;
            for (i = 0; i < len; i++) {
                code = str.charCodeAt(i);
                if (code <= 0x7f) {
                    result.push(code);
                } else if (code <= 0x7ff) {                         // 2 bytes                     
                    result.push(0xc0 | (code >>> 6 & 0x1f),
                                0x80 | (code & 0x3f));
                } else if (code <= 0xd700 || code >= 0xe000) {      // 3 bytes
                    result.push(0xe0 | (code >>> 12 & 0x0f),
                                0x80 | (code >>> 6 & 0x3f),
                                0x80 | (code & 0x3f));
                } else {                                            // 4 bytes, surrogate pair
                    code = (((code - 0xd800) << 10) | (str.charCodeAt(++i) - 0xdc00)) + 0x10000;
                    result.push(0xf0 | (code >>> 18 & 0x07),
                                0x80 | (code >>> 12 & 0x3f),
                                0x80 | (code >>> 6 & 0x3f),
                                0x80 | (code & 0x3f));
                }
            }
            return result;
        },

        // Decodes UTF8 into UCS2
        // Returns a string
        decode : function(bytes) {
            var len = bytes.length;
            var result = "";
            var code;
            var i;
            for (i = 0; i < len; i++) {
                if (bytes[i] <= 0x7f) {                     
                    result += String.fromCharCode(bytes[i]);
                } else if (bytes[i] >= 0xc0) {                                   // Mutlibytes
                    if (bytes[i] < 0xe0) {                                       // 2 bytes
                        code = ((bytes[i++] & 0x1f) << 6) |
                                (bytes[i] & 0x3f);
                    } else if (bytes[i] < 0xf0) {                                // 3 bytes
                        code = ((bytes[i++] & 0x0f) << 12) |
                               ((bytes[i++] & 0x3f) << 6)  |
                                (bytes[i] & 0x3f);
                    } else {                                                     // 4 bytes
                        // turned into two characters in JS as surrogate pair
                        code = (((bytes[i++] & 0x07) << 18) |
                                ((bytes[i++] & 0x3f) << 12) |
                                ((bytes[i++] & 0x3f) << 6) |                                  
                                 (bytes[i] & 0x3f)) - 0x10000;
                        // High surrogate
                        result += String.fromCharCode(((code & 0xffc00) >>> 10) + 0xd800);
                        // Low surrogate
                        code = (code & 0x3ff) + 0xdc00;
                    }
                    result += String.fromCharCode(code);
                } // Otherwise it's an invalid UTF-8, skipped.
            }
            return result;
        }
    };
}());

Feel free to use it if you find it useful.

References:

All purpose binary search in PHP

A colleague asked about why there is no binary search method in PHP, but only the slow linear time array_search. I don’t have answer though, but someone suggest that you can do fast searching in a hash by flipping the array into a hash:

<?php
    // DON'T USE THIS FUNCTION, IT IS SLOW
    function flip_search($list, $key) { 
        $res = array_flip($list);
        if (array_key_exists($key, $res)) {
            return $res[$key];
        }       
        return FALSE;
    }
?>

What’s the run time of this function? Linear, as there is no magic in doing the flipping, but has to go through the array entry one by one, and not to mention it is memory inefficient. So if you get a sorted array, use binary search; if your array is not sorted, use the build-in array_search

I set out to write a general binary search for PHP.

<?php
    /*
     * Parameters: 
     *   $a - The sort array.
     *   $first - First index of the array to be searched (inclusive).
     *   $last - Last index of the array to be searched (exclusive).
     *   $key - The key to be searched for.
     *   $compare - A user defined function for comparison. Same definition as the one in usort
     *
     * Return:
     *   index of the search key if found, otherwise return (-insert_index - 1). 
     *   insert_index is the index of smallest element that is greater than $key or sizeof($a) if $key
     *   is larger than all elements in the array.
     */
    function binary_search(array $a, $first, $last, $key, $compare) {
        $lo = $first; 
        $hi = $last - 1;

        while ($lo <= $hi) {
            $mid = (int)(($hi - $lo) / 2) + $lo;
            $cmp = call_user_func($compare, $a[$mid], $key);

            if ($cmp < 0) {
                $lo = $mid + 1;
            } elseif ($cmp > 0) {
                $hi = $mid - 1;
            } else {
                return $mid;
            }
        }
        return -($lo + 1);
    }
?>

To use it is pretty straight forward:

<?php
    function cmp($a, $b) {
        return ($a < $b) ? -1 : (($a > $b) ? 1 : 0);
    }

    $a = array(1,2,3,4,5,6);
    $idx = binary_search($a, 0, sizeof($a), 4, 'cmp');
?>

Feel free to use the code if you need it.