Improved Merge Sort

So the first comment that I received for the Merge Sort post pointed out that my implementation was not very efficient. While the code correctly implements the algorithm, practical choices (such as choice of data structure) make the code very inefficient – the code I show will definitely not execute in O(nlogn) time. I took another swing at the Lisp code, paying closer attention to performance issues.

I implemented the original code using lists which - as it’s explained in any text about Lisp - is a singly linked list implementation. The implication is that indexing a specific element in a Lisp list is a linear-time operation in the index since you need to walk the links in your list to get to the element. The same is true for finding the length of the list as the only way to infer this property is to walk until you hit the empty list cdr.

In addition, any operation on the list which references specific indexes (which presumably includes the function subseq that I used repeatedly) incurs this ugly scanning penalty as well.

When you’re not in need of linked list functionality and are in need of fast element indexing, a list is a bad choice of data structure. So, one important change was to re-implement the functions using vector.

Another bit of waste in the code was the creation of new sequences as part of the recursive splitting; at each split, new lists, containing half the previous round’s elements (implying memory allocation and element copying), were created. We really only need to allocate new lists when we get to the leaves of the recursion. I re-implemented the algorithm and replaced splitting the sequence into sub sequences on each recursion with just manipulating the split indexes.

While I defended the original implementation on the grounds of readability, the resulting implementation is nearly as readable. I could probably sacrifice a sliver of performance and factor out some of the less “prosaic” code to make this improved version just as readable as the original.

So here is an updated version. I’m not “cheating” here – I didn’t go dredging around for a better example on Google — so if you Lisp aficionados have useful feedback then I’d like your help. By the same token, if you’re searching for the most efficient (time, space, loc) way of implementing merge sort, I make no claim that this is your best solution!

 

Improved (hopefully…) Merge Sort

;; To sort a vector, call the routine
;; (merge-sort vec 0 (- (length vec) 1))

(defun merge-sort (the-vector start end)
  (let ((vlen (+ (- end start) 1))
	(split-idx 0))
    (cond ((> vlen 2)
	   (setf split-idx (+ start (floor vlen 2)))
	   (merge 'vector
		  (merge-sort the-vector start split-idx)
		  (merge-sort the-vector (+ split-idx 1) end)
		  #'<))
	  (t (do-simple-sort the-vector start end)))))

(defun do-simple-sort (the-vector start end)
  (let ((start-val (svref the-vector start))
	(end-val (svref the-vector end)))
    (cond((= start end) (vector start-val))
	 ((> start-val end-val) (vector end-val start-val))
	 (t (vector start-val end-val))))))

1 Comment so far

  1. culbert on December 15th, 2008

    Ok. I had to peek; I googled. My implementation is efficient. I could simplify by letting the routine “recurse all the way” to single element vectors; I can let the merge routine do the simple sort so I don’t need that code. The result would be:

    (defun merge-sort-1 (the-vector start end)
      (let ((vlen (+  (- end start) 1))
    	(split-idx 0))
        (cond ((> vlen 1)
    	   (setf split-idx (+  start (ceiling vlen 2)))
    	   (merge 'vector
    		  (merge-sort-1 the-vector start (- split-idx 1))
    		  (merge-sort-1 the-vector split-idx end)
    		  #'< ))
    	  (t (vector (svref the-vector start))))))
    

Leave a reply