There is a lot to consider with sorting algorithms. Most people never need to know anything about them; others need to know a great deal. They are an excellent area in which to explore questions of complexity and simplicity, since there are so many of them: some subtle, some devious, and many clever.
A comparison follows. "Stable" means the algorithm can be used to implement a StableSort.

                  Worst case     Average case   Best case      Extra space         Stable?
 BubbleSort       O(n^2)         O(n^2)         O(n)           O(1)                yes
 SelectionSort    O(n^2)         O(n^2)         O(n^2)         O(1)                no (depends on the implementation)
 InsertionSort    O(n^2)         O(n^2)         O(n)           O(1)                yes
 BitonicSort      O(n log^2 n)   O(n log^2 n)   O(n log^2 n)   O(1)                no
 ShellSort        O(n^2)         O(n log n)?    O(n)           O(1)                no
 QuickSort        O(n^2)         O(n log n)     O(n log n)     O(log n)            no
 HeapSort         O(n log n)     O(n log n)     O(n log n)     O(1)                no
 SmoothSort       O(n log n)     O(n log n)     O(n)           O(1)                no
 MergeSort        O(n log n)     O(n log n)     O(n log n)     O(n)                yes
 TimSort          O(n log n)     O(n log n)     O(n)           O(n)                yes
 CountingSort     O(n+k)         O(n+k)         O(n+k)         O(n+k)              yes
 RadixSort        O(n+k)         O(n+k)         O(n+k)         O(n+k)              yes
 BucketSort       O(n^2)         O(n+k)         O(n+k)         O(n*k) or O(n+k)    ?
 BogoSort         unbounded      O(n * n!)      O(n)           O(1)                no
 SlowSort         O(n^(log n))   O(n^(log n))   O(n^(log n))   O(1)                yes
 QuantumBogoSort  O(1)           O(1)           O(1)           O(0)                no
In the above table, k is the size of the key range or the number of buckets, depending on the algorithm.
Why are there so many of them? And why isn't the "best" one used everywhere? We need to consider both time-complexity and space-complexity, both for small inputs and asymptotically - the comparison above only shows asymptotic behaviour and neglects constant factors. The "real" performance of an algorithm depends on the system you are coding for, the amount of data, the characteristics of your data, and so on. We also need to consider ease of coding, compactness of code, predictability of time consumption, and the intermediate state of the data.
In short, each of the algorithms has its limitations. In-place HeapSort, for example, completely messes up the order of your data if you interrupt it before it has finished, whereas the three simpler algorithms iteratively improve the sortedness. If we need stability, none of the unstable algorithms can be used. Believe it or not, there are situations where BubbleSort is not a bad choice.
Most
of the time the truly "optimal" method is a combination of two or
more of the above.
Since a list of n elements has n! possible orderings, and since each comparison can at best eliminate half of the remaining candidate orderings, a sorting algorithm may need to make log(n!) comparisons in the worst case, where the logarithm is taken base 2. This gives a lower bound on the number of comparisons a comparison-based algorithm needs in the worst case, and log(n!) grows as n log n. Checking that an already sorted sequence is sorted takes O(n) comparisons, which is therefore the lower bound for the best case.
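A quick sketch, filling in the step that log(n!) really does grow like n log n:

\[
  \log_2(n!) \;=\; \sum_{i=1}^{n} \log_2 i \;\le\; n\log_2 n,
  \qquad
  \log_2(n!) \;\ge\; \sum_{i=\lceil n/2\rceil}^{n} \log_2 i \;\ge\; \frac{n}{2}\log_2\frac{n}{2},
\]

so log(n!) lies within constant factors of n log n.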
RadixSort runs in O(n) time for fixed-size keys, but only suits a few specialized applications. It avoids the O(n log n) worst-case lower bound above because it does not compare elements with each other.
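To make the "not based on comparing elements" point concrete, here is a minimal LSD RadixSort sketch in C for unsigned 32-bit keys, sorting one byte per pass with a counting step. It is an illustration, not code from the page above; the caller supplies a scratch array tmp of the same length as a.

#include <string.h>

void radix_sort (unsigned a[], unsigned tmp[], int n)
{
    int shift, i;
    for (shift = 0; shift < 32; shift += 8)
    {
        int count[257] = { 0 };
        /* Count how many keys have each value of the current byte. */
        for (i = 0; i < n; i++)
            count[((a[i] >> shift) & 0xFF) + 1]++;
        /* Turn the counts into starting offsets. */
        for (i = 1; i <= 256; i++)
            count[i] += count[i - 1];
        /* Distribute the keys into tmp, preserving their order (stable). */
        for (i = 0; i < n; i++)
            tmp[count[(a[i] >> shift) & 0xFF]++] = a[i];
        memcpy (a, tmp, n * sizeof a[0]);
    }
}

Each key is touched a fixed number of times (four passes here), so the total work is linear in n; no key is ever compared against another key.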
Another consideration is the choice between a StableSort and an UnstableSort. A stable sort keeps equal elements in their original relative order; for example, when sorting the pairs (2,a) (1,b) (2,c) by their numbers, a stable sort always leaves (2,a) before (2,c). If we find an algorithm that suits our data perfectly but isn't stable, we may need to use a less optimal algorithm (if, of course, stability is important to us for some reason).
Bubble Sort

Sweep through your data, swapping adjacent items if they are the wrong way around. A factor-of-two improvement can be made by having successive sweeps get shorter. The cost is about n*(n-1)/2 comparisons, and on average we expect to make half that many exchanges. If you don't swap equal items, it's stable.
It comes in two basic flavours. One sweeps the data a fixed number of times, until it knows there can be no problems remaining (n-1 sweeps, as above); the other sets a flag whenever items are swapped and keeps sweeping until the flag stays false for a whole sweep. The latter stops early if the data were almost sorted to start with, but carries the small extra cost of having and setting the flag.
BubbleSort is
easy to understand, easy to code, but inefficient. One hypothetical Really
Aggressive Optimizer automatically recognizes undergraduate implementations of
bubble sort and replaces them with something better.
Separating the ideas of "finding where an element belongs" and "moving it there" gives InsertionSort and SelectionSort, whose comparative speeds depend on which is more expensive, comparisons or swaps. See those pages for more details.
BubbleSort is
never faster than InsertionSort, so long as the InsertionSort does not use
a binary search.
BubbleSort is faster
than InsertionSort on an array of 3 elements.
It is important to remember that Bubble Sort is not a 100% useless algorithm: if you have a sequence of objects that you would like to keep ordered, and that is occasionally perturbed by having the value of one or two objects increase or decrease, bubble sorting is a good thing. Consider Z-buffering a field of mostly immobile objects. Say one or two objects move closer to or further away from the observer - restoring the order just means swapping a few adjacent slots, which is exactly what Bubble Sort is good at (the flag version below stops as soon as a sweep makes no swaps). Merge Sort or Quick Sort would be inefficient here; quicksort in particular is notoriously clumsy on nearly sorted lists when the pivot is chosen naively. Of course, an insertion sort on just the perturbed objects is usually better still.
Sample code in CeeLanguage:

void bubble_sort (int a[], int n /* the size of the array */)
{
    int i;
    for (i = 0; i < n - 1; i++)
    {
        int j;
        for (j = n - 1; j > i; j--)
        {
            if (a[j - 1] > a[j])
            {
                /* Swap an adjacent pair of items. */
                int t = a[j - 1];
                a[j - 1] = a[j];
                a[j] = t;
            }
        }
    }
}
This formulation does not obtain the advertised best-case performance of O(n); for it, best case == worst case == average case == O(n^2). The following version, which stops as soon as a whole sweep makes no swaps, does.
void bubble_sort (int a[], int n /* initially the size of the array;
                                    shrinks to the still-unsorted portion */)
{
    int i, sorted;
    do {
        sorted = 1;
        --n;
        for (i = 0; i < n; i++)
        {
            if (a[i] > a[i + 1])
            {
                /* Swap an adjacent pair of items. */
                int t = a[i];
                a[i] = a[i + 1];
                a[i + 1] = t;
                sorted = 0;
            }
        }
    } while (!sorted);
}
Sample code in PythonLanguage (Python 2; use range instead of xrange under Python 3):

def bubble(alist):
    length = len(alist)
    for i in xrange(length - 1):
        for j in xrange(length - i - 1):
            if alist[j] > alist[j + 1]:
                alist[j], alist[j + 1] = alist[j + 1], alist[j]

def bubble_flag(alist):
    length = len(alist)
    for i in xrange(length - 1):
        swapped = False
        for j in xrange(length - i - 1):
            if alist[j] > alist[j + 1]:
                alist[j], alist[j + 1] = alist[j + 1], alist[j]
                swapped = True
        if not swapped:
            return
And a SmalltalkLanguage sketch of the "combination of two or more" idea mentioned above, switching algorithm on the size of the collection:

sort: aCollection
    ^aCollection size < self switchSize
        ifTrue: [self insertionSort: aCollection]
        ifFalse: [self quickSort: aCollection]
I've always thought that Bubble Sort should be called Rock Sort, because the elements actually fall into place at the bottom rather than bubbling to the top. -- That depends purely on whether you run forwards or backwards.
For example, successive states of the array 8 7 9 3 5 after each sweep (the largest remaining element sinks to the end each time):

 start:        8 7 9 3 5
 after pass 1: 7 8 3 5 9
 after pass 2: 7 3 5 8 9
 after pass 3: 3 5 7 8 9
Selection Sort

Selection Sort works by finding the smallest (or largest) item and then placing it at the front (or back). Runtime analysis:

               Worst case   Average case   Best case
 Comparisons   O(n^2)       O(n^2)         O(n)
 Swaps         O(n)         O(n)           O(1)
The "best case" analysis assumes the "stop early" strategy is used, checking that things are in order as you sweep along. If comparisons are expensive and swaps cheap, you'd be better off using InsertionSort.
Example Code:
The
first version uses YAGNI - hence the selection is done in-line because the
abstraction isn't needed.
void swap (int A[], int i, int j)
{
    int t = A[i]; A[i] = A[j]; A[j] = t;
}

void selectionSort (int A[], int n)
{
    int i, j, m;
    while (n > 1)
    {
        /* Find the value and index of the largest element in A[0..n-1]. */
        m = A[0]; j = 0;
        for (i = 1; i < n; i++)
            if (m < A[i]) m = A[i], j = i;
        /* Move it to the end of the unsorted region. */
        swap(A, --n, j);
    }
}
This
second version uses ShortMethods - "Factor out units of
semantic meaning, even if they are only ever used once. "
int indexOfLargest (int A[], int n);

void selectionSort (int A[], int n)
{
    while (n > 1)
    {
        int j = indexOfLargest(A, n);
        swap(A, --n, j);   /* swap as defined above */
    }
}

int indexOfLargest (int A[], int n)
{
    int i, j, m;
    m = A[0]; j = 0;
    for (i = 1; i < n; i++)
        if (m < A[i])
            m = A[i], j = i;
    return j;
}
You might think that this last algorithm is the equivalent of BubbleSort, but it isn't: it does at most n-1 exchanges to get every item to its correct place.
This third version is in the StlStyle (it needs <algorithm> for std::min_element and std::iter_swap):

template<typename ForwardIterator>
void SelectionSort(ForwardIterator start, ForwardIterator stop)
{
    for (; start != stop; ++start)
        std::iter_swap(std::min_element(start, stop), start);
}
Of
course, this isn't an optimal implementation (the last element is swapped with
itself).
Moved here from SortingAlgorithms (RefactorMe):

SelectionSort: sweep your data to find the largest element and swap it with the last element. Repeat on smaller and smaller initial arrays. You perform an exchange almost every time, because the largest item in the original array is almost never already at the end. Complexity is about n*(n-1)/2 comparisons and n-1 exchanges. When coded in-line in C, possibly the smallest code. It can be made stable if, instead of swapping, the selected element is moved and the intervening elements are shifted.
There is a "stop early" strategy. For each sweep:

    Set "sorted_so_far" to true.
    As you sweep:
        If the item you're looking at is greater than max_so_far,
            update max_so_far and the index pointing to it.
        If the item you're looking at is less than max_so_far,
            set "sorted_so_far" to false.

At the end of any sweep, if "sorted_so_far" is still true, then the array is sorted and you can stop early.
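A minimal sketch in C of the strategy just described (the function name and details here are illustrative, not from the page above):

void selection_sort_stop_early (int A[], int n)
{
    while (n > 1)
    {
        int i, j = 0, t;
        int max_so_far = A[0];
        int sorted_so_far = 1;              /* true until an inversion is seen */
        for (i = 1; i < n; i++)
        {
            if (A[i] >= max_so_far)
            {
                max_so_far = A[i];          /* update max_so_far and its index */
                j = i;
            }
            else
                sorted_so_far = 0;          /* smaller than an earlier item: not sorted yet */
        }
        if (sorted_so_far)
            return;                         /* the remaining prefix is already in order */
        --n;                                /* move the largest to the end of the unsorted region */
        t = A[n]; A[n] = A[j]; A[j] = t;
    }
}

In the best case (already sorted input) the first sweep makes n-1 comparisons, never clears the flag, and returns, which is where the O(n) best case in the table above comes from.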
This can be the most economical sort to choose if you perform each selection step interleaved with other work, and that work may abort the sort operation. For example, an AlphaBetaSearch? is most efficient if the most valuable edges ("moves") are traversed first. SelectionSort can pick out the estimated most valuable edge to traverse, with a high probability of an immediate beta cutoff. A cutoff eliminates the need to sort any further edges, saving the work that a full sort of the edges would have required.
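A hypothetical sketch of that interleaving in C; select_next is not an established routine, it just shows a single selection step that a search loop could call lazily and abandon after a cutoff:

/* Move the highest-scoring element of score[k..n-1] into position k
   and return it; the rest of the array stays unsorted. */
int select_next (int score[], int k, int n)
{
    int i, best = k, t;
    for (i = k + 1; i < n; i++)
        if (score[i] > score[best])
            best = i;
    t = score[k]; score[k] = score[best]; score[best] = t;
    return score[k];
}

A caller loops k = 0, 1, 2, ..., searches the move selected at each step, and breaks out on a beta cutoff, so only the selection steps that are actually needed are ever paid for.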
Insertion Sort
The
Insertion Sort works by picking an element, figuring out where it should go,
then putting it there.
Using BinarySearch to
figure where the item should go gives the following runtime analysis:
               Worst case    Average case   Best case
 Comparisons   O(n log n)    O(n log n)     O(n log n)
 Swaps         O(n^2)        O(n^2)         O(1)
If
you use LinearSearch to find the item location this
changes to:
               Worst case   Average case   Best case
 Comparisons   O(n^2)       O(n^2)         O(n)
 Swaps         O(n^2)       O(n^2)         O(1)
If
you expect your data to be mostly sorted the LinearSearch version
might be faster, but BubbleSort might work better anyway. Beware
of BinarySearch -
it is subtle and error-prone, even when using versions from the published
literature.
If
comparisons are cheap and swaps expensive you'd be better using SelectionSort.
InsertionSort, SelectionSort and MergeSort are
all StableSorts,
whereas the famous QuickSort is not.
Code example in C, using a binary search to find the insertion point:
void insertion_sort (int a[], int n /* the size of the array */)
{
    int i;
    for (i = 1; i < n; i++)
    {
        /* Items before a[i] are already sorted. */
        int b = a[i];
        /* Binary search for the insertion point in a[0..i-1]:
           after the loop, "low" is the first index at which b can go. */
        int low = 0, high = i, k;
        while (low < high)
        {
            int j = (high + low) / 2;
            if (b <= a[j])
                high = j;
            else
                low = j + 1;
        }
        /* Shift items in a[low..i-1] up by one and drop b into place. */
        for (k = i; k > low; k--)
            a[k] = a[k - 1];
        a[low] = b;
    }
}
Code using an upper_bound function in the style of the STL:

/* Algorithm from stl_algo.h in the SGI STL library */
int upper_bound (int a[], int low, int high, int value)
{
    int len = high - low;
    while (len > 0)
    {
        int half = len / 2;
        if (value < a[low + half])
            len = half;
        else
        {
            low = low + half + 1;
            len = len - (half + 1);
        }
    }
    return low;
}

void insertion_sort (int a[], int n /* the size of the array */)
{
    int i;
    for (i = 1; i < n; i++)
    {
        /* Items before a[i] are already sorted. */
        int b = a[i];
        int up, k;
        /* Find the first position after any elements equal to b. */
        up = upper_bound (a, 0, i, b);
        for (k = i; k > up; k--)
            a[k] = a[k - 1];
        a[up] = b;
    }
}
Couldn't resist:

int findInsertionPoint (int a[], int i);
void shiftUp (int a[], int high, int i);

void insertion_sort (int a[], int n /* the size of the array */)
{
    int i;
    for (i = 1; i < n; i++)
    {
        /* Items before a[i] are already sorted. */
        int high = findInsertionPoint(a, i);
        int b = a[i]; /* OK, so this was added during refactoring ... */
        shiftUp(a, high, i);
        a[high] = b;
    }
}

int findInsertionPoint (int a[], int i)
{
    /* Binary search for the point at which a[i] is to be inserted. */
    int b = a[i];
    int low = 0, high = i;
    while (low < high)
    {
        int j = (high + low) / 2;
        if (b <= a[j])
            high = j;
        else
            low = j + 1;
    }
    return high;
}

void shiftUp (int a[], int high, int i)
{
    /* Shift items between high and i up by 1. */
    int k;
    for (k = i; k > high; k--)
        a[k] = a[k - 1];
}
Not
only is a binary search difficult to implement correctly, it doesn't gain you
anything in this case. Every element in the already-sorted set greater than the
element being inserted will have to be accessed in order to make room for the
element being inserted, so you may as well combine the movement of the data
with the comparison:
void insertion_sort (int a[], int elementCount)
{
    int i;
    for (i = 1; i < elementCount; i++) {
        int insertee = a[i], j;
        /* Shift larger elements up while walking down to the insertion point. */
        for (j = i - 1; j >= 0 && a[j] > insertee; j--)
            a[j + 1] = a[j];
        a[j + 1] = insertee;
    }
}
Incidentally,
this approach is also stable, unlike the binary-search version. It's also about
half the size.