Shell Sort | Running time O(n2) and Memory O(1)
The idea behind shell sort can be understood as
follows:
Start by comparing elements far
apart, then elements less far apart,
and finally compare adjacent
elements.
For example, if we have to sort the list below in
descending order:
We start with sorting elements that are at a large
gap from each other. Suppose we choose this gap to
be 5.
So, we sort
the elements at a gap of 5, as shown below. The sorting technique used is insertion sort. As you can see, the list has been arranged in the form of columns. This makes manual sorting easier, but in our algorithm we do not form columns.
After applying insertion sort on all columns, we get the partially sorted list below:
Note
that in this list, most of the large elements are at the left end, and most of
the
smaller elements at
the right end. Thus, our first step introduced some degree of sortedness into the list.
In the next step, we reduce this gap size to, let’s
say, 3 ..
And then we sort each column as shown below (using
insertion sort):
And we get the list below (Note that it has a higher degree of
sortedness than the list in the previous step).
In this way we keep on reducing the gap after each
pass, and sort the list accordingly.
At
the end of each pass, the degree of sortedness of the list increases. Note that
when the gap size finally goes down to 1, it means that a normal insertion sort
will be performed on the whole list. However, since by now the list is almost sorted due to the previous
stages, insertion sort works very fast and the list is completely sorted in a
short time.
If we have to sort an array of n elements,
- We select a suitably large gap, and compare all
the elements present at that gap
- Then we decrease the gap size, and compare the
elements at that gap
- We keep on decreasing the gap till the gap size
becomes 1, at the end of which the array is fully sorted.
Note that we do not actually arrange the list in a
2-dimensional array in our algorithm. It is used just to make it manual sorting
easier.
One question is: how
do we decide the sequence of gaps? In the above example, we chose a gap
sequence of 5, 3, 1 (sequence of odd numbers) but there can be any other
sequence. The author of shell sort, Donald
L. Shell, proposed the sequence: N/2,
N/4,. . . , 1. However, there is no fixed agreement on the sequence to be
used.
In the following algorithm, we will use N/2k (original shell sequence) for the sorting.
Algorithm
//shell sort in descending order i.e. largest to
smallest
for ( gap=n/2; gap>0; gap=gap/2 ) //for each gap size
{
// Do an
insertion sort for all elements present at that gap
for ( i=gap;
i<n; i++ )
{
int
temp = a[i];
/* Shift to right all
elements smaller than temp */
for (
j=i; a[j-gap]<temp && j>=gap; j=j-gap )
{ a[j] = a[j-gap];
}
a[j] = temp;
}
/* Make gap = gap/2 before starting next iteration */
}
The complexity of
shell sort depends upon the gap sequence used:
1.
If
the gap sequence follows Shell’s increment ie. N/2k , then worst
case complexity is O(n2)
2.
If
the gap sequence follows Hibbard’s increment ie. 1, 3, 5, 7, . . . , 2k
-1, then worst case complexity is O(n1.5)
Other sequences will give a slightly different
complexity.
However, calculating the average case complexity is
not possible as it varies greatly from list to list.
How is shell sort an improvement over
insertion sort?
As we have already discussed, insertion sort is very
poor in sorting a list that is huge and highly unsorted.
However, a shell sort exploits the two properties of
insertion sort.
Property 1: Insertion
sort performs best for a small number of elements.
Property 2: Insertion
sort performs best if the list is almost sorted.
Initially, when the list is unsorted, shell sort
chooses a large gap size, so that the columns formed contain a small number of elements each. This
makes sorting each column a trivial job (Property 1). So even though the
elements in each column are highly unsorted, their small number means that the
sorting time will be small.
Also, choosing a large gap size means that each
column contains elements that are far away from each other. This means that
sorting of a column will sort the whole
list up to some degree (and not a small part of the list only). This, in turn,
means that the whole list will be much more sorted before the next round of
insertion sorts begins (which is good, due to Property 2).
As the gap size is reduced, the number of elements
in each column increases. However, this is compensated by the fact that the
elements in each column are more sorted than before, which reduces sorting
time.
Thus, the gap size keeps on reducing and the list
keeps on getting more and more sorted. When, finally, the gap size reaches 1,
the list has almost been sorted.
Shell sort with gap size 1 is same as a normal insertion sort on the whole
list. Only, in our case, the previous stages have made the list almost sorted,
which means this time the insertion sort will take a very small amount of time.
The total time taken with shell sort is, thus, less
than or equal to the time taken with a normal insertion sort. If Hibbard’s gap
sequence ( 1,3,5,7…) is used, then worst case complexity of shell sort becomes O(n1.5) which is less than O(n2) of insertion sort.
In this way, shell sort is, most of the time, better
than insertion sort.
No comments:
Post a Comment