Saturday, 22 December 2012

Sorting Algorithms: Shell Sort


Shell Sort  |  Running time O(n2)  and Memory O(1)
The idea behind shell sort can be understood as follows:
Start by comparing elements far apart, then elements less far apart,
and finally compare adjacent elements.  

For example, if we have to sort the list below in descending order:



We start with sorting elements that are at a large gap from each other.  Suppose we choose this gap to be 5.




So, we sort the elements at a gap of 5, as shown below. The sorting technique used is insertion sortAs you can see, the list has been arranged in the form of columns. This makes manual sorting easier, but in our algorithm we do not form columns. 




After applying insertion sort on all columns, we get the partially sorted list below:



Note that in this list, most of the large elements are at the left end, and most of the
smaller elements at the right end. Thus, our first step introduced some degree of sortedness into the list.

In the next step, we reduce this gap size to, let’s say, 3 ..



And then we sort each column as shown below (using insertion sort):







And we get the list below (Note that it has a higher degree of sortedness than the list in the previous step).



In this way we keep on reducing the gap after each pass, and sort the list accordingly.  
At the end of each pass, the degree of sortedness of the list increases. Note that when the gap size finally goes down to 1, it means that a normal insertion sort will be performed on the whole list. However, since by now the list is almost sorted due to the previous stages, insertion sort works very fast and the list is completely sorted in a short time.

If we have to sort an array of n elements,
- We select a suitably large gap, and compare all the elements present at that gap  
- Then we decrease the gap size, and compare the elements at that gap
- We keep on decreasing the gap till the gap size becomes 1, at the end of which the array is fully sorted.

Note that we do not actually arrange the list in a 2-dimensional array in our algorithm. It is used just to make it manual sorting easier.

One question is: how do we decide the sequence of gaps? In the above example, we chose a gap sequence of  5, 3, 1 (sequence of odd numbers) but there can be any other sequence. The author of shell sort, Donald L. Shell, proposed the sequence: N/2, N/4,. . . , 1. However, there is no fixed agreement on the sequence to be used.
In the following algorithm, we will use N/2k  (original shell sequence) for the sorting.

Algorithm 
//shell sort in descending order i.e. largest to smallest
for ( gap=n/2; gap>0; gap=gap/2 ) //for each gap size
{   
     // Do an insertion sort for all elements present at that gap
     for ( i=gap; i<n; i++ )
     {
          int temp = a[i];

          /* Shift to right all elements smaller than temp */
          for ( j=i; a[j-gap]<temp && j>=gap; j=j-gap )
          {    a[j] = a[j-gap];
          }
         /* insert temp at the empty hole */ 
         a[j] = temp;
     }   
     
     /*   Make gap = gap/2 before starting next iteration  */


The complexity of  shell sort depends upon the gap sequence used:
1.     If the gap sequence follows Shell’s increment ie. N/2k , then worst case complexity is O(n2)
2.     If the gap sequence follows Hibbard’s increment ie. 1, 3, 5, 7, . . . , 2k -1, then worst case complexity is O(n1.5)
Other sequences will give a slightly different complexity.

However, calculating the average case complexity is not possible as it varies greatly from list to list.


How is shell sort an improvement over insertion sort?
As we have already discussed, insertion sort is very poor in sorting a list that is huge and highly unsorted.
However, a shell sort exploits the two properties of insertion sort.
Property 1:      Insertion sort performs best for a small number of elements.
Property 2:      Insertion sort performs best if the list is almost sorted.

Initially, when the list is unsorted, shell sort chooses a large gap size, so that the columns formed contain a small number of elements each. This makes sorting each column a trivial job (Property 1). So even though the elements in each column are highly unsorted, their small number means that the sorting time will be small.

Also, choosing a large gap size means that each column contains elements that are far away from each other. This means that sorting of a column will sort the whole list up to some degree (and not a small part of the list only). This, in turn, means that the whole list will be much more sorted before the next round of insertion sorts begins (which is good, due to Property 2).

As the gap size is reduced, the number of elements in each column increases. However, this is compensated by the fact that the elements in each column are more sorted than before, which reduces sorting time.

Thus, the gap size keeps on reducing and the list keeps on getting more and more sorted. When, finally, the gap size reaches 1, the list has almost been sorted. Shell sort with gap size 1 is same as a normal insertion sort on the whole list. Only, in our case, the previous stages have made the list almost sorted, which means this time the insertion sort will take a very small amount of time.

The total time taken with shell sort is, thus, less than or equal to the time taken with a normal insertion sort. If Hibbard’s gap sequence ( 1,3,5,7…) is used, then worst case complexity of shell sort becomes O(n1.5) which is less than O(n2) of insertion sort.

In this way, shell sort is, most of the time, better than insertion sort. 



No comments:

Post a Comment