Saturday, 22 December 2012

Sorting Algorithms: Shell Sort


Shell Sort  |  Running time O(n²)  and Memory O(1)
The idea behind shell sort can be understood as follows:
Start by comparing elements far apart, then elements less far apart,
and finally compare adjacent elements.  

For example, if we have to sort the list below in descending order:



We start with sorting elements that are at a large gap from each other.  Suppose we choose this gap to be 5.




So, we sort the elements at a gap of 5, as shown below. The sorting technique used is insertion sort. As you can see, the list has been arranged in the form of columns. This makes manual sorting easier, but in our algorithm we do not form columns.




After applying insertion sort on all columns, we get the partially sorted list below:



Note that in this list, most of the large elements are at the left end, and most of the
smaller elements at the right end. Thus, our first step introduced some degree of sortedness into the list.

In the next step, we reduce the gap size to, let's say, 3.



And then we sort each column as shown below (using insertion sort):







And we get the list below (Note that it has a higher degree of sortedness than the list in the previous step).



In this way we keep on reducing the gap after each pass, and sort the list accordingly.  
At the end of each pass, the degree of sortedness of the list increases. Note that when the gap size finally goes down to 1, it means that a normal insertion sort will be performed on the whole list. However, since by now the list is almost sorted due to the previous stages, insertion sort works very fast and the list is completely sorted in a short time.

If we have to sort an array of n elements,
- We select a suitably large gap, and sort all the elements present at that gap (using insertion sort)
- Then we decrease the gap size, and sort the elements at the new gap
- We keep on decreasing the gap till the gap size becomes 1, at the end of which the array is fully sorted.

Note that we do not actually arrange the list in a 2-dimensional array in our algorithm. The columns are used just to make manual sorting easier to follow.

One question is: how do we decide the sequence of gaps? In the above example, we chose the gap sequence 5, 3, 1, but other sequences can be used, as long as the final gap is 1. The author of shell sort, Donald L. Shell, proposed the sequence N/2, N/4, . . . , 1. However, there is no fixed agreement on which sequence should be used.
In the following algorithm, we will use N/2ᵏ (Shell's original sequence) for the sorting.

Algorithm 
//shell sort in descending order i.e. largest to smallest
for ( gap=n/2; gap>0; gap=gap/2 ) //for each gap size
{   
     // Do an insertion sort for all elements present at that gap
     for ( i=gap; i<n; i++ )
     {
          int temp = a[i];

          /* Shift to the right all elements smaller than temp */
          for ( j=i; j>=gap && a[j-gap]<temp; j=j-gap )
          {    a[j] = a[j-gap];
          }

          /* insert temp at the empty hole */
          a[j] = temp;
     }

     /*   gap = gap/2 is done by the outer for loop before the next iteration  */
}
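For reference, below is a complete, compilable C version of the pseudocode above; the function name shellSortDesc and the sample array are my own illustrative choices.

#include <stdio.h>

/* Shell sort in descending order, using Shell's original gap sequence n/2, n/4, ..., 1 */
void shellSortDesc(int a[], int n)
{
    for (int gap = n / 2; gap > 0; gap /= 2)      /* for each gap size */
    {
        /* insertion sort on every group of elements that are gap apart */
        for (int i = gap; i < n; i++)
        {
            int temp = a[i];
            int j;

            /* shift to the right (by gap positions) all elements smaller than temp */
            for (j = i; j >= gap && a[j - gap] < temp; j -= gap)
                a[j] = a[j - gap];

            a[j] = temp;                          /* insert temp at the empty hole */
        }
    }
}

int main(void)
{
    int a[] = { 1, 7, 2, 8, 5, 3, 9 };
    int n = sizeof a / sizeof a[0];

    shellSortDesc(a, n);

    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);                      /* prints: 9 8 7 5 3 2 1 */
    printf("\n");
    return 0;
}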


The complexity of shell sort depends upon the gap sequence used:
1.     If the gap sequence follows Shell's increments, i.e. N/2ᵏ, then the worst-case complexity is O(n²)
2.     If the gap sequence follows Hibbard's increments, i.e. 1, 3, 7, 15, . . . , 2ᵏ − 1, then the worst-case complexity is O(n^1.5)
Other sequences give slightly different complexities.

However, the average-case complexity is difficult to state exactly: it depends on the gap sequence used, and for many sequences it remains an open problem.
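As a sketch of how a different gap sequence can be plugged in, the function below generates Hibbard's increments (1, 3, 7, 15, ...) and otherwise works like the version above; the name hibbardShellSortDesc is an illustrative choice, and the function can replace shellSortDesc in the earlier demo.

/* Shell sort in descending order using Hibbard's increments 1, 3, 7, 15, ..., 2^k - 1 */
void hibbardShellSortDesc(int a[], int n)
{
    /* start from the largest Hibbard increment that is smaller than n */
    int gap = 1;
    while (2 * gap + 1 < n)
        gap = 2 * gap + 1;

    /* integer division by 2 walks down the sequence: ..., 15 -> 7 -> 3 -> 1 */
    for (; gap > 0; gap /= 2)
    {
        for (int i = gap; i < n; i++)
        {
            int temp = a[i];
            int j;
            for (j = i; j >= gap && a[j - gap] < temp; j -= gap)
                a[j] = a[j - gap];
            a[j] = temp;
        }
    }
}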


How is shell sort an improvement over insertion sort?
As we have already discussed, insertion sort performs poorly on a list that is large and highly unsorted.
Shell sort, however, exploits two properties of insertion sort:
Property 1:      Insertion sort performs best for a small number of elements.
Property 2:      Insertion sort performs best if the list is almost sorted.

Initially, when the list is unsorted, shell sort chooses a large gap size, so that the columns formed contain a small number of elements each. This makes sorting each column a trivial job (Property 1). So even though the elements in each column are highly unsorted, their small number means that the sorting time will be small.

Also, choosing a large gap size means that each column contains elements that are far away from each other. This means that sorting of a column will sort the whole list up to some degree (and not a small part of the list only). This, in turn, means that the whole list will be much more sorted before the next round of insertion sorts begins (which is good, due to Property 2).

As the gap size is reduced, the number of elements in each column increases. However, this is compensated by the fact that the elements in each column are more sorted than before, which reduces sorting time.

Thus, the gap size keeps on reducing and the list keeps on getting more and more sorted. When, finally, the gap size reaches 1, the list is almost sorted. Shell sort with gap size 1 is the same as a normal insertion sort on the whole list; only now the previous stages have made the list almost sorted, which means this final insertion sort takes very little time.

The total time taken with shell sort is, thus, less than or equal to the time taken with a normal insertion sort. If Hibbard's gap sequence (1, 3, 7, 15, …) is used, the worst-case complexity of shell sort becomes O(n^1.5), which is better than the O(n²) of insertion sort.

In this way, shell sort is, most of the time, better than insertion sort. 



Sorting Algorithms: Insertion Sort


Insertion Sort  |  Running time O(n²)  and Memory O(1)
It works by taking elements from the list one by one and inserting them in their correct position in the sorted part of the list.

If we have to sort an array of n elements,
- This algorithm makes n-1 passes over the array
- In every pass, it picks up the first element of the unsorted part of the array and inserts it into its correct position in the sorted part.
- The insertion is done as shown below:




Here the element 4 has to be inserted into its correct position in the sorted (black) part of the array.
First of all, 4 is picked up from the array and stored in a temporary variable, so that a hole is created.



Now, all the numbers that are smaller than 4 are shifted to the right. This is done through a loop.



Then finally 4 is inserted into the empty hole.






Algorithm
//insertion sort in descending order i.e. largest to smallest
for ( i=1; i<n; i++ )      //i controls the number of passes
{    int temp = a[i];

     /* Shift to the right all sorted elements smaller than temp */
     for ( j=i; j>0 && a[j-1]<temp; j-- )
     {    a[j] = a[j-1];
     }

     /* insert temp at the empty hole */
     a[j] = temp;
}


Example: We will sort an array in descending order using insertion sort.
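To try the algorithm out, here is a small, self-contained C version; the function name insertionSortDesc and the sample values are my own illustrative choices.

#include <stdio.h>

/* Insertion sort in descending order (largest to smallest) */
void insertionSortDesc(int a[], int n)
{
    for (int i = 1; i < n; i++)           /* a[0..i-1] is the sorted part */
    {
        int temp = a[i];                  /* pick up the element, creating a hole */
        int j;

        /* shift to the right all sorted elements smaller than temp */
        for (j = i; j > 0 && a[j - 1] < temp; j--)
            a[j] = a[j - 1];

        a[j] = temp;                      /* insert temp at the empty hole */
    }
}

int main(void)
{
    int a[] = { 1, 7, 2, 8, 5, 3, 9 };
    int n = sizeof a / sizeof a[0];

    insertionSortDesc(a, n);

    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);              /* prints: 9 8 7 5 3 2 1 */
    printf("\n");
    return 0;
}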

Insertion sort is also a slow technique, as its complexity is O(n²).

Note that if the list is almost sorted, then fewer shifts will be needed, as most of the elements will already be in or close to their correct places. But if the list is completely unsorted, then a large number of shifts will be needed. This makes insertion sort a good technique to use when the list is almost sorted.
Also, like other techniques with O(n²) complexity, insertion sort is inefficient if the list is very large (say, millions of elements).



Sorting Algorithms: Selection Sort


Selection Sort  |  Running time O(n²)  and Memory O(1)
This algorithm works as follows: find the smallest (or largest) element using a linear scan and move it to the front. Then find the next smallest (or next largest) element, and move it, and so on.    
If we have to sort an array of n elements,
- This algorithm makes n-1 passes of the array
- In every pass, it finds (selects) the smallest (or largest) element in the unsorted part of the array, and moves it to the front.

Algorithm
//selection sort in descending order i.e. largest to smallest
for ( i=0; i<n-1; i++ )     //i controls the number of passes
{    int index = i;

     /* Find the largest element in the unsorted part */
     for ( j=i+1; j<n; j++ )
     {    if ( a[j] > a[index] )   //use < to change sort order
          {    index = j;
          }
     }

     /* Move this element to the front of the unsorted part (swap) */
     int temp  = a[i];
     a[i]      = a[index];
     a[index]  = temp;
}


Example: We will sort an array in descending order using selection sort.
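To try this example out, here is a small, self-contained C version; the function name selectionSortDesc and the sample values are my own illustrative choices.

#include <stdio.h>

/* Selection sort in descending order (largest to smallest) */
void selectionSortDesc(int a[], int n)
{
    for (int i = 0; i < n - 1; i++)       /* i marks the start of the unsorted part */
    {
        int index = i;

        /* find the largest element in the unsorted part a[i..n-1] */
        for (int j = i + 1; j < n; j++)
            if (a[j] > a[index])          /* use < to sort in ascending order */
                index = j;

        /* swap it to the front of the unsorted part */
        int temp = a[i];
        a[i]     = a[index];
        a[index] = temp;
    }
}

int main(void)
{
    int a[] = { 1, 7, 2, 8, 5, 3, 9 };
    int n = sizeof a / sizeof a[0];

    selectionSortDesc(a, n);

    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);              /* prints: 9 8 7 5 3 2 1 */
    printf("\n");
    return 0;
}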

Selection sort is one of the simplest sorting techniques. Also, since each pass requires at most one swap, there will be at most n-1 swaps during the sorting. This makes it a better choice than bubble sort when swaps are expensive. However, a time complexity of O(n²) makes it an inefficient technique for large lists.

Friday, 21 December 2012

Sorting Algorithms: Bubble Sort


Bubble Sort  |  Running time O(n²)  and Memory O(1)

This algorithm compares two adjacent elements of an array, and swaps them if they are in the wrong order.     
If we have to sort an array of n elements,
- This algorithm makes n-1 passes of the array
- In every pass, it compares each pair of adjacent elements in the unsorted part of the array, and swaps them if they are in the wrong order.
- Thus, each pass "bubbles out" the smallest (or largest) element from the unsorted part of the array.

Algorithm
//bubble sorting in descending order i.e. largest to smallest
for ( i=0; i<n-1; i++ )      //i controls the number of passes
{
   for ( j=0; j<n-i-1; j++ )
   {
      if ( a[j] < a[j+1] )      //using > will change sorting order
      {       //swap
              temp   = a[j];
              a[j]   = a[j+1];
              a[j+1] = temp;
      }
   }
}

Example:
          Array to be sorted :   1 7 2 8 5 3 9      (7 elements)
          Sorting order      :   Descending         (largest to smallest)
          (In the original figures, the sorted part of the array was highlighted after each pass.)
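To check the result, here is a small runnable C sketch that sorts this 7-element array; the function name bubbleSortDesc is my own illustrative choice.

#include <stdio.h>

/* Bubble sort in descending order (largest to smallest) */
void bubbleSortDesc(int a[], int n)
{
    for (int i = 0; i < n - 1; i++)           /* i counts the passes */
    {
        /* after pass i, the smallest i+1 elements sit at the right end */
        for (int j = 0; j < n - i - 1; j++)
        {
            if (a[j] < a[j + 1])              /* using > changes the sorting order */
            {
                int temp = a[j];              /* swap the out-of-order pair */
                a[j]     = a[j + 1];
                a[j + 1] = temp;
            }
        }
    }
}

int main(void)
{
    int a[] = { 1, 7, 2, 8, 5, 3, 9 };        /* the array from the example above */
    int n = sizeof a / sizeof a[0];

    bubbleSortDesc(a, n);

    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);                  /* prints: 9 8 7 5 3 2 1 */
    printf("\n");
    return 0;
}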


As we can easily see, bubble sort involves multiple swaps in each pass, which means it cannot be used when swapping is a costly action. Also, its O(n²) complexity means that the time taken for sorting grows as the square of the number of elements, so it cannot be used for large lists. For these reasons, it is not a popular sorting technique.

Hi !!


Hello folks! :) This is my first technical blog. I hope you will find it helpful. If you have any corrections or suggestions, you are most welcome to comment!

I will begin by explaining different types of sorting techniques that we have all learnt in high school.