Bucket Sort

Bucket Sort

https://github.com/explorer436/programming-playground/tree/main/java-playground/my-implementations/searching-and-sorting/src/main/java/com/my/company/sorting/algorithms

Bucket Sort is a non-comparison-based sorting algorithm that distributes elements into a number of “buckets,” sorts each bucket individually (often using another sorting algorithm like insertion sort), and then concatenates the sorted buckets to produce the final sorted array. It works best when the input is uniformly distributed over a range.

Pseudocode for Bucket Sort

BucketSort(arr, n):
    // Input: array arr of size n, elements in range [0, 1) or scaled to that range
    // Create buckets
    buckets = array of k empty lists (k is typically n or a constant)

    // Step 1: Distribute elements into buckets
    for i from 0 to n-1:
        bucket_index = floor(arr[i] * k)  // Map element to a bucket
        buckets[bucket_index].append(arr[i])

    // Step 2: Sort each bucket
    for each bucket in buckets:
        sort(bucket)  // Use a sorting algorithm like insertion sort

    // Step 3: Concatenate all buckets
    result = empty array
    for each bucket in buckets:
        result.append(bucket elements)

    return result

Example of Bucket Sort

Consider the array: [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]

Assume the elements are in the range [0, 1) and we use k = 10 buckets (one for each range [0.0-0.1), [0.1-0.2), ..., [0.9-1.0)).

Step-by-Step Process

  1. Create Buckets:

    • Initialize 10 empty buckets: buckets[0], buckets[1], ..., buckets[9].
  2. Distribute Elements:

    • For each element, compute bucket_index = floor(element * k).
    • Example:
      • 0.78: floor(0.78 * 10) = 7 → Add to buckets[7]
      • 0.17: floor(0.17 * 10) = 1 → Add to buckets[1]
      • 0.39: floor(0.39 * 10) = 3 → Add to buckets[3]
      • And so on…
    • After distribution:
      • buckets[0] = []
      • buckets[1] = [0.17, 0.12]
      • buckets[2] = [0.26, 0.21, 0.23]
      • buckets[3] = [0.39]
      • buckets[4] = []
      • buckets[5] = []
      • buckets[6] = [0.68]
      • buckets[7] = [0.78, 0.72]
      • buckets[8] = []
      • buckets[9] = [0.94]
  3. Sort Each Bucket:

    • Sort non-empty buckets (e.g., using insertion sort):
      • buckets[1] = [0.12, 0.17]
      • buckets[2] = [0.21, 0.23, 0.26]
      • buckets[3] = [0.39]
      • buckets[6] = [0.68]
      • buckets[7] = [0.72, 0.78]
      • buckets[9] = [0.94]
  4. Concatenate Buckets:

    • Combine all buckets in order:
      • [] + [0.12, 0.17] + [0.21, 0.23, 0.26] + [0.39] + [] + [] + [0.68] + [0.72, 0.78] + [] + [0.94]
    • Resulting sorted array: [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]

Complexity analysis

  • Time Complexity:
    • Average case: O(n + k) when elements are uniformly distributed, where n is the number of elements and k is the number of buckets.
    • Worst case: O(n²) if all elements fall into one bucket and the bucket sorting algorithm is O(n²).
    • Best case: O(n + k) when elements are well-distributed.
  • Space Complexity: O(n + k) for the buckets and result array.
  • Stability: Stable if the bucket sorting algorithm (e.g., insertion sort) is stable.
  • Advantages: Efficient for uniformly distributed data, especially floating-point numbers in a known range.
  • Disadvantages: Requires knowledge of the input range, less effective for non-uniform distributions.

Notes

  • For non-floating-point numbers (e.g., integers), scale the input to [0, 1) by dividing by the maximum value or map to an appropriate range for bucket indexing.
  • Bucket Sort is often used as a subroutine in algorithms like Radix Sort for sorting integers.
  • The number of buckets k is typically set to n or a constant, depending on the input size and distribution.

Links to this note