The perfect algorithmic section in Golang (IMHO)

I am very bad at any exams. I almost always keep up. And it always seemed to me that there is something wrong with this process. In companies like Yandex or Google, when they need to recruit software engineers, and it is not known what language and project they will work on, at least it is clear why this is necessary. But what to do in ordinary companies? Where you need to write CRUDs and set up interserver interaction? It seems unnatural to me.

But one day, while explaining prime numbers to my daughter, I came up with the perfect algorithmic section for a Go developer. I had the task down in about an hour (just the standard time for an algorithmic section). Interested? Welcome under the cut!

So, today we will try to pass the ideal AlgoSobes together. Today we will have:

A banal implementation option
Algorithmic implementation option
GOroutine (or multithreaded)
And a problem with an asterisk.

We will implement it in a banal way.

We have one task for today. We need to count a large number of prime numbers as quickly as possible. Let's implement this in code.

func isPrime(n uint64, primes []uint64) bool {
	for _, p := range primes {
		if p*p > n {
			break
		}
		if n%p == 0 {
			return false
		}
	}
	return true
}

Let's just make a function that checks whether our number and an array of prime numbers are prime.

Let's see how long it takes to do some big calculation.

Prime numbers from 2 to :1000000000 – 50847534 pcs.
Execution time: 3m19.523102917s

Hidden text

package main

import (
	"fmt"
	"time"
)

func isPrime(n uint64, primes []uint64) bool {
	for _, p := range primes {
		if p*p > n {
			break
		}
		if n%p == 0 {
			return false
		}
	}
	return true
}

const maxCounter = 1_000_000_000

func main() {
	start := time.Now()
	primes := []uint64{}

	i := uint64(2)
	for i = 2; i <= maxCounter; i++ {
		if isPrime(i, primes) {
			primes = append(primes, i)
		}
	}

	fmt.Printf("Простых чисел от 2 до :%v - %v шт.\n", maxCounter, len(primes))
	fmt.Printf("Время выполнения: %s\n", time.Since(start))
}

Algorithmic implementation option

For those who have forgotten, the algorithmically correct solution to the problem of counting prime numbers is achieved using an algorithm called the “Sieve of Eratosthenes”

Link on wiki

Therefore, we will implement it in code.

func sieveOfEratosthenes(max uint64) []uint64 {
	if max < 2 {
		return []uint64{}
	}

	sieve := make([]bool, max+1)
	for i := range sieve {
		sieve[i] = true
	}

	for p := uint64(2); p*p <= max; p++ {
		if sieve[p] {
			for i := p * p; i <= max; i += p {
				sieve[i] = false
			}
		}
	}

	var primes []uint64
	for p := uint64(2); p <= max; p++ {
		if sieve[p] {
			primes = append(primes, p)
		}
	}

	return primes
}

And as expected, our results with the same initial data

Prime numbers from 2 to 1000000000: 50847534 pcs.
Execution time: 50.8603695s

Absolute numbers are not so important, because they will depend on the maximum power of one core of our hardware. But the difference of 4 times, in my opinion, perfectly demonstrates the importance of choosing the right algorithm when solving a problem. (This is the answer for those who ask why I need algorithms, I only do CRUDs)

Hidden text

package main

import (
	"fmt"
	"time"
)

func sieveOfEratosthenes(max uint64) []uint64 {
	if max < 2 {
		return []uint64{}
	}

	sieve := make([]bool, max+1)
	for i := range sieve {
		sieve[i] = true
	}

	for p := uint64(2); p*p <= max; p++ {
		if sieve[p] {
			for i := p * p; i <= max; i += p {
				sieve[i] = false
			}
		}
	}

	var primes []uint64
	for p := uint64(2); p <= max; p++ {
		if sieve[p] {
			primes = append(primes, p)
		}
	}

	return primes
}

const maxCounterAlgo = 1_000_000_000

func main() {
	start := time.Now()

	primes := sieveOfEratosthenes(maxCounterAlgo)

	fmt.Printf("Простых чисел от 2 до %v: %v шт.\n", maxCounterAlgo, len(primes))
	fmt.Printf("Время выполнения: %s\n", time.Since(start))
}

GOroutine (or multithreaded) solution method

The go language allows you to write multithreaded applications very well. So, let's implement the correct algorithm, only in multithreaded mode.

To limit the number of goroutines, we will use a semaphore

func markNonPrimes(sieve []bool, start, end, prime uint64, sem chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := start; i <= end; i += prime {
		sieve[i] = false
	}
	<-sem
}

func parallelSieveOfEratosthenes(max uint64, numWorkers int) []uint64 {
	if max < 2 {
		return []uint64{}
	}

	sieve := make([]bool, max+1)
	for i := range sieve {
		sieve[i] = true
	}

	var wg sync.WaitGroup
	sem := make(chan struct{}, numWorkers)

	sqrtMax := uint64(math.Sqrt(float64(max)))
	for p := uint64(2); p <= sqrtMax; p++ {
		if sieve[p] {
			wg.Add(1)
			sem <- struct{}{}
			go markNonPrimes(sieve, p*p, max, p, sem, &wg)
		}
	}

	wg.Wait()

	var primes []uint64
	for p := uint64(2); p <= max; p++ {
		if sieve[p] {
			primes = append(primes, p)
		}
	}

	return primes
}

Thus, we can now load not only one core, but also divide it across the entire available CPU. And naturally, the result is better again.

Prime numbers from 2 to 1000000000: 50847534 pcs.
Execution time: 8.31986275s

Hidden text

package main

import (
	"fmt"
	"math"
	"runtime"
	"sync"
	"time"
)

func markNonPrimes(sieve []bool, start, end, prime uint64, sem chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := start; i <= end; i += prime {
		sieve[i] = false
	}
	<-sem
}

func parallelSieveOfEratosthenes(max uint64, numWorkers int) []uint64 {
	if max < 2 {
		return []uint64{}
	}

	sieve := make([]bool, max+1)
	for i := range sieve {
		sieve[i] = true
	}

	var wg sync.WaitGroup
	sem := make(chan struct{}, numWorkers)

	sqrtMax := uint64(math.Sqrt(float64(max)))
	for p := uint64(2); p <= sqrtMax; p++ {
		if sieve[p] {
			wg.Add(1)
			sem <- struct{}{}
			go markNonPrimes(sieve, p*p, max, p, sem, &wg)
		}
	}

	wg.Wait()

	var primes []uint64
	for p := uint64(2); p <= max; p++ {
		if sieve[p] {
			primes = append(primes, p)
		}
	}

	return primes
}

const maxCounterConcurrent = 1_000_000_000

func main() {
	start := time.Now()
	numWorkers := (runtime.NumCPU() * 2) - 1

	primes := parallelSieveOfEratosthenes(maxCounterConcurrent, numWorkers)

	fmt.Printf("Простых чисел от 2 до %v: %v шт.\n", maxCounterConcurrent, len(primes))
	fmt.Printf("Время выполнения: %s\n", time.Since(start))
}

It seems, well, that's it. We can stop, we took a great language, implemented a great algorithm, we can go and rest. It's Friday (and yes, thanks to those who read this far, I'm writing this on Programmer's Day, so colleagues, happy holiday).

But there is one problem. This implementation has an incredibly trivial limitation. And as you probably guessed, this limitation is in the maximum value of the uint64 type.

What happens if we need to calculate a really big number? How do we solve the problem? And how much does it cost in computing time? (after all, nothing is free)

Problem with an asterisk

To be honest, I didn't think about the solution options for very long (well, time is running out, the interview lasts 1 hour). Therefore, I decided to use big.Int's, as a solution that already exists in the language. Perhaps it can be done more optimally, but in practice I have not encountered big.Int – write in comments your experience.

func parallelSieveOfEratosthenesBigInt(max *big.Int, numWorkers int) []*big.Int {
	maxInt64 := max.Int64()
	sieve := make([]bool, maxInt64+1)
	for i := range sieve {
		sieve[i] = true
	}
	sieve[0], sieve[1] = false, false

	sqrtMax := new(big.Int).Sqrt(max)
	sqrtMaxInt64 := sqrtMax.Int64()

	var wg sync.WaitGroup
	semaphore := make(chan struct{}, numWorkers)

	for p := int64(2); p <= sqrtMaxInt64; p++ {
		if sieve[p] {
			wg.Add(1)
			semaphore <- struct{}{}
			go func(p int64) {
				defer func() { <-semaphore }()
				start := new(big.Int).SetInt64(p * p)
				prime := new(big.Int).SetInt64(p)
				markNonPrimesBigInt(sieve, start, max, prime, &wg)
			}(p)
		}
	}

	wg.Wait()

	var primes []*big.Int
	for i := int64(2); i <= maxInt64; i++ {
		if sieve[i] {
			primes = append(primes, big.NewInt(i))
		}
	}

	return primes
}

We will also use a semaphore to limit the number of goroutines running at the same time when seeding.

Prime numbers from 2 to 1000000000: 50847534 pcs.
Execution time: 15.391528792s

As you can see, the difference is almost 2 times, this is precisely the problem of the “universality” of the solution. Therefore, in addition to the correct algorithm, it is worth correctly setting the restrictions on the operating conditions of this code.

Hidden text

package main

import (
	"fmt"
	"math/big"
	"runtime"
	"sync"
	"time"
)

func markNonPrimesBigInt(sieve []bool, start, end, prime *big.Int, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := new(big.Int).Set(start); i.Cmp(end) <= 0; i.Add(i, prime) {
		sieve[i.Int64()] = false
	}
}

func parallelSieveOfEratosthenesBigInt(max *big.Int, numWorkers int) []*big.Int {
	maxInt64 := max.Int64()
	sieve := make([]bool, maxInt64+1)
	for i := range sieve {
		sieve[i] = true
	}
	sieve[0], sieve[1] = false, false

	sqrtMax := new(big.Int).Sqrt(max)
	sqrtMaxInt64 := sqrtMax.Int64()

	var wg sync.WaitGroup
	semaphore := make(chan struct{}, numWorkers)

	for p := int64(2); p <= sqrtMaxInt64; p++ {
		if sieve[p] {
			wg.Add(1)
			semaphore <- struct{}{}
			go func(p int64) {
				defer func() { <-semaphore }()
				start := new(big.Int).SetInt64(p * p)
				prime := new(big.Int).SetInt64(p)
				markNonPrimesBigInt(sieve, start, max, prime, &wg)
			}(p)
		}
	}

	wg.Wait()

	var primes []*big.Int
	for i := int64(2); i <= maxInt64; i++ {
		if sieve[i] {
			primes = append(primes, big.NewInt(i))
		}
	}

	return primes
}

const maxCounterBigIntS = "1000000000"

func main() {
	start := time.Now()
	maxCounterBigInt := new(big.Int)
	maxCounterBigInt.SetString(maxCounterBigIntS, 10)
	numWorkers := (runtime.NumCPU() * 2) - 1

	primes := parallelSieveOfEratosthenesBigInt(maxCounterBigInt, numWorkers)

	fmt.Printf("Простых чисел от 2 до %v: %v шт.\n", maxCounterBigIntS, len(primes))
	fmt.Printf("Время выполнения: %s\n", time.Since(start))
}

And then I understand that the problem with the asterisk is not solved correctly. And not all values go through bigInt
And in general, it is worth talking about the memory limitation there. Therefore, apparently, this is already from another interview.