The interview task “Fetch N URLs from a list” in Go

At the moment I am looking for a new project, which means a lot of interviews.

I decided to share my thoughts on solving a problem that, it seems to me, is often given at interviews.

Task

Write a function that accepts several URLs and returns the total size of the response bodies, plus an error if something went wrong.

Interested in discussing solutions?

So, we have a standalone program and two data sets: a successful case and a failing one. The failing set deliberately includes addresses with invalid domain zones. I made the data sets up myself, so if you think their coverage is incomplete, say so in the comments.

The naive option

In the naive version we simply walk the entire data set sequentially. Not fast, but it works!

// Naive synchronous version

package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

const byteInMegabyte = 1024 * 1024

func main() {

	urlsList1 := []string{
		"https://youtube.com",
		"https://ya.ru",
		"https://reddit.com",
		"https://google.com",
		"https://mail.ru",
		"https://amazon.com",
		"https://instagram.com",
		"https://wikipedia.org",
		"https://linkedin.com",
		"https://netflix.com",
	}
	urlsList2 := append(urlsList1, "https://111.321", "https://999.000")

	{
		t1 := time.Now()
		byteSum, err := requesSumm(urlsList1)
		fmt.Printf("Total page size in MB=%.2f, error - %v\n", float64(byteSum)/byteInMegabyte, err)
		fmt.Printf("Requests took %.2f sec.\n", time.Since(t1).Seconds())
	}
	fmt.Println("++++++++")
	{
		t1 := time.Now()
		byteSum, err := requesSumm(urlsList2)
		fmt.Printf("Total page size in MB=%.2f, error - %v\n", float64(byteSum)/byteInMegabyte, err)
		fmt.Printf("Requests took %.2f sec.\n", time.Since(t1).Seconds())
	}
}

func requesSumm(urlsSlv []string) (int64, error) {

	var sum int64

	client := &http.Client{
		Timeout: 10 * time.Second,
	}

	for _, v := range urlsSlv {
		resp, err := client.Get(v)
		if err != nil {
			return 0, err
		}
		body, err := io.ReadAll(resp.Body)
		// Close right away: a defer inside a loop would keep every body
		// open until the function returns.
		resp.Body.Close()
		if err != nil {
			return 0, err
		}

		sum += int64(len(body))
	}
	return sum, nil
}

The execution time, as one would expect, is the sum of the individual request times.

ilia@goDevLaptop sobesi % go run httpget/v1.go
Total page size in MB=2.12, error - <nil>
Requests took 16.01 sec.
++++++++
Total page size in MB=0.00, error - Get "https://111.321": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Requests took 18.88 sec.
ilia@goDevLaptop sobesi %

The next obvious step in Go is to make the calls concurrent, with one goroutine per URL. Let's see how the execution time changes.

// Naive concurrent version
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

const byteInMegabytev2 = 1024 * 1024

type respSt struct {
	lenBody int64
	err     error
}

func main() {
	urlsList1 := []string{
		"https://youtube.com",
		"https://ya.ru",
		"https://reddit.com",
		"https://google.com",
		"https://mail.ru",
		"https://amazon.com",
		"https://instagram.com",
		"https://wikipedia.org",
		"https://linkedin.com",
		"https://netflix.com",
	}
	urlsList2 := append(urlsList1, "https://111.321", "https://999.000")

	{
		t1 := time.Now()
		byteSum, err := requesSummAsync(urlsList1)
		fmt.Printf("Total page size in MB=%.2f, error - %v\n", float64(byteSum)/byteInMegabytev2, err)
		fmt.Printf("Requests took %.2f sec.\n", time.Since(t1).Seconds())
	}
	fmt.Println("++++++++")
	{
		t1 := time.Now()
		byteSum, err := requesSummAsync(urlsList2)
		fmt.Printf("Total page size in MB=%.2f, error - %v\n", float64(byteSum)/byteInMegabytev2, err)
		fmt.Printf("Requests took %.2f sec.\n", time.Since(t1).Seconds())
	}
}

func requesSummAsync(urls []string) (int64, error) {
	var wg sync.WaitGroup
	ansCh := make(chan respSt, len(urls))

	client := &http.Client{
		Timeout: 10 * time.Second,
	}

	for _, url := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			resp, err := client.Get(u)
			if err != nil {
				ansCh <- respSt{
					lenBody: 0,
					err:     err,
				}
				return
			}
			defer resp.Body.Close()

			body, err := io.ReadAll(resp.Body)
			if err != nil {
				ansCh <- respSt{
					lenBody: 0,
					err:     err,
				}
				return
			}
			ansCh <- respSt{
				lenBody: int64(len(body)),
				err:     nil,
			}
		}(url)
	}

	go func() {
		wg.Wait()
		close(ansCh)
	}()

	var sum int64
	var err error
	for res := range ansCh {
		sum += res.lenBody
		if res.err != nil {
			// Accumulate every request error into a single error value.
			if err == nil {
				err = fmt.Errorf("error %v", res.err)
				continue
			}
			err = fmt.Errorf("error %v; %v", res.err, err)
		}
	}
	if err != nil {
		return 0, err
	}

	return sum, err
}

Now the total time equals the time of the slowest request plus the summation overhead.

ilia@goDevLaptop sobesi % go run httpget/v2.go
Total page size in MB=2.50, error - <nil>
Requests took 2.81 sec.
++++++++
Total page size in MB=0.00, error - error Get "https://111.321": context deadline exceeded (Client.Timeout exceeded while awaiting headers); error Get "https://999.000": dial tcp: lookup 999.000: no such host
Requests took 10.00 sec.
ilia@goDevLaptop sobesi %

The request timeout is 10 seconds, but can we return an error even faster in our task?


Let's extend the implementation above with a single context shared by all spawned goroutines, so that the first failure can cancel the requests still in flight.

// Concurrent version with a shared context
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

type respStC struct {
	lenBody int64
	err     error
}

const byteInMegabytev3 = 1024 * 1024

func main() {
	urlsList1 := []string{
		"https://youtube.com",
		"https://ya.ru",
		"https://reddit.com",
		"https://google.com",
		"https://mail.ru",
		"https://amazon.com",
		"https://instagram.com",
		"https://wikipedia.org",
		"https://linkedin.com",
		"https://netflix.com",
	}
	urlsList2 := append(urlsList1, "https://111.321", "https://999.000")

	{
		t1 := time.Now()
		byteSum, err := requestSumAsyncWithCtx(urlsList1)
		fmt.Printf("Total page size in MB=%.2f, error - %v\n", float64(byteSum)/byteInMegabytev3, err)
		fmt.Printf("Requests took %.2f sec.\n", time.Since(t1).Seconds())
	}
	fmt.Println("++++++++")
	{
		t1 := time.Now()
		byteSum, err := requestSumAsyncWithCtx(urlsList2)
		fmt.Printf("Total page size in MB=%.2f, error - %v\n", float64(byteSum)/byteInMegabytev3, err)
		fmt.Printf("Requests took %.2f sec.\n", time.Since(t1).Seconds())
	}
}

func requestSumAsyncWithCtx(urls []string) (int64, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var wg sync.WaitGroup
	ansCh := make(chan respStC, len(urls))

	client := &http.Client{
		Timeout: 10 * time.Second,
	}

	for _, url := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			req, err := http.NewRequestWithContext(ctx, "GET", u, nil)
			if err != nil {
				ansCh <- respStC{lenBody: 0, err: err}
				return
			}

			resp, err := client.Do(req)
			if err != nil {
				ansCh <- respStC{lenBody: 0, err: err}
				return
			}
			defer resp.Body.Close()

			body, err := io.ReadAll(resp.Body)
			if err != nil {
				ansCh <- respStC{lenBody: 0, err: err}
				return
			}

			ansCh <- respStC{lenBody: int64(len(body)), err: nil}
		}(url)
	}

	go func() {
		wg.Wait()
		close(ansCh)
	}()

	var sum int64
	var err error
	for res := range ansCh {
		sum += res.lenBody
		// context.Canceled results are expected once cancel() has fired.
		if res.err != nil && !errors.Is(res.err, context.Canceled) {
			if err != nil {
				err = fmt.Errorf("error %v; %v", res.err, err)
			} else {
				err = fmt.Errorf("error %v", res.err)
			}
			// Abort the requests that are still in flight.
			cancel()
		}
	}
	return sum, err
}

Now let’s look at the execution time.

ilia@goDevLaptop sobesi % go run httpget/v3.go
Total page size in MB=2.50, error - <nil>
Requests took 2.89 sec.
++++++++
Total page size in MB=0.00, error - error Get "https://999.000": dial tcp: lookup 999.000: no such host
Requests took 0.00 sec.
ilia@goDevLaptop sobesi %

Now we no longer wait out every request: as soon as one fails, the rest are cancelled and the error comes back almost immediately.


But real life is always a struggle with constraints. If the list we are handed is too large (more than the network connections available in the environment where the program runs), we get unpredictable behavior from both our program and its surroundings. So let's cap the number of network connections the program may open at the same time.

For this, of course, we will use a buffered channel 😉

// Concurrent version with a shared context and a connection pool of size poolHTTPReq
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

type respStCWP struct {
	lenBody int64
	err     error
}

const poolHTTPReq = 2
const byteInMegabytev4 = 1024 * 1024

func main() {
	urlsList1 := []string{
		"https://youtube.com",
		"https://ya.ru",
		"https://reddit.com",
		"https://google.com",
		"https://mail.ru",
		"https://amazon.com",
		"https://instagram.com",
		"https://wikipedia.org",
		"https://linkedin.com",
		"https://netflix.com",
	}
	urlsList2 := append(urlsList1, "https://111.321", "https://999.000")

	{
		t1 := time.Now()
		byteSum, err := requestSumAsyncWithCtxAndPool(urlsList1)
		fmt.Printf("Total page size in MB=%.2f, error - %v\n", float64(byteSum)/byteInMegabytev4, err)
		fmt.Printf("Requests took %.2f sec.\n", time.Since(t1).Seconds())
	}
	fmt.Println("++++++++")
	{
		t1 := time.Now()
		byteSum, err := requestSumAsyncWithCtxAndPool(urlsList2)
		fmt.Printf("Total page size in MB=%.2f, error - %v\n", float64(byteSum)/byteInMegabytev4, err)
		fmt.Printf("Requests took %.2f sec.\n", time.Since(t1).Seconds())
	}
}

func requestSumAsyncWithCtxAndPool(urls []string) (int64, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var wg sync.WaitGroup
	ansCh := make(chan respStCWP, len(urls))
	// Buffered channel used as a semaphore: at most poolHTTPReq requests in flight.
	semaphore := make(chan struct{}, poolHTTPReq)

	// http.DefaultClient has no timeout, so keep an explicit client here too.
	client := &http.Client{
		Timeout: 10 * time.Second,
	}

	for _, url := range urls {
		semaphore <- struct{}{} // blocks once poolHTTPReq goroutines are running
		wg.Add(1)
		go func(u string) {
			defer func() {
				<-semaphore
				wg.Done()
			}()

			req, err := http.NewRequestWithContext(ctx, "GET", u, nil)
			if err != nil {
				ansCh <- respStCWP{lenBody: 0, err: err}
				return
			}

			resp, err := client.Do(req)
			if err != nil {
				ansCh <- respStCWP{lenBody: 0, err: err}
				return
			}
			defer resp.Body.Close()

			body, err := io.ReadAll(resp.Body)
			if err != nil {
				ansCh <- respStCWP{lenBody: 0, err: err}
				return
			}

			ansCh <- respStCWP{lenBody: int64(len(body)), err: nil}
		}(url)
	}

	go func() {
		wg.Wait()
		close(ansCh)
		close(semaphore)
	}()

	var sum int64
	var err error
	for res := range ansCh {
		sum += res.lenBody
		// context.Canceled results are expected once cancel() has fired.
		if res.err != nil && !errors.Is(res.err, context.Canceled) {
			if err != nil {
				err = fmt.Errorf("error %v; %v", res.err, err)
			} else {
				err = fmt.Errorf("error %v", res.err)
			}
			cancel()
		}
	}
	return sum, err
}

As expected, the numbers are worse than in the previous version, but closer to real life.

ilia@goDevLaptop sobesi % go run httpget/v4.go
Total page size in MB=2.50, error - <nil>
Requests took 9.05 sec.
++++++++
Total page size in MB=2.12, error - error Get "https://999.000": dial tcp: lookup 999.000: no such host
Requests took 4.29 sec.
ilia@goDevLaptop sobesi % 

All the code is, naturally, posted on GitHub.

If you find such articles interesting, or have questions, comments, or suggestions, do leave a comment.

And subscribe to my Telegram channel, where I post my thoughts on everything interesting that catches my eye in the world of IT.
