Fast lookup algorithm using hashing

In this article, I want to present an algorithm for optimizing data storage for fast search (using the map container as an example).

So, the task:

There is an electronic book that is read simultaneously by an unlimited number of readers. At any moment, a given reader must be able to check what share of users have read a smaller part of the book than they have. A naive solution is to store in a std::map the number of pages as the key and the number of users who have read that many pages as the value. With this approach, however, the program is slow on large tests, because the number of iterations over the map equals the number of pages the user has read: if the user has read 1000 pages out of a possible 1000, the loop has to perform up to 1000 iterations, and this greatly slows the program down.

To reduce the running time, the counting algorithm needs to be simplified. The improved algorithm separately tracks how many users are reading each of the preceding hundreds of pages, and then sums, page by page, only those readers who have read less within the hundred the user is currently on. As you have probably guessed, this requires a separate container that stores, for every hundred pages, the number of users currently reading it. Instead of up to 998 iterations (when the user is on page 999), such an algorithm performs only about a hundred: 9 iterations over the whole hundreds plus at most 99 over single pages.

That is the idea in brief; now let’s move on to a detailed description, starting with the code.

#include <iomanip> 
#include <iostream> 
#include <vector> 
#include <utility> 
#include <map> 

using namespace std; 

class ReadingManager { 

public: 
    // the function records which page the user is currently reading
    void Read(int user_id, int page_count)
    {
      if (pages_by_users.find(user_id) != pages_by_users.end())
      {
        // decrement the reader count in the hundred where the user was before
        hundreds_of_users[((pages_by_users[user_id] - (pages_by_users[user_id] % 100)) + 100) / 100]--;
        // decrement the reader count on the previously read page
        users_by_pages[pages_by_users[user_id]]--;
      }
      pages_by_users[user_id] = page_count;
      users_by_pages[page_count]++;
      // increment the number of users who have reached this hundred
      hundreds_of_users[((page_count - (page_count % 100)) + 100) / 100]++;
    }
    // the function tells the user what share of readers have read less than they have
    float Cheer(int user_id)
    {
        // the user has not read a single page yet
        if (!pages_by_users.count(user_id))
        {  return 0; }
        // the user is the only reader
        if (pages_by_users.size() == 1)
        {  return 1; }
        // the page the user is currently reading
        int read_by_user = pages_by_users.at(user_id);

        int total_users_less = 0;
        // the page from which to start the per-page count
        int bottom_border = (read_by_user > 99) ? (read_by_user - (read_by_user % 100)) : 1;
        // no one except this reader has reached this page yet
        if (users_by_pages.find(bottom_border) == users_by_pages.end())
        {  users_by_pages[bottom_border] = 0; }

        map<int, int>::const_iterator it;
        for (it = users_by_pages.find(bottom_border); it != users_by_pages.end(); ++it)
        {
          // stop once we reach the reader's own page
          if (it->first == read_by_user)
          { break; }
          // sum the readers page by page within this hundred
          total_users_less += it->second;
        }

        // count readers by whole hundreds of pages
        if (read_by_user > 99)
        {
          int hundreds = 1, x = (read_by_user - (read_by_user % 100)) / 100;
          while (hundreds <= x)
          {
            total_users_less += hundreds_of_users[hundreds];
            hundreds++;
          }
        }
        return float(total_users_less) / (pages_by_users.size() - 1);
    }
private:
    map<int, int> pages_by_users; // user id -> page reached
    map<int, int> users_by_pages; // page -> number of users on it
    int hundreds_of_users[12] = { 0 }; // bucket counters for hundreds of pages (page 1000 maps to index 11)

}; 

int main() { 
    ios::sync_with_stdio(false); 
    cin.tie(nullptr); 
    ReadingManager manager; 
    unsigned long int query_count; 
    cin >> query_count; 
    unsigned long int  query_id = 0; 
    string query_type; 
    int user_id; 

    while (query_id < query_count)
    { 
        cin >> query_type; 
        cin >> user_id; 
        if (query_type == "READ") 
        { 
          int page_count; 
          cin >> page_count; 
          manager.Read(user_id, page_count); 
        } 
        else if (query_type == "CHEER") 
        { 
          cout << setprecision(6) << manager.Cheer(user_id) << "\n"; 
        } 

        ++query_id; 
    } 
    return 0; 

} 

First, a separate array stores, for each hundred pages, how many users are currently reading one of the pages in that hundred. To do this, the pages are split into hundreds, and each call to the Read function records how many users have moved into a new hundred (how many users are reading pages 1-99, pages 100-199, and so on). A page number is rounded up to the next hundred with ((pages_by_users[user_id] - (pages_by_users[user_id] % 100)) + 100). So page 135 is rounded up to 200, 765 to 800, and 38 to 100.

First, we look up in pages_by_users which page the user (if this is not a new user) had reached last time, compute by the same formula which hundred that page rounds up to, and decrement the number of users reading that hundred by one. Then we compute the new hundred and increment its reader count by one. This data is conveniently stored in a small array (12 elements for a book of 1000 pages, since page 1000 rounds up to index 11); on creation, all elements are initialized to zero. To map hundreds of pages to array indices, the rounded value is divided by 100: the cell with index 1 stores how many users are reading pages 1 to 99, index 2 covers pages 100 to 199, and so on.

hundreds_of_users[((pages_by_users[user_id] - (pages_by_users[user_id] % 100)) + 100) / 100]++; // increment the number of users who have reached this hundred. Say, for example, that 98 people are reading pages 300-399.

Now there is no need to loop over every page: it is enough to round page_count down to the nearest hundred (567 down to 500, for example) to find out up to which page users can be counted by whole hundreds. We take this data from the array:

if (read_by_user > 99)   
{ 
  int hundreds = 1, x = (read_by_user - (read_by_user % 100)) / 100; 
  while (hundreds <= x) 
  { 
    total_users_less += hundreds_of_users[hundreds]; 
    hundreds++; 
  } 
} 

It takes only 5 iterations to account for all users who have read fewer than 500 pages.

The remaining users, those who have read less than the queried user but are within the same hundred, are counted from users_by_pages. For a user on page 567 this takes at most 67 iterations (pages 500 through 566) instead of 567. If page_count is less than 100, that is, the user has not yet finished the first hundred pages, there are no whole hundreds to add up; in that case we simply sum in the map, in up to (read_by_user - 1) iterations, everyone who has read less.

This algorithm significantly speeds up the search while using only a minimal amount of additional memory.

Thank you for your attention.
