How to automate collecting semantics and cleaning it of duplicates and garbage
Semantic core collection tools let you gather hundreds or thousands of keywords. But such a list inevitably contains repeated queries, extra characters and spaces, zero-frequency phrases, and so on. All this “garbage” needs to be cleaned out.
Using the example of an online electronics store, we show step by step how to use Click.ru tools to collect a query pool for contextual advertising and bring it into a usable form.
1. We collect a pool of queries
Click.ru has a free tool for collecting a site's semantic core: “Media Planning”.
How to work with it:
1. Register at Click.ru and click “Create Account”. In the window that opens, select the advertising system and name the account.
2. Add a campaign: name it, select where your ads will appear, and specify the URL and geo-targeting.
3. The system will offer to select words for you or let you add your own. Click “Choose words”, and the media planner will select keywords based on the content of the advertised site.
4. Add the matched words to the media plan. In our example, the system collected 623 keywords. To add them, check the box in the table header and click “Add to media plan”.
5. Expand the semantic core with the words your competitors are promoting. To do this, in the “Automatic selection of words” section, select “Words of competitors”.
The system will suggest up to 5 competitors and collect their words. Check whether the proposed competitor sites match your product range. If not, delete them and enter competitor URLs yourself (up to 10 sites at a time).
Collecting the words takes the system several minutes. After that, click “Show competitors words”.
In the example, the system collected 2849 words. To view them, scroll down the table and click “Show All”.
Browse the list of collected words. Not all of them will be relevant, because even the closest competitors may not have the same product range. There are two ways to get rid of unnecessary words:
- Delete them manually right away;
- Clean up at the very end, after grouping (then words can be deleted in bundles rather than one at a time).
If there are many words, the second option is better.
Now the collected keywords need to be exported. To do this, add them to the media plan: check the box in the table header (the system will automatically check every phrase), then click “Add to Media Plan”.
Export the words added to the media plan to an .xls file.
With two pickers (by site content and by competitors), we collected and added 3492 words to the media plan. Now all of them need to be cleaned.
2. We delete duplicate queries, special characters, spaces and empty lines
Finding duplicates manually in a list of 1000 or more words is difficult. For this, Click.ru has a free keyword normalizer.
What it can do:
- Deletes exact duplicates. For example, if the tool finds the key phrase “buy samsung galaxy s10” twice in the list, it deletes one of them.
- Deletes duplicates that differ only in morphology or word order. For example, if the system detects the phrases “buy samsung galaxy s10” and “samsung galaxy s10 buy”, the second phrase is considered a duplicate and deleted.
- Deletes special characters at the beginning and end of words. Queries collected with third-party services in particular may contain special characters: question marks, plus and minus signs. For example, if the normalizer finds the phrase “samsung galaxy s10 + buy”, it simply removes the plus and the extra spaces, leaving the query itself unchanged.
- Removes extra spaces. If a key phrase has extra spaces at the beginning, middle or end, the tool detects and deletes them.
- Removes tabs and blank lines. The tool removes indentation at the beginning and end of each line and deletes empty rows.
- Converts words to lowercase. If the list contains parsed headings written in upper case, the system converts them to lower case.
- Replaces ё with е. If you do not use the letter “ё”, check the box next to the option “Replace ё with е”.
- Free to use.
- Processing runs online: no need to install software or keep the page open.
- No limit on the number of words in the list.
- Completed tasks are stored in the Click.ru account for an unlimited amount of time.
- No need to enter captcha.
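The normalizer's cleanup rules can be approximated in a few lines of code. Below is a minimal Python sketch, assuming the keywords are already loaded as a list of strings. The set of “special” edge characters is our assumption, not Click.ru's actual list, and the sketch ignores morphology (matching different word forms would need a lemmatizer): it only catches exact duplicates and word-order permutations.

```python
# Characters treated as "special" at word edges: an assumption for illustration,
# not the list Click.ru actually uses.
EDGE_CHARS = "+-!?.,;:\"'«»()[]"


def normalize_phrase(phrase: str, replace_yo: bool = True) -> str:
    """Lowercase, optionally replace ё with е, strip edge characters, collapse whitespace."""
    text = phrase.lower()
    if replace_yo:
        text = text.replace("ё", "е")
    words = []
    for token in text.split():  # split() with no argument also drops tabs and repeated spaces
        token = token.strip(EDGE_CHARS)
        if token:  # a standalone "+" becomes empty and is dropped
            words.append(token)
    return " ".join(words)


def clean_keywords(phrases, ignore_word_order=True):
    """Normalize phrases, skip blank lines and drop duplicates, keeping the first occurrence."""
    seen = set()
    result = []
    for raw in phrases:
        phrase = normalize_phrase(raw)
        if not phrase:
            continue  # blank or garbage-only line
        # Sorting the words gives "buy samsung" and "samsung buy" the same key.
        key = tuple(sorted(phrase.split())) if ignore_word_order else phrase
        if key in seen:
            continue
        seen.add(key)
        result.append(phrase)
    return result


print(clean_keywords([
    "Samsung Galaxy S10 + buy",
    "buy samsung galaxy s10",
    "   ",
    "\tbuy  samsung galaxy s10 ",
]))
# ['samsung galaxy s10 buy']
```

Here the plus sign and extra whitespace disappear, the blank line is skipped, and the last two phrases collapse into the first one because they contain the same words in a different order.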
How to use the tool
Go to the tool page and add the words.
Choose which actions to perform with the core and click “Run”.
It will take a couple of minutes for the system to complete the specified actions. The report is available for download as an .xlsx file.
To download the report, click the “Download XLSX” button in the “Task List”.
The report consists of two sheets:
- list of words cleared of duplicates;
- initial settings (the original word list and the selected actions for the core).
In the example, the source list consisted of 3492 queries. After cleaning, 2828 remained, so duplicates made up about 19% of the core.
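The 19% figure is the share of removed phrases in the original list. A quick check in Python, using only the numbers from the example:

```python
def removed_share(before: int, after: int) -> float:
    """Share of the original list removed during cleaning, from 0 to 1."""
    return (before - after) / before


before, after = 3492, 2828  # list sizes from the example
removed = before - after
print(f"{removed} phrases removed, {removed_share(before, after):.1%} of the core")
# 664 phrases removed, 19.0% of the core
```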
3. Delete words with zero frequency
For contextual advertising, words with near-zero frequency are of no interest, since they will get no impressions. Such words are best removed right away.
To check the frequency of a large array of keys, Click.ru has a Wordstat parser. It collects frequencies from the left column of Wordstat, works for any Yandex region and takes the keyword match type into account.
How to use the tool
Go to the tool page and add queries.
Select the region for which the tool will parse frequencies.
Specify the frequency collection parameters. The tool collects frequencies for broad match, for a fixed number of words and word forms, and for a fixed word order.
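In Wordstat itself, these match types correspond to query operators: quotation marks fix the number of words, “!” fixes the word form, and square brackets fix the word order. The minimal Python sketch below builds the three variants for one phrase. How exactly Click.ru's parser combines the operators internally is an assumption here, so treat the generated strings as illustrative.

```python
def wordstat_variants(phrase: str) -> dict[str, str]:
    """Build Wordstat query strings for three match types.

    The way operators are combined is an assumption for illustration.
    """
    words = phrase.split()
    fixed_forms = " ".join("!" + w for w in words)  # "!" fixes each word's form
    return {
        "broad": " ".join(words),               # no operators: any forms, extra words allowed
        "count_and_forms": f'"{fixed_forms}"',  # quotes fix the number of words
        "order": f'"[{fixed_forms}]"',          # brackets additionally fix the word order
    }


for match_type, query in wordstat_variants("buy samsung galaxy s10").items():
    print(f"{match_type:16} {query}")
```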
Read more about the features of the Wordstat parser in the article: “How to quickly determine the frequency in Wordstat”
To start the task, click the “Run Scan” button. Collection time depends on the number of queries, regions and match types.
The report is available in the task list in XLSX format.
The report shows the frequency of each query for each match type. Delete words with zero and near-zero frequencies.
Important! Be careful with keywords related to seasonal goods and services. Wordstat shows statistics for the last month, so if your niche is currently in a seasonal slump, the frequency will be low. We have written in detail about frequency analysis in Wordstat. Google Trends can also help, and we have covered how to work with it as well.
After removing the “nulls”, you can proceed to grouping the words.
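If you prefer to filter the exported report in a script rather than by hand, this step amounts to a threshold check. A minimal Python sketch, assuming the report rows have already been read into dictionaries; the column names and the default threshold of 5 are hypothetical and should be adapted to the actual XLSX headers and your niche. The rows below are invented for illustration, not taken from a real report.

```python
def split_by_frequency(rows, column, min_freq=5):
    """Split rows into (kept, dropped) by the frequency in `column`.

    `column` and `min_freq` are hypothetical; a missing value counts as zero.
    """
    kept, dropped = [], []
    for row in rows:
        target = kept if row.get(column, 0) >= min_freq else dropped
        target.append(row)
    return kept, dropped


# Invented sample rows for illustration only.
rows = [
    {"phrase": "buy samsung galaxy s10", "broad": 1200, "exact": 85},
    {"phrase": "samsung galaxy s10 buy cheap with warranty", "broad": 3, "exact": 0},
]
kept, dropped = split_by_frequency(rows, "exact")
print([r["phrase"] for r in kept])     # ['buy samsung galaxy s10']
print([r["phrase"] for r in dropped])  # ['samsung galaxy s10 buy cheap with warranty']
```

Returning the dropped rows instead of discarding them makes it easy to double-check seasonal keywords before deleting them for good.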
4. Break the collected words into groups and complete the core cleanup
To group keywords, use Click.ru clustering. The tool groups words by comparing the TOPs of search results in a given region. SEO specialists usually use it to distribute keywords across site pages, but it also works well for contextual advertising.
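The idea behind TOP-based grouping is that queries whose search results share enough URLs probably express the same intent. The Python sketch below is a simplified greedy variant of that idea, not Click.ru's actual algorithm: each query joins the first cluster whose seed query shares at least `min_common` URLs with it, otherwise it starts a new cluster. The TOP lists are assumed to be collected already, and `min_common` is a hypothetical parameter that only loosely mirrors an accuracy setting.

```python
def cluster_by_serp(tops: dict[str, list[str]], min_common: int = 3) -> list[list[str]]:
    """Greedy grouping of queries by overlap of their TOP search results.

    `tops` maps each query to the URLs of its TOP results (assumed pre-collected).
    """
    clusters: list[tuple[set[str], list[str]]] = []  # (seed query's URLs, member queries)
    for query, urls in tops.items():
        url_set = set(urls)
        for seed_urls, members in clusters:
            if len(seed_urls & url_set) >= min_common:
                members.append(query)
                break
        else:  # no cluster matched: this query becomes a new seed
            clusters.append((url_set, [query]))
    return [members for _, members in clusters]


# Hypothetical TOP lists: single letters stand in for result URLs.
tops = {
    "buy samsung galaxy s10": ["a", "b", "c", "d"],
    "samsung s10 price": ["a", "b", "c", "x"],
    "iphone 11 case": ["p", "q", "r", "s"],
}
print(cluster_by_serp(tops))
# [['buy samsung galaxy s10', 'samsung s10 price'], ['iphone 11 case']]
```

Comparing each query only with a cluster's seed keeps clusters tight; raising `min_common` produces more, smaller groups.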
How to use the tool
Go to the tool page. For easier navigation in reports, enter the website address and name the project.
Upload queries as a file or a list. The list must contain at least 20 queries.
Choose a clustering method. Two options are available: comparison of TOPs and professional settings. In the settings, specify the search engine, the accuracy range and the number of words in the cluster (for professional settings). Click the “Start Clustering” button.
Read more about the configuration and capabilities of Click.ru clustering in the article “How to group key queries using clustering?”
Download the report in the task list.
In the report, queries are grouped into clusters based on search results. Review the report and delete clusters with irrelevant queries. It is much more convenient to do this now, cluster by cluster, than one word at a time during collection.
The result is grouped, garbage-free semantics ready for use.
Not enough words in the semantics? Expand the list
Sometimes the word list shrinks considerably after cleaning. In this case, use tools to expand the semantics:
Read more about expanding the core in a narrow niche in the article “How to expand the English-language semantics for search advertising in a narrow niche.”
Another way to replenish the semantics is to parse the keywords competitors use in their ads with a free parser.
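Deleting whole clusters can also be scripted once you know which topics are irrelevant, for example products the store does not sell. A minimal Python sketch, assuming the clusters are lists of query strings and that you maintain your own stop-word list; this stop-word filter is a swapped-in approach, not a feature of the Click.ru report. A cluster is dropped if any of its queries contains a stop-word, which matches the idea of removing words in bundles.

```python
def drop_clusters(clusters, stop_words):
    """Keep only clusters in which no query contains a stop-word (hypothetical filter)."""
    stop = {w.lower() for w in stop_words}
    kept = []
    for cluster in clusters:
        if any(stop & set(query.lower().split()) for query in cluster):
            continue  # one irrelevant query removes the whole cluster
        kept.append(cluster)
    return kept


clusters = [
    ["buy samsung galaxy s10", "samsung s10 price"],
    ["samsung s10 repair", "s10 screen repair"],
]
print(drop_clusters(clusters, ["repair"]))
# [['buy samsung galaxy s10', 'samsung s10 price']]
```

Because a single matching query removes the entire cluster, it is worth reviewing what was dropped before deleting it for good.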