How AI systems are transforming digital marketing – expert opinion and project discussions
Researchers at the ITMO University Machine Learning Lab work not only on theory but also on applied projects. Some of these projects have inspired members of the scientific and professional community around the world and are transforming business and the digital space. This work is carried out by the Media Research Group, led by Professor Aleksandr Farseev… Today he talks about his team’s research and projects.
User profiling on social networks
At the Media Research Group, part of ITMO University’s Machine Learning Lab, we work in several research areas. All of them involve applying artificial intelligence to the analysis of social media data and to the generation of synthetic multimedia content. All of our projects also find practical application in one way or another; take social network profiling, for example.
Profiling means analyzing user data in order to understand who users are, what they are interested in, and what personality type they have. It is used in social, marketing, political, and other research.
Our profiling algorithms drew wide attention back in 2017 because of a story about Donald Trump. Based on his Twitter data, the algorithms concluded that Trump was single, although he was obviously married. The story was discussed everywhere; even The Independent wrote about our work. Many found the conclusion about Trump’s marital status controversial, but in my opinion it still helped reveal the “true face” of the ex-president.
Note that the algorithm’s accuracy exceeded 80%, so the model itself performed well. Trump’s online behavior simply did not match the psychographic patterns typical of his demographic group. If you read Trump’s tweets without knowing who wrote them, you would hardly guess that their author is an elderly married man holding an important political post.
Chances are that, like our algorithm, you would assume the author is someone much younger.
Researchers’ assumptions about a situation or a market do not always reflect reality. For example, in a marketer’s picture of the world, children’s products are bought only by women aged 35 to 40. In fact, aunts, uncles, and fathers buy them too. And mothers can love basketball rather than only spend time with their children. Marketers, however, rarely think this way. Machine learning algorithms help formulate and test such hypotheses more accurately.
When profiling, we take into account age, location, followers, published videos and photos, post text, and other data, depending on the goals and the chosen research model. When building a machine learning model, the question arises of how to integrate all the diversity available in the data in a balanced way. This is why we develop algorithms for so-called “multimodal” machine learning. They can work not only with data from one social network or of one data type, but with many sources and data types. This approach lets us build a holistic picture of each user and produce accurate profiles.
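One common way to integrate several modalities is early fusion: standardize each modality’s features so that no source dominates simply because its numbers are larger, then concatenate them into one vector per user. The following is a minimal sketch under that assumption, not the group’s actual multimodal algorithm; the function names and toy feature values are hypothetical.

```python
from statistics import mean, pstdev

def zscore(column):
    """Standardize one feature column; a constant column becomes all zeros."""
    mu = mean(column)
    sigma = pstdev(column)
    return [0.0 if sigma == 0 else (x - mu) / sigma for x in column]

def standardize(rows):
    """Standardize each feature column of a list of per-user feature rows."""
    columns = [zscore(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]

def early_fusion(*modalities):
    """Concatenate standardized features from several modalities, user by user.

    Every modality must list the same users in the same order.
    """
    standardized = [standardize(rows) for rows in modalities]
    return [[value for part in parts for value in part] for parts in zip(*standardized)]

# Hypothetical toy data for two users
text_features = [[1.0, 10.0], [3.0, 30.0]]  # two text-derived features
image_features = [[5.0], [5.0]]             # one image-derived feature
print(early_fusion(text_features, image_features))
# [[-1.0, -1.0, 0.0], [1.0, 1.0, 0.0]]
```

Standardization only equalizes feature scales; it does not balance a modality with many features against one with few, which is one reason multimodal systems often weight sources or combine per-source models instead.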
In a number of our studies, we predicted the characteristics of social media users on the MBTI scale (the Myers–Briggs typology), but in one of them we decided to focus on predicting users’ marital status, since this characteristic largely determines people’s interests and behavior. For this study we used the NUS-MSS dataset, which we collected back in 2014. It contains multimodal data from three social networks (Twitter, Foursquare, and Instagram) and reliable marital status records for users from three regions: Singapore, New York, and London. To build a quantitative predictive model, we divided NUS-MSS users into married and unmarried groups and then used feature selection algorithms to identify the characteristics correlated with marital status. Building on these findings, we applied feature selection algorithms to the two resulting groups. The average accuracy of the model’s predictions for the three locations is presented in the table.
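The text does not say which feature selection algorithms were used. As an illustration of the general idea only, here is one simple filter method: rank each feature by the absolute Pearson correlation between its values and a binary marital status label (1 = married, 0 = not married), which is the point-biserial correlation, and keep the top k. The feature names and values are invented for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def select_features(features, labels, top_k):
    """Return the top_k feature names ranked by |correlation| with the labels.

    A strong negative correlation is as informative as a positive one,
    hence the absolute value.
    """
    scores = {name: abs(pearson(values, labels)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical per-user features for four users; labels: 1 = married, 0 = not married
features = {
    "home_checkins": [5, 4, 1, 0],
    "night_posts": [1, 2, 2, 1],
    "selfies": [0, 1, 3, 4],
}
labels = [1, 1, 0, 0]
print(select_features(features, labels, top_k=2))
# ['home_checkins', 'selfies']
```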
In our experience, combining data from two sources can in some cases increase prediction accuracy by 17%. The combined model takes into account not only the behavior of an individual user but also that of similar users. Similarity is determined by membership in the same clusters, which are identified from data from several social networks. You can read about spectral clustering, a key concept in this study, in our article… If you want to dig deeper, take a look at an implementation of this kind of clustering in Java.
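Spectral clustering groups points using eigenvectors of the Laplacian of a similarity graph. The following is a deliberately small, standard-library-only sketch of two-way spectral clustering: it splits points by the sign of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue), found by power iteration. It is not the clustering used in the study or the Java implementation mentioned in the text; the Gaussian affinity, the sigma parameter, and the toy one-dimensional points are illustrative assumptions.

```python
from math import exp, sqrt

def spectral_bisection(points, sigma=1.0, iterations=200):
    """Split points into two clusters by the sign of the Laplacian's Fiedler vector."""
    n = len(points)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Gaussian (RBF) affinity matrix W with a zero diagonal
    w = [[0.0 if i == j else exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in w]
    # c >= largest eigenvalue of L = D - W (Gershgorin bound: 2 * max degree), so the
    # Fiedler vector of L is the top eigenvector of cI - L once the constant vector
    # (eigenvalue 0 of L) is projected out
    c = 2 * max(degree)

    def apply_shifted(v):
        # (cI - L) v = c*v - D v + W v
        return [c * v[i] - degree[i] * v[i] + sum(w[i][j] * v[j] for j in range(n))
                for i in range(n)]

    def remove_constant(v):
        m = sum(v) / n
        return [x - m for x in v]

    def normalize(v):
        norm = sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    # Power iteration from a deterministic, non-constant starting vector
    v = normalize(remove_constant([float(i) for i in range(n)]))
    for _ in range(iterations):
        v = normalize(remove_constant(apply_shifted(v)))
    # Label the cluster that contains the first point as 0
    first_side = v[0] >= 0
    return [0 if (x >= 0) == first_side else 1 for x in v]

points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
print(spectral_bisection(points))  # [0, 0, 0, 1, 1, 1]
```

The deterministic start keeps the example reproducible but could, in principle, be orthogonal to the Fiedler vector; a random start avoids that. For more than two clusters, the usual approach runs k-means on several Laplacian eigenvectors computed with a numerical library.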
This is just the tip of the iceberg of what AI systems can do with social network data. Some cloud-based AI platforms (for example, Social Bakers or SoMin.ai, of which I am the founder) go far beyond personal profiling and use so-called psychographic analysis: identifying the hidden personality traits that drive our everyday decisions in practically every aspect of life.
Marketing professionals spend dozens of hours preparing several variations of the same content. The content has to reach the right audience, reflect the corporate identity, and, in the end, be attractive to consumers. It also needs to be adapted for different channels (a Habr article != a Facebook post), which takes even more time. This is where our second research area comes in: with machine learning support, marketers can focus on creativity and strategic decisions while automated systems take care of content generation.
Content generation is possible with generative adversarial networks (GANs)… Their architecture consists of two main parts: a generator and a discriminator. The generator creates synthetic content, and the discriminator determines whether the content in front of it is real or fake. At each iteration, the generator is updated based on the discriminator’s verdicts. If the discriminator can no longer distinguish a synthetic image from a real photo, this is a sign that the generator is creating realistic synthetic images.
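For reference, the original GAN formulation, introduced by Goodfellow et al. in 2014, writes this generator/discriminator interplay as a two-player minimax game. The text does not say which GAN variant or loss the group used, so treat this as the textbook objective rather than their setup:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D(x) is the discriminator’s estimate of the probability that x is real, and G(z) maps random noise z to a synthetic sample. In the idealized setting, the game reaches equilibrium when the generator’s distribution matches the data distribution and D(x) = 1/2 everywhere, which is the formal version of “the discriminator can no longer tell synthetic images from real ones”.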
GANs are a technology of the future for the digital marketing industry as well as for other professions and fields. We also use GANs in our commercial work: for example, we used one of the architecture variants to design the world’s first AI influencer, created for PUMA Asia Pacific. We named this character Maya… She takes selfies and lives her everyday virtual life. To create her, millions of faces were collated from various sources, including Instagram. This made it possible to visualize several versions of the face, which was the first step in creating a virtual blogger.
However, GANs alone are not enough. I cannot share all the technical details, since the project is commercial. But I would like to mention a tool that has proven quite useful both on this project and on others related to profiling. This is Hill Climbing, a technique for searching for an optimal solution by changing one element of the current solution at a time and keeping the changes that improve it. It is used as an optimization strategy for non-convex ensemble models. We often use Hill Climbing when we need to select the parameters of machine learning algorithms and cannot try every combination, for example because each training pass is expensive. Hill Climbing solves this problem in far fewer passes, which speeds up training.
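A minimal sketch of Hill Climbing for parameter selection, assuming each parameter has an ordered list of candidate values and that a higher validation score is better. The climber evaluates only the neighbors that differ from the current setting by one step in one parameter, moves to the best improving neighbor, and stops when none improves. The toy_score function is a made-up stand-in for an expensive train-and-validate pass, and the parameter names depth and lr are hypothetical.

```python
def hill_climb(grid, score, start):
    """Greedy Hill Climbing over a discrete parameter grid (higher score is better).

    grid:  dict mapping parameter name -> ordered list of candidate values
    score: function from a dict of parameter values to a number
    start: dict mapping parameter name -> starting index into grid[name]
    Returns (best parameter values, best score, number of score evaluations).
    """
    def to_values(indices):
        return {name: grid[name][i] for name, i in indices.items()}

    current = dict(start)
    best_score = score(to_values(current))
    evaluations = 1
    while True:
        best_neighbor = None
        # Neighbors differ from the current point by one step in one parameter
        for name in grid:
            for step in (-1, 1):
                i = current[name] + step
                if 0 <= i < len(grid[name]):
                    neighbor = {**current, name: i}
                    neighbor_score = score(to_values(neighbor))
                    evaluations += 1
                    if neighbor_score > best_score:
                        best_score, best_neighbor = neighbor_score, neighbor
        if best_neighbor is None:
            # No single-step change improves the score: a local optimum
            return to_values(current), best_score, evaluations
        current = best_neighbor

grid = {
    "depth": list(range(1, 21)),
    "lr": [0.001, 0.003, 0.01, 0.03, 0.1, 0.3],
}

def toy_score(params):
    # Hypothetical validation score that peaks at depth=12, lr=0.1
    return -abs(params["depth"] - 12) - 10 * abs(params["lr"] - 0.1)

best, best_value, evaluations = hill_climb(grid, toy_score, {"depth": 0, "lr": 0})
print(best, best_value, evaluations)  # {'depth': 12, 'lr': 0.1} 0.0 52
```

Here the climber reaches the optimum after 52 score evaluations instead of the 120 an exhaustive search of the grid would need. Because this toy score is separable and unimodal in each parameter, the local optimum is also the global one; that is not guaranteed in general.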
It is also important to know a small modification of the algorithm: Hill Climbing with Random Restart. The idea is to restart Hill Climbing many times from different random starting values of the parameters, which increases the chance of finding the global minimum rather than only a local one, even for non-convex optimization problems. It is a very useful heuristic that selects parameter values quickly and, with high probability, close to optimal. The implementation of the technique in code can be viewed here…
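A small, self-contained sketch of Hill Climbing with Random Restart on a hypothetical one-dimensional non-convex loss that has a local minimum at index 3 and the global minimum at index 14. A single descent started at index 0 gets stuck in the local minimum; restarting from random indices and keeping the lowest loss finds the global one with high probability. The loss values, restart count, and seed are illustrative choices, not taken from the original implementation.

```python
import random

def hill_climb(losses, start):
    """Move to a neighboring index while it lowers the loss; return a local minimum."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(losses)]
        best = min(neighbors, key=lambda j: losses[j])
        if losses[best] >= losses[i]:
            return i
        i = best

def hill_climb_random_restart(losses, restarts, seed=0):
    """Run Hill Climbing from several random starting indices and keep the best result."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        end = hill_climb(losses, rng.randrange(len(losses)))
        if best is None or losses[end] < losses[best]:
            best = end
    return best

# Hypothetical non-convex loss: local minimum at index 3, global minimum at index 14
LOSSES = [5, 4, 3, 2, 3, 4, 5, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6]

print(hill_climb(LOSSES, start=0))  # 3
print(hill_climb_random_restart(LOSSES, restarts=50))
```

Only starts at indices 0 to 7 (8 of the 21 positions) end in the local minimum, so the chance that all 50 restarts miss the global basin is (8/21)^50; in practice the second call returns 14.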
In particular, we used Hill Climbing in one of our first social media user profiling projects, described in the article Harvesting multiple sources for user profile learning: a big data study… There we perform data fusion by modeling the sources as a linear combination of the predictions of machine learning models, each trained on one source separately: the so-called Late Fusion Ensemble. Clearly, combining the sources with weights of 1 will not give the best results. After all, text data from Twitter, for example, can be more useful than the same kind of text data from Foursquare (a service intended for sharing location check-ins). This is where approaches like Hill Climbing are needed: they efficiently and quickly find suitable weights for each social network and data modality, without trying every combination, so that the combined model performs well.
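A minimal sketch of a late fusion ensemble whose per-source weights are tuned with Hill Climbing, starting from all weights equal to 1. It assumes each source’s model outputs a score in percent (0 to 100) per user, integer weights from 0 to 5, and plain accuracy as the objective; integer scores keep the threshold comparison exact. The source names match the NUS-MSS networks, but the scores and labels are invented.

```python
def fused_accuracy(weights, scores, labels):
    """Accuracy of the weighted late fusion ensemble.

    scores: dict source -> per-user scores in percent from that source's model.
    A user is predicted positive when the weighted mean score exceeds 50.
    """
    total = sum(weights.values())
    correct = 0
    for u, label in enumerate(labels):
        weighted = sum(weights[s] * scores[s][u] for s in scores)
        predicted = 1 if weighted > 50 * total else 0
        if predicted == label:
            correct += 1
    return correct / len(labels)

def tune_weights(scores, labels, max_weight=5):
    """Hill-climb integer source weights, starting from all weights equal to 1."""
    weights = {s: 1 for s in scores}
    best = fused_accuracy(weights, scores, labels)
    while True:
        best_neighbor = None
        for s in scores:
            for step in (-1, 1):
                w = weights[s] + step
                if 0 <= w <= max_weight:
                    candidate = dict(weights)
                    candidate[s] = w
                    acc = fused_accuracy(candidate, scores, labels)
                    if acc > best:
                        best, best_neighbor = acc, candidate
        if best_neighbor is None:
            return weights, best
        weights = best_neighbor

# Hypothetical per-source model scores (percent) for six users
SCORES = {
    "twitter":    [90, 80, 60, 40, 30, 20],
    "foursquare": [10, 20, 30, 90, 80, 70],
    "instagram":  [60, 40, 70, 30, 60, 40],
}
LABELS = [1, 1, 1, 0, 0, 0]

print(fused_accuracy({"twitter": 1, "foursquare": 1, "instagram": 1}, SCORES, LABELS))  # 0.5
print(tune_weights(SCORES, LABELS))  # ({'twitter': 2, 'foursquare': 1, 'instagram': 1}, 1.0)
```

On this toy data, equal weights give 50% accuracy because the misleading Foursquare scores cancel out the Twitter signal; doubling the Twitter weight fixes every prediction. A real setup would tune the weights on validation data and report results on a held-out test set.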
Profiling and generation
Synthetic content can be used in tandem with profiling: each person is shown the auto-generated ad that is most attractive given their interests. Say a fast food restaurant has released a banner ad for a new burger. Based on it, we can generate another hundred versions of the banner and find the ones the audience likes best. In this way, user profiling and content generation complement each other naturally. SoMin.ai combines these two research areas into a practical marketing tool. Using each user’s MBTI personality type, which is determined automatically by analyzing content from their social media profiles, SoMin.ai generates new content that reflects the preferences of other users with a similar personality type. This is what the structure of the SoMin.ai platform looks like:
The diagram shows that, on the server side, we collect content from brands through native integrations with their libraries and upload it to the platform every twelve hours. The other five steps run at intervals ranging from 24 hours to 30 days:
Collecting content from brands.
Collecting content from users and collecting feedback.
Training profiling and content generation models.
Generating content based on personality type.
Collecting feedback from platform users.
A more complete description of how the platform works can be found in the article that my lab colleagues and I published at WSDM 2020.
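To make the burger-banner example concrete, here is a sketch of the selection step only: given hypothetical impression logs of (personality type, banner variant, clicked) tuples, pick the variant with the highest click-through rate for each personality type. The event format, variant names, and the min_impressions threshold are assumptions for illustration, not SoMin.ai’s API.

```python
from collections import defaultdict

def best_variant_per_type(events, min_impressions=1):
    """Pick, for each personality type, the banner variant with the highest CTR.

    events: iterable of (personality_type, variant_id, clicked) tuples
    Variants shown fewer than min_impressions times to a type are ignored.
    """
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for ptype, variant, clicked in events:
        shown[(ptype, variant)] += 1
        clicks[(ptype, variant)] += int(clicked)

    best = {}  # personality type -> (variant, click-through rate)
    for (ptype, variant), impressions in shown.items():
        if impressions < min_impressions:
            continue
        ctr = clicks[(ptype, variant)] / impressions
        if ptype not in best or ctr > best[ptype][1]:
            best[ptype] = (variant, ctr)
    return {ptype: variant for ptype, (variant, _) in best.items()}

events = [
    ("INTJ", "banner_a", False), ("INTJ", "banner_a", False),
    ("INTJ", "banner_b", True), ("INTJ", "banner_b", False),
    ("ENFP", "banner_a", True), ("ENFP", "banner_a", True),
    ("ENFP", "banner_b", False), ("ENFP", "banner_b", False),
]
print(best_variant_per_type(events))  # {'INTJ': 'banner_b', 'ENFP': 'banner_a'}
```

With only a handful of impressions per variant, click-through rates are noisy; the min_impressions filter is a crude guard, and a production system would more likely use a statistical test or a bandit algorithm.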
Businesses understand the potential of these research areas, and the Media Research Group is successfully unlocking it. I think that is why SoMin.ai became an OpenAI partner and my team got access to GPT-3 to develop advertising algorithms for social networks. Probably for the same reason, SoMin.ai received a prestigious award from Gartner, the Cool Vendors Award 2020. But that’s not all. We recently presented a new project, SoPop.ai… This platform analyzes bloggers’ posts and determines how users react to them. Like SoMin.ai, it helps companies find blogs they can use for advertising. In addition, SoPop.ai is partnering with Arival Bank to take the next step in the platform’s development: creating a digital bank for influencers. Such an ecosystem will help bloggers and companies not only find advertising opportunities but also improve their content. The technologies behind the platform are described in this scientific article…
What’s next? Virtual friends, robots on the streets? Well… let’s see! One thing is clear: machine learning labs will not run out of exciting tasks.