In the wake of the AI hype, everyone and everything has something to say, which reminds me of the situation around Bitcoin, when seemingly respectable people suddenly started making exactly opposite claims ("cryptocurrencies are useful" vs. "ban cryptocurrencies", and now "AI is useful" vs. "ban AI"). I want to bring a drop of rationality into this stream of opinions and show how not to use, and how you can use, large language models in scientific work, taking ChatGPT version 4.0 as the example. This post is based on a scientific article that I published in collaboration with V.L. Makarov and A.R. Bakhtizin.
A Little Introduction
The GPT in the model's name stands for Generative Pre-trained Transformer: an autoregressive generative language model built on the transformer architecture.
Language models are needed to understand and generate natural-language content. They come in two kinds: generative and discriminative.
Generative models are statistical models that learn the distribution of the data itself and can create new data instances. Discriminative models are also statistical, but they solve the problem of classifying data.
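The difference can be sketched on synthetic one-dimensional data: the generative side fits a distribution per class and can sample brand-new points, while the discriminative side only draws a decision boundary. This is a toy illustration of the two views, not how language models are actually built:

```python
import random

# Toy 1-D data: class 0 centered at 0.0, class 1 centered at 3.0.
random.seed(42)
data = [(random.gauss(0.0, 1.0), 0) for _ in range(200)] + \
       [(random.gauss(3.0, 1.0), 1) for _ in range(200)]

# Generative view: fit a simple distribution per class, then SAMPLE new points.
def fit_gaussian(xs):
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var ** 0.5

mu0, sd0 = fit_gaussian([x for x, y in data if y == 0])
mu1, sd1 = fit_gaussian([x for x, y in data if y == 1])
new_instance = random.gauss(mu1, sd1)  # a brand-new data point "from class 1"

# Discriminative view: only decide which class a given point belongs to.
def classify(x):
    # midpoint between the two class means as the decision boundary
    return 1 if x > (mu0 + mu1) / 2 else 0

print(classify(2.9))  # → 1
```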
The development of deep learning has led to the widespread use of various neural networks for natural language processing (NLP) tasks, including convolutional neural networks (CNN), recurrent neural networks (RNN), graph neural networks (GNN), and attention mechanisms. A key advantage of these neural models is that they simplify model development. Traditional non-neural approaches to NLP depend on discrete, hand-crafted features, while neural methods typically use low-dimensional, dense vectors to implicitly represent the syntactic or semantic aspects of a language.
A transformer is a deep neural network architecture that, like an RNN, is designed for processing natural-language text (translation, summarization, and so on), but does not require processing the text in order. This opens up excellent opportunities for parallelizing its work.
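The order-independence that enables parallelization can be seen in a minimal sketch of scaled dot-product attention, the core operation of the transformer. This is a simplified pure-Python illustration without the learned projection matrices of a real model:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Every query attends to every key at once; each output row can be
    computed independently, unlike the sequential steps of an RNN."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        # output = weighted average of the value vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# Three 2-D token vectors used as queries, keys and values (self-attention).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(x, x, x))
```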
A large number of recent studies have shown that models pre-trained (pre-trained models, PTM) on large text corpora can learn universal language representations that are useful for downstream NLP tasks and avoid training a new model from scratch. With the growth of computing power, the advent of deep models, and the continuous improvement of training techniques, PTM architectures have moved from shallow to deep.
Generations of PTM models
First-generation PTMs sought to obtain vector representations of words. Since the models themselves are not needed for downstream language processing tasks (only the resulting vectors are), they are usually kept very shallow for computational efficiency. Examples of such models are Skip-Gram and GloVe. The resulting vectors, although they can convey the semantic meanings of words, are context-independent and cannot capture higher-level concepts.
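The context-independence can be illustrated with a toy lookup table in the Skip-Gram/GloVe style. The vectors below are invented for illustration, not taken from a real trained model:

```python
import math

# Hypothetical static word vectors: one vector per word, regardless of context.
vectors = {
    "bank":  [0.8, 0.3, 0.1],
    "money": [0.9, 0.2, 0.0],
    "river": [0.1, 0.9, 0.4],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# "bank" gets the same vector in "bank account" and "river bank",
# so a static model cannot separate the two senses of the word.
print(cosine(vectors["bank"], vectors["money"]))
print(cosine(vectors["bank"], vectors["river"]))
```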
Third-generation PTMs build on the second generation, with increased performance and some restrictions removed. There is no clear definition of this generation or a canonical list of models, but the following characteristics can be distinguished:
Improved understanding of context, capturing complex semantic relationships.
Multimodal Learning: Can integrate information from multiple sources or modalities such as text, images, and audio.
Scalability and efficiency: improved performance, for example through model compression techniques or more efficient architectures.
More complex pre-training tasks: better capture of the linguistic and structural properties of the input data.
The newest are fourth-generation PTMs. In addition to the main achievements of the third generation, they add the following:
More training data and more parameters allow them to capture a wider range of language patterns and nuances.
Improved comprehension and generation capabilities, resulting in more coherent and contextually accurate text creation.
Improved fine-tuning and transfer learning: performance can be increased on a specific task, such as translation, summarization, or question answering.
Increased scalability and efficiency.
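The fine-tuning idea mentioned above can be sketched in miniature: keep a "pre-trained" part frozen and train only a small task-specific head. Here the frozen part is just a random ReLU projection standing in for real pre-trained weights, so this illustrates only the mechanics, not a real PTM:

```python
import math
import random

random.seed(0)

# Stand-in for a pre-trained network: a frozen random ReLU projection.
# In real transfer learning these weights come from large-scale pre-training.
W_frozen = [[random.gauss(0, 1) for _ in range(2)] for _ in range(8)]

def features(x):
    # Frozen feature extractor: its weights are never updated below.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_frozen]

# Toy downstream task: classify whether x0 + x1 > 0.
data = []
for _ in range(400):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    data.append((x, 1.0 if x[0] + x[1] > 0 else 0.0))

# "Fine-tuning": train only a small logistic head on top of frozen features.
head = [0.0] * len(W_frozen)
bias = 0.0
lr = 0.1
for _ in range(200):
    for x, y in data:
        f = features(x)
        z = bias + sum(h * fi for h, fi in zip(head, f))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        p = 1.0 / (1.0 + math.exp(-z))
        g = p - y  # gradient of log-loss with respect to z
        head = [h - lr * g * fi for h, fi in zip(head, f)]
        bias -= lr * g

acc = sum((1.0 if bias + sum(h * fi for h, fi in zip(head, features(x))) > 0
           else 0.0) == y for x, y in data) / len(data)
print(f"training accuracy: {acc:.2f}")
```

Only the head's parameters move during training, which is why adapting a pre-trained model to a new task is much cheaper than training from scratch.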
At the moment, only GPT-4 from OpenAI belongs to the fourth generation of PTMs, and the most common third-generation models are BERT and LLaMA.
Let’s consider these models not just as a universal tool, but as a means and method for performing specific tasks of scientific work. Obviously, such a tool has both serious limitations and exciting new opportunities to help scientists and researchers in various fields.
Possibilities of using ChatGPT in scientific work
This topic is now hotly debated in scientific circles, but due to the inertia of the classical publication process in science, there are still few high-quality works in this area. In general, publications on medical topics with opposing conclusions predominate, along with discussion of the plagiarism problem. You can read a little more about this in the published scientific version of the article.
I use only the fourth generation of the model, because GPT-3.5 failed too many of my tests. The most telling failure, in my opinion, was the well-known logic test from the meme, in several variations. For example, if 2 cars drive from Moscow to St. Petersburg in 5 hours, then, by GPT-3.5's reasoning, 4 cars will take 2.5 hours.
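The correct reasoning is that travel time depends only on distance and speed, not on the number of cars driving in parallel; GPT-3.5 instead applied inverse proportionality. A sketch of both calculations (the distance and speed figures are illustrative):

```python
def travel_time(distance_km, speed_kmh, n_cars):
    # Travel time depends on distance and speed only; cars driving in
    # parallel do not share the work, so n_cars is irrelevant.
    return distance_km / speed_kmh

# GPT-3.5-style faulty reasoning: time is inversely proportional to cars.
def faulty_time(base_time_h, base_cars, n_cars):
    return base_time_h * base_cars / n_cars

print(travel_time(700, 140, n_cars=4))  # → 5.0 hours, same as for 2 cars
print(faulty_time(5, 2, 4))             # → 2.5 hours, the wrong answer
```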
I tried applying ChatGPT in various fields of science: medicine, chemistry, physics, biology. But I want to illustrate with an example from my own area of scientific interest (agent-based modeling). Imagine that I am writing a review article about large agent-based models of countries, and I have heard that there is an economic model of Switzerland called MIMOSE. I ask ChatGPT to tell me in which scientific articles this model is described, so that I can later reference them in my work:
It is important to mention that this question was asked in the context of a conversation that began with a few questions about large agent-based models of countries and regions, and ChatGPT had already provided a correct description of MIMOSE on its own. Here is the full text of the answer, which looks very convincing, logical, and coherent.
In the first sentence, ChatGPT presents data from what it claims is the original paper (Bretschger, Smulders 2012) describing the MIMOSE model. Such an article does exist and has the same title, but it was published in 2003, not 2012, and not in the Scandinavian Journal of Economics at all; and although it is an article on agent-based modeling, it does not touch on the Swiss model called MIMOSE in any way. Yet read how categorically ChatGPT claims otherwise!
The next article from the list (Bretschger, Valente 2012) also exists in reality and also has nothing to do with MIMOSE; the year of publication is correct, but again the model made a mistake with the journal, naming another real academic journal, Resource and Energy Economics, instead of the Journal of Environmental Economics and Management. The article attributed to Karydas, Katsikas et al. could not be found at all, and even its title does not correspond to the stated topic of the question. The final reference, to an article by Rutherford and Tarr, is also a fabrication of the system, although such authors do exist and publish articles on similar topics.
In an attempt to give ChatGPT a face-saving opportunity, it was asked to point to an article that describes the MIMOSE model itself:
The question was asked within the same conversation, with its context, and there were no other exchanges in between. The following response was received:
ChatGPT apologized for its mistake, although neither request mentioned that the information given was incorrect. And again it produced a new variation on the article by Bretschger et al., stating that it provides a comprehensive overview of the agent-based MIMOSE model, the features of its implementation, and the structure of the model, including the types of agents used. Once again, note the confidence with which absolutely false information is presented, even after a clarifying question.
The main value lies not in ChatGPT's ability to rewrite, paraphrase, and translate texts. The first two generally only harm society, and Google Translate has long done a good job with translation. The true value of ChatGPT is in analyzing a huge array of information, finding the part the researcher needs, and explaining it in whatever "language" is required.
ChatGPT should not and cannot be the author or co-author of any scientific work! This system is a tool in a scientist's work, just like search engines such as Google and Yandex, more advanced systems like Wolfram Alpha, or citation and indexing systems such as RSCI, Web of Science, and Scopus. The author must always check the material he creates and take responsibility for it, regardless of the tool used.
I deliberately gave negative examples to warn overly gullible people against using the results "as is". And not all reviewers scrupulously check reference lists for accuracy.
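Since fabricated citations look so plausible, every reference is worth verifying against a bibliographic database before submission. Below is a minimal sketch using the public Crossref REST API; the query string and the exact field handling are illustrative assumptions, and any similar service (Scopus, OpenAlex, etc.) would do:

```python
import json
import urllib.parse
import urllib.request

def crossref_query_url(title, rows=3):
    """Build a Crossref REST API query URL for a bibliographic title."""
    params = urllib.parse.urlencode({"query.bibliographic": title, "rows": rows})
    return f"https://api.crossref.org/works?{params}"

def lookup(title):
    # Network call: returns (title, year, journal) for the closest matches,
    # which can then be compared against what the model claimed.
    with urllib.request.urlopen(crossref_query_url(title), timeout=10) as resp:
        items = json.load(resp)["message"]["items"]
    return [((item.get("title") or [""])[0],
             item.get("issued", {}).get("date-parts", [[None]])[0][0],
             (item.get("container-title") or [""])[0])
            for item in items]

# Example (requires network access):
# for hit in lookup("agent-based model MIMOSE Switzerland"):
#     print(hit)
```

If a title, year, or journal returned by such a lookup disagrees with the model's citation, the citation should be treated as suspect.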
It makes no sense to give positive examples: there are countless of them, and each of you (if you have access, of course) can ask the right questions yourself. In preparing this article, ChatGPT was of course used, not only as an object of study but also as a research tool, though only to clarify certain issues and obtain additional information, not to write (or rewrite) the text. The question of whether it is possible to distinguish text written by ChatGPT from text written by a person remains open.