Symbiosis of numbers and art (part 2)

Fig. 1 The flagship of world cinema

Continuing the post about building an algorithm for recognizing paintings by artists, I want to share one thought. Artificial intelligence, as I always imagined it, was a kind of mind, a rational machine for resolving the questions and tasks a person sets. Whether it is a script fed initial data or a voice assistant, it is ready to decode and analyze incoming information and give an answer, even if that answer is fundamentally wrong; it is simply the answer that was statistically most correct over a certain period of time (a certain data array). That is, in most algorithms a systematic approach to data processing dominates (by analogy, by logic, by the most coincidences, and so on). How delighted I would be to see somewhere an "illogical", irrational AI assistant that produced a strange but, most importantly, correct option out of the multitude of possible ones; one that hit the bull's-eye, so to speak.

For example, suppose I want to watch a movie in the evening, but a movie I will definitely like. I ask an online assistant, and what does it offer? Something that many people have watched and liked before, or a highly rated movie of the genre I asked for that has nothing to do with what I actually enjoy. Of course, I know which films I have liked. I go to a search engine and type "a movie like …" or "movies similar to …", after which I see three or four portals with endless lists of films. I open them and discover that these films are, firstly, far from each other in essence, and secondly, that I might like them, but from a completely different angle, as if I had stumbled on them by accidentally clicking the TV remote and decided to stop there.

Most of the selection criteria in this case are by no means subtle things: the actors who starred in the film I designated as "exactly what I need", the genre, and even the era in which the movie was made (the 70s, the 80s, or completely modern cinema). Sometimes things go better: the selection suggests films with a well-rounded plot or an art-house atmosphere, or a location and era not of creation but of the events covered (Victorian England or the near future, for example). Such options usually "land" with me, but again not as deeply as the film I designated as the standard. And in life, the movie that touches the soul usually comes to me by chance, unplanned, giving free rein to accident and imagination. The neural networks of the brain optimally select content for viewing (or listening) based on a huge layer of already accumulated information, desires, and needs of the individual. They find what is needed by themselves in the boundless sea of text, video, and audio content and focus our attention on it at an unexpected day and hour. I am not a futurist, but I see a near future with a more advanced level of trainable artificial intelligence that would solve such problems with high probability: you would load in, as source data, all the films you can remember along with their dates of creation (at least 100), it would decompose them into their inherent criteria and assessments, pull information from online film databases, and choose the optimal options for today.

But I digress. Continuing the last post about creating a program for interpreting paintings by artists and retouched photos and assigning a rating based on 1000 criteria, I will continue its description. Let me remind you of the idea: a platform that digitizes incoming images, fully recognizes their content, and assigns a unique rating based on a growing array of data that has passed through this algorithm.

Fig.2 Block diagram with classes and criteria for selecting images.
import os
import torch
from effnetv2 import effnetv2_s  # model definition (import path depends on your project)

# установить устройство для вычислений / set computation device
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print(f"Computation device: {device}")

# создание модели для обучения / create model
m = effnetv2_s(num_classes=1)
for param in m.parameters():
    param.requires_grad = True

# загрузка последних весов сети / load the last weights of model
if os.path.exists(opt.best_weights):
    try:
        m.load_state_dict(torch.load(opt.best_weights, map_location=device))
        print("The pre-trained network has been loaded!")
    except Exception:
        print("The pre-trained network has not been loaded!")
m.to(device)
m.eval()

A fragment of the recognition algorithm script.

The essence of this model is that all calculations are carried out sequentially, and this sequence will need to be tuned to achieve the best possible recognition of incoming pictures. For all the abundance of standard image parameters (number of pixels, color saturation, contrast, sharpness), the most interesting criteria in my case are those of fine art. And they are not limited to the genre of the work (still life, landscape, portrait, sketch, abstraction…): what it was drawn or photographed with (the instrument), what materials were used (paints, charcoal, mosaic, Photoshop, or finally a camera), what technique was applied, plus a host of comparative extra criteria (in the manner of Rembrandt, of Van Gogh, of Michelangelo). A separate criterion can be added at the beginning of the cycle or anywhere else, by inserting it into the sequential chain of calculations. To begin with, I limited myself to only three: the presence of color, whether the image is a photograph, and whether living objects are present in it. They are arranged in order of computational complexity, from simple to complex, and are greatly simplified by the fact that each calculation has an alternative class (undefined). And each criterion, according to a worked-out hierarchy, will add a certain number of points to the final rating (on a 100-point or 10-point scale).
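The sequential chain with an "undefined" fallback and per-criterion points can be sketched roughly like this. This is a toy illustration only: the classifier functions, feature names, and point weights are my assumptions, not the actual model.

```python
# Toy sketch of the sequential criteria chain: each criterion classifies
# the image (or returns "undefined") and contributes points to the final
# 100-point rating. All features, labels, and weights are illustrative.

CRITERIA = [
    # (criterion name, classifier, points per label)
    ("color", lambda img: "color" if img["saturation"] > 0.1 else "bw",
     {"color": 10, "bw": 5, "undefined": 0}),
    ("photo", lambda img: "photo" if img["is_camera_capture"] else "non-photo",
     {"photo": 20, "non-photo": 10, "undefined": 0}),
    ("live", lambda img: "live" if img["has_living_objects"] else "non-live",
     {"live": 30, "non-live": 15, "undefined": 0}),
]

def rate(img):
    """Run the criteria in order (simple to complex), summing rating points."""
    total, labels = 0, {}
    for name, classify, points in CRITERIA:
        label = classify(img)
        labels[name] = label
        total += points.get(label, 0)
    return total, labels

sample = {"saturation": 0.4, "is_camera_capture": True, "has_living_objects": True}
print(rate(sample))  # (60, {'color': 'color', 'photo': 'photo', 'live': 'live'})
```

Extending the chain is then just inserting another tuple at any position in the list, which mirrors how the post describes adding new criteria into the sequence.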

With the finished script, everything was only beginning, and after several test runs I realized this could take years. Testing started with the first part of the module, which determines whether the source file is a photograph or something else. A long, consistent series of model tests began, meant to reveal the best sequence of recognitions and parameters for the given set of criteria. I ordered the criteria from simple to complex by assumption: in my case, color vs. non-color, photo vs. non-photo, and whether there is a living thing in the picture. Fortunately, I documented the results; forgive the unsystematic record-keeping.

First test> ran 44 files. One file was rejected for an unknown reason; there were 8 non-photos in the foto folder, most of them scanned canvases of real paintings. In the nfoto folder there were 2.5 errors (the 0.5 is an error, but a debatable one: a photographed cup of coffee, retouched in Photoshop). The script did not process a couple of pictures at all, and the reason remained a mystery, because after many retries these files still were not accepted.

Second test> ran 301 images: 44 erroneous results in the foto folder (most of them phantasmagoria with high-quality animation and scans of famous paintings by artists). Only 2 erroneous results in the nfoto folder: the same picture of a man (a photo from a magazine) and a low-resolution photo of a car. The result of this test: erroneous interpretations are biased toward the non-photo class.

Third test> 3940 photos, 3461 non-photos

COLOR gave 29 errors out of 1929 (1.5%)

B&W gave 0 errors

UNDEFINED returned 98 files out of 2415 (4.05%), all of them genuinely B&W images

LIVE gave 16 errors out of 1313 (1.2%) (shells, sea plants, a beach)

I decided to continue testing and play with one wonderful parameter called the confidence level, set to 0.85 by default. This parameter governs how "confident" the model must be to make a classification decision. The closer it is to 1 (the maximum value), the more confident the script must be before it sorts a file into a class folder; everything below the threshold lands in undefined. Let's see how this plays out in practice.

import argparse

if __name__ == '__main__':
    print("Start with parameters:")
    parser = argparse.ArgumentParser()
    # папка с исходными данными / input folder with data
    parser.add_argument('--input_folder', type=str, default="input")
    # папка с выходными данными / output folder with data
    parser.add_argument('--output_folder', type=str, default="output")
    # файл для весов лучшей модели / best weights file
    parser.add_argument('--best_weights', type=str, default="./best_weights.pth")
    # размер изображения / image size
    parser.add_argument('--image_size', type=int, default=528)
    # доверительная вероятность / confidence level
    parser.add_argument('--conf_level', type=float, default=0.85)
    # парсинг параметров / parsing of parameters
    opt = parser.parse_args()
    print(opt)

A fragment of the recognition script eval.py, with the conf_level parameter highlighted.
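The post does not show how conf_level is applied inside eval.py, but its behavior in the tests below suggests simple thresholding. Here is a sketch under that assumption, for a single binary criterion with one sigmoid output; the function name and labels are mine.

```python
# Sketch of confidence thresholding for one binary criterion (photo vs.
# non-photo). Assumes the model emits one sigmoid probability in [0, 1];
# probabilities between the two thresholds go to "undefined".

def decide(prob, conf_level=0.85):
    if prob >= conf_level:
        return "photo"        # confidently positive
    if prob <= 1 - conf_level:
        return "non-photo"    # confidently negative
    return "undefined"        # not confident enough: send to the "bin"

print(decide(0.95))                  # photo
print(decide(0.05))                  # non-photo
print(decide(0.60))                  # undefined
print(decide(0.99, conf_level=1.0))  # undefined: at 1.0 almost nothing passes
```

Under this reading, raising conf_level shrinks the two confident zones and widens the undefined zone, which is exactly the trend the following tests show.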

Fourth test> With the conf_level parameter increased to 0.9 (instead of the default 0.85):

LIVE (errors) > 12 out of 1293 (0.92%)

NON-LIVE (errors) > 36 out of 470 (7.6%)

UNDEFINED > 397 out of 2175 (18.25%)

Fifth test> With conf_level set to 0.95:

LIVE (errors) > 5 out of 1253 (0.39%)

NON-LIVE (errors) > 25 out of 398 (6.28%)

UNDEFINED > 509 out of 2175 (23.4%)

Sixth test> With conf_level set to 1:

LIVE > 0 files

NON-LIVE > 0 files

UNDEFINED > 2160 out of 2175 files, all files read

The trend is clear: the higher this parameter is raised from the default, the more files are sent to the "bin". But along with this, the number of misclassifications drops significantly, reaching 0 errors ("there is nothing left to get wrong, everything is rejected"). I have left conf_level at the default for now, but in the future, when there are many criteria and correspondingly more errors, I will probably raise it to 0.9.
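Tabulating the fourth and fifth tests makes the trade-off explicit. The numbers are taken directly from the results above; the aggregation of LIVE and NON-LIVE errors into a single error rate is my own framing.

```python
# Error rate among classified files vs. share rejected as undefined,
# using the reported results of the fourth (0.9) and fifth (0.95) tests.
tests = [
    # (conf_level, live errors, live total, non-live errors, non-live total,
    #  undefined, undefined total)
    (0.90, 12, 1293, 36, 470, 397, 2175),
    (0.95, 5, 1253, 25, 398, 509, 2175),
]
for conf, le, lt, ne, nt, u, ut in tests:
    error_rate = 100 * (le + ne) / (lt + nt)
    rejected = 100 * u / ut
    print(f"conf_level={conf}: error rate {error_rate:.2f}%, rejected {rejected:.2f}%")
# conf_level=0.9: error rate 2.72%, rejected 18.25%
# conf_level=0.95: error rate 1.82%, rejected 23.40%
```

Raising the threshold by 0.05 roughly halves the error rate at the cost of rejecting an extra ~5% of files, which supports the plan to settle on 0.9 once more criteria are in play.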

P.S. I continue testing this module and invite everyone interested to write in the comments your wishes for the recognition features and for the platform as a whole, along with feedback and suggestions. I hope there will be like-minded people who will correct my mistakes and contribute their ideas. To be continued…
