What can the new Claude 3.5 Sonnet model do?

We have not yet moved away from GPT-4o, when Anthropic bursts onto the scene with a new model, Claude 3.5 Sonnet, which, according to the developers, is superior to GPT-4o. Anthropic is once again pushing the boundaries of what AI can do.

In this article I would like to evaluate the capabilities of the new model and, of course, check whether they lied to us about the functionality of Claude 3.5 Sonnet.

Claude 3.5 Sonnet. What's new?

Meet Claude 3.5 Sonnet, the newest member of Anthropic's AI family. This model is designed to understand and generate text even better than its predecessors.

What's new? Claude 3.5 Sonnet is twice as fast as Claude Opus, has better reasoning, and has advanced visual perception abilities. It is more accurate, faster and more reliable than previous versions.

Anthropic aims for Sonnet to directly compete with OpenAI's GPT-4 and hopes users will appreciate its new capabilities. The model is already superior to Anthropic's own Claude 3 Opus in many respects, including speed, cost and test results.

The model sets new standards in areas such as graduate level reasoning (GPQA), undergraduate level knowledge (MMLU) and programming skills (HumanEval):

Do you know what else is remarkable about Claude 3.5 Sonnet? He learned to understand nuance, humor, and even cope with complex instructions.

Now you don’t have to worry that the text will sound dry and lifeless: Sonnet writes quite naturally and captivatingly.

The situation with the code is no worse: internal tests showed that Claude 3.5 Sonnet solved 64% of problems, leaving far behind its predecessor Claude 3 Opus (which managed only 38%).

Sonnet writes, edits, and executes code like it's child's play to him. Translating code, updating old programs, migrating databases – he can do it all.

Claude 3.5 Sonnet is also a visualization master. Anthropic has outdone themselves: this model works even better with images than the Claude 3 Opus.

Imagine: Sonnet doesn't just “see” a picture, it analyzes charts and graphs, understands what they show, and can even recognize text in fuzzy photos.

Claude 3.5 Sonnet can generate interactive charts and even create entire presentations based on JSON data.

And the most important feature is Artifacts (analogous to Advanced Data Analysis in ChatGPT). Imagine asking Claude to generate code, write text, or even design a website. Instead of simply producing results, Sonnet creates an Artifact, an interactive object that you can work with directly in the chat.

Do you want to fix the code, edit the text or change the design? Please! Artifacts turns communication with Claude into an exciting creative process where you and artificial intelligence work side by side.

Speaking of cost, using the model will cost $3 per million input tokens and $15 per million output tokens. The context window is 200 thousand tokens.

It's important to note that Claude 3.5 Sonnet is just the first step. In the near future, Anthropic plans to release Claude 3.5 Haiku and Claude 3.5 Opus, which will be even more impressive.

Full list of Claude models:

By the way, external experts have confirmed that Claude 3.5 Sonnet meets all security standards.

Let's evaluate it ourselves

Poetry

Words are, of course, good, but testing them in practice is even better.

First, let's see how well the model writes poetry. I will ask several models to write poems based on Brodsky. For such tests I will use BotHub, due to some difficulties with the official website, and also due to the fact that the model is more “subordinate” via the API. Let's start with the newcomer:

In this poem you really feel the atmosphere of loneliness and melancholy, so characteristic of Brodsky. Laconic and precise images also resemble his style.

However, the poem lacks the depth and multi-layeredness inherent in Brodsky's poetry. The theme of loneliness is presented too straightforwardly, without its characteristic irony and philosophical thoughtfulness.

But overall, the poem sounds good!

Let's compare with its predecessor:

The theme of loneliness, the search for the meaning of life, turning to books – all this is very consonant with Brodsky’s poetry. The atmosphere of thoughtfulness and reflection is also conveyed very accurately.

Now let's look at GPT-4o:

Here we see an attempt to create philosophical imagery, the use of extended metaphors, and a contemplative mood. But the verse is overloaded with images that do not always work for the general idea. There is too much pathos and straightforwardness in the expression of feelings, which you practically cannot find in Brodsky.

And, for example, let's evaluate Gemini 1.5 Pro:

The image of the city, especially St. Petersburg, is often found in Brodsky. There is the same gloomy, autumn atmosphere here. What is missing is Brodsky's characteristic intellectuality and complexity of language. The rhyme is simple, and the image of the lyrical hero is stereotyped.

In general, you need to understand, of course, that imitation of style is not only the use of certain images or themes, but also the ability to think and feel the same way as a poet, which AI cannot yet do. However, it is Claude who does the best writing and I like Sonnet much more due to its brevity.

Screen page code

I will submit a screenshot of the page as input and ask each of the models to write code for it.

Claude 3.5 Sonnet:

Сlaude 3 Opus:

ChatGPT-4o:

Gemini 1.5 Pro:

So, the results are before your eyes. Overall, Sonnet performs well, especially in comparison with its predecessor. From a visual point of view, Gemini and Sonnet did the best job, in my opinion.

Artifacts

I really wanted to show my work with Artifacts, but my account was constantly blocked, which is why I couldn’t prepare the material (don’t throw it at me, I honestly tried to do everything in a good way, but now I’m tilted), but still show examples of ordinary users I have the opportunity.

For example, here user asked the model to create a map of a fantasy world fallen to old magic, with cultural and military considerations:

Or here the same user asked for an interactive application demonstrating the central limit theorem:

Or, for example, here a user asked to visualize Deep Learning:


Thus, I went over the new model with you, which left a positive impression. Of course, the game of thrones in the field of generative models is only gaining momentum: companies are trying to jump over their heads in order to outplay their competitors, and we are left to watch.

Overall, it's very sad that I couldn't give you my results with Artifacts, since it's almost the main feature in the update, but I'll be happy to see your results in the comments.

Thank you for your attention!

Exactly! The model itself is available Here.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *