ARKit, RealityKit, Reality Composer Pro

My name is Ilya Proskuryakov, I am an iOS developer at Effective, and in this article I will talk about developing games for Apple Vision Pro.

My colleagues and I developed two mini-games as part of the Ludum Dare hackathon in Omsk, and then I worked a little with Apple Vision Pro myself. Now I want to share my experience with examples and code, talk about the pros and cons of Apple Vision Pro from a developer's point of view, and in general, what difficulties I encountered and how I solved them.

In March, Effective CTO Alexey Korovyansky went to the US, bought this new gadget, and brought it back. I had almost no experience with augmented reality. First, I tried on the headset as a user, and then I became interested in it as an iOS developer and wanted to build something for it myself. It so happened that on April 13-14 the two-day Ludum Dare game development hackathon was taking place in Omsk, and two colleagues and I decided to sign up for it.

Mini-reference:

Apple Vision Pro is an augmented and virtual reality headset based on the M2 chip. It was announced in 2023 and released on February 2, 2024. It currently costs $3,500 on the US market, and twice as much in Russia.


What we did at the hackathon

We learned about the hackathon a week and a half beforehand, started going through ideas, and realized that we wanted to make something like osu!, but for the eyes. osu! is a game in which the user has to click a simple moving target with the mouse in time. Since Apple Vision Pro can track eye movement, our game would use eye control instead of mouse control. The hackathon also had a theme that all projects had to fit: summoning. We learned that one of the participants was making a game in which you summon a cat to feed it, got inspired, and decided to continue the cat theme in two mini-games for Apple Vision Pro.

To start the first game, the user puts on the headset and sees a vacuum cleaner in his hand. Pallas's cats appear in the space around the user, he sucks them up with the vacuum cleaner, and to make the event more meme-y, we added a Google voice that counts the cats. When the count reaches 10, a huge Pallas's cat appears and expresses dissatisfaction – initially it was supposed to come out of the portal, but we didn't have enough time and in the end it just spawned next to the user.

In the second game, the user needs to appease a large manul (the same Pallas's cat). Burgers and tomatoes fly around in the space, and the user catches the burgers with a special gesture. Tomatoes must not be caught, otherwise the manul will get angry and eat the user.

All this was supposed to be two stages of one game, but we did not have time to combine them into one scenario and made two separate mini-games that can be launched from the menu.

We wrote everything in Swift, using the SwiftUI, ARKit, and RealityKit frameworks.

  • ARKit helps track everything happening around you: the user's hand movements, surrounding planes, and the world as a whole.

  • RealityKit allows you to render 3D objects and interact with their physics, geometry, etc.

Essentially, ARKit acts as a kind of supporting layer for RealityKit on visionOS.

Let's start with the menu

The design is nothing special, but this is roughly what any 2D screen on visionOS looks like. By default, the screen can be resized or moved: for example, if you place it in a room, leave, and come back, the screen will stay where it was.

This is what the menu looks like in code:
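A minimal sketch of such a menu (the title and button labels here are illustrative, not our exact code):

```swift
import SwiftUI

struct MenuView: View {
    var body: some View {
        VStack(spacing: 24) {
            Text("Manul Games")            // title is illustrative
                .font(.largeTitle)

            Button("Vacuum the manuls") {
                // start the first mini-game
            }

            Button("Feed the big manul") {
                // start the second mini-game
            }
        }
        .padding(40)
    }
}
```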

In fact, it's just plain SwiftUI. You write it for visionOS the same way you would for iOS: the same VStacks, modifiers, paddings, spacers, and so on. For fun, I even ran this code on iOS, and nothing broke. So you can write the same code for different platforms and functionally get the same result.

Game menu on iPhone and Apple Vision Pro

Creating a space for augmented reality

First of all, you need to declare an ImmersiveSpace – that is, an augmented space, and give it an ID, as shown in the left screenshot below.

As far as I can tell, only one ImmersiveSpace can be open per application at a time in visionOS. There are environment values, openImmersiveSpace and dismissImmersiveSpace, that open and close the space, respectively. You pass openImmersiveSpace the ID of the space you want to open and call it with await.
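A sketch of how this might look, assuming an App with one window group and one immersive space (the space ID and view names are illustrative):

```swift
import SwiftUI

@main
struct ManulGamesApp: App {
    var body: some Scene {
        WindowGroup {
            MenuView()
        }

        // The ID only has to match what is passed to openImmersiveSpace.
        ImmersiveSpace(id: "VacuumGame") {
            ImmersiveView()
        }
    }
}

struct StartGameButton: View {
    @Environment(\.openImmersiveSpace) private var openImmersiveSpace
    @Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace

    var body: some View {
        Button("Start") {
            Task {
                // Only one immersive space can be open at a time.
                await openImmersiveSpace(id: "VacuumGame")
            }
        }
    }
}
```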

Filling the space with content

In the ImmersiveSpace closure we see ImmersiveView, a regular SwiftUI view that contains a RealityView. The RealityView closure receives content, an inout parameter of type RealityViewContent into which you put your 3D models, as well as attachments, 2D views that are attached to 3D objects. For example, an attachment can hold the counter of sucked-in manuls on the vacuum cleaner handle. The update closure is also important: it is called when the view's state changes and lets you modify the space over time.
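A rough sketch of such a RealityView, assuming a "Vacuum" asset and a "counter" attachment (both names are illustrative):

```swift
import SwiftUI
import RealityKit

struct ImmersiveView: View {
    var body: some View {
        RealityView { content, attachments in
            // Put 3D models into the scene.
            if let vacuum = try? await Entity(named: "Vacuum") {
                content.add(vacuum)

                // Attach the 2D counter view to the vacuum cleaner handle.
                if let counter = attachments.entity(for: "counter") {
                    counter.position = [0, 0.1, 0]
                    vacuum.addChild(counter)
                }
            }
        } update: { content, attachments in
            // Runs when the view's state changes; useful for updating the space over time.
        } attachments: {
            Attachment(id: "counter") {
                Text("Manuls: 0")
                    .padding()
                    .glassBackgroundEffect()
            }
        }
    }
}
```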

RealityViewContent is a structure: to work with it you would have to pass it around every time, which is inconvenient, so below I will show how to make life simpler.

RealityKit has the Entity class, which serves a similar purpose of filling the space with content. The scheme looks like this: create an empty root entity, rootEntity, add it to the content with content.add(rootEntity), and then attach non-empty entities containing your 3D models to rootEntity.
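A sketch of that pattern; the "Manul" asset name is illustrative:

```swift
import RealityKit

// Shared empty root entity: add it to the RealityViewContent once,
// then attach everything else to it instead of passing the content struct around.
final class SceneRoot {
    static let rootEntity = Entity()

    static func addManul() async {
        if let manul = try? await Entity(named: "Manul") {
            rootEntity.addChild(manul)
        }
    }
}

// In RealityView's make closure:
// content.add(SceneRoot.rootEntity)
```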

Apple has a tool called Reality Composer Pro designed to make it easier to prepare 3D content for VisionOS apps. It's similar to Unity's scene editor.

First, you need the objects you want to display: 3D models in USD format. Here we ran into one of the main problems of developing games for visionOS: finding 3D models. Apple has a set of free assets, and although the choice is small, you can pick something up there. You can buy a 3D model, but this can get expensive: prices range from $2 to $2,000 depending on the quality and complexity of the model. You can find a 3D designer or make the model yourself. The most affordable option is to find free assets on websites, in groups, or in Telegram chats.

Once you have found the 3D models you need, such as the Earth and the Moon, you need to import them into the scene. Simply drag them either to the panel on the left or directly into space. Then you need to position them on the scene. The position of an object is determined by coordinates on three axes – x, y, z. To understand where each coordinate is pointing, make the following gesture: thumb – x-axis, index – y-axis, and middle – z-axis.

I didn't know about this gesture, so I picked positions at random, launched the project, and checked whether everything was OK. Apparently, because the portal, which was originally horizontal, had been rotated, the coordinate system ended up turned 90 degrees relative to me. After a bunch of attempts, I found object positions I was happy with.

Reality Composer Pro also lets you scale, rotate, and reorient 3D models. You can add components to an object: lighting, shadows, collisions, physics, sounds. For example, if you want a bird to fly around and chirp, you can add a sound to it, and the sound will come directly from the bird.

We create a world inside the portal

To make a portal, you need to create:

  • The world that will be displayed inside the portal,

  • The portal entity itself, a black circle,

  • An anchor is an entity to which all objects are attached. The anchor can be the user's hands, the walls of the room, the floor, tables, and other things.

The developer puts the world into the portal, puts the portal itself into the anchor, and then puts all three entities into the content.

To make a world, you create an entity and give it the appropriate component, WorldComponent. It separates everything inside the portal from the outside world, so it is what makes the content live specifically inside the portal.

Now we need to fill this whole thing with content. Here you can see the load function, in which we load the Solar System asset that we set up earlier in Reality Composer Pro.
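A sketch of such a load function, assuming the Reality Composer Pro Swift package keeps Xcode's default name RealityKitContent and the scene is called SolarSystem (both names are illustrative):

```swift
import RealityKit
import RealityKitContent   // the Reality Composer Pro package; name is the Xcode template default

// Builds the world that will be shown inside the portal.
func makeWorld() async -> Entity {
    let world = Entity()
    // WorldComponent separates everything inside the portal from the surrounding scene.
    world.components.set(WorldComponent())

    if let solarSystem = try? await Entity(named: "SolarSystem", in: realityKitContentBundle) {
        world.addChild(solarSystem)
    }
    return world
}
```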

We create a portal

The screenshot on the left is a portal made by Apple. There is even a “Code” section under the video, but I tried it and the code didn't work. In the end, we did it ourselves, and our result is on the right.

To create a portal, you make an entity and set up its PortalComponent, which points at the world, and its ModelComponent, the entity's appearance, that is, the black circle. That is enough for it to work.
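A minimal sketch of such a portal entity, roughly following the documented PortalComponent pattern (the plane size is illustrative):

```swift
import RealityKit

// Builds the portal surface and links it to the world entity created above.
func makePortal(world: Entity) -> Entity {
    let portal = Entity()

    // The visible part of the portal: a rounded plane with a portal material.
    portal.components.set(ModelComponent(
        mesh: .generatePlane(width: 1, depth: 1, cornerRadius: 0.5),
        materials: [PortalMaterial()]
    ))

    // PortalComponent makes the plane show the contents of the world entity.
    portal.components.set(PortalComponent(target: world))
    return portal
}
```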

Adding special effects

We wanted to make not a simple, but a beautiful portal, so on line 65 we loaded the Particles asset – these are particles with which you can create various effects.

We return to Reality Composer Pro and add Particles. There are various ready-made assets here, such as fireworks and fog. You can adjust how often the particles pulse, their number, shape, and color.

I spent an hour fiddling with this tool and got this:

In my opinion, it's not bad. At the very least it resembles a portal, and that's already something. Next, you add these particles to the portal, and it will glow.

Attaching a vacuum cleaner to your hand

To attach the vacuum cleaner to the hand, you first need to load the assets. The vacuum cleaner has to interact with the spinning manuls. To do this, we set up collisions for it via a CollisionComponent with a mask and a group. The mask determines which groups the vacuum cleaner interacts with, and the group determines which group the object itself belongs to. Both are bit masks: at the bottom you can see the CollisionGroup, into which we pass the result of a bit operation to get the mask.
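A sketch of how such a filter might look in code (the group names, bit values, and nozzle shape are illustrative):

```swift
import RealityKit

// Our collision groups; the bit values are illustrative.
extension CollisionGroup {
    static let vacuum = CollisionGroup(rawValue: 1 << 0)
    static let manul  = CollisionGroup(rawValue: 1 << 1)
}

// The vacuum cleaner belongs to the "vacuum" group and only collides with manuls.
func configureVacuumCollision(_ vacuum: Entity) {
    let nozzleShape = ShapeResource.generateBox(size: [0.1, 0.1, 0.3])   // rough nozzle shape
    vacuum.components.set(CollisionComponent(
        shapes: [nozzleShape],
        filter: CollisionFilter(group: .vacuum, mask: .manul)
    ))
}
```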

Then you need to attach the vacuum cleaner to the hand. To do this, you first need to learn how to track the user's hands. Let's bring ARKit back and create an ARKitSession.

The session has a run method that takes an array of data providers as a parameter. In this case, we need a HandTrackingProvider, which tracks the hands. Then we can observe updates from the handTracking provider: we take the right hand from there and read its anchor's originFromAnchorTransform, the position of the hand relative to the world. We assign this position to the vacuum cleaner handle. We also need to set where the handle points via the look(at:) method: the position field sets where the object is, and look(at:) sets the direction it faces.
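A sketch of that flow, assuming the vacuum handle entity is already loaded (the forward offset passed to look(at:) is illustrative):

```swift
import ARKit
import RealityKit

let session = ARKitSession()
let handTracking = HandTrackingProvider()

// Follows the right hand and keeps the vacuum handle attached to it.
func runHandTracking(vacuum: Entity) async {
    do {
        try await session.run([handTracking])
    } catch {
        print("Hand tracking failed to start: \(error)")
        return
    }

    for await update in handTracking.anchorUpdates {
        let anchor = update.anchor
        guard anchor.chirality == .right, anchor.isTracked else { continue }

        // originFromAnchorTransform: the hand's transform relative to the world.
        let handTransform = Transform(matrix: anchor.originFromAnchorTransform)
        vacuum.position = handTransform.translation

        // Point the handle forward from the hand; the offset is illustrative.
        vacuum.look(at: handTransform.translation + [0, 0, -1],
                    from: handTransform.translation,
                    relativeTo: nil)
    }
}
```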

We spin manuls

We created our own custom component, a structure conforming to the Component protocol. We set the fields we needed in it and registered the component by calling the registerComponent() function once somewhere.

Now we need a system that can work with this component. To do this, we created a type conforming to the System protocol, which lets us implement its update method. update is called on every frame, so how often it runs depends on the refresh rate of visionOS: on the headset it is roughly 90 Hz, so about 90 times per second. In it we find the entities matching a certain query, in this case the ones that have a RotateComponent, change their orientation, and the objects rotate.
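A sketch of such a component and system (the component name and rotation speed are illustrative):

```swift
import RealityKit
import simd

// Custom component: per-entity rotation speed.
struct RotateComponent: Component {
    var radiansPerSecond: Float = .pi / 2
}

// System that spins every entity carrying a RotateComponent.
struct RotationSystem: System {
    private static let query = EntityQuery(where: .has(RotateComponent.self))

    init(scene: RealityKit.Scene) {}

    func update(context: SceneUpdateContext) {
        for entity in context.entities(matching: Self.query, updatingSystemWhen: .rendering) {
            guard let rotate = entity.components[RotateComponent.self] else { continue }
            let angle = rotate.radiansPerSecond * Float(context.deltaTime)
            // Rotate the entity a little further around its own Y axis each frame.
            entity.setOrientation(simd_quatf(angle: angle, axis: [0, 1, 0]), relativeTo: entity)
        }
    }
}

// Register once at startup, e.g. in the App initializer:
// RotateComponent.registerComponent()
// RotationSystem.registerSystem()
```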

Burgers and tomatoes

In terms of code, the second mini-game is simpler than the Pallas's cat one. To fill the space around the user with burgers and tomatoes, we just called the Add Burger or Add Tomato function as many times as we wanted burgers or tomatoes.

The Add Burger function is simple – load an asset with a burger model and add components:

  • InputTargetComponent lets the user interact with the object.

  • HoverEffectComponent makes the object the user is looking at stand out from the others, like a button highlighted when the mouse cursor hovers over it.

The object also needs to be given a position. It can be generated randomly: at the bottom right you can see a function that returns a SIMD3 structure, which holds the three coordinates, filled with random values.
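A sketch of what such a function might look like (the asset name, collision radius, and coordinate ranges are illustrative):

```swift
import RealityKit

// Loads one burger, makes it interactive, and drops it at a random position.
func addBurger(to root: Entity) async {
    guard let burger = try? await Entity(named: "Burger") else { return }

    burger.components.set(InputTargetComponent())        // the user can interact with it
    burger.components.set(HoverEffectComponent())        // highlighted while the user looks at it
    burger.components.set(CollisionComponent(shapes: [.generateSphere(radius: 0.1)]))  // needed for hit-testing

    burger.position = randomPosition()
    root.addChild(burger)
}

// A random point in a box around the user.
func randomPosition() -> SIMD3<Float> {
    SIMD3<Float>(
        Float.random(in: -2 ... 2),      // x: left / right
        Float.random(in: 0.5 ... 2),     // y: height
        Float.random(in: -2 ... -0.5)    // z: in front of the user
    )
}
```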

Next, we need to handle the standard Apple Vision Pro gesture, where the index finger and thumb touch. This is easy to do with SpatialTapGesture(); the .targetedToAnyEntity() modifier lets the gesture target any of the entities we put into the ImmersiveView.
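A sketch of hooking that gesture up to a RealityView (the view name and the tap handler are illustrative):

```swift
import SwiftUI
import RealityKit

struct BurgerGameView: View {
    var body: some View {
        RealityView { content in
            // ... scene setup ...
        }
        .gesture(
            SpatialTapGesture()
                .targetedToAnyEntity()
                .onEnded { value in
                    // value.entity is the entity the user pinched while looking at it.
                    catchBurger(value.entity)
                }
        )
    }

    private func catchBurger(_ entity: Entity) {
        // Game logic goes here; for the sketch we simply remove the caught object.
        entity.removeFromParent()
    }
}
```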

Turn on the music

We generated ten assets with a Google voice. When the vacuum cleaner sucks in a manul, the voice keeps count. In essence, you fill an array of audio file resources via the load() function, and then, when the vacuum cleaner collides with a manul, call playAudio() with the right asset. Background music can be set up in the standard way via AVAudioPlayer.
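A sketch of that audio setup, assuming clips named count_1.mp3 through count_10.mp3 and a background.mp3 in the bundle (all names are illustrative):

```swift
import RealityKit
import AVFoundation

final class GameAudio {
    private var countClips: [AudioFileResource] = []
    private var backgroundPlayer: AVAudioPlayer?

    // Fill the array of audio file resources.
    func loadClips() {
        countClips = (1...10).compactMap { index in
            try? AudioFileResource.load(named: "count_\(index).mp3")
        }
    }

    // Called when the vacuum cleaner collides with a manul.
    func playCount(_ count: Int, from entity: Entity) {
        guard (1...countClips.count).contains(count) else { return }
        entity.playAudio(countClips[count - 1])
    }

    // Background music the standard way, via AVAudioPlayer.
    func playBackgroundMusic() {
        guard let url = Bundle.main.url(forResource: "background", withExtension: "mp3") else { return }
        backgroundPlayer = try? AVAudioPlayer(contentsOf: url)
        backgroundPlayer?.numberOfLoops = -1   // loop forever
        backgroundPlayer?.play()
    }
}
```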

My colleagues and I managed to do all this at the hackathon in one productive night, and now I’ll tell you what I did alone.

After the hackathon

I had some time to work with the Apple Vision Pro in a relaxed environment. Since we didn't interact with physics in the hackathon minigames, I decided to do something like tennis.

Setting up physics

First, I needed to understand how physics works for objects. To do this, I loaded the corresponding asset and added a PhysicsMotionComponent to it, which is responsible for the movement of objects. Then I added collisions and a physics body, which is responsible for the center of mass, the restitution (bounciness) coefficient, the friction coefficient, and so on. This can be done both in Reality Composer Pro and in code.
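A sketch of such a setup in code (the radius, mass, friction, and restitution values are illustrative):

```swift
import RealityKit

func configureBallPhysics(_ ball: Entity) {
    let shape = ShapeResource.generateSphere(radius: 0.05)

    // Collisions so the ball can hit other objects.
    ball.components.set(CollisionComponent(shapes: [shape]))

    // A dynamic physics body: mass, friction, and restitution (bounciness).
    let material = PhysicsMaterialResource.generate(friction: 0.6, restitution: 0.8)
    ball.components.set(PhysicsBodyComponent(shapes: [shape],
                                             mass: 0.05,
                                             material: material,
                                             mode: .dynamic))

    // Lets you read and set the ball's linear and angular velocity directly.
    ball.components.set(PhysicsMotionComponent())
}
```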

I ended up with a virtual object that couldn't interact with the real world. If I just loaded it, it would fall and fly endlessly through walls and ceilings. I had to learn to track the reality around me.

Let's get back to ARKit and its session. As a data provider, this time I needed SceneReconstructionProvider, which tracks all the surfaces around you. I passed it to the run method so that anchors for walls, floors, and other surfaces, with the parameters I needed, would arrive in sceneReconstruction updates. From these you can recreate the shape of the anchor, for example, a wall.

Then I created an entity and gave it a CollisionComponent built from the shape generated above. I also set a physics body and the location (transform), taken from the incoming anchor. We cannot interact with real objects directly, but we can track their shape and location and create virtual copies that our virtual objects can interact with.
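A sketch of that pipeline, close to Apple's scene reconstruction pattern (the meshRoot entity is assumed to be added to the RealityView content elsewhere):

```swift
import ARKit
import RealityKit

let arSession = ARKitSession()
let sceneReconstruction = SceneReconstructionProvider()
let meshRoot = Entity()   // add this entity to the RealityView content once

// Mirrors real surfaces (walls, floor, tables) as static collision entities.
func runSceneReconstruction() async {
    guard SceneReconstructionProvider.isSupported else { return }
    do {
        try await arSession.run([sceneReconstruction])
    } catch {
        print("Scene reconstruction failed to start: \(error)")
        return
    }

    var meshEntities: [UUID: Entity] = [:]

    for await update in sceneReconstruction.anchorUpdates {
        let anchor = update.anchor
        // Rebuild a collision shape from the reconstructed mesh.
        guard let shape = try? await ShapeResource.generateStaticMesh(from: anchor) else { continue }

        switch update.event {
        case .added:
            let entity = Entity()
            entity.components.set(CollisionComponent(shapes: [shape], isStatic: true))
            entity.components.set(PhysicsBodyComponent(mode: .static))
            entity.transform = Transform(matrix: anchor.originFromAnchorTransform)
            meshEntities[anchor.id] = entity
            meshRoot.addChild(entity)
        case .updated:
            guard let entity = meshEntities[anchor.id] else { continue }
            entity.components.set(CollisionComponent(shapes: [shape], isStatic: true))
            entity.transform = Transform(matrix: anchor.originFromAnchorTransform)
        case .removed:
            meshEntities[anchor.id]?.removeFromParent()
            meshEntities[anchor.id] = nil
        }
    }
}
```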

By the way, to be able to track all this, you need to get the user's permission, which means adding a couple of keys to Info.plist with a description of why the permission is requested.

We are trying to catch the ball

The idea was to grab a ball and a racket and hit one with the other. At first, I thought about attaching the racket to the hand via an anchor, as we did with the vacuum cleaner, and moving the ball with a SpatialTapGesture. But that was too simple; I wanted it to work like in reality.

I learned that you can track not only the hand but also each joint of the hand. How? Create a dictionary keyed by joint, whose values are 3D entities with a physics component and a collision shape. This way I attached 3D entities to the fingers and let them interact with everything around them. In the same method where I tracked the hands, you can also track the joints; you then have two transforms: the hand relative to the world and the joint relative to the hand. Multiplying these matrices gives the joint's coordinates relative to the world, which we then write into the dictionary mentioned above.
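A sketch of that dictionary-of-joints approach (the set of tracked joints and the sphere radius are illustrative):

```swift
import ARKit
import RealityKit

// A small collision entity per tracked joint.
let trackedJoints: [HandSkeleton.JointName] = [.thumbTip, .indexFingerTip, .ringFingerTip]
var fingerEntities: [HandSkeleton.JointName: Entity] = [:]

func makeFingerEntities(root: Entity) {
    for joint in trackedJoints {
        let entity = Entity()
        entity.components.set(CollisionComponent(shapes: [.generateSphere(radius: 0.01)]))
        entity.components.set(PhysicsBodyComponent(mode: .kinematic))   // driven by tracking, not by gravity
        fingerEntities[joint] = entity
        root.addChild(entity)
    }
}

// Called for every hand anchor update.
func updateFingers(with anchor: HandAnchor) {
    guard let skeleton = anchor.handSkeleton else { return }
    for jointName in trackedJoints {
        let joint = skeleton.joint(jointName)
        guard joint.isTracked else { continue }
        // (hand relative to world) * (joint relative to hand) = joint relative to world.
        let worldTransform = anchor.originFromAnchorTransform * joint.anchorFromJointTransform
        fingerEntities[jointName]?.setTransformMatrix(worldTransform, relativeTo: nil)
    }
}
```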

I thought that would be enough, but in the end the ball turned into slippery soap, slipped out of my fingers and did not want to stick to my hand. Later I was told that such collisions are not designed for such interactions.

Therefore, you need to look for a different approach, for example, recognize the gesture and then attach the ball to the hand.

Recognizing the grab gesture

Apple has a great example – the game Happy Beam. In it, sad clouds fly towards the user, and he sends them rays of goodness with heart-shaped gestures so that they turn into happy white clouds.

I looked at how they track the gesture and wrote my own method. Here's how it works: I track the hand joints and their positions in the same way, then take the thumb, index finger, and ring finger and check that the distance between them is less than six centimeters, a threshold I picked by eye. That roughly corresponds to a grab gesture. You can come up with your own interpretation of the gesture, but mine generally works.
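A sketch of such a check; the 6 cm threshold is the value picked by eye above, and the helper itself is illustrative:

```swift
import ARKit
import simd

// A rough "grab" check: thumb, index, and ring fingertips close together.
func isGrabGesture(_ anchor: HandAnchor) -> Bool {
    guard let skeleton = anchor.handSkeleton else { return false }

    func worldPosition(of name: HandSkeleton.JointName) -> SIMD3<Float>? {
        let joint = skeleton.joint(name)
        guard joint.isTracked else { return nil }
        let world = anchor.originFromAnchorTransform * joint.anchorFromJointTransform
        return SIMD3<Float>(world.columns.3.x, world.columns.3.y, world.columns.3.z)
    }

    guard let thumb = worldPosition(of: .thumbTip),
          let index = worldPosition(of: .indexFingerTip),
          let ring = worldPosition(of: .ringFingerTip) else { return false }

    let threshold: Float = 0.06
    return distance(thumb, index) < threshold
        && distance(thumb, ring) < threshold
        && distance(index, ring) < threshold
}
```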

Besides detecting the gesture, I needed it to work only at the moment of collision with the right object. That is, the user touches the ball, makes the gesture, and only then is the ball attached to the hand. To do this, I called the subscribe() method on the content passed into the RealityView closure and subscribed to all collisions that occur in the application.

In the subscription handler I called a function, passing in whether the gesture was currently active, and inside it I checked which objects had collided, because out of all the collisions of all the objects I only needed specific ones. In the end it worked like this: some collision occurs, I check whether the gesture is active at that moment and whether the right objects, the hand and the ball, are involved. If everything matches, the ball is attached to the base of the hand.
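A sketch of that subscription, assuming ball and handEntity entities and an isGrabbing closure backed by the gesture check above (all names are illustrative):

```swift
import RealityKit

// `ball`, `handEntity`, and `isGrabbing` are illustrative: the ball entity,
// the entity representing the base of the hand, and the gesture check above.
func subscribeToGrabCollisions(content: inout RealityViewContent,
                               ball: Entity,
                               handEntity: Entity,
                               isGrabbing: @escaping () -> Bool) -> EventSubscription {
    // Subscribe to every "collision began" event in the scene.
    return content.subscribe(to: CollisionEvents.Began.self) { event in
        let a = event.entityA
        let b = event.entityB
        let involvesBall = (a === ball || b === ball)
        let involvesHand = (a === handEntity || b === handEntity)

        // React only when the grab gesture is active and the right pair collided.
        guard involvesBall, involvesHand, isGrabbing() else { return }

        // Attach the ball to the hand so it follows it from now on.
        ball.setParent(handEntity, preservingWorldTransform: true)
    }
}

// Call this from RealityView's make closure and keep the returned EventSubscription alive.
```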

As a result, the slippery soap turned into a slime that did not come off my hand. Unfortunately, that's all I managed to do. However, I do not plan to stop and will continue to try!

Apple Vision Pro's Disadvantages from a Developer's Perspective

It's not easy to find 3D models of objects for augmented reality

What are the options:

  • Find free models in Reality Composer Pro, on TurboSquid, or in Telegram channels – there are not many of them, but you can find something.

  • Buy 3D models, but depending on their quality, the cost can vary from two dollars to two thousand dollars.

  • Try making 3D models yourself or find a 3D designer.

Difficulties in testing software under development

At the hackathon, we had one Apple Vision Pro headset for three people, and the manuls needed testing. The simulator is limited: yes, you can see how objects will look, spin them, rotate them, and so on. The problem is that the simulator cannot track the real world, and if an application running in it tries to do so, it will crash. So if a product is being developed by a team of several people, only the one physically next to the Apple Vision Pro can test it fully.

Unpopular technology

There is still very little information and few examples on the Internet about development for Apple Vision Pro. There is, of course, documentation from Apple, but due to inexperience in 3D development, you can read about a function or parameter and still not understand what to do with it.

Parts of ARKit and RealityKit for visionOS are still in beta. I took code from Apple's documentation, used it, and found that it didn't work. I went to the forums and found out that Apple had reworked the feature but hadn't yet updated the documentation.

On the other hand, you can look at it as a plus, spend more time, figure it out yourself and become a pioneer.

Benefits of Apple Vision Pro from a Developer's Perspective

  • It's interesting because developing for augmented reality is a blast.

  • Declarative approach. Many functions that are difficult from an engineering point of view, such as loading a 3D object and placing it in augmented reality, can be done in a few lines of code.

  • Practical experience in 3D development – you understand how to work with physics, how to place objects in 3D space, etc.

In a month of experimenting with the Apple Vision Pro, I've only scratched the surface. What else is worth trying:

  • Shader Graph in Reality Composer Pro. It allows you to build graphs and create beautiful effects and objects using nodes.

  • Fully immersive virtual reality applications using Metal, a framework that lets you work with the GPU and draw beautiful things.

  • Development with Unity for visionOS.

I will continue to experiment with Apple Vision Pro and post the results on our Telegram channel @effectiveband. By the way, it already contains a lot of material on iOS, Android, Flutter, and web development, as well as solution architecture.

I would appreciate your comments!
