Macros in tentacli. Part one

Since the publication of the first two articles, my project has changed its name and concept. It is now called TentaCLI, a play on the words "tentacle" and "cli" that fully reflects the new essence of the project. Although tentacli can still be downloaded from GitHub and used as a standalone client application, it and its parts are also available as crates. This embeddability, together with the ability to add your own modules to tentacli, makes it suitable for building your own applications. In particular, I have two of them: a mini wow server for testing tine and the hidden project binary army, in which tentacli fully reveals its potential as a tentacle-performer, and for which I am writing the controlling heart.

The heart of tentacli itself is reading and processing TCP packets, and to make working with them easier I use macros.


Motivation

Long ago, in the distant first version, parsing and creating packets was a very tedious task:

// reading a packet with the SMSG_MESSAGECHAT opcode

let mut reader = Cursor::new(input.data.as_ref().unwrap()[4..].to_vec());
let message_type = reader.read_u8()?;
let language = reader.read_u32::<LittleEndian>()?;

let sender_guid = reader.read_u64::<LittleEndian>()?;
// skip 
reader.read_u32::<LittleEndian>()?;

// conditional field number one
let mut channel_name = Vec::new();
if message_type == MessageType::CHANNEL {
    reader.read_until(0, &mut channel_name)?;
}

let channel_name = match channel_name.is_empty() {
    true => String::new(),
    false => {
        String::from_utf8(
            channel_name[..(channel_name.len() - 1) as usize].to_owned()
        ).unwrap()
    },
};

let target_guid = reader.read_u64::<LittleEndian>()?;
let size = reader.read_u32::<LittleEndian>()?;

// conditional field number two
let mut message = vec![0u8; (size - 1) as usize];
reader.read_exact(&mut message)?;

let message = String::from_utf8_lossy(&message);

You might think this code was written as a punishment, and in fact it was: scaling such code was quite a challenge. As a result, handlers were implemented only for the most basic packets. But this situation was destined to change.

The beginning of the great transition

During my research I found the crates serde and bincode. However, the concept I had in mind could not be implemented with these crates: I needed conditional deserialization. The code example above illustrates the problem well, since it contains two cases of conditional deserialization at once: a field (channel_name) that can be read only if a certain condition is met, and a field (message) whose reading depends on a previously read field (size). I was looking for the most concise way to describe such fields.

The result of my experiments and research, as well as significant help from the official Rust community, was this macro – which replaced the code above:

#[derive(WorldPacket, Serialize)]
struct Incoming {
    message_type: u8,
    language: u32,
    sender_guid: u64,
    skip: u32,
    #[conditional]
    channel_name: String,
    target_guid: u64,
    message_length: u32,
    #[depends_on(message_length)]
    message: String,
}

impl Incoming {
    fn channel_name(instance: &mut Self) -> bool {
        instance.message_type == MessageType::CHANNEL
    }
}

How the macro works

Now let's take a look at how it works. The foundation for reading and writing data is the BinaryConverter trait:

pub trait BinaryConverter {
    fn write_into(&mut self, buffer: &mut Vec<u8>) -> AnyResult<()>;

    fn read_from<R: BufRead>(
      reader: &mut R, 
      dependencies: &mut Vec<u8>
    ) -> AnyResult<Self> where Self: Sized;
}

I implement this trait for each type that I want to use in serializer fields:

impl BinaryConverter for u8 {
    fn write_into(&mut self, buffer: &mut Vec<u8>) -> AnyResult<()> {
        buffer.write_u8(*self).map_err(|e| FieldError::CannotWrite(e, "u8".to_string()).into())
    }

    fn read_from<R: BufRead>(reader: &mut R, _: &mut Vec<u8>) -> AnyResult<Self> {
        reader.read_u8().map_err(|e| FieldError::CannotRead(e, "u8".to_string()).into())
    }
}
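
To try the idea without the byteorder and anyhow crates that the listings above appear to rely on, here is a minimal std-only sketch of the same trait. It is an illustration under assumptions, not tentacli's code: AnyResult is replaced with std::io::Result, the u32 impl is added for the demo, and the round_trip helper is hypothetical.

```rust
use std::io::{self, BufRead, Cursor};

// Assumption: std::io::Result stands in for the article's AnyResult.
type AnyResult<T> = io::Result<T>;

trait BinaryConverter {
    fn write_into(&mut self, buffer: &mut Vec<u8>) -> AnyResult<()>;

    fn read_from<R: BufRead>(reader: &mut R, dependencies: &mut Vec<u8>) -> AnyResult<Self>
    where
        Self: Sized;
}

impl BinaryConverter for u8 {
    fn write_into(&mut self, buffer: &mut Vec<u8>) -> AnyResult<()> {
        buffer.push(*self);
        Ok(())
    }

    fn read_from<R: BufRead>(reader: &mut R, _: &mut Vec<u8>) -> AnyResult<Self> {
        let mut bytes = [0u8; 1];
        reader.read_exact(&mut bytes)?;
        Ok(bytes[0])
    }
}

impl BinaryConverter for u32 {
    fn write_into(&mut self, buffer: &mut Vec<u8>) -> AnyResult<()> {
        // Little-endian, matching the LittleEndian reads in the first listing.
        buffer.extend_from_slice(&self.to_le_bytes());
        Ok(())
    }

    fn read_from<R: BufRead>(reader: &mut R, _: &mut Vec<u8>) -> AnyResult<Self> {
        let mut bytes = [0u8; 4];
        reader.read_exact(&mut bytes)?;
        Ok(u32::from_le_bytes(bytes))
    }
}

// Hypothetical helper: serialize two values, then read them back in order.
fn round_trip(mut a: u8, mut b: u32) -> AnyResult<(u8, u32)> {
    let mut buffer = Vec::new();
    a.write_into(&mut buffer)?;
    b.write_into(&mut buffer)?;

    let mut reader = Cursor::new(buffer);
    let a = u8::read_from(&mut reader, &mut vec![])?;
    let b = u32::read_from(&mut reader, &mut vec![])?;
    Ok((a, b))
}
```

Because every type knows how to read itself from the shared reader, a struct can be read simply by reading its fields one after another, which is what the derive macro automates.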

In some cases a little more code is required, for example, for strings:

impl BinaryConverter for String {
    fn write_into(&mut self, buffer: &mut Vec<u8>) -> AnyResult<()> {
        buffer.write_all(self.as_bytes())
            .map_err(|e| FieldError::CannotWrite(e, "String".to_string()))?;

        Ok(())
    }

    fn read_from<R: BufRead>(
        reader: &mut R,
        dependencies: &mut Vec<u8>
    ) -> AnyResult<Self> {
        let mut cursor = Cursor::new(dependencies.to_vec());

        let size = match dependencies.len() {
            1 => ReadBytesExt::read_u8(&mut cursor)
                .map_err(|e| FieldError::CannotRead(e, "String u8 size".to_string()))? as usize,
            2 => ReadBytesExt::read_u16::<LittleEndian>(&mut cursor)
                .map_err(|e| FieldError::CannotRead(e, "String u16 size".to_string()))? as usize,
            4 => ReadBytesExt::read_u32::<LittleEndian>(&mut cursor)
                .map_err(|e| FieldError::CannotRead(e, "String u32 size".to_string()))? as usize,
            _ => 0,
        };

        let buffer = if size > 0 {
            let mut buffer = vec![0u8; size];
            reader.read_exact(&mut buffer)
                .map_err(|e| FieldError::CannotRead(e, "String".to_string()))?;
            buffer
        } else {
            let mut buffer = vec![];
            reader.read_until(0, &mut buffer)
                .map_err(|e| FieldError::CannotRead(e, "String".to_string()))?;
            buffer
        };

        let string = String::from_utf8(buffer)
            .map_err(|e| FieldError::InvalidString(e, "String".to_string()))?;

        Ok(string.trim_end_matches(char::from(0)).to_string())
    }
}
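
The key decision in the String impl is how the dependencies bytes select between a length-prefixed read and a NUL-terminated read. The std-only sketch below isolates that decision so it can be run on its own; the free functions size_from_dependencies and read_string are hypothetical names, and std::io::Error replaces the article's FieldError.

```rust
use std::io::{self, BufRead};

// Decode the length serialized into `dependencies` (1, 2 or 4 little-endian
// bytes). Any other width means "no length known", reported as 0.
fn size_from_dependencies(dependencies: &[u8]) -> usize {
    match dependencies {
        &[a] => a as usize,
        &[a, b] => u16::from_le_bytes([a, b]) as usize,
        &[a, b, c, d] => u32::from_le_bytes([a, b, c, d]) as usize,
        _ => 0,
    }
}

// Hypothetical free-function version of the String reader.
fn read_string<R: BufRead>(reader: &mut R, dependencies: &[u8]) -> io::Result<String> {
    let size = size_from_dependencies(dependencies);

    let bytes = if size > 0 {
        // Length known from an earlier field: read exactly `size` bytes.
        let mut bytes = vec![0u8; size];
        reader.read_exact(&mut bytes)?;
        bytes
    } else {
        // No length available: read up to and including the NUL terminator.
        let mut bytes = Vec::new();
        reader.read_until(0, &mut bytes)?;
        bytes
    };

    let string = String::from_utf8(bytes)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

    // Drop the terminator (and any other trailing NUL bytes).
    Ok(string.trim_end_matches('\0').to_string())
}
```

With empty dependencies the reader stops at the first zero byte; with a four-byte dependency holding 5 it consumes exactly five bytes, whatever they contain.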

Custom types implement the trait in the same way. This allows a type such as Player to be used directly as a field type when declaring a serializer:

// packet with the SMSG_CHAR_ENUM opcode
#[derive(WorldPacket, Serialize, Debug)]
struct Incoming {
    characters_count: u8,
    #[depends_on(characters_count)]
    characters: Vec<Player>,
}

Now, when receiving a packet from the server, we can use the serializer from the example above to read the list of characters into the characters variable:

let (Incoming { characters, .. }, json) = Incoming::from_binary(&input.data)?;

The from_binary method returns a tuple of two elements: an instance of the current struct and a JSON representation of its fields.

Let's look at where this method comes from and what the BinaryConverter trait has to do with it.

The inside of the serializer

There are two macros: one for the Login server, the second for the World server. Since they are very similar, we will look at only one of them.

#[proc_macro_derive(WorldPacket, attributes(depends_on, conditional))]
pub fn world_packet(input: TokenStream) -> TokenStream {
  let ItemStruct { ident, fields, .. } = parse_macro_input!(input);

  // build the list of fields
  // build the list of dependencies for the fields
  // build the list of values
  // build what the macro will return

  TokenStream::from(output) 
}

Almost any proc macro will look something like this.

I would like to start from the end, namely with an explanation of what output is. In short, it is a variable that contains code wrapped in the quote! macro. That is, for the struct to which I apply my macro to receive a certain method, let's call it from_binary, you need to add the following lines to this variable:

#[proc_macro_derive(WorldPacket, attributes(depends_on, conditional))]
pub fn world_packet(input: TokenStream) -> TokenStream {
  let ItemStruct { ident, fields, .. } = parse_macro_input!(input);

  // build the list of fields
  // build the list of dependencies for the fields
  // build the list of values
  
  let output = quote! {
    impl #ident {
      pub fn from_binary(buffer: &[u8]) -> #result<(Self, String)> {
        println!("It works !");
        // and here we need to return the result
      }
    }
  };

  TokenStream::from(output) 
}

In the code above, ident is the identifier of the struct to which the macro is applied. The pound sign (#) is used to interpolate expressions, so in the context of the current series of examples ident means Incoming, as if I had written:

impl Incoming {
  pub fn from_binary(buffer: &[u8]) -> AnyResult<(Self, String)> {
    println!("It works !");
    // and here we need to return the result
  }
}

In addition to variables, you can also interpolate paths to imported items: for example, the result variable is nothing more than quote!(anyhow::Result).

Now let's add the construction of the list of fields and the list of values. Since the job of the from_binary method is to build a struct from a byte packet (and JSON as well), the method needs to contain something like this:

let binary_converter = quote!(tentacli_traits::BinaryConverter);
let cursor = quote!(std::io::Cursor);

let output = quote! {
  impl #ident {
    pub fn from_binary(buffer: &[u8]) -> #result<(Self, String)> {
      println!("It works !");

      let mut reader = #cursor::new(buffer);
      let json = String::new();

      let instance = Self {
        characters_count: #binary_converter::read_from(&mut reader, &mut vec![])?,
        characters: #binary_converter::read_from(&mut reader, &mut vec![])?,
      };

      Ok((instance, json))
    }
  }
};

This code produces a single-use macro: it only works for a struct with exactly these two fields.

Now we need to make it process any set of fields:

// I already showed this line in the examples above; it is repeated here
// to make clear where fields comes from
let ItemStruct { ident, fields, .. } = parse_macro_input!(input);

// declared before the closure below, which interpolates #binary_converter
let binary_converter = quote!(tentacli_traits::BinaryConverter);
let cursor = quote!(std::io::Cursor);

let field_names = fields.iter().map(|f| {
  // in this case ident is the identifier of the field!
  f.ident.clone()
}).collect::<Vec<Option<Ident>>>();

let initializers = fields.iter()
  .map(|f| {
    let field_name = f.ident.clone();
    let field_type = f.ty.clone();

    quote! {
      {
        let value: #field_type = #binary_converter::read_from(&mut reader, &mut vec![])?;
        value
      }
    }
});

let output = quote! {
  impl #ident {
    pub fn from_binary(buffer: &[u8]) -> #result<(Self, String)> {
      println!("It works !");

      let mut reader = #cursor::new(buffer);
      let json = String::new();

      // and now the magic of repetition expansion
      let mut instance = Self {
        #(#field_names: #initializers),*
      };

      Ok((instance, json))
    }
  }
};

Here is the expansion (repetition) code:

let mut instance = Self {
  #(#field_names: #initializers),*
};

The compiler will transform it into something like this:

let mut instance = Self {
  field1: {
    let value: i32 = binary_converter::read_from(&mut reader, &mut vec![])?;
    value
  },
  field2: {
    let value: String = binary_converter::read_from(&mut reader, &mut vec![])?;
    value
  },
  // ...
};

In other words, the expansion pairs each element of field_names with the element of initializers at the same position, then each pair is substituted into Self, with the pairs separated by commas.
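
The repetition syntax of quote! is modeled on the repetition syntax of macro_rules!, so the pairing can be tried out without writing a proc macro at all. The sketch below is only an analogy, not the code tentacli generates: the build_point macro, the Point struct and the make_point function are made up for the illustration.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
    label: String,
}

// Two independent lists go in: field names and initializer expressions.
// The single repetition `$( $name: $init ),*` walks both lists in lockstep,
// emitting one `name: init` pair per position, separated by commas.
macro_rules! build_point {
    ([$($name:ident),*], [$($init:expr),*]) => {
        Point {
            $( $name: $init ),*
        }
    };
}

fn make_point() -> Point {
    // Expands to: Point { x: 1 + 1, y: 40, label: String::from("origin") }
    build_point!([x, y, label], [1 + 1, 40, String::from("origin")])
}
```

As with quote!, the two lists must have the same length; otherwise the compiler rejects the expansion.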

depends_on and conditional attributes

To build the lists of fields that carry these attributes, you can use a regular vector or a map such as HashMap or BTreeMap:

// this struct is declared outside the macro
struct DependsOnAttribute {
  pub name: Ident,
}

impl Parse for DependsOnAttribute {
  fn parse(input: ParseStream) -> syn::Result<Self> {
    let name: Ident = input.parse()?;

    Ok(Self { name })
  }
}

// the rest of the code is inside the macro
let mut depends_on: BTreeMap<Option<Ident>, Vec<Ident>> = BTreeMap::new();
let mut conditional: Vec<Option<Ident>> = vec![];

for field in fields.iter() {
  let ident = field.ident.clone();

  if field.attrs.iter().any(|attr| attr.path().is_ident("depends_on")) {
    let mut dependencies: Vec<Ident> = vec![];

    field.attrs.iter().for_each(|attr| {
      if attr.path().is_ident("depends_on") {
        let parsed_attrs = attr.parse_args_with(
          Punctuated::<DependsOnAttribute, Token![,]>::parse_terminated
        ).unwrap();

        for a in parsed_attrs {
          dependencies.push(a.name);
        }
      }
    });

    depends_on.insert(ident.clone(), dependencies);
  }

  if field.attrs.iter().any(|attr| attr.path().is_ident("conditional")) {
    conditional.push(ident);
  }
}

Using the depends_on and conditional variables, we simply build lists of identifiers that will be used later (see the end of the article).

But before we move on to the final phase, I want to consider one more thing.

There was a time when parse_terminated, and attribute parsing in general, raised a lot of questions and confusion for me, so let's look at it in more detail with examples.

How to parse attributes in general

The parse_terminated method belongs to Punctuated, which takes two generic parameters: what we are looking for and what the items are separated by (the separator).

First, let's make a macro whose attribute will accept a list of numbers, which can then be output to the console:

#[derive(Simple)]
#[numbers(1, 2, 3, 4)]
struct MyStruct;

fn main() {
    MyStruct::output()
}

// and the macro code:
#[proc_macro_derive(Simple, attributes(numbers))]
pub fn simple(input: TokenStream) -> TokenStream {
  let DeriveInput { ident, attrs, .. } = parse_macro_input!(input);

  let mut numbers = vec![];

  for attr in attrs {
    if attr.path().is_ident("numbers") {
      let number_list = attr.parse_args_with(
        Punctuated::<LitInt, Token![,]>::parse_terminated
      ).unwrap();

      for number in number_list {
        numbers.push(number.base10_parse::<i32>().unwrap());
      }
    }
  }

  // since interpolating a vector comes with some restrictions,
  // we can convert it to a string first for output
  let numbers_str = format!("{:?}", numbers);

  let output = quote! {
    impl #ident {
      pub fn output() {
        println!("{:?}", #numbers_str);
        // or the vector can be printed like this:
        println!("{:?}", [ #( #numbers ),* ]);
      }
    }
  };

  TokenStream::from(output)
}

We parse each attribute using attr.parse_args_with, which takes a parser as a parameter. In our case, the parser is the parse_terminated mentioned above.

You can refine the parsing process a little and create a custom struct:

struct NumberList {
  numbers: Punctuated<LitInt, Token![,]>,
}

impl syn::parse::Parse for NumberList {
  fn parse(input: syn::parse::ParseStream) -> syn::Result<Self> {
    Ok(NumberList {
      numbers: Punctuated::parse_terminated(input)?
    })
  }
}

// and in the macro itself number_list would be read something like this:
let number_list = attr.parse_args::<NumberList>().unwrap().numbers;

In this case, the use of parse_terminated is moved out of the main code. We will need the custom struct approach later on.

Now let's complicate the task. We will parse a list of parameters, where there are key and value pairs:

#[derive(Middle)]
#[values(tentacli=works, join=us, on=discord)]
struct BetterStruct;

// for this I will use the custom struct approach shown above
struct ValuesList {
    pub items: Vec<(String, String)>,
}

impl syn::parse::Parse for ValuesList {
  fn parse(input: syn::parse::ParseStream) -> syn::Result<Self> {
    let mut items = vec![];

    while !input.is_empty() {
      let key: Ident = input.parse()?;
      input.parse::<Token![=]>()?;
      let value: Ident = input.parse()?;

      items.push((key.to_string(), value.to_string()));

      if input.peek(Token![,]) {
        // propagate a parse error instead of panicking
        input.parse::<Token![,]>()?;
      }
    }

    Ok(Self { items })
  }
}

#[proc_macro_derive(Middle, attributes(values))]
pub fn middle(input: TokenStream) -> TokenStream {
  // ...
  for attr in attrs {
    if attr.path().is_ident("values") {
      let items_list = attr.parse_args::<ValuesList>().unwrap();
      // ...
    }
  }

  // ...

  TokenStream::from(output)
}

What we parse is essentially just a sequence of tokens, so you can think of parsing as iterating over them in order. If some expected element of the sequence is missing (in our case, say, the “=” sign), an error occurs at compile time.
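
To make the "sequential iteration over tokens" picture concrete, here is a toy std-only model of the same key=value parser. It does not use syn; the Token enum and the tokenize, expect_ident and parse_pairs functions are hypothetical stand-ins for the token stream and for ValuesList::parse, and errors are plain strings rather than compile errors.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Eq,
    Comma,
}

// Split "a=b, c=d" into tokens; a toy stand-in for the token stream
// that a proc macro receives from the compiler.
fn tokenize(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut word = String::new();
    for ch in source.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            word.push(ch);
            continue;
        }
        if !word.is_empty() {
            tokens.push(Token::Ident(std::mem::take(&mut word)));
        }
        match ch {
            '=' => tokens.push(Token::Eq),
            ',' => tokens.push(Token::Comma),
            _ => {} // whitespace and anything else is skipped in this toy
        }
    }
    if !word.is_empty() {
        tokens.push(Token::Ident(word));
    }
    tokens
}

fn expect_ident(tokens: &mut std::slice::Iter<'_, Token>) -> Result<String, String> {
    match tokens.next() {
        Some(Token::Ident(name)) => Ok(name.clone()),
        other => Err(format!("expected identifier, found {:?}", other)),
    }
}

// Same shape as the parser above: ident, `=`, ident, optional `,`.
fn parse_pairs(tokens: &[Token]) -> Result<Vec<(String, String)>, String> {
    let mut iter = tokens.iter();
    let mut items = Vec::new();

    while !iter.as_slice().is_empty() {
        let key = expect_ident(&mut iter)?;
        match iter.next() {
            Some(Token::Eq) => {}
            other => return Err(format!("expected `=`, found {:?}", other)),
        }
        let value = expect_ident(&mut iter)?;
        items.push((key, value));

        // Consume a separating comma if one is present.
        if iter.as_slice().first() == Some(&Token::Comma) {
            iter.next();
        }
    }

    Ok(items)
}
```

Feeding it the tokens of "tentacli works" fails at the second step, because the parser expects "=" and finds an identifier; in a real proc macro the same situation surfaces as a compile error.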

Since the parameters inside the attribute parentheses are passed without quotes, the parser treats them as identifiers; otherwise each key/value would have to be parsed as a string, using LitStr instead of Ident.

So far, all of these were attributes on the struct itself. For completeness, let's also look at field attributes. Everything works the same way; the only difference is that these attributes are parsed from the fields.

#[derive(Hard)]
#[values(tentacli=works, join=us, on=discord)]
struct TopStruct {
  #[value("Tentacli")]
  name: String,
  #[value("https://github.com/idewave/tentacli")]
  github_link: String,
  #[value("https://crates.io/crates/tentacli")]
  crates_link: String,
}

#[proc_macro_derive(Hard, attributes(values, value))]
pub fn hard(input: TokenStream) -> TokenStream {
  
  // to get fields, use ItemStruct instead of DeriveInput
  let ItemStruct { ident, fields, attrs, .. } = parse_macro_input!(input);

  // the strings collected from the #[value("...")] field attributes
  let mut values: Vec<String> = vec![];
  
  for field in fields.iter() {
    field.attrs.iter().for_each(|attr| {
      if attr.path().is_ident("value") {
        let value = attr.parse_args::<LitStr>().unwrap();
        values.push(value.value());
      }
    });
  }

  TokenStream::from(output)
}

I created a repository on GitHub that contains all three examples.

Conclusion

Now, with a full (I hope) understanding of how the macro works, I suggest completing the code for the initializers variable and for the from_binary method:

let initializers = fields
  .iter()
  .map(|f| {
      let field_name = f.ident.clone();
      let field_type = f.ty.clone();

      let output = if let Some(dep_fields) = depends_on.get(&field_name) {
          quote! {
              {
                  let mut data: Vec<u8> = vec![];
                  #(
                      #binary_converter::write_into(
                          &mut cache.#dep_fields,
                          &mut data,
                      )?;
                  )*
                  #binary_converter::read_from(&mut reader, &mut data)?
              }
          }
      } else {
          quote! {
              {
                  let value: #field_type = #binary_converter::read_from(
                      &mut reader, &mut vec![]
                  )?;
                  cache.#field_name = value.clone();
                  value
              }
          }
      };

      if conditional.contains(&field_name) {
          quote! {
              {
                  if Self::#field_name(&mut cache) {
                      #output
                  } else {
                      Default::default()
                  }
              }
          }
      } else {
          output
      }
  });

let output = quote! {
  impl #ident {
    pub fn from_binary(buffer: &[u8]) -> #result<(Self, String)> {
      println!("It works !");

      let mut cache = Self {
          #(#field_names: Default::default()),*
      };
    
      let mut reader = #cursor::new(buffer);
      let mut instance = Self {
          #(#field_names: #initializers),*
      };
    
      let details = instance.get_json_details()?;
    
      Ok((instance, details))
    }
  }
};

The first question you might have is: what are reader, cache and the other variables that are not declared before initializers, yet for some reason are used inside it? The answer is quite simple: the contents of the initializers variable are substituted into output at the place where we referenced it, and everything passed to TokenStream::from(output) is compiled as one piece. So, in the code above, cache is declared at the top of from_binary and reader right after it, and both are declared BEFORE the point where initializers is inserted into the code.

The second question: what is cache? It is a replica of the current struct instance, except that values are written into it as the fields are read, up to the first field with the depends_on attribute. Thanks to this approach, you can query previously read fields without waiting for all fields to finish reading and, while the instance is still being built, decide how to read the next field correctly. For example, take the very first code sample, which describes the SMSG_MESSAGECHAT packet. It contains a conditional field, channel_name. If we read it when we should not have, the next field (and all subsequent ones) will be read incorrectly, which will lead to an error.
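
To see the role of cache without any macro machinery, here is a hand-written, std-only approximation of what the generated from_binary might look like for a shortened chat packet. Everything in it is an assumption made for the illustration: the struct has only four fields, CHANNEL is a placeholder value, the read_* helpers replace BinaryConverter, and the dependent field takes its length directly from cache instead of going through write_into and the dependencies buffer.

```rust
use std::io::{self, BufRead, Cursor, Read};

// Hypothetical value standing in for MessageType::CHANNEL.
const CHANNEL: u8 = 17;

#[derive(Debug, Default, PartialEq)]
struct Incoming {
    message_type: u8,
    channel_name: String, // would carry #[conditional]
    message_length: u32,
    message: String, // would carry #[depends_on(message_length)]
}

impl Incoming {
    // Predicate for the conditional field; it only ever sees the cache.
    fn channel_name(instance: &mut Self) -> bool {
        instance.message_type == CHANNEL
    }
}

fn read_u8<R: Read>(reader: &mut R) -> io::Result<u8> {
    let mut bytes = [0u8; 1];
    reader.read_exact(&mut bytes)?;
    Ok(bytes[0])
}

fn read_u32<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut bytes = [0u8; 4];
    reader.read_exact(&mut bytes)?;
    Ok(u32::from_le_bytes(bytes))
}

fn into_string(bytes: Vec<u8>) -> io::Result<String> {
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn read_cstring<R: BufRead>(reader: &mut R) -> io::Result<String> {
    let mut bytes = Vec::new();
    reader.read_until(0, &mut bytes)?;
    if bytes.last() == Some(&0) {
        bytes.pop(); // drop the NUL terminator
    }
    into_string(bytes)
}

fn read_sized<R: Read>(reader: &mut R, size: usize) -> io::Result<String> {
    let mut bytes = vec![0u8; size];
    reader.read_exact(&mut bytes)?;
    into_string(bytes)
}

fn from_binary(buffer: &[u8]) -> io::Result<Incoming> {
    let mut cache = Incoming::default();
    let mut reader = Cursor::new(buffer);

    // Field initializers run top to bottom, so each one may rely on
    // whatever the earlier ones stored in `cache`.
    let instance = Incoming {
        message_type: {
            let value = read_u8(&mut reader)?;
            cache.message_type = value;
            value
        },
        channel_name: if Incoming::channel_name(&mut cache) {
            read_cstring(&mut reader)?
        } else {
            Default::default()
        },
        message_length: {
            let value = read_u32(&mut reader)?;
            cache.message_length = value;
            value
        },
        // `instance` does not exist yet, so the length comes from `cache`.
        message: read_sized(&mut reader, cache.message_length as usize)?,
    };

    Ok(instance)
}
```

When message_type is not CHANNEL, the channel name is skipped and the next four bytes are correctly interpreted as the message length; without the check, those bytes would be consumed as part of a string and every later field would be misread.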

And ask the third question in the comments.
