Escape sequences and numeric notation in PHP

Hello, Habr. Ahead of the start of the online course “PHP developer”, we have prepared our traditional translation of a useful article. We also invite you to watch the recording of the recent demo lesson “PHP Ecosystem”.


Many modern programming languages support several ways of writing characters, from ordinary Latin letters, digits, symbols and emoji to special characters such as the newline or tab character.

Most characters can be simply typed and used in PHP code as they are. For example, $string = "php.watch" is a completely valid string in PHP, and $num = 42 is a valid number. It is also possible to use multibyte characters (which require more than one byte to store), for example, this perfectly valid emoji: $emoji = "🐘"

PHP, along with many other programming languages, supports a number of escape sequences for characters that cannot be typed on a regular keyboard, cannot be represented in text form (such as invisible characters or control characters), or would otherwise be hard to read. Such characters are written with escape sequences that PHP recognizes.

As far as numbers are concerned, PHP supports standard decimal numbers, but it can also use other notations such as binary, octal, hexadecimal and even scientific notation. Depending on the context, they can make your code more readable and understandable.

Double quotes and Heredoc

In PHP, a double quoted string ("string") or Heredoc (see below) supports escape sequences and variable interpolation.

PHP will try to interpolate variables if the string literal is inside a double quoted string or Heredoc.

$name="John";
echo "Hi $name"; // "Hi John"
$name="John";
echo <<<HEREDOC
Hi $name
HEREDOC;
// "Hi John"

Alternatively (and preferably) interpolated variables can be enclosed in curly braces so they appear more readable:

$name="John";
echo "Hi {$name}"; // "Hi John"
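The braces matter most when a variable is directly followed by characters that could belong to its name, or when the value is an array element. A minimal sketch; the variables $fruit and $user are made up for illustration:

```php
<?php
$fruit = "apple";
$user  = ['name' => 'John'];

// Without braces PHP would look for a variable called $fruits,
// so the braces mark where the variable name ends.
$plural = "Two {$fruit}s";

// Braces also allow array elements with quoted keys.
$greeting = "Hi {$user['name']}";

echo $plural, PHP_EOL;   // Two apples
echo $greeting, PHP_EOL; // Hi John
```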

Single quoted strings ('string') and Nowdoc syntax do not interpolate variables:

$name="John";
echo 'Hi $name'; // "Hi $name"
$name="John";
echo <<<'NOWDOC'
Hi $name
NOWDOC;
// "Hi $name"

Only double-quoted strings and Heredoc support the escape sequences described below. Single-quoted strings recognize just \' and \\.

Escaping characters

Because PHP interprets and interpolates special characters inside double-quoted string literals and Heredocs, the backslash (\) is used as an “escape character”.

For example, using \$name instead of $name prevents PHP from interpolating the variable $name:

$name="John";
echo "Hi \$name"; // "Hi $name"

Adding a second backslash escapes the first backslash itself, so PHP prints a literal backslash and interpolates the variable again:

$name="John";
echo "Hi \\$name"; // "Hi \John"

PHP supports several escape sequences for special characters. In the example above, \$ is itself an escape sequence: it suppresses interpolation and makes PHP use the character $ literally.
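The four cases can be compared side by side. This is only an illustrative sketch; the variable names are arbitrary:

```php
<?php
$name = "John";

$interpolated = "Hi $name";   // the variable is replaced
$escaped      = "Hi \$name";  // the backslash keeps the dollar sign literal
$backslash    = "Hi \\$name"; // "\\" is one literal backslash, then $name is replaced
$single       = 'Hi \$name';  // single quotes: backslash and $name both stay as typed

echo $interpolated, PHP_EOL; // Hi John
echo $escaped, PHP_EOL;      // Hi $name
echo $backslash, PHP_EOL;    // Hi \John
echo $single, PHP_EOL;       // Hi \$name
```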

Tab characters: \t and \v

Perhaps the simplest of the control characters is the tab character. A literal tab (typed with the Tab key) can be used inside a string literal, but writing \t makes the tab explicit instead of leaving an invisible gap. Using \t also prevents IDEs from silently converting the tab into spaces.

echo "Foo\tBar";
Foo     Bar

\v is a vertical tab. On terminals that support it, the vertical tab moves the cursor down one line while keeping the same column:

echo "Foo\vBar\vBaz";
Foo
   Bar
      Baz
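Both sequences stand for a single byte, which can be checked with ord(). The sketch below also builds a tab-separated row; the column names are hypothetical:

```php
<?php
// "\t" and "\v" are single bytes: ASCII 9 and ASCII 11.
$tab  = "\t";
$vtab = "\v";

// A tab-separated row built with "\t" instead of literal tab characters.
$row = implode("\t", ['id', 'name', 'email']);

var_dump(ord($tab));  // int(9)
var_dump(ord($vtab)); // int(11)
echo $row, PHP_EOL;
```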

New lines: \r and \n

\r (“carriage return”) and \n (“line feed”) are newline characters.

Historically, different systems settled on \r, on \n, or on both: Linux and macOS use the line feed character (\n) as the newline, Windows uses the combination \r\n (carriage return followed by line feed), and classic Mac OS used \r.

PHP has a constant, PHP_EOL, which always holds the newline sequence of the system PHP is running on.

echo "Left\nLeft\nRight\nRight";
Left
Left
Right
Right
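When text may come from different systems, one possible way to handle all three conventions is to split on \R, the PCRE pattern for any newline sequence, and re-join with PHP_EOL. A sketch with a made-up input string:

```php
<?php
// Text with mixed line endings, as might arrive from different systems.
$text = "Left\r\nLeft\rRight\nRight";

// \R in a PCRE pattern matches "\r\n", "\r" and "\n" alike.
$lines = preg_split('/\R/', $text);

// Re-join with the newline of the system the script runs on.
$normalized = implode(PHP_EOL, $lines);

print_r($lines);
```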

Escape character: \e

The escape character is often used to send ANSI escape sequences to the terminal. For example, \e followed by [32m tells the terminal to switch the text color to green, [33m switches it to yellow, and [0m resets the formatting.

echo "\e[32mGreen text\e[0m \e[33mYellow text\e[0m";

If the above snippet is executed in a terminal that supports ANSI escape sequences, the terminal interprets them and prints the first phrase in green and the second in yellow.
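To avoid repeating the raw sequences, they could be wrapped in a small helper. The function colorize() below is hypothetical, not part of PHP, and assumes the terminal understands ANSI color codes:

```php
<?php
/**
 * Hypothetical helper: wraps text in an ANSI color code
 * and appends the reset code.
 */
function colorize(string $text, int $code): string
{
    return "\e[{$code}m{$text}\e[0m";
}

$green  = colorize('Green text', 32);
$yellow = colorize('Yellow text', 33);

echo $green, ' ', $yellow, PHP_EOL;
```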

New page character: \f

The new page (form feed) character is an ASCII control character that marks the end of a page. It tells printers to eject the current page and continue at the top of the next one. When \f is sent to a display terminal, it may clear the screen, although most terminal emulators do not behave this way.


Octal ASCII character escapes

PHP can turn an octal number into the corresponding ASCII character: the escape sequence is a backslash followed by one to three octal digits.

For example, the ASCII character P has the decimal value 80 (see any ASCII table), which is 120 in octal.

So P can be written as an octal escape sequence:

echo "\120";

P

Any basic ASCII character can be represented using this notation:

echo "\120\110\120\56\127\141\164\143\150";

PHP.Watch

Any value from \0 to \377 is interpreted as an octal character escape sequence.

Note that the values for extended ASCII characters (128 through 255) are not UTF-8 compatible. For example, the single byte 128 (octal: 200; hexadecimal: 80) is not a valid UTF-8 sequence.

Although PHP accepts such values, they are considered invalid characters in the UTF-8 context.
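The octal value of a character can be looked up with decoct() and ord(). The sketch below also checks UTF-8 validity with preg_match() and the /u modifier, which is just one possible way to do it:

```php
<?php
// decoct() converts a decimal integer to its octal representation,
// which helps to find the escape sequence of a character.
$octal = decoct(ord('P')); // "120"

$p    = "\120";
$site = "\120\110\120\56\127\141\164\143\150";

// "\200" is byte 128: a legal PHP string, but not valid UTF-8 on its own.
// preg_match() with the /u modifier returns false for invalid UTF-8 input.
$isValidUtf8 = preg_match('//u', "\200") !== false;

var_dump($octal, $p, $site, $isValidUtf8);
```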

ASCII Hexadecimal Escape Sequences

Like octal character escapes, PHP also allows characters to be written as hexadecimal numbers prefixed with \x.

Only one byte is allowed, which implies a valid range from \x0 to \xFF. However, the UTF-8 limitation still applies: only values below \x80 are valid UTF-8 characters on their own.

Also, the hexadecimal digits are not case sensitive (i.e. \xAF is the same as \xaf and \xaF).

The ASCII character P is 80 in decimal, which is \x50:

echo "\x50";

P

The same “PHP.Watch” example can be rewritten with hexadecimal escape sequences:

echo "\x50\x48\x50\x2E\x57\x61\x74\x63\x68";

PHP.Watch
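The hexadecimal values of a string's bytes can be obtained with bin2hex(). A short sketch that also shows octal and hexadecimal escapes describing the same byte:

```php
<?php
// bin2hex() shows the hexadecimal value of every byte in a string.
$hex = bin2hex('PHP.Watch'); // "5048502e5761746368"

// Hex digits are case-insensitive, and octal and hex escapes
// can describe the same byte.
$sameCase = "\x2E" === "\x2e";
$sameByte = "\x50" === "\120";

var_dump($hex, $sameCase, $sameByte);
```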

Escape sequences of Unicode characters

PHP supports any Unicode character written as \u followed by its hexadecimal code point in curly braces.

echo "\u{1F418} - \u{50}\u{48}\u{50}\u{2E}\u{57}\u{61}\u{74}\u{63}\u{68}";

🐘 - PHP.Watch

PHP throws a parse error if the code point exceeds 10FFFF:

echo "\u{10FFFF1}";

Invalid UTF-8 codepoint escape sequence: Codepoint too large on line ...

10FFFF is the upper limit because the UTF-8 specification defines the range from U+0000 to U+10FFFF.
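PHP stores the character produced by \u{} as UTF-8 bytes, so a single code point may occupy several bytes in the string. A small sketch that inspects the result with strlen() and bin2hex():

```php
<?php
$elephant = "\u{1F418}";
$euro     = "\u{20AC}";

// One character, but four bytes in UTF-8.
var_dump(strlen($elephant));  // int(4)
var_dump(bin2hex($elephant)); // string(8) "f09f9098"

// The same bytes can be written with hexadecimal escapes.
var_dump($euro === "\xE2\x82\xAC"); // bool(true)

// Leading zeros in the code point are allowed.
var_dump("\u{0041}" === "A"); // bool(true)
```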

In a previous version of this article, the upper limit was incorrectly given as FFFFF; it has been corrected to 10FFFF. Thanks to Sara Golemon for pointing this out.


The \u{} notation can be used as an escape sequence for any character. Here are some examples:

Symbol                  Code point (Dec)  Code point (Hex)  Unicode escape sequence
A                       65                41                "\u{41}"
B                       66                42                "\u{42}"
$                       36                24                "\u{24}"
€                       8364              20AC              "\u{20AC}"
\n (line feed)          10                A                 "\u{A}"
\r (carriage return)    13                D                 "\u{D}"
\t (horizontal tab)     9                 9                 "\u{9}"
\v (vertical tab)       11                B                 "\u{B}"
\e (escape character)   27                1B                "\u{1B}"
\f (new page)           12                C                 "\u{C}"
🐘                      128024            1F418             "\u{1F418}"
අ                       3461              D85               "\u{D85}"


Binary strings

Back in PHP 5.2.1, a new string syntax called binary strings was introduced. Calling it a syntax is a stretch: it was conceived to improve forward compatibility with the then-upcoming PHP 6.

The syntax was to add the prefix b to single / double quoted strings so that PHP can recognize it as a binary string.

echo b'Foo';

Foo

The functions is_binary, is_unicode and is_buffer were supposed to distinguish between binary strings and Unicode strings in PHP 6, but they never made it into the next released version, PHP 7. The binary string syntax itself, however, did make it into PHP 7 and is still supported in PHP 8.

The binary string syntax does not have any special functionality and only serves to be a historical relic, a surprise interview question, a fun fact for PHP articles, and to confuse other PHP developers.

This syntax is not deprecated and is not planned to be removed in a future version of PHP.
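That the prefix has no effect can be checked directly. A minimal sketch with arbitrary variable names:

```php
<?php
$plain  = 'Foo';
$binary = b'Foo';

// The prefix changes nothing: both literals are ordinary strings.
var_dump($plain === $binary); // bool(true)
var_dump(gettype($binary));   // string(6) "string"

// Escape sequences still work in b-prefixed double-quoted strings.
$withEscape = b"\x46oo";
var_dump($withEscape === 'Foo'); // bool(true)
```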


Numeric notation

When using numeric literals in PHP scripts, PHP expects decimal values by default. However, PHP also accepts other numeric notations: binary, octal, hexadecimal and scientific.

As of PHP 7.4, PHP also allows digits to be separated with underscores to make long numbers more readable.

Binary notation

Any numeric literal starting with the prefix 0b is treated as a binary number.

$number_binary = 0b101010;

For better readability, you can use the underscore separator.

$number_binary = 0b10_1010;

0b101010  === 42; // true

0b10_1010 === 42; // true
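Binary literals are convenient where individual bits carry meaning, such as bit flags. The sketch below uses hypothetical FLAG_* constants and requires PHP 7.4 or later because of the numeric separator:

```php
<?php
// Hypothetical permission flags; binary literals make each bit visible.
const FLAG_READ    = 0b001;
const FLAG_WRITE   = 0b010;
const FLAG_EXECUTE = 0b100;

$permissions = FLAG_READ | FLAG_WRITE; // 0b011

$canWrite   = ($permissions & FLAG_WRITE) === FLAG_WRITE;
$canExecute = ($permissions & FLAG_EXECUTE) === FLAG_EXECUTE;

// The separator is ignored by the parser (PHP 7.4+).
$million = 1_000_000;

// decbin() and bindec() convert between integers and binary strings.
var_dump(decbin($permissions)); // string(2) "11"
var_dump(bindec('101010'));     // int(42)
var_dump($canWrite, $canExecute, $million);
```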

Octal notation

PHP treats numeric literals prefixed with 0 as octal. As of PHP 8.1, PHP also supports an explicit octal notation with the prefix 0o or 0O.

$number_octal = 052; // === decimal 42
$number_octal = 0o52; // === decimal 42
$number_octal = 0O52; // === decimal 42
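File modes are a common place for octal literals. The sketch below uses only the classic 0 prefix so that it also runs on versions before PHP 8.1; the mode value is just an example:

```php
<?php
// Each octal digit holds the three permission bits of one class
// (owner, group, others).
$mode = 0755;

var_dump($mode);         // int(493)
var_dump(decoct($mode)); // string(3) "755"
var_dump(octdec('52'));  // int(42)

// A leading zero changes the value, which is an easy mistake to make.
var_dump(052 === 52);    // bool(false)
```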

Hexadecimal notation

Numbers prefixed with 0x or 0X are interpreted as hexadecimal.

$number_hex = 0x2A; // === decimal 42
$number_hex = 0X2A; // === decimal 42
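Hexadecimal literals are often used for colors and byte masks. A sketch that splits a made-up 24-bit RGB value into its channels:

```php
<?php
// Hypothetical example: a 24-bit RGB color.
$color = 0xFF8800;

// Shift the wanted byte to the lowest position, then mask it.
$red   = ($color >> 16) & 0xFF;
$green = ($color >> 8) & 0xFF;
$blue  = $color & 0xFF;

var_dump($red, $green, $blue); // int(255) int(136) int(0)
var_dump(dechex($color));      // string(6) "ff8800"
var_dump(hexdec('2A'));        // int(42)
```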

Scientific Notation

PHP also supports scientific “E-notation” for floating point values.

$number_float = 42E1;

The literal 42E1 is equivalent to 42 * 10 ^ 1 (10 to the power of 1), that is, 420.0. Please note that ^ is used here in its mathematical sense: in many programming languages (including PHP) the operator ^ means XOR, and exponentiation is written as **.

The readability benefit of scientific notation is more pronounced for very small or very large numbers.

$planck_constant    = 6.62607004E-34;
$avogadros_constant = 6.022140857E+23;
3.844E5 === 384400.0; // true: 3.844 times 10 to the power of 5
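Two details are easy to verify: a literal in E-notation is always a float, and ^ is not exponentiation. A minimal sketch:

```php
<?php
$distance = 3.844E5;

// E-notation always produces a float, even for whole numbers.
var_dump(is_float($distance));    // bool(true)
var_dump($distance === 384400.0); // bool(true)
var_dump(42E1 === 420.0);         // bool(true)

// ^ is bitwise XOR, ** is exponentiation.
var_dump(10 ^ 2);  // int(8)
var_dump(10 ** 2); // int(100)
```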

PHP supports various forms of escape sequences and numeric notation. Recent additions are the explicit octal notation with the 0o / 0O prefix (PHP 8.1) and numeric separators (PHP 7.4).

Note that these escape sequences and notations are not interpreted when reading user input. For example, casting a user-supplied string that contains underscore separators to an integer does not apply the numeric separator feature; the conversion stops at the first underscore.

var_dump((int) "2_34_5");
// int(2)

var_dump((int) "0xabcd");
// int(0)

Escape sequences in user input are also ignored. For example, if a form field contains "\43", this value is used as is, without being interpreted as an octal character escape sequence. However, $str = "\43" in a PHP source file yields "#", because escape sequences are interpreted in PHP source code.
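If such input really has to be interpreted, it must be converted explicitly. The sketch below shows one possible approach with standard functions; the input values are hypothetical, and whether user-supplied escapes should be interpreted at all depends on the application:

```php
<?php
// Hypothetical values as they might arrive from a form: plain strings.
$withSeparators = '2_34_5';
$hexInput       = '0xabcd';
$escapedInput   = '\43'; // single quotes: a backslash followed by "43"

// Strip the separators explicitly before casting.
$number = (int) str_replace('_', '', $withSeparators); // 2345

// intval() with base 16 accepts an optional "0x" prefix.
$fromHex = intval($hexInput, 16); // 43981

// stripcslashes() interprets C-style escapes, including octal ones.
$unescaped = stripcslashes($escapedInput); // "#"

var_dump($number, $fromHex, $unescaped);
```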

