Working with XML as an array in PHP

Hello everyone. I want to share my experience of parsing XML and tell you about a tool that helps me with it.

XML is still alive, and sometimes you have to parse it, especially if you work with SMEV (hello to everyone for whom “federal executive body” is not an empty phrase :)).

The goals of such parsing vary widely, from trivially answering which namespace an XML document uses to building a structured representation of the whole document.

The tool differs for each goal. The namespace can be found with a substring search or a regular expression. To turn an XML document into a structured representation (DTO), you have to write a parser.
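
For the namespace case, a regular expression really is enough. Here is a minimal sketch; the findDefaultNamespace() helper and the sample URI are my own illustration, and the pattern only looks at the first unprefixed xmlns declaration it meets, without checking whether it sits inside a comment or CDATA.

```php
<?php

// Hypothetical helper: returns the first default namespace URI found, or null.
function findDefaultNamespace(string $xml): ?string
{
    // Matches xmlns="..." or xmlns='...' (unprefixed declaration only).
    if (preg_match('/\sxmlns\s*=\s*(["\'])(.*?)\1/u', $xml, $matches) === 1) {
        return $matches[2];
    }

    return null;
}

// Sample document invented for the example.
$content = '<doc xmlns="urn://example/root/1.0"><item/></doc>';
echo findDefaultNamespace($content), PHP_EOL;
```

For anything beyond such a quick check, a real parser is the safer choice.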

PHP has a couple of built-in classes for working with XML. These are XMLReader and SimpleXMLElement.

XMLReader

Using XMLReader, parsing will look something like this:

$reader = new XMLReader();
$reader->XML($content);
while ($reader->read()) {
    $this->parse($reader);
}

Inside the parse(XMLReader $xml) method there will be endless lines like these:

$name = $xml->name;
$value = $xml->expand()->textContent;
$attrVal = $xml->getAttribute('attribute');
$isElem = $xml->nodeType === XMLReader::ELEMENT;

For small documents, or when we need only a couple of elements from the whole document, this is acceptable. On large documents the monotonous code starts to blur before your eyes, and your conscience nags you about the overhead of walking through every element of the document.
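
To make the XMLReader loop concrete, here is a small self-contained sketch that picks only the item elements out of a made-up document. The document, the element names and the $items array are assumptions for illustration only.

```php
<?php

// Sample document invented for the example.
$content = <<<XML
<doc>
    <item id="1">first</item>
    <item id="2">second</item>
    <other>skipped</other>
</doc>
XML;

$reader = new XMLReader();
$reader->XML($content);

$items = [];
while ($reader->read()) {
    // React only to opening tags named "item"; every other node is skipped.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // Key by the id attribute, value is the text inside the element.
        $items[$reader->getAttribute('id')] = $reader->readString();
    }
}
$reader->close();

print_r($items);
```

Even in this tiny case the loop still visits every node of the document, which is exactly the overhead mentioned above.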

SimpleXMLElement

SimpleXMLElement lets you look only at the elements you need. This class turns an XML document into an object in which all elements and attributes become properties, so you can work with specific elements rather than with all of them in sequence, for example:

$document = new SimpleXMLElement($content);
/* root element name */
$name = $document->getName();

/* get an arbitrary element */
$primary = $document
	->Message
	->ResponseContent
	->content
	->MessagePrimaryContent ?? null;

/* get the elements of a specific namespace */
$attachment = $primary
	->children(
		'urn://x-artefacts-fns-zpvipegr/root/750-08/4.0.1'
	)
	->xpath('tns:Вложения/fnst:Вложение')[0];

/* get the element value */
$fileName = $attachment
	->xpath('//fnst:ИмяФайла')[0]
	->__toString();

Convenient, but not entirely. If an element name is in Cyrillic, I could not access it through a property and had to use SimpleXMLElement::xpath(). You also have to work with multiple values through SimpleXMLElement::xpath(). Besides, SimpleXMLElement has its own quirks, and some things are far from obvious.
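
As an illustration of working with multiple namespaced values through xpath(), here is a rough sketch. The document, the urn://example/files namespace and the files prefix are invented for the example; registerXPathNamespace() binds the prefix explicitly, so the query does not depend on the prefixes declared in the document.

```php
<?php

// Sample document invented for the example.
$content = <<<XML
<doc xmlns:f="urn://example/files">
    <f:file>a.txt</f:file>
    <f:file>b.txt</f:file>
</doc>
XML;

$document = new SimpleXMLElement($content);

// Bind our own prefix to the namespace URI for use in XPath queries.
$document->registerXPathNamespace('files', 'urn://example/files');

$names = [];
// The path is relative to the root element, so it selects its direct children.
foreach ($document->xpath('files:file') as $file) {
    $names[] = (string) $file;
}

print_r($names);
```

Note that a path starting with // searches the whole document, not just the element xpath() was called on, which is one of those non-obvious things.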

Converter

There is an easier way: convert the XML document to an array. Arrays hold no surprises. The conversion takes a couple of lines of code:

$xml=<<<XML
    <b attr4="55">
        <c>ccc
            <d/>
        </c>
        0000
    </b>           
XML;
$fabric = (new NavigatorFabric())->setXml($xml);
$converter = $fabric->makeConverter();
$arrayRepresentationOfXml = $converter->toArray();

Each XML element is represented by an array with up to three keys:

  • ‘*value’ contains the value of the element,

  • ‘*attributes’ contains the attributes of the element,

  • ‘*elements’ contains the nested elements.

/*
'b' =>
  array (
	'*value' => '0000',
	'*attributes' =>
	array (
	  'attr4' => '55',
	),
	'*elements' =>
	array (
	  'c' =>
	  array (
	  ),
	),
  ),
*/

If an element is multiple, that is, it occurs in the document several times in a row, all of its occurrences are placed in an array under the ‘*multiple’ key.

$xml = <<<XML
<doc>
    <qwe>first occurrence</qwe>
    <qwe>second occurrence</qwe>
</doc>
XML;

/*
'doc' =>
array (
  'qwe' =>
  array (
    '*multiple' =>
    array (
      0 =>
      array (
        '*value' => 'first occurrence',
      ),
      1 =>
      array (
        '*value' => 'second occurrence',
      )
    )
  )
)
*/
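
To show how such a layout can be produced, here is a rough sketch built on SimpleXMLElement. It is not the library's implementation: sketchToArray() is a hypothetical function of mine, it trims whitespace from values, ignores namespaces, and always nests children under ‘*elements’.

```php
<?php

// Hypothetical sketch, NOT the library's code: builds the '*value' /
// '*attributes' / '*elements' / '*multiple' layout from a SimpleXMLElement.
function sketchToArray(SimpleXMLElement $node): array
{
    $result = [];

    // The string cast yields the element's own text; skip it when blank.
    $value = trim((string) $node);
    if ($value !== '') {
        $result['*value'] = $value;
    }

    foreach ($node->attributes() as $name => $attribute) {
        $result['*attributes'][$name] = (string) $attribute;
    }

    // Group children by name so that repeated elements can be detected.
    $grouped = [];
    foreach ($node->children() as $name => $child) {
        $grouped[$name][] = sketchToArray($child);
    }
    foreach ($grouped as $name => $occurrences) {
        $result['*elements'][$name] = count($occurrences) > 1
            ? ['*multiple' => $occurrences]
            : $occurrences[0];
    }

    return $result;
}

$xml = <<<XML
<doc>
    <qwe>first occurrence</qwe>
    <qwe>second occurrence</qwe>
</doc>
XML;

$document = new SimpleXMLElement($xml);
$array = [$document->getName() => sketchToArray($document)];
var_export($array);
```

Grouping by name is what makes the array convenient to address, and it is also why the order of differently named siblings is not kept.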

But that’s not all.

XmlNavigator

If all the square brackets you get from treating XML as an array make your eyes swim, XmlNavigator is the option for you. It is also created in two lines of code.

/* document */
$xml = <<<XML
<doc attrib="a" option="o" >666
    <base/>
    <valuable>element value</valuable>
    <complex>
        <a empty=""/>
        <b val="x"/>
        <b val="y"/>
        <b val="z"/>
        <c>0</c>
        <c v="o"/>
        <c/>
        <different/>
    </complex>
</doc>
XML;
$fabric = (new NavigatorFabric())->setXml($xml);
$navigator = $fabric->makeNavigator();

XmlNavigator does the same job as Converter, but it provides an API, so we work with the document as an object.

Element name, method name()

/* Element name */
echo $navigator->name();
/* doc */

Element value, method value()

/* Element value */
echo $navigator->value();
/* 666 */

Attribute list, method attribs()

/* get list of attributes */
echo var_export($navigator->attribs(), true);
/*
array (
  0 => 'attrib',
  1 => 'option',
)
*/

Attribute value, method get()

/* get attribute value */
echo $navigator->get('attrib');
/* a */

List of nested elements, method elements()

/* List of nested elements */
echo var_export($navigator->elements(), true);
/*
array (
  0 => 'base',
  1 => 'valuable',
  2 => 'complex',
)
*/

Get nested element, method pull()

/* Get a nested element */
$nested = $navigator->pull('complex');

echo $nested->name();
/* complex */

echo var_export($nested->elements(), true);
/*
array (
  0 => 'a',
  1 => 'different',
  2 => 'b',
  3 => 'c',
)
*/

Iterate through all occurrences of a multiple element, method next()

/* Get a nested element of a nested element */
$multiple = $navigator->pull('complex')->pull('b');

/* Iterate over all occurrences of a multiple element */
foreach ($multiple->next() as $index => $instance) {
    echo " {$instance->name()}[$index]" .
        " => {$instance->get('val')};";
}
/*
b[0] => x; b[1] => y; b[2] => z;
*/

All methods of the XmlNavigator class

The XmlNavigator class implements the IXmlNavigator interface.

<?php

namespace SbWereWolf\XmlNavigator;

interface IXmlNavigator
{
    public function name(): string;

    public function hasValue(): string;

    public function value(): string;

    public function hasAttribs(): bool;

    public function attribs(): array;

    public function get(string $name = null): string;

    public function hasElements(): bool;

    public function elements(): array;

    public function pull(string $name): IXmlNavigator;

    public function isMultiple(): bool;

    public function next();
}

The purpose of most methods is clear from their names; the less obvious ones were covered above.

How to install?

composer require sbwerewolf/xml-navigator

Conclusion

In practice, you first use SimpleXMLElement to pull the required element out of the whole document, and then work with that element through XmlNavigator.

$document = new SimpleXMLElement($content);
$primary = $document
    ->Message
    ->ResponseContent
    ->content
    ->MessagePrimaryContent;
$attachment = $primary
    ->children(
        'urn://x-artefacts-fns-zpvipegr/root/750-08/4.0.1'
    )
    ->xpath('tns:Вложения')[0];

$fabric = (new NavigatorFabric())->setSimpleXmlElement($attachment);
$navigator = $fabric->makeNavigator();

Enjoy using it.

Epilogue

Of course, you may have your own alternatives for working with XML. Please share them in the comments.

I can’t promise that XmlNavigator will cope with every XML document, since I haven’t checked that, but I had no problems with ordinary documents that have no tricks in their schema.

If the order of elements matters to you, you will have to use XMLReader, because once the document is turned into an object with elements as properties (or into an array keyed by element name), the original order of differently named elements is not preserved.
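
A small sketch of what preserving order with XMLReader can look like: it collects the names of the root's direct children in document order. The sample document and the $order variable are assumptions for the example.

```php
<?php

// Sample document invented for the example: b and c elements are interleaved.
$content = '<doc><b val="x"/><c>0</c><b val="y"/></doc>';

$reader = new XMLReader();
$reader->XML($content);

$order = [];
while ($reader->read()) {
    // Depth 0 is the root element, depth 1 is a direct child of the root.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->depth === 1) {
        $order[] = $reader->name;
    }
}
$reader->close();

echo implode(', ', $order), PHP_EOL;
```

The names come out in the order the elements appear in the document, with the two b elements still separated by c.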
