Reverse Engineering LHX Game Assets. Part 5, Final

Intriguing picture



In the previous post I shared my joy at having dug the game models out of LHX and given them a modern look. I also shared the models themselves, and even the method I used to do it.

But after that, by inertia, I decided to dig further. As an elective, so to speak.

There are many other files in the LHX resources besides the point files. Some of them look (deceptively) simple, for example the scenario files with mission descriptions. Others have interesting extensions, such as .fnt (I have always wanted an authentic CGA/EGA-era font, 4 by 6 dots). And SECRET.PIC is a real challenge.

Close-up call


In general, I have encountered the following extensions for resource files:

  1. drv — files like “IBMDRIVE.DRV” suggest that these are boring drivers for old hardware

  2. fmd and fme – the file names match the names of the game's helicopters, and only the Osprey (which can fly both as a helicopter and as a plane) has both an fmd file and the only fme file. Maybe this is flight model data

  3. fnt — obviously, fonts. Besides, there are only 2 of them and one of them is called 4×6 — that is the size in pixels of the font from the game

  4. msk – a mask?

  5. pic – well, the pictures

  6. s — a mishmash of everything, but sometimes bits of lines from mission briefings slip through. Scenarios?

  7. sng — song is very suggestive, but the file names are strange — CHOPLIB, CHOPPER, CHOPTAN…

  8. w — file names (ASIA, EUROPE, GULF) that match the names of locations in the game, plus the names of settlements inside the files make one believe that w is world (map).

  9. bin – completely unclear, the darkest of forests. “Binary” is far too general

  10. 2 – funnily enough, this is the most obvious format, because there is only one file with this extension, and it is called “palette”

First approach

I decided to start with the pictures, because a picture in a computer game from 1990 is hardly more complicated than a plain bitmap. Looking through the contents of the PIC files, I saw that every one of them starts with the same 4 characters: “PXPK”. To me this looks very much like a header signature, something like Pictures Packed, a hint at a packed picture. It makes sense (it was the 90s, after all): who would release a raw bitmap into the wild? It's so big! Well, then maybe it's some standard old picture format? I need to google it!

In general, by this point I had come to the firm conclusion that these days absolutely any problem should be googled first: the Internet has been around for a long time, and a lot of knowledge has accumulated even on rather exotic questions. And exotica is exactly what I got, all of it irrelevant, though interesting in places. And sometimes confusing: I found Deluxe Paint, a graphics editor of that era released by EA, and what's more, it was co-created by Brent Iverson, the guy who wrote LHX!

A hint about who wrote (or at least supervised) LHX


But the files saved in it did not have the letters PXPK in the header, and the initial bytes didn't really match either. It's a pity, but I had to give up for a while.

The situation was the same with the other extensions. Fonts with the fnt extension and a mention of DeluxeFonts inside? Possibly, but it is unclear what kind of structure to expect there in principle. The other mysterious bin files? Even less clear what they are about. And so on. Once again I needed a tool that does something very useful for a tiny circle of people: it takes a set of bytes and visualizes them in different ways, and with luck a hint appears.

Interlude 3

And I found such a tool: the gorgeous hobbits. It takes a bit stream (hence the name, hobbits), for example from a file, and displays it in different forms:

Above: possible ways of representing a bit stream. A little lower: an obvious picture formed by the bits, zeros and ones. But the byte stream containing these bits looks in places like 1F9C07003F and so on. Without such a tool you would never guess that it describes a picture.


But it didn't help either – all the files except for “text” ones showed just some kind of jumble. I put “text” in quotes because there were clearly text phrases (in English, of course) slipping through there. But they slipped through mostly at the beginning-middle of the file, and by the end there was often nothing readable there:

Looks like a coded message from the Center


Second approach

And I decided to take them on – what else could I do? Text is still a much more understandable domain than drivers, for example.

Plus, the contents of these text files suggested that maybe these files are packed.

I knew the very high-level principle behind simple kinds of file packing: “go through the file and build a dictionary of the words/letter combinations encountered, then replace later occurrences with a link to the dictionary.” In fact, the nature of the mangled files (more or less readable at the beginning, a complete mess by the end) hinted at this principle: by the end of packing the dictionary has grown larger and larger, so there are more and more links in place of text. But the devil is always in the details.

Anyway, I started googling packing algorithms invented before 1990 (for once, context simplified things instead of complicating them). There weren't that many of them, and many were based on the academic LZ77 and LZ78 (joint works of Abraham Lempel and Jacob Ziv from 1977 and 1978, respectively). The one that seemed to fit my case (when checked manually by substituting letters at addresses/offsets) was called LZSS.

The main ideas of LZSS

In general, packing according to this algorithm (taking the context into account) works as follows.

First, the results of packing are written to a separate output file.

Secondly, no dictionary as such is created. The source text itself is the dictionary. More precisely, not the entire text but only a part of it, which in this algorithm is called the “window”; it runs from the beginning of the text to the current cursor position.

The packing process goes like this:

  1. In the dictionary (aka the “window” before the cursor), the packer searches for a sequence that matches a sequence of a fixed starting length (the maximum match length; I'll explain where it comes from later) that follows the cursor (the candidate for packing). Once again: before the cursor is the dictionary, after the cursor is the text that may get packed. Where exactly the cursor itself belongs is not important for the explanation. Then:

    1. If a match is found, a link to the match is written to the output file. The link is a pair of numbers (2 bytes in total, but there is a catch that I will describe later): “Distance to the matching combination in the dictionary” and “Length of the combination”. After that the cursor moves to the right by “Length of the combination”, which lengthens the window or, if the window is already at its maximum length, shifts it. And everything starts again.

    2. If no match is found, the length of the sequence in question (the one to the right of the cursor) is reduced by 1. Then:

      1. If the length has dropped to 2 (I will explain this later too), it is considered that no match was found, so 1 character is written to the output file: the one immediately to the right of the window. The cursor moves to the right by this 1 character, and the process returns to point 1 with the default (maximum) length of the sequence in question.

      2. If the length is not yet 2, the process returns to point 1 with the reduced length of the sequence in question.

  2. So that the unpacker can later tell where the real text is and where the links are, the algorithm uses a kind of road sign: in the packed byte stream (that is, after all the headers), starting from the very first byte, markup bytes appear periodically. These are “flag bytes” in Internet terminology. I repeat, the first byte in the stream is also a flag byte. The bits of a flag byte tell the unpacker how to interpret the bytes (there will be between 8 and 16 of them) that immediately follow it. If the next bit of the flag byte:

    1. equals, for example, 1, then the next byte of the stream is unpacked text and must be copied to the output file unchanged, after which the unpacker moves 1 byte further in the packed stream.

    2. equals the opposite value (in our case, 0), then the next two bytes are a link: text of the specified length must be copied to the output file from the specified location among the previously unpacked bytes, after which the unpacker moves 2 bytes further in the packed stream. That is why the number of bytes governed by one flag byte can range from 8 (all items are unpacked text) to 16 (all items are links).
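The match search from step 1 can be sketched in Python roughly as follows. This is only an illustration of the "try the longest candidate first, then shrink by one" idea, not the LHX packer. The function name `find_match` and the constants are hypothetical. The sketch assumes a 4-bit stored length (values 0 to 15, plus 3, so matches of 3 to 18 bytes) and a 12-bit distance (at most 4095), it only accepts matches that lie entirely before the cursor, and it does not model the fixed-window behaviour of LHX.

```python
MIN_MATCH = 3        # shorter matches are not worth a 2-byte link
MAX_MATCH = 18       # assumption: 4-bit stored length (0..15) plus MIN_MATCH
MAX_DISTANCE = 4095  # assumption: largest value that fits in 12 bits


def find_match(data: bytes, cursor: int):
    """Return (distance, length) for the longest match before the cursor, or None."""
    window_start = max(0, cursor - MAX_DISTANCE)
    longest = min(MAX_MATCH, len(data) - cursor)
    # Start with the longest candidate and shrink it by 1 until something matches.
    for length in range(longest, MIN_MATCH - 1, -1):
        candidate = data[cursor:cursor + length]
        # Search only inside the window, i.e. strictly before the cursor.
        found = data.rfind(candidate, window_start, cursor)
        if found != -1:
            return cursor - found, length
    return None  # the caller writes one literal byte and advances by 1
```

For example, in `b"abcabcabc"` with the cursor at position 3, the only candidate that fits entirely inside the window is `abc` at distance 3.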


The archive structure is visible even in the text view (encoding: Windows-1251). Green marks the flag bytes. “я” is FF in Windows-1251, that is, a byte with all bits = 1. This means that all 8 of the following bytes are text and there are no links among them. Therefore “я” is always followed by 8 readable characters. But the last green character is unprintable, and the hex view shows that it equals 7F, which in bits is 0111 1111. The most significant bit (the last one to be read) is 0, which means the last 2 bytes (underlined) of the sequence after this flag byte are a link, and the sequence is 9 bytes long.
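The two flag bytes from the caption can be checked with a few lines of Python. This is a minimal sketch; the helper name `describe_flag_byte` is made up for the illustration. It reads bits from the least significant to the most significant, which is what the 7F example implies (the zero bit is read last).

```python
def describe_flag_byte(flag: int):
    """Return the kind of each of the 8 items and the number of bytes they occupy."""
    kinds = []
    for bit in range(8):  # least significant bit first
        kinds.append("text" if (flag >> bit) & 1 else "link")
    # A text item is 1 byte, a link is 2 bytes.
    size = sum(1 if kind == "text" else 2 for kind in kinds)
    return kinds, size


print(describe_flag_byte(0xFF))  # eight text bytes, 8 bytes in total
print(describe_flag_byte(0x7F))  # seven text bytes, then one link, 9 bytes in total
```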

A couple of clarifying details.

  1. The window does not reach its maximum length right away. At first, when the cursor is at the beginning of the file, the window is pinned to the beginning of the file and has a size of 3 (remember that the minimum length of a matching sequence is 3), and the algorithm treats it as filled with zero bytes. Any filler would do, of course, but zeros are a reasonable compromise. As the cursor moves to the right, the window grows, and at some point (when its real length becomes 2 less than the nominal one) it gradually slides away from the beginning of the file and never returns there.

  2. The pair of values in a link (distance to the matching dictionary combination, and combination length) takes 2 bytes. But it is not a pair of bytes, that is, not 2 numbers with 256 possible values each: that is too few for the distance and too many for the length. So the 16 bits of these 2 bytes are split into a 12-bit number and a 4-bit number. The distance therefore has 2^12 = 4096 possible values, and the stored length has 2^4 = 16 possible values (0 to 15). But remember that nothing shorter than 3 bytes gets packed, so the algorithm always adds 3 to the stored length. That gives match lengths from 3 to 15 + 3 = 18, which is where the maximum length of a packed sequence comes from (it is 18 rather than 19, because the largest value that fits in 4 bits is 15, not 16).

  3. So why are at least 3 bytes packed? I think it's obvious: a link takes 2 bytes, so there is no point in replacing a two-byte sequence with another two-byte one. Hence 3.
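The 12/4 split can be illustrated with a pair of small helpers. The text only says that the 16 bits are divided into a 12-bit distance and a 4-bit length; which bits hold which field is an assumption here (distance in the upper 12 bits, stored length in the lower 4), and the names `encode_link` and `decode_link` are hypothetical. Note that a 4-bit field holds 0 to 15, so with the offset of 3 the sketch accepts lengths from 3 to 18.

```python
MIN_MATCH = 3  # the stored length is always "real length minus 3"


def encode_link(distance: int, length: int) -> int:
    """Pack a distance and a match length into one 16-bit value."""
    if not 0 <= distance < 4096:
        raise ValueError("distance must fit in 12 bits")
    if not MIN_MATCH <= length <= 15 + MIN_MATCH:
        raise ValueError("length must be between 3 and 18")
    # Assumed layout: upper 12 bits = distance, lower 4 bits = stored length.
    return (distance << 4) | (length - MIN_MATCH)


def decode_link(value: int):
    """Split a 16-bit link back into (distance, length)."""
    return value >> 4, (value & 0x0F) + MIN_MATCH
```

With this layout the largest link, `0xFFFF`, decodes to a distance of 4095 and a length of 18.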

In the end, after a week of poking around, it turned out something like this: the “text” files are packed with LZSS, with the only difference being that in the original algorithm the dictionary window slides smoothly, while in the LHX implementation it is rigidly fixed, and during packing the cursor passes through a series of windows. Once 4K characters have been passed, the window is back to its minimum size and the back-reference distances are small again.

The archive header specifies the size of the unpacked file in bytes followed by a pair of zeros; then comes the archive itself, as the sequence of “flag byte, content bytes” tuples described above. The final tuple will not necessarily contain content bytes for all 8 flag bits (simply because the unpacked file has ended), yet a bit in the flag byte has only 2 states, text or link. There is no “end” state, so knowing the size of the unpacked file is very helpful for recognizing that unpacking is complete, without unexpected exceptions.
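To show how the header size ends the loop, here is a minimal unpacker sketch in Python (the real one is written in C#). It is not a decoder for actual LHX files: the header layout (a 16-bit little-endian size followed by two zero bytes), the little-endian link with the distance in its upper 12 bits, and the plain "distance back from the end of the output" addressing are all assumptions, and the fixed-window behaviour of LHX is not modelled. The function name `unpack` is hypothetical.

```python
import struct

MIN_MATCH = 3  # a stored length of 0 means a 3-byte match


def unpack(archive: bytes) -> bytes:
    """Unpack an LZSS-style stream that follows a 4-byte header."""
    # Assumed header: unpacked size (16-bit little-endian), then two zero bytes.
    size, _zeros = struct.unpack_from("<HH", archive, 0)
    out = bytearray()
    pos = 4
    while len(out) < size:
        flags = archive[pos]
        pos += 1
        for bit in range(8):               # least significant bit first
            if len(out) >= size:           # there is no "end" flag: the size stops us
                break
            if (flags >> bit) & 1:         # 1 = one byte of plain text
                out.append(archive[pos])
                pos += 1
            else:                          # 0 = a two-byte link
                link = archive[pos] | (archive[pos + 1] << 8)
                pos += 2
                distance = link >> 4
                length = (link & 0x0F) + MIN_MATCH
                if not 0 < distance <= len(out):
                    raise ValueError("link points outside the unpacked data")
                for _ in range(length):    # byte by byte, so overlapping copies work
                    out.append(out[-distance])
    return bytes(out[:size])


# Header (size 9), flag byte 0x07 (three text bytes, then a link), "abc", link 0x0033.
packed = bytes([9, 0, 0, 0, 0x07, 0x61, 0x62, 0x63, 0x33, 0x00])
print(unpack(packed))  # b'abcabcabc'
```

The remaining four bits of the flag byte in the example are never read, because the output has already reached the size from the header.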

As you can see (especially from the details above), there are a lot of nuances, half of which had to be reversed by hand, because the variants documented on the Internet do not deal with fixed windows. All this went hand in hand with writing an unpacker (in C#, of course, why change horses?). Everything was constantly tested on the packed “text” files (bearing in mind that there were relatively few of them, and that I had no packer to produce new, known-good ones), and it was far from working right away. But it worked. The text files unpacked completely correctly and were met with enthusiastic applause. Victory!

And 2 minutes after the enthusiastic applause, purely for fun, I fed the unpacker the other resource files, and the joke was a success. Almost all of the files unpacked completely correctly.

Except PNT and PIC.

And now – pictures

PNT, because these are point files and they were not packed; I had used them earlier to extract the models.

And PIC, because they have an image format header before the archive header (the size of the future file and two zeros). The same one that starts with PXPK (whatever that means). The header has a rather trivial format: “PXPK signature, the number of colors in the image (256, 16 or 4, that is, VGA/EGA/CGA), the image size along X, garbage, the image size along Y, zeros.” Then comes the archive itself, in the usual format. Unpacked, it is that same almost-plain bitmap: a matrix of pixel colors.

At the same time, the color depth (256/16/4 colors) brings its own specifics, allowing the picture to be packed even further, but with a different approach.

256 colors require 2^8 values, that is, 8 bits or 1 byte per pixel. But 4 colors require only 2^2 values, that is, 2 bits (a quarter of a byte).

Therefore, as you may have guessed, in a 4-color CGA image each byte holds the colors of as many as 4 pixels in a row, in a 16-color EGA image each byte holds the colors of two pixels, and in a 256-color VGA image there is no such packing: each byte is the color of one pixel.
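Splitting bytes into pixels can be sketched like this in Python. The function name `unpack_pixels` is hypothetical, and the order of pixels inside a byte (leftmost pixel in the highest bits) is an assumption, since the text does not state it.

```python
def unpack_pixels(packed: bytes, colors: int) -> list:
    """Expand packed bytes into one color index per pixel."""
    bits = {4: 2, 16: 4, 256: 8}[colors]  # bits per pixel for CGA / EGA / VGA
    per_byte = 8 // bits                   # 4, 2 or 1 pixels in each byte
    mask = (1 << bits) - 1
    pixels = []
    for byte in packed:
        for i in range(per_byte):
            # Assumption: the leftmost pixel sits in the highest bits of the byte.
            shift = 8 - bits * (i + 1)
            pixels.append((byte >> shift) & mask)
    return pixels


print(unpack_pixels(bytes([0b00011011]), 4))  # [0, 1, 2, 3]
print(unpack_pixels(bytes([0xA5]), 16))       # [10, 5]
```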

And only when I wrote the code for this additional unpacking in my unpacker (github link), I remembered how, as a child, I made color tables for editing sprites in the CGA game Goody and finally understood why one number there denoted the color of two pixels at once.

Anyway, I unpacked the pictures too and got those same bitmaps with color. The bitmap stores color numbers, and the colors themselves are stored in that same “palette.2” in the simplest possible format: the RGB component values for 256 colors, simply written one after another. Its size is 256*3 = 768 bytes.
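Reading such a palette takes only a few lines. A minimal Python sketch, with a hypothetical helper name `parse_palette`, that turns the 768 bytes into 256 (R, G, B) triples so that a color number from the bitmap can be used directly as an index:

```python
def parse_palette(raw: bytes) -> list:
    """Split 768 bytes into 256 (R, G, B) tuples, in file order."""
    if len(raw) != 256 * 3:
        raise ValueError("expected exactly 768 bytes")
    return [tuple(raw[i:i + 3]) for i in range(0, len(raw), 3)]


# Synthetic data for the demonstration, not the contents of the real palette file.
palette = parse_palette(bytes(range(256)) * 3)
print(palette[1])  # (3, 4, 5)
```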

It is interesting that there were only 4 CGA bitmaps — these are cockpit pictures. And cockpits are the only pictures (from the entire list of pictures) that are constantly shown during the drawing of the flight itself.

Cockpit damage sprites are drawn with at least 16 colors.


And the broken screen sprites too.


And here they are in the game in 4-color (CGA) mode.


As far as I understand, the program downgrades all the pictures for CGA on the fly from the EGA version, with all sorts of substitutions of patterns for solid colors. But this is rather expensive, which is why the cockpits are stored ready-made.

A piece of the world map from the game in CGA mode


The same, but in EGA mode. The colors have changed, but the shapes haven't.


I was too lazy to write a converter from bitmaps to a modern graphic file format, so I converted from bitmaps (taking into account the palette) to png directly in Mathematica.


This is the basic converter I came up with. The only requirement is that the file name follows a format like «cp_blk4-picture-320×200×4-unpacked.png», so that there is no need to specify the color count and size by hand. The palette's color values also had to be normalized by dividing by 256 (line 3).
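The file-name convention can be illustrated in Python as well (the converter itself is written in Mathematica). This is a sketch under the assumption that the name always contains a "width×height×colors" group between hyphens; the function name `parse_picture_name` is hypothetical, and a plain "x" is accepted in place of "×" for convenience.

```python
import re


def parse_picture_name(file_name: str):
    """Extract (width, height, colors) from a name like the one in the text."""
    match = re.search(r"-(\d+)[×x](\d+)[×x](\d+)-", file_name)
    if match is None:
        raise ValueError("no width×height×colors group in the file name")
    width, height, colors = (int(group) for group in match.groups())
    return width, height, colors


print(parse_picture_name("cp_blk4-picture-320×200×4-unpacked.png"))  # (320, 200, 4)
```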


An example of the result is a picture from a resource file with all the game's medals in VGA format.

Remnants of luxury

What about other file types?

*.s – really did turn out to be scenario files, with all sorts of variable names. No surprise here: after all, these were the files I reversed the unpacking algorithm on.

*.bin – some of them are string libraries. But then again, I reversed the algorithm on them too. And in general the name “strings.bin” kind of hints at it, as does “strings2.bin”. But I couldn't figure out files like “AH‑MCGA.BIN”.

*.fnt – here hobbits showed that the fnt files are indeed fonts, in bitmap form. I didn't bother to dig into their format because I was too lazy, and it was obvious anyway:

Bitmap of file “4x6.fnt”


Bitmap of the file “PROP.fnt”


*.msk – really turned out to be bit masks, obviously for clipping the 3D image in the cockpit:

Cut out screens of masks from hobbits over the cockpit picture


*.w – I started messing around with them, and I'm 98% sure that these are level maps, but I didn't get them out.

The contents of the other types (*.drv, *.fmd, *.fme, *.sng) remained a mystery to me simply because they did not interest me.

Unexpected end

Well, and for dessert.

Some time after I had finished this whole epic, a rather trivial thought occurred to me.

Electronic Arts is a game company that is still well-known today, because it churns out tons of games. And it's unlikely that even in the 90s it wrote unique auxiliary tools for each of its games. Maybe my unpacker can unpack something else?

I went to old‑games.ru, sorted games released by EA by year and found a couple more flight simulators released around the same time (and which I had absolutely no idea about):

  • Stormovik: Su-25 (1990) (what an irony, it was with the Su-25 that I started to pick apart LHX)

  • Chuck Yeager's Air Combat (1991)

Some of EA's games from 1990-91.


I downloaded them, quickly ran through them and:

  1. Both had library files that unpacked fine with the library unpacker (github).

  2. Both have kept the same structure of model descriptions in the executable file.

  3. Su-25 had packed point files, and the point files from CYAC had a changed header format (u2 separators seem to have been added between the pointers, but I am not sure about this).

  4. The resource files unpacked successfully and the images converted successfully, except that the CYAC palette should be normalized by dividing by 64, not by 256.
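The difference between the two divisors is easy to show in Python. One plausible reason for the divisor of 64, stated here as an assumption, is that the CYAC palette stores 6-bit components (0 to 63), which is the range the VGA hardware palette uses. The function name `normalize_palette` is hypothetical.

```python
def normalize_palette(palette, divisor):
    """Scale integer color components down to floats below 1."""
    return [tuple(component / divisor for component in rgb) for rgb in palette]


# The same "almost full red" expressed in the two conventions:
print(normalize_palette([(255, 128, 0)], 256))  # [(0.99609375, 0.5, 0.0)]
print(normalize_palette([(63, 32, 0)], 64))     # [(0.984375, 0.5, 0.0)]
```

Dividing a 0 to 63 palette by 256 instead would leave every component below 0.25, so a picture that comes out almost black is a good hint that the wrong divisor was used.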

I stopped here because you have to know when to stop. But maybe someone will want to continue poking around in these old, old bytes and solving the riddles of years long gone.

I'll just insert a picture from the credits of Chuck Yeager's Air Combat – it was here that I first saw the name of the person who, most likely, so many years ago modeled those cute boxes called M113 and House6 (or at least drew all those heartfelt pictures with a screaming pilot for LHX) – Cynthia Hamilton.

Chuck Yeager's Air Combat credits screen with the name of the creator (I'm 86% sure) of all those models I've been digging for so long.


A quick Google search led me to this page.

And to this one:

And here is a passage from the description of the game Budokan: “The graphics team of Mike Kosaka, Nancy Fong, Mike Lubuguin, Cynthia Hamilton and Connie Braat (animations) also worked together on Kings of the Beach, Lakers vs Celtics and the NBA Playoffs, and Ski or Die.

Connie and Cynthia also worked on the graphics for LHX Attack Chopper and Stormovik: SU-25 Soviet Attack Fighter with Rick Tiberi doing the programming.” Highlights and links are mine.

End.

That same SECRET.PIC


All links in one place:

  1. Programs:

    1. Unpacker for .LIB files

    2. Unpacker for resource files of some EA games

    3. Converter of 3D models from EA's binary format to OBJ

  2. Models extracted from LHX:

    1. In EA's binary format

    2. In OBJ format with the most watchable settings

    3. Models in all variants in one archive
