The floating point that rules the world

There is no end to the memes and jokes about how in programming 0.1 + 0.2 is not 0.3, but 0.30000000000000004. Everyone is used to such quirks stemming from the IEEE 754 standard. But how did we arrive at it, what are the FPU units that handle floating point, and how did ARM processors manage without them until recently? And where did the concept of floating point come from in mathematics? Let's try to figure it all out, and at the same time try it out in practice in code.
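
A quick check in any Python console shows the effect the whole article revolves around:

# The nearest double to 0.1 plus the nearest double to 0.2
# is not the nearest double to 0.3, so the comparison fails.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False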

Where it all began

Imagine a mathematician at the beginning of the 20th century trying to grasp the ungraspable: he has to deal with numbers of truly cosmic proportions and at the same time with quantities so small that they elude human perception. How to grasp these extremes, to subject them to the strict laws of mathematics? This is the challenge that numbers threw down to scientists and engineers. And they accepted it, putting forward a revolutionary idea – floating-point numbers.

In 1914, Leonardo Torres y Quevedo, a Spanish engineer and inventor, proposed using scientific notation to represent numbers in calculating machines. It was a simple but brilliant idea. Think of a number as a combination of two parts: the mantissa and the exponent. The mantissa is responsible for precision, and the exponent for scale. By moving the point left or right, we can change the scale of a number while preserving its precision. It's like a magic magnifying glass that lets us see the smallest details or take in the entire universe at a glance.
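
This split is easy to see in code. Python's math.frexp performs the base-2 analogue of scientific notation, pulling any float apart into a significand and a power of two; a small sketch:

import math

# math.frexp(x) returns (m, e) such that x == m * 2**e and 0.5 <= |m| < 1
for x in (0.15625, 263.3, 6.02e23):
    m, e = math.frexp(x)
    print(f"{x} = {m} * 2**{e}")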

But the path from idea to implementation was long and thorny. It was only in the 1950s, as electronic computers spread, that floating point began to turn from an abstract idea into a practical tool. William Kahan, a professor of mathematics and one of the pioneers of computer science, later took on the problems of accuracy and consistency in calculations. His work in the 1970s became the foundation for the future IEEE 754 standard, which remains the “constitution” of the floating-point world to this day.

From Theory to Practice: First Steps

The first computers were like lumbering giants. They operated with whole numbers and had no concept of fractions or exponents. Programming for them was like trying to communicate with a foreigner without knowing his language. But as the needs of the scientific world grew, so did the desire to teach computers to speak the language of mathematics.

In the mid-1950s, a quiet revolution occurred when IBM introduced the IBM 704, the first mass-produced computer with hardware support for floating-point arithmetic. It was like giving a child his first foreign-language textbook: the computer began to learn the basics of the new language spoken by the universe.

But there were still many obstacles on the path to a universal language of numbers. Each computer system had its own peculiarities, its own floating-point “dialects.” Programmers, moving from one computer model to another, felt like travelers in a foreign country trying to understand local customs. What was needed was a single standard, a common language that all the world's computers could speak.

IBM 704, 1957

As computers became more powerful, so did the need for dedicated floating-point units. The first implementations were purely software-based, and it was like trying to build a skyscraper out of sand: the process was slow, and precision was easily lost along the way.

Hardware implementation: the birth of the FPU

But in the 1960s, the first hardware FPUs appeared: specialized units optimized for floating-point operations. It was as if the builders had been given bricks and concrete instead of sand: the process sped up several times over, and the structures became stronger and more reliable. FPUs turned into a kind of “mathematical coprocessor” that offloaded complex calculations from the main processor.

The 1970s were the golden age of floating-point hardware. Chip companies began creating specialized math coprocessors. These chips were like miniature math geniuses, capable of executing the most complex calculations at lightning speed.

The Intel 8087 coprocessor, released in 1980, became a true legend in the computer world. It transformed personal computers from simple word processors into powerful tools for scientific calculations and engineering design. With it, an ordinary PC could compete in computing power with large mainframes. The average person suddenly had a personal mathematical genius, ready to solve the most difficult problem at any moment.

Intel 8087

IEEE Standard 754: Universal Number Language

In 1985, something happened that changed the world of computing forever: the IEEE 754 standard was born. Whereas each computer manufacturer had previously spoken its own floating-point dialect, now everyone had a common language. The standard clearly defined the rules for representing numbers, performing operations, and handling exceptions. It was like the emergence of Esperanto in a world of multilingual computers.

IEEE 754 introduced several number formats that differed in precision and range. Single precision (32 bits) became the universal choice for most tasks. Double precision (64 bits) was intended for situations that required increased precision and a wide range. And extended precision (80 bits) was used for special cases when every bit counted.
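
These formats are easy to compare directly from code. Assuming NumPy is available, np.finfo reports the key parameters of single and double precision (the 80-bit extended format appears as np.longdouble only on some platforms, so it is left out of this sketch):

import numpy as np

# Machine parameters of IEEE 754 single and double precision
for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    print(dtype.__name__,
          "| total bits:", info.bits,
          "| significand bits:", info.nmant + 1,  # counting the implicit leading 1
          "| max value:", info.max)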

The standard defined the structure of a floating-point number: sign, exponent, mantissa. A universal template by which any number could be represented in binary form. Special values also appeared: positive and negative infinity, as well as NaN (Not a Number). They allowed computers to respond correctly to situations that previously led to errors and crashes.
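
These special values are available directly from Python, and a short sketch shows the behaviour the standard prescribes for them:

import math
import struct

inf = float("inf")
nan = float("nan")

print(1e308 * 10)       # overflow yields inf instead of crashing the program
print(inf - inf)        # undefined operations yield nan
print(nan == nan)       # False: NaN compares unequal even to itself
print(math.isnan(nan))  # True: the proper way to test for NaN

# In the bit pattern, an all-ones exponent field marks infinities and NaNs
print(struct.pack(">f", inf).hex())  # 7f800000
print(struct.pack(">f", nan).hex())  # typically 7fc00000 (NaN payloads may vary)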

Modern applications: from scientific calculations to neural networks

Floating-point numbers are the ubiquitous hard workers of the digital world. They do their work unnoticed in the depths of processors, graphics accelerators, and microcontrollers. Without them, modern scientific research, engineering design, and the creation of realistic graphics and special effects in movies and games are unthinkable.

But perhaps the most exciting application of floating point today is deep learning and artificial neural networks. Think of a neural network as a huge mathematical model of the brain, where every weight and activation is a floating-point number. During training, these numbers change constantly, adapting to the input data. Numerical precision is critical here: accumulated rounding errors can keep the network from converging or lead it to produce incorrect results.

Choosing the right floating-point format for neural networks is an art form. Single precision (FP32) has long been the standard, providing a good balance between precision and performance. But as neural networks and data volumes grew, so did the need to speed up computation. That’s where half precision (FP16) came into the picture, allowing twice as many numbers to fit into memory and speeding up processing on specialized accelerators. Some researchers go even further, experimenting with formats like bfloat16 or even int8, which sacrifice precision for speed.
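
The trade-off is easy to demonstrate. Assuming NumPy is installed, its float16 type shows how quickly half precision runs out of significand bits (bfloat16 is not part of core NumPy, so this sketch sticks to FP16):

import numpy as np

# float16 has an 11-bit significand, so at 2048 the gap between
# neighbouring representable numbers is already 2: adding 1 gets lost.
print(np.float16(2048) + np.float16(1))   # 2048.0

# float32 handles the same update without trouble
print(np.float32(2048) + np.float32(1))   # 2049.0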

ARM and Floating Point

The history of the relationship between ARM processors and floating point is a complicated romance. ARM was originally designed as a power-efficient architecture for embedded systems and mobile devices. In such applications, floating point performance was not a priority – like a race car for a trip to the grocery store. So the first ARM processors did without an integrated FPU, performing floating point operations in software, slowly and not always accurately.

But times were changing, and ARM's ambitions were growing. In 1995, the first ARM-compatible FPU coprocessor was introduced as an optional attachment to the processor. It was like a first date after a long friendship – exciting, but a little awkward. It took almost another decade for the relationship to move to the next level: with the ARMv7 architecture, announced in 2005, the Vector Floating Point (VFP) unit, supporting single and double precision, became an almost ubiquitous part of ARM processors and cemented itself as a standard.

Today, ARM processors are used not only in smartphones and tablets, but also in laptops, servers, and even supercomputers. Giants such as Amazon, Microsoft, and Google build systems for high-performance computing and machine learning on ARM processors. And they are helped in this by advanced floating-point support, which combines high performance and energy efficiency.

ARM1 on a motherboard: the first commercial processor based on the RISC architecture, released on April 26, 1985

Number Conversion: The Magic Under the Hood

To the average user, floating-point numbers look completely mundane. Here is the number pi, 3.14: what could be simpler? But under the hood of the computer, real conversion magic takes place. Think of this process as the work of an invisible interpreter that translates numbers from human notation into binary code.

First, the sign of the number is determined: plus or minus. Then the integer and fractional parts are converted to the binary number system. The next step is to normalize the number: the binary point is shifted so that only a single 1 remains to the left of it. This is similar to how we highlight the main idea in a sentence, pushing less important details aside. The number takes the form 1.mantissa × 2^exponent. Let me remind you that the mantissa is responsible for precision, and the exponent for scale.

Finally, all the parts are packed into a single structure according to the IEEE 754 format. The sign, the exponent (stored with a bias of 127 in single precision), and the mantissa occupy strictly assigned bit fields, forming the binary representation of the original number.
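
A classic worked example: 0.15625 in binary is 0.00101, which normalizes to 1.01 × 2^(-3). The sign bit is 0, the biased exponent is -3 + 127 = 124 = 01111100, and the mantissa is 0100 followed by zeros. Python's standard struct module confirms the resulting bit pattern:

import struct

# Pack 0.15625 as a big-endian IEEE 754 single and inspect its bits
bits = struct.unpack(">I", struct.pack(">f", 0.15625))[0]
print(f"{bits:032b}")  # 00111110001000000000000000000000: sign 0, exponent 01111100, mantissa 0100...0
print(hex(bits))       # 0x3e200000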

In university computer architecture courses, students usually carry out all these operations by hand on paper, but we can simplify this step and run an experiment by implementing the IEEE 754 conversion in Python:

def float_to_ieee754(number, precision=30):
    # Nested function to convert a decimal number to a binary string
    def convert_to_binary(decimal_number, places):
        # Separate the integer and fractional parts of the number
        whole, dec = str(decimal_number).split(".")
        whole = int(whole)
        # Convert the integer part to binary and append the binary point
        result = (str(bin(whole)) + ".").replace('0b', '')
        # Convert the fractional part bit by bit
        for _ in range(places):
            dec = str('0.') + str(dec)
            temp = '%1.20f' % (float(dec) * 2)
            whole, dec = temp.split(".")
            result += whole
        return result

    # Determine the sign of the number (0 for positive, 1 for negative)
    sign = 0
    if number < 0:
        sign = 1
        number = number * (-1)

    # Get the binary representation of the number
    binary_representation = convert_to_binary(number, places=precision)

    # Find the positions of the binary point and the first 1 in the binary string
    dot_place = binary_representation.find('.')
    one_place = binary_representation.find('1')

    # Remove the point and adjust the positions if the first 1 is to the right of the point
    if one_place > dot_place:
        binary_representation = binary_representation.replace(".", "")
        one_place -= 1
        dot_place -= 1
    # Remove the point and adjust the positions if the first 1 is to the left of the point
    elif one_place < dot_place:
        binary_representation = binary_representation.replace(".", "")
        dot_place -= 1

    # Form the mantissa from the bits after the first 1 (the implicit leading 1 is dropped)
    mantissa = binary_representation[one_place + 1:]

    # Calculate the exponent and convert it to biased form
    exponent = dot_place - one_place
    exponent_bits = exponent + 127

    # Convert the biased exponent to an 8-bit binary string
    exponent_bits = bin(exponent_bits).replace("0b", '').zfill(8)

    # Limit the mantissa to 23 bits, padding with zeros if it is shorter
    mantissa = mantissa[0:23].ljust(23, '0')

    # Assemble the final 32-bit IEEE 754 representation
    ieee754_binary = str(sign) + exponent_bits + mantissa

    # Convert the binary representation to hexadecimal
    hex_representation = '0x%0*X' % ((len(ieee754_binary) + 3) // 4, int(ieee754_binary, 2))

    return (hex_representation, ieee754_binary)


if __name__ == "__main__":
    # Example with a positive number
    print(float_to_ieee754(263.3))
    # Example with a negative number
    print(float_to_ieee754(-263.3))
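
As a sanity check, the result can be compared with Python's own struct module, which produces the genuine IEEE 754 single-precision bit pattern (run this in the same file or session as the function above). Keep in mind that the sketch truncates the mantissa rather than rounding to nearest, so the last bit can occasionally differ from the reference:

import struct

def ieee754_reference(number):
    # Bit pattern produced by the platform's own float32 conversion
    return '0x%08X' % struct.unpack(">I", struct.pack(">f", number))[0]

# Compare the hand-rolled conversion with the reference
print(float_to_ieee754(263.3)[0], ieee754_reference(263.3))
print(float_to_ieee754(-263.3)[0], ieee754_reference(-263.3))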

Conclusion

Floating-point numbers are the invisible heroes of our digital age, though sometimes, due to our lack of understanding, they seem like evil villains and agents of chaos. Few people think about how a computer represents and processes these numbers. But understanding the principles of floating-point is the key to creating efficient, accurate, and reliable programs. It is like knowing anatomy for an artist or music theory for a composer: the deeper the understanding, the more perfect the creation.

The history of floating-point numbers is the history of human ingenuity, the quest for precision, and the desire to embrace the ungraspable. From the first computers to modern supercomputers and neural networks, floating point has come a long way, becoming the universal language of science and technology. And who knows what other secrets and possibilities this elegant mathematical abstraction hides? Perhaps new formats and architectures will allow us to solve the problems of the future more effectively – from modeling complex systems to creating artificial intelligence that surpasses the human mind. And maybe one day we will learn to do without floating point at all, using fundamentally new approaches to computing.

However, these are all dreams about what may or may not happen someday. What about the present and the past? It would be interesting to read in the comments about the surprises you have encountered while working with floating-point numbers.
