Recognizing a Real Number from a String

In programming microcontrollers, it is often necessary to enter some calibration coefficients into the firmware in the UART console: values ​​of currents, threshold voltages, lighting levels, and so on. As a rule, these are real floating-point numbers.

Then you often need to analyze text logs from the SD card. It is necessary to grab real numbers from CSV lines.

It is also a parsing of the NMEA 0183 protocol, which is in all navigation receivers, in order to tritely read latitude and longitude.

All this requires some reliable, portable, transparent and simple algorithm to recognize real numbers from strings.

Formulation of the problem.

Given a string of characters. Recognize a real (real) number from a string of characters. If there is no number, then throw an error. In this case, the prototype should be

bool try_str2real(const char* const text, double* const double_value)

Here is the legal alphabet of characters in a string: ‘ ‘, ‘+’, ‘-‘, 0 1 2 3 4 5 6 7 8 9, ‘e’, ​​’E’ ‘.’ ‘\n’ ‘\r’

Here is a list of test cases:

No.

Input string

Recognized number

is it a real number?

1

“.0”

0.0

Yes

2

“+.”

No

3

“005047e+6”

5047000000

Yes

4

“-8115e957”

INF

Yes

5

“2e0”

2

Yes

6

“4e+”

No

7

“.”

No

The full list of test cases can be viewed here

https://docs.google.com/spreadsheets/d/1rbEw8p8CMHzM5DJUYGxDX-BCQ8sZsp4jDMRsbifJD9k/edit#gid=0

for some reason, LeetCode rate this puzzle hard. https://leetcode.com/problems/valid-number/description/

Although it seems to me that there is nothing complicated. In general, this is the rare case when real tasks from prod(a) flicker on LeetCode.

Solution

real number– this is essentially a number to represent continuous physical quantities (mass, voltage, lighting, etc.).

To begin with, let’s decide on the scheme of the syntactic work of a venal number. Here is a diagram for parsing a rational number

Looking at this diagram, 5 or 6 distinct states can be distinguished.

typedef enum{
	PARSE_NUMBER_STATE_PARSE_SIGN=1,
	PARSE_NUMBER_STATE_PARSE_INTEGER=2,
	PARSE_NUMBER_STATE_PARSE_FRACTION=3,
	PARSE_NUMBER_STATE_PARSE_EXPONENTA_SIGN=4,
	PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER=5,
	PARSE_NUMBER_STATE_DONE=6,

	PARSE_NUMBER_STATE_UNDEF=0
}ParseNumberStates_t;

Therefore, I will solve this problem using a finite automaton methodology. Here is the main function. It loops through the state machine to recognize a number from a string.

typedef struct {
    double value;
    double mantissa;
    double exponenta;
    char cur_letter;
    int8_t sign;
    uint64_t integer ;
    uint32_t fraction_order;
    double fraction;
    ParseNumberStates_t state;
    bool spot_mantissa;
    bool spot_exponent;
    int8_t exponent_sign;
    uint32_t exponent_integer;
}Text2NumberFsm_t;


bool try_str2real(const char* const text, double* const double_value){
    bool res = false;
    uint32_t len = 0;
    if(text){
        len = strlen(text);
        if(len){
            if(double_value){
                res = true;
            }
        }
    }

    if(res) {
        Text2NumberFsm_t Fsm;
        res = text_2_number_init(&Fsm);
        uint32_t i = 0;
        uint32_t ok = 0;
        uint32_t err = 0;
        LOG_DEBUG(STRING, "ParseDoubleIn:%u:[%s]", len, text);

        for(i=0; i<len; i++){
            Fsm.cur_letter = text[i];
            res = number_proc_one(&Fsm);
            if (res) {
                ok++;
            } else {
                err++;
                LOG_ERROR(STRING,"ProcErr: ch[%u]=%c ",i, text[i]);
                break;
            }
        }

        if(0==err) {
            Fsm.cur_letter="\n";
            res = number_proc_one(&Fsm);
            if (res) {
                ok++;
            } else {
                err++;
            }
        }
        if (0==err) {
            *double_value = Fsm.value;
            res = true;
        } else {
            LOG_ERROR(STRING,"len:%u ok:%u",len, ok);
            res = false;
        }
    }
    return res;
}

Here is the character handler. The algorithm changes depending on the current state of the state machine

static bool number_proc_one(Text2NumberFsm_t* Node){
    bool res = false;
    LOG_DEBUG(STRING, "Proc: %s",NumberParserFsm2Str(Node));
    if(Node) {
        switch (Node->state) {
            case PARSE_NUMBER_STATE_PARSE_SIGN:
                res = number_proc_parse_sign(Node); break;
            case PARSE_NUMBER_STATE_PARSE_INTEGER:
                res = number_proc_parse_integer(Node); break;
            case PARSE_NUMBER_STATE_PARSE_FRACTION:
                res = number_proc_parse_fracion(Node); break;
            case    PARSE_NUMBER_STATE_PARSE_EXPONENTA_SIGN:
                res = number_proc_parse_exponenta_sign(Node); break;
            case    PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER:
                res = number_proc_parse_exponenta_integer(Node); break;
            case PARSE_NUMBER_STATE_DONE:
                res = number_proc_done(Node); break;

        default: res = false; break;
        }

    }
    return res;
}

Here is the initialization of the state machine for parsing a real number.

static bool text_2_number_init(Text2NumberFsm_t* Node){
    bool res = false;
    if (Node) {
        Node->value = 0.0;
        Node->sign = 1;
        Node->integer = 0;
        Node->exponent_integer = 0;
        Node->fraction = 0.0;
        Node->fraction_order = 1;
        Node->spot_mantissa = false;
        Node->spot_exponent = false;
        Node->mantissa = 1.0;
        Node->exponent_sign = 1;
        Node->exponenta = 1.0;
        Node->cur_letter="\0";
        Node->state = PARSE_NUMBER_STATE_PARSE_SIGN;
        res = true;
    }
    return res;
}

And these are auxiliary functions handlers for each individual state so that all functions individually fit on one screen



bool  number_compose_result(Text2NumberFsm_t* Node){
    bool res = false;
    if(Node) {
    	if(Node->spot_mantissa) {
            Node->value = ( (double)Node->sign ) * (((double) Node->integer) + Node->fraction);
            LOG_DEBUG(STRING,"Val:%f",Node->value);
            Node->state = PARSE_NUMBER_STATE_DONE;
            res = true;
        } else {
            LOG_ERROR(STRING,"NoMantissa");
    	}
    }
    return res;
}

static bool number_proc_parse_sign(Text2NumberFsm_t* Node){
    bool res = false;
    switch(Node->cur_letter) {
    case 'E':
    case 'e': {
        Node->mantissa = 1.0;
        Node->spot_mantissa = false;
        Node->integer = 1;
        Node->fraction = 0.0;
        Node->state = PARSE_NUMBER_STATE_PARSE_EXPONENTA_SIGN;
        res = true;

    }break;

    case ' ': {
        res = true;
    }break;
    case '+': {
        Node->sign = 1;
        Node->state = PARSE_NUMBER_STATE_PARSE_INTEGER;
        res = true;
    }break;
    case '-': {
        Node->sign =-1;
        Node->state = PARSE_NUMBER_STATE_PARSE_INTEGER;
        res = true;
    }break;
    case '0':  {
        Node->sign =1;
        Node->spot_mantissa=true;
        Node->state = PARSE_NUMBER_STATE_PARSE_INTEGER;
        res = true;
    }break;
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
    case '8':
    case '9': {
        uint8_t digit = 0;
        Node->spot_mantissa = true;
        res= try_hex_char_to_u8((uint8_t) Node->cur_letter, &digit) ;
        Node->integer *=10;
        Node->integer += digit;

        Node->state = PARSE_NUMBER_STATE_PARSE_INTEGER;
    }break;
    case '.': {
    	Node->fraction_order = 1;
        Node->integer = 0;
        Node->spot_mantissa = false;
        Node->state = PARSE_NUMBER_STATE_PARSE_FRACTION;
        res = true;
    }break;
    case '\n':
        res = false;
        break;
    case '\r':
        res = false;
        break;
    default:
        res = false;
        break;
    }
    return res;
}

static bool number_mantissa_save(Text2NumberFsm_t* Node){
    bool res = false;
    if(Node) {
        Node->mantissa = ( (double)Node->sign   ) * (((double) Node->integer) + Node->fraction);
        LOG_DEBUG(STRING,"Mantissa:%f",Node->mantissa);
        res = true;
    }
    return res;
}

static bool number_proc_parse_integer(Text2NumberFsm_t* Node){
    bool res = false;
    switch(Node->cur_letter) {
    case 'E':
    case 'e': {
        res = number_mantissa_save(Node);
        Node->state =PARSE_NUMBER_STATE_PARSE_EXPONENTA_SIGN;
    }break;
    case ' ': {
        res = false;
    }break;
    case '.':{
    	Node->fraction_order=1;
        Node->state = PARSE_NUMBER_STATE_PARSE_FRACTION;
        res = true;
    } break;
    case '0':
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
    case '8':
    case '9': {
        uint8_t digit = 0;
        res= try_hex_char_to_u8((uint8_t) Node->cur_letter, &digit) ;
        LOG_PARN(STRING,"ParseInt %u", digit);
        Node->integer *= 10;
        Node->integer += digit;
        Node->spot_mantissa = true;
        res = true;
    }break;

    case '\r':
    case '\n':{
        res = number_compose_result(Node);
    } break;


    default:{
        res = false;
    } break;
    }
    return res;
}


static bool number_proc_parse_exponenta_sign(Text2NumberFsm_t* Node){
    bool res = false;
    switch(Node->cur_letter) {
    case '+': {
        Node->exponent_sign = 1;
        Node->state = PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER;
        res = true;
    }break;

    case '-': {
        Node->exponent_sign = -1;
        res = true;
        Node->state= PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER;
    }        break;

    case '0':{
    	Node->exponent_integer = 0;
    	Node->exponent_sign = 1;
    	Node->spot_exponent = true;
    	Node->state = PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER;
    	res = true;
    }break;
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
    case '8':
    case '9':{
        uint8_t digit = 0;
        res= try_hex_char_to_u8((uint8_t) Node->cur_letter, &digit) ;
        LOG_PARN(STRING,"ParseExpInt %u",digit);
        Node->exponent_integer *=10;
        Node->spot_exponent = true;
        Node->exponent_integer += digit;
        Node->state= PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER;
        res = true;
    }break;

    case '\r':
    case '\n':{res = false;} break;
    default: res = false; break;
    }
    return res;
}

static bool number_exponenta_calc(Text2NumberFsm_t* Node){
    bool res = false;
    if(Node) {
    	if(Node->spot_exponent ){
            int32_t rank = ((int32_t)Node->exponent_integer) * ((int32_t)Node->exponent_sign);
            LOG_DEBUG(STRING,"ExpRank %d",rank);
            double power = (double )rank;
            LOG_DEBUG(STRING,"power %f",power);
            if(rank < 306) {
                if(-306 < rank) {
                    Node->exponenta = pow(10.0,power);
                    LOG_DEBUG(STRING,"Exponenta: %f",Node->exponenta);
                    res = true;
                } else {
                	Node->exponenta = 0.0;
                    LOG_ERROR(STRING,"MathError:TooSmalExpPower %d",rank);
                    res = false;
                }
            } else {
                LOG_ERROR(STRING,"MathError:TooBigExpPower %d",rank);
                res = false;
            }
    	} else {
    		LOG_ERROR(STRING,"NoExponents");
    	}
    }
    return res ;
}

static bool number_proc_parse_exponenta_integer(Text2NumberFsm_t* Node){
    bool res = false;
    switch(Node->cur_letter) {
    case '0':
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
    case '8':
    case '9': {
        uint8_t digit = 0;
        res= try_hex_char_to_u8((uint8_t) Node->cur_letter, &digit) ;
        LOG_PARN(STRING,"ParseExpInt %u",digit);
        Node->exponent_integer *=10;
        Node->exponent_integer += digit;
        Node->spot_exponent = true;
        Node->state= PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER;
        res = true;
    }break;

    case '\r':
    case '\n': {
        res = number_exponenta_calc(Node);
        if (res) {
        	if(Node->spot_mantissa) {
                double mantissa = 0.0;
                mantissa =((double)Node->sign)*(((double)Node->integer)+((double)Node->fraction));
                LOG_DEBUG(STRING,"mantissa %f", mantissa);
                Node->value = mantissa*(Node->exponenta);
                Node->state= PARSE_NUMBER_STATE_DONE;
        	}else {
        		LOG_DEBUG(STRING,"NoMantissa");
        		res = false;
        	}
        }
    }break;
    default:
        res = false;
        break;
    }
    return res;
}

bool number_compose_mantissa(Text2NumberFsm_t* Node){
    bool res = false;
    if(Node) {
    	Node->mantissa =((double)Node->sign)*(((double)Node->integer)+((double)Node->fraction));
    	res = true;
    }
    return res;
}

static bool  number_proc_parse_fracion(Text2NumberFsm_t* Node){
    bool res = false;
    switch(Node->cur_letter){
    case 'E':
    case 'e':{
    	res = number_compose_mantissa(Node);
    	Node->state = PARSE_NUMBER_STATE_PARSE_EXPONENTA_SIGN;
    }break;
    case ' ': {
    	LOG_ERROR(STRING,"TornFlowErr!");
    	res = false;
    }break;
    case '.': {
    	LOG_ERROR(STRING,"DoubleDotErr!");
    	res = false;}break;
    case '0':
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
    case '8':
    case '9': {
        uint8_t digit = 0;
        res = try_hex_char_to_u8((uint8_t) Node->cur_letter, &digit) ;
        double fraction_digit =  (    (double)digit  ) / (   pow(10.0, (double) Node->fraction_order) );
        LOG_DEBUG(STRING,"+ %f",fraction_digit);
        Node->spot_mantissa= true;
        Node->fraction += fraction_digit;

        Node->fraction_order++;
        res = true;
    }break;
    case '-':
    case '+': {res = false;}break;

    case '\r':
    case '\n':{
        res = number_compose_result(Node);
    } break;

    }
    return res;
}

static bool number_proc_done(Text2NumberFsm_t* Node){
    bool res = false;
    LOG_DEBUG(STRING, "Proc: %s",NumberParserFsm2Str(Node));
    switch(Node->cur_letter){
        case '\n':
        case '\r':{
            res = true;
        } break;
        default:{
            res = false;
        }break;
    }
    return res;
}

This solution was verified on the LeetCode website and turned out to be the fastest solution in the C programming language!

If you know the test data for the problem of recognizing a real number from a string, then write them in the comments.

Conclusion

State machines are ideal for parsing various data types from strings, in particular double. If you have learned to recognize the double type, then you can immediately say that you have learned to recognize uint8_t int8_t uint64_t, uint64_t, float with the same code. One double parser is universal!

Dictionary

Acronym

Decryption

DFA

Deterministic finite automaton

csv

Comma-separated values

NMEA

National Marine Electronics Association

Links

https://leetcode.com/problems/valid-number/description/

https://docs.google.com/spreadsheets/d/1rbEw8p8CMHzM5DJUYGxDX-BCQ8sZsp4jDMRsbifJD9k/edit#gid=0

Questions

–Do you know examples of real practical problems on LeetCode, the solution of which would be useful in real embedded projects?

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *