double d = strtod("3ex",&end);
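initializes d with 3.0 and places the end pointer at the 'e' character in the input string. This is exactly the behavior I expect. The 'e' character might look like the start of an exponent part, but since the actual exponent value (required by 6.4.4.2) is missing, that 'e' should be treated as a completely independent character.
However, when I do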
double d; char c; sscanf("3ex","%lf%c",&d,&c);
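I notice that sscanf consumes both '3' and 'e' for the %lf format specifier. Variable d receives the value 3.0. Variable c ends up holding 'x'. This looks strange to me for two reasons.
Firstly, since the language specification refers to strtod when describing the behavior of the %f format specifier, I intuitively expected %lf to treat the input the same way strtod does (i.e. to choose the same position as the termination point). However, I know that historically scanf was supposed to push no more than one character back into the input stream. That limits any look-ahead scanf can perform to one character, and the example above requires at least two characters of look-ahead. So, let's say I accept that %lf consumed both '3' and 'e' from the input stream.
But then we run into the second issue. Now sscanf has to convert "3e" to type double. "3e" is not a valid representation of a floating-point constant (again, according to 6.4.4.2 the exponent value is not optional). I would expect sscanf to treat this input as erroneous: terminate during the %lf conversion, return 0 and leave d and c unchanged. However, the above sscanf completes successfully (returning 2).
This behavior is consistent between the GCC and MSVC implementations of the standard library.
So, my question is: where exactly in the C language standard does it allow sscanf to behave as described above, with respect to the two points above: consuming more than strtod does, and successfully converting sequences such as "3e"?
Looking at my experimental results, I can probably "reverse engineer" sscanf's behavior: consume as much as "looks right", never stepping back, and then pass the consumed sequence to strtod. That way the 'e' gets consumed by %lf and is then simply ignored by strtod. But is any of that actually in the language specification?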
Solution
The strtod(), strtof(), and strtold() functions convert the initial
portion of the string pointed to by nptr to double, float, and long
double representation, respectively.

The expected form of the (initial portion of the) string is optional
leading white space as recognized by isspace(3), an optional plus
('+') or minus sign ('-') and then either (i) a decimal number, or
(ii) a hexadecimal number, or (iii) an infinity, or (iv) a NAN
(not-a-number).

A decimal number consists of a nonempty sequence of decimal digits
possibly containing a radix character (decimal point,
locale-dependent, usually '.'), optionally followed by a decimal
exponent. A decimal exponent consists of an 'E' or 'e', followed by an
optional plus or minus sign, followed by a nonempty sequence of
decimal digits, and indicates multiplication by a power of 10.

A hexadecimal number consists of a "0x" or "0X" followed by a nonempty
sequence of hexadecimal digits possibly containing a radix character,
optionally followed by a binary exponent. A binary exponent consists
of a 'P' or 'p', followed by an optional plus or minus sign, followed
by a nonempty sequence of decimal digits, and indicates multiplication
by a power of 2. At least one of radix character and binary exponent
must be present.

An infinity is either "INF" or "INFINITY", disregarding case.

A NAN is "NAN" (disregarding case) optionally followed by '(', a
sequence of characters, followed by ')'. The character string
specifies in an implementation-dependent way the type of NAN.
Then I ran an experiment: I compiled and executed the following code with gcc
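#include <stdlib.h>
#include <stdio.h>

char head[1024], *tail;

/* Convert stmt with strtod and report how many characters it accepted. */
void core(const char *stmt) {
    snprintf(head, sizeof head, "%s", stmt);
    double d = strtod(head, &tail);
    /* tail - head is a ptrdiff_t, so it is printed with %td */
    printf("cover %s to %.2f with length=%td.\n", head, d, tail - head);
}

int main() {
    core("3.0x");
    core("3e");
    core("3ex");
    core("3e0x");
    return 0;
}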
and got the result
cover 3.0x to 3.00 with length=3.
cover 3e to 3.00 with length=1.
cover 3ex to 3.00 with length=1.
cover 3e0x to 3.00 with length=3.
So it seems that strtod only accepts the 'e' as part of the number when digits follow it.
For sscanf, I ran another experiment with gcc:
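#include <stdlib.h>
#include <stdio.h>

char head[1024];

/* Scan a hexadecimal number followed by a string and print both. */
void core(const char *stmt) {
    unsigned int i = 0;   /* %x requires a pointer to unsigned int */
    head[0] = '\0';       /* stays empty if %s matches nothing */
    sscanf(stmt, "%x%1023s", &i, head);
    printf("sscanf %s catch %u with '%s'.\n", stmt, i, head);
}

int main() {
    core("0");
    core("0x0g");
    core("0x1g");
    core("0xg");
    return 0;
}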
and got the following output:
sscanf 0 catch 0 with ''.
sscanf 0x0g catch 0 with 'g'.
sscanf 0x1g catch 1 with 'g'.
sscanf 0xg catch 0 with 'g'.
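It seems that sscanf keeps taking characters as long as the input read so far is still judged legal (even if it is only an incomplete, and therefore invalid, prefix), and it does not roll back afterwards.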