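I realize that percent-encoding is probably the safer choice (for old and IRI-unaware software), but I'm looking for a definitive answer about what the standards say.
So far I've run some tests with the W3C validator, and unescaped Unicode characters in URIs don't trigger any warnings or errors for the HTML 4/5 and XHTML 4/5 doctypes (though of course the absence of an error message doesn't mean there is no error).
Chrome, at least, also accepts raw UTF-8 IRIs, but it escapes them before sending the HTTP request. In addition, my web server (lighttpd) seems to accept UTF-8 characters in HTTP requests in both percent-encoded and unencoded form.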
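To illustrate the escaping step described above, here is a minimal Python sketch of turning an IRI into a percent-encoded URI. The helper name `iri_to_uri` is hypothetical, and the sketch assumes the host is already ASCII (it does no IDNA conversion); it is not a full RFC 3987 mapping and not necessarily what any particular browser does.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: RFC 3986 sub-delims, a few path/query
# delimiters, and "%" so that existing escapes are not encoded twice.
_SAFE = "/%:@!$&'()*+,;=~"


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters (as UTF-8) in path, query and fragment.

    Hypothetical helper for illustration; assumes an ASCII host name.
    """
    parts = urlsplit(iri)
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(iri_to_uri("http://foo.org/Håkon"))
```

The character `å` (U+00E5) is two bytes in UTF-8, `C3 A5`, so it becomes `%C3%A5` in the escaped form.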
Answer
… the following href value is illegal:
<A href="http://foo.org/Håkon">...</A>
HTML5 is different. It says IRIs are valid provided they comply with some additional conditions.
A URL is a valid URL if at least one of the following conditions
holds:
The URL is a valid URI reference [RFC3986].
The URL is a valid IRI reference and it has no query component. [RFC3987]
The URL is a valid IRI reference and its query component contains no unescaped non-ASCII characters. [RFC3987]
The URL is a valid IRI reference and the character encoding of the URL’s Document is UTF-8 or a UTF-16 encoding. [RFC3987]
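The four conditions quoted above can be approximated in a few lines of Python. This is only a rough sketch: the function name `html5_conditions_met` is hypothetical, the input is assumed to already be a syntactically valid IRI reference (the RFC 3986/3987 grammar is not checked), and "valid URI reference" is approximated as "contains only ASCII characters".

```python
from urllib.parse import urlsplit


def html5_conditions_met(url: str, document_encoding: str) -> bool:
    """Rough check of the quoted HTML5 conditions for an already valid IRI reference."""
    # Condition 1 (approximation): an ASCII-only reference is treated as a URI reference.
    if url.isascii():
        return True
    query = urlsplit(url).query
    # Condition 2: an IRI reference with no query component.
    if not query:
        return True
    # Condition 3: the query contains no unescaped non-ASCII characters.
    if query.isascii():
        return True
    # Condition 4: the document is encoded as UTF-8 or a UTF-16 encoding.
    encoding = document_encoding.lower()
    return encoding == "utf-8" or encoding.startswith("utf-16")


# Non-ASCII only in the path: allowed whatever the document encoding.
print(html5_conditions_met("http://foo.org/Håkon", "iso-8859-1"))
# Non-ASCII in the query: depends on the document encoding.
print(html5_conditions_met("http://foo.org/?q=Håkon", "iso-8859-1"))
print(html5_conditions_met("http://foo.org/?q=Håkon", "UTF-8"))
```

Under these assumptions, only a raw non-ASCII character inside the query string makes the result depend on the document's encoding.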
XHTML 1.x follows the same rules as HTML 4.01.
XHTML5 is the same as HTML5.