How do I make Nokogiri transparently return unencoded HTML entities?


How can I use Nokogiri so that HTML entities (such as German umlauts) come through untouched?

That is:

# this is fine
node = Nokogiri::HTML.fragment('

I tried fiddling with PARSE_OPTIONS and the :save_with option, but could not find a way to make Nokogiri behave transparently like the example above.

Any pointers?

Best answer
OK, my question was answered by Aaron via Twitter/gist:

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::HTML::Document.new
doc.encoding = 'UTF-8'

# We added a contextual fragment method for the 1.4.2 release. This *might*
# work in 1.4.1. If you want to mess with 1.4.2, build from my github, or
# grab one of our nightly builds:
#
#   $ sudo gem install nokogiri -s http://tenderlovemaking.com/
#
# Also, libxml2 had a bug with encoding when handling UTF-8 fragments, so I
# suggest you also upgrade to libxml2 2.7.7.
#
# Hope that helps!
puts doc.fragment('