Examples of processing XML and HTML with the xmllint command (js Command-line JSON)

curl "http://www.111cn.net/ip/?q=" 2>/dev/null | xmllint --html --xpath "//ul[@id='csstb']" - 2>/dev/null | sed -e 's/<[^>]*>//g'
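The same pipeline can be tried offline. The following is a minimal sketch, assuming a hypothetical local file page.html in place of the remote response; the list contents are made up for illustration.

```shell
# Hypothetical local page standing in for the remote response.
tmp=$(mktemp -d)
cat > "$tmp/page.html" <<'EOF'
<html><body>
<ul id="csstb"><li>IP: 192.0.2.1 </li><li>Location: Example City</li></ul>
</body></html>
EOF

# --html parses the input as HTML, --xpath prints the matching node,
# "-" reads the document from stdin, and sed strips the remaining tags.
text=$(cat "$tmp/page.html" \
  | xmllint --html --xpath "//ul[@id='csstb']" - 2>/dev/null \
  | sed -e 's/<[^>]*>//g')
echo "$text"
```

Only the text inside the selected list survives; everything outside the ul element is dropped by the XPath step.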

1. --format

#xmllint --format person.xml
<?xml version="1.0"?>
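
A minimal sketch of what --format does, assuming a hypothetical one-line person.xml (the element values are invented for illustration):

```shell
tmp=$(mktemp -d)
# Hypothetical input: the whole document on a single line.
printf '%s\n' '<?xml version="1.0"?><person><name>ball</name><age>30</age><sex>male</sex></person>' > "$tmp/person.xml"

# --format re-indents the document and prints it to stdout;
# the input file itself is left unchanged.
formatted=$(xmllint --format "$tmp/person.xml")
echo "$formatted"
```

Each child element ends up on its own indented line.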

2. --noblanks

<?xml version="1.0"?>

#xmllint --noblanks person.xml
<?xml version="1.0"?>
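
A minimal sketch of the opposite direction, assuming a hypothetical indented person.xml: --noblanks drops the ignorable whitespace between elements when the file is parsed.

```shell
tmp=$(mktemp -d)
# Hypothetical indented input; the values are invented for illustration.
cat > "$tmp/person.xml" <<'EOF'
<?xml version="1.0"?>
<person>
  <name>ball</name>
  <age>30</age>
  <sex>male</sex>
</person>
EOF

# --noblanks removes whitespace-only text nodes between elements,
# so the elements are printed back to back.
compact=$(xmllint --noblanks "$tmp/person.xml")
echo "$compact"
```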

3. --schema

Use a schema to validate an XML file (XML Schema is an XML-based alternative to DTD).

<?xml version="1.0"?>

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="age" type="xs:integer"/>
  <xs:element name="sex">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="male"/>
        <xs:enumeration value="female"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="age"/>
        <xs:element ref="sex"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

#xmllint --schema person.xsd person.xml
<?xml version="1.0"?>
person.xml validates
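Note: by default the content of the validated file is printed after validation. Use the --noout option to suppress it, so that only the final validation result is shown.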

#xmllint --noout --schema person.xsd person.xml
person.xml validates

#xmllint --noout --schema person.xsd person.xml
person.xml:4: element age: Schemas validity error : Element 'age': 'not age' is not a valid value of the atomic type 'xs:integer'.
person.xml:5: element sex: Schemas validity error : Element 'sex': [facet 'enumeration'] The value 'test' is not an element of the set {'male','female'}.
person.xml:5: element sex: Schemas validity error : Element 'sex': 'test' is not a valid value of the local atomic type.
person.xml fails to validate
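
The two runs above can be reproduced end to end. This is a sketch under stated assumptions: the schema is a complete version of the person schema, good.xml is a hypothetical valid document with invented values, and bad.xml reuses the invalid values ('not age' and 'test') quoted in the error messages.

```shell
tmp=$(mktemp -d)
cat > "$tmp/person.xsd" <<'EOF'
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="age" type="xs:integer"/>
  <xs:element name="sex">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="male"/>
        <xs:enumeration value="female"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="age"/>
        <xs:element ref="sex"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
EOF

# Hypothetical valid document.
cat > "$tmp/good.xml" <<'EOF'
<?xml version="1.0"?>
<person>
  <name>ball</name>
  <age>30</age>
  <sex>male</sex>
</person>
EOF

# Derive an invalid document: a non-integer age and a sex outside the enumeration.
sed -e 's/30/not age/' -e 's/>male</>test</' "$tmp/good.xml" > "$tmp/bad.xml"

# The exit status is 0 when the file validates and non-zero when it does not.
xmllint --noout --schema "$tmp/person.xsd" "$tmp/good.xml" 2>/dev/null
good_status=$?
xmllint --noout --schema "$tmp/person.xsd" "$tmp/bad.xml" 2>/dev/null
bad_status=$?
echo "good.xml exit status: $good_status"
echo "bad.xml exit status: $bad_status"
```

Checking the exit status is usually more robust in scripts than parsing the messages.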

4. About the output of --schema

$command = "xmllint --noout --schema person.xsd person.xml";
exec($command, $output, $retval);  // $output receives stdout only
if ($retval != 0) { var_dump($output); }
else { echo "yeah!"; }
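Run this code and you will find that the output you get is not the error messages but array(0) {}. Amazing!

That is because when xmllint --schema finds validation errors, it writes the messages to standard error (stderr) rather than standard output (stdout). Redirecting stderr to stdout makes them available to the caller:

$command = "xmllint --noout --schema person.xsd person.xml 2>&1";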
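
The stdout/stderr split is easy to confirm from the shell. A minimal sketch, assuming a hypothetical one-element schema age.xsd and an invalid age.xml:

```shell
tmp=$(mktemp -d)
# Hypothetical schema: a single integer element.
cat > "$tmp/age.xsd" <<'EOF'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="age" type="xs:integer"/>
</xs:schema>
EOF
echo '<age>not age</age>' > "$tmp/age.xml"

# Command substitution captures stdout only, so this stays empty.
stdout_only=$(xmllint --noout --schema "$tmp/age.xsd" "$tmp/age.xml" 2>/dev/null)

# With 2>&1 the validation messages are captured as well.
with_stderr=$(xmllint --noout --schema "$tmp/age.xsd" "$tmp/age.xml" 2>&1)

echo "stdout only: [$stdout_only]"
echo "with stderr: [$with_stderr]"
```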

First create an XML document named po.xml with the following content:

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
<shipTo country="US">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<city>Mill Valley</city>
<billTo country="US">
<name>Robert Smith</name>
<street>8 Oak Avenue</street>
<city>Old Town</city>
<comment>Hurry, my lawn is going wild!</comment>
<item partNum="872-AA">
<comment>Confirm this is electric</comment>
<item partNum="926-AA">
<productName>Baby Monitor</productName>
</purchaseOrder>

Then write a schema file for po.xml, named po.xsd, with the following content:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:documentation xml:lang="en">
Purchase order schema for Example.com.
Copyright 2000 Example.com. All rights reserved.
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
<xsd:element name="comment" type="xsd:string"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
<xsd:attribute name="orderDate" type="xsd:date"/>
<xsd:complexType name="USAddress">
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
<xsd:attribute name="country" type="xsd:NMTOKEN"
<xsd:complexType name="Items">
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
<xsd:element name="productName" type="xsd:string"/>
<xsd:element name="quantity">
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
<xsd:element name="USPrice" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
<xsd:attribute name="partNum" type="SKU" use="required"/>
<!-- Stock Keeping Unit, a code for identifying products -->
<xsd:simpleType name="SKU">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:schema>

Validate po.xml with xmllint:

$ xmllint --schema po.xsd po.xml

If there are no error messages, validation passed.
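
The SKU type in the schema is meant to accept part numbers made of three digits, a hyphen, and two capital letters. A minimal sketch of that pattern facet in isolation, assuming a hypothetical schema sku.xsd that declares only a partNum element (the real schema uses an attribute instead):

```shell
tmp=$(mktemp -d)
# Hypothetical schema: only the SKU type and one element that uses it.
cat > "$tmp/sku.xsd" <<'EOF'
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="partNum" type="SKU"/>
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
EOF
echo '<partNum>872-AA</partNum>' > "$tmp/ok.xml"
echo '<partNum>87-AA</partNum>' > "$tmp/short.xml"   # only two digits

xmllint --noout --schema "$tmp/sku.xsd" "$tmp/ok.xml" 2>/dev/null
ok_status=$?
xmllint --noout --schema "$tmp/sku.xsd" "$tmp/short.xml" 2>/dev/null
short_status=$?
echo "872-AA: $ok_status, 87-AA: $short_status"
```

Without the backslash the pattern would require a literal letter d, so a value such as 872-AA would be rejected.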

The xmllint Shell



XML files are human-readable text files, so it is easy to search them from the command line using grep or from within a text editor. But if you want to do something a little more sophisticated (count the number of elements, for example), you'll need to take a different approach. You could write a transformation style sheet to extract such information but this would be overkill. It is much easier to use xmllint from the command line to find out this kind of information.

This command is available on Mac OS X and Linux. It is installed by default on Mac OS X and, on Linux, if it isn't already installed, you can quickly do so by installing the libxml2 package.

1. xmllint Options

One of the primary uses for the xmllint command is to validate that an XML file is well formed and that it conforms to a specific DTD or schema; DTD validation is done with the --valid option and XML Schema validation with the --schema option. If your XML file contains other XIncluded files you can also use xmllint in the following way to resolve included files and output the result to a file:

shell> xmllint --xinclude manual.xml --output tmp.xml

The output file tmp.xml will include the contents of any xi:include elements. Also, the --format option is very useful for quickly formatting files from the command line. However, the most interesting option is the --shell option.

For a complete list of all the options available view the xmllint man page.
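
A minimal sketch of the XInclude step described above, assuming two hypothetical files: manual.xml, which pulls in chapter.xml through an xi:include element.

```shell
tmp=$(mktemp -d)
# Hypothetical included file.
cat > "$tmp/chapter.xml" <<'EOF'
<chapter><title>Included chapter</title></chapter>
EOF
# Hypothetical main file; href is resolved relative to manual.xml.
cat > "$tmp/manual.xml" <<'EOF'
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="chapter.xml"/>
</book>
EOF

# --xinclude resolves the include; --output writes the merged result to a file.
xmllint --xinclude "$tmp/manual.xml" --output "$tmp/tmp.xml"
cat "$tmp/tmp.xml"
```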

2. The xmllint Shell

Use xmllint with the --shell option in the following way:

shell> xmllint --shell xmlfile_name

You can use other options with the --shell option. For example, if you wish to resolve included files, use the --xinclude option as well.

You can display the list of the commands available from the shell by typing help. You should see output similar to the following:

  base         display XML base of the node
  setbase URI  change the XML base of the node
  bye          leave shell
  cat [node]   display node or current node
  cd [path]    change directory to path or to root
  dir [path]   dumps informations about the node 
  du [path]    show the structure of the subtree under 
               path or the current node
  exit         leave shell
  help         display this help
  free         display memory usage
  load [name]  load a new document with name
  ls [path]    list contents of path or the current directory
  set xml_fragment replace the current node content with the 
               fragment parsed in context
  xpath expr   evaluate the XPath expression in that context 
               and print the result
  setns nsreg  register a namespace to a prefix in the 
               XPath evaluation context
               format for nsreg is: prefix=[nsuri] 
               (i.e. prefix= unsets a prefix)
  setrootns    register all namespace found on the 
               root element the default namespace 
               if any uses 'defaultns' prefix
  pwd          display current working directory
  quit         leave shell
  save [name]  save this document to name or the original name
  write [name] write the current node to the filename
  validate     check the document for errors
  relaxng rng  validate the document against the Relax-NG schemas
  grep string  search for a string in the subtree

There are a number of relatively trivial but necessary commands such as help and exit. All the commands are useful but this article deals primarily with the following commands:

cat node - output all nodes below the current node

cd path - change to another node; you can only use this command with unique nodes.

dir - dump information about the current node

xpath expression - evaluate and print the XPath expression

setns - register a namespace

write filename - write the current node to file

If you want to write your complete shell session to file, run the shell after first issuing the script command. This can be particularly useful on Mac OS X where the write command does not work.

3. Using Shell Commands

When you first open the xmllint shell the prompt, / >, indicates that you are at the root node. You will likely want to navigate to specific nodes and view the file contents below that node. You can do this with the cd and cat commands.

/ > cd options/option[@name='address_metrics_lifetime']
option >

On success the prompt changes to the name of the current node. To view the current node, use the cat command, which displays output to the screen. To create a text file of the output of cat, use write file_name.xml.

You can only use cd to navigate to unique nodes. Attempt to navigate to a non-unique node and you will see output such as the following:

option is a 353 NodeSet

If there is no unique identifier for the node that you wish to navigate to, you can use a subscript in the following way:
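
A minimal sketch of subscript navigation, assuming a hypothetical options.xml with three sibling option elements. The shell is driven through stdin here so the session can be scripted; the prompts appear in the captured output.

```shell
tmp=$(mktemp -d)
# Hypothetical options file with three sibling <option> elements.
cat > "$tmp/options.xml" <<'EOF'
<options>
  <option name="a"/>
  <option name="b"/>
  <option name="c"/>
</options>
EOF

# [2] selects the second <option>, which makes the target unique;
# pwd then prints the path of the new current node.
session=$(printf '%s\n' 'cd options/option[2]' 'pwd' 'bye' \
  | xmllint --shell "$tmp/options.xml")
echo "$session"
```

The path reported by pwd carries the same subscript, which confirms which sibling was selected.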


To output information about the current node use the dir command:

option > dir
ELEMENT option

4. Working with Multiple Files

You can open the xmllint shell specifying multiple files but the behaviour is not intuitive. In the following example, the shell is opened with two different files that have the same structure. The options.xml file has a root element <options> with 353 <option>s while the smpp_options.xml file has a root element <options> containing only 57 <option>s.

shell> xmllint --shell options.xml smpp_options.xml
/ > xpath count(//option)
Object is a number : 353
/ > bye
/ > xpath count(//option)
Object is a number : 57
/ > setbase options.xml

If you invoke help from the shell the bye command is tersely described as leave shell. As this sequence of commands shows, bye also exits the first file passed to the --shell option.

Once you have exited the first shell, you cannot return to it by using setbase even though the command seems to have performed its function, as the output of base erroneously indicates. For this reason it is perhaps less confusing to open the shell specifying only one file and then use the load command to switch to a different file:

/ > load smpp_options.xml
/ > xpath count(//option)
Object is a number : 57

The second count indicates that the load command executed successfully.
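
When only the counts are needed, the shell can be skipped entirely. A minimal sketch using the --xpath option, assuming two small hypothetical files in place of the real options.xml and smpp_options.xml:

```shell
tmp=$(mktemp -d)
# Hypothetical stand-ins: three options in one file, one in the other.
printf '<options><option/><option/><option/></options>\n' > "$tmp/options.xml"
printf '<options><option/></options>\n' > "$tmp/smpp_options.xml"

# --xpath evaluates one expression per invocation and prints the result,
# so each file is counted independently of the other.
options_count=$(xmllint --xpath 'count(//option)' "$tmp/options.xml")
smpp_count=$(xmllint --xpath 'count(//option)' "$tmp/smpp_options.xml")
echo "options.xml: $options_count"
echo "smpp_options.xml: $smpp_count"
```

Running one process per file avoids the confusion about which document the shell currently has loaded.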

5. Using Namespaces

To this point none of the examples use namespaces. To use an XML file with namespaces you must use the setns command. Use it in the following way:

shell> xmllint --xinclude --shell manual.xml
/ > setns x=http://docbook.org/ns/docbook
namespace xml href=http://www.w3.org/XML/1998/namespace
chapter > dir
ELEMENT chapter

The dir command shown above confirms that you have navigated to the specified node. From that node you can execute xpath commands using absolute or relative paths.

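chapter > xpath count(x:section)
Object is a number : 15
chapter > xpath count(//x:section/x:refentry)
Object is a number : 140
chapter > xpath count(x:section/x:refentry)
Object is a number : 135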

There are 15 sections in the apis chapter and these 15 sections have 135 refentries. Note the difference in output between the paths //x:section/x:refentry and x:section/x:refentry. The difference in output shows that only the latter is relative to the current node.

When your XML file uses IDs, an easier way to navigate is to use the id function:
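
A minimal sketch of why setns is needed, assuming a small hypothetical manual.xml in the DocBook namespace. The same count is attempted before and after the prefix x is registered.

```shell
tmp=$(mktemp -d)
# Hypothetical document whose elements live in the DocBook default namespace.
cat > "$tmp/manual.xml" <<'EOF'
<book xmlns="http://docbook.org/ns/docbook">
  <chapter><section/><section/></chapter>
</book>
EOF

# An unprefixed name in XPath matches only elements in no namespace,
# so the first count finds nothing; the prefixed one finds both sections.
session=$(printf '%s\n' \
  'xpath count(//section)' \
  'setns x=http://docbook.org/ns/docbook' \
  'xpath count(//x:section)' \
  'bye' | xmllint --shell "$tmp/manual.xml")
echo "$session"
```

The prefix x is arbitrary; only the namespace URI has to match the one declared in the document.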

 chapter  xpath id( is a Set contains 1 nodes
  ELEMENT chapter
 cd id)/135
$ dt=$(xmllint --shell file <<< "cat //IntrBkSttlmDt/text()" | grep -v "^/ >")
$ echo $dt
1967-08-13
<root> <FIToFICstmrDrctDbt><GrpHdr><MsgId>A</MsgId><CreDtTm>2001-12-17T09:30:47</CreDtTm><NbOfTxs>0</NbOfTxs><TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt><IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt><SttlmInf><SttlmMtd>CLRG</SttlmMtd><ClrSys><Prtry>xx</Prtry></ClrSys></SttlmInf><InstgAgt><FinInstnId><BIC>AAAAAAAAAAA</BIC></FinInstnId></InstgAgt></GrpHdr></FIToFICstmrDrctDbt></root>
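
The same value can be read without the shell and without filtering prompts. A minimal sketch using --xpath with the string() function, assuming a hypothetical msg.xml that holds a shortened copy of the document above:

```shell
tmp=$(mktemp -d)
# Shortened, hypothetical copy of the message shown above.
cat > "$tmp/msg.xml" <<'EOF'
<root><FIToFICstmrDrctDbt><GrpHdr><MsgId>A</MsgId><IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt></GrpHdr></FIToFICstmrDrctDbt></root>
EOF

# string() returns the text content of the first matching element,
# so no prompt lines need to be stripped afterwards.
dt=$(xmllint --xpath 'string(//IntrBkSttlmDt)' "$tmp/msg.xml")
echo "$dt"
```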
