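I'm having a lot of trouble learning RegExp and coming up with a good algorithm for this. I have a string of HTML that I need to parse. Note that when I parse it, it is still a string object and not yet HTML in the browser, because I need to parse it before it gets there. The HTML looks like this: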
<html>
  <head>
    <title>Geoserver GetFeatureInfo output</title>
  </head>
  <style type="text/css">
    table.featureInfo, table.featureInfo td, table.featureInfo th {
      border: 1px solid #ddd;
      border-collapse: collapse;
      margin: 0;
      padding: 0;
      font-size: 90%;
      padding: .2em .1em;
    }
    table.featureInfo th {
      padding: .2em .2em;
      font-weight: bold;
      background: #eee;
    }
    table.featureInfo td {
      background: #fff;
    }
    table.featureInfo tr.odd td {
      background: #eee;
    }
    table.featureInfo caption {
      text-align: left;
      font-size: 100%;
      font-weight: bold;
      text-transform: uppercase;
      padding: .2em .2em;
    }
  </style>
  <body>
    <table class="featureInfo2">
      <tr>
        <th class="dataLayer" colspan="5">Tibetan Villages</th>
      </tr>
      <!-- EOF Data Layer -->
      <tr class="dataHeaders">
        <th>ID</th>
        <th>Latitude</th>
        <th>Longitude</th>
        <th>Place Name</th>
        <th>English Translation</th>
      </tr>
      <!-- EOF Data Headers -->
      <!-- Data -->
      <tr>
        <!-- Feature Info Data -->
        <td>3394</td>
        <td>29.1</td>
        <td>93.15</td>
        <td>བསྡམས་གྲོང་ཚོ།</td>
        <td>Dam Drongtso </td>
      </tr>
      <!-- EOF Feature Info Data -->
      <!-- End Data -->
    </table>
    <br/>
  </body>
</html>
I need to get this out of it:
3394,29.1,93.15,བསྡམས་གྲོང་ཚོ།,Dam Drongtso
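Basically an array. Even better would be if each value were somehow matched to its field header and to the table it came from, so that it looks like this: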
Tibetan Villages ID Latitude Longitude Place Name English Translation
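Finding out that JavaScript doesn't support nice key/value mapping was a big letdown, and I already have something that does what I want. However, it is very rigidly hard-coded, and I think I should be able to use RegExp to handle this better. Unfortunately I'm having a really hard time with it :( Here is my function for parsing the string (very ugly, IMO):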
function parseHTML(html){
    //Getting the layer name
    alert(html);
    //Lousy attempt at RegExp
    var somestring = html.replace('/m//\<html\>+\<body\>//m/',' ');
    alert(somestring);
    var startPos = html.indexOf('<th class="dataLayer" colspan="5">');
    var length = ('<th class="dataLayer" colspan="5">').length;
    var endPos = html.indexOf('</th></tr><!-- EOF Data Layer -->');
    var dataLayer = html.substring(startPos + length,endPos);

    //Getting the data headers
    startPos = html.indexOf('<tr class="dataHeaders">');
    length = ('<tr class="dataHeaders">').length;
    endPos = html.indexOf('</tr><!-- EOF Data Headers -->');
    var newString = html.substring(startPos + length,endPos);
    newString = newString.replace(/<th>/g,'');
    newString = newString.substring(0,newString.lastIndexOf('</th>'));
    var featureInfoHeaders = new Array();
    featureInfoHeaders = newString.split('</th>');

    //Getting the data
    startPos = html.indexOf('<!-- Data -->');
    length = ('<!-- Data -->').length;
    endPos = html.indexOf('<!-- End Data -->');
    newString = html.substring(startPos + length,endPos);
    newString = newString.substring(0,newString.lastIndexOf('</tr><!-- EOF Feature Info Data -->'));
    var featureInfoData = new Array();
    featureInfoData = newString.split('</tr><!-- EOF Feature Info Data -->');
    for(var s = 0; s < featureInfoData.length; s++){
        startPos = featureInfoData[s].indexOf('<!-- Feature Info Data -->');
        length = ('<!-- Feature Info Data -->').length;
        endPos = featureInfoData[s].lastIndexOf('</td>');
        featureInfoData[s] = featureInfoData[s].substring(startPos + length,endPos);
        featureInfoData[s] = featureInfoData[s].replace(/<td>/g,'');
        featureInfoData[s] = featureInfoData[s].split('</td>');
    }//end for
    alert(featureInfoData);

    //Put all the feature info in one array
    var featureInfo = new Array();
    var len = featureInfoData.length;
    for(var j = 0; j < len; j++){
        featureInfo[j] = new Object();
        featureInfo[j].id = featureInfoData[j][0];
        featureInfo[j].latitude = featureInfoData[j][1];
        featureInfo[j].longitude = featureInfoData[j][2];
        featureInfo[j].placeName = featureInfoData[j][3];
        featureInfo[j].translation = featureInfoData[j][4];
    }//end for

    //This can be ignored for now...
    var string = redesignHTML(featureInfoHeaders,featureInfo);
    return string;
}//end parseHTML
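So as you can see, if the contents of that string ever change, my code will break badly. I'd like to avoid that as much as possible and try to write better code. Thanks for any help and advice you can give me.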
Solution
You can use jQuery to easily traverse the DOM and automatically build an object that follows the table's structure.
var $dom = $('<html>').html(the_html_string_variable_goes_here);
var featureInfo = {};

$('table:has(.dataLayer)', $dom).each(function(){
    var $tbl = $(this);
    var section = $tbl.find('.dataLayer').text();
    var obj = [];
    var $structure = $tbl.find('.dataHeaders');
    var structure = $structure.find('th').map(function(){
        return $(this).text().toLowerCase();
    });
    var $datarows = $structure.nextAll('tr');
    $datarows.each(function(i){
        obj[i] = {};
        $(this).find('td').each(function(index, element){
            obj[i][structure[index]] = $(element).text();
        });
    });
    featureInfo[section] = obj;
});
This code works with multiple tables that have different structures, and with multiple data rows inside each table.
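The reason this copes with differently shaped tables is that each cell is keyed by the header text at the same column index instead of by a hard-coded property name. Stripped of jQuery, that pairing step might be sketched as a plain function. `rowsToObjects` and its parameters are names made up for this illustration; they are not part of jQuery or of the code above.

```javascript
// Hypothetical helper (not a jQuery API): pair each cell with the
// lower-cased header text found at the same column index.
function rowsToObjects(headers, rows) {
  var keys = headers.map(function (h) {
    return h.toLowerCase();
  });
  return rows.map(function (cells) {
    var record = {};
    cells.forEach(function (cell, index) {
      // Cells beyond the last header are skipped (a choice made for this sketch).
      if (index < keys.length) {
        record[keys[index]] = cell;
      }
    });
    return record;
  });
}

var villages = rowsToObjects(
  ['ID', 'Latitude', 'Longitude', 'Place Name', 'English Translation'],
  [['3394', '29.1', '93.15', 'བསྡམས་གྲོང་ཚོ།', 'Dam Drongtso']]
);
```

Because the keys come from the header row, a table with other columns would simply produce records with other property names, with no change to the function.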
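featureInfo will hold the final structure and data. Because the header names are lower-cased with toLowerCase() when the keys are built, the values are accessed with lower-case keys, like this:
alert( featureInfo['Tibetan Villages'][0]['english translation'] );
or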
alert( featureInfo['Tibetan Villages'][0].id );
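If jQuery is not available, or the string has to be handled somewhere without a DOM, the RegExp route the question asks about could be sketched roughly as follows. This is not a general HTML parser: it assumes flat, well-formed markup like the sample (no nested tables, and the `dataLayer` / `dataHeaders` class names used exactly as shown). `parseFeatureTable`, `cells`, and the shortened `sample` string are invented here for illustration.

```javascript
// A sketch only: relies on the markup staying as simple as the sample.
function parseFeatureTable(html) {
  // Collect the trimmed inner text of every <tag ...>...</tag> in a fragment.
  function cells(fragment, tag) {
    var re = new RegExp('<' + tag + '\\b[^>]*>([\\s\\S]*?)</' + tag + '>', 'g');
    var out = [];
    var m;
    while ((m = re.exec(fragment)) !== null) {
      out.push(m[1].trim());
    }
    return out;
  }

  var result = { layer: null, headers: [], rows: [] };
  var rowRe = /<tr\b([^>]*)>([\s\S]*?)<\/tr>/g;
  var row;
  while ((row = rowRe.exec(html)) !== null) {
    var attrs = row[1]; // attributes of the <tr> itself
    var body = row[2];  // everything between <tr ...> and </tr>
    if (/class="dataHeaders"/.test(attrs)) {
      result.headers = cells(body, 'th');
    } else if (/class="dataLayer"/.test(body)) {
      result.layer = cells(body, 'th')[0];
    } else {
      var data = cells(body, 'td');
      if (data.length > 0) {
        result.rows.push(data);
      }
    }
  }
  return result;
}

// Shortened stand-in for the GeoServer output shown in the question.
var sample =
  '<table class="featureInfo2">' +
  '<tr><th class="dataLayer" colspan="5">Tibetan Villages</th></tr>' +
  '<tr class="dataHeaders"><th>ID</th><th>Latitude</th><th>Longitude</th></tr>' +
  '<tr><!-- Feature Info Data --><td>3394</td><td>29.1</td><td>93.15 </td></tr>' +
  '</table>';
var parsed = parseFeatureTable(sample);
// parsed.layer   -> 'Tibetan Villages'
// parsed.headers -> ['ID', 'Latitude', 'Longitude']
// parsed.rows    -> [['3394', '29.1', '93.15']]
```

Matching on tags and class names rather than on exact substrings such as the comment markers makes this less fragile than the indexOf version, but any real change in the markup (nested tables, different class names) would still break it, which is why the DOM-based approach above is usually the safer choice.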