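This is my first question. I tried to find an answer, but honestly I couldn't figure out which terms to search for, so sorry if this has been asked before.
Here goes:
I have thousands of records in a .txt file, in the following format: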
(1,3,2,1,'John (Finances)'),(2,7,'Mary Jane'),(3,'Gerald (Janitor),Broflowski'),
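… and so on. The first value is the PK, the other 3 are foreign keys, and the 5th is a string.
I need to parse this into JSON (or something similar) in JavaScript, but I'm running into trouble because some of the strings contain parentheses and commas (see "Janitor" in the third record, for example), so I can't just use substring… I could maybe trim the right parts, but I'm wondering whether there's a smarter way to parse it.
Any help would be much appreciated.
Thanks!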
Solution
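You can't (read: probably shouldn't) use a regular expression for this. What happens if the parentheses contain another pair, or if one of them is unmatched?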
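To make the problem concrete, here is what a naive comma split does to the third record from the question. This is only an illustrative sketch; `record` and `fields` are hypothetical names.

```javascript
// Third record from the question, copied verbatim.
const record = "(3,'Gerald (Janitor),Broflowski')";
// Drop the outer parentheses, then naively split on every comma.
const fields = record.slice(1, -1).split(",");
// fields is ["3", "'Gerald (Janitor)", "Broflowski'"]:
// the comma inside the quoted name has cut the string in two.
console.log(fields);
```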
The good news is that you can easily build a tokenizer/parser for this.
The idea is to keep track of your current state and act accordingly.
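The smallest version of this idea tracks a single piece of state, whether we are currently inside a quoted string, and treats a comma as a separator only when we are not. A sketch, assuming strings are delimited by `'` and never contain an escaped quote; `splitOutsideQuotes` is a hypothetical helper, applied here to the body of one record (outer parentheses already removed).

```javascript
// Hypothetical helper: split on commas that are not inside a '...' string.
function splitOutsideQuotes(body) {
  const parts = [];
  let current = "";
  let inString = false; // the single piece of state we track
  for (const ch of body) {
    if (ch === "'") inString = !inString; // entering or leaving a quoted string
    if (ch === "," && !inString) {
      parts.push(current); // a separator: finish the current field
      current = "";
    } else {
      current += ch; // part of the current field (quotes are kept as-is)
    }
  }
  parts.push(current); // the last field has no trailing comma
  return parts;
}

const fields = splitOutsideQuotes("3,'Gerald (Janitor),Broflowski'");
console.log(fields); // two fields; the name keeps its inner comma
```

The full parser below generalizes this: instead of one boolean it tracks which part of a record it is reading.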
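Here is a sketch of a parser I just wrote, mainly to show you the general idea. If you have any conceptual questions, let me know.
It works (demo here), but please don't use it in production until you understand it and have patched it.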
How it works
So, how do we build the parser?
var State = { // remember which state the parser is at
    BeforeRecord: 0, // at the (
    DuringInts: 1,   // at one of the integers
    DuringString: 2, // reading the name string
    AfterRecord: 3   // after the )
};
var records = []; // to contain the results
var state = State.BeforeRecord;
Now we iterate over the string, reading the next character as we go:
for (var i = 0; i < input.length; i++) {
    if (state === State.BeforeRecord) {
        // handle logic when at the (
    }
    ...
    if (state === State.AfterRecord) {
        // handle that state
    }
}
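Now all that's left is to consume the input into an object in each state:
> When at a (, start parsing and skip any whitespace.
> Read all the integers and discard the commas.
> After the four integers, read the string from its opening ' up to the next ', which marks its end.
> After the string, read up to the ), store the object, and start the loop again.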
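The third step is what makes parentheses and commas inside names harmless: once the opening `'` has been seen, everything up to the next `'` belongs to the string. A sketch of just that step, assuming names never contain a `'` of their own; `readQuoted` is a hypothetical name.

```javascript
// Read a single-quoted string; `start` must point at the opening quote.
// Returns the string contents and the index just past the closing quote.
function readQuoted(input, start) {
  if (input[start] !== "'") throw new Error("Expected ' at index " + start);
  const end = input.indexOf("'", start + 1); // the next quote ends the string
  if (end === -1) throw new Error("Unterminated string starting at index " + start);
  return { value: input.slice(start + 1, end), next: end + 1 };
}

const result = readQuoted("'Gerald (Janitor),Broflowski')", 0);
console.log(result.value); // Gerald (Janitor),Broflowski
```

After the call, `result.next` points at the `)` that closes the record, which is exactly where the fourth step picks up.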
The implementation isn't very difficult either.
The parser
var State = { // keep track of the state
    BeforeRecord: 0,
    DuringInts: 1,
    DuringString: 2,
    AfterRecord: 3
};
var records = []; // to contain the results
var state = State.BeforeRecord;
var input = "(1,3,2,1,'John (Finances)')"; // sample input: the first record from the question
var workingRecord = {}; // the record we're currently reading into
for (var i = 0; i < input.length; i++) {
    var token = input[i]; // read the current character
    if (state === State.BeforeRecord) { // before reading a record
        if (/\s/.test(token)) continue; // ignore whitespace (including newlines) between records
        if (token === '(') { state = State.DuringInts; continue; }
        throw new Error("Expected ( before new record, got " + token);
    }
    if (state === State.DuringInts) {
        if (token === ' ') continue; // ignore whitespace
        for (var j = 0; j < 4; j++) {
            if (token === ' ') { token = input[++i]; j--; continue; } // ignore whitespace
            var curNum = '';
            while (token !== ',') {
                if (!/[0-9]/.test(token)) throw new Error("Expected number, got " + token);
                curNum += token;
                token = input[++i]; // get the next character
            }
            workingRecord[j] = Number(curNum); // set the data on the record
            token = input[++i]; // skip the comma
        }
        i--; // step back so the loop's i++ lands on the character right after the last comma
        state = State.DuringString;
        continue;
    }
    if (state === State.DuringString) {
        if (token === ' ') continue; // skip whitespace
        if (token !== "'") throw new Error("Expected ' before string, got " + token);
        var str = '';
        var lenGuard = 1000;
        token = input[++i]; // move past the opening '
        while (token !== "'") {
            if (token === undefined) throw new Error("Unterminated string");
            if (lenGuard-- === 0) throw new Error("Error, string length bounded by 1000");
            str += token;
            token = input[++i];
        }
        workingRecord.str = str;
        token = input[++i]; // move past the closing '
        while (token === ' ') token = input[++i]; // skip whitespace before )
        if (token !== ')') throw new Error("Expected ) after string, got " + token);
        records.push(workingRecord); // store the record here, so the last one isn't lost
        workingRecord = {}; // new record
        state = State.AfterRecord;
        continue;
    }
    if (state === State.AfterRecord) {
        if (/\s/.test(token)) continue; // ignore whitespace
        if (token === ',') { state = State.BeforeRecord; continue; } // the "," between records
        throw new Error("Invalid token found " + token);
    }
}
if (state === State.DuringInts || state === State.DuringString) {
    throw new Error("Input ended in the middle of a record");
}
console.log(records);
// each object has four numbers and a string; for example
// records[0][0] is 1, records[0][1] is 3 and so on,
// and records[0].str is "John (Finances)"
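Since the question asks for JSON, one option is to wrap the same steps in a function that returns objects with named fields and hand the result to `JSON.stringify`. The sketch below swaps the explicit state variable for small reader functions (a recursive-descent style) and assumes every record has exactly four integers followed by one `'...'` string with no escaped quotes; the function name `parseRecords` and the field names `pk`, `fk1`, `fk2`, `fk3` and `name` are hypothetical.

```javascript
// Hypothetical wrapper: parse (int,int,int,int,'string') records into named objects.
function parseRecords(input) {
  const records = [];
  let i = 0; // current position in the input
  const skipSpaces = () => { while (i < input.length && /\s/.test(input[i])) i++; };
  const expect = (ch) => { // consume one specific character, or fail with its index
    skipSpaces();
    if (input[i] !== ch) throw new Error(`Expected ${ch} at ${i}, got ${input[i]}`);
    i++;
  };
  const readInt = () => { // consume a run of digits
    skipSpaces();
    const start = i;
    while (i < input.length && input[i] >= "0" && input[i] <= "9") i++;
    if (i === start) throw new Error(`Expected number at ${start}`);
    return Number(input.slice(start, i));
  };
  const readString = () => { // everything between ' and the next '
    expect("'");
    const end = input.indexOf("'", i);
    if (end === -1) throw new Error(`Unterminated string at ${i}`);
    const value = input.slice(i, end);
    i = end + 1;
    return value;
  };

  skipSpaces();
  while (i < input.length) {
    expect("(");
    const pk = readInt(); expect(",");
    const fk1 = readInt(); expect(",");
    const fk2 = readInt(); expect(",");
    const fk3 = readInt(); expect(",");
    const name = readString();
    expect(")");
    records.push({ pk, fk1, fk2, fk3, name });
    skipSpaces();
    if (input[i] === ",") i++; // comma between records (optional after the last one)
    skipSpaces();
  }
  return records;
}

const json = JSON.stringify(parseRecords("(1,3,2,1,'John (Finances)'),"));
console.log(json); // [{"pk":1,"fk1":3,"fk2":2,"fk3":1,"name":"John (Finances)"}]
```

Because each reader reports the index where it failed, a malformed record in a file of thousands is easier to locate. In Node.js, the file contents could be loaded with `fs.readFileSync(path, "utf8")` before calling the function.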