I'm trying to use Scanner's regular-expression support to match strings in a file. The regex works on everything except this line:
- DNA="ITTTAITATIATYAAAYIYI[....]ITYTYITTIYAIAIYIT"
In the actual file, the ellipsis stands for several thousand characters.
Here is the loop:
- while (scanFile.hasNextLine()) {
- final String currentLine = scanFile.findInLine(".*");
- System.out.println("trying to match '" + currentLine + "'");
- Scanner internalScanner = new Scanner(currentLine);
- String matchResult = internalScanner.findInLine(Constants.ANIMAL_INFO_REGEX);
- assert matchResult != null : "there's no reason not to find a match";
- matches.put(internalScanner.match().group(1),internalScanner.match().group(2));
- scanFile.nextLine();
- }
And the regex:
- static final String ANIMAL_INFO_REGEX = "([a-zA-Z]+) *= *\"(([a-zA-Z_.]| |\\.)+)";
Here is the failure trace:
- java.lang.StackOverflowError
- at java.util.regex.Pattern$CharProperty.match(Pattern.java:3360)
- at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
- at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
- at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
- at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
- at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
- at java.util.regex.Pattern$CharProperty.match(Pattern.java:3362)
- at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
- at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
- at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
- at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
- at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
- at java.util.regex.Pattern$CharProperty.match(Pattern.java:3362)
- at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
- at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
- at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
- at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
- at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
- ...etc (it's all regex).
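Thanks a lot!
This looks like bug 5050507. I agree with Asaph that removing the alternation should help; the bug specifically says to "avoid alternation whenever possible." I think you could go simpler still: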
- "^([a-zA-Z]+) *= *\"([^\"]+)"
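To check this against a long value, here is a minimal sketch that applies the simplified pattern with a plain Pattern/Matcher instead of a nested Scanner. The class name AnimalInfoDemo, the helper putMatch, and the synthetic 400,000-character DNA string are made up for illustration; they are not from your code. As far as I know, a repeated character class such as [^\"]+ is matched in a loop rather than by recursing once per character, which is why it should avoid the deep stack the repeated alternation produces.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnimalInfoDemo {
    // Simplified pattern from the answer: a negated character class
    // replaces the repeated alternation group.
    static final Pattern ANIMAL_INFO =
            Pattern.compile("^([a-zA-Z]+) *= *\"([^\"]+)");

    // Hypothetical helper: stores key -> value in the map when the line matches,
    // and reports whether a match was found.
    static boolean putMatch(String line, Map<String, String> matches) {
        Matcher m = ANIMAL_INFO.matcher(line);
        if (!m.find()) {
            return false;
        }
        matches.put(m.group(1), m.group(2));
        return true;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for the real DNA line (assumption: letters only).
        StringBuilder dna = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            dna.append("ITYA");
        }
        String line = "DNA=\"" + dna + "\"";

        Map<String, String> matches = new LinkedHashMap<>();
        boolean found = putMatch(line, matches);
        System.out.println(found + " " + matches.get("DNA").length());
    }
}
```

In your loop, that would mean reading each line with scanFile.nextLine() and handing it to a Matcher, rather than building a second Scanner per line. Note that [^\"]+ accepts any character up to the closing quote, so it is more permissive than your original character set.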