我正在尝试通过实现一个小的正则表达式解析器来学习Parsec.在BNF中,我的语法看起来像:
EXP : EXP * | LIT EXP | LIT
我试图在Haskell中实现这个:
expr = try star <|> try litE <|> lit litE = do c <- noneOf "*" rest <- expr return (c : rest) lit = do c <- noneOf "*" return [c] star = do content <- expr char '*' return (content ++ "*")
这里有一些无限循环(例如expr – > star – > expr,不消耗任何标记),这使得解析器永远循环.我不确定如何修复它,因为star的本质是它最终会消耗它的强制令牌.
有什么想法吗?
你应该使用Parsec.Expr.buildExprParser;它非常适合这个目的.您只需描述您的运算符,它们的优先级和关联性,以及如何解析原子,组合器为您构建解析器!
原文链接:https://www.f2er.com/regex/356985.html您可能还希望添加使用parens对术语进行分组的功能,以便您可以将*应用于多个文字.
这是我的尝试(我投入|,和?为了好的衡量标准):
import Control.Applicative import Control.Monad import Text.ParserCombinators.Parsec import Text.ParserCombinators.Parsec.Expr data Term = Literal Char | Sequence [Term] | Repeat (Int,Maybe Int) Term | Choice [Term] deriving ( Show ) term :: Parser Term term = buildExpressionParser ops atom where ops = [ [ Postfix (Repeat (0,Nothing) <$char '*'),Postfix (Repeat (1,Nothing) <$char '+'),Postfix (Repeat (0,Just 1) <$char '?') ],[ Infix (return sequence) AssocRight ],[ Infix (choice <$char '|') AssocRight ] ] atom = msum [ Literal <$> lit,parens term ] lit = noneOf "*+?|()" sequence a b = Sequence $(seqTerms a) ++ (seqTerms b) choice a b = Choice $(choiceTerms a) ++ (choiceTerms b) parens = between (char '(') (char ')') seqTerms (Sequence ts) = ts seqTerms t = [t] choiceTerms (Choice ts) = ts choiceTerms t = [t] main = parseTest term "he(llo)*|wor+ld?"