Re: Parser combinator problem from Steffen Mazanek on 2008-02-11 (Curry mailing list)

From: Steffen Mazanek <steffen.mazanek_at_unibw.de>
Date: Mon, 11 Feb 2008 09:17:29 +0100

Hello,

thank you Bernd and Wolfgang. Due to Wolfgangs example I now
understand the problem. I have to read more about the Curry
evaluation strategy. I first thought, that it might be enough to know
that it is optimal *g*. Of course, I will switch to the more efficient
implementation suggested by Bernd.

In my application domain (graph parsing) sets of edges are parsed,
so, in a sense, there is no real sequential composition. The
right-hand side of a graph grammar rule consists of several edges
that might be consumed in any order. However, for construction of
the result and for the sake of efficiency it is still meaningful to
force a particular order of evaluation.

Thanks again,

Steffen

2008/2/8, Wolfgang Lux <wlux_at_uni-muenster.de>:
> Hello Steffen!
>
>
> > Hello,
> >
> > I have a question regarding the <*> operator (successive
> > application) known from parser combinator libraries,
> >
> > Normally, I would define it as follows:
> >
> > (<*>)::Parser (res1->res2)->Parser res1->Parser res2
> > (p1 <*> p2) g = (pv qv, g'')
> > where
> > (pv, g') = p1 g
> > (qv, g'') = p2 g'
> >
> > However, applied to real input this results in stack overflow
> > in the Münster Curry Compiler and non- or really late
> > termination in PACKS.
> >
> > If I use this formulation, no problems occur:
> >
> > (p1 <*> p2) g | p1 g =:= (pv,g') & p2 g' =:= (qv, g'') = (pv qv, g'')
> > where g',g'',pv,qv free
> >
> > However, I think, that the latter one is not as intuitive as the
> > first one, so I wonder why it performs so much better.
>
>
> Thank you for this interesting example.
>
> It is somewhat hard to tell what is going wrong in your real parser
> without knowing that parser, but given the fact that you get a stack
> overflow in MCC it may well be the case that you are using
> non-deterministic parsing combinators that perform a choice before
> consuming any input and that therefore the search space of the parser
> is explored in the wrong order.
>
> To become a little more concrete, consider the following simple
> parser which recognizes a sequence of three binary digits.
>
> end [] = ([],[])
> char x (c:cs) | x=:=c = ((x:), cs)
> digit = char '0' ? char '1'
> threeDigits = digit <*> (digit <*> (digit <*> end))
>
> Now consider this parser is applied to the string "100". The order
> in which the search space of the parser is explored with the lazy
> definition of <*> depends on whether the first or the second component
> of the parser's result tuple is evaluated first. If the second
> component is evaluated first, the lazy combinator <*> will evaluate
> the application p2 g' first. Given the above definition of digit
> this means that the parser will perform a non-deterministic choice
> for the two alternatives char '0' and char '1'. Only then will it
> enter the evaluation of the first argument of <*>. In effect, this
> means that the parser needlessly tries to match the input string
> against the sequences 000, 001, 010, and 011.
>
> Your non-lazy definition of (<*>) as well as Bernd's definition
> using case expressions simply ensures that the parser's search
> space is explored in the intended order, viz. from left to right.
>
> Regards
>
> Wolfgang
>
>

-- 
Dipl.-Inform. Steffen Mazanek
Institut für Softwaretechnologie
Fakultät Informatik
Universität der Bundeswehr München
85577 Neubiberg
Tel: +49 (0)89 6004-2505
Fax: +49 (0)89 6004-4447
E-Mail: steffen.mazanek_at_unibw.de
_______________________________________________
curry mailing list
curry_at_lists.RWTH-Aachen.DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/curry

Received on Mo Feb 11 2008 - 09:38:40 CET

This archive was generated by hypermail 2.3.0 : Do Jun 20 2024 - 07:15:09 CEST