Table of Contents
This chapter explains how to use the GLR parsing extension,
    which allows Happy to parse ambiguous
    grammars and produce useful results.
    This extension is triggered with the --glr flag,
    which causes Happy
    to use a different driver for the LALR(1) parsing
    tables. The result of parsing is a structure which encodes compactly
    all of the possible parses.
    There are two options for how semantic information is combined with
    the structural information.
    
    This extension was developed by Paul Callaghan and Ben Medlock
    (University of Durham). It is based on the structural parser
    implemented in Medlock's undergraduate project, but significantly
    extended and improved by Callaghan.
    Bug reports, comments, questions etc should be sent to
    <P.C.Callaghan@durham.ac.uk>.
    Further information can be found on Callaghan's
    GLR parser
    page.
    
      Here's an ambiguous grammar. It has no information about the
      associativity of +, so for example,
      1+2+3 can be parsed as
      (1+(2+3)) or ((1+2)+3).
      In conventional mode, Happy,
      would complain about a shift/reduce
      conflict, although it would generate a parser which always shifts
      in such a conflict, and hence would produce only
      the first alternative above.
      
      E -> E + E
      E -> i       -- any integer
      
      GLR parsing will accept this grammar without complaint, and produce
      a result which encodes both alternatives
      simultaneously. Now consider the more interesting example of
      1+2+3+4, which has five distinct parses -- try to
      list them! You will see that some of the subtrees are identical.
      A further property of the GLR output is that such sub-results are
      shared, hence efficiently represented: there is no combinatorial
      explosion.
      Below is the simplified output of the GLR parser for this example.
      
	Root (0,7,G_E)
	(0,1,G_E)     => [[(0,1,Tok '1'))]]
	(0,3,G_E)     => [[(0,1,G_E),(1,2,Tok '+'),(2,3,G_E)]]
	(0,5,G_E)     => [[(0,1,G_E),(1,2,Tok '+'),(2,5,G_E)]
	                 ,[(0,3,G_E),(3,4,Tok '+'),(4,5,G_E)]]
	(0,7,G_E)     => [[(0,3,G_E),(3,4,Tok '+'),(4,7,G_E)]
	                 ,[(0,1,G_E),(1,2,Tok '+'),(2,7,G_E)]
	                 ,[(0,5,G_E),(5,6,Tok '+'),(6,7,G_E)]}]
	(2,3,G_E)     => [[(2,3,Tok '2'))]}]
	(2,5,G_E)     => [[(2,3,G_E),(3,4,Tok '+'),(4,5,G_E)]}]
	(2,7,G_E)     => [[(2,3,G_E),(3,4,Tok '+'),(4,7,G_E)]}
	                 ,[(2,5,G_E),(5,6,Tok '+'),(6,7,G_E)]}]
	(4,5,G_E)     => [[(4,5,Tok '3'))]}]
	(4,7,G_E)     => [[(4,5,G_E),(5,6,Tok '+'),(6,7,G_E)]}]
	(6,7,G_E)     => [[(6,7,Tok '4'))]}]
      
      This is a directed, acyclic and-or graph.
      The node "names" are of form (a,b,c)
      where a and b
      are the start and end points (as positions in the input string)
      and c is a category (or name of grammar rule).
      For example (2,7,G_E) spans positions 2 to 7
      and contains analyses which match the E
      grammar rule.
      Such analyses are given as a list of alternatives (disjunctions),
      each corresponding to some use of a production of that
      category, which in turn are a conjunction of sub-analyses,
      each represented as a node in the graph or an instance of a token.
      
      Hence (2,7,G_E) contains two alternatives,
      one which has (2,3,G_E) as its first child
      and the other with (2,5,G_E) as its first child,
      respectively corresponding to sub-analyses
      (2+(3+4)) and ((2+3)+4).
      Both alternatives have the token + as their
      second child, but note that they are difference occurrences of
      + in the input!
      We strongly recommend looking at such results in graphical form
      to understand these points. If you build the
      expr-eval example in the directory
      examples/glr (NB you need to use GHC for this,
      unless you know how to use the -F flag for Hugs),
      running the example will produce a file which can be viewed with
      the daVinci graph visualization tool.
      (See http://www.informatik.uni-bremen.de/~davinci/
       for more information. Educational use licenses are currently
	available without charge.)
      
The GLR extension also allows semantic information to be attached to productions, as in conventional Happy, although there are further issues to consider. Two modes are provided, one for simple applications and one for more complex use. See Section 3.3, “Including semantic results”. The extension is also integrated with Happy's token handling, e.g. extraction of information from tokens.
One key feature of this implementation in Haskell is that its main result is a graph. Other implementations effectively produce a list of trees, but this limits practical use to small examples. For large and interesting applications, some of which are discussed in Section 3.4.3, “Some Applications of GLR parsing”, a graph is essential due to the large number of possibilities and the need to analyse the structure of the ambiguity. Converting the graph to trees could produce huge numbers of results and will lose information about sharing etc.
One final comment. You may have learnt through using yacc-style tools that ambiguous grammars are to be avoided, and that ambiguity is something that appears only in Natural Language processing. This is definitely not true. Many interesting grammars are ambiguous, and with GLR tools they can be used effectively. We hope you enjoy exploring this fascinating area!