On managing grammar ambiguity

Carlos Eduardo Sanchez Torres
3 min readSep 22, 2022

You want to build a language. You would like to develop a software tool called parser. You’re constructing a compiler. Yes, you’re the odd one out who constructs compilers for fun.

You’re starting to model context-free grammar when an undecidable problem happens. Parsers easily lead to ambiguous grammar.

Syntax level toughs you with ambiguity. Needless to say, they are undesirable since ambiguity needs manual answers.

What is ambiguity?

When a grammar has a terminal string that generates more than one parse tree, it is ambiguous.

We know to determine whether grammar is ambiguous or not is, in general, undecidable. Since 1962 researchers knew it.

In practice, some compilers remove and detect particular grammar ambiguity.

Once upon a time…

In early times the compilers, Knuth discovered the LR(k) condition in 1965 which is a test for unambiguity. LR(k) as a parser analyzes deterministic context-free languages in linear time. It is a recursive descendent parser without backtracking.

At the same time, Knuth and others invented subclasses of unambiguous context-free grammar. Examples are LR-Regular, LL(K), LR(k), or LALL(K) with their respective parsers.

Nowadays, those subsets are not enough. Some real-world parsers -Bison, SDF, and Elkhound- have to support general context-free grammar. e.g. palindrome structures used in bioinformatics and Boltzmann grammars need to.

State-of-art on managing grammar ambiguity

We manage grammar ambiguity with exhaustive methods, and hybrid approaches.

The following are some of those methods:

AMBER is an exhaustive method that generates strings to uncover ambiguity. It leads to infinite state spaces \cite{b1}.

ACLA is an exhaustive method that transforms grammar into a new alternative. The new one accepts a superset of the original \cite{b1}.

Basten’s hybrid approach applies canonical testing, before running Amber \cite{b1}.

CFGAnalyzer is a hybrid approach that uses an SAT solver to explore strings \cite{b1}.

SinBAD is a non-deterministic hybrid approach that explores breath rather than depth \cite{b1}.

Resolvable ambiguity \cite{b3} is a hybrid approach based on property-based testing.

Walking on SR-automata \cite{b4} is a non-deterministic hybrid approach based on LR parsing tables using SR-automata.

CYK algorithm \cite{b5} is a deterministic approach using CYK, a membership algorithm.

References

\begin{thebibliography}{}
\bibitem{b1} Vasudevan, N., Tratt, L. (2013). Detecting Ambiguity in Programming Language Grammars. In: Erwig, M., Paige, R.F., Van Wyk, E. (eds) Software Language Engineering. SLE 2013. Lecture Notes in Computer Science, vol 8225. Springer, Cham. https://doi.org/10.1007/978-3-319-02654-1\_9
\bibitem{b2} M. E. Aprea and M. de Fátima Mastroianni, “Syntactic analysis on programming languages: A simple approach employing automata theory,” IEEE CACIDI 2016 — IEEE Conference on Computer Sciences, 2016, pp. 1–6, doi: 10.1109/CACIDI.2016.7785995.
\bibitem{b3} Viktor Palmkvist, Elias Castegren, Philipp Haller, and David Broman. 2021. Resolvable ambiguity: principled resolution of syntactically ambiguous programs. In Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction (CC 2021). Association for Computing Machinery, New York, NY, USA, 153–164. https://doi.org/10.1145/3446804.3446846
\bibitem{b4} Quaglia, P. (2019). Walking on SR-automata to detect grammar ambiguity. arXiv preprint arXiv:1902.02439.
\bibitem{b5} R. C. Dharmik, R. S. Bhanuse and A. D. Gaikwad, “Ambiguity of Context Free Grammar using the CYK algorithm,” 2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare (Startup Conclave), 2016, pp. 1–6, doi: 10.1109/STARTUP.2016.7583989.
\bibitem{b6} C. Brabrand, R. Giegerich, kal A. Moller, ‘Analyzing ambiguity of context-free grammars’, Science of Computer Programming, p. 75, p. 3, p. 176–191, 2010.
\bibitem{b7} B. Basten and T. van der Storm, “AMBIDEXTER: Practical Ambiguity Detection,” 2010 10th IEEE Working Conference on Source Code Analysis and Manipulation, 2010, pp. 101–102, doi: 10.1109/SCAM.2010.21.
\end{thebibliography}

--

--

Carlos Eduardo Sanchez Torres

A computer scientist for want of a better word. GDSC Lead. Ignoramus et ignorabimus. GNU, or not GNU, that is the question.