This is one of the XT UserStories

----------------------------------------------------------------------
RECOVERING A SYNTAX DEFINITION FOR STRATEGO
----------------------------------------------------------------------

This directory contains a syntax definition in SDF2 of the Stratego
language. This file describes step by step how the syntax definition
was obtained from the YACC source in the source tree of the Stratego
Compiler SC. It covers the search for tools, their usage, their repair
where necessary, and the occasional implementation of a missing tool.

-- Eelco Visser 1/10/2001

----------------------------------------------------------------------

[Step 1] Copy the YACC file

  > cp ../../sc/spec/syn/stratego.grm .

[Step 2] Parse the YACC file

  > parse -l yacc -i stratego.grm -I -o stratego.af

[Step 3] Translate YACC to SDF

  > yacc2sdf -i stratego.af -o stratego.asdf

[Step 4] Pretty-print the syntax definition

  > pp -l sdf -i stratego.asdf -o stratego.def

[Step 4a] Find out what is wrong

  > pp -h

  => The -a switch should be used to indicate that the input is
     abstract syntax.

[Step 4c] Pretty-print the syntax definition with -a

  > pp -a -l sdf -i stratego.asdf -o stratego.def
  No pp entry found for: ["Definition"]
  rewriting failed

  => Wrong pretty-print table.

[Step 4d] Pretty-print the syntax definition with -a and SDF version 2.1

  > pp -a -l sdf -i stratego.asdf -o stratego.def -v 2.1

[Step 5] Inspect the generated syntax definition

  > less stratego.def

  => No templates for lexicals have been included. I remember that
     these were generated automatically from the %token declarations
     in the YACC file.

  => Yes, yacc2sdf is broken; the signature of the trees produced by
     the parser has changed. Repair this.

  => The problem was due to a change in the generation of constructors
     by sdf-cons. Adapted the YACC syntax definition; inlined the
     injection Nmno* -> Nlist.

[Step 6] Repeat steps 1 - 5

  > parse -l yacc -i stratego.grm -I -o stratego.af
    yacc2sdf -i stratego.af -o stratego.asdf
    pp -a -l sdf -i stratego.asdf -o stratego.def -v 2.1
    less stratego.def

  => Each template rule for a token is put in its own lexical syntax
     section; merge these. Adapt yacc2sdf.

[Step 7] Repeat step 6

  > parse -l yacc -i stratego.grm -I -o stratego.af
    yacc2sdf -i stratego.af -o stratego.asdf
    pp -a -l sdf -i stratego.asdf -o stratego.def -v 2.1
    less stratego.def

  => Lexicals are now merged.

  => yacc2sdf does no list detection and transformation. Which tool
     achieves that? There used to be a deyaccification tool, which I
     can no longer find. Write a new tool, sdf-regularize, that does
     this. Add it to the sdf-tools package.

  => sdf-regularize recognizes various kinds of lists and optional
     constructs and translates them into regular expressions. The
     sorts representing these regular expressions are then
     superfluous. By inlining their definitions the sorts are removed
     and the syntax definition is shortened.

[Step 8] Add application of sdf-regularize and sdf-bracket

  > parse -l yacc -i stratego.grm -I -o stratego.af
    yacc2sdf -i stratego.af -o stratego.asdf
    sdf-regularize -i stratego.asdf -o stratego.reg.asdf
    sdf-bracket -i stratego.reg.asdf \
      | pp -a -l sdf -o stratego.def -v 2.1
    less stratego.def

  => The syntax definition looks good now, with far fewer
     productions. No definition of the lexical syntax exists yet,
     however.

[Step 9] Generate constructor annotations

  > parse -l yacc -i stratego.grm -I -o stratego.af
    yacc2sdf -i stratego.af -o stratego.asdf
    sdf-regularize -i stratego.asdf -o stratego.reg.asdf
    sdf-cons -i stratego.reg.asdf -o stratego.reg.cons.asdf
    sdf-bracket -i stratego.reg.cons.asdf \
      | pp -a -l sdf -o stratego.def -v 2.1
    less stratego.def

  => This produces bad results, since the heuristics are based on the
     literals in the productions; we need to add in the literals. Can
     this be done automatically?
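
The list and optional detection described for sdf-regularize in Step 7 can be illustrated with a small model. The following is only a sketch in Python, not the actual tool (which is part of sdf-tools); the grammar representation, the function names `detect_regular` and `regularize`, and the toy sorts `DeclList`, `IdList`, `Module` and `Decl` are all assumptions made for the example. It handles just three patterns and makes a single pass, so nested patterns would need the pass to be repeated.

```python
from collections import defaultdict

# A production is (symbols, sort), mirroring SDF's "symbols -> Sort" notation.
# This representation is hypothetical; the real tool works on SDF abstract syntax.


def detect_regular(sort, alternatives):
    """Return a regular expression for `sort`, or None if no pattern matches."""
    if len(alternatives) != 2:
        return None
    short, long = sorted(alternatives, key=len)
    # Optional:  (empty) -> S  and  A -> S   ==>  A?
    if len(short) == 0 and len(long) == 1 and long[0] != sort:
        return f"{long[0]}?"
    if len(short) != 1 or short[0] == sort:
        return None
    elem = short[0]
    # Plain list:  A -> S  and  S A -> S (or A S -> S)   ==>  A+
    if long in ((sort, elem), (elem, sort)):
        return f"{elem}+"
    # Separated list:  A -> S  and  S sep A -> S (or A sep S -> S)  ==>  {A sep}+
    if (len(long) == 3 and {long[0], long[2]} == {sort, elem}
            and long[1] not in (sort, elem)):
        return f"{{{elem} {long[1]}}}+"
    return None


def regularize(productions):
    """Replace list/optional sorts by regular expressions and inline them."""
    by_sort = defaultdict(list)
    for symbols, sort in productions:
        by_sort[sort].append(symbols)
    regex = {}
    for sort, alternatives in by_sort.items():
        expr = detect_regular(sort, alternatives)
        if expr is not None:
            regex[sort] = expr
    result = []
    for symbols, sort in productions:
        if sort in regex:
            continue  # the sort is superfluous once its uses are inlined
        result.append((tuple(regex.get(s, s) for s in symbols), sort))
    return result


# Toy YACC-style grammar (not taken from the Stratego grammar).
grammar = [
    (("Decl",), "DeclList"),
    (("DeclList", "Decl"), "DeclList"),
    (("Id",), "IdList"),
    (("IdList", '","', "Id"), "IdList"),
    (('"module"', "Id", "DeclList"), "Module"),
    (('"vars"', "IdList"), "Decl"),
]

for symbols, sort in regularize(grammar):
    print(" ".join(symbols), "->", sort)
```

On the toy grammar the six productions shrink to two, which is the shortening effect noted in Step 8. A real implementation would also have to protect the start symbol and any sort that carries semantic significance.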

[Step 10] Find the LEX file

  > cp ../../sc/spec/syn/stratego.lx .

  => This looks like it could be translated into SDF2 mostly
     automatically. Let's find a LEX grammar. There is none in the
     grammar base. I guess we'll have to reverse engineer it.

  => Created a syntax definition for a subset of LEX that can deal
     with stripped files that contain only definitions and rules, but
     no arbitrary C code. The file stratego.mod.l contains the
     stripped LEX file for Stratego.

  => The syntax definition for LEX has been installed in the grammar
     base. We can now use the parse tool to parse stratego.mod.l.

[Step 11] Parse the LEX file

  > parse -l lex -i stratego.mod.l -o stratego.mod.af -I

[Step 12] Translate LEX to SDF2

  => There is no tool for this yet. We'll have to write it.

  => lex2sdf is a new tool, implemented in
     grammar-recovery/src/yacc2sdf/

  > parse -l lex -i stratego.mod.l -o stratego.mod.af -I
    lex2sdf -i stratego.mod.af -o stratego.mod.asdf
    sdf-bracket -i stratego.mod.asdf \
      | pp -a -l sdf -o stratego.mod.sdf -v 2.1
    less stratego.mod.sdf

  => This provides a good lexical syntax. Note that layout is missing.

[Step 13] Recapitulation

  The following actions were taken to derive the SDF2 definition so
  far.
  Note that the source files have been renamed:

  > mv stratego.grm stratego.y
    mv stratego.mod.l stratego.l

  => Context-free syntax

  > parse -l yacc -i stratego.y -I -o stratego-cfg.af
    yacc2sdf -i stratego-cfg.af -o stratego-cfg.asdf
    sdf-regularize -i stratego-cfg.asdf -o stratego-cfg.reg.asdf
    sdf-bracket -i stratego-cfg.reg.asdf \
      | pp -a -l sdf -o stratego-cfg.def -v 2.1
    less stratego-cfg.def

  => Lexical syntax

  > parse -l lex -i stratego.l -o stratego-lex.af -I
    lex2sdf -i stratego-lex.af -o stratego-lex.asdf
    sdf-bracket -i stratego-lex.asdf \
      | pp -a -l sdf -o stratego-lex.def -v 2.1
    less stratego-lex.def

[Step 14] Combine lexical and context-free syntax

  => edit: remove module Lexical from stratego-cfg.def

  > unpack-sdf stratego-cfg.def
    unpack-sdf stratego-lex.def

  => edit: repair error in definition of Backslash

  > pack-sdf -i Main.sdf -I ./. -dep stratego.af

[Step 14] Unfold literals; replace token names by their definitions

  > pack-sdf -i Main.sdf -I ./. -dep stratego.af -o stratego.af
    pp -A -l sdf -i stratego.af -o stratego.def -v 2.1

  > parse -l sdf -v 2.1 -i stratego.def -o stratego.af -I
    unfold-literal -i stratego.af -o stratego.unf.af
    sdf-bracket -i stratego.unf.af \
      | pp -a -l sdf -o stratego.unf.def -v 2.1
    less stratego.unf.def

  => unfold-literal did not work for all cases. Rewrote the tool
     using dynamic rules, which made the specification much shorter.
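
The idea of unfolding literals, i.e. replacing a token name in the context-free syntax by the literal that defines it, can be sketched as follows. This is an illustrative Python model and not the unfold-literal tool itself (which, per the note above, was written with Stratego dynamic rules; a plain dictionary lookup is used here instead). The production representation, the function names and the toy tokens `RULES`, `EQ` and `ID` are assumptions for the example.

```python
# A production is (symbols, sort), mirroring SDF's "symbols -> Sort" notation.
# Hypothetical representation; the real tool works on SDF abstract syntax.


def is_literal(symbol):
    """True for a double-quoted literal such as '"rules"'."""
    return len(symbol) >= 2 and symbol[0] == '"' and symbol[-1] == '"'


def unfold_literals(lexical, context_free):
    """Replace token sorts defined by a single literal with that literal.

    Returns (remaining lexical productions, unfolded context-free productions).
    """
    # Group the lexical definitions by the token sort they define.
    definitions = {}
    for symbols, sort in lexical:
        definitions.setdefault(sort, []).append(symbols)
    # A token can be unfolded only if its sole definition is one literal.
    literal_of = {
        sort: alternatives[0][0]
        for sort, alternatives in definitions.items()
        if len(alternatives) == 1
        and len(alternatives[0]) == 1
        and is_literal(alternatives[0][0])
    }
    unfolded = [
        (tuple(literal_of.get(s, s) for s in symbols), sort)
        for symbols, sort in context_free
    ]
    remaining = [(symbols, sort) for symbols, sort in lexical
                 if sort not in literal_of]
    return remaining, unfolded


# Toy input (not taken from the Stratego grammar).
lexical = [
    (('"rules"',), "RULES"),
    (('"="',), "EQ"),
    (("[a-z]+",), "ID"),
]
context_free = [
    (("RULES", "Def*"), "Section"),
    (("ID", "EQ", "Strategy"), "Def"),
]

remaining_lexical, unfolded_cf = unfold_literals(lexical, context_free)
for symbols, sort in unfolded_cf:
    print(" ".join(symbols), "->", sort)
```

Tokens such as `ID`, which are defined by a character class rather than a literal, stay in the lexical syntax. Having the literals in the productions is what the constructor heuristics mentioned in Step 9 rely on.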

  => Use the result as the new Lexical.sdf and Stratego.sdf modules

  > unpack-sdf stratego.unf.def

[Step 15] Clean up Lexical.sdf and Stratego.sdf manually

  => Fill in missing lexical syntax
     - layout definitions (in particular the definition of literate
       comments)

  => Improve abstract syntax
     - unfold Optvarlist in StrategyDef
     - observations: often better to unfold optionals if the optional
       can also be expressed in terms of the
       - example: the condition |where id| is equivalent to a rule
         without condition
       - example: the strategy definition |f = s| is equivalent to
         |f() = s|, i.e., with an empty list of arguments
       - therefore the latter can be desugared into the former,
         allowing uniform treatment without having to deal with None
         and Some constructors
       - this comes at the cost of extra productions; however, the
         *core* syntax is smaller

  => Add constructors

  > make check

  => Stratego files get parsed correctly, it seems.

[Step 16] Compatibility with parse-mod

  To use the parser based on the SDF2 definition, it should be
  compatible with the existing YACC-based parser.

  => Set up a test script that checks compatibility between the SDF-
     and YACC-based parsers.

  => In order to achieve this, abstract syntax trees should be
     desugared.

  => In order to implement a desugarer, we need the signature of the
     trees produced by the new parser.

[Step 17] Derive the signature of Stratego from the syntax definition

  > parse -l sdf -v 2.1 -I -i stratego.def -o stratego.adef
    sdf2sig -i stratego.adef -o stratego.ar
    ast2abox -p /home/visser/res/app/tiger/tmp/front-tiger-0.2/sig/stratego.pp -i stratego.ar -o stratego.ar.abox
    abox2text -i stratego.ar.abox -o stratego.r
    unpack stratego.r r
    less stratego.r

  => Inspecting the signature revealed a couple of wrong constructor
     names (forgot about Var and SVar).

[Step 18] Added reject rules for reserved words

  => The performance of generated parsers was not very good, and the
     time was spent filtering syntax trees.
     It occurred to me that I had not declared the reserved words of
     Stratego. Added productions of the form
     |"keyword" -> Id {reject}| for all reserved words.

  => Module names used in import sections are allowed to be reserved
     words, except for "rules", "strategies", "signature", and
     "overlays". Created a separate lexical sort ModName to reflect
     this.

[Step 19] Add stratego.0.6.2 to the grammar base.

[Step 20] Project finished.

  Future work:

  => Improve the pretty-print table to obtain a beautifier for
     Stratego specifications.

-- Main.EelcoVisser - 01 Oct 2001
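
As an illustration of the reject productions added in Step 18, the following Python sketch generates productions of the form |"keyword" -> Id {reject}| from a word list. It is a minimal sketch, not part of the XT tools: the function names are hypothetical, and the word list contains only the four words named in Step 18 (the full set of Stratego reserved words is not listed in this file).

```python
def reject_productions(reserved, sort="Id"):
    """Return one SDF reject production per reserved word, sorted by word."""
    return [f'"{word}" -> {sort} {{reject}}' for word in sorted(set(reserved))]


def lexical_section(reserved, sort="Id"):
    """Render the reject productions as an SDF 'lexical syntax' section."""
    lines = ["lexical syntax"]
    lines += ["  " + production
              for production in reject_productions(reserved, sort)]
    return "\n".join(lines)


# The four words that Step 18 says stay reserved even for module names.
# Assumption: for the sort Id the real list is longer.
MODNAME_RESERVED = ["rules", "strategies", "signature", "overlays"]

print(lexical_section(MODNAME_RESERVED, sort="ModName"))
```

Generating the section for `ModName` from the short list and the section for `Id` from the full list would keep the two sorts in step with each other.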