From Stratego In Yacc To Stratego In SDF

XT -- A Bundle of Program Transformation Tools
This is one of the XT UserStories?
----------------------------------------------------------------------

RECOVERING A SYNTAX DEFINITION FOR STRATEGO

----------------------------------------------------------------------

This directory contains a syntax definition in SDF2 of the Stratego
language. This file describes step by step how the syntax definition
was obtained from the YACC source in the source tree of the Stratego
Compiler SC.  Including the search for tools, their usage, repair if
necessary and the occasional implementation of a missing tool.

-- Eelco Visser 1/10/2001

----------------------------------------------------------------------

[step 1] Copy the YACC file

>   cp ../../sc/spec/syn/stratego.grm .   

[step 2] Parse the YACC file

>   parse -l yacc -i stratego.grm -I -o stratego.af 

[step 3] Translate YACC to SDF

>   yacc2sdf -i stratego.af -o stratego.asdf   
  
[step 4] Pretty-print the syntax definition

>   pp -l sdf -i stratego.asdf -o stratego.def 

[step 4a] Find out what is wrong

>   pp -h  

=>    -a switch should be used to indicate that input is 
   abstract syntax

[step 4c] Pretty-print the syntax definition with -a 

>   pp -a -l sdf -i stratego.asdf -o stratego.def
   No pp entry found for: ["Definition"]
   rewriting failed 

=>    wrong pretty-print table              

[step 4d] Pretty-print the syntax definition with -a and sdf version 2.1

   pp -a -l sdf -i stratego.asdf -o stratego.def -v 2.1 

[step 5] Inspecting the generated syntax definition

>   less stratego.def   

=>    No templates for lexicals have been included. I remember that
   these were generated automatically from the %token declarations
   in the YACC file.
   
=>    Yes, yacc2sdf is broken; the signature of trees produced
   by the parse has been changed. Repair
   this.

=>    Problem was due to change in generation of constructors by
   sdf-cons. Adapted Yacc syntax definition; inlined injection
   Nmno* -> Nlist.

[Step 6] Repeat steps 1 - 5

>   parse -l yacc -i stratego.grm -I -o stratego.af 
   yacc2sdf -i stratego.af -o stratego.asdf 
   pp -a -l sdf -i stratego.asdf -o stratego.def -v 2.1  
   less stratego.def    

=>    Each template rule for token is put in its own lexical
   syntax section; merge these. Adapt yacc2sdf

[Step 7] Repeat step 6

>   parse -l yacc -i stratego.grm -I -o stratego.af 
   yacc2sdf -i stratego.af -o stratego.asdf 
   pp -a -l sdf -i stratego.asdf -o stratego.def -v 2.1  
   less stratego.def    

=>    Lexicals are now merged

=>    No list detection and transformation is done by
   yacc2sdf. Which tool does achieve that? 
    There used to be a deyaccification tool, which I can
   no longer find. 
    Write a new tool: sdf-regularize which achieves this.
   Add it to the sdf-tools package.

=>    sdf-regularize recognizes various kinds of lists and
   optional constructs and translates them into regular 
   expressions. The sorts representing these regular expressions
   are then superfluous. By inlining their definitions the
   sorts are removed and the syntax definition is shortened.

[Step 8] Add application of sdf-regularize and sdf-bracket

>   parse -l yacc -i stratego.grm -I -o stratego.af 
   yacc2sdf -i stratego.af -o stratego.asdf 
   sdf-regularize -i stratego.asdf -o stratego.reg.asdf 
   sdf-bracket -i stratego.reg.asdf \
      | pp -a -l sdf -o stratego.def -v 2.1  
   less stratego.def  

=>    The syntax definition looks good now, with much
   fewer productions. No definition for lexical syntax
   exists yet, however.

[Step 9] Generate constructor annotations

>   parse -l yacc -i stratego.grm -I -o stratego.af 
   yacc2sdf -i stratego.af -o stratego.asdf 
   sdf-regularize -i stratego.asdf -o stratego.reg.asdf 
   sdf-cons -i stratego.reg.asdf -o stratego.reg.cons.asdf
   sdf-bracket -i stratego.reg.cons.asdf \
      | pp -a -l sdf -o stratego.def -v 2.1 
   less stratego.def    

=>    This produces bad results, since the heuristics are
   based on the literals in the productions; we need
   to add in the literals. Can this be done automatically?

[Step 10] Find the LEX file

>   cp ../../sc/spec/syn/stratego.lx . 

=>    This looks like it could be translated into SDF2 mostly
   automatically.
   Let's find a Lex grammar. There is none in the grammar base.
   I guess we'll have to reverse engineer it.

=>    Created a syntax definition for a subset of LEX that can deal
   with stripped files that only contain definitions and rules,
   but not arbitrary C code. The file stratego.mod.l contains
   the stripped off lex file for stratego.

=>   The syntax definition for lex has been installed in the
   grammar base. We can now use the parse tool to parse stratego.mod.l

[Step 11] Parse the LEX file

>   parse -l lex -i stratego.mod.l -o stratego.mod.af -I  

[Step 12] Translating LEX to SDF2

=>    There is no tool for this yet. We'll have to write it.

=>    lex2sdf is new tool, implemented in grammar-recovery/src/yacc2sdf/

>   parse -l lex -i stratego.mod.l -o stratego.mod.af -I  
   lex2sdf -i stratego.mod.af -o stratego.mod.asdf 
   sdf-bracket -i stratego.mod.asdf \
      | pp -a -l sdf -o stratego.mod.sdf -v 2.1 
   less stratego.mod.sdf    

=>    This provides a good lexical syntax. Note that layout is missing.

[Step 13] Recapitulation. The following actions were taken to derive
   the SDF2 definition so far. Note that the source file names
   have been renamed

>   mv stratego.grm stratego.l
   mv stratego.mod.l stratego.l

=>    Context-free syntax

>   parse -l yacc -i stratego.y -I -o stratego-cfg.af 
   yacc2sdf -i stratego-cfg.af -o stratego-cfg.asdf 
   sdf-regularize -i stratego-cfg.asdf -o stratego-cfg.reg.asdf 
   sdf-bracket -i stratego-cfg.reg.asdf \
      | pp -a -l sdf -o stratego-cfg.def -v 2.1 
   less stratego-cfg.def

=>   Lexical syntax

>   parse -l lex -i stratego.l -o stratego-lex.af -I  
   lex2sdf -i stratego-lex.af -o stratego-lex.asdf 
   sdf-bracket -i stratego-lex.asdf \
      | pp -a -l sdf -o stratego-lex.def -v 2.1
   less stratego-lex.def

[Step 14] Combine lexical and context-free syntax

=>   edit: remove module Lexical from stratego-cfg.def

>    unpack-sdf stratego-cfg.def
    unpack-sdf stratego-lex.def

=>   edit: repair error in definition of Backslash

>    pack-sdf -i Main.sdf -I ./. -dep stratego.af

[Step 14] Unfold literals; replace token names by their definitions.

>    pack-sdf -i Main.sdf -I ./. -dep stratego.af -o stratego.af
   pp -A -l sdf -i stratego.af -o stratego.def -v 2.1
   
>   parse -l sdf -v 2.1 -i stratego.def -o stratego.af -I
   unfold-literal -i stratego.af -o stratego.unf.af
   sdf-bracket -i stratego.unf.af \
      | pp -a -l sdf -o stratego.unf.def -v 2.1
   less stratego.unf.def

=>    unfold-literals did not work for all cases. Rewrote the tool using
   dynamic rules, which made the specification much shorter.

=>    Use the result as the new Lexical.sdf and Stratego.sdf modules

>    unpack-sdf stratego.unf.def 

[Step 15] Clean up Lexical.sdf and Stratego.sdf manually

=>   Fill in missing lexical syntax
   - layout definitions (in particular definition of literate comments)

=>    Improve abstract syntax
   - unfold Optvarlist in StrategyDef
   - observations: often better to unfold optionals if the optional can
   also be expressed in terms of the 
   - example: the condition |where id| is equivalent to a rule without condition
   - example: the strategy definition |f = s| is equivalent to |f() = s|, i.e.,
     with an empty list of arguments
   - therefore the latter can be desugared into the former allowing uniform treatment
     without having to deal with None and Some constructors.
   - this is at the cost of extra productions, however, the *core* syntax is
     smaller

=>   Add constructors

>   make check

=>    Stratego files get parsed correctly it seems

[Step 16] Compatability with parse-mod. To use the parser based on the SDF2 definition
   it should be compatible with the existing YACC based parser.

=>    Set up a test script that checks compatability between SDF and YACC based parsers

=>   In order to achieve this abstract syntax trees should be desugared. 

=>    In order to implement a desugarer we need the signature of trees produced by
   the new parser.

[Step 17] Derive the signature of Stratego from the syntax definition

>    parse -l sdf -v 2.1 -I -i stratego.def -o stratego.adef
    sdf2sig -i stratego.adef -o stratego.ar
    ast2abox -p /home/visser/res/app/tiger/tmp/front-tiger-0.2/sig/stratego.pp -i  stratego.ar -o stratego.ar.abox
    abox2text -i stratego.ar.abox -o stratego.r
    unpack stratego.r r                         
   less stratego.r                    

=>    Inspecting the signature unveiled a couple of wrong constructor names (forgot about
   Var and SVar)

[Step 18] Added reject rules for reserved words

=>    The performance of generated parsers was not very good and that time was spent
   filtering syntax trees. It occurred to me that I had not declared the reserved 
   words of the Stratego. Added productions of the from |"keyword" -> Id {reject}|
   for all reserved words.

=>    Module names used in import sections are allowed to be reserved words except
   for "rules", "strategies", "signature", and "overlays". Created separate lexical
   sort ModName to reflect this.

[Step 19] Add stratego.0.6.2 to the grammar base.

[Step 20] Project finished. Future work:

=>   Improve the pretty-print table to obtain beautyfier for Stratego specifications.

CategoryXT? | CategoryUserStory? | -- EelcoVisser - 01 Oct 2001