SDF2 is a rich formalism for the definition of the syntax of all kinds
of computer languages. This page explores the possibilities of the
formalism by means of a number of fragments of syntax definitions from
the SDF2 grammar base. In particular, the formalism is contrasted with
traditional formalisms that use a separate scanner to deal with
lexical syntax.


*Context-free Syntax Definition*

expression grammar
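
As a minimal sketch of what such a definition might look like, a small
expression grammar can be written as follows. The module name and the
sorts Exp and IntConst are hypothetical and not taken from the grammar
base. Each production lists the symbols of its right-hand side before
the arrow and the sort it defines after it.

<pre>
 module Expressions
 exports
   sorts Exp
   lexical syntax
     [0-9]+      -&gt; IntConst
   context-free syntax
     IntConst    -&gt; Exp
     Exp "+" Exp -&gt; Exp
     Exp "*" Exp -&gt; Exp
     "(" Exp ")" -&gt; Exp {bracket}
</pre>

As written, this grammar is ambiguous: a sentence such as 1 + 2 * 3 has
more than one parse.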

*Disambiguating Expressions*
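
A sketch of how ambiguities between binary operators could be resolved,
assuming hypothetical productions for addition and multiplication over
a sort Exp: associativity is declared with an attribute on the
production, and relative precedence with a priorities section.

<pre>
   context-free syntax
     Exp "*" Exp -&gt; Exp {left}
     Exp "+" Exp -&gt; Exp {left}
   context-free priorities
     Exp "*" Exp -&gt; Exp &gt;
     Exp "+" Exp -&gt; Exp
</pre>

With these declarations 1 + 2 * 3 is read as 1 + (2 * 3) and 1 + 2 + 3
as (1 + 2) + 3.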


*Lexical Syntax Definition*

identifiers, layout
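
The following fragment is a sketch, not taken from the grammar base, of
how identifiers and layout might be defined. The sort Id is a
hypothetical name; LAYOUT is the sort that SDF2 reserves for text that
may appear between the symbols of a context-free production.

<pre>
   lexical syntax
     [A-Za-z][A-Za-z0-9]* -&gt; Id
     [\ \t\n]             -&gt; LAYOUT
   context-free restrictions
     LAYOUT? -/- [\ \t\n]
</pre>

The restriction on LAYOUT? states that optional layout may not be
followed by another layout character, so that a run of whitespace is
always consumed as a whole.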

*Regular Expressions*


*Overloading Delimiters*

BibTeX is a language for the description of bibliographical
information such as articles and books. For example, the following
entry describes a PhD thesis.

<pre>
 @PhdThesis{Vis97.thesis,
	author = {Visser, Eelco},
	title  = {Syntax Definition for Language Prototyping},
	year	= {1997},
	month  = {September},
	school = {University of Amsterdam},
	URL	 = { http://www.cs.uu.nl/~visser/thesis/ }
 }
</pre>

In the syntax of BibTeX entries the symbol { has two meanings: (1) it
marks the start of the list of fields and (2) it marks the start of a
field body. The second use requires a lexical treatment, since the
body of a field consists of an arbitrary list of characters up to the
closing }. In an approach where scanner and parser are separated, the
scanner cannot know which kind of { it has encountered. Furthermore, a
field body can contain nested occurrences of { and }, which should
only occur in matching pairs.

Finally, the treatment of whitespace is different between entries,
between the fields of an entry and inside the body of a field.

<pre>
  context-free syntax
	 C {Entry C}* C						-&gt; Entries 
	 "@" EName "{" Key "," 
						{Field ","}* ","? 
				  "}"						 -&gt; Entry 
	 Name "=" Value						-&gt; Field
	 "{" ValWords "}"					 -&gt; Value
	 (ValWord | ("{" ValWords "}"))* -&gt; ValWords
  lexical syntax
	 ~[\{\}\ \t\n]+  -&gt; ValWord
  lexical restrictions
	 ValWord	-/- ~[\{\}\ \t\n]
</pre>

The complete syntax definition for BibTeX (that also treats double
quotes in field bodies correctly) can be found at

	 http://www.cwi.nl/~mdejonge/grammar-base/bibtex.0/index.html


*Solving Lexical Ambiguities*


*Longest Match*

follow restrictions
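
A minimal sketch of a follow restriction, using a hypothetical sort Id:
the restriction forbids an Id from being followed directly by a
character that could have extended it, which has the effect of
enforcing longest match for that sort only.

<pre>
   lexical syntax
     [A-Za-z][A-Za-z0-9]* -&gt; Id
   lexical restrictions
     Id -/- [A-Za-z0-9]
</pre>

Without the restriction, the input abc could also be split into the Id
ab followed by the Id c wherever the grammar allows two identifiers
next to each other.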

*Reserved Words*

in a normal scanner generator like LEX 

when combining languages we want to have separate sets of reserved words;

a COBOL reserved word should not be used as a COBOL identifier, but
might be quite usable as a SQL identifier
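
One way to express this in SDF2 is with reject productions, sketched
below. The choice of keywords and the use of the sort Lex-Id for COBOL
identifiers are assumptions made for illustration; the actual grammar
may organize this differently.

<pre>
   lexical syntax
     "PERFORM" -&gt; Lex-Id {reject}
     "MOVE"    -&gt; Lex-Id {reject}
</pre>

Because the rejects are attached to one sort, they exclude these words
as COBOL identifiers only; an identifier sort belonging to another
language, such as SQL, is not affected.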


*Ignoring Whitespace in Lexicals*

In Fortran whitespace inside lexicals is not significant. This can be
accommodated in SDF2 by using context-free syntax to define lexicals.
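
A rough sketch of the idea, with hypothetical sorts Letter, AlphaNum
and Name: the individual characters are defined lexically, and the name
itself is built from them by a context-free production. Since layout is
allowed between the symbols of a context-free production, blanks inside
a name are skipped.

<pre>
   lexical syntax
     [A-Z]    -&gt; Letter
     [A-Z0-9] -&gt; AlphaNum
     [\ ]     -&gt; LAYOUT
   context-free syntax
     Letter AlphaNum* -&gt; Name
</pre>

Under this definition both COUNT and C O U NT are recognized as a Name.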


*Dividing a Syntax Definition into Modules*

reuse of pieces of syntax

renaming
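
As a sketch of reuse with renaming, a module can import another module
and rename one of its sorts in the process. The module names and sorts
below are hypothetical.

<pre>
 module Conditions
 imports Expressions[Exp =&gt; Cond]
</pre>

Here the productions of the imported module are reused with every
occurrence of the sort Exp replaced by Cond.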

*Combining Languages*

COBOL is a language for manipulating business information represented
by means of lists of records. COBOL programs are often mixed with
fragments from other languages. For example, SQL queries can be
embedded to access a database and CICS programs are used for process
control. It is desirable to describe the syntax of each of these
languages separately and to combine the descriptions as needed.

In a traditional syntax definition formalism this is not possible: (1)
The restricted grammar classes, such as LL or LALR, on which the
context-free syntax is based are not closed under composition. (2) The
regular grammars on which the definition of the lexical syntax is
based are not closed under composition either.

In practice, this translates to the following: A scanner does not
consider the context in determining the sort of a token.  Therefore,
normal scanners cannot deal with

(LEX provides a workaround by means of _modes_.)

In SDF2 the syntax of the constituent languages can be described in
separate modules and combined at will. For example, consider the
following fragments from a syntax definition for COBOL. (Note that the
actual combined syntax definition for COBOL, CICS and SQL consists of
1600 LOC divided into 38 modules.)

Module ID defines the syntax of identifiers.

<pre>
 module ID
	lexical syntax
	  [0-9]* [A-Z]														-&gt; Lex-Id
	  [0-9]* [A-Z] [A-Za-z0-9\-]* [A-Za-z0-9]					-&gt; Lex-Id
	  [0-9]+ [\-] [0-9\-]* [A-Z] [A-Za-z0-9\-]* [A-Za-z0-9] -&gt; Lex-Id
	context-free syntax
	  Lex-Id -&gt; Id 
	lexical restrictions
	  Lex-Id -/- [A-Za-z0-9\-]
</pre>

Module COBOL defines the syntax of COBOL programs. The actual syntax
definition for COBOL consists of 36 modules. Here only the productions
relevant for the example are shown. Note that the syntax of Picture
overlaps with the syntax for Id. This overlap is disambiguated by
context.

<pre>
 module COBOL
	imports ID %% ...
	lexical syntax
	  [0-9XxAa\(\)pZzVvSszBCRD\/\,\$\+\-\*\:]+ -&gt; Picture
	context-free syntax
	  Ident-div Env-div Data-div Proc-div				-&gt; Program
	  "DATA" "DIVISION" "." File-sec Ws-sec Link-sec -&gt; Data-div
	  "FILE" "SECTION" "." File-desc*					 -&gt; File-sec
	  "FD" Id Fd-item* "." Data-desc*					 -&gt; File-desc
	  Dd-header Dd-body*									  -&gt; Data-desc
</pre>

Module SQL defines the syntax for SQL queries. Queries are embedded
into COBOL programs by means of the keywords EXEC SQL ... END-EXEC.

<pre>
 module SQL
	lexical syntax
	  [A-Z0-9\-\_\.\:]+ -&gt; Sql-id
	context-free syntax
	  "SELECT" Distinct Select-list From-into Where Order-by -&gt; Select
	  Select																 -&gt; Sql-item
	  "EXEC" "SQL" Sql-item+ "END-EXEC" "."						-&gt; Data-desc
	  "EXEC" "SQL" Sql-item+ "END-EXEC"							 -&gt; Stat

</pre>

Module CICS defines the syntax of CICS commands and their embedding in
COBOL programs. Note that a command can have a reference to an A-exp,
which is a COBOL expression.

<pre>
 module CICS
	imports PROGRAM
	lexical syntax
	  [A-Z]+ -&gt; Cics-kw
	context-free syntax
	  Stat* "EXEC" "CICS" Cics-command Cics-opt* "." -&gt; Sentence
	  "EXEC" "CICS" Cics-command Cics-opt* "END-EXEC" -&gt; Stat
				
	  Cics-kw						-&gt; Cics-opt
	  Cics-kw "(" Cics-arg ")" -&gt; Cics-opt
	  A-exp						  -&gt; Cics-arg
	  Str							 -&gt; Cics-arg
	  "ADDRESS" "OF" A-exp	  -&gt; Cics-arg
	  "LENGTH" "OF" A-exp		-&gt; Cics-arg
	  "ABEND"						-&gt; Cics-command
	  %% etc.
</pre>
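
Combining the languages then amounts to importing the modules into one
top-level module. The following is only a sketch; the module name
COBOL-CICS-SQL is hypothetical and the actual grammar may structure its
imports differently.

<pre>
 module COBOL-CICS-SQL
 imports COBOL SQL CICS
</pre>

The embeddings themselves are ordinary productions, such as the ones
above that inject EXEC SQL and EXEC CICS blocks into the COBOL sorts
Data-desc, Stat and Sentence.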