SDF2 is a rich formalism for the definition of the syntax of all kinds of computer languages. This page explores the possibilities of the formalism by means of a number of fragments of syntax definitions from the SDF2 Sdf.GrammarBase. In particular, the language is contrasted with traditional formalisms that use a separate scanner to deal with lexical syntax. (When editing this page, uncheck the ConvertSpacesToTabs checkbox to avoid spoiling the layout of the examples.) *Context-free Syntax Definition* expression grammar *Disambiguating Expressions* *Lexical Syntax Definition* idenitifiers, layout *Regular Expressions* *Overloading Delimiters* BibTeX is a language for the describtion of bibliographical information such as articles and books. For example, the following entry describes a PhD thesis.
 @PhdThesis{Vis97.thesis,
	author = {Visser, Eelco},
	title  = {Syntax Definition for Language Prototyping},
	year	= {1997},
	month  = {September},
	school = {University of Amsterdam},
	URL	 = { http://www.cs.uu.nl/~visser/thesis/ }
 }
In the syntax of BibTeX entries the symbol { has two meanings: (1) indicating the start of the list of fields and (2) indicating the start of a field body. The second kind of use requires a lexical treatment since the body of a field consists of an arbitrary list of characters until the closing } is found. In an approach where scanner and parser are separated it is not possible for the scanner to know which kind of { is encountered. Furthermore, a field body can contain nested occurences of { and }, which should only occur in matching pairs. In Finally, the treatment of whitespace is different between entries, between the fields of an entry and inside the body of a field.
  context-free syntax
	 C {Entry C}* C						-> Entries 
	 "@" EName "{" Key "," 
						{Field ","}* ","? 
				  "}"						 -> Entry 
	 Name "=" Value						-> Field
	 "{" ValWords "}"					 -> Value
	 (ValWord | ("{" ValWords "}"))* -> ValWords
  lexical syntax
	 ~[\{\}\ \t\n]+  -> ValWord
  lexical restrictions
	 ValWord	-/- ~[\{\}\ \t\n]
The complete syntax definition for BibTeX (that also treats double quotes in field bodies correctly) can be found at http://www.cwi.nl/~mdejonge/grammar-base/bibtex.0/index.html *Solving Lexical Ambiguities* *Longest Match* follow restrictions *Reserved Words* in a normal scanner generator like LEX when combining languages we want to have separate sets of reserved words; a COBOL reserved word should not be used as a COBOL identifier, but might be quite usable as a SQL identifier *Ignoring Whitespace in Lexicals* In Fortran whitespace inside lexicals is not significant. This can be accomodated in Trash.SDFII by using context-free syntax to define lexicals. *Dividing a Syntax Definition into Modules* reuse of pieces of syntax renaming *Combining Languages* COBOL is a language for manipulating business information represented by means of lists of records. COBOL programs are often mixed with fragments from other languages. For example, SQL queries can be embedded to access a database and CICS programs are used for process control. It is desirable to describe the syntax of each of the language separately and combine these descriptions as needed. In a traditional syntax definition formalism this is not possible: (1) The grammars restrictions such LL or LALR on which the context-free syntax is based are not closed under composition. (2) the regular grammars on which the definition of the lexical syntax are based are not closed under composition either. In practice, this translates to the following: A scanner does not consider the context in determining the sort of a token. Therefore, normal scanners cannot deal with (LEX provides a workaround by means of _modes_.) In Trash.SDFII the syntax of the composing languages can be described in separate modules and combined at will. For example, consider the following fragments from a syntax definition for COBOL. (Note that the actual combined syntax definition for COBOL, CICS and SQL combined consists of 1600 LOC divided into 38 modules.)
Module ID defines the syntax of identifiers. The 

 module ID
	lexical syntax
	  [0-9]* [A-Z]														-> Lex-Id
	  [0-9]* [A-Z] [A-Za-z0-9\-]* [A-Za-z0-9]					-> Lex-Id
	  [0-9]+ [\-] [0-9\-]* [A-Z] [A-Za-z0-9\-]* [A-Za-z0-9] -> Lex-Id
	context-free syntax
	  Lex-Id -> Id 
	lexical restrictions
	  Lex-Id -/- [A-Za-z0-9\-]
Module COBOL defines the syntax of COBOL programs. The actual syntax definition for cobol consists of 36 modules. Here only the productions relevant for the example are shown. Note that the syntax of Picture overlaps with the syntax for Id. This overlap is disambiguated by context.
 module COBOL
	imports ID %% ...
	lexical syntax
	  [0-9XxAa\(\)pZzVvSszBCRD\/\,\$\+\-\*\:]+ -> Picture
	context-free syntax
	  Ident-div Env-div Data-div Proc-div				-> Program
	  "DATA" "DIVISION" "." File-sec Ws-sec Link-sec -> Data-div
	  "FILE" "SECTION" "." File-desc*					 -> File-sec
	  "FD" Id Fd-item* "." Data-desc*					 -> File-desc
	  Dd-header Dd-body*									  -> Data-desc
Module SQL defines the syntax for SQL queries. Queries are embedded into COBOL programs by means of the keywords EXEC SQL ... END-EXEC.
 module SQL
	lexical syntax
	  [A-Z0-9\-\_\.\:]+ -> Sql-id
	context-free syntax
	  "SELECT" Distinct Select-list From-into Where Order-by -> Select
	  Select																 -> Sql-item
	  "EXEC" "SQL" Sql-item+ "END-EXEC" "."						-> Data-desc
	  "EXEC" "SQL" Sql-item+ "END-EXEC"							 -> Stat

Module CICS defines the syntax of CICS commands and their embedding in COBOL programs. Note that a command can have a reference to an A-exp, which is a COBOL expression.
 module CICS
	imports PROGRAM
	lexical syntax
	  [A-Z]+ -> Cics-kw
	context-free syntax
	  Stat* "EXEC" "CICS" Cics-command Cics-opt* "." -> Sentence
	  "EXEC" "CICS" Cics-command Cics-opt* "END-EXEC" -> Stat
				
	  Cics-kw						-> Cics-opt
	  Cics-kw "(" Cics-arg ")" -> Cics-opt
	  A-exp						  -> Cics-arg
	  Str							 -> Cics-arg
	  "ADDRESS" "OF" A-exp	  -> Cics-arg
	  "LENGTH" "OF" A-exp		-> Cics-arg
	  "ABEND"						-> Cics-command
	  %% etc.