Flex and Bison

Purpose of Flex and Bison

Flex and Bison allow for the generation of programs that process structured data. A prime example for such programs are interpreters or compilers of programming languages. Other applications are possible such as a loader for specific file formats.

Flex is a successor of lex. They are both lexer generators.

Bison is a successor of Yacc. They are both parser generators.

A lexer reads a string of characters and given rules, returns token matched from the character stream. Token are groups of characters. The rules applied to convert characters into token are defined by the programmer. The file ending .l is usually used for flex files.

A parser takes a stream of token and given rules, recognized constructs in the language it is supposed to parse. A parser could for example recognize function definitions or definitions of if-statements. Parts or rules or entire rules can be bound to actions. As soon as a rule is recognized, the action is executed. Actions could be code generation, for example assembler code could be output when a rule was parsed.

Usage of Flex and Bison

When compiling, the first step is to use the bison parser generator on the .y file. This will generate a .c and a .h file.

The second step is to use the flex lexer generator on the .l file. The .l file has to import the .h file that was generated by the bison run in the first step. The reason for importing that generated header is that the header defines constants for token that have to be returned by the generated lexer.

This means that first the grammar rules are processed and then after that the lexer rules are processed. This is kind of backwards. One would expect that first the token are defined and then the rules are build on top of the available token. The process is more or less backwards. You defined the rules using token and at that point you do not care how the token are defined. In the second step, when you know which token the grammar rules need, you define what the token look like by designing appropriate lexer rules.

Gotchas

On MacOS you have to link agains -ll instead of -lfl

On MacOS, your parser file has to define yyerror() and yylex()

%{
   #include <stdio.h>
   void yyerror(const char* msg) {
   	  printf("bla %s\n", msg);
      fprintf(stderr, "bli %s\n", msg);
   }
   int yylex();
%}

 

Leave a Reply