Regular Expressions in Compilers

Laila Alahaideb
Mar 2, 2021

Regular Expression concept!

Regular expressions are used to specify tokens (lexical constructs). A token names a class of character sequences that play the same role in the source program, such as identifiers or numbers.

Regular expressions (regexps) are an important notation for describing lexeme patterns. Each regular expression is built from smaller regular expressions by applying a set of defining rules, and each one denotes a language.

A lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.
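To make the grouping step concrete, here is a minimal sketch of a lexical analyzer built on Python's `re` module. The token names (`NUMBER`, `IDENT`, `OP`, `SKIP`) and their patterns are assumptions chosen for illustration and do not come from any particular compiler.

```python
import re

# Hypothetical token classes for a toy language (illustrative assumptions).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),            # integer literals
    ("IDENT",  r"[A-Za-z_]\w*"),   # identifiers
    ("OP",     r"[+\-*/=]"),       # single-character operators
    ("SKIP",   r"\s+"),            # whitespace, discarded below
]

# One master pattern: an alternation of named groups, one per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Group the characters of `source` into (token, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("count = count + 42"))
# [('IDENT', 'count'), ('OP', '='), ('IDENT', 'count'), ('OP', '+'), ('NUMBER', '42')]
```

Each pair holds the token class and the lexeme, the actual characters that matched the pattern for that class.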

A regular set is a language that can be described by a regular expression.

Regular Expression Review!

Symbol: an abstract entity that we do not define formally -> (0, a, …)

Alphabet: a finite set of symbols from which larger structures are built -> (Σ = {0, a, …})

String: a finite sequence of symbols from a given alphabet, written side by side -> (abcd, abd, …)

Formal language: a set of strings over a given alphabet, that is, a subset of Σ*, where Σ* is the set of all strings that can be formed from Σ -> (the set of all strings of length 2 = {ab, ac, ad, …})
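The definitions above can be tried out in a few lines of Python. This is only a sketch: the alphabet `SIGMA` and the helper names `strings_of_length` and `sigma_star_prefix` are assumptions made for the example.

```python
from itertools import product

# Example alphabet (an assumption for illustration).
SIGMA = ["a", "b", "c", "d"]


def strings_of_length(alphabet, n):
    """All strings of exactly n symbols over the alphabet."""
    return ["".join(symbols) for symbols in product(alphabet, repeat=n)]


def sigma_star_prefix(alphabet, max_len):
    """The strings of Σ* up to length max_len; Σ* itself is infinite."""
    result = []
    for n in range(max_len + 1):
        result.extend(strings_of_length(alphabet, n))
    return result


length_two = strings_of_length(SIGMA, 2)
print(len(length_two))   # 16
print(length_two[:4])    # ['aa', 'ab', 'ac', 'ad']
print(sigma_star_prefix(["a", "b"], 2))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The set of strings of length 2 is one language over Σ. The empty string at the start of the last output is the string of length 0, usually written ε.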

Regular Expression Rules!

Built up from three operators:

Concatenation xy

Alternation x|y (x or y)

Repetition x* (x repeated 0 or more times); x+ (x repeated 1 or more times) is shorthand for xx*
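The three operators can be checked with Python's `re` module, which supports them with the same notation. Note that `re` offers many features beyond these three operators, so this is an illustration of the notation rather than a definition of it. The helper `in_language` is a name introduced here for the example.

```python
import re


def in_language(pattern, text):
    """True when the entire string `text` is matched by `pattern`."""
    return re.fullmatch(pattern, text) is not None


# Concatenation xy: an x followed by a y
print(in_language(r"ab", "ab"))    # True
print(in_language(r"ab", "ba"))    # False

# Alternation x|y: either an x or a y
print(in_language(r"a|b", "b"))    # True
print(in_language(r"a|b", "ab"))   # False

# Repetition x*: zero or more copies of x
print(in_language(r"a*", ""))      # True
print(in_language(r"a*", "aaa"))   # True

# x+ requires at least one copy, so it rejects the empty string
print(in_language(r"a+", ""))      # False
print(in_language(r"a+", "aaa"))   # True
```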

Recursive rules:

Regular expressions over an alphabet Σ can be defined recursively:

Every symbol of Σ is a regular expression
ε is a regular expression
If r1 and r2 are regular expressions, so are (r1), r1r2, r1 | r2, and r1*
Nothing else is a regular expression.
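The recursive definition maps directly onto a small data type, with one case per rule. The sketch below is one possible encoding, not a standard one: the class names (`Sym`, `Eps`, `Cat`, `Alt`, `Star`) and the functions `ends` and `accepts` are assumptions made for this example. Parentheses need no class of their own because the nesting of objects already records the grouping. The matcher is a simple position-set walk, not the automaton construction a production lexer generator would normally use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:      # a single symbol of Σ
    char: str


@dataclass(frozen=True)
class Eps:      # ε, the empty string
    pass


@dataclass(frozen=True)
class Cat:      # concatenation r1r2
    left: object
    right: object


@dataclass(frozen=True)
class Alt:      # alternation r1 | r2
    left: object
    right: object


@dataclass(frozen=True)
class Star:     # repetition r1*
    inner: object


def ends(r, s, start):
    """Set of positions where a match of r beginning at `start` can end."""
    if isinstance(r, Eps):
        return {start}
    if isinstance(r, Sym):
        return {start + 1} if s[start:start + 1] == r.char else set()
    if isinstance(r, Alt):
        return ends(r.left, s, start) | ends(r.right, s, start)
    if isinstance(r, Cat):
        return {e for mid in ends(r.left, s, start) for e in ends(r.right, s, mid)}
    if isinstance(r, Star):
        # Repeat the inner expression until no new end position appears.
        reached, frontier = {start}, {start}
        while frontier:
            frontier = {e for p in frontier for e in ends(r.inner, s, p)} - reached
            reached |= frontier
        return reached
    raise TypeError(f"not a regular expression: {r!r}")


def accepts(r, s):
    """True when the whole string s belongs to the language of r."""
    return len(s) in ends(r, s, 0)


# (a|b)*abb
regex = Cat(Star(Alt(Sym("a"), Sym("b"))), Cat(Sym("a"), Cat(Sym("b"), Sym("b"))))
print(accepts(regex, "aabb"))   # True
print(accepts(regex, "abab"))   # False
```

Anything that cannot be built from these five constructors is rejected with a `TypeError`, which mirrors the closing rule that nothing else is a regular expression.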

Related knowledge:

A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, & Tools, 2nd ed. Boston: Pearson/Addison Wesley, 2007.
