**Regular Expression concept! **

**Regular expressions** are used to characterize tokens (lexical constructs). A token characterizes a pattern of characters having the same meaning in the source program.

**Regexps** known as **Regular expressions** is a crucial notation for describing** lexeme** patterns! , and is composed of smaller regular expressions representing different languages (by applying defining rules).

**A lexical analyzer** reads the stream of characters making up the source program and groups the characters into meaningful sequences called **lexemes.**

A **regular set **is a language that a regular expression can identify.

**Regular Expression Review!**

**Symbol**: an abstract concept that we won't formally describe->(0,a,..)

**Alphabet**: a limited collection of symbols from which we can create greater structures -> ( Σ={0,a,…})

**String**: a juxtaposed, finite series of symbols from a specific alphabet-> (abcd,abd,…)

**Formal language Σ***: a set of all strings that can be created from a given alphabet in formal language-> {set of all string with length2}={ab,ac,ad,….etc}

**Regular Expression Rules!**

Built up from three operators:

**Concatenation** xy

**Alternation** x|y (x or y)

**Repetition** x* (x repeated 0 or more times) **OR **x+ (x repeated 1 or more times)

**Recursive rules :**

Regular expressions can be defined in the recursive rule as:

Every symbol of Σ is a regular expression

ε is a regular expression

if r1 and r2 are regular expressions, so are (r1) r1r2 r1 | r2 r1*

Nothing else is a regular expression.

Related knowledge:

A. V. Aho and A. V. Aho, Eds., *Compilers: principles, techniques, & tools*, 2nd ed. Boston: Pearson/Addison Wesley, 2007.

** **

## Comments