Most computer languages have a regular expression facility that allows you to specify a string of characters you are looking for. There is usually an OR symbol, often |, so A|B matches A or B. Then there is a quantifier usually * which means zero or more. So A* means A, AA, AAA and so on or the null string. There are also symbols which match whole sets of characters. For example, \d specifies any digit, \w specifies any character other than white space or punctuation and \s specifies white space or punctuation. There are usually more symbols so you can match complicated patterns but these three enable you to match a lot of patterns easily. For example, a file name ending in .txt is specified as:
\w*.txt
A file name ending in .txt or .bak as:
\w*.txt|\w*.bak
A name starting with A and ending with Z as:
A\w*Z
and so on.
As already mentioned, regular expressions usually allow many more types of specifiers, far too many to list here. The key point is that a regular expression is another way to specifying a string that a finite state machine can recognize, i.e. it is a regular grammar and this means there are limits on what you can do with it. In particular, you generally can’t use it to parse a real programming language and there is no point in trying.
It is also worth knowing that the "*" operator is often called the Kleene star operator after the logician Stephen Kleene. It is used generally in programming and usually means "zero or more". For example z* means zero or more z characters.
Not in this extract but included in chapter
Other Grammars
Turing Machines
Turing Thinking
Summary
The finite state machine is simple and less powerful than other machines with unbounded storage such as the Turing machine.
Finite state machines accept sequences generated by a regular grammar.
Regular expressions found in most programming languages are a practical implementation of a regular grammar.
The pushdown machine is slightly more powerful than a finite state machine and it accepts sequences generated by a context-free grammar.
A full Turing machine can accept sequences generated by a phrase structured grammar.
The different types of machine and grammars form a hierarchy of increasingly complex languages.
Although a Turing machine is more powerful than a finite state machine, in practice all machines are finite state machines.
A Turing machine with a tape limited to n cells is equivalent to a finite state machine with snx where s is the number of symbols it uses and x is the number of states in its controller.
Even though Turing machines are an abstraction and do not exist, we tend to create algorithms as if they did, making no assumptions about resource limits. If we do think of resource limits they are afterthoughts bolted on to rescue our inherently unbound algorithms.
Algorithms that are unbounded are the pure thought of computer science.
A Programmers Guide To Theory
Now available as a paperback and ebook from Amazon.
Contents
What Is Computer Science? Part I What Is Computable?
Google is making a new differential privacy library available as open source. PipelineDP4J is a Java-based library that can be used to analyse data sets while preserving privacy.
At last ISO C23 has been published, but at $250 you probably aren't going to read it. Can we really tolerate this sort of profiteering on the work of others? This is worse than academic publishing!