On the use of Regular Expressions for Searching Text

Charles L. A. Clarke and Gordon V. Cormack
Department of Computer Science
University of Waterloo, Waterloo, Canada

Technical Report CS-95-07
February 15, 1995

ABSTRACT

The use of regular expressions to search text is a well-known and well-understood technique. It is therefore surprising that the standard techniques and tools prove to be of limited use for searching text formatted with SGML or similar markup languages. Experience with structured text search has led us to re-examine current practice carefully. The generally accepted rule of "left-most longest match" is an unfortunate choice and lies at the root of the difficulties. We instead propose a rule that is semantically cleaner and, incidentally, simpler and more efficient to implement. This rule is generally applicable to any text search application.
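
As an illustration of the difficulty the abstract describes, the following minimal sketch shows how a match that extends as far as possible behaves on marked-up text. It is not taken from the report: the sample text and the `title` element are hypothetical, and Python's `re` module is a backtracking matcher rather than a strict left-most longest one, although for this particular pattern its greedy quantifier yields the same result. The non-greedy variant is shown only for contrast and should not be read as the rule the report proposes.

```python
import re

# Hypothetical SGML-like fragment with two sibling elements (an assumption).
text = "<title>Regular Expressions</title><title>Searching Text</title>"

# Greedy quantifier: the match starts at the left-most position and extends
# as far as possible, so it swallows both elements as a single match.
greedy = re.search(r"<title>.*</title>", text)

# Non-greedy quantifier, for contrast only: the match stops at the first
# closing tag and covers just the first element.
lazy = re.search(r"<title>.*?</title>", text)

print(greedy.group(0))
print(lazy.group(0))
```

The greedy match spans from the first start tag to the last end tag, which is rarely what a search over structured text intends.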