Welcome to Linux Forums! With a comprehensive Linux Forum, information on various types of Linux software and many Linux Reviews articles, we have all the knowledge you need a click away, or accessible via our knowledgeable members.


This article describes some applications that support regular expressions and shows a few simple ways to use them. It also mentions some interesting extensions for different types of applications. Simply knowing about them gives a good idea of where regular expressions can be found and how they might be put to use.

All programmers encounter regular expressions sooner or later. Pattern matching is common to nearly every programming language, and the implementations differ mainly in syntax. Programmers were the first reason these functions became common in many different types of applications, though I can't say much about that side of the story. For us users, regular expressions are not something we usually stumble upon knowingly. To be exact, we encounter code that relies on them many times without being aware of it. When was the last time you registered on a forum or another site and accidentally mistyped your email address? Did the site tell you the address was wrong? If it did, the code checked whether what you typed follows the general format of an email address (it cannot know whether the address actually exists). The check running in the background could be something like: ^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$

The pattern says nothing more than this: starting at the first character (^), there can be almost any character ([A-Za-z0-9._%-]), any number of them as long as there is at least one (+), and they must be followed by the at sign (@). After this sign there can again be almost any character ([A-Za-z0-9.-]), again any number of them but never zero (+). What exactly these characters are hardly matters, but in the end they have to be followed by a dot (\., written with a backslash because a bare . stands for any character), after which there are only letters ([A-Za-z]), no fewer than two and no more than four ({2,4}). Finally, the expression says the string must end there ($). Easy, right?
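To see the pattern at work, here is a minimal sketch in Python (chosen only for illustration; the article does not prescribe a language). It uses the standard re module, escapes the dot before the top-level domain, and uses fullmatch so the whole input must match. The function name looks_like_email is hypothetical. Like the original pattern, it only checks the shape of an address, and its {2,4} limit rejects longer top-level domains such as .museum.

```python
import re

# The address pattern from the text. The dot before the top-level domain is
# escaped (\.) so it means a literal dot rather than "any character".
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")


def looks_like_email(address: str) -> bool:
    """Return True if address has the rough shape the pattern describes.

    fullmatch() insists that the whole string matches; with match() alone,
    Python's $ would also accept a trailing newline. Only the format is
    checked: nothing here can tell whether the mailbox actually exists.
    """
    return EMAIL_PATTERN.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ("john.doe@example.com", "john.doe@example", "john doe@example.com"):
        print(candidate, looks_like_email(candidate))
```

Without the backslash, a typo such as john.doe@examplexcom would slip through, because the unescaped dot happily matches the x.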

For us, users of operating systems, programming techniques seem not only too cryptic but also of little use in our everyday work. Yet the same technique could be useful in, say, spreadsheets or databases, for example when we keep a list of clients along with their email addresses. We need ready-made usability, which is given to us to some extent, but mostly we have to work things out ourselves when the feature we need is missing. The same applies to regular expressions. We search a lot for text strings and other data, so why not use advanced patterns and make our lives easier? This logic opens up a whole bundle of opportunities for the applications that support them. Mostly these are applications made for comparing and editing text, or programs made for searching and otherwise managing your system. If we start deep enough, below the graphical user interface, we find the tool that was designed for exactly such functions: grep.

Grep is a command that searches the given input files (or standard input if no files are given) for lines of text that contain a match to the given pattern. By default, grep prints the matching lines. In the commands below, grep prints all the lines that contain search_string, first in the file named somefile and then in every file in the current directory (the shell expands * to their names).

grep search_string somefile
grep search_string *

As you can see, grep is used for searching strings of text. It is more than a simple yes-or-no search: besides telling you which files contain the search string (when several files are searched, each matching line is prefixed with its file name), it prints the matching lines themselves, and with the -n option their line numbers too. Even though we may prefer a graphical interface, grep is very important, especially when changes to the system have to be made without one. Sometimes it is enough just to know that the option exists in case of emergency, and looking back, most of us have probably already needed its help. There is also an extended version of grep, called egrep (equivalent to grep -E), that supports a few additional metacharacters in its regular expressions, such as +, ? and |.
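The core of what grep does can be sketched in a few lines of Python. This is only a rough imitation of grep -n, assuming Python 3 and its re module: the names grep_lines and grep and the script name minigrep.py are hypothetical, Python's regular expression dialect differs from grep's basic and extended syntax, and none of grep's many options are implemented.

```python
import re
import sys
from typing import Iterable, List


def grep_lines(regex: "re.Pattern", name: str, lines: Iterable[str]) -> List[str]:
    """Return "name:line_number:text" for every line the regex matches."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # search() looks for a match anywhere in the line, as grep does.
        if regex.search(line):
            text = line.rstrip("\n")
            hits.append(f"{name}:{number}:{text}")
    return hits


def grep(pattern: str, paths: List[str]) -> List[str]:
    """Search each file in paths, roughly like `grep -n pattern file...`."""
    regex = re.compile(pattern)
    hits = []
    for path in paths:
        # errors="replace" keeps oddly encoded files from stopping the search.
        with open(path, encoding="utf-8", errors="replace") as handle:
            hits.extend(grep_lines(regex, path, handle))
    return hits


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python minigrep.py PATTERN FILE...
    for hit in grep(sys.argv[1], sys.argv[2:]):
        print(hit)
```

Unlike real grep, this sketch always prefixes the file name, even when only one file is searched.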

Another important application run from the command line is GNU Midnight Commander (MC), a file manager for free operating systems. It works on a wide range of operating systems that are to some extent *nix compatible. Many users consider it the most important piece of software bridging the gap between the shell and a basic user interface. It supports wildcards, a simpler pattern language related to regular expressions, when working with files, and real regular expressions when editing text in its built-in editor or when searching for files on your system. Another important application promoted by many users is the vi editor, a screen-based text editor with many powerful features and strong support for regular expressions.
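The relationship between wildcards and regular expressions can be made concrete with Python's standard fnmatch module, which translates a shell-style wildcard into an equivalent regular expression. This sketch assumes Python 3; the file names are made up, Midnight Commander's own wildcard handling may differ in details, and the exact text that translate() produces varies between Python versions, so only its matching behavior is relied on here.

```python
import fnmatch
import re

# Shell-style wildcards: * matches any run of characters, ? exactly one.
# fnmatch.translate() rewrites a wildcard as an equivalent regular expression.
wildcard = "report_??.txt"
wildcard_regex = re.compile(fnmatch.translate(wildcard))

names = ["report_01.txt", "report_1.txt", "report_01.txt.bak", "report_2024.txt"]
wildcard_hits = [name for name in names if wildcard_regex.match(name)]

# A regular expression can say things a plain wildcard cannot, for example
# "one or more digits, and nothing else, between report_ and .txt".
digits_regex = re.compile(r"report_[0-9]+\.txt")
regex_hits = [name for name in names if digits_regex.fullmatch(name)]

print(fnmatch.translate(wildcard))  # exact text varies between Python versions
print(wildcard_hits)
print(regex_hits)
```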

If we move up to the graphical interface, the idea of using regular expressions to search for text can be found in other file managers too. Krusader and GNOME Commander, twin-panel file managers for graphical desktops, support regular expressions for the same reason grep does: searching for strings of text, whether in file names or inside files. Krusader, for instance, takes the same approach as grep and can show where in the file the string was found. The matching text is listed for on-the-fly review, which lets users refine the search pattern if the current one does not give exactly what they expected. Krusader can even send search results to a virtual folder, which is basically a list of files presented as if they belonged to one folder. This way we can work with only the files we want, and it is easier to navigate through the files that contain the searched pattern. When opening one of these files, we should not be surprised that Krusader's internal editor also supports regular expressions. On top of all that, this file manager bundles the Krename module, which uses search patterns for matching and renaming multiple files at once. Krename is also a standalone application, so you can use it from any file manager.
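The idea behind pattern-based multi-renaming can be sketched with Python's re.sub and a backreference. This is not Krename's own pattern or replacement syntax; plan_renames and the photo names are hypothetical, and the sketch only computes a preview without renaming anything on disk.

```python
import re
from typing import Dict, Iterable


def plan_renames(names: Iterable[str], pattern: str, replacement: str) -> Dict[str, str]:
    """Map each matching old name to its new name without touching the disk.

    A multi-rename tool typically shows such a preview before applying it.
    Names the pattern does not match are left out of the plan.
    """
    regex = re.compile(pattern)
    return {name: regex.sub(replacement, name) for name in names if regex.search(name)}


# Hypothetical example: "IMG_0001.JPG" becomes "holiday-0001.jpg".
# (\d+) captures the digits, and \1 puts them back into the new name.
photos = ["IMG_0001.JPG", "IMG_0002.JPG", "notes.txt"]
plan = plan_renames(photos, r"^IMG_(\d+)\.JPG$", r"holiday-\1.jpg")
for old, new in plan.items():
    print(f"{old} -> {new}")
```

Before actually renaming (for example with os.rename), a careful tool would also check that no two old names map to the same new name.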

Turning to more productive uses, another important program supporting regular expressions is the full-featured office suite OpenOffice.org. OpenOffice.org is a multiplatform, multilingual open-source office suite compatible with other suites. It lets you use regular expressions in your own language and in the same way on different operating systems, which matters because many users want cross-platform compatibility. You can find regular expressions in the search and replace dialog, after pressing the "More Options" button. Using them in OpenOffice.org can be very useful, since documents can have tens or hundreds of pages, and there is no way to check every occurrence of some pattern by hand, without errors, in reasonable time. The same functions can be found in KOffice, GNOME Office and other text editors. Diff applications, which check and compare the contents of two files, are also very useful, and finding regular expressions there is not unexpected. One among many is the Meld diff and merge tool, which can compare three documents simultaneously. Other similar applications share the same roots.
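A typical document-wide search and replace can be sketched with Python's re.subn, which also reports how many replacements were made. The sample text is made up, and the replacement syntax in OpenOffice.org's dialog may differ from Python's \1 backreference; the sketch only illustrates the idea of one pattern catching several spellings at once.

```python
import re

text = (
    "Send an e-mail to the office. Your E-mail address is required.\n"
    "We answer every email within a day."
)

# One pattern covers "email", "e-mail", "Email" and "E-mail":
# ([Ee]) captures the first letter in either case, -? makes the hyphen
# optional, and \b keeps it from matching inside longer words like "emails".
pattern = re.compile(r"\b([Ee])-?mail\b")

# \1 puts the captured letter back, so capitalization is preserved.
# subn() returns the new text together with the number of replacements.
normalized, count = pattern.subn(r"\1mail", text)
print(count)  # 3
print(normalized)
```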

Web developers can use powerful search features in the Bluefish editor. Bluefish is an editor for more experienced web designers and programmers. It supports many programming and markup languages with syntax highlighting, and is aimed especially at building dynamic and interactive websites. Cleaning up an HTML file made with some WYSIWYG editor, removing unwanted tags or formatting and moving them into cascading style sheets instead, becomes a matter of minutes. In CSS itself, regular expressions allow fast replacement of colors, fonts or whole formatting strings. Not using margins? No problem: in CSS a margin declaration starts with the word margin and, if the syntax is correct, always ends with a semicolon. Removing all margin declarations, whatever their values, needs only a simple regular expression: margin:.*;

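Again, this is nothing more than searching for the literal string the declaration begins with (margin:), followed by any character (.) repeated any number of times, including zero (*), up to a semicolon (;). Note that * is greedy: it matches as much as possible, so on a line such as margin: 0; padding: 2px; the match runs to the last semicolon and removes the padding too. This is harmless when each declaration sits on its own line; otherwise margin:[^;]*; (any characters except a semicolon) stops at the first semicolon. The pattern also leaves margin-top and the other side-specific properties alone, since a hyphen, not a colon, follows the word margin there.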
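The difference between the greedy pattern and the safer one is easy to check with Python's re module. Bluefish uses its own regular expression engine, so this is only an illustration of the behavior typical of such engines; the CSS snippet is made up.

```python
import re

css = "p { margin: 0 auto; padding: 2px; }\nh1 { margin-top: 1em; color: red; }"

# Greedy: .* runs to the LAST semicolon on the line, so padding goes too.
greedy = re.sub(r"margin:.*;", "", css)

# [^;]* cannot cross a semicolon, so only the margin declaration is removed.
safer = re.sub(r"margin:[^;]*;", "", css)

print(greedy)
print(safer)
# In both cases margin-top survives: a hyphen, not a colon, follows "margin".
```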

You can find regular expressions in many common applications. Some of them come with a simple tutorial, which may be good for learning the basics and theory, but when you actually use them, additional help is welcome. When you are starting out, you need help building expressions, and one of the best tutoring applications is kregexpeditor, which is part of KDE-utils. If you are running KDE, the program may already be installed on your system. A multiplatform application that works on any operating system supporting Java is jregexptester, and if you just want to quickly check something you put together, there are also online regexp testers such as reWork, the regular expressions workbench, which was designed not to do the job itself but to test and evaluate the search string you are using.

Poking around the net, you will find many interesting applications that have adopted pattern-based string search, including some you could never have dreamt of. One example is the Firefox web browser. Its extension system supports a few extraordinary add-ons that use regular expressions for different tasks, among them an extension that helps you test regular expressions. The Regular Expressions Tester uses a three-panel view: the search string, the text window and the results panel. Applying the search string lists possible matches instantly. The author is also preparing replace functionality, which will probably be integrated in the next version, and a submatching feature is planned for the future.
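The three panels map naturally onto a function that takes a search string and a text and returns the results, including submatches (the parts captured by parenthesized groups). Here is a minimal Python sketch; it assumes nothing about how the Firefox extension itself is implemented, and list_matches is a hypothetical name.

```python
import re
from typing import List, Optional, Tuple

MatchInfo = Tuple[int, str, Tuple[Optional[str], ...]]


def list_matches(pattern: str, text: str) -> List[MatchInfo]:
    """Return (start offset, matched text, submatches) for every match.

    The submatches are the parts captured by parenthesized groups;
    a group that did not take part in a match shows up as None.
    """
    return [(m.start(), m.group(0), m.groups()) for m in re.finditer(pattern, text)]


# Panel one: the search string. Panel two: the text. Panel three: the results.
for start, whole, groups in list_matches(r"(\d{4})-(\d{2})", "Released 2006-03, updated 2007-11."):
    print(start, whole, groups)
```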

The importance of regular expressions can also be seen in the ways programmers try to enrich their software. The Krusader Krew, for example, is considering support for regular expressions in common file management functions like sort, select, delete, copy, make directory and similar. For advanced users this is a useful feature not found in many file managers. It would probably change the whole usability paradigm a bit, but as the philosophy states: anything, as long as it is fast and reliable. The call for such features comes from developers and administrators of computer systems who need to work with many files following some pattern and have been forced to execute each command by hand, one by one, or to use the command line. Given the existing multi-rename support, such extended implementations do not sound far off.
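Selecting files by a regular expression, the first step of any such feature, can be sketched in Python with pathlib. This is not Krusader's design, which the article describes only as under consideration; matching_names, select_files and the log file names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List


def matching_names(names: Iterable[str], pattern: str) -> List[str]:
    """Return the names that match pattern completely, in sorted order."""
    regex = re.compile(pattern)
    return sorted(name for name in names if regex.fullmatch(name))


def select_files(directory: str, pattern: str) -> List[Path]:
    """Select the regular files in directory whose names match pattern."""
    files = {path.name: path for path in Path(directory).iterdir() if path.is_file()}
    return [files[name] for name in matching_names(files, pattern)]


# Hypothetical example: pick rotated logs such as access.log.1, access.log.10, ...
# but not access.log itself or error.log.1.
print(matching_names(["access.log", "access.log.1", "access.log.10", "error.log.1"],
                     r"access\.log\.[0-9]+"))
```

Each selected path could then be passed to shutil.copy2 or Path.unlink. Note that the sort is plain string order, so access.log.10 comes before access.log.2.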

We, the regular users, can probably agree that regular expressions mostly look like something we have no use for. On the other hand, I'm convinced that merely playing with them will give you a real impression of their power and of what you can do with them in ways not yet imagined. The more you play, the more confident you'll become in using them in various situations. This statement is really not far-fetched: searching user forums and newsgroups shows that users have many ideas for making them useful. The Thunderbird email client, for example, will probably have to consider adding basic support for regexp filtering and matching, since there have been quite a few requests for such functionality. Users also find them useful for spam filtering. On the other side, the Sylpheed Claws email client, built for those who want to stay on the cutting edge of development, reports that extended regular expressions are supported for filtering messages.
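Regular expression mail filtering boils down to a list of rules, each pairing a header with a pattern and a target folder. The Python sketch below shows that idea only; it does not reproduce Thunderbird's or Sylpheed Claws' actual filter format, Python's re dialect is not the same as POSIX extended regular expressions, and the rules, choose_folder and the addresses are hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical filter rules: (header name, pattern, target folder).
# Rules are tried in order and the first match wins.
RULES = [
    ("Subject", re.compile(r"\b(lottery|winner|prize)\b", re.IGNORECASE), "Spam"),
    ("From", re.compile(r"@lists\.example\.org>?$"), "Mailing lists"),
]


def choose_folder(headers: Dict[str, str]) -> Optional[str]:
    """Return the folder for a message, or None to leave it in the inbox."""
    for header, regex, folder in RULES:
        if regex.search(headers.get(header, "")):
            return folder
    return None


print(choose_folder({"Subject": "You are a WINNER!", "From": "promo@spam.example"}))
print(choose_folder({"Subject": "Build report", "From": "Dev <dev@lists.example.org>"}))
```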

There is really nothing more to say. Hopefully you have come to appreciate the wide variety of uses and the breadth of different approaches to using regular expressions. Since you've reached the end of this article, maybe you are thinking about giving them a try, if you haven't already. There are great tutorials available and many possible ways of using them, but they cannot be mastered on paper alone. Sometimes it pays to get your hands dirty. I say go for it.
