
Express Yourself Regularly

Applies To:
ABAP, Netweaver 2004s

Summary
One can use regular expressions in one's code with Netweaver 2004s. However, since not everybody is acquainted with this relatively old technology, this tutorial will try to explain the basics.

By: Eddy De Clercq
Company: Katholieke Universiteit Leuven
Date: 01 April 2006

Having simple tastes in life, I seek happiness in small things. That is also the case when a new SAP release is announced. I was happy when Karl Kessler said that ABAP wasn't locked out from regular expressions. In fact, I was more than satisfied when I saw regular expressions (RE) listed as one of the features of 2004s. Why am I so happy? When I wrote the BSP port for the Honeypot Project a year ago, I moaned about the fact that something as simple as stripping out non-alphanumeric characters couldn't be done in a one-liner. Whereas in PHP this can be done with

    ereg_replace("[^a-zA-Z0-9]", "", $contents)

ABAP/BSP needs this code:

    DATA: origStr TYPE string,   " input, consumed by the loop
          newStr  TYPE string.   " result: letters and digits only
    WHILE origStr IS NOT INITIAL.
      " keep the first character only if it is a digit or an ASCII letter
      IF origStr(1) CA '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.
        CONCATENATE newStr origStr(1) INTO newStr.
      ENDIF.
      " drop the character that has just been checked
      SHIFT origStr LEFT BY 1 PLACES.
    ENDWHILE.

You cannot call this very elegant, can you? Luckily, with Netweaver 2004s this is now history.
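For comparison outside an SAP system, here is a minimal Python sketch of both approaches: a loop that mirrors the ABAP code above, and the regex one-liner that corresponds to the PHP ereg_replace call. Python's re module is only a convenient stand-in here, and the function names are hypothetical. In ABAP itself, from NW2004s on, the one-liner should correspond to REPLACE ALL OCCURRENCES OF REGEX '[^a-zA-Z0-9]' IN origStr WITH '' (check the keyword documentation of your release).

```python
import re

ALLOWED = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def strip_loop(text: str) -> str:
    """Character-by-character version, mirroring the ABAP WHILE loop."""
    result = ""
    while text:                 # WHILE origStr IS NOT INITIAL
        if text[0] in ALLOWED:  # IF origStr(1) CA '...'
            result += text[0]   # CONCATENATE newStr origStr(1) INTO newStr
        text = text[1:]         # drop the character just checked
    return result


def strip_regex(text: str) -> str:
    """One-liner: [^...] is a negated class, so every other character becomes ''."""
    return re.sub(r"[^a-zA-Z0-9]", "", text)


sample = "SAP Developers Network, NW 2004s!"
print(strip_loop(sample))   # SAPDevelopersNetworkNW2004s
print(strip_regex(sample))  # SAPDevelopersNetworkNW2004s
```

Both functions give the same result; the regex version simply moves the character test and the loop into the engine.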

Mathematics
Most people know regular expressions from Ken Thompson's QED editor, but the godfather of regular expressions is Stephen Kleene, who defined a notation he called the algebra of regular sets. Some of you might know that the * wildcard in a search is also called the Kleene star. Kleene is also known for his work on recursion theory, alongside people like Turing. Being a mathematician, he helped lay the foundations of theoretical computer science as we know it. There is a drawback, though. As with many small but powerful things, regular expressions can get rather complicated. On top of that, there are different flavours of regular expressions: Perl, Tcl, Python, etc. each have their own version. SAP makes use of the POSIX variant, a.k.a. modern (extended) regular expressions.

I agree that it is all a bit confusing, and even I sometimes need to read and think things over twice in order to understand them thoroughly. I therefore decided to write a tutorial that could be useful for the SDN community. Sure, there is already a lot of reference material available, but most of it just sums up the semantics. I thought it would be nice if things were explained by way of some examples.

2005 SAP AG

Traditional expressions
In this part we will discuss the basic principles. Let's take "SAP Developers Network is the place to be for SAP Developers." as the text that you will be searching in. To start searching, you just put the text that you're looking for in the search pattern. A pattern is always built up character by character; the regex jargon for these is literal characters.

A first example: SAP will match in "SAP Developers Network is the place to be for SAP Developers." It seems obvious that this matches, but it isn't as obvious as you might think at first glance, as it depends a bit on the regex engine. There are text-directed engines and regex-directed engines, also known as DFA and NFA engines respectively. And if that's not enough, there is also POSIX NFA. Since the POSIX standard is supported in NW2004s, we will concentrate on the NFA engine. The basic rule is that it is always the leftmost match that will be returned. The engine starts by trying to match the first character of the pattern with the first character of the text it is searching in. If that doesn't match, it goes on to the next character of the text, and then the next, until it finds a match for this first character. It then continues with the next character of the pattern, and so on, until either the whole pattern has matched or it has run out of possibilities at that position; in the latter case it moves one character further in the text and tries again. Applied to the example: first the S is checked. It matches, so the engine continues with the A of the pattern, and so forth, and finally returns the SAP at the very start of the text.

As with many of these things, the search is case sensitive. Each character is significant, so spaces are too:

    Sap        will not match
    SAP Dev    will match
    SAPDev     will not match

If you don't know which character will follow, you can use a wildcard character in its place: the full stop (.).
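The examples so far use only literal characters, so they are easy to check with Python's re module. This is a minimal sketch, not ABAP: Python's engine is a backtracking engine in the Perl tradition rather than the POSIX engine NW2004s uses, but for literal patterns like these the results are the same. TEXT and leftmost are illustrative names.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."


def leftmost(pattern: str):
    """Return (offset, matched text) of the leftmost match, or None if there is none."""
    m = re.search(pattern, TEXT)
    return (m.start(), m.group()) if m else None


for pattern in ["SAP", "Sap", "SAP Dev", "SAPDev"]:
    print(repr(pattern), "->", leftmost(pattern))

# re.search stops at the leftmost match; re.finditer keeps scanning for the rest.
print([m.start() for m in re.finditer("SAP", TEXT)])  # [0, 46]
```

Note that only the first SAP (offset 0) is returned by the search itself, even though a second SAP occurs later in the text.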
The full stop can be used to match any single character:

    SAP.Dev          will match SAP Dev, since the full stop matches the space
    .                will match any character; the leftmost match is the S at the start of the text

If you want to search for the full stop itself, you'll need to escape it. Escaping is done with a backslash (\):

    \.               will match only the full stop at the end of the text
    .\.              will match s. (the last character of Developers followed by the full stop)
    \..              will not match, since there is no character after the full stop in the text

You can specify whether things must be at the start (^) or at the end ($) of the line:

    ^SAP             will match the SAP at the start of the text, but not the second SAP
    Developers\.$    will match Developers. at the end of the text
    Developers$      will not match, since the full stop, which is at the end of the text, was omitted
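The wildcard, escape and anchor examples can be replayed the same way. Again this is a sketch in Python's re module, assumed to behave like the ABAP engine for these simple patterns; first_match and EXAMPLES are illustrative names.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."

EXAMPLES = [
    r"SAP.Dev",        # . stands for any character, here the space
    r".",              # any character: the leftmost match is the first S
    r"\.",             # escaped: only a literal full stop
    r".\.",            # any character followed by a literal full stop
    r"\..",            # a full stop followed by one more character
    r"^SAP",           # SAP, but only at the start of the line
    r"Developers\.$",  # Developers. at the end of the line
    r"Developers$",    # fails: the text ends with a full stop, not with an s
]


def first_match(pattern: str, text: str = TEXT):
    """Return the leftmost matched text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


for pattern in EXAMPLES:
    print(f"{pattern:16} -> {first_match(pattern)!r}")
```

Raw strings (r"...") keep Python from interpreting the backslash itself, so the regex engine receives \. exactly as written in the tutorial.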

As with the full stop, and all other special characters, you need to escape ^, $ or \ if you want to look for the character itself.

^ also has another meaning when used in conjunction with square brackets []. Inside those square brackets you can provide a list of characters, any one of which may match. The characters between square brackets are often called character sets or character classes. For the purpose of clarity, I will from now on show all matches, not just the first match as I did before.

    [SDN]     will match the S of both occurrences of SAP, the D of both occurrences of Developers and the N of Network
    [EDC]     will only match the D of both occurrences of Developers, since there is no uppercase E or C in the text

If such a character set starts with ^, the specified characters will not be matched. In other words, it'll match everything but the characters specified:

    [^SDN]    will match every other character of the text, spaces and the full stop included

To avoid having to provide all the characters, as in the old ABAP way, you can define ranges with the hyphen (-). ABCDEFGHIJKLMNOPQRSTUVWXYZ can thus be shortened to A-Z, abcdefghijklmnopqrstuvwxyz to a-z (things are case sensitive, remember) and 0123456789 to 0-9.

    [A-Z]     will match S, A, P, D, N, S, A, P and D, i.e. every uppercase letter in the text

You can also combine character sets with plain characters:

    [DN]e     will match De (in both occurrences of Developers) and Ne (in Network)
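Because character sets usually match in several places, re.findall, which returns every non-overlapping match from left to right, is the natural way to list them. A minimal Python sketch, with the same caveat that Python's re only stands in for the ABAP engine here:

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."

# re.findall returns every non-overlapping match, from left to right.
for pattern in [r"[SDN]", r"[EDC]", r"[A-Z]", r"[DN]e"]:
    print(f"{pattern:7} -> {re.findall(pattern, TEXT)}")

# A leading ^ inside the brackets negates the set: anything except S, D or N.
others = re.findall(r"[^SDN]", TEXT)
print(len(others), "of", len(TEXT), "characters match [^SDN]")
```

The negated set leaves out exactly the five characters that [SDN] finds, which is why the two counts add up to the length of the text.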


Repetitions
Repetition is a very interesting feature that enables you to specify how many times a character and/or character class needs to be matched. There are a couple of metacharacters to enable this.

? indicates that the preceding character is optional, meaning that it can occur 0 times or once:

    Developers?    will match both occurrences of Developers, and would equally match Developer without the s

* indicates that the preceding character can occur any number of times, meaning 0 to n times:

    De*v           will match Dev (twice)
    Dea*v          will also match Dev, because a* may occur zero times
    Da*v           will not match: even with zero a's, the e still stands between D and v

+ indicates that the preceding character needs to occur at least once:

    De+v           will match Dev (twice)
    Da+v           will not match
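The difference between ?, * and + is easiest to see side by side. This Python sketch lists all matches per pattern; the extra sample text "SAP Developer Day" is a hypothetical string added only to show the optional s, and Python's re again stands in for the ABAP engine.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."

CASES = [
    (r"Developers?", TEXT),                 # s optional: matches Developers
    (r"Developers?", "SAP Developer Day"),  # ...and Developer without the s
    (r"De*v", TEXT),    # e zero or more times
    (r"Dea*v", TEXT),   # a* may match nothing, so Dev still matches
    (r"Da*v", TEXT),    # no match: the e still separates D and v
    (r"De+v", TEXT),    # e at least once
    (r"Da+v", TEXT),    # no match: there is no a after the D
]

for pattern, text in CASES:
    print(f"{pattern:12} in {text!r}: {re.findall(pattern, text)}")
```

An empty list means the pattern did not match anywhere in that text.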

You can be even more specific in determining how many times a character may occur. This can be done with curly brackets {}. Within these curly brackets you can set the minimum and maximum number of occurrences: {X,Y}, where X is the minimum and Y the maximum. Y is optional, so {X} (exactly X times) and {X,} (at least X times) are allowed too.

    .{10}          will match SAP Develo, the first ten characters of the text
    [A-Z]{1,5}     will match SAP, D, N, SAP and D
    [A-Z]{5}       will not match, since the text never contains five uppercase letters in a row

This means that the previously mentioned metacharacters can also be written with curly brackets:

    Developers{0,1} and Developers?    will both match Developers
    De{0,}v and De*v                   will both match Dev
    De{1,}v and De+v                   will both match Dev

Note that De{0}v is not equivalent to De*v: {0} means exactly zero times, so it only looks for Dv and finds nothing in this text.
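The curly-bracket forms and their shorthand equivalents can be compared directly. A Python sketch under the same assumption that re behaves like the ABAP engine for these quantifiers; EQUIVALENT is an illustrative name.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."

print(re.search(r".{10}", TEXT).group())   # exactly ten characters
print(re.findall(r"[A-Z]{1,5}", TEXT))     # runs of one to five capitals
print(re.findall(r"[A-Z]{5}", TEXT))       # exactly five capitals in a row

# Each shorthand quantifier has a curly-bracket equivalent.
EQUIVALENT = [
    (r"Developers?", r"Developers{0,1}"),
    (r"De*v", r"De{0,}v"),
    (r"De+v", r"De{1,}v"),
]
for short, braces in EQUIVALENT:
    same = re.findall(short, TEXT) == re.findall(braces, TEXT)
    print(f"{short:12} == {braces:16} -> {same}")

# {0} means exactly zero times, so De{0}v looks for "Dv" and finds nothing.
print(re.findall(r"De{0}v", TEXT))
```

Since the quantifiers are greedy, [A-Z]{1,5} takes SAP as one match instead of three single letters.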


The next step is to group characters, so that a repetition applies to the whole group. This is done with round brackets (parentheses). Note that the parentheses themselves only group; to make a group optional, follow it with ?.

    SAP.Dev(elopers)     will match both occurrences of SAP Developers
    SAP.Dev(elopers)?    would also match a plain SAP Dev, since the whole group is now optional

With these parentheses, you can also specify alternatives via |:

    (SAP|PHP).Dev        will match both occurrences of SAP Dev

Summing it up

I want to finish with an overview of the things covered in this tutorial.

    A character    Will match if the character matches. Characters are case sensitive.
    .              Is a replacement for a single character. If you want to find a full stop, you need to escape it with a backslash (\).
    ^              Will match at the start of the line.
    $              Will match at the end of the line.
    []             A list of characters, one of which needs to be matched.
    [A-Z]          A range of characters.
    ?              Optional character: can occur 0 times or once.
    *              Can occur 0 to n times.
    +              Has to occur at least once.
    {X,Y}          Has to occur a minimum of X and a maximum of Y times. Y is optional.
    ()             Group of characters.
    |              Specifies alternatives.
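Grouping and alternation, the last two rows of the overview, can be tried out as well. In this Python sketch, all_matches is a hypothetical helper; it uses re.finditer because re.findall would return only the contents of the group rather than the whole match. The extra strings "SAP Dev", "PHP Developers" and "xxababab" are illustrative inputs, and Python's re again stands in for the ABAP engine.

```python
import re

TEXT = "SAP Developers Network is the place to be for SAP Developers."


def all_matches(pattern: str, text: str = TEXT) -> list:
    """Whole-match texts (re.findall would return only the group contents here)."""
    return [m.group(0) for m in re.finditer(pattern, text)]


print(all_matches(r"SAP.Dev(elopers)"))              # the group is required
print(all_matches(r"SAP.Dev(elopers)?", "SAP Dev"))  # ? makes the whole group optional
print(all_matches(r"(SAP|PHP).Dev"))                 # alternatives inside a group
print(all_matches(r"(SAP|PHP).Dev", "PHP Developers"))
print(all_matches(r"(ab)+", "xxababab"))             # + repeats the whole group
```

The last line shows why grouping matters for repetitions: + applies to ab as a unit, not just to the b.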

This doesn't cover everything about regular expressions. There is much more to it, which I will try to cover in a later tutorial.

Disclaimer & Liability Notice


This document may discuss sample coding, which does not include official interfaces and therefore is not supported. Changes made based on this information are not supported and can be overwritten during an upgrade. SAP will not be held liable for any damages caused by using or misusing the code and methods suggested here, and anyone using these methods does so under his/her own responsibility. SAP offers no guarantees and assumes no responsibility or liability of any type with respect to the content of the technical article, including any liability resulting from incompatibility between the content of the technical article and the materials and services offered by SAP. You agree that you will not hold SAP responsible or liable with respect to the content of the Technical Article or seek to do so.

Author Bio
Eddy De Clercq has 20 years of experience in computing. He currently works at the Katholieke Universiteit Leuven, the oldest university in the Low Countries and the largest Flemish university. Eddy is a member of the E-university team that creates self-service (web) applications.


Copyright 2005 SAP AG, Inc. All Rights Reserved. SAP, mySAP, mySAP.com, xApps, xApp, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other product, service names, trademarks and registered trademarks mentioned are the trademarks of their respective owners.
