Applies To:
ABAP, Netweaver 2004s
Summary
Netweaver 2004s lets you use regular expressions in your ABAP code. However, since not everybody is acquainted with this relatively old technology, this tutorial explains the basics.
By: Eddy De Clercq
Company: Katholieke Universiteit Leuven
Date: 01 April 2006
Having simple tastes in life, I seek happiness in small things. That is also the case when a new SAP release is announced. I was happy when Karl Kessler said that ABAP wasn't locked out from regular expressions. In fact, I was more than satisfied when I saw regular expressions (RE) listed as one of the features of 2004s. Why am I so happy? When I wrote the BSP port for the Honeypot Project a year ago, I moaned about the fact that something as simple as stripping out non-alphanumeric characters couldn't be done in a one-liner. Whereas this can be done within PHP via
    ereg_replace("[^a-zA-Z0-9]","",$contents)
ABAP/BSP needs this code:
    WHILE origStr IS NOT INITIAL.
      IF origStr(1) CA '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.
        CONCATENATE newStr origStr(1) INTO newStr.
      ENDIF.
      origStr = origStr+1(*).
    ENDWHILE.
You cannot call this very elegant, can you? Luckily, with Netweaver 2004s this is now history.
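For comparison, here is a minimal sketch of how the same clean-up might look with the regular expression support in Netweaver 2004s. It assumes the REGEX addition of the REPLACE statement; the report name, the variable names and the sample value are made up for illustration and do not come from the Honeypot port.

```abap
REPORT zregex_strip_demo. " hypothetical report name

DATA contents TYPE string.
DATA empty    TYPE string. " stays initial, so matches are replaced by nothing

contents = 'Hello, World! 123'. " sample value, for illustration only

" Remove every character that is not a letter or a digit,
" using the same character set as the PHP call.
REPLACE ALL OCCURRENCES OF REGEX '[^a-zA-Z0-9]' IN contents WITH empty.

WRITE / contents.
```

With the sample value above, the comma, the exclamation mark and the spaces should be gone afterwards, leaving only letters and digits.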
Mathematics
Most people know regular expressions from Ken Thompson's QED editor, but the godfather of regular expressions is Stephen Kleene, who defined a notation he called the algebra of regular sets. Some of you might know that the * wildcard in a search is also called the Kleene star. Kleene is also known for his work on recursion together with people like Turing. Being a mathematician, he laid the basis for theoretical computer science as we know it. There is a drawback though. As with many powerful things with a small footprint, it can get rather complicated. On top of that, there are different types of regular expressions: Perl, Tcl, Python, etc. each have their own version. SAP makes use of the POSIX variant, a.k.a. modern or extended regular expressions. I agree that it is all a bit confusing, and even I sometimes need to read or think things over twice in order to understand them thoroughly. I therefore decided to write a tutorial that could be useful for the SDN community. Sure, there is already a lot of reference material available. Most of it just sums up the syntax though. I thought it would be nice if things were explained by way of some examples.
Traditional expressions
In this part we will discuss the basic principles. Let's take
    SAP Developers Network is the place to be for SAP Developers.
as the text that you will be looking at. Let's start searching. In the simplest case you just use the text that you're looking for as the search pattern. A pattern is always built up character by character. The regex jargon for this is literal characters. A first example:
    SAP will match the first "SAP" in the text.
It seems obvious that this matches, but it isn't as obvious as you might think at first glance. It depends a bit on the regex engine. There are text-directed engines and regex-directed engines, also known as DFA and NFA engines respectively. In addition, as if that's not enough, there is also POSIX NFA. Since the POSIX standard is supported in NW2004s, we will concentrate on the NFA engine. The basic rule is that it is always the leftmost match that will be returned. That means that the engine starts by trying to match the first character of the search string with the first character in the text that it is searching in. If that doesn't match, it will go on to the next character in the text, and then the next, until it finds a first match. When it finds a match for this first character, it will continue with the next character of the search string, and so on, until all the possible permutations have been carried out. If we apply this to the above example, it works like this. First the S character is checked. It has a match, thus the engine continues with the A of the search string and so forth.
As with many of these things, the search string is case sensitive.
    Sap will not match.
Each character is significant, thus spaces are too.
    SAP Dev will match the leading "SAP Dev".
    SAPDev will not match.
If you don't know which character will follow, you need to use a placeholder character, for example a full stop (.).
The full stop can be used to match any character.
    SAP.Dev will match the leading "SAP Dev", since the full stop stands in for the space.
If you want to search for the full stop itself, you'll need to escape it. Escaping is done via a backslash (\).
    \. will match the full stop at the end of the text.
    .\. will match "s." at the end of the text.
    \.. will not match, since there is no character after the full stop in the text.
You can specify whether things must be at the start (^) or at the end ($) of the line.
    ^SAP will match the "SAP" at the start of the text, but not the second one.
    Developers\.$ will match "Developers." at the end of the text.
    Developers$ will not match, since the full stop, which is at the end of the text, was omitted.
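The patterns discussed so far can be tried out with the FIND statement. The following is only a sketch: it assumes the REGEX addition of FIND as available in Netweaver 2004s, and the report and variable names are hypothetical. FIND sets sy-subrc to 0 when the pattern matches and reports the leftmost match.

```abap
REPORT zregex_basics_demo. " hypothetical report name

DATA sample_text TYPE string.
DATA moff        TYPE i. " offset of the match
DATA mlen        TYPE i. " length of the match

sample_text = 'SAP Developers Network is the place to be for SAP Developers.'.

" Literal characters: the leftmost match is reported.
FIND REGEX 'SAP' IN sample_text MATCH OFFSET moff MATCH LENGTH mlen.
IF sy-subrc = 0.
  WRITE: / 'SAP matched:', sample_text+moff(mlen).
ENDIF.

" The search is case sensitive by default, so Sap finds nothing.
FIND REGEX 'Sap' IN sample_text.
IF sy-subrc <> 0.
  WRITE / 'Sap did not match'.
ENDIF.

" The full stop stands in for any character, here the space.
FIND REGEX 'SAP.Dev' IN sample_text MATCH OFFSET moff MATCH LENGTH mlen.
IF sy-subrc = 0.
  WRITE: / 'SAP.Dev matched:', sample_text+moff(mlen).
ENDIF.

" An escaped full stop only matches a real full stop.
FIND REGEX '.\.' IN sample_text MATCH OFFSET moff MATCH LENGTH mlen.
IF sy-subrc = 0.
  WRITE: / '.\. matched:', sample_text+moff(mlen).
ENDIF.

" Anchors: ^ ties the match to the start, $ to the end.
FIND REGEX '^SAP' IN sample_text MATCH OFFSET moff.
IF sy-subrc = 0.
  WRITE: / '^SAP matched at offset', moff.
ENDIF.

FIND REGEX 'Developers$' IN sample_text.
IF sy-subrc <> 0.
  WRITE / 'Developers$ did not match, the text ends with a full stop'.
ENDIF.
```

Checking sy-subrc after each FIND keeps the sketch close to the "will match" and "will not match" wording used in the examples above.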
As with the full stop and all other special characters, you need to escape ^, $ or \ if you want to look for the character itself. ^ also has another meaning when used in conjunction with square brackets []. Inside those square brackets you can provide a list of characters. The characters between square brackets are often called character sets or character classes. For the purpose of clarity, I will show all matches and not just the first match as I did before.
    [SDN] will match every S, D and N in the text: the S of both "SAP", the D of both "Developers" and the N of "Network".
    [EDC] will match only the D of both "Developers", since E and C do not occur in uppercase; [EC] will not match.
If such a character set starts with ^, the specified characters will not be used to match. In other words, it will match everything but the characters specified.
    [^SDN] will match every character in the text except S, D and N.
In order to avoid having to provide all the characters, as in the old ABAP way, you can define ranges with the hyphen -. ABCDEFGHIJKLMNOPQRSTUVWXYZ can thus be shortened to A-Z, abcdefghijklmnopqrstuvwxyz to a-z (things are case sensitive, remember) and 0123456789 to 0-9.
    [A-Z] will match every uppercase letter in the text: S, A, P, D, N, S, A, P and D.
You can also combine character sets with plain characters.
    [DN]e will match "De" in both "Developers" and "Ne" in "Network".
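To see all matches of a character set rather than just the first one, FIND can be asked for all occurrences. This is a sketch under the assumption that the ALL OCCURRENCES, MATCH COUNT and RESULTS additions and the dictionary types match_result_tab and match_result are available in your release; the report and variable names are hypothetical.

```abap
REPORT zregex_sets_demo. " hypothetical report name

DATA sample_text TYPE string.
DATA cnt         TYPE i.
DATA result_tab  TYPE match_result_tab.
DATA result_wa   TYPE match_result.
DATA moff        TYPE i.
DATA mlen        TYPE i.

sample_text = 'SAP Developers Network is the place to be for SAP Developers.'.

" Count every S, D or N (expected: 5 for the sample text).
FIND ALL OCCURRENCES OF REGEX '[SDN]' IN sample_text MATCH COUNT cnt.
WRITE: / '[SDN] matches:', cnt.

" A range: every uppercase letter (expected: 9 for the sample text).
FIND ALL OCCURRENCES OF REGEX '[A-Z]' IN sample_text MATCH COUNT cnt.
WRITE: / '[A-Z] matches:', cnt.

" A character set combined with a plain character; list each match.
FIND ALL OCCURRENCES OF REGEX '[DN]e' IN sample_text RESULTS result_tab.
LOOP AT result_tab INTO result_wa.
  moff = result_wa-offset.
  mlen = result_wa-length.
  WRITE: / '[DN]e matched:', sample_text+moff(mlen), 'at offset', moff.
ENDLOOP.
```

The loop should list "De", "Ne" and "De" again, one line per match.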
Repetitions
Repetition is a very interesting feature that enables you to specify how many times a character and/or character class needs to be matched. There are a couple of meta characters to enable this:
? indicates that the preceding character is optional, meaning that it can occur 0 times or once.
    Developers? will match "Developers" in the text.
    Developers? would also match "Developer", without the final s.
* indicates that the preceding character can occur any number of times, meaning 0 to n time(s).
    De*v will match "Dev" in "Developers".
    Dea*v will also match "Dev", since the a may occur 0 times. Da*v, on the other hand, will not match, because nothing in the pattern accounts for the e between D and v.
+ indicates that the preceding character needs to occur at least once.
    De+v will match "Dev" in "Developers".
    Da+v will not match.
You can be even more specific in determining how many times a character may occur. This can be done via curly brackets {}. Within these curly brackets, you can set the minimum and maximum occurrences: {X,Y}, where X is the minimum and Y the maximum. Y is optional: {X} means exactly X times and {X,} means at least X times.
    .{10} will match the first ten characters, "SAP Develo".
    [A-Z]{1,5} will match "SAP".
    [A-Z]{5} will not match, since the text never has five uppercase letters in a row.
This means that the previously mentioned meta characters can be replaced by curly brackets.
    Developers{0,1} and Developers? will both match "Developers".
    De{0,}v and De*v will both match "Dev", and De{1,}v is equivalent to De+v.
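The repetition operators can be checked in the same way. The sketch below again assumes FIND with the REGEX addition; the report and variable names are hypothetical. It prints the matched part of the text so that you can see how far each repetition reaches.

```abap
REPORT zregex_repeat_demo. " hypothetical report name

DATA sample_text TYPE string.
DATA moff        TYPE i.
DATA mlen        TYPE i.

sample_text = 'SAP Developers Network is the place to be for SAP Developers.'.

" + requires at least one e between D and v.
FIND REGEX 'De+v' IN sample_text MATCH OFFSET moff MATCH LENGTH mlen.
IF sy-subrc = 0.
  WRITE: / 'De+v matched:', sample_text+moff(mlen).
ENDIF.

" * also accepts zero occurrences, here of the letter a.
FIND REGEX 'Dea*v' IN sample_text MATCH OFFSET moff MATCH LENGTH mlen.
IF sy-subrc = 0.
  WRITE: / 'Dea*v matched:', sample_text+moff(mlen).
ENDIF.

" {10}: exactly ten arbitrary characters.
FIND REGEX '.{10}' IN sample_text MATCH OFFSET moff MATCH LENGTH mlen.
IF sy-subrc = 0.
  WRITE: / '.{10} matched:', sample_text+moff(mlen).
ENDIF.

" {1,5}: one to five uppercase letters in a row.
FIND REGEX '[A-Z]{1,5}' IN sample_text MATCH OFFSET moff MATCH LENGTH mlen.
IF sy-subrc = 0.
  WRITE: / '[A-Z]{1,5} matched:', sample_text+moff(mlen).
ENDIF.

" {5}: the text never has five uppercase letters in a row.
FIND REGEX '[A-Z]{5}' IN sample_text.
IF sy-subrc <> 0.
  WRITE / '[A-Z]{5} did not match'.
ENDIF.
```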
The next step is to group characters so that a repetition applies to the whole group. This can be done via round brackets. On their own, these brackets/parentheses only group; followed by ?, they indicate that the whole group is optional.
    SAP.Dev(elopers)? will match "SAP Developers" at the start of the text.
    It would also match "SAP Dev" on its own.
With these parentheses, you can also specify alternatives via |.
    (SAP|PHP).Dev will match the leading "SAP Dev".
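Groups and alternatives can be tried out as follows. This is a sketch only, assuming FIND with the REGEX addition; the report name, the variable names and the second sample text are hypothetical and only serve to show the optional group and the PHP alternative.

```abap
REPORT zregex_groups_demo. " hypothetical report name

DATA sample_text TYPE string.
DATA other_text  TYPE string.
DATA moff        TYPE i.
DATA mlen        TYPE i.

sample_text = 'SAP Developers Network is the place to be for SAP Developers.'.
other_text  = 'PHP Dev corner'. " made-up text, for illustration only

" The group (elopers) is optional because of the ? that follows it.
FIND REGEX 'SAP.Dev(elopers)?' IN sample_text
     MATCH OFFSET moff MATCH LENGTH mlen.
IF sy-subrc = 0.
  WRITE: / 'Optional group matched:', sample_text+moff(mlen).
ENDIF.

" Alternatives: either SAP or PHP may precede the rest of the pattern.
FIND REGEX '(SAP|PHP).Dev' IN sample_text
     MATCH OFFSET moff MATCH LENGTH mlen.
IF sy-subrc = 0.
  WRITE: / 'Alternative matched:', sample_text+moff(mlen).
ENDIF.

FIND REGEX '(SAP|PHP).Dev' IN other_text
     MATCH OFFSET moff MATCH LENGTH mlen.
IF sy-subrc = 0.
  WRITE: / 'Alternative matched:', other_text+moff(mlen).
ENDIF.
```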
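    ^        Start of the line
    $        End of the line
    []       Character set
    ?        Preceding character occurs 0 times or once
    *        Preceding character occurs 0 to n times
    +        Preceding character occurs at least once
    {X,Y}    Preceding character occurs at least X and at most Y times
    ()       Groups characters
    |        Specifies alternatives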
This doesn't cover everything about regular expressions. There is much more to it, which I will try to cover in a later tutorial.
Author Bio
Eddy De Clercq has 20 years' experience in computing. He currently works at the Katholieke Universiteit Leuven, the oldest university of the Low Countries and the largest Flemish university. Eddy is a member of the E-university team that creates self-service (web) applications.
Copyright 2005 SAP AG, Inc. All Rights Reserved. SAP, mySAP, mySAP.com, xApps, xApp, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other product, service names, trademarks and registered trademarks mentioned are the trademarks of their respective owners.