Oracle / PLSQL: REGEXP_SUBSTR Function

This article is written about how to use the Oracle/PLSQL REGEXP_SUBSTR function with syntax and examples.

Description

The Oracle/PLSQL REGEXP_SUBSTR feature is an extension of the SUBSTR function. This function, brought in Oracle 10g, will enable you to extract a substring from a string the use of normal expression pattern matching.

Syntax

The syntax for the REGEXP_SUBSTR function in Oracle is:

REGEXP_SUBSTR( string, pattern [, start_position [, nth_appearance [, match_parameter [, sub_expression ] ] ] ] )

Parameters or Arguments

string The string to search. It can be CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB. pattern The normal expression matching information. It can be a combination of the following: Value Description ^ Matches the opening of a string. If used with a match_parameter of ‘m’, it suits the begin of a line somewhere within expression. $ Matches the cease of a string. If used with a match_parameter of ‘m’, it fits the stop of a line anywhere within expression. * Matches zero or extra occurrences. + Matches one or extra occurrences. ? Matches zero or one occurrence. . Matches any personality besides NULL. | Used like an “OR” to specify more than one alternative. [ ] Used to specify a matching listing where you are attempting to match any one of the characters in the list. [^ ] Used to specify a nonmatching listing where you are making an attempt to match any character except for the ones in the list. ( ) Used to group expressions as a subexpression. {m} Matches m times. {m,} Matches at least m times. {m,n} Matches at least m times, however no greater than n times. \n n is a variety between 1 and 9. Matches the nth subexpression located within ( ) earlier than encountering \n. [..] Matches one collation component that can be extra than one character. [::] Matches personality classes. [==] Matches equivalence classes. \d Matches a digit character. \D Matches a nondigit character. \w Matches a word character. \W Matches a nonword character. \s Matches a whitespace character. \S fits a non-whitespace character. \A Matches the starting of a string or fits at the cease of a string earlier than a newline character. \Z Matches at the end of a string. *? Matches the preceding sample zero or more occurrences. +? Matches the previous sample one or extra occurrences. ?? Matches the previous sample zero or one occurrence. {n}? Matches the previous sample n times. {n,}? Matches the previous pattern at least n times. {n,m}? Matches the previous sample at least n times, but no longer greater than m times. start_position Optional. It is the function in string the place the search will start. If omitted, it defaults to 1 which is the first position in the string. nth_appearance Optional. It is the nth appearance of sample in string. If omitted, it defaults to 1 which is the first appearance of pattern in string. match_parameter Optional. It lets in you to regulate the matching behavior for the REGEXP_SUBSTR function. It can be a combination of the following: Value Description ‘c’ Perform case-sensitive matching. ‘i’ Perform case-insensitive matching. ‘n’ Allows the period character (.) to in shape the newline character. By default, the period is a wildcard. ‘m’ expression is assumed to have a couple of lines, the place ^ is the begin of a line and $ is the end of a line, regardless of the function of these characters in expression. By default, expression is assumed to be a single line. ‘x’ Whitespace characters are ignored. By default, whitespace characters are matched like any different character. subexpression Optional. This is used when pattern has subexpressions and you wish to indicate which subexpression in pattern is the target. It is an integervalue from zero to 9 indicating the subexpression to suit on in pattern.

Returns

The REGEXP_SUBSTR characteristic returns a string value. If the REGEXP_SUBSTR feature does now not discover any prevalence of pattern, it will return NULL.

Note

If there are conflicting values furnished for match_parameter, the REGEXP_SUBSTR feature will use the remaining value. If you leave out the match_behavior parameter, the REGEXP_SUBSTR feature will use the NLS_SORT parameter to determine if it should use a case-sensitive search, it will anticipate that string is a single line, and expect the duration persona to healthy any personality (not the newline character). See also the SUBSTR function.

Applies To

The REGEXP_SUBSTR feature can be used in the following variations of Oracle/PLSQL:

Oracle 12c, Oracle 11g, Oracle 10g

Example – Match on Words

Let’s begin through extracting the first word from a string.

For example:

SELECT REGEXP_SUBSTR ('TechOnTheNet is a great resource', '(\S*)(\s)')
FROM dual;

Result: 'TechOnTheNet '

This example will return ‘TechOnTheNet ‘ because it will extract all non-whitespace characters as distinct via (\S*) and then the first whitespace personality as certain via (\s). The result will encompass both the first phrase as properly as the space after the word.

If you didn’t favor to encompass the area in the result, we could regulate our instance as follows:

SELECT REGEXP_SUBSTR ('TechOnTheNet is a great resource', '(\S*)')
FROM dual;

Result: 'TechOnTheNet'

This instance would return ‘TechOnTheNet’ with no house at the end.

If we desired to discover the 2nd phrase in the string, we ought to regulate our function as follows:

SELECT REGEXP_SUBSTR ('TechOnTheNet is a great resource', '(\S*)(\s)', 1, 2)
FROM dual;

Result: 'is '

This example would return ‘is ‘ with a house at the give up of the string.

If we desired to find the 0.33 phrase in the string, we could alter our feature as follows:

SELECT REGEXP_SUBSTR ('TechOnTheNet is a great resource', '(\S*)(\s)', 1, 3)
FROM dual;

Result: 'a '

This example would return ‘a ‘ with a house at the cease of the string.

Example – Match on Digit Characters

Let’s look subsequent at how we would use the REGEXP_SUBSTR feature to in shape on a single digit character pattern.

For example:

SELECT REGEXP_SUBSTR ('2, 5, and 10 are numbers in this example', '\d')
FROM dual;

Result: 2

This instance will extract the first numeric digit from the string as special by using \d. In this case, it will in shape on the variety 2

We may want to exchange our sample to search for a two-digit number.

For example:

SELECT REGEXP_SUBSTR ('2, 5, and 10 are numbers in this example', '(\d)(\d)')
FROM dual;

Result: 10

This instance will extract a range that has two digits side-by-side as detailed with the aid of (\d)(\d). In this case, it will skip over the 2 and 5 numeric values and return 10.

Now, let’s appear how we would use the REGEXP_SUBSTR characteristic with a desk column and search for a two digit number.

For example:

SELECT REGEXP_SUBSTR (address, '(\d)(\d)')
FROM contacts;

In this example, we are going to extract the first two-digit cost from the tackle subject in the contacts table.

Example – Match on more than one alternative

The next instance that we will seem at involves the usage of the | pattern. The | pattern is used like an “OR” to specify more than one alternative.

For example:

SELECT REGEXP_SUBSTR ('Anderson', 'a|e|i|o|u')
FROM dual;

Result: 'e'

This example will return ‘e’ because it is looking for the first vowel (a, e, i, o, or u) in the string. Since we did now not specify a match_parameter value, the REGEXP_SUBSTR function will perform a case-sensitive search which capacity that the ‘A’ in ‘Anderson’ will not be matched.

We ought to alter our query as follows to operate a case-insensitive search as follows:

SELECT REGEXP_SUBSTR ('Anderson', 'a|e|i|o|u', 1, 1, 'i')
FROM dual;

Result: 'A'

Now because we have supply a match_parameter of ‘i’, the query will return ‘A’ as the result. This time, the ‘A’ in ‘Anderson’ will be located as a match.

Now, let’s shortly exhibit how you would use this characteristic with a column.

So let’s say we have a contact desk with the following data:

contact_id last_name 1000 Anderson 2000 Smith 3000 Johnson

Now, let’s run the following query:

SELECT contact_id, last_name, REGEXP_SUBSTR (last_name, 'a|e|i|o|u', 1, 1, 'i') AS "First Vowel"
FROM contacts;

These are the outcomes that would be again with the aid of the query:

contact_id last_name First vowel 1000 Anderson A 2000 Smith i 3000 Johnson o

Example – Match on nth_occurrence

The next instance that we will seem at entails the nth_occurrence parameter. The nth_occurrence parameter lets in you to choose which incidence of the pattern you wish to extract the substring for.

First Occurrence

Let’s seem at how to extract the first occurrence of a pattern in a string.

For example:

SELECT REGEXP_SUBSTR ('TechOnTheNet', 'a|e|i|o|u', 1, 1, 'i')
FROM dual;

Result: 'e'

This example will return ‘e’ due to the fact it is extracting the first incidence of a vowel (a, e, i, o, or u) in the string.

Second Occurrence

Next, we will extract for the second occurrence of a sample in a string.

For example:

SELECT REGEXP_SUBSTR ('TechOnTheNet', 'a|e|i|o|u', 1, 2, 'i')
FROM dual;

Result: 'O'

This example will return ‘O’ because it is extracting the 2nd occurrence of a vowel (a, e, i, o, or u) in the string.

Third Occurrence

For example:

SELECT REGEXP_SUBSTR ('TechOnTheNet', 'a|e|i|o|u', 1, 3, 'i')
FROM dual;

Result: 'e'

This example will return ‘e’ due to the fact it is extracting the 0.33 occurrence of a vowel (a, e, i, o, or u) in the string.