Oracle / PLSQL: REGEXP_COUNT Function

This article is written about how to use the Oracle/PLSQL REGEXP_COUNT function with syntax and examples.

Description

The Oracle/PLSQL REGEXP_COUNT function counts the number of times that a pattern occurs in a string. This function, introduced in Oracle 11g, will allow you to count the number of times a substring occurs in a string using regular expression pattern matching.

Syntax

The syntax for the REGEXP_COUNT function in Oracle is:

REGEXP_COUNT( string, pattern [, start_position [, match_parameter ] ] )

Parameters or Arguments

string

The string to search. string can be CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.

pattern

The regular expression matching information. It can be a combination of the following:

Value Description

^ Matches the beginning of a string. If used with a match_parameter of ‘m’, it matches the start of a line anywhere within expression.

$ Matches the end of a string. If used with a match_parameter of ‘m’, it matches the end of a line anywhere within expression.

  • Matches zero or more occurrences.
  • + Matches one or more occurrences.
  • ? Matches zero or one occurrence.
  • . Matches any character except NULL.
  • | Used like an “OR” to specify more than one alternative.
  • [ ] Used to specify a matching list where you are trying to match any one of the characters in the list.
  • [^ ] Used to specify a nonmatching list where you are trying to match any character except for the ones in the list.
  • ( ) Used to group expressions as a subexpression.
  • {m} Matches m times.
  • {m,} Matches at least m times.
  • {m,n} Matches at least m times, but no more than n times.
  • \n n is a number between 1 and 9. Matches the nth subexpression found within ( ) before encountering \n.
  • [..] Matches one collation element that can be more than one character.
  • [::] Matches character classes.
  • [==] Matches equivalence classes.
  • \d Matches a digit character.
  • \D Matches a nondigit character.
  • \w Matches a word character.
  • \W Matches a nonword character.
  • \s Matches a whitespace character.
  • \S matches a non-whitespace character.
  • \A Matches the beginning of a string or matches at the end of a string before a newline character.
  • \Z Matches at the end of a string.
  • *? Matches the preceding pattern zero or more occurrences.
  • +? Matches the preceding pattern one or more occurrences.
  • ?? Matches the preceding pattern zero or one occurrence.
  • {n}? Matches the preceding pattern n times.
  • {n,}? Matches the preceding pattern at least n times.
  • {n,m}? Matches the preceding pattern at least n times, but not more than m times.
  • start_position
  • Optional. It is the position in string where the search will start. If omitted, it defaults to 1 which is the first position in the string.
  • match_parameter
  • Optional. It allows you to modify the matching behavior for the REGEXP_COUNT function. It can be a combination of the following:
  • Value Description
  • ‘c’ Perform case-sensitive matching.
  • ‘i’ Perform case-insensitive matching.
  • ‘n’ Allows the period character (.) to match the newline character. By default, the period is a wildcard.
  • ‘m’ expression is assumed to have multiple lines, where ^ is the start of a line and $ is the end of a line, regardless of the position of those characters in expression. By default, expression is assumed to be a single line.
  • ‘x’ Whitespace characters are ignored. By default, whitespace characters are matched like any other character.

Returns

The REGEXP_COUNT function returns a numeric value.

Note

If there are conflicting values provided for match_parameter, the REGEXP_COUNT function will use the last value.

If you omit the match_behavior parameter, the REGEXP_COUNT function will use the NLS_SORT parameter to determine if it should use a case-sensitive search, it will assume that string is a single line, and assume the period character to match any character (not the newline character).

If the REGEXP_COUNT function does not find any occurrence of pattern, it will return 0.

Applies To

The REGEXP_COUNT function can be used in the following versions of Oracle/PLSQL:

Oracle 12c, Oracle 11g

Example – Match on Single Character

Let’s start by looking at the simplest case. Let’s count the number of times the character ‘t’ appears in a string.

For example:

SELECT REGEXP_COUNT ('TechOnTheNet is a great resource', 't')
FROM dual;

Result: 2

This example will return 2 because it is counting the number of occurrences of ‘t’ in the string. Since we did not specify a match_parameter value, the REGEXP_COUNT function will perform a case-sensitive search which means that the ‘T’ characters will not be included in the count.

If we wanted to include both ‘t’ and ‘T’ in our results and perform a case-insensitive search, we could modify our query as follows:

SELECT REGEXP_COUNT ('TechOnTheNet is a great resource', 't', 1, 'i')
FROM dual;

Result: 4

Now because we have provide a start_position of 1 and a match_parameter of ‘i’, the query will return 4 as the result. This time, both ‘t’ and ‘T’ values would be included in the count.

If we wanted to count the number of ‘t’ in a column, we could try something like this:

SELECT REGEXP_COUNT (last_name, 't', 1, 'i') AS total
FROM contacts;

This would count the number of ‘t’ or ‘T’ values in the last_name field from the contacts table.

Example – Match on Multiple Characters

Let’s look next at how we would use the REGEXP_COUNT function to match on a multi-character pattern.

For example:

SELECT REGEXP_COUNT ('The example shows how to use the REGEXP_COUNT function', 'the', 1, 'i')
FROM dual;

Result: 2

This example will return the number of times that the word ‘the’ appears in the string. It will perform a case-insensitive search so it will return 2.

For example:

SELECT REGEXP_COUNT ('The example shows how to use the REGEXP_COUNT function', 'the', 4, 'i')
FROM dual;

Result: 1

This example will return the number of times that the word ‘the’ appears in the string starting from position 4. In this case, it will return 1 because it will skip over the first 3 characters in the string before searching for the pattern.

Now, let’s look how we would use the REGEXP_COUNT function with a table column and search for multiple characters.

For example:

SELECT REGEXP_COUNT (other_comments, 'the', 1, 'i')
FROM contacts;

Result: 3

In this example, we are going to count the number of occurrence of ‘the’ in the other_comments field in the contacts table.

Example – Match on more than one alternative

The next example that we will look at involves using the | pattern. The | pattern is used like an “OR” to specify more than one alternative.

For example:

SELECT REGEXP_COUNT ('Anderson', 'a|e|i|o|u')
FROM dual;

Result: 2

This example will return 2 because it is counting the number of vowels (a, e, i, o, or u) in the string ‘Anderson’. Since we did not specify a match_parameter value, the REGEXP_COUNT function will perform a case-sensitive search which means that the ‘A’ in ‘Anderson’ will not be counted.

We could modify our query as follows to perform a case-insensitive search as follows:

SELECT REGEXP_COUNT ('Anderson', 'a|e|i|o|u', 1, 'i')
FROM dual;

Result: 3

Now because we have provide a start_position of 1 and a match_parameter of ‘i’, the query will return 3 as the result. This time, the ‘A’ in ‘Anderson’ will be included in the count.

Now, let’s quickly show how you would use this function with a column.

So let’s say we have a contact table with the following data:

contact_id last_name

1000 Anderson

2000 Smith

3000 Johnson

Now, let’s run the following query:

SELECT contact_id, last_name, REGEXP_COUNT (last_name, 'a|e|i|o|u', 1, 'i') AS total
FROM contacts;

These are the results that would be returned by the query:

contact_id last_name total

1000 Anderson 3

2000 Smith 1

3000 Johnson 2