int pcre_copy_substring(const char *subject, int *ovector, int stringcount, int stringnumber, char *buffer, int buffersize);

int pcre_get_substring(const char *subject, int *ovector, int stringcount, int stringnumber, const char **stringptr);

int pcre_get_substring_list(const char *subject, int *ovector, int stringcount, const char ***listptr);
Captured substrings can be accessed directly by using the offsets returned by pcre_exec() in ovector. For convenience, the functions pcre_copy_substring(), pcre_get_substring(), and pcre_get_substring_list() are provided for extracting captured substrings as new, separate, zero-terminated strings. These functions identify substrings by number. The next section describes functions for extracting named substrings.
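As a minimal sketch of the direct approach, the helper below copies nothing; it reports where capture n starts in the subject and how long it is, relying on pcre_exec() storing each capture as the offset pair ovector[2*n], ovector[2*n+1]. The name capture_span is a hypothetical helper for illustration and is not part of PCRE.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): locate capture n directly from
 * the offsets that pcre_exec() stored in ovector.
 * Returns the length of the capture, or -1 if n is out of range or the
 * capture is unset. On success *start points into subject; the text it
 * points at is NOT zero-terminated at the end of the capture. */
static int capture_span(const char *subject, const int *ovector,
                        int stringcount, int n, const char **start)
{
    if (n < 0 || n >= stringcount)
        return -1;                      /* no such capture */
    if (ovector[2 * n] < 0)
        return -1;                      /* capture exists but is unset */
    *start = subject + ovector[2 * n];
    return ovector[2 * n + 1] - ovector[2 * n];
}
```

For instance, given the subject "key=value" and an ovector holding {0, 9, 0, 3, 4, 9}, capture 2 starts at offset 4 and has length 5; it could then be printed with printf("%.*s", len, start) without making a copy.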
A substring that contains a binary zero is correctly extracted and has a further zero added on the end, but the result is not, of course, a C string. However, you can process such a string by referring to the length that is returned by pcre_copy_substring() and pcre_get_substring(). Unfortunately, the interface to pcre_get_substring_list() is not adequate for handling strings containing binary zeros, because the end of the final string is not independently indicated.
The first three arguments are the same for all three of these functions: subject is the subject string that has just been successfully matched, ovector is a pointer to the vector of integer offsets that was passed to pcre_exec(), and stringcount is the number of substrings that were captured by the match, including the substring that matched the entire regular expression. This is the value returned by pcre_exec() if it is greater than zero. If pcre_exec() returned zero, indicating that it ran out of space in ovector, the value passed as stringcount should be the number of elements in the vector divided by three.
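The rule for choosing stringcount can be sketched as a small function. The name effective_stringcount and its parameter names are hypothetical and not part of PCRE; rc stands for the value returned by pcre_exec() and ovecsize for the number of int elements in the vector that was passed to it.

```c
/* Hypothetical helper (not part of PCRE): turn the return value of
 * pcre_exec() into the stringcount argument described above.
 * Returns -1 when rc is negative (no match, or an error), in which case
 * there is nothing to extract. */
static int effective_stringcount(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;              /* every captured substring fitted in ovector */
    if (rc == 0)
        return ovecsize / 3;    /* ovector was too small: use elements / 3 */
    return -1;                  /* the match did not succeed */
}
```

With a 30-element vector, a return value of zero from pcre_exec() would therefore give a stringcount of 10.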
The functions pcre_copy_substring() and pcre_get_substring() extract a single substring, whose number is given as stringnumber. A value of zero extracts the substring that matched the entire pattern, whereas higher values extract the captured substrings. For pcre_copy_substring(), the string is placed in buffer, whose length is given by buffersize, while for pcre_get_substring() a new block of memory is obtained via pcre_malloc, and its address is returned via stringptr. The yield of the function is the length of the string, not including the terminating zero, or one of these error codes:

  PCRE_ERROR_NOMEMORY       (-6)

The buffer was too small for pcre_copy_substring(), or the attempt to get memory failed for pcre_get_substring().

  PCRE_ERROR_NOSUBSTRING    (-7)

There is no substring whose number is stringnumber.
The pcre_get_substring_list() function extracts all available substrings and builds a list of pointers to them. All this is done in a single block of memory that is obtained via pcre_malloc. The address of the memory block is returned via listptr, which is also the start of the list of string pointers. The end of the list is marked by a NULL pointer. The yield of the function is zero if all went well, or the error code

  PCRE_ERROR_NOMEMORY       (-6)

if the attempt to get the memory block failed.
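Because the list is terminated by a NULL pointer, it can be walked without knowing stringcount. The following is a minimal sketch of such a walk; count_substrings is a hypothetical helper, not part of PCRE, and it works on any NULL-terminated array of string pointers laid out like the one returned via listptr.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): count the entries in a
 * NULL-terminated list of strings. Entry 0 corresponds to the substring
 * that matched the entire pattern; later entries are the captures. */
static int count_substrings(const char **list)
{
    int n = 0;
    while (list[n] != NULL)     /* the NULL pointer marks the end of the list */
        n++;
    return n;
}
```

The same loop shape can be used to print or process each entry in turn. Note that it treats each entry as a C string, so it is subject to the limitation on binary zeros described earlier.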
When any of these functions encounters a substring that is unset, which can happen when capturing subpattern number n+1 matches some part of the subject, but subpattern n has not been used at all, it returns an empty string. This can be distinguished from a genuine zero-length substring by inspecting the appropriate offset in ovector, which is negative for unset substrings.
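The check on ovector might look like the sketch below. The name capture_is_set is a hypothetical helper, not part of PCRE; it assumes the usual layout in which the start offset of capture n is stored at ovector[2*n].

```c
/* Hypothetical helper (not part of PCRE): report whether capture n took
 * part in the match. Returns 1 if it is set (possibly with zero length),
 * and 0 if it is unset or n is out of range. */
static int capture_is_set(const int *ovector, int stringcount, int n)
{
    if (n < 0 || n >= stringcount)
        return 0;
    return ovector[2 * n] >= 0;     /* unset captures have a negative offset */
}
```

As an illustration, an ovector of {0, 1, -1, -1, 0, 1} with a stringcount of 3 describes a match in which capture 1 is unset and capture 2 is set, whereas {0, 1, 0, 0} with a stringcount of 2 describes a capture 1 that is set but genuinely empty.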
The two convenience functions pcre_free_substring() and pcre_free_substring_list() can be used to free the memory returned by a previous call of pcre_get_substring() or pcre_get_substring_list(), respectively. They do nothing more than call the function pointed to by pcre_free, which of course could be called directly from a C program. However, PCRE is used in some situations where it is linked via a special interface to another programming language that cannot use pcre_free directly; it is for these cases that the functions are provided.