PCRE: Extracting numbered substrings

Captured substrings can be accessed directly by using the offsets returned by pcre_exec in ovector. For convenience, the functions pcre_copy_substring, pcre_get_substring, and pcre_get_substring_list are provided for extracting captured substrings as new, separate, zero-terminated strings. These functions identify substrings by number. The next section describes functions for extracting named substrings.
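As a rough illustration of direct access, the sketch below reads a capture straight out of the subject without copying it. It assumes the usual ovector layout, in which the start and end offsets of substring n sit at ovector[2*n] and ovector[2*n+1]. The struct span type and the capture_span helper are hypothetical names introduced here, not part of PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view type: points into the subject, is NOT zero-terminated,
   and is only valid for as long as the subject string is. */
struct span {
    const char *ptr;
    int len;
};

/* Returns a view of capture n, or {NULL, 0} if that capture is unset.
   Assumes ovector holds offset pairs as filled in by pcre_exec. */
static struct span capture_span(const char *subject, const int *ovector, int n)
{
    struct span s = { NULL, 0 };
    int start, end;

    assert(subject != NULL && ovector != NULL && n >= 0);

    start = ovector[2 * n];      /* offset of the first character */
    end = ovector[2 * n + 1];    /* offset just past the last character */
    if (start >= 0 && end >= start) {
        s.ptr = subject + start;
        s.len = end - start;
    }
    return s;
}
```

A view obtained this way can be printed with printf("%.*s", v.len, v.ptr). Because nothing is copied, there is also nothing to free.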

A substring that contains a binary zero is correctly extracted and has a further zero added on the end, but the result is not, of course, a C string. However, you can process such a string by referring to the length that is returned by pcre_copy_substring and pcre_get_substring. Unfortunately, the interface to pcre_get_substring_list is not adequate for handling strings containing binary zeros, because the end of the final string is not independently indicated.

The first three arguments are the same for all three of these functions: subject is the subject string that has just been successfully matched, ovector is a pointer to the vector of integer offsets that was passed to pcre_exec, and stringcount is the number of substrings that were captured by the match, including the substring that matched the entire regular expression. This is the value returned by pcre_exec if it is greater than zero. If pcre_exec returned zero, indicating that it ran out of space in ovector, the value passed as stringcount should be the number of elements in the vector divided by three.
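The rule for choosing stringcount can be written as a one-line helper. This is only a sketch: the name substring_count is hypothetical, rc stands for the value returned by pcre_exec, and ovecsize for the number of elements in ovector.

```c
#include <assert.h>

/* Hypothetical helper: derives the stringcount argument from the
   pcre_exec return value rc and the ovector size in elements. */
static int substring_count(int rc, int ovecsize)
{
    assert(rc >= 0);        /* a negative rc means no match or an error */
    assert(ovecsize >= 0);

    /* rc == 0 means ovector was too small to hold every capture. */
    return rc > 0 ? rc : ovecsize / 3;
}
```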

The functions pcre_copy_substring and pcre_get_substring extract a single substring, whose number is given as stringnumber. A value of zero extracts the substring that matched the entire pattern, whereas higher values extract the captured substrings. For pcre_copy_substring, the string is placed in buffer, whose length is given by buffersize, while for pcre_get_substring a new block of memory is obtained via pcre_malloc, and its address is returned via stringptr. The yield of the function is the length of the string, not including the terminating zero, or one of these error codes:

PCRE_ERROR_NOMEMORY      (-6)

The buffer was too small for pcre_copy_substring, or the attempt to get memory failed for pcre_get_substring.

PCRE_ERROR_NOSUBSTRING   (-7)

There is no substring whose number is stringnumber.

The pcre_get_substring_list function extracts all available substrings and builds a list of pointers to them. All this is done in a single block of memory that is obtained via pcre_malloc. The address of the memory block is returned via listptr, which is also the start of the list of string pointers. The end of the list is marked by a NULL pointer. The yield of the function is zero if all went well, or the error code

PCRE_ERROR_NOMEMORY      (-6)

if the attempt to get the memory block failed.

When any of these functions encounters a substring that is unset, which can happen when capturing subpattern number n+1 matches some part of the subject, but subpattern n has not been used at all, it returns an empty string. This can be distinguished from a genuine zero-length substring by inspecting the appropriate offset in ovector, which is negative for unset substrings.
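The check on the offset can be sketched as follows. It assumes the start offset of substring n is stored at ovector[2*n]. The enum and the function name classify_capture are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a capture group after a match. */
enum capture_state { CAPTURE_UNSET, CAPTURE_EMPTY, CAPTURE_NONEMPTY };

/* Tells an unset capture apart from one that matched the empty string. */
static enum capture_state classify_capture(const int *ovector, int n)
{
    int start, end;

    assert(ovector != NULL && n >= 0);

    start = ovector[2 * n];
    end = ovector[2 * n + 1];
    if (start < 0)
        return CAPTURE_UNSET;       /* negative offset: group never took part */
    return end == start ? CAPTURE_EMPTY : CAPTURE_NONEMPTY;
}
```

For instance, a pattern such as (a)?(b*)c matched against "c" would be expected to leave group 1 unset and group 2 set but empty.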

The two convenience functions pcre_free_substring and pcre_free_substring_list can be used to free the memory returned by a previous call of pcre_get_substring or pcre_get_substring_list, respectively. They do nothing more than call the function pointed to by pcre_free, which of course could be called directly from a C program. However, PCRE is used in some situations where it is linked via a special interface to another programming language that cannot use pcre_free directly; it is for these cases that the functions are provided.