pcre_dfa_exec

Extracting multiple substrings
The function pcre_dfa_exec is called to match a subject string against a compiled pattern using a matching algorithm that scans the subject string just once and does not backtrack. This algorithm has different characteristics from the normal one and is not compatible with Perl; some features of PCRE patterns are not supported. Nevertheless, there are times when this kind of matching is useful. For a discussion of the two matching algorithms, and a list of the features that pcre_dfa_exec does not support, see the pcrematching documentation.

The arguments for the pcre_dfa_exec function are the same as for pcre_exec, plus two extras. The ovector argument is used in a different way, which is described below. The other common arguments are used in the same way as for pcre_exec, so their description is not repeated here.

The two additional arguments, workspace and wscount, provide workspace for the function. The workspace vector should contain at least 20 elements. It is used for keeping track of multiple paths through the pattern tree. More workspace is needed for patterns and subjects where there are many potential matches.

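Here is an example of a simple call to pcre_dfa_exec:

  int rc;
  int ovector[10];
  int wspace[20];
  rc = pcre_dfa_exec(
    re,             /* result of pcre_compile() */
    NULL,           /* we didn't study the pattern */
    "some string",  /* the subject string */
    11,             /* the length of the subject string */
    0,              /* start at offset 0 in the subject */
    0,              /* default options */
    ovector,        /* vector of integers for substring information */
    10,             /* number of elements (NOT size in bytes) */
    wspace,         /* working space vector */
    20);            /* number of elements (NOT size in bytes) */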

Option bits for pcre_dfa_exec
The unused bits of the options argument for pcre_dfa_exec must be zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, PCRE_NO_UTF8_CHECK, PCRE_BSR_ANYCRLF, PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE, PCRE_PARTIAL_HARD, PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last four of these are exactly the same as for pcre_exec, so their descriptions are not repeated here.
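The requirement that unused bits be zero amounts to a single mask test. The sketch below is illustrative only: the DEMO_* names and their bit values are made up and stand in for the real PCRE_* macros from pcre.h, and only a few of the permitted options are included.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bits for illustration only; real code must use the
   PCRE_* macros from pcre.h, whose values are not these. */
enum {
    DEMO_ANCHORED     = 1 << 0,
    DEMO_NOTBOL       = 1 << 1,
    DEMO_NOTEOL       = 1 << 2,
    DEMO_PARTIAL_HARD = 1 << 3,
    DEMO_DFA_SHORTEST = 1 << 4,
    DEMO_DFA_RESTART  = 1 << 5,
    DEMO_OTHER        = 1 << 6   /* any bit not in the permitted list */
};

/* The subset of permitted options modelled in this sketch. */
#define DEMO_DFA_ALLOWED (DEMO_ANCHORED | DEMO_NOTBOL | DEMO_NOTEOL | \
                          DEMO_PARTIAL_HARD | DEMO_DFA_SHORTEST |    \
                          DEMO_DFA_RESTART)

/* Returns true when every bit outside the permitted set is zero. */
static bool dfa_options_valid(unsigned options)
{
    return (options & ~(unsigned)DEMO_DFA_ALLOWED) == 0;
}
```

For example, DEMO_ANCHORED | DEMO_DFA_SHORTEST passes the check, while any value containing DEMO_OTHER is rejected.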

PCRE_PARTIAL_HARD
PCRE_PARTIAL_SOFT

These have the same general effect as they do for pcre_exec, but the details are slightly different. When PCRE_PARTIAL_HARD is set for pcre_dfa_exec, it returns PCRE_ERROR_PARTIAL if the end of the subject is reached and there is still at least one matching possibility that requires additional characters. This happens even if some complete matches have also been found. When PCRE_PARTIAL_SOFT is set, the return code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the subject is reached, there have been no complete matches, but there is still at least one matching possibility. In both cases, the portion of the string that was inspected when the longest partial match was found is set as the first matching string. There is a more detailed discussion of partial and multi-segment matching, with examples, in the pcrepartial documentation.
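The difference between the two partial modes is easiest to see as a decision table. The function below is a hypothetical model of the rules just described, not PCRE's implementation; its two inputs summarize what the matcher knows when it stops, and the outcome names are invented for the sketch.

```c
#include <assert.h>

typedef enum { OUTCOME_MATCH, OUTCOME_NOMATCH, OUTCOME_PARTIAL } outcome_t;
typedef enum { MODE_NONE, MODE_SOFT, MODE_HARD } partial_mode_t;

/* complete_matches: number of complete matches found.
   live_at_end: nonzero if the end of the subject was reached while at
   least one path could still match given more characters. */
static outcome_t dfa_outcome(partial_mode_t mode, int complete_matches,
                             int live_at_end)
{
    assert(complete_matches >= 0);

    /* Hard mode: a live path at the end wins, even over complete matches. */
    if (mode == MODE_HARD && live_at_end)
        return OUTCOME_PARTIAL;

    if (complete_matches > 0)
        return OUTCOME_MATCH;

    /* Soft mode: only turns "no match" into "partial". */
    if (mode == MODE_SOFT && live_at_end)
        return OUTCOME_PARTIAL;

    return OUTCOME_NOMATCH;
}
```

With two complete matches and a path still live at the end, the hard mode yields a partial result while the soft mode reports the complete matches.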

PCRE_DFA_SHORTEST

Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to stop as soon as it has found one match. Because of the way the alternative algorithm works, this is necessarily the shortest possible match at the first possible matching point in the subject string.

PCRE_DFA_RESTART

When pcre_dfa_exec returns a partial match, it is possible to call it again with additional subject characters and have it continue with the same match. The PCRE_DFA_RESTART option requests this action; when it is set, the workspace and wscount arguments must reference the same vector as before, because data about the match so far is left in them after a partial match. There is more discussion of this facility in the pcrepartial documentation.
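Why the same workspace must be passed again can be seen in a deliberately tiny model. toy_match_segment is hypothetical and far simpler than pcre_dfa_exec: it matches a fixed literal anchored at the start of the first segment, keeps its progress in workspace[0] after a partial match, and resumes from that progress when restart is nonzero, playing the role of PCRE_DFA_RESTART. Its result codes are invented and are not PCRE's values.

```c
#include <assert.h>
#include <string.h>

/* Invented result codes for this toy only. */
#define TOY_NOMATCH (-1)
#define TOY_PARTIAL (-2)
#define TOY_WSSIZE  (-3)

/* Matches `literal` anchored at the start of the first segment. After a
   partial match, progress is saved in workspace[0]; a later call with
   restart != 0 continues from there with the next segment. */
static int toy_match_segment(const char *literal, const char *segment,
                             size_t seglen, int restart,
                             int *workspace, int wscount)
{
    size_t litlen = strlen(literal);
    size_t done, i;

    if (wscount < 1)
        return TOY_WSSIZE;
    done = restart ? (size_t)workspace[0] : 0;

    for (i = 0; i < seglen && done < litlen; i++, done++)
        if (segment[i] != literal[done])
            return TOY_NOMATCH;

    if (done == litlen)
        return 1;                  /* complete match */
    workspace[0] = (int)done;      /* remember how far the match got */
    return TOY_PARTIAL;            /* more characters are needed */
}
```

Feeding "hel" and then "lo" with the same workspace completes a match of "hello"; passing "lo" with restart set but a fresh, zeroed workspace fails, because the saved progress is gone.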

Successful returns from pcre_dfa_exec

When pcre_dfa_exec succeeds, it may have matched more than one substring in the subject. Note, however, that all the matches from one run of the function start at the same point in the subject. The shorter matches are all initial substrings of the longer matches. For example, if the pattern

<.*>

is matched against the string

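This is <something> <something else> <something further> no more

the three matched strings are

<something> <something else> <something further>
<something> <something else>
<something>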

On success, the yield of the function is a number greater than zero, which is the number of matched substrings. The substrings themselves are returned in ovector. Each string uses two elements; the first is the offset to the start, and the second is the offset to the end. In fact, all the strings have the same start offset. (Space could have been saved by giving this only once, but it was decided to retain some compatibility with the way pcre_exec returns data, even though the meaning of the strings is different.)

The strings are returned in reverse order of length; that is, the longest matching string is given first. If there were too many matches to fit into ovector, the yield of the function is zero, and the vector is filled with the longest matches.
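To make the ovector layout concrete, here is a self-contained sketch that does not call PCRE. emulate_angle_matches is a hypothetical helper that reproduces by hand the results described for the pattern <.*> (assuming a subject with no newlines, since . does not match newline by default), and print_dfa_matches shows how a caller might walk the returned start/end pairs. Both assume that the whole ovector is available for pairs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EMU_NOMATCH (-1)   /* stands in for PCRE_ERROR_NOMATCH */

/* Illustration only, not PCRE: every match of <.*> starts at the first
   '<' and ends just after some later '>'. Matches are stored longest
   first; returns the count, or 0 if they did not all fit (the longest
   ones are kept). */
static int emulate_angle_matches(const char *subject, int *ovector,
                                 int ovecsize)
{
    const char *lt = strchr(subject, '<');
    int pairs = ovecsize / 2;
    int count = 0, start, i;

    if (lt == NULL)
        return EMU_NOMATCH;
    start = (int)(lt - subject);

    /* Scanning from the end of the subject finds the longest match first. */
    for (i = (int)strlen(subject) - 1; i > start; i--) {
        if (subject[i] != '>')
            continue;
        if (count < pairs) {
            ovector[2 * count] = start;
            ovector[2 * count + 1] = i + 1;
        }
        count++;
    }
    if (count == 0)
        return EMU_NOMATCH;
    return count > pairs ? 0 : count;
}

/* Prints the substrings described by ovector. rc == 0 means the vector
   overflowed and every pair holds one of the longest matches. */
static void print_dfa_matches(const char *subject, const int *ovector,
                              int rc, int ovecsize)
{
    int n = (rc == 0) ? ovecsize / 2 : rc;
    int k;

    assert(rc >= 0);
    for (k = 0; k < n; k++)
        printf("%2d: %.*s\n", k, ovector[2 * k + 1] - ovector[2 * k],
               subject + ovector[2 * k]);
}
```

With the example subject and a six-element ovector, the emulation returns 3 and stores the pairs (8, 56), (8, 36) and (8, 19); with a four-element ovector it returns 0 and keeps only the two longest.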

Error returns from pcre_dfa_exec
The pcre_dfa_exec function returns a negative number when it fails. Many of the errors are the same as for pcre_exec, and their descriptions are not repeated here. In addition, the following errors are specific to pcre_dfa_exec:

PCRE_ERROR_DFA_UITEM     (-16)

This return is given if pcre_dfa_exec encounters an item in the pattern that it does not support, for instance, the use of \C or a back reference.

PCRE_ERROR_DFA_UCOND     (-17)

This return is given if pcre_dfa_exec encounters a condition item that uses a back reference for the condition, or a test for recursion in a specific group. These are not supported.

PCRE_ERROR_DFA_UMLIMIT   (-18)

This return is given if pcre_dfa_exec is called with an extra block that contains a setting of the match_limit field. This is not supported (it is meaningless).

PCRE_ERROR_DFA_WSSIZE    (-19)

This return is given if pcre_dfa_exec runs out of space in the workspace vector.
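Because the workspace needed grows with the number of potential matches, one option is to retry with a larger vector when this error comes back. The sketch below assumes a hypothetical callback type, dfa_attempt_fn, that would wrap a real pcre_dfa_exec call; toy_attempt merely pretends to need a fixed number of elements so the loop can run without the library. DFA_ERROR_WSSIZE copies the value -19 listed above; real code should use PCRE_ERROR_DFA_WSSIZE from pcre.h.

```c
#include <assert.h>
#include <stdlib.h>

#define DFA_ERROR_WSSIZE (-19)   /* value listed for PCRE_ERROR_DFA_WSSIZE */

/* Hypothetical callback: runs one match attempt with the given workspace
   vector of wscount elements and returns its result code. */
typedef int (*dfa_attempt_fn)(void *ctx, int *workspace, int wscount);

/* Toy stand-in: pretends the match needs *(int *)ctx workspace elements
   and reports one match once it gets them. */
static int toy_attempt(void *ctx, int *workspace, int wscount)
{
    int need = *(const int *)ctx;
    if (wscount < need)
        return DFA_ERROR_WSSIZE;
    workspace[0] = 0;   /* a real matcher would record its state here */
    return 1;
}

/* Retries with a doubled workspace while DFA_ERROR_WSSIZE comes back,
   never exceeding max_wscount elements. The workspace is freed after each
   call, so this does not suit matches resumed with PCRE_DFA_RESTART. */
static int run_with_growing_workspace(dfa_attempt_fn attempt, void *ctx,
                                      int wscount, int max_wscount)
{
    assert(wscount >= 20 && wscount <= max_wscount);  /* 20: documented minimum */
    for (;;) {
        int *ws = malloc((size_t)wscount * sizeof *ws);
        int rc;

        if (ws == NULL)
            return DFA_ERROR_WSSIZE;   /* could not obtain enough workspace */
        rc = attempt(ctx, ws, wscount);
        free(ws);
        if (rc != DFA_ERROR_WSSIZE || wscount == max_wscount)
            return rc;
        wscount = (wscount > max_wscount / 2) ? max_wscount : wscount * 2;
    }
}
```

Each retry starts the match again from the beginning, which is why the workspace can be freed after every call. If the caller intends to continue a partial match with PCRE_DFA_RESTART, the workspace must instead be kept alive between calls.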

PCRE_ERROR_DFA_RECURSE   (-20)

When a recursive subpattern is processed, the matching function calls itself recursively, using private vectors for ovector and workspace. This error is given if the output vector is not large enough. This should be extremely rare, as a vector of size 1000 is used.
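When reporting failures, it can help to separate the five codes specific to pcre_dfa_exec from those shared with pcre_exec. A minimal lookup table using the numeric values listed above is sketched here; the struct and function names are hypothetical, and production code should compare against the PCRE_ERROR_DFA_* macros rather than literal numbers.

```c
#include <assert.h>
#include <stddef.h>

/* DFA-specific error codes with the values listed above. */
struct dfa_error {
    int code;
    const char *name;
    const char *meaning;
};

static const struct dfa_error dfa_errors[] = {
    { -16, "PCRE_ERROR_DFA_UITEM",   "unsupported item in the pattern (e.g. \\C or a back reference)" },
    { -17, "PCRE_ERROR_DFA_UCOND",   "unsupported condition (back reference or recursion test)" },
    { -18, "PCRE_ERROR_DFA_UMLIMIT", "match_limit set in the extra block" },
    { -19, "PCRE_ERROR_DFA_WSSIZE",  "workspace vector too small" },
    { -20, "PCRE_ERROR_DFA_RECURSE", "internal vector too small during recursion" },
};

/* Returns the table entry for a DFA-specific error, or NULL for any other
   negative code (those are shared with pcre_exec). */
static const struct dfa_error *find_dfa_error(int rc)
{
    size_t i;

    assert(rc < 0);   /* only failures are looked up */
    for (i = 0; i < sizeof dfa_errors / sizeof dfa_errors[0]; i++)
        if (dfa_errors[i].code == rc)
            return &dfa_errors[i];
    return NULL;
}
```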