Extracting multiple substrings

int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
    const char *subject, int length, int startoffset,
    int options, int *ovector, int ovecsize,
    int *workspace, int wscount);

The function pcre_dfa_exec() is called to match a subject string against a compiled pattern, using a matching algorithm that scans the subject string just once and does not backtrack. This has different characteristics from the normal algorithm and is not compatible with Perl. Some of the features of PCRE patterns are not supported. Nevertheless, there are times when this kind of matching can be useful. For a discussion of the two matching algorithms, and a list of features that pcre_dfa_exec() does not support, see the pcrematching documentation.

The arguments for the pcre_dfa_exec() function are the same as for pcre_exec(), plus two extra arguments. The ovector argument is used in a different way, as described below. The other common arguments are used in the same way as for pcre_exec(), so their description is not repeated here.

The two additional arguments, workspace and wscount, provide working space for the function: workspace is a vector of ints and wscount is its number of elements. The vector should contain at least 20 elements. It is used to keep track of multiple paths through the pattern tree, so more workspace is needed for patterns and subjects where there are many potential matches.

Here is an example of a simple call to pcre_dfa_exec():

 int rc;
 int ovector[10];
 int wspace[20];
 rc = pcre_dfa_exec(
   re,             /* result of pcre_compile() */
   NULL,           /* we didn't study the pattern */
   "some string",  /* the subject string */
   11,             /* the length of the subject string */
   0,              /* start at offset 0 in the subject */
   0,              /* default options */
   ovector,        /* vector of integers for substring information */
   10,             /* number of elements (NOT size in bytes) */
   wspace,         /* working space vector */
   20);            /* number of elements (NOT size in bytes) */

Option bits for pcre_dfa_exec()

The unused bits of the options argument for pcre_dfa_exec() must be zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, PCRE_NO_UTF8_CHECK, PCRE_BSR_ANYCRLF, PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE, PCRE_PARTIAL_HARD, PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last four of these are exactly the same as for pcre_exec(), so their description is not repeated here.

PCRE_PARTIAL_HARD
PCRE_PARTIAL_SOFT

These have the same general effect as they do for pcre_exec(), but the details are slightly different. When PCRE_PARTIAL_HARD is set for pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end of the subject is reached and there is still at least one matching possibility that requires additional characters. This happens even if some complete matches have also been found. When PCRE_PARTIAL_SOFT is set, the return code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the subject is reached, there have been no complete matches, but there is still at least one matching possibility. In both cases, the portion of the string that was inspected when the longest partial match was found is set as the first matching string. There is a more detailed discussion of partial and multi-segment matching, with examples, in the pcrepartial documentation.

PCRE_DFA_SHORTEST

Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to stop as soon as it has found one match. Because of the way the alternative algorithm works, this is necessarily the shortest possible match at the first possible matching point in the subject string.

PCRE_DFA_RESTART

When pcre_dfa_exec() returns a partial match, it is possible to call it again with additional subject characters and have it continue with the same match. The PCRE_DFA_RESTART option requests this action; when it is set, the workspace and wscount arguments must reference the same vector as before, because data about the match so far is left in them after a partial match. There is more discussion of this facility in the pcrepartial documentation.

Successful returns from pcre_dfa_exec()

When pcre_dfa_exec() succeeds, it may have matched more than one substring in the subject. Note, however, that all the matches from one run of the function start at the same point in the subject. The shorter matches are all initial substrings of the longer matches. For example, if the pattern

<.*>

is matched against the string

This is <something> <something else> <something further> no more

the three matched strings are

<something>
<something> <something else>
<something> <something else> <something further>

On success, the yield of the function is a number greater than zero, which is the number of matched substrings. The substrings themselves are returned in ovector. Each string uses two elements; the first is the offset to the start, and the second is the offset to the end. In fact, all the strings have the same start offset. (Space could have been saved by giving this only once, but it was decided to retain some compatibility with the way pcre_exec() returns data, even though the meaning of the strings is different.)

The strings are returned in reverse order of length; that is, the longest matching string is given first. If there were too many matches to fit into ovector, the yield of the function is zero, and the vector is filled with the longest matches.
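The sketch below, which makes no PCRE calls, shows one way to read the matches out of ovector under these rules, including the case where the yield is zero because ovector overflowed and every pair in it is in use. The helper dfa_copy_match is hypothetical. For the example subject above, the offsets work out to 8, 56, 8, 36, 8, 19.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy match number `i` (0 = longest) out of an
   ovector filled by pcre_dfa_exec(). `rc` is the function's return value
   and `ovecsize` the element count that was passed to it. Returns the
   match length, or -1 if `i` is out of range or `buf` is too small. */
int dfa_copy_match(const char *subject, const int *ovector, int ovecsize,
                   int rc, int i, char *buf, int buflen)
{
  /* rc == 0 means ovector overflowed and holds ovecsize / 2 pairs. */
  int nmatches = (rc == 0) ? ovecsize / 2 : rc;
  int len;

  assert(rc >= 0);                /* negative values are errors, not matches */
  if (i < 0 || i >= nmatches)
    return -1;
  len = ovector[2*i + 1] - ovector[2*i];
  if (len + 1 > buflen)
    return -1;                    /* no room for the text plus its NUL */
  memcpy(buf, subject + ovector[2*i], (size_t)len);
  buf[len] = '\0';
  return len;
}
```

Index 0 gives the longest match and index rc - 1 the shortest; all of them share the start offset ovector[0].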

Error returns from pcre_dfa_exec()

The pcre_dfa_exec() function returns a negative number when it fails. Many of the errors are the same as for pcre_exec(), and these are described above. There are in addition the following errors that are specific to pcre_dfa_exec():

PCRE_ERROR_DFA_UITEM (-16)

This return is given if pcre_dfa_exec() encounters an item in the pattern that it does not support, for instance, the use of \C or a back reference.

PCRE_ERROR_DFA_UCOND (-17)

This return is given if pcre_dfa_exec() encounters a condition item that uses a back reference for the condition, or a test for recursion in a specific group. These are not supported.

PCRE_ERROR_DFA_UMLIMIT (-18)

This return is given if pcre_dfa_exec() is called with an extra block that contains a setting of the match_limit field. This is not supported (the field is meaningless for this algorithm).

PCRE_ERROR_DFA_WSSIZE (-19)

This return is given if pcre_dfa_exec() runs out of space in the workspace vector.

PCRE_ERROR_DFA_RECURSE (-20)

When a recursive subpattern is processed, the matching function calls itself recursively, using private vectors for ovector and workspace. This error is given if the output vector is not large enough. This should be extremely rare, as a vector of size 1000 is used.