Comparison of regular expression engines

This is a comparison of regular expression engines.

Libraries

  1. ^ Formerly called Regex++.
  2. ^ a b One of the fuzzy regular expression engines.
  3. ^ Included since version 2.13.0.
  4. ^ ICU4J, the Java version, does not support regular expressions.
  5. ^ C++ bindings were developed by Google and became officially part of PCRE in 2006.

Languages

  1. ^ "STD.regex - D Programming Language - Digital Mars".
  2. ^ "Dotnet/Corefx". GitHub. 16 February 2022.
  3. ^ "Dotnet/Corefx". GitHub. 16 February 2022.

Language features

NOTE: An application that uses a library for regular expression support does not necessarily expose the library's full feature set; e.g., GNU grep uses PCRE but supports no lookahead, although PCRE does.

Part 1

  1. ^ Non-greedy quantifiers match as few characters as possible, instead of as many as possible, which is the default. Note that many older, pre-POSIX engines were non-greedy and did not have greedy quantifiers at all.
  2. ^ Shy groups, also called non-capturing groups, cannot be referred to with backreferences; they are used to speed up matching where the group's content does not need to be accessed later.
  3. ^ Backreferences enable referring to previously matched groups in later parts of the regex and/or replacement string (where applicable). For instance, ([ab]+)\1 matches "abab" but not "abaab".
  4. ^ "Perl Regular Expression Syntax - 1.47.0".
  5. ^ "User's Guide - 1.47.0".
  6. ^ a b FREJ has no repetition quantifiers, but it has an "optional" element that behaves similarly to the simple "?" quantifier.
  7. ^ a b As of ES2018.
  8. ^ Lua's only non-greedy quantifier is -, which is a non-greedy version of *. It does not have non-greedy versions of + or ?; in the former case, the non-greedy effect can be achieved by repeating the token followed by -, but in the latter case, there is no equivalent.
  9. ^ Supported by the optional regex library only.
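
The first three notes above (non-greedy quantifiers, shy groups, and backreferences) can be illustrated with a minimal sketch. Python's standard `re` module is used here only for convenience, and the sample strings and variable names are assumptions made for the example. The backreference check uses a whole-string match, which is the reading under which "abab" matches and "abaab" does not.

```python
import re

# Greedy vs. non-greedy: ".*" takes as much as possible, ".*?" as little.
text = "<a><b>"
greedy = re.match(r"<.*>", text).group(0)
lazy = re.match(r"<.*?>", text).group(0)

# Shy (non-capturing) group: "(?:...)" groups without creating a numbered group,
# so the only captured group here is the trailing "(c)".
shy = re.match(r"(?:ab)+(c)", "ababc")
shy_groups = shy.groups()

# Backreference: "\1" must repeat exactly what group 1 matched.
backref = re.compile(r"([ab]+)\1")
matches_abab = backref.fullmatch("abab") is not None
matches_abaab = backref.fullmatch("abaab") is not None

print(greedy, lazy, shy_groups, matches_abab, matches_abaab)
```

Other engines spell these constructs the same way in most cases, but support varies, as the notes indicate.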

Part 2

  1. ^ Also known as flag modifiers, mode modifiers, or option letters. Example pattern: "(?i:test)".
  2. ^ Also called independent sub-expressions.
  3. ^ Similar to back references, but with names instead of indices.
  4. ^ A special feature that allows balanced constructs to be matched without recursion.
  5. ^ Refers to the possibility of including quantifiers in look-behinds, thus making their length unpredictable.
  6. ^ a b c d e f g h i Unicode property support may be incomplete, since products are continuously updated. Every engine will be incomplete from the release of a new Unicode revision until it is updated to comply.
  7. ^ Available as of ICU55.
  8. ^ Available as of JDK7.
  9. ^ The support and range of properties is dependent on implementation.
  10. ^ Experimental support added in v5.29.9.
  11. ^ Supported by Python v3.11 and later, and the optional regex library only.
  12. ^ May only be available in the regex library when used with Python versions after 3.3.
  13. ^ Supported by the optional regex library only.
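
Some of the features named in these notes can be sketched briefly. The example below uses Python's standard `re` module as an assumed test bed: it shows a scoped inline flag, a named backreference in Python's own spelling, and the error `re` raises for a look-behind containing a quantifier. The patterns and variable names are illustrative only.

```python
import re

# Scoped inline flag: "(?i:...)" makes only the enclosed part case-insensitive.
scoped = re.compile(r"(?i:test)ing")
scoped_upper_prefix = scoped.fullmatch("TESTing") is not None
scoped_upper_suffix = scoped.fullmatch("testING") is not None

# Named group and named backreference; Python spells these
# (?P<name>...) and (?P=name).
doubled = re.compile(r"(?P<word>\w+) (?P=word)")
m = doubled.search("it is is fine")
repeated = m.group("word") if m else None

# Variable-length look-behind: the quantifier inside "(?<=a+)" makes the
# look-behind length unpredictable, and the standard "re" module rejects it.
try:
    re.compile(r"(?<=a+)b")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False

print(scoped_upper_prefix, scoped_upper_suffix, repeated, variable_lookbehind_ok)
```

Engines that accept variable-length look-behind would compile the last pattern instead of raising an error.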

API features

  1. ^ a b Means the format can be used internally without explicit conversion.
  2. ^ Partial match of the whole regular expression. For example, the pattern ".*END$" partially matches any string, but fully matches only strings ending with END.[1]
  3. ^ a b Supports the Unicode 15.0 standard from 2023.[2]
  4. ^ The implementation uses the original UCS-2 support, so it recognizes only 64K characters in total (versus the 1,112,064 characters reachable through UTF-16). A Microsoft developer representative answered a bug report on this as "will not fix" in 2010.[3]
  5. ^ Since version 8.30.
  6. ^ Partial matching is performed implicitly, requiring a separate call to matchedLength() if an exact match fails.
  7. ^ Tcl includes facilities to convert to and from UTF-8.
  8. ^ wxRegEx uses a system-supplied POSIX library if one is available; otherwise, and in Unicode mode, it uses Henry Spencer's library.
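
The character counts quoted in the UCS-2 note can be reproduced with a little arithmetic. The sketch below is written in Python with constant names chosen for illustration. It derives the 1,112,064 figure from the 17 Unicode planes minus the surrogate range, and shows that a character outside the Basic Multilingual Plane occupies two UTF-16 code units, which a UCS-2-only engine would not treat as a single character.

```python
# Code points reachable through UTF-16: 17 planes of 65,536 code points each,
# minus the 2,048 surrogate code points (U+D800..U+DFFF).
PLANES = 17
PLANE_SIZE = 0x10000
SURROGATES = 0xDFFF - 0xD800 + 1

utf16_scalar_values = PLANES * PLANE_SIZE - SURROGATES

# UCS-2 is limited to a single 16-bit unit, i.e. the Basic Multilingual Plane.
ucs2_code_units = PLANE_SIZE

# A character outside the BMP needs two UTF-16 code units (a surrogate pair).
emoji = "\U0001F600"
utf16_units = len(emoji.encode("utf-16-le")) // 2

print(utf16_scalar_values, ucs2_code_units, utf16_units)
```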

References

  1. ^ "Getting Started – Hyperscan 5.4.0 documentation".
  2. ^ "Regex - Regular Expressions in OCaml".
  3. ^ "Recursive Regex—Tutorial".
  4. ^ "UTS #18: Unicode Regular Expressions".
  5. ^ "ECMA-262, 9th edition, June 2018 ECMAScript® 2018 Language Specification". www.ecma-international.org. Retrieved 4 August 2020.