Struct Regex

pub struct Regex { /* private fields */ }

A GRegex is a compiled form of a regular expression.

After instantiating a GRegex, you can use its methods to find matches in a string, replace matches within a string, or split the string at matches.

GRegex implements regular expression pattern matching using syntax and semantics (such as character classes, quantifiers, and capture groups) similar to Perl regular expressions. See the PCRE documentation for details.

A typical scenario for regex pattern matching is to check if a string matches a pattern. The following statements implement this scenario.

⚠️ The following code is in { .c } ⚠️

const char *regex_pattern = ".*GLib.*";
const char *string_to_search = "You will love the GLib implementation of regex";
g_autoptr(GMatchInfo) match_info = NULL;
g_autoptr(GRegex) regex = NULL;

regex = g_regex_new (regex_pattern, G_REGEX_DEFAULT, G_REGEX_MATCH_DEFAULT, NULL);
g_assert (regex != NULL);

if (g_regex_match (regex, string_to_search, G_REGEX_MATCH_DEFAULT, &match_info))
  {
    int start_pos, end_pos;
    g_match_info_fetch_pos (match_info, 0, &start_pos, &end_pos);
    g_print ("Match successful! Overall pattern matches bytes %d to %d\n", start_pos, end_pos);
  }
else
  {
    g_print ("No match!\n");
  }

The constructor for GRegex includes two sets of bitmapped flags:

RegexCompileFlags: These flags control how GLib compiles the regex. There are options for case sensitivity, multiline, ignoring whitespace, etc.
RegexMatchFlags: These flags control GRegex’s matching behavior, such as anchoring and customizing definitions for newline characters.

Some regex patterns include backslash escapes, such as \d (digit) or \D (non-digit). When the pattern is written as a C string literal, each of those backslashes must itself be escaped. For example, the pattern "\\d\\D" matches a digit followed by a non-digit.
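The effect of the doubled backslashes can be checked without GRegex. The following is a minimal sketch in plain C; `escaped_pattern`, `count_backslashes` and `demo_escaped_pattern` are hypothetical names introduced only for this illustration and are not part of GLib.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The C compiler turns each "\\" in a string literal into one backslash,
 * so the regex engine would receive the four characters \d\D. */
const char *escaped_pattern = "\\d\\D";

/* Returns the number of backslash bytes in a NUL-terminated string. */
size_t
count_backslashes (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (*s == '\\')
      n++;
  return n;
}

/* Call from main() to verify what the literal actually contains. */
void
demo_escaped_pattern (void)
{
  assert (strlen (escaped_pattern) == 4);
  assert (count_backslashes (escaped_pattern) == 2);
  assert (escaped_pattern[0] == '\\' && escaped_pattern[1] == 'd');
  assert (escaped_pattern[2] == '\\' && escaped_pattern[3] == 'D');
}
```

Writing the literal as "\d\D" instead would leave the compiler to interpret \d and \D as (invalid) C escape sequences, so the pattern would never reach the regex engine intact.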

GLib’s implementation of pattern matching includes a start_position argument for some of the match, replace, and split methods. Specifying a start position provides flexibility when you want to ignore the first n characters of a string, but want to incorporate backslash assertions at character n - 1. For example, a database field contains inconsistent spelling for a job title: healthcare provider and health-care provider. The database manager wants to make the spelling consistent by adding a hyphen when it is missing. The following regex pattern tests for the string care preceded by a non-word boundary character (instead of a hyphen) and followed by a space.

⚠️ The following code is in { .c } ⚠️

const char *regex_pattern = "\\Bcare\\s";

An efficient way to match with this pattern is to start examining at start_position 6 in the string healthcare or health-care.

⚠️ The following code is in { .c } ⚠️

const char *regex_pattern = "\\Bcare\\s";
const char *string_to_search = "healthcare provider";
g_autoptr(GMatchInfo) match_info = NULL;
g_autoptr(GRegex) regex = NULL;

regex = g_regex_new (
  regex_pattern,
  G_REGEX_DEFAULT,
  G_REGEX_MATCH_DEFAULT,
  NULL);
g_assert (regex != NULL);

g_regex_match_full (
  regex,
  string_to_search,
  -1,
  6, // position of 'c' in the test string.
  G_REGEX_MATCH_DEFAULT,
  &match_info,
  NULL);

The method match_full() (like the other methods that take a start_position argument) allows lookback before the start position to determine whether the previous character satisfies an assertion.
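Why lookback matters can be illustrated without GRegex. The sketch below is plain C and is not how GRegex or PCRE is implemented; `is_word_char`, `non_word_boundary_at` and `demo_lookback` are hypothetical helpers that only mimic the \B assertion for ASCII text. It contrasts examining offset 6 of the full string with handing over a pointer to the substring that starts at offset 6.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASCII-only approximation of the regex \w class. */
static bool
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Mimics \B at byte offset pos: true when the characters on both sides of
 * pos are both word characters or both non-word characters. The edges of
 * the string count as non-word characters. */
bool
non_word_boundary_at (const char *s, size_t pos)
{
  size_t len = strlen (s);
  bool before = pos > 0 && pos <= len && is_word_char (s[pos - 1]);
  bool after = pos < len && is_word_char (s[pos]);

  return before == after;
}

/* Call from main() to run the three checks. */
void
demo_lookback (void)
{
  const char *joined = "healthcare provider";
  const char *hyphenated = "health-care provider";

  /* Start position 6: the 'h' at offset 5 is still visible,
   * so \B holds in front of "care". */
  assert (non_word_boundary_at (joined, 6));

  /* Pointer to offset 6: the 'h' is hidden and offset 0 of the substring
   * looks like the start of the string, so \B fails. */
  assert (!non_word_boundary_at (joined + 6, 0));

  /* Hyphenated spelling: "care" starts at offset 7, right after '-',
   * which is a word boundary, so \B fails there as intended. */
  assert (!non_word_boundary_at (hyphenated, 7));
}
```

The second check is the case a start position avoids: slicing the string discards the context that the assertion needs.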

Unless you set RegexCompileFlags::RAW (G_REGEX_RAW in C) as one of the compile flags, all strings passed to GRegex methods must be encoded in UTF-8. Lengths and positions inside the strings are in bytes, not characters, so, for instance, \xc3\xa0 (i.e., à) is two bytes long but is treated as a single character. If you set G_REGEX_RAW, the strings can be invalid UTF-8 and each byte is treated as a character, so \xc3\xa0 is two bytes and two characters long.
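The difference between byte counts and character counts can be seen with a few lines of plain C. This is a minimal sketch that assumes valid UTF-8 input; `utf8_char_count` and `demo_bytes_vs_chars` are hypothetical helpers written for this illustration, not GLib functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts Unicode code points in a valid UTF-8 string by skipping
 * continuation bytes, which always have the bit pattern 10xxxxxx. */
size_t
utf8_char_count (const char *s)
{
  size_t count = 0;

  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      count++;
  return count;
}

/* Call from main() to compare byte and character lengths. */
void
demo_bytes_vs_chars (void)
{
  const char *a_grave = "\xc3\xa0";   /* U+00E0, à */
  const char *word = "voil\xc3\xa0";  /* "voilà" */

  /* Two bytes, but a single character in UTF-8 mode. */
  assert (strlen (a_grave) == 2);
  assert (utf8_char_count (a_grave) == 1);

  /* Five characters occupy six bytes, so byte offsets and character
   * indexes diverge after the first multi-byte character. */
  assert (strlen (word) == 6);
  assert (utf8_char_count (word) == 5);
}
```

Because positions are expressed in bytes, an offset taken from a character index has to be converted before it is used as a start position.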

Regarding line endings, \n matches a \n character, and \r matches a \r character. More generally, \R matches all typical line endings: CR + LF (\r\n), LF (linefeed, U+000A, \n), VT (vertical tab, U+000B, \v), FF (formfeed, U+000C, \f), CR (carriage return, U+000D, \r), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029).
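The list of line endings above can be written out as byte sequences. The following plain C sketch assumes UTF-8 input and is only an illustration of the listed sequences, not the matching code used by GRegex or PCRE; `line_ending_length` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the byte length of the line ending that starts at s, or 0 if s
 * does not start with one of the line endings listed for \R.
 * Assumes s is a NUL-terminated UTF-8 string. */
size_t
line_ending_length (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;

  if (p[0] == '\r')
    return p[1] == '\n' ? 2 : 1;   /* CR + LF, or a lone CR */
  if (p[0] == '\n' || p[0] == '\v' || p[0] == '\f')
    return 1;                      /* LF, VT, FF */
  if (p[0] == 0xC2 && p[1] == 0x85)
    return 2;                      /* NEL, U+0085 */
  if (p[0] == 0xE2 && p[1] == 0x80 && (p[2] == 0xA8 || p[2] == 0xA9))
    return 3;                      /* LS U+2028, PS U+2029 */
  return 0;
}
```

CR + LF is checked first so that the pair is consumed as one line ending of two bytes rather than as two separate ones. The multi-byte cases show again that lengths are counted in bytes: NEL takes two bytes and LS and PS take three.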

The behaviour of the dot, circumflex, and dollar metacharacters is affected by newline characters. By default, GRegex matches any newline character matched by \R. You can limit the matched newline characters by specifying the RegexCompileFlags::NEWLINE_CR, RegexCompileFlags::NEWLINE_LF, and RegexCompileFlags::NEWLINE_CRLF compile options, and with the RegexMatchFlags::NEWLINE_ANY, RegexMatchFlags::NEWLINE_CR, RegexMatchFlags::NEWLINE_LF, and RegexMatchFlags::NEWLINE_CRLF match options. These settings are also relevant when compiling a pattern if RegexCompileFlags::EXTENDED is set and an unescaped # outside a character class is encountered. This indicates a comment that lasts until after the next newline.

Because GRegex does not modify its internal state between creation and destruction, you can safely use the same GRegex instance from different threads. In contrast, MatchInfo is not thread safe.

The regular expression low-level functionalities are obtained through the excellent PCRE library written by Philip Hazel.

GLib type: Shared boxed type with reference counted clone semantics.

Implementations§

impl Regex

pub fn as_ptr(&self) -> *mut GRegex

pub unsafe fn from_glib_ptr_borrow(ptr: &*mut GRegex) -> &Self

impl Regex

pub fn new( pattern: &str, compile_options: RegexCompileFlags, match_options: RegexMatchFlags, ) -> Result<Option<Regex>, Error>

§pattern

§compile_options

§match_options

§Returns

pub fn capture_count(&self) -> i32

§Returns

pub fn compile_flags(&self) -> RegexCompileFlags

§Returns

pub fn has_cr_or_lf(&self) -> bool

§Returns

pub fn match_flags(&self) -> RegexMatchFlags

§Returns

pub fn max_backref(&self) -> i32

§Returns

pub fn max_lookbehind(&self) -> i32

§Returns

pub fn pattern(&self) -> GString

§Returns

impl Regex

pub fn string_number(&self, name: impl IntoGStr) -> i32

§name

§Returns

pub fn escape_nul(string: impl IntoGStr) -> GString

§string

§length

§Returns

pub fn escape_string(string: impl IntoGStr) -> GString

§string

§length

§Returns

pub fn check_replacement(replacement: impl IntoGStr) -> Result<bool, Error>

§replacement

§Returns

§has_references

pub fn match_simple( pattern: impl IntoGStr, string: impl IntoGStr, compile_options: RegexCompileFlags, match_options: RegexMatchFlags, ) -> bool

§pattern

§string

§compile_options

§match_options

§Returns

pub fn replace( &self, string: impl IntoGStr, start_position: i32, replacement: impl IntoGStr, match_options: RegexMatchFlags, ) -> Result<GString, Error>

§string

§start_position

§replacement

§match_options

§Returns

pub fn match_all<'input>( &self, string: &'input GStr, match_options: RegexMatchFlags, ) -> Result<MatchInfo<'input>, Error>

§string

§match_options

§Returns

§match_info

pub fn match_all_full<'input>( &self, string: &'input GStr, start_position: i32, match_options: RegexMatchFlags, ) -> Result<MatchInfo<'input>, Error>

§string

§start_position

§match_options

§Returns

§match_info

pub fn match_<'input>( &self, string: &'input GStr, match_options: RegexMatchFlags, ) -> Result<MatchInfo<'input>, Error>

§string

§match_options

§Returns

§match_info

pub fn match_full<'input>( &self, string: &'input GStr, start_position: i32, match_options: RegexMatchFlags, ) -> Result<MatchInfo<'input>, Error>

§string

§start_position

§match_options

§Returns

§match_info

pub fn replace_literal( &self, string: impl IntoGStr, start_position: i32, replacement: impl IntoGStr, match_options: RegexMatchFlags, ) -> Result<GString, Error>

§string

§start_position

§replacement

fn hash<H: Hasher>(&self, state: &mut H)

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

fn max(self, other: Self) -> Self
where Self: Sized,

fn min(self, other: Self) -> Self
where Self: Sized,

fn clamp(self, min: Self, max: Self) -> Self
where Self: Sized,

impl<T> Any for T
where T: 'static + ?Sized,

impl<T> Borrow<T> for T
where T: ?Sized,