# librematch

**Repository Path**: dannyniu/librematch

## Basic Information

- **Project Name**: librematch
- **Description**: POSIX regular expression library.
- **Primary Language**: Unknown
- **License**: Unlicense
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-03
- **Last Updated**: 2025-09-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

The `librematch` Regular Expression Library.
====

The `librematch` regex library implements a byte-oriented, ASCII-based subset of the POSIX-2024 regular expressions specification.

Getting Started
====

The library comes with a POSIX-compliant Makefile and a typical `configure` script. To use the library, do what most people would typically do:

```
./configure
make
make install # optional.
```

Here is a brief listing of the interface. It is described further in the `librematch.h` header, which contains the POSIX-compatible API.

- Types
  - `libregexp_t`
  - `libre_match_t := { rm_so, rm_eo, ... }`
- Functions
  - `int libregcomp(preg, pattern, cflags);`
  - `int libregexec(preg, string, nmatch, pmatch, eflags);`
  - `void libregfree(preg);`
- Compile Flags
  - `LIBREG_{EXTENDED,ICASE,MINIMAL,NOSUB,NEWLINE,...}`
- Regular Expression Execution Flags
  - `LIBREG_{NOTBOL,NOTEOL}`

***Caveats*** The interface corresponding to `regerror` has not been implemented yet, and it is unclear whether anyone would be interested in using it.

Design
====

The librematch regular expression engine is byte-oriented and ASCII-based. The reason for this is twofold:

First, UTF-8 is not the only character set out there, and even if it were, a single glyph may have multiple grapheme representations, so the issue of normalization comes into play.

Second, restricting to ASCII makes the behavior of both the implementation and application code deterministic: the implementation carries less burden, and the application can rely on its pattern not being interpreted differently depending on the language setting.

Additionally, it makes sense to restrict to the ASCII character set. POSIX didn't specify the behavior for additional locales, so applications that depend on them (being provided by the system, rather than defining them with `localedef` themselves) are already non-portable.

Implementation
----

The implementation makes use of recursive function calls. At a glance, this may look dangerous, as it carries the risk of stack overflow. However, balancing the effort required to allocate and resize memory on the heap against the stack frame optimizations that the compiler might perform, @dannyniu doesn't think this risk merits the hassle of inventing a custom stack allocator. If it's ever done, there could be 2 choices:

1. Chained stack, where a failure will cause the leak of those chained "stack frames".
2. Single relocatable stack chunk with offsets replacing pointers, which would complicate address calculation.

And finally, if it's ever a concern, one can just create a secondary thread, allocate a *big* stack for it, execute the regular expression in the thread, and wait for it to complete.
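
The secondary-thread approach can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard POSIX `regcomp`/`regexec`/`regfree` from `<regex.h>` as stand-ins, on the assumption that `libregcomp`/`libregexec`/`libregfree` can be substituted given the POSIX-compatible interface; the names `match_job`, `match_worker` and `match_on_big_stack` are hypothetical and not part of librematch.

```c
#include <pthread.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical job record: inputs for the worker and a slot for its result. */
struct match_job {
    const char *pattern;
    const char *subject;
    int result; /* 0 on match, REG_NOMATCH or another nonzero code otherwise */
};

/* Thread entry point: compile and execute the pattern on the big stack.
 * POSIX regcomp/regexec stand in for libregcomp/libregexec (assumption). */
static void *match_worker(void *arg)
{
    struct match_job *job = arg;
    regex_t re;

    job->result = regcomp(&re, job->pattern, REG_EXTENDED | REG_NOSUB);
    if (job->result == 0) {
        job->result = regexec(&re, job->subject, 0, NULL, 0);
        regfree(&re);
    }
    return NULL;
}

/* Run one match in a secondary thread that has `stack_size` bytes of stack.
 * Returns 0 if the thread ran to completion (then inspect job->result),
 * or the error number reported by the pthread call that failed. */
static int match_on_big_stack(struct match_job *job, size_t stack_size)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_attr_setstacksize(&attr, stack_size);
    if (err == 0)
        err = pthread_create(&tid, &attr, match_worker, job);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return err;

    /* Wait for the match to complete before reading job->result. */
    return pthread_join(tid, NULL);
}
```

A caller would fill in a `struct match_job`, call `match_on_big_stack(&job, stack_size)` with a size chosen for the deepest recursion it expects (for example, 16 MiB is an arbitrary choice here), and read `job.result` once the call returns 0.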