libgrapheme, branch HEAD unicode string library bf20d2f7bce13c7d006a9ca442221399753bce9d 2025-12-24T07:08:54Z 2025-12-24T07:08:54Z Bump version to 3.0.0 Laslo Hunhold laslo@hunhold.de commit bf20d2f7bce13c7d006a9ca442221399753bce9d parent 40bc7dffb7a0ae319dba093809fc2f27a59ed8d1 Author: Laslo Hunhold <laslo@hunhold.de> Date: Wed, 24 Dec 2025 08:08:54 +0100 Bump version to 3.0.0 We need a major bump as the bidirectional API changed since the last release. There need to be some refinements to some of the implementations for conformance, but overall it works reliably. Signed-off-by: Laslo Hunhold <dev@frign.de> 40bc7dffb7a0ae319dba093809fc2f27a59ed8d1 2025-12-24T07:05:10Z 2025-12-24T07:05:10Z Update README Laslo Hunhold laslo@hunhold.de commit 40bc7dffb7a0ae319dba093809fc2f27a59ed8d1 parent 2f5fb9740e29dd344ec8fbbcde966b71fcd3ffcf Author: Laslo Hunhold <laslo@hunhold.de> Date: Wed, 24 Dec 2025 08:05:10 +0100 Update README Signed-off-by: Laslo Hunhold <dev@frign.de> 2f5fb9740e29dd344ec8fbbcde966b71fcd3ffcf 2025-11-30T21:42:57Z 2025-11-30T21:42:57Z Update Unicode data to version 17.0.0 Laslo Hunhold laslo@hunhold.de commit 2f5fb9740e29dd344ec8fbbcde966b71fcd3ffcf parent 748658dfe549d531cf615c61de7453f8ace92b2b Author: Laslo Hunhold <laslo@hunhold.de> Date: Sun, 30 Nov 2025 22:42:57 +0100 Update Unicode data to version 17.0.0 While some tests fail for the bidirectional and line segmentation algorithms, the other algorithms pass all conformance tests. Thus, overall, including the new data files brings a net-benefit. 
Signed-off-by: Laslo Hunhold <dev@frign.de> 748658dfe549d531cf615c61de7453f8ace92b2b 2025-10-14T20:08:17Z 2025-10-14T20:08:17Z Rename .out.h to .gen.h Laslo Hunhold dev@frign.de commit 748658dfe549d531cf615c61de7453f8ace92b2b parent 5c252ef6a4a7f82364bc59c2733d858c3c7927e0 Author: Laslo Hunhold <dev@frign.de> Date: Tue, 14 Oct 2025 22:08:17 +0200 Rename .out.h to .gen.h Signed-off-by: Laslo Hunhold <dev@frign.de> 5c252ef6a4a7f82364bc59c2733d858c3c7927e0 2025-10-14T19:58:20Z 2025-10-14T20:05:20Z Fully rework LUT generation Laslo Hunhold dev@frign.de commit 5c252ef6a4a7f82364bc59c2733d858c3c7927e0 parent 400ae9b5343687ebac8c1f3194197e792c34bfb4 Author: Laslo Hunhold <dev@frign.de> Date: Tue, 14 Oct 2025 21:58:20 +0200 Fully rework LUT generation As you may have noticed, libgrapheme currently is two versions behind on Unicode. This is because they massively overhaul their algorithms with each release, and the existing data model I developed came to its limits. For each algorithm, it is necessary to extract properties from multiple files, and it is kind of a hack when two properties coincide, complicating the code. The only solution was to fully rethink the data generation, including the compression. Here's what's changed: 1) Multiple properties are now possible, using a bitfield approach 2) Data compression is facilitated by a third dictionary stage. For the provided first port of the character properties, we reduce the LUT size from 35K to 23K, making it possible for them to reside in L1, promising more performance. 3) We don't need any of the ugly postprocessing, or magic 'temporary' classes, etc., to work around the too stiff data structures. The old infrastructure remains in gen/, the new one is put in gen2/. Once everything is fully ported, gen/ is removed and gen2/ renamed to gen/. One after another, this will allow us to port libgrapheme to the latest Unicode version. 
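The staged lookup with bitfield properties described above can be sketched roughly as follows. This is a toy illustration only, not the code emitted by gen2/: the table contents, the 16-code-point universe, the block size and all names (toy_stage1, toy_get_props, PROP_A, ...) are assumptions made up for the example. The first stage maps a block of code points to a deduplicated block, the second stage maps a position within that block to a dictionary index, and the third stage (the dictionary) holds the distinct property bitfields.

```c
#include <stdint.h>

/* Hypothetical property bits; one code point may carry several at once. */
enum {
	PROP_A = 1 << 0,
	PROP_B = 1 << 1
};

#define TOY_BLOCK_BITS 2                          /* 4 code points per block */
#define TOY_BLOCK_MASK ((1u << TOY_BLOCK_BITS) - 1)
#define TOY_CP_MAX     16u                        /* toy universe: 0..15 */

/* Stage 1: block number -> index of a deduplicated block in stage 2.
 * Blocks 0, 1 and 3 are identical (no properties) and share one entry. */
static const uint8_t toy_stage1[] = { 0, 0, 1, 0 };

/* Stage 2: position within a block -> index into the dictionary. */
static const uint8_t toy_stage2[] = {
	0, 0, 0, 0, /* shared "empty" block */
	1, 2, 0, 3, /* block holding code points 8..11 */
};

/* Stage 3 (dictionary): every distinct bitfield is stored exactly once. */
static const uint8_t toy_stage3[] = { 0, PROP_A, PROP_A | PROP_B, PROP_B };

/* Return the property bitfield of cp, or 0 if cp is outside the toy range. */
uint8_t
toy_get_props(uint32_t cp)
{
	if (cp >= TOY_CP_MAX) {
		return 0;
	}

	return toy_stage3[toy_stage2[(toy_stage1[cp >> TOY_BLOCK_BITS]
	                              << TOY_BLOCK_BITS) +
	                             (cp & TOY_BLOCK_MASK)]];
}

/* Check a single property bit. */
int
toy_has_prop(uint32_t cp, uint8_t prop)
{
	return (toy_get_props(cp) & prop) != 0;
}
```

With these toy tables, code point 9 carries both properties at once, which is the case the bitfield approach is meant to cover without any special-cased "combined" classes.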
Signed-off-by: Laslo Hunhold <dev@frign.de> 400ae9b5343687ebac8c1f3194197e792c34bfb4 2025-10-14T19:55:36Z 2025-10-14T19:55:36Z Update license Laslo Hunhold dev@frign.de commit 400ae9b5343687ebac8c1f3194197e792c34bfb4 parent 65b354f0fcb1d925f4340dbb4415ea06e8af2bec Author: Laslo Hunhold <dev@frign.de> Date: Tue, 14 Oct 2025 21:55:36 +0200 Update license Signed-off-by: Laslo Hunhold <dev@frign.de> 65b354f0fcb1d925f4340dbb4415ea06e8af2bec 2024-09-01T20:42:18Z 2024-09-01T20:45:28Z Update grapheme break algorithm to Unicode version 15.1.0 Laslo Hunhold dev@frign.de commit 65b354f0fcb1d925f4340dbb4415ea06e8af2bec parent 3ee106e4ab1d5fe4696ab9089f052706d7cb9a48 Author: Laslo Hunhold <dev@frign.de> Date: Sun, 1 Sep 2024 22:42:18 +0200 Update grapheme break algorithm to Unicode version 15.1.0 While the change to the algorithm looks harmless in the specification, it comes at the price of more complexity because we have to keep track of a relatively complex state for a sequence of indic conjunct breaks. Fortunately adding so many additional classes only decreases the compression ratio for the grapheme cluster LUTs by ~0.5%. We now pass all 1187 character tests. Signed-off-by: Laslo Hunhold <dev@frign.de> 3ee106e4ab1d5fe4696ab9089f052706d7cb9a48 2024-09-01T15:04:18Z 2024-09-01T15:04:18Z Bump Unicode version and data to 15.1.0 Laslo Hunhold dev@frign.de commit 3ee106e4ab1d5fe4696ab9089f052706d7cb9a48 parent f01674957f6816d55b8ed1b38a96c4ef5e3120b2 Author: Laslo Hunhold <dev@frign.de> Date: Sun, 1 Sep 2024 17:04:18 +0200 Bump Unicode version and data to 15.1.0 With this commit we just add the updated Unicode data. Given the Unicode consortium, again, fiddled with their algorithms it takes a few subsequent commits to fully support Unicode 15.1.0. Unicode 16.0, scheduled to be released this month, should be a simple corollary; let's see... 
Signed-off-by: Laslo Hunhold <dev@frign.de> f01674957f6816d55b8ed1b38a96c4ef5e3120b2 2024-09-01T12:57:28Z 2024-09-01T12:57:28Z Don't warn about overlength strings in test data Laslo Hunhold dev@frign.de commit f01674957f6816d55b8ed1b38a96c4ef5e3120b2 parent d56ad5ac8ac47037a86d52e3445e3c5d4dc81a4b Author: Laslo Hunhold <dev@frign.de> Date: Sun, 1 Sep 2024 14:57:28 +0200 Don't warn about overlength strings in test data Signed-off-by: Laslo Hunhold <dev@frign.de> d56ad5ac8ac47037a86d52e3445e3c5d4dc81a4b 2024-09-01T12:26:07Z 2024-09-01T12:56:16Z Fix typo in man/libgrapheme.sh Laslo Hunhold dev@frign.de commit d56ad5ac8ac47037a86d52e3445e3c5d4dc81a4b parent c8b34aa04ac8702e55ba4b8946d6794c9c6056f5 Author: Laslo Hunhold <dev@frign.de> Date: Sun, 1 Sep 2024 14:26:07 +0200 Fix typo in man/libgrapheme.sh Thanks to Omar Polo <op@omarpolo.com> for reporting these! Signed-off-by: Laslo Hunhold <dev@frign.de> c8b34aa04ac8702e55ba4b8946d6794c9c6056f5 2023-12-01T08:37:28Z 2023-12-01T08:39:38Z Close data file in parse_file_with_callback() at the end Laslo Hunhold dev@frign.de commit c8b34aa04ac8702e55ba4b8946d6794c9c6056f5 parent af792ebe99c6301fb5b5436f856b9589ad0fd5ea Author: Laslo Hunhold <dev@frign.de> Date: Fri, 1 Dec 2023 09:37:28 +0100 Close data file in parse_file_with_callback() at the end This otherwise leads to build problems on macOS because of too many open files due to leaked file descriptors. Thank you, zeromake (https://blog.zeromake.com), for reporting this! 
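The leak pattern behind this fix can be illustrated with a minimal sketch. It is not the function from gen/util.c: the name toy_parse_file_with_callback, the callback type, the fixed line buffer and the return codes are all assumptions for the example. The point is only that every path that leaves the function after a successful fopen() also goes through fclose(), so repeated calls cannot exhaust the file descriptor limit.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type: return nonzero to abort parsing. */
typedef int (*toy_line_cb)(const char *line, void *payload);

/*
 * Call cb for every chunk read by fgets() (lines longer than the
 * buffer arrive in several pieces). Returns 0 on success, 1 if the
 * callback aborted, -1 on I/O errors.
 */
int
toy_parse_file_with_callback(const char *fname, toy_line_cb cb,
                             void *payload)
{
	char buf[512];
	FILE *fp;
	int ret = 0;

	if ((fp = fopen(fname, "r")) == NULL) {
		return -1;
	}

	while (fgets(buf, sizeof(buf), fp) != NULL) {
		if (cb(buf, payload)) {
			ret = 1;
			break;
		}
	}
	if (ferror(fp)) {
		ret = -1;
	}

	/* close on every path, otherwise each call leaks a descriptor */
	if (fclose(fp) != 0 && ret == 0) {
		ret = -1;
	}

	return ret;
}

/* Example callback: count the chunks seen via a size_t payload. */
int
toy_count_line(const char *line, void *payload)
{
	(void)line;
	++*(size_t *)payload;

	return 0;
}
```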
Signed-off-by: Laslo Hunhold <dev@frign.de> af792ebe99c6301fb5b5436f856b9589ad0fd5ea 2023-05-29T20:21:57Z 2023-05-29T20:21:57Z Free generated mirror-LUT data structures to avoid memory leak Laslo Hunhold dev@frign.de commit af792ebe99c6301fb5b5436f856b9589ad0fd5ea parent 719d805b28b9e34d5f5e83fcbdb0fbb41c20ec6d Author: Laslo Hunhold <dev@frign.de> Date: Mon, 29 May 2023 22:21:57 +0200 Free generated mirror-LUT data structures to avoid memory leak This is a technicality, but this satisfies the clang dynamic memory analyzer. Signed-off-by: Laslo Hunhold <dev@frign.de> 719d805b28b9e34d5f5e83fcbdb0fbb41c20ec6d 2023-05-29T08:37:49Z 2023-05-29T08:37:49Z Reflect mirroring in the bidi-tests Laslo Hunhold dev@frign.de commit 719d805b28b9e34d5f5e83fcbdb0fbb41c20ec6d parent a17b629bb30ac9c0e3e7343449dc42085bb2fc59 Author: Laslo Hunhold <dev@frign.de> Date: Mon, 29 May 2023 10:37:49 +0200 Reflect mirroring in the bidi-tests The bidi-tests do not contain mirrored test data, so we need to generate it ad-hoc using the generated mirror-LUTs. Signed-off-by: Laslo Hunhold <dev@frign.de> a17b629bb30ac9c0e3e7343449dc42085bb2fc59 2023-05-29T08:34:37Z 2023-05-29T08:34:37Z Fix bidi-line-level-loop boundaries Laslo Hunhold dev@frign.de commit a17b629bb30ac9c0e3e7343449dc42085bb2fc59 parent ba923230c7b25b0737d151c3f607a75b63676456 Author: Laslo Hunhold <dev@frign.de> Date: Mon, 29 May 2023 10:34:37 +0200 Fix bidi-line-level-loop boundaries The first change was caught using dynamic code analysis and prevents access to uninitialized memory (it wouldn't be worse than that, though, given we do not access memory we are not allowed to and the consequences are harmless). The second change was found by eyesight. 
Signed-off-by: Laslo Hunhold <dev@frign.de> ba923230c7b25b0737d151c3f607a75b63676456 2023-05-29T06:37:11Z 2023-05-29T08:33:09Z Silence strict casting warnings and apply bidi mirroring Laslo Hunhold dev@frign.de commit ba923230c7b25b0737d151c3f607a75b63676456 parent c2aa140007c3fe8f6b58839668219e9c8414865b Author: Laslo Hunhold <dev@frign.de> Date: Mon, 29 May 2023 08:37:11 +0200 Silence strict casting warnings and apply bidi mirroring The mirroring-part must have been accidentally dropped in one previous refactoring. Signed-off-by: Laslo Hunhold <dev@frign.de> c2aa140007c3fe8f6b58839668219e9c8414865b 2023-05-26T10:19:55Z 2023-05-26T10:19:55Z Apply clang format Laslo Hunhold dev@frign.de commit c2aa140007c3fe8f6b58839668219e9c8414865b parent 98e8632689b89f9f25d2a7091e7315f7d48881bc Author: Laslo Hunhold <dev@frign.de> Date: Fri, 26 May 2023 12:19:55 +0200 Apply clang format Signed-off-by: Laslo Hunhold <dev@frign.de> 98e8632689b89f9f25d2a7091e7315f7d48881bc 2023-05-26T08:23:06Z 2023-05-26T08:23:06Z Update Unicode data license Laslo Hunhold dev@frign.de commit 98e8632689b89f9f25d2a7091e7315f7d48881bc parent 4a4919e8764d3e88c4e33da537f42a0557a8bcf5 Author: Laslo Hunhold <dev@frign.de> Date: Fri, 26 May 2023 10:23:06 +0200 Update Unicode data license Signed-off-by: Laslo Hunhold <dev@frign.de> 4a4919e8764d3e88c4e33da537f42a0557a8bcf5 2023-05-26T08:20:32Z 2023-05-26T08:20:32Z Properly parse reorder list Laslo Hunhold dev@frign.de commit 4a4919e8764d3e88c4e33da537f42a0557a8bcf5 parent 7ddf17bf2f20b598d204f32d441e8ea30765b577 Author: Laslo Hunhold <dev@frign.de> Date: Fri, 26 May 2023 10:20:32 +0200 Properly parse reorder list It worked all fine for the almost million conformance tests, except for test number 490894, given its length exceeds 127 and thus the reorder levels don't fit in a signed 8-bit-integer. This is now fixed by making it 16 bits and making the parsing even stricter so we will not miss out on errors of this kind in this part of the code again. 
We now pass all the tests. Signed-off-by: Laslo Hunhold <dev@frign.de> 7ddf17bf2f20b598d204f32d441e8ea30765b577 2023-05-26T08:02:58Z 2023-05-26T08:02:58Z Add resolved paragraph direction to tests Laslo Hunhold dev@frign.de commit 7ddf17bf2f20b598d204f32d441e8ea30765b577 parent 1815d4d8d141da580372c678c3e38fab0e948d52 Author: Laslo Hunhold <dev@frign.de> Date: Fri, 26 May 2023 10:02:58 +0200 Add resolved paragraph direction to tests Only the tests in BidiCharacterTests.txt specify the resolved direction, so we express the non-specification by using the neutral-direction-enum-type. Running the tests, I noticed a small mistake I made, leading to the wrong resolved type being emitted. The final solution is to use a proper enum-return-type for the paragraph_level-function, which has been added as a TODO. Signed-off-by: Laslo Hunhold <dev@frign.de> 1815d4d8d141da580372c678c3e38fab0e948d52 2023-05-26T07:53:24Z 2023-05-26T07:53:45Z Update bidi tests to also check reordering Laslo Hunhold dev@frign.de commit 1815d4d8d141da580372c678c3e38fab0e948d52 parent 52ee78ea80d51b163f7fc85e9387389266d2331b Author: Laslo Hunhold <dev@frign.de> Date: Fri, 26 May 2023 09:53:24 +0200 Update bidi tests to also check reordering We already implemented the reordering extraction, which is why we only needed to add the handling in the test-binary itself. Signed-off-by: Laslo Hunhold <dev@frign.de> 52ee78ea80d51b163f7fc85e9387389266d2331b 2023-05-26T07:40:10Z 2023-05-26T07:51:44Z Refactor bidi and add reordering function Laslo Hunhold dev@frign.de commit 52ee78ea80d51b163f7fc85e9387389266d2331b parent 77e30a69ce0807fbee01d43eebedda34b54f41af Author: Laslo Hunhold <dev@frign.de> Date: Fri, 26 May 2023 09:40:10 +0200 Refactor bidi and add reordering function - Rename bidi-override enum to bidi-direction, including entries. This better reflects the general nature of it. 
- Remove UTF-8-related bidi-functions, given it would be too complicated to reflect in an API and opens up some very difficult challenges. - Rename *_preprocess to *_preprocess_paragraph and return the resolved paragraph embedding level as an optional out-parameter. This is the only way to meaningfully handle large chunks of text with paragraphs of different embedding levels. - Separate the get_paragraph_level() function into two for isolated-paragraphs and whole paragraphs. This simplifies it a lot, as we don't have the crazy bool-flag-mess any more. - Add a grapheme_bidirectional_reorder_line function that directly operates on preprocessed data and returns the reordered string without any additionally necessary buffering. For this the get_line_embedding_levels had to be made a bit more general to allow different ways of writing the levels into the output. This function makes use of the mirror-LUT and has a small section still commented out regarding the proper inversion of grapheme clusters that will need more investigation. 
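For readers unfamiliar with the reordering step, the core of it (rule L2 of UAX #9) can be sketched as below. This is only an illustration under simplifying assumptions and not the implementation or signature of grapheme_bidirectional_reorder_line: the function toy_reorder_line is hypothetical, it works in place on an array of code points with a separate array of already resolved levels, and it ignores mirroring and the grapheme cluster handling mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the half-open range [from, to) of code points in place. */
static void
toy_reverse_range(uint32_t *s, size_t from, size_t to)
{
	uint32_t tmp;

	while (from + 1 < to) {
		tmp = s[from];
		s[from++] = s[--to];
		s[to] = tmp;
	}
}

/*
 * Rule L2: from the highest level down to the lowest odd level,
 * reverse every maximal run of characters at that level or higher.
 * The run boundaries can be read from the unpermuted level array,
 * because runs at a higher level always lie within runs at a lower one.
 */
void
toy_reorder_line(uint32_t *line, const uint8_t *lev, size_t len)
{
	uint8_t max = 0, min_odd = UINT8_MAX, l;
	size_t i, start;

	for (i = 0; i < len; i++) {
		if (lev[i] > max) {
			max = lev[i];
		}
		if ((lev[i] & 1) && lev[i] < min_odd) {
			min_odd = lev[i];
		}
	}
	if (min_odd == UINT8_MAX) {
		/* no odd (right-to-left) level, nothing to reverse */
		return;
	}

	for (l = max; l >= min_odd; l--) {
		for (i = 0; i < len;) {
			if (lev[i] < l) {
				i++;
				continue;
			}
			start = i;
			while (i < len && lev[i] >= l) {
				i++;
			}
			toy_reverse_range(line, start, i);
		}
	}
}
```

As a usage example, the logical sequence a B c d E f with levels 0 1 2 2 1 0 comes out as a E c d B f: the embedded left-to-right pair "c d" keeps its order while the surrounding right-to-left run is reversed.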
Signed-off-by: Laslo Hunhold <dev@frign.de> 77e30a69ce0807fbee01d43eebedda34b54f41af 2023-05-24T16:05:39Z 2023-05-24T16:05:39Z Add generating code for bidirectional character mirror mappings Laslo Hunhold dev@frign.de commit 77e30a69ce0807fbee01d43eebedda34b54f41af parent f320b0ad8b7b2bc7ab5b63e91379012adbd19d12 Author: Laslo Hunhold <dev@frign.de> Date: Wed, 24 May 2023 18:05:39 +0200 Add generating code for bidirectional character mirror mappings Signed-off-by: Laslo Hunhold <dev@frign.de> f320b0ad8b7b2bc7ab5b63e91379012adbd19d12 2023-05-11T16:16:09Z 2023-05-11T16:16:09Z Allow level-array to have different size from line length Laslo Hunhold dev@frign.de commit f320b0ad8b7b2bc7ab5b63e91379012adbd19d12 parent c0cab63c5300fa12284194fbef57aa2ed62a94c0 Author: Laslo Hunhold <dev@frign.de> Date: Thu, 11 May 2023 18:16:09 +0200 Allow level-array to have different size from line length This may not be apparent at first, but it allows you to only extract as many levels of a line as you need, e.g. only the first 10. Truncation is indicated by the return value being larger than levlen. 
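The truncation convention described above follows the familiar snprintf() pattern and can be sketched like this. The function toy_copy_levels and its parameters are hypothetical and merely stand in for the real level-extraction function, whose exact signature is not shown here: at most levlen levels are written, and the return value is the number of levels the line actually has.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write at most levlen of the linelen available levels into lev and
 * return linelen. A return value larger than levlen tells the caller
 * that the output was truncated. lev may be NULL if levlen is 0.
 */
size_t
toy_copy_levels(const int8_t *line_lev, size_t linelen, int8_t *lev,
                size_t levlen)
{
	size_t i, n = (linelen < levlen) ? linelen : levlen;

	for (i = 0; i < n; i++) {
		lev[i] = line_lev[i];
	}

	return linelen;
}
```

A caller that only cares about the first few levels passes a small buffer and can still detect that more were available by comparing the return value with levlen. Passing a zero-length buffer yields the required size without writing anything.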
Signed-off-by: Laslo Hunhold <dev@frign.de> c0cab63c5300fa12284194fbef57aa2ed62a94c0 2023-02-24T17:26:22Z 2023-02-24T17:26:22Z Fix a small typo in configure Laslo Hunhold dev@frign.de commit c0cab63c5300fa12284194fbef57aa2ed62a94c0 parent c0d28c3cad5c9e02dfa93b3ff3e6953ad0f22d75 Author: Laslo Hunhold <dev@frign.de> Date: Fri, 24 Feb 2023 18:26:22 +0100 Fix a small typo in configure Signed-off-by: Laslo Hunhold <dev@frign.de> c0d28c3cad5c9e02dfa93b3ff3e6953ad0f22d75 2023-02-24T17:21:02Z 2023-02-24T17:23:45Z Replace all POSIX-features to become fully ISO-C99 Laslo Hunhold dev@frign.de commit c0d28c3cad5c9e02dfa93b3ff3e6953ad0f22d75 parent 0e95e5c797b1dc41117e1ea5455f2a7f2932868d Author: Laslo Hunhold <dev@frign.de> Date: Fri, 24 Feb 2023 18:21:02 +0100 Replace all POSIX-features to become fully ISO-C99 As it turned out, the only things that needed replacing were getline(), strdup() and the timing in the benchmarks. Analogously, we replace -D_DEFAULT_SOURCE with -D_ISOC99_SOURCE. This way we further extend the number of platforms where libgrapheme can be compiled and run on, e.g. MSVC still does not include getline(). Signed-off-by: Laslo Hunhold <dev@frign.de> 0e95e5c797b1dc41117e1ea5455f2a7f2932868d 2023-02-23T22:16:46Z 2023-02-23T22:16:46Z Port build system to MinGW-W64/Cygwin Laslo Hunhold dev@frign.de commit 0e95e5c797b1dc41117e1ea5455f2a7f2932868d parent 53f5421ae389b0312bdcab1c715a03f175a58b07 Author: Laslo Hunhold <dev@frign.de> Date: Thu, 23 Feb 2023 23:16:46 +0100 Port build system to MinGW-W64/Cygwin This requires the ability to specify executable-suffixes. We trick a bit by not diving into the import library madness for MSVC and instead act as if we exported the import library "libgrapheme.lib", which however is just the static library. 
Signed-off-by: Laslo Hunhold <dev@frign.de> 53f5421ae389b0312bdcab1c715a03f175a58b07 2022-11-29T22:45:10Z 2022-11-29T22:45:10Z Fix bidi purge loop logic a bit Laslo Hunhold dev@frign.de commit 53f5421ae389b0312bdcab1c715a03f175a58b07 parent bbbc72cba69445535dd035dfe1ee10d473655629 Author: Laslo Hunhold <dev@frign.de> Date: Tue, 29 Nov 2022 23:45:10 +0100 Fix bidi purge loop logic a bit Otherwise you could skip one element by accident. This does not have direct consequences, but may lead to slightly wrong behaviour when there are stray opening brackets. Signed-off-by: Laslo Hunhold <dev@frign.de> bbbc72cba69445535dd035dfe1ee10d473655629 2022-11-29T22:23:53Z 2022-11-29T22:23:53Z Implement bidirectional bracket support Laslo Hunhold dev@frign.de commit bbbc72cba69445535dd035dfe1ee10d473655629 parent b9e1d4bbd4ce6a539999560c1cc863b645a080cd Author: Laslo Hunhold <dev@frign.de> Date: Tue, 29 Nov 2022 23:23:53 +0100 Implement bidirectional bracket support The single rule N0 in the Unicode Bidirectional Algorithm may not sound like much, but it packs quite a punch and required some deep work. It wasn't exactly made simpler by the fact that the document is very convoluted and not easy to follow. However, it helps to have experience from the other algorithms and the automatic tests allow very broad confirmation of proper function. In particular, the following changes needed to be made: The generator had to be modified to - Implement a decomposition to match canonically equivalent brackets. This requires us to have UnicodeData.txt present, but what matters is that the end result is fast and small. - Automatically detect the type in the LUT-printing, because it's just too fragile otherwise. The implementation of the algorithm itself had the following changes: - The last strong type property of an isolate runner has been refactored to be stateless. Otherwise, you can end up with subtle bugs where strong types are added beforehand, yielding a TOCTOU-problem. 
- The bracket parsing makes use of a novel FIFO structure that combines the best of both worlds between a stack and naive implementation. As an end result, we now pass all ~900k bidi tests from the Unicode standard. Signed-off-by: Laslo Hunhold <dev@frign.de> b9e1d4bbd4ce6a539999560c1cc863b645a080cd 2022-11-24T12:29:31Z 2022-11-24T14:51:06Z Do not falsely read entire buffer instead of simply the filled with Laslo Hunhold dev@frign.de commit b9e1d4bbd4ce6a539999560c1cc863b645a080cd parent 0d043e0a0cd062ea09d8238b33a97049fea9bc8b Author: Laslo Hunhold <dev@frign.de> Date: Thu, 24 Nov 2022 13:29:31 +0100 Do not falsely read entire buffer instead of simply the filled with This was caught via dynamic analysis (clang asan), which I can definitely recommend. Rust evangelists might see this as a prime example for why C is bad, but I still think the benefits outweigh the risks if you consider the maturity of tooling to catch these kinds of errors. In an ideal world we would all be programming in Ada, but C's portability is unmatched. Signed-off-by: Laslo Hunhold <dev@frign.de> 0d043e0a0cd062ea09d8238b33a97049fea9bc8b 2022-11-24T12:29:10Z 2022-11-24T12:29:10Z Apply format Laslo Hunhold dev@frign.de commit 0d043e0a0cd062ea09d8238b33a97049fea9bc8b parent 4e43b1bc0e0e50f883ed25b1e542828529006216 Author: Laslo Hunhold <dev@frign.de> Date: Thu, 24 Nov 2022 13:29:10 +0100 Apply format Signed-off-by: Laslo Hunhold <dev@frign.de> 4e43b1bc0e0e50f883ed25b1e542828529006216 2022-11-21T11:34:22Z 2022-11-21T11:34:22Z Add "check" target to .PHONY Laslo Hunhold dev@frign.de commit 4e43b1bc0e0e50f883ed25b1e542828529006216 parent ea1be565ad117a3e9846ae0e855d41021d94ee8a Author: Laslo Hunhold <dev@frign.de> Date: Mon, 21 Nov 2022 12:34:22 +0100 Add "check" target to .PHONY Thanks to Tom Schwindl for noticing this! 
ea1be565ad117a3e9846ae0e855d41021d94ee8a 2022-11-21T10:05:26Z 2022-11-21T10:06:37Z Refactor state into unsigned integer Laslo Hunhold dev@frign.de commit ea1be565ad117a3e9846ae0e855d41021d94ee8a parent f517655a98a155694cf57c180531724baa081c26 Author: Laslo Hunhold <dev@frign.de> Date: Mon, 21 Nov 2022 11:05:26 +0100 Refactor state into unsigned integer Now that we separated the level-determination itself, there is no need to have a signed integer for this purpose. This simplifies the masking. f517655a98a155694cf57c180531724baa081c26 2022-11-21T08:46:38Z 2022-11-21T08:46:38Z Implement bidirectional rule L1.4 Laslo Hunhold dev@frign.de commit f517655a98a155694cf57c180531724baa081c26 parent 07ba2622e073850bbdd6acd8dff88b391cc5ad5c Author: Laslo Hunhold <dev@frign.de> Date: Mon, 21 Nov 2022 09:46:38 +0100 Implement bidirectional rule L1.4 For this, we first make use of our paragraph level slot in each data point and store it for each. This way, even if the data buffer is arbitrarily split up, we always know what the current paragraph level is. Secondly, we add the rule L1.4 itself, which is very similar to the existing implementation of rules L1.1-L1.3. 07ba2622e073850bbdd6acd8dff88b391cc5ad5c 2022-11-21T07:53:14Z 2022-11-21T07:53:14Z Split bidi-level-processing into preprocessing and line step Laslo Hunhold dev@frign.de commit 07ba2622e073850bbdd6acd8dff88b391cc5ad5c parent aafe6c300e59ed1b4407c71917fb2034fdc7798a Author: Laslo Hunhold <dev@frign.de> Date: Mon, 21 Nov 2022 08:53:14 +0100 Split bidi-level-processing into preprocessing and line step The bidirectional algorithm is a bit convoluted in this regard, but the canonical choice for the implementation is to do preprocessing on all paragraphs first (applying all rules up to L1.3) and applying rule L1.4 separately. The reason for this is that rule L1.4 requires the knowledge about line break positions, which we don't have (yet). 
We could take it as a parameter for the preprocessing-function, however, line breaks may change often (think of an ncurses-context with window resizes), making constant complete reprocessings very wasteful. Thus, the line-specific processing is put into a separate function. This way, the user passes each individual line together with its preprocessing data. Rule L1.4 will be implemented in a later commit. aafe6c300e59ed1b4407c71917fb2034fdc7798a 2022-11-20T22:37:17Z 2022-11-20T22:37:17Z Refactor bidirectional state handling Laslo Hunhold dev@frign.de commit aafe6c300e59ed1b4407c71917fb2034fdc7798a parent fd2d1969084185ff5e638c28066d0d35d510b7f0 Author: Laslo Hunhold <dev@frign.de> Date: Sun, 20 Nov 2022 23:37:17 +0100 Refactor bidirectional state handling The best approach is to have only one place where state is kept and no risk of "stale" state disturbing program execution. Hand-managing state in the isolate-runner was thus problematic, as there was the real risk of sliding into stale state. Even though this is manageable, it makes the code relatively fragile and hard to debug. In another aspect, the serialization was a mess and was in dire need of more structure. The state currently still contains a "raw property", but this will be removed once the API has been properly split between the preprocessing and line-processing steps. The modified array is put within an #if 0-guard. Signed-off-by: Laslo Hunhold <dev@frign.de> fd2d1969084185ff5e638c28066d0d35d510b7f0 2022-11-17T22:47:45Z 2022-11-17T22:47:45Z Refactor prev_prop into prev-struct with a single member prop Laslo Hunhold dev@frign.de commit fd2d1969084185ff5e638c28066d0d35d510b7f0 parent a796095218b0524f957f76d6f3b501ebda700d44 Author: Laslo Hunhold <dev@frign.de> Date: Thu, 17 Nov 2022 23:47:45 +0100 Refactor prev_prop into prev-struct with a single member prop This makes it more consistent across the "cur" and "next" structs. 
Signed-off-by: Laslo Hunhold <dev@frign.de> a796095218b0524f957f76d6f3b501ebda700d44 2022-11-15T20:08:50Z 2022-11-15T20:08:50Z Add a check make-target as an alias for test Laslo Hunhold dev@frign.de commit a796095218b0524f957f76d6f3b501ebda700d44 parent abdc2ba0c764c527aaa2ed9fe42db27d71a10bc2 Author: Laslo Hunhold <dev@frign.de> Date: Tue, 15 Nov 2022 21:08:50 +0100 Add a check make-target as an alias for test It's one extra line but helps a bit as the "community" seems to be a bit split on how to call it (test or check). Signed-off-by: Laslo Hunhold <dev@frign.de> abdc2ba0c764c527aaa2ed9fe42db27d71a10bc2 2022-11-15T14:53:56Z 2022-11-15T14:54:35Z Apply clang-format Laslo Hunhold dev@frign.de commit abdc2ba0c764c527aaa2ed9fe42db27d71a10bc2 parent 50efb9a3396588e6e1266f51ec5446a9fa8013ea Author: Laslo Hunhold <dev@frign.de> Date: Tue, 15 Nov 2022 15:53:56 +0100 Apply clang-format Even though this disrupts the backtrackability of the code a bit, it's better to rip the band aid off now than to push it on into the future. With these changes, formatting is automatically governed and ensured by a simple call to make format Signed-off-by: Laslo Hunhold <dev@frign.de> 50efb9a3396588e6e1266f51ec5446a9fa8013ea 2022-11-15T14:35:01Z 2022-11-15T14:40:26Z Add .clang-format file and make-rule Laslo Hunhold dev@frign.de commit 50efb9a3396588e6e1266f51ec5446a9fa8013ea parent 3a735213d6da553d9235c5cad2732048242ada97 Author: Laslo Hunhold <dev@frign.de> Date: Tue, 15 Nov 2022 15:35:01 +0100 Add .clang-format file and make-rule This is inspired by OpenBSD KNF and the common suckless style approach (with possible deviations). Tabs are used for indentation, spaces for alignment. 
Signed-off-by: Laslo Hunhold <dev@frign.de> 3a735213d6da553d9235c5cad2732048242ada97 2022-11-15T14:32:15Z 2022-11-15T14:32:15Z Also mark argv as unused in test/bidirectional.c Laslo Hunhold dev@frign.de commit 3a735213d6da553d9235c5cad2732048242ada97 parent 64c136162a2830374522b993df86d8a0a852422a Author: Laslo Hunhold <dev@frign.de> Date: Tue, 15 Nov 2022 15:32:15 +0100 Also mark argv as unused in test/bidirectional.c Signed-off-by: Laslo Hunhold <dev@frign.de> 64c136162a2830374522b993df86d8a0a852422a 2022-11-13T08:41:03Z 2022-11-13T08:41:03Z Remove redundant initialization Laslo Hunhold dev@frign.de commit 64c136162a2830374522b993df86d8a0a852422a parent be3430ca6b7d275d3691f126ad65e84d732ebbb1 Author: Laslo Hunhold <dev@frign.de> Date: Sun, 13 Nov 2022 09:41:03 +0100 Remove redundant initialization Signed-off-by: Laslo Hunhold <dev@frign.de> be3430ca6b7d275d3691f126ad65e84d732ebbb1 2022-11-13T08:15:33Z 2022-11-13T08:15:33Z Only copy current reorder into test if it is not NULL in bidi-testgen Laslo Hunhold dev@frign.de commit be3430ca6b7d275d3691f126ad65e84d732ebbb1 parent 558b9cc3bc6961d26104cf726fe148f58ba36940 Author: Laslo Hunhold <dev@frign.de> Date: Sun, 13 Nov 2022 09:15:33 +0100 Only copy current reorder into test if it is not NULL in bidi-testgen Signed-off-by: Laslo Hunhold <dev@frign.de> 558b9cc3bc6961d26104cf726fe148f58ba36940 2022-11-13T08:12:25Z 2022-11-13T08:12:25Z Prevent two theoretical null-pointer-dereferences in gen/util.c Laslo Hunhold dev@frign.de commit 558b9cc3bc6961d26104cf726fe148f58ba36940 parent 5a3f01e8a1b9a7847dad17260dd859d5c92bb6bd Author: Laslo Hunhold <dev@frign.de> Date: Sun, 13 Nov 2022 09:12:25 +0100 Prevent two theoretical null-pointer-dereferences in gen/util.c This was found using static analysis and is not a security issue given this is in the generating code, so no runtime-affection. 
The worst that could've happened beforehand is that the generating code segfaults and produces garbage tables which would lead to compilation failure. Signed-off-by: Laslo Hunhold <dev@frign.de> 5a3f01e8a1b9a7847dad17260dd859d5c92bb6bd 2022-11-02T19:18:27Z 2022-11-02T21:31:17Z Add configure-script to dist-archive Laslo Hunhold dev@frign.de commit 5a3f01e8a1b9a7847dad17260dd859d5c92bb6bd parent 2165664f6e2fa381eea54b9f887f152df2d9f817 Author: Laslo Hunhold <dev@frign.de> Date: Wed, 2 Nov 2022 20:18:27 +0100 Add configure-script to dist-archive Signed-off-by: Laslo Hunhold <dev@frign.de> 2165664f6e2fa381eea54b9f887f152df2d9f817 2022-10-29T23:29:19Z 2022-10-29T23:30:14Z Keep direct pointer at bracket-struct in bidi-state Laslo Hunhold dev@frign.de commit 2165664f6e2fa381eea54b9f887f152df2d9f817 parent df25b40e3ba37e63bf914c199de448c01b3d1b6e Author: Laslo Hunhold <dev@frign.de> Date: Sun, 30 Oct 2022 01:29:19 +0200 Keep direct pointer at bracket-struct in bidi-state This makes the information easier to access instead of having to turn the offset in the bracket-array to a pointer in every case we use it. 
Signed-off-by: Laslo Hunhold <dev@frign.de> df25b40e3ba37e63bf914c199de448c01b3d1b6e 2022-10-28T23:29:53Z 2022-10-28T23:30:15Z Update configure to make it idempotent again and add MidnightBSD Laslo Hunhold dev@frign.de commit df25b40e3ba37e63bf914c199de448c01b3d1b6e parent 6769c08f08ab6bb86301f941028641e6314b8e9e Author: Laslo Hunhold <dev@frign.de> Date: Sat, 29 Oct 2022 01:29:53 +0200 Update configure to make it idempotent again and add MidnightBSD Signed-off-by: Laslo Hunhold <dev@frign.de> 6769c08f08ab6bb86301f941028641e6314b8e9e 2022-10-28T23:11:48Z 2022-10-28T23:11:48Z Add bracket-pair-parsing and refactor bidi-state-management Laslo Hunhold dev@frign.de commit 6769c08f08ab6bb86301f941028641e6314b8e9e parent c031ada2cb11489c032f6ddd84fa7091efe6c784 Author: Laslo Hunhold <dev@frign.de> Date: Sat, 29 Oct 2022 01:11:48 +0200 Add bracket-pair-parsing and refactor bidi-state-management As announced, to fully implement the bidirectional algorithm, it is still necessary to implement rule N0 which requires access to the bracket properties of each character. Inspired by how we solved it in gen/case.h we go with a bitwise-approach. Regarding the state-management, it's a difficult balance between type-safety, readability and correctness, but I went with the approach that offered the least redundancy and relatively good readability. 
Signed-off-by: Laslo Hunhold <dev@frign.de> c031ada2cb11489c032f6ddd84fa7091efe6c784 2022-10-28T15:09:44Z 2022-10-28T15:09:44Z Add UINT32_C()-macro around constant Laslo Hunhold dev@frign.de commit c031ada2cb11489c032f6ddd84fa7091efe6c784 parent 6375ae6d522413ba1a6e3b2a62c6e5e99349aafa Author: Laslo Hunhold <dev@frign.de> Date: Fri, 28 Oct 2022 17:09:44 +0200 Add UINT32_C()-macro around constant Signed-off-by: Laslo Hunhold <dev@frign.de> 6375ae6d522413ba1a6e3b2a62c6e5e99349aafa 2022-10-28T15:08:41Z 2022-10-28T15:09:21Z Refactor post_process()-function to take the entire property-array Laslo Hunhold dev@frign.de commit 6375ae6d522413ba1a6e3b2a62c6e5e99349aafa parent cd3a639d18c25942d0d48c8001f18222ba5899ef Author: Laslo Hunhold <dev@frign.de> Date: Fri, 28 Oct 2022 17:08:41 +0200 Refactor post_process()-function to take the entire property-array This does not make much of a difference, but gives enough flexibility for a later change to incorporate the bidi-bracket-property into the bidi-LUT. Signed-off-by: Laslo Hunhold <dev@frign.de> cd3a639d18c25942d0d48c8001f18222ba5899ef 2022-10-25T15:16:21Z 2022-10-25T15:16:21Z Move comments on macro-definition-lines to separate lines Laslo Hunhold dev@frign.de commit cd3a639d18c25942d0d48c8001f18222ba5899ef parent 4027860f6a5384fe60181d79337862bf53116bec Author: Laslo Hunhold <dev@frign.de> Date: Tue, 25 Oct 2022 17:16:21 +0200 Move comments on macro-definition-lines to separate lines The standard says: "Macro definitions are in the form: string1 = [string2] The macro named string1 is defined as having the value of string2, where string2 is defined as all characters, if any, after the <equals-sign>, up to a comment character ( '#' ) or an unescaped <newline>. Any <blank> characters immediately before or after the <equals-sign> shall be ignored." Thus, a declaration like "MACRO = helloworld # comment" leaves MACRO with the value "helloworld " (including the trailing space), which is obviously undesired for path-declarations. 
This is fixed now. Thanks to Ionen Wolkens for reporting this issue! Signed-off-by: Laslo Hunhold <dev@frign.de> 4027860f6a5384fe60181d79337862bf53116bec 2022-10-25T13:35:30Z 2022-10-25T13:39:12Z Install a simple pkg-config-file if desired Laslo Hunhold dev@frign.de commit 4027860f6a5384fe60181d79337862bf53116bec parent 5998352d2d2e6e37531548f8e986abae5ff8ef02 Author: Laslo Hunhold <dev@frign.de> Date: Tue, 25 Oct 2022 15:35:30 +0200 Install a simple pkg-config-file if desired This was requested by a few packagers and it doesn't hurt to add a bit of metadata. Signed-off-by: Laslo Hunhold <dev@frign.de> 5998352d2d2e6e37531548f8e986abae5ff8ef02 2022-10-25T11:20:47Z 2022-10-25T11:49:50Z Implement the Unicode Bidirectional Algorithm (UAX #9) Laslo Hunhold dev@frign.de commit 5998352d2d2e6e37531548f8e986abae5ff8ef02 parent dd15fea026c3e0b389381ae8cc08e0f39fa1a8f7 Author: Laslo Hunhold <dev@frign.de> Date: Tue, 25 Oct 2022 13:20:47 +0200 Implement the Unicode Bidirectional Algorithm (UAX #9) To be frank, I never heard about this until I started learning more about Unicode, but this is an absolute must for all languages that go from right to left (Hebrew, Arabic, Farsi, etc.) and any case where you mix RTL and LTR languages. The Unicode Bidirectional Algorithm is the normative procedure you apply on a string to obtain embedding levels that can then be used to reorder the string such that you obtain the proper reading direction. The central aspect is that strings are always stored LTR in memory and only reordered for presentation on the screen. Currently, only ICU and GNU fribidi implement the algorithm, and as usual it's pretty convoluted to use them. There are many memory allocations, kitchen-sink-madness and legacy cruft, but the demand is there (there's even a bidi-patch for dwm[0]). What's special about this implementation? There are no memory allocations at runtime. The user provides a 32-bit-integer-array which is then filled with the embedding levels. 
The levels themselves only range from -1 to 125 (by the standard!) and would fit in a signed 8-bit-integer, but the algorithm naturally needs a scratchpad to store processing data. A complication of the algorithm is that you, at some point, have to break the paragraph into lines and based on the line breaks the level determination is affected. GNU fribidi and ICU make this very complicated and hard to understand. The API is not final as you see it here, but the final process will be (each number corresponding to a function): 1) "preprocessing" the string up to the part where the algorithm does not depend on the line breaks 2) determining line embedding levels for a line (by specifying the preprocessed data buffer and an output level-buffer) 3) reordering a line (by specifying the preprocessed data buffer and an output string that is allowed to be the input string) Conformance is obviously a large priority: There are literally over a million automatic conformance tests for the bidirectional algorithm split across the files BidiTest.txt and BidiCharacterTest.txt that are automatically parsed into the header gen/bidirectional-test.h. Currently, only BidiTest.txt is used for tests (which we all pass), given bracket-pairs have not been implemented yet. This and (maybe) arabic shaping are what is left to be implemented, but this here is already a big step. One more note: Yes, the data files are very large, but they compress down very well and the tarball stays below 800K. It's very important to me that there's no need to pull any data from the web for compilation or testing for obvious reasons. 
[0]:https://dwm.suckless.org/patches/bidi/ Signed-off-by: Laslo Hunhold <dev@frign.de> dd15fea026c3e0b389381ae8cc08e0f39fa1a8f7 2022-10-13T22:40:37Z 2022-10-13T22:41:37Z Refactor src/bidirectional.c with Herodotus Laslo Hunhold dev@frign.de commit dd15fea026c3e0b389381ae8cc08e0f39fa1a8f7 parent efb2f452b6d1327ba091ac8a69556a060401afed Author: Laslo Hunhold <dev@frign.de> Date: Fri, 14 Oct 2022 00:40:37 +0200 Refactor src/bidirectional.c with Herodotus This simplifies a lot of the code and makes it more consistent as it now uses patterns that are similar to those in src/case.c. The most significant effect is of course the guarantees that come with using this interface. Signed-off-by: Laslo Hunhold <dev@frign.de> efb2f452b6d1327ba091ac8a69556a060401afed 2022-10-13T21:54:28Z 2022-10-13T21:54:28Z Merge branch 'master' into bidirectional Laslo Hunhold dev@frign.de commit efb2f452b6d1327ba091ac8a69556a060401afed parent f2783665bc71b9b1f1b72830629c3724bd8e1ae4 Author: Laslo Hunhold <dev@frign.de> Date: Thu, 13 Oct 2022 23:54:28 +0200 Merge branch 'master' into bidirectional This brings this branch up to speed with the previous work. 
Signed-off-by: Laslo Hunhold <dev@frign.de> a591d58a3fb3abf40956c3017118da7f33a84bea 2022-10-11T21:21:54Z 2022-10-11T21:21:54Z Update README to reflect the ./configure-script Laslo Hunhold dev@frign.de commit a591d58a3fb3abf40956c3017118da7f33a84bea parent 30766915c37d88fc423a4d750227a769e7a307ae Author: Laslo Hunhold <dev@frign.de> Date: Tue, 11 Oct 2022 23:21:54 +0200 Update README to reflect the ./configure-script Signed-off-by: Laslo Hunhold <dev@frign.de> 30766915c37d88fc423a4d750227a769e7a307ae 2022-10-11T20:21:47Z 2022-10-11T20:21:47Z Add ./configure-script with presets for common systems Laslo Hunhold dev@frign.de commit 30766915c37d88fc423a4d750227a769e7a307ae parent 858c34a1e19bd790510bb918c583cea73487e64e Author: Laslo Hunhold <dev@frign.de> Date: Tue, 11 Oct 2022 22:21:47 +0200 Add ./configure-script with presets for common systems After quite a few requests and a bit of reflection on my part I've decided to add a very simple ./configure-script that automatically modifies config.mk to make it fit for common systems. Even though it's reasonable to simply have out-commentable options in the config.mk, it is admittedly more convenient to have such a script available, especially to accommodate more systems along the way. uname(1) is POSIX-compliant and this ./configure-script is in no way comparable to the horrible autoconf-insanity and won't take an eternity to run. It's also completely optional and merely a quality-of-life-addition for those working with libgrapheme manually. Signed-off-by: Laslo Hunhold <dev@frign.de> 858c34a1e19bd790510bb918c583cea73487e64e 2022-10-09T10:13:42Z 2022-10-09T10:14:58Z Bump to version 2.0.1 Laslo Hunhold dev@frign.de commit 858c34a1e19bd790510bb918c583cea73487e64e parent 657e9379807b215593e8c0706a51872b7870e8fe Author: Laslo Hunhold <dev@frign.de> Date: Sun, 9 Oct 2022 12:13:42 +0200 Bump to version 2.0.1 Hardened the code using static analysis and improved the build system to work perfectly on OpenBSD and macOS.
Signed-off-by: Laslo Hunhold <dev@frign.de> 657e9379807b215593e8c0706a51872b7870e8fe 2022-10-08T11:17:47Z 2022-10-08T11:17:47Z Explicitly pop the reader-limit in to_titlecase() Laslo Hunhold dev@frign.de commit 657e9379807b215593e8c0706a51872b7870e8fe parent a1913f83b643e883aa6754d8078aee7d46f53aec Author: Laslo Hunhold <dev@frign.de> Date: Sat, 8 Oct 2022 13:17:47 +0200 Explicitly pop the reader-limit in to_titlecase() This ensures that we don't have any stray limits on the stack and always have a clean state. Signed-off-by: Laslo Hunhold <dev@frign.de> a1913f83b643e883aa6754d8078aee7d46f53aec 2022-10-08T11:17:16Z 2022-10-08T11:17:30Z Avoid memory leak in break_test_list_free() Laslo Hunhold dev@frign.de commit a1913f83b643e883aa6754d8078aee7d46f53aec parent decd5b53f1f1303d1f351e85238cad4987b8b6f0 Author: Laslo Hunhold <dev@frign.de> Date: Sat, 8 Oct 2022 13:17:16 +0200 Avoid memory leak in break_test_list_free() Signed-off-by: Laslo Hunhold <dev@frign.de> decd5b53f1f1303d1f351e85238cad4987b8b6f0 2022-10-08T11:16:51Z 2022-10-08T11:16:51Z Avoid memory leak in character-benchmark Laslo Hunhold dev@frign.de commit decd5b53f1f1303d1f351e85238cad4987b8b6f0 parent 4182a14424c1e27b943187e230948ee31d6d66ba Author: Laslo Hunhold <dev@frign.de> Date: Sat, 8 Oct 2022 13:16:51 +0200 Avoid memory leak in character-benchmark Signed-off-by: Laslo Hunhold <dev@frign.de> 4182a14424c1e27b943187e230948ee31d6d66ba 2022-10-08T11:14:48Z 2022-10-08T11:14:48Z Avoid undefined behaviour and memory leaks in case-data-generator Laslo Hunhold dev@frign.de commit 4182a14424c1e27b943187e230948ee31d6d66ba parent 004bdcf210baf1a63772bb7eca452bb0aeba010b Author: Laslo Hunhold <dev@frign.de> Date: Sat, 8 Oct 2022 13:14:48 +0200 Avoid undefined behaviour and memory leaks in case-data-generator This was found using the clang-sanitizers and was pretty tough to spot. 
The first part does not influence program-operation as is, but checking first if tmp2 is NULL avoids undefined behaviour of adding a non-zero offset to NULL. Signed-off-by: Laslo Hunhold <dev@frign.de> 004bdcf210baf1a63772bb7eca452bb0aeba010b 2022-10-08T11:13:03Z 2022-10-08T11:13:03Z Prevent undefined behaviour in herodotus_reader_copy() Laslo Hunhold dev@frign.de commit 004bdcf210baf1a63772bb7eca452bb0aeba010b parent ef3e52a7f560f66df8ed1e2487872a1e62c5cedb Author: Laslo Hunhold <dev@frign.de> Date: Sat, 8 Oct 2022 13:13:03 +0200 Prevent undefined behaviour in herodotus_reader_copy() The first part usually catches harmless cases like "NULL + 0", but the last part prevents integer overflow in some really crazy cases that are unlikely but can still happen. Signed-off-by: Laslo Hunhold <dev@frign.de> ef3e52a7f560f66df8ed1e2487872a1e62c5cedb 2022-10-08T09:22:18Z 2022-10-08T09:22:18Z Call ldconfig in a subshell Laslo Hunhold dev@frign.de commit ef3e52a7f560f66df8ed1e2487872a1e62c5cedb parent 28064303528f2604c5bf932b1478eb9f7c7ffc04 Author: Laslo Hunhold <dev@frign.de> Date: Sat, 8 Oct 2022 11:22:18 +0200 Call ldconfig in a subshell This prevents a syntax error when LDCONFIG is unset. Signed-off-by: Laslo Hunhold <dev@frign.de> 28064303528f2604c5bf932b1478eb9f7c7ffc04 2022-10-08T09:11:49Z 2022-10-08T09:11:49Z Check if LDCONFIG is set before calling it Laslo Hunhold dev@frign.de commit 28064303528f2604c5bf932b1478eb9f7c7ffc04 parent a6b3a194f0381c5aef9346d39b02eb058111d2a2 Author: Laslo Hunhold <dev@frign.de> Date: Sat, 8 Oct 2022 11:11:49 +0200 Check if LDCONFIG is set before calling it Otherwise this prints a warning in some make-implementations. 
Signed-off-by: Laslo Hunhold <dev@frign.de> a6b3a194f0381c5aef9346d39b02eb058111d2a2 2022-10-08T08:40:03Z 2022-10-08T08:46:31Z Enhance build-system to perfectly support OpenBSD and macOS Laslo Hunhold dev@frign.de commit a6b3a194f0381c5aef9346d39b02eb058111d2a2 parent d42f53b5baafe01caa48477e204b63e065660117 Author: Laslo Hunhold <dev@frign.de> Date: Sat, 8 Oct 2022 10:40:03 +0200 Enhance build-system to perfectly support OpenBSD and macOS Studying the source material on OpenBSD[0], it is written that Quite a few ports need tweaks to build shared libraries correctly anyways. Remember that building shared libraries should be done with $ cc -shared -fpic|-fPIC -o libfoo.so.4.5 obj1 obj2 Trying to rename the library after the fact to adjust the version number does not work: ELF libraries use some extra magic to set the library internal name, so you must link it with the correct version the first time. Thus, it is necessary to directly compile into $(SONAME), which is what this commit changes. The magic flags for macOS were taken from [1]. They set up the linker such that it automatically respects semantic versioning and will load any library with a smaller compatible version (e.g. same minor-version). Additionally, both OpenBSD and macOS have smarter linkers than Linux and don't need symlinks from varying versions to work right. Thus a flag SOSYMLINK was added to enable toggling this from the config.mk. For convenience, the best-practices for each platform are added to the config.mk in a commented-out form, saving everybody some time.
[0]:https://www.openbsd.org/faq/ports/specialtopics.html#SharedLibs [1]:https://begriffs.com/posts/2021-07-04-shared-libraries.html#linking Signed-off-by: Laslo Hunhold <dev@frign.de> d42f53b5baafe01caa48477e204b63e065660117 2022-10-08T07:38:08Z 2022-10-08T07:38:08Z Move version statements back into the Makefile Laslo Hunhold dev@frign.de commit d42f53b5baafe01caa48477e204b63e065660117 parent ad4877023146953d4daa8d91c119124c38620337 Author: Laslo Hunhold <dev@frign.de> Date: Sat, 8 Oct 2022 09:38:08 +0200 Move version statements back into the Makefile Ensure rebuilding on such changes by adding an explicit dependency on the Makefile in each prerequisite list that also contains config.mk. Signed-off-by: Laslo Hunhold <dev@frign.de> ad4877023146953d4daa8d91c119124c38620337 2022-10-07T15:33:10Z 2022-10-07T16:00:11Z Check for empty destination before NUL-terminating Christopher Wellons wellons@nullprogram.com commit ad4877023146953d4daa8d91c119124c38620337 parent 4b4292a8f78eec4271213982fdddaf1c479dfe96 Author: Christopher Wellons <wellons@nullprogram.com> Date: Fri, 7 Oct 2022 11:33:10 -0400 Check for empty destination before NUL-terminating This overflow was triggered in the second test of to_lowercase_utf8 where the destination is zero length (w->destlen == 0). `w->destlen` would overflow by subtraction, then the subscript would overflow the destination. Signed-off-by: Laslo Hunhold <dev@frign.de> 4b4292a8f78eec4271213982fdddaf1c479dfe96 2022-10-07T10:40:51Z 2022-10-07T10:40:51Z Remove superfluous printf-parameter from the example Laslo Hunhold dev@frign.de commit 4b4292a8f78eec4271213982fdddaf1c479dfe96 parent ef608a20a5431e68922e787cfdd68d893497d16f Author: Laslo Hunhold <dev@frign.de> Date: Fri, 7 Oct 2022 12:40:51 +0200 Remove superfluous printf-parameter from the example This fortunately has no functional effect, it's just redundant. Thanks to Kartik Agaram for reporting this! 
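The empty-destination check from the NUL-termination fix above can be illustrated with a minimal sketch. It assumes a hypothetical writer structure: only the name destlen appears in the commit message, while sketch_writer, dest, pos and sketch_writer_terminate are made up for illustration and are not taken from the library's source.

```c
#include <stddef.h>

/* Hypothetical output-buffer state; not the library's actual struct. */
struct sketch_writer {
	char *dest;     /* destination buffer, may be NULL */
	size_t destlen; /* capacity of dest in bytes, may be 0 */
	size_t pos;     /* number of bytes the caller wanted to write */
};

/*
 * NUL-terminate the destination, but never touch memory when there
 * is no room at all. Without the destlen == 0 check, destlen - 1
 * would wrap around to SIZE_MAX and the subscript would land far
 * outside the buffer.
 */
static void
sketch_writer_terminate(struct sketch_writer *w)
{
	if (w->dest == NULL || w->destlen == 0) {
		return;
	}

	if (w->pos < w->destlen) {
		w->dest[w->pos] = '\0';
	} else {
		/* output was truncated; terminate in the last byte */
		w->dest[w->destlen - 1] = '\0';
	}
}
```

The point of the sketch is the order of the checks: the subtraction is only evaluated once destlen is known to be non-zero.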
Signed-off-by: Laslo Hunhold <dev@frign.de> ef608a20a5431e68922e787cfdd68d893497d16f 2022-10-06T21:01:24Z 2022-10-06T21:04:50Z Bump to version 2.0.0 Laslo Hunhold dev@frign.de commit ef608a20a5431e68922e787cfdd68d893497d16f parent 1774b5430fe46d8d5511075d3cd644716ad4c3c8 Author: Laslo Hunhold <dev@frign.de> Date: Thu, 6 Oct 2022 23:01:24 +0200 Bump to version 2.0.0 Signed-off-by: Laslo Hunhold <dev@frign.de> 1774b5430fe46d8d5511075d3cd644716ad4c3c8 2022-10-06T20:57:31Z 2022-10-06T20:57:31Z Update README Laslo Hunhold dev@frign.de commit 1774b5430fe46d8d5511075d3cd644716ad4c3c8 parent 5939cf21cdb050e1c9bce964a30c9ad94f7440b9 Author: Laslo Hunhold <dev@frign.de> Date: Thu, 6 Oct 2022 22:57:31 +0200 Update README Signed-off-by: Laslo Hunhold <dev@frign.de> 5939cf21cdb050e1c9bce964a30c9ad94f7440b9 2022-10-05T22:12:50Z 2022-10-05T22:12:50Z Add is_case.sh to MAN_TEMPLATE Laslo Hunhold dev@frign.de commit 5939cf21cdb050e1c9bce964a30c9ad94f7440b9 parent f6ab5a6edf5eae9470f7eb6ee3062fd9a7865ead Author: Laslo Hunhold <dev@frign.de> Date: Thu, 6 Oct 2022 00:12:50 +0200 Add is_case.sh to MAN_TEMPLATE Signed-off-by: Laslo Hunhold <dev@frign.de> f6ab5a6edf5eae9470f7eb6ee3062fd9a7865ead 2022-10-05T22:02:29Z 2022-10-05T22:02:29Z Fix up smaller notational and type aspects for constants Laslo Hunhold dev@frign.de commit f6ab5a6edf5eae9470f7eb6ee3062fd9a7865ead parent 3ebd28c3e3ce50fd3370c587a0ec66e6c9489c83 Author: Laslo Hunhold <dev@frign.de> Date: Thu, 6 Oct 2022 00:02:29 +0200 Fix up smaller notational and type aspects for constants Signed-off-by: Laslo Hunhold <dev@frign.de> 3ebd28c3e3ce50fd3370c587a0ec66e6c9489c83 2022-10-05T21:48:51Z 2022-10-05T21:48:51Z Explicitly list util.o for benchmark/ and test/ as well Laslo Hunhold dev@frign.de commit 3ebd28c3e3ce50fd3370c587a0ec66e6c9489c83 parent 6a70e181676e97dfe8a4b9b369ef15d286caf772 Author: Laslo Hunhold <dev@frign.de> Date: Wed, 5 Oct 2022 23:48:51 +0200 Explicitly list util.o for benchmark/ and test/ as well Signed-off-by: 
Laslo Hunhold <dev@frign.de> 6a70e181676e97dfe8a4b9b369ef15d286caf772 2022-10-05T20:57:33Z 2022-10-05T20:57:33Z Explicitly clear suffix list and fix a small oversight Laslo Hunhold dev@frign.de commit 6a70e181676e97dfe8a4b9b369ef15d286caf772 parent ed7ebdc7f7fa748f89372e034d6d983835db5d42 Author: Laslo Hunhold <dev@frign.de> Date: Wed, 5 Oct 2022 22:57:33 +0200 Explicitly clear suffix list and fix a small oversight The suffix list contains some rules defined by the standard. This masked that gen/util.o was never covered by the rules c->o, which has been fixed. Signed-off-by: Laslo Hunhold <dev@frign.de> ed7ebdc7f7fa748f89372e034d6d983835db5d42 2022-10-05T20:14:17Z 2022-10-05T20:44:52Z Switch to semantic versioning and improve dynamic library handling Laslo Hunhold dev@frign.de commit ed7ebdc7f7fa748f89372e034d6d983835db5d42 parent b583c3ab6855d491154f7be6d3bdb5c44380290c Author: Laslo Hunhold <dev@frign.de> Date: Wed, 5 Oct 2022 22:14:17 +0200 Switch to semantic versioning and improve dynamic library handling After long consideration, I've made the decision to switch this project over to semantic versioning[0]. While it made sense for farbfeld in some way to use incremental versioning, for libraries it is almost canonical to make use of semantic versioning instead. Given there have been breaking API-changes since version 1 (which now corresponds to 1.0.0), the major version will naturally be bumped. Afterwards though, additions to the API will only trigger a minor bump, as is convention, while also making it possible to release patch-releases when there have been errors. Because, to be frank, if you only have full integers, you kind of get anxiety that a release is in fact correct, given you don't want to waste another whole integer-step on a simple bugfix. For farbfeld, which is very small and self-contained, it was okay, but libgrapheme has become complex enough to warrant this. 
Regarding dynamic library handling: I really read a lot about it and referred to some interesting articles like [1] to figure out what the best approach is to reflect versioning in the dynamic library. Doing this portably is quite difficult and the common approach to simply use the major version has some serious drawbacks, given a binary linked against the version 2.4 can falsely be linked against versions 2.3.x, 2.2.x, 2.1.x or 2.0.x at runtime, even though they lack functions added in 2.4 that might be used in the binary, something explicitly allowed in semantic versioning. A portable trick described in [1] is to set SONAME to contain MAJOR.MINOR and explicitly create symlinks from all "lower" MAJOR-MINOR- combinations with the same MAJOR-version to ensure forward-compatibility for all binaries linked against a certain MAJOR.MINOR-combination. This way, a library linked against libgrapheme-2.4 is properly linkable against libgrapheme-2.5 at runtime (given semantic versioning ensures forward compatibility), but at the same time, it will not allow linking against libgrapheme-2.2 (if that is installed), given it has no explicit symlink set from libgrapheme-2.2 at libgrapheme.2.5. 
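The compatibility rule that this symlink scheme encodes can be written down as a small predicate. This is a hypothetical illustration only: the actual mechanism is the set of symlinks created at install time, and sketch_runtime_compatible is not a function of the library.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: may a binary linked against
 * linked_major.linked_minor load installed_major.installed_minor?
 * Under semantic versioning, the major version must match and the
 * installed minor version must not be older than the linked one.
 */
static bool
sketch_runtime_compatible(unsigned linked_major, unsigned linked_minor,
                          unsigned installed_major, unsigned installed_minor)
{
	return installed_major == linked_major &&
	       installed_minor >= linked_minor;
}
```

With this rule a binary linked against 2.4 accepts an installed 2.5, but rejects 2.2 (missing functions) and 3.0 (breaking changes).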
[0]:https://semver.org/ [1]:https://begriffs.com/posts/2021-07-04-shared-libraries.html Signed-off-by: Laslo Hunhold <dev@frign.de> b583c3ab6855d491154f7be6d3bdb5c44380290c 2022-10-05T18:54:24Z 2022-10-05T18:54:24Z Fix sorting in grapheme.h Laslo Hunhold dev@frign.de commit b583c3ab6855d491154f7be6d3bdb5c44380290c parent 0aa5d262f8d0975341bcc60916e12044c7d64d0d Author: Laslo Hunhold <dev@frign.de> Date: Wed, 5 Oct 2022 20:54:24 +0200 Fix sorting in grapheme.h Signed-off-by: Laslo Hunhold <dev@frign.de> 0aa5d262f8d0975341bcc60916e12044c7d64d0d 2022-10-04T06:11:00Z 2022-10-04T06:11:40Z Use explicit constant-macro instead of cast Laslo Hunhold dev@frign.de commit 0aa5d262f8d0975341bcc60916e12044c7d64d0d parent 608a5c3c12c036871e74c9da12fe1fffb400e3f1 Author: Laslo Hunhold <dev@frign.de> Date: Tue, 4 Oct 2022 08:11:00 +0200 Use explicit constant-macro instead of cast Thanks NRK for the suggestion! Signed-off-by: Laslo Hunhold <dev@frign.de> 608a5c3c12c036871e74c9da12fe1fffb400e3f1 2022-10-03T22:56:52Z 2022-10-03T22:56:52Z Remove hyphen from "bare metal" Laslo Hunhold dev@frign.de commit 608a5c3c12c036871e74c9da12fe1fffb400e3f1 parent 4a5b4abeec1b91986ec0258289abf79b6122531c Author: Laslo Hunhold <dev@frign.de> Date: Tue, 4 Oct 2022 00:56:52 +0200 Remove hyphen from "bare metal" Signed-off-by: Laslo Hunhold <dev@frign.de> 4a5b4abeec1b91986ec0258289abf79b6122531c 2022-10-03T22:17:04Z 2022-10-03T22:17:04Z Rework libgrapheme(7) a bit Laslo Hunhold dev@frign.de commit 4a5b4abeec1b91986ec0258289abf79b6122531c parent fc73d06fed76dd7cde37d3704949d01391ea0032 Author: Laslo Hunhold <dev@frign.de> Date: Tue, 4 Oct 2022 00:17:04 +0200 Rework libgrapheme(7) a bit Add the information about the library being freestanding and fix wordings a bit. Reflect in the first paragraph what the library can do. 
Signed-off-by: Laslo Hunhold <dev@frign.de> fc73d06fed76dd7cde37d3704949d01391ea0032 2022-10-03T21:13:26Z 2022-10-03T21:13:26Z Convert GRAPHEME_STATE to uint_least16_t and remove it Laslo Hunhold dev@frign.de commit fc73d06fed76dd7cde37d3704949d01391ea0032 parent a815be4b5de7f7df2da664049fdb04874d37016a Author: Laslo Hunhold <dev@frign.de> Date: Mon, 3 Oct 2022 23:13:26 +0200 Convert GRAPHEME_STATE to uint_least16_t and remove it I was never quite happy with the fact that the internal state-struct was visible in the grapheme.h-header, given the declarations of the fields only served internal purposes and were useless noise to the reader. To keep it in was merely a choice made because I had always hoped to be able to implement maybe a few more state-based pairwise segmentation check functions and use the GRAPHEME_STATE type in more places, but now after implementing the algorithms it becomes clear that they all do not satisfy these pairwise semantics. The first logical step was to convert the struct to a uint_least16_t, which provides enough space (at least 16 bits) to store the complete state, and have internal deserialization and serialization functions. The remaining question was if the typedef uint_least16_t GRAPHEME_STATE should be removed. I took inspiration from the Linux kernel coding style[0], which in section 5b lays out the exact case of typedeffing an integer that is meant to store flags (just like in our case). It is argued there that there needs to be a good reason to typedef an integer (e.g. given it might change by architecture or maybe change in later versions). Both cases are not given here (we will _never_ need more than 16 bits to store the grapheme cluster break state and you can even reduce more wastage, e.g. for storing the prop which never exceeds 4 bits given NUM_CHAR_BREAK_PROPS == 14 < 15 == 2^4-1), and I must admit that it improves readability a bit given you finally know what you're dealing with.
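As a rough sketch of what packing such a state into a uint_least16_t can look like: the struct layout and field names below are assumptions made for illustration (only the 4-bit bound on the property comes from the commit message), and the library's real serialization functions may differ.

```c
#include <stdint.h>

/* Hypothetical unpacked state; the library's real fields may differ. */
struct sketch_state {
	uint_least8_t prop;  /* last break property, fits in 4 bits */
	uint_least8_t flags; /* assumed rule flags, limited to 4 bits here */
};

/* Pack the state into the low 8 bits of a uint_least16_t. */
static uint_least16_t
sketch_state_serialize(const struct sketch_state *s)
{
	return (uint_least16_t)((s->prop & 0x0F) |
	                        ((s->flags & 0x0F) << 4));
}

/* Unpack the state again; inverse of sketch_state_serialize(). */
static void
sketch_state_deserialize(uint_least16_t v, struct sketch_state *s)
{
	s->prop = (uint_least8_t)(v & 0x0F);
	s->flags = (uint_least8_t)((v >> 4) & 0x0F);
}
```

In this layout a value of 0 deserializes to an all-zero state, which is what makes initializing the integer with 0 a valid starting point.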
The expression GRAPHEME_STATE state = 0; admittedly looks a little fishy, given you don't really know what happens behind the scenes unless you look in the header, and I want all of the semantics to be crystal clear to the end-user. [0]:https://www.kernel.org/doc/html/latest/process/coding-style.html#typedefs Signed-off-by: Laslo Hunhold <dev@frign.de> a815be4b5de7f7df2da664049fdb04874d37016a 2022-10-03T19:18:52Z 2022-10-03T19:18:52Z Add unit tests for all segmentation functions Laslo Hunhold dev@frign.de commit a815be4b5de7f7df2da664049fdb04874d37016a parent 5ea8d87a9a0fb9c6dda827cc55d43c637cd4086d Author: Laslo Hunhold <dev@frign.de> Date: Mon, 3 Oct 2022 21:18:52 +0200 Add unit tests for all segmentation functions Now all functions in the library are covered by exhaustive unit tests which supplement the already present conformance tests to make sure that the thin layer between API and implementation is also working as expected. At this point I would assess that libgrapheme is a stable foundation for using it in the real world and now preparation can go underway to prepare the release of version 2. 
Signed-off-by: Laslo Hunhold <dev@frign.de> 5ea8d87a9a0fb9c6dda827cc55d43c637cd4086d 2022-10-03T19:16:38Z 2022-10-03T19:16:38Z Set case-test-structs as const and use uppercase-hex-notation Laslo Hunhold dev@frign.de commit 5ea8d87a9a0fb9c6dda827cc55d43c637cd4086d parent 28815433e3595cba51a40c4a5e291da3a8746d78 Author: Laslo Hunhold <dev@frign.de> Date: Mon, 3 Oct 2022 21:16:38 +0200 Set case-test-structs as const and use uppercase-hex-notation Signed-off-by: Laslo Hunhold <dev@frign.de> 28815433e3595cba51a40c4a5e291da3a8746d78 2022-10-03T19:14:52Z 2022-10-03T19:14:52Z Unify code paths in herodotus_read_codepoint() Laslo Hunhold dev@frign.de commit 28815433e3595cba51a40c4a5e291da3a8746d78 parent f70ea8c12ab5b7ad6f90f8860544779a43ce8a9e Author: Laslo Hunhold <dev@frign.de> Date: Mon, 3 Oct 2022 21:14:52 +0200 Unify code paths in herodotus_read_codepoint() This saves redundancy. Signed-off-by: Laslo Hunhold <dev@frign.de> f70ea8c12ab5b7ad6f90f8860544779a43ce8a9e 2022-10-02T20:30:08Z 2022-10-02T20:30:58Z Drop get_codepoint*() and set_codepoint*() functions Laslo Hunhold dev@frign.de commit f70ea8c12ab5b7ad6f90f8860544779a43ce8a9e parent 995e37182dc53da55dc4cf34868513610215c79e Author: Laslo Hunhold <dev@frign.de> Date: Sun, 2 Oct 2022 22:30:08 +0200 Drop get_codepoint*() and set_codepoint*() functions These are, now that all code has been refactored with Herodotus and Proper, no longer used and can be dropped. Signed-off-by: Laslo Hunhold <dev@frign.de> 995e37182dc53da55dc4cf34868513610215c79e 2022-10-02T20:22:54Z 2022-10-02T20:22:54Z Fix a few small errors in the manpages Laslo Hunhold dev@frign.de commit 995e37182dc53da55dc4cf34868513610215c79e parent a5b1b0c0c7bc1576b5893175b27585fa963f4433 Author: Laslo Hunhold <dev@frign.de> Date: Sun, 2 Oct 2022 22:22:54 +0200 Fix a few small errors in the manpages Thanks to NRK for spotting most of these problems! In the other cases, mandoc with its Tlint-flag proved to be very useful. 
Signed-off-by: Laslo Hunhold <dev@frign.de> a5b1b0c0c7bc1576b5893175b27585fa963f4433 2022-10-02T20:05:11Z 2022-10-02T20:05:11Z Refactor sentence-functions with Proper (using Herodotus in the background) Laslo Hunhold dev@frign.de commit a5b1b0c0c7bc1576b5893175b27585fa963f4433 parent 52b0e29e02068d6a8123042ef901f73e37b2f38f Author: Laslo Hunhold <dev@frign.de> Date: Sun, 2 Oct 2022 22:05:11 +0200 Refactor sentence-functions with Proper (using Herodotus in the background) This refactor was a breeze and it passed all conformance tests on the first try. This, just like with the word-functions, leads to a massive simplification and separation of concerns in the code. And as with the word functions, this fixes some known quirks. Signed-off-by: Laslo Hunhold <dev@frign.de> 52b0e29e02068d6a8123042ef901f73e37b2f38f 2022-10-02T19:17:03Z 2022-10-02T19:17:03Z Refactor word-functions with Proper (using Herodotus in the background) Laslo Hunhold dev@frign.de commit 52b0e29e02068d6a8123042ef901f73e37b2f38f parent b899fd685c50cbc61999296ce1e0a03a45e74f52 Author: Laslo Hunhold <dev@frign.de> Date: Sun, 2 Oct 2022 21:17:03 +0200 Refactor word-functions with Proper (using Herodotus in the background) As promised, this leads to a heavy simplification and separation of concerns in the code. Additionally, this fixes some known quirks in regard to handling NUL-terminated strings. Signed-off-by: Laslo Hunhold <dev@frign.de> b899fd685c50cbc61999296ce1e0a03a45e74f52 2022-10-02T19:09:08Z 2022-10-02T19:09:08Z Add "proper"-property-reader Laslo Hunhold dev@frign.de commit b899fd685c50cbc61999296ce1e0a03a45e74f52 parent a4d42053f13e8471ee3903522f964fc0a1d3161a Author: Laslo Hunhold <dev@frign.de> Date: Sun, 2 Oct 2022 21:09:08 +0200 Add "proper"-property-reader The word- and sentence-segmentation algorithms make use of a complicated logic to accommodate "raw" and "skip" properties. The code is barely readable and doesn't separate abstractions away nicely.
Moreover, there is a high probability that certain edge-cases are not handled properly. To fix this, this commit adds a "proper"-property-reader, which basically does the whole dirty details in the background using well-commented and transparent code that builds on top of the herodotus-reader instead of doing this by hand. This ensures that we will (provably) never have buffer overflows unless there is a mistake in the implementation itself, which can be verified relatively easily given each function has a limited scope. Signed-off-by: Laslo Hunhold <dev@frign.de> a4d42053f13e8471ee3903522f964fc0a1d3161a 2022-09-24T10:26:19Z 2022-09-24T10:26:19Z Refactor line-functions with Herodotus Laslo Hunhold dev@frign.de commit a4d42053f13e8471ee3903522f964fc0a1d3161a parent 65785f699be45dd77bdcbfc1d3aded39151f3205 Author: Laslo Hunhold <dev@frign.de> Date: Sat, 24 Sep 2022 12:26:19 +0200 Refactor line-functions with Herodotus Signed-off-by: Laslo Hunhold <dev@frign.de> 65785f699be45dd77bdcbfc1d3aded39151f3205 2022-09-24T09:45:20Z 2022-09-24T09:45:20Z Refactor character-functions with Herodotus Laslo Hunhold dev@frign.de commit 65785f699be45dd77bdcbfc1d3aded39151f3205 parent b13acfd6cd5114fcddbffaf9855664a95f966403 Author: Laslo Hunhold <dev@frign.de> Date: Sat, 24 Sep 2022 11:45:20 +0200 Refactor character-functions with Herodotus This also unifies the code and drops a lot of complicated state handling. 
Signed-off-by: Laslo Hunhold <dev@frign.de> b13acfd6cd5114fcddbffaf9855664a95f966403 2022-09-24T08:38:29Z 2022-09-24T08:38:29Z Update README Laslo Hunhold dev@frign.de commit b13acfd6cd5114fcddbffaf9855664a95f966403 parent bc1dc28c09ce845291c51041b45594fef78e4eb4 Author: Laslo Hunhold <dev@frign.de> Date: Sat, 24 Sep 2022 10:38:29 +0200 Update README Signed-off-by: Laslo Hunhold <dev@frign.de> bc1dc28c09ce845291c51041b45594fef78e4eb4 2022-09-24T08:37:21Z 2022-09-24T08:37:21Z Clarify a comment in gen/case.c Laslo Hunhold dev@frign.de commit bc1dc28c09ce845291c51041b45594fef78e4eb4 parent 5dec22a7143e1105f25c7a7626fa166d882367d0 Author: Laslo Hunhold <dev@frign.de> Date: Sat, 24 Sep 2022 10:37:21 +0200 Clarify a comment in gen/case.c Signed-off-by: Laslo Hunhold <dev@frign.de> 5dec22a7143e1105f25c7a7626fa166d882367d0 2022-09-24T08:36:15Z 2022-09-24T08:36:15Z Refactor case-checking-functions with Herodotus and add unit tests Laslo Hunhold dev@frign.de commit 5dec22a7143e1105f25c7a7626fa166d882367d0 parent 8a7e2ee85f0a2824e48e85e57534c5b18113cf07 Author: Laslo Hunhold <dev@frign.de> Date: Sat, 24 Sep 2022 10:36:15 +0200 Refactor case-checking-functions with Herodotus and add unit tests Additionally, expand the unit tests with special-casing-cases. Signed-off-by: Laslo Hunhold <dev@frign.de> 8a7e2ee85f0a2824e48e85e57534c5b18113cf07 2022-09-23T23:54:52Z 2022-09-23T23:54:52Z Compile the library in freestanding mode Laslo Hunhold dev@frign.de commit 8a7e2ee85f0a2824e48e85e57534c5b18113cf07 parent 9f15d7eb0c9cf216f069d6972c58520013b80acb Author: Laslo Hunhold <dev@frign.de> Date: Sat, 24 Sep 2022 01:54:52 +0200 Compile the library in freestanding mode Looking closely, we never explicitly depend on the standard library within the actual library code. This can be explicitly expressed by setting -ffreestanding during object-compilation and -nostdlib during linking. 
The result is a clean library with zero libc-symbols, allowing it to be used even without an operating system (kernel code, ELF, etc.), by making use of the freestanding implementation form defined in the standard[0]. To be freestanding, the code may only include <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h> and <stdnoreturn.h>. We satisfy this condition implicitly, but there are some erroneous supplementary includes that are removed in this commit. Additionally, the strict compiler-implementation simply adds the U-suffix to the argument of UINT16_C (et al.), which is why calls to it have to be changed to really include only constants. [0]:https://www.iso-9899.info/n1570.html#4.p6 Signed-off-by: Laslo Hunhold <dev@frign.de> 9f15d7eb0c9cf216f069d6972c58520013b80acb 2022-09-21T22:16:56Z 2022-09-21T22:16:56Z Declare test-arrays as static Laslo Hunhold dev@frign.de commit 9f15d7eb0c9cf216f069d6972c58520013b80acb parent 9c926f112553481fae101b692f8add2998aeeaaf Author: Laslo Hunhold <dev@frign.de> Date: Thu, 22 Sep 2022 00:16:56 +0200 Declare test-arrays as static Signed-off-by: Laslo Hunhold <dev@frign.de> 9c926f112553481fae101b692f8add2998aeeaaf 2022-09-21T18:25:41Z 2022-09-21T18:25:41Z Remove autistic screeching Laslo Hunhold dev@frign.de commit 9c926f112553481fae101b692f8add2998aeeaaf parent e63bcc42010176b300feea6a7412f814a6cc4191 Author: Laslo Hunhold <dev@frign.de> Date: Wed, 21 Sep 2022 20:25:41 +0200 Remove autistic screeching Signed-off-by: Laslo Hunhold <dev@frign.de> e63bcc42010176b300feea6a7412f814a6cc4191 2022-09-21T18:18:12Z 2022-09-21T18:18:12Z Add case-conversion-unit-tests Laslo Hunhold dev@frign.de commit e63bcc42010176b300feea6a7412f814a6cc4191 parent 5332f7ee034081618617c2b0785733ccc9ec8753 Author: Laslo Hunhold <dev@frign.de> Date: Wed, 21 Sep 2022 20:18:12 +0200 Add case-conversion-unit-tests To give even more assurance and catch any possible future regressions, exhaustive unit tests are added for
the case-conversion functions. Signed-off-by: Laslo Hunhold <dev@frign.de> 5332f7ee034081618617c2b0785733ccc9ec8753 2022-09-21T18:16:00Z 2022-09-21T18:16:00Z Refactor case-conversion-functions with Herodotus Laslo Hunhold dev@frign.de commit 5332f7ee034081618617c2b0785733ccc9ec8753 parent 563eb65bfbaa4f27c77d73ae81b51882c916993d Author: Laslo Hunhold <dev@frign.de> Date: Wed, 21 Sep 2022 20:16:00 +0200 Refactor case-conversion-functions with Herodotus The readability of the code is greatly improved, and the code is now much more robust than before. Signed-off-by: Laslo Hunhold <dev@frign.de> 563eb65bfbaa4f27c77d73ae81b51882c916993d 2022-09-21T18:11:55Z 2022-09-21T18:11:55Z Add helper structure for reading from and writing into buffers Laslo Hunhold dev@frign.de commit 563eb65bfbaa4f27c77d73ae81b51882c916993d parent 6d0595242a027c1fcb06136e632f6d727388c4ec Author: Laslo Hunhold <dev@frign.de> Date: Wed, 21 Sep 2022 20:11:55 +0200 Add helper structure for reading from and writing into buffers The logic behind the input and output buffers is quite intricate and leads to numerous subtle bugs that are best handled with a refactoring using an abstraction layer that hides most of the gory details. The Herodotus reader/writer elegantly does all the magic in the background, allowing us to focus on the algorithms in the front instead. This especially helps with handling NUL-terminated strings, as we are guaranteed not to accidentally read too far. Signed-off-by: Laslo Hunhold <dev@frign.de> 6d0595242a027c1fcb06136e632f6d727388c4ec 2022-09-16T23:13:59Z 2022-09-16T23:13:59Z Sort prototypes in grapheme.h alphabetically Laslo Hunhold dev@frign.de commit 6d0595242a027c1fcb06136e632f6d727388c4ec parent fad432f65f9011175f4fe24d4045ba0d42bdc55e Author: Laslo Hunhold <dev@frign.de> Date: Sat, 17 Sep 2022 01:13:59 +0200 Sort prototypes in grapheme.h alphabetically Signed-off-by: Laslo Hunhold <dev@frign.de>
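The idea behind the buffer helper described in the "Add helper structure for reading from and writing into buffers" commit above can be sketched as a bounded reader. This is a simplified, byte-wise illustration under stated assumptions and not the actual Herodotus interface: every name below (sketch_reader, sketch_reader_init, sketch_reader_next) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded reader; not the actual Herodotus interface. */
struct sketch_reader {
	const char *src;     /* source buffer, may be NULL */
	size_t srclen;       /* upper bound on readable bytes */
	size_t off;          /* current read offset */
	bool nul_terminated; /* stop at the first NUL byte */
};

static void
sketch_reader_init(struct sketch_reader *r, const char *src,
                   size_t srclen, bool nul_terminated)
{
	r->src = src;
	/* for NUL-terminated input the length is unknown up front */
	r->srclen = nul_terminated ? SIZE_MAX : srclen;
	r->off = 0;
	r->nul_terminated = nul_terminated;
}

/*
 * Read the next byte into *out. Returns false at the end of the
 * input, so callers can never step past the length bound or the
 * terminating NUL by accident.
 */
static bool
sketch_reader_next(struct sketch_reader *r, char *out)
{
	if (r->src == NULL || r->off >= r->srclen) {
		return false;
	}
	if (r->nul_terminated && r->src[r->off] == '\0') {
		return false;
	}

	*out = r->src[r->off++];
	return true;
}
```

Because all bounds checks live in one small function, the algorithms built on top only need to loop until sketch_reader_next() returns false.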