BidiCharacterTest.txt - libgrapheme - unicode string library
BidiCharacterTest.txt (6880771B)
---
1 # BidiCharacterTest-17.0.0.txt
2 # Date: 2025-07-30
3 # © 2025 Unicode®, Inc.
4 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
5 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
6 #
7 # Unicode Character Database
8 # For documentation, see https://www.unicode.org/reports/tr44/
9 #
10 # This file provides a conformance test for implementations of the
11 # Unicode Bidirectional Algorithm, specified in UAX #9: Unicode
12 # Bidirectional Algorithm, at https://www.unicode.org/reports/tr9/
13 #
14 # The test data has been generated with a few constraints. Each test case
15 # is a single paragraph, so the test data does not contain any characters
16 # with Bidi_Class property value Paragraph_Separator and rule P1 of the
17 # algorithm is out of scope. Each test case further constitutes a single
18 # line of text; reordering is applied within a single line and independently
19 # of a rendering engine, and rules L3 and L4 are also out of scope.
20 # Therefore, the test data can be used for verifying conformance to the
21 # Unicode Bidirectional Algorithm implemented through rule L2 inclusively.
22 #
23 # The file contains test sequences of explicit character code points.
24 # Each line consists of five fields separated by a semicolon.
25 #
26 # Field 0: A sequence of hexadecimal code point values separated by space
27 # Field 1: A value representing the paragraph direction, as follows:
28 # 0 represents left-to-right
29 # 1 represents right-to-left
30 # 2 represents auto-LTR according to rules P2 and P3 of the algorithm
31 # Field 2: The resolved paragraph embedding level
32 # Field 3: A list of resolved levels; characters removed in rule X9 are
33 # indicated with an 'x'
34 # Field 4: A list of indices showing the resulting visual ordering from
35 # left to right; characters with a resolved level of 'x' are skipped
36 #
37 # Comment lines start with '#'.
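
The field layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the Unicode data file or of libgrapheme: the names `BidiCase` and `parse_line` are hypothetical, and the choice to represent an 'x' level as `None` is an assumption made for illustration.

```python
from typing import List, NamedTuple, Optional


class BidiCase(NamedTuple):
    """One test row; the field names are illustrative, not from the file."""
    code_points: List[int]        # Field 0
    direction: int                # Field 1: 0, 1 or 2
    paragraph_level: int          # Field 2
    levels: List[Optional[int]]   # Field 3: None stands for 'x'
    order: List[int]              # Field 4


def parse_line(line: str) -> Optional[BidiCase]:
    """Parse one line; return None for blank lines and '#' comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split(";")
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    return BidiCase(
        code_points=[int(tok, 16) for tok in fields[0].split()],
        direction=int(fields[1]),
        paragraph_level=int(fields[2]),
        levels=[None if tok == "x" else int(tok) for tok in fields[3].split()],
        order=[int(tok) for tok in fields[4].split()],
    )
```

For example, `parse_line("202A 05D0 0028 05D1 202C 202D 0029;2;1;x 3 3 3 x x 2;3 2 1 6")` yields seven code points, seven level entries (three of them `None`), and four ordering indices.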
38
39 ################################################################################
40 # Examples from UAX #9
41
42 # Examples from the "Resolving Neutral and Isolate Formatting Types" section of UAX #9
43 # (https://www.unicode.org/reports/tr9/#Resolving_Neutral_Types)
44 05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;0;0;1 1 0 1 1 0 0 0 0 0 0 0 0 0;1 0 2 4 3 5 6 7 8 9 10 11 12 13
45 05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;1;1;1 1 1 1 1 1 1 2 2 1 1 1 2 2;12 13 11 10 9 7 8 6 5 4 3 2 1 0
46 0061 0062 0063 0020 0028 0064 0065 0066 0020 0627 0628 062C 0029 0020 05D0 05D1 05D2;0;0;0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1;0 1 2 3 4 5 6 7 8 11 10 9 12 13 16 15 14
47 0061 0062 0063 0020 0028 0064 0065 0066 0020 0627 0628 062C 0029 0020 05D0 05D1 05D2;1;1;2 2 2 1 1 2 2 2 1 1 1 1 1 1 1 1 1;16 15 14 13 12 11 10 9 8 5 6 7 4 3 0 1 2
48 05D0 05D1 05D2 0020 0028 0064 0065 0066 0020 0627 0628 062C 0029 0020 0061 0062 0063;0;0;1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0;2 1 0 3 4 5 6 7 8 11 10 9 12 13 14 15 16
49 05D0 05D1 05D2 0020 0028 0064 0065 0066 0020 0627 0628 062C 0029 0020 0061 0062 0063;1;1;1 1 1 1 1 2 2 2 1 1 1 1 1 1 2 2 2;14 15 16 13 12 11 10 9 8 5 6 7 4 3 2 1 0
50 0061 0062 0063 0020 0028 0627 0628 062C 0020 0064 0065 0066 0029 0020 05D0 05D1 05D2;0;0;0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1;0 1 2 3 4 7 6 5 8 9 10 11 12 13 16 15 14
51 0061 0062 0063 0020 0028 0627 0628 062C 0020 0064 0065 0066 0029 0020 05D0 05D1 05D2;1;1;2 2 2 1 1 1 1 1 1 2 2 2 1 1 1 1 1;16 15 14 13 12 9 10 11 8 7 6 5 4 3 0 1 2
52 05D0 05D1 05D2 0020 0028 0627 0628 062C 0020 0064 0065 0066 0029 0020 0061 0062 0063;0;0;1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0;2 1 0 3 4 7 6 5 8 9 10 11 12 13 14 15 16
53 05D0 05D1 05D2 0020 0028 0627 0628 062C 0020 0064 0065 0066 0029 0020 0061 0062 0063;1;1;1 1 1 1 1 1 1 1 1 2 2 2 1 1 2 2 2;14 15 16 13 12 9 10 11 8 7 6 5 4 3 2 1 0
54 0627 0628 062C 0020 0062 006F 006F 006B 0028 0073 0029;0;0;1 1 1 0 0 0 0 0 0 0 0;2 1 0 3 4 5 6 7 8 9 10
55 0627 0628 062C 0020 0062 006F 006F 006B 0028 0073 0029;1;1;1 1 1 1 2 2 2 2 2 2 2;4 5 6 7 8 9 10 3 2 1 0
56
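
The relationship between Field 3 and Field 4 in the rows above can be sketched as an application of rule L2: from the highest level down to the lowest odd level, reverse every contiguous run of characters at that level or higher. The sketch below is an illustration only. The function names `parse_levels` and `reorder_l2` are hypothetical, and it assumes that the levels in Field 3 can be fed to rule L2 directly and that characters marked 'x' are dropped before reordering.

```python
def parse_levels(field):
    """Turn a Field 3 string into a list; 'x' becomes None."""
    return [None if tok == "x" else int(tok) for tok in field.split()]


def reorder_l2(levels):
    """Return the logical indices in visual order, left to right."""
    # Characters removed by rule X9 ('x') take no part in reordering.
    visible = [i for i, lv in enumerate(levels) if lv is not None]
    odd = [levels[i] for i in visible if levels[i] % 2 == 1]
    if not odd:
        return visible  # no odd level present, so nothing is reversed
    highest = max(levels[i] for i in visible)
    for level in range(highest, min(odd) - 1, -1):
        start = 0
        while start < len(visible):
            if levels[visible[start]] >= level:
                # Find the end of the run at this level or higher.
                end = start
                while end < len(visible) and levels[visible[end]] >= level:
                    end += 1
                visible[start:end] = visible[start:end][::-1]
                start = end
            else:
                start += 1
    return visible
```

Applied to the levels `1 1 1 1 1 1 1 2 2 1 1 1 2 2` from the second example row, `reorder_l2` returns `12 13 11 10 9 7 8 6 5 4 3 2 1 0`, which matches Field 4 of that row.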
57 ################################################################################
58 # Test cases for the algorithm changes and clarifications made in Unicode 8.0
59
60 # Explicit directional overrides applied to isolates tightly flanked by embeddings
61 202E 0061 202A 0062 202C 2066 0063 2069 202A 0064 202C 0065 202C;2;0;x 1 x 2 x 1 2 1 x 2 x 1 x;11 9 7 6 5 3 1
62 202E 0061 202A 0062 202C 2066 0063 2069 202A 0064 202C 0065 202C;1;1;x 3 x 4 x 3 4 3 x 4 x 3 x;11 9 7 6 5 3 1
63 202D 05D0 202B 05D1 202C 2068 05D2 2069 202B 05D3 202C 05D4 202C;2;1;x 2 x 3 x 2 3 2 x 3 x 2 x;1 3 5 6 7 9 11
64 202D 0661 202B 0662 202C 2068 0663 2069 202B 0664 202C 0665 202C;0;0;x 2 x 4 x 2 6 2 x 4 x 2 x;1 3 5 6 7 9 11
65
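
For the rows above whose Field 1 is 2, the paragraph level in Field 2 follows from rules P2 and P3: take the first character of bidi class L, R, or AL, skipping anything between an isolate initiator and its matching PDI, and use level 1 for R or AL and level 0 otherwise. The following is a rough sketch under stated assumptions. The function name `auto_paragraph_level` is hypothetical, and the bidi classes come from Python's `unicodedata` module, whose Unicode version may differ from the one this file targets.

```python
import unicodedata

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def auto_paragraph_level(code_points):
    """Sketch of rules P2 and P3 for a single paragraph."""
    depth = 0  # number of currently open isolates
    for cp in code_points:
        bidi_class = unicodedata.bidirectional(chr(cp))
        if bidi_class in ISOLATE_INITIATORS:
            depth += 1
        elif bidi_class == "PDI":
            if depth > 0:
                depth -= 1  # closes the innermost open isolate
        elif depth == 0 and bidi_class in ("L", "R", "AL"):
            # P3: R or AL gives level 1, L gives level 0.
            return 0 if bidi_class == "L" else 1
    return 0  # no strong character found outside isolates
```

With the first row above, which begins `202E 0061`, the first strong character is U+0061 (class L), so the result is 0. With the third row, which begins `202D 05D0`, it is U+05D0 (class R), so the result is 1. Both agree with Field 2.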
66 # Explicit directional overrides applied to paired brackets
67 202A 05D0 0028 05D1 202C 202D 0029;2;1;x 3 3 3 x x 2;3 2 1 6
68 202A 05D0 0028 05D1 202C 202D 0029 202C;2;1;x 3 3 3 x x 2 x;3 2 1 6
69 202B 0061 0028 0062 202C 202E 0029;2;0;x 2 2 2 x x 1;6 1 2 3
70 202B 0061 0028 0062 202C 202E 0029 202C;2;0;x 2 2 2 x x 1 x;6 1 2 3
71 202A 202E 0061 202C 0028 05D0 202C 202D 0029 202C;2;0;x x 3 x 3 3 x x 2 x;5 4 2 8
72 202B 202D 05D0 202C 0028 0061 202C 202E 0029 202C;2;1;x x 4 x 4 4 x x 3 x;8 2 4 5
73 202A 202E 0061 202C 0028 005B 05D0 202C 202D 005D 0029 202C;2;0;x x 3 x 3 3 3 x x 2 2 x;6 5 4 2 9 10
74 202B 202D 05D0 202C 0028 005B 0061 202C 202E 005D 0029 202C;2;1;x x 4 x 4 4 4 x x 3 3 x;10 9 2 4 5 6
75 202D 0028 202C 202A 05D0 0029 05D1;2;1;x 2 x x 3 3 3;1 6 5 4
76 202D 0028 202C 202A 05D0 0029 05D1 202C;2;1;x 2 x x 3 3 3 x;1 6 5 4
77 202E 0028 202C 202B 0061 0029 0062;2;0;x 1 x x 2 2 2;4 5 6 1
78 202E 0028 202C 202B 0061 0029 0062 202C;2;0;x 1 x x 2 2 2 x;4 5 6 1
79 202D 202E 0061 202C 0028 202C 202A 05D0 0029 05D1;2;0;x x 3 x 2 x x 3 3 3;2 4 9 8 7
80 202E 202D 05D0 202C 0028 202C 202B 0061 0029 0062;2;1;x x 4 x 3 x x 4 4 4;7 8 9 4 2
81 202D 202E 0061 202C 0028 005B 202C 202A 05D0 005D 0029 05D1;2;0;x x 3 x 2 2 x x 3 3 3 3;2 4 5 11 10 9 8
82 202E 202D 05D0 202C 0028 005B 202C 202B 0061 005D 0029 0062;2;1;x x 4 x 3 3 x x 4 4 4 4;8 9 10 11 5 4 2
83
84 # Nonspacing marks applied to paired brackets
85 0061 0028 0062 0029 0331;1;1;2 2 2 2 2;0 1 2 3 4
86 0061 0028 0332 0062 0029 0333;1;1;2 2 2 2 2 2;0 1 2 3 4 5
87 05D0 0028 05D1 0029 0331;0;0;1 1 1 1 1;4 3 2 1 0
88 05D0 0028 0332 05D1 0029 0333;0;0;1 1 1 1 1 1;5 4 3 2 1 0
89 0661 0028 0662 0029 0331;0;0;2 1 2 1 1;4 3 2 1 0
90 0661 0028 0332 0662 0029 0333;0;0;2 1 1 2 1 1;5 4 3 2 1 0
91
92 # Nonspacing marks applied to paired brackets [added to test cases for Unicode 14.0]
93 # These cases exercise the ignoring of bc=BN characters (such as ZWJ or ZWSP)
94 # that appear between the base bracket character and the nonspacing mark,
95 # in a context where the brackets have been forced to a strong R direction.
96 #
97 # Note that due to an implementation error in the N0 rule in the Bidi Reference C
98 # test code for UBA 8.0, versions of that reference test code through UBA 12.0 will fail for
99 # precisely these newly added tests. The bug in the implementation of the N0 rule in the Bidi Reference C
100 # test code was fixed for Unicode 13.0, and that updated test code now performs correctly
101 # for all versions of UBA.
102 #
103 # These test cases first test a combining mark following a ZWJ after the trailing bracket of a pair:
104 0041 200F 005B 05D0 005D 200D 20D6;0;0;0 1 1 1 1 x 1;0 6 4 3 2 1
105 0041 200F 005B 05D0 005D 200D 20D6;1;1;2 1 1 1 1 x 1;6 4 3 2 1 0
106 # Then a combining mark following a ZWJ after the leading bracket of a pair:
107 0041 200F 005B 200D 20D6 05D0 005D;0;0;0 1 1 x 1 1 1;0 6 5 4 2 1
108 0041 200F 005B 200D 20D6 05D0 005D;1;1;2 1 1 x 1 1 1;6 5 4 2 1 0
109 # Then a combining mark following a ZWJ after both brackets of a pair:
110 0041 200F 005B 200D 20D6 05D0 005D 200D 20D6;0;0;0 1 1 x 1 1 1 x 1;0 8 6 5 4 2 1
111 0041 200F 005B 200D 20D6 05D0 005D 200D 20D6;1;1;2 1 1 x 1 1 1 x 1;8 6 5 4 2 1 0
112 # Then the intervention of a ZWSP in these same sequences.
113 # (The ZWSP formally breaks the combining character sequence, but should
114 # not block the identification of the combining mark for the application of rule N0.)
115 0041 200F 005B 200D 200B 20D6 05D0 005D 200B 200D 20D6;0;0;0 1 1 x x 1 1 1 x x 1;0 10 7 6 5 2 1
116 0041 200F 005B 200D 200B 20D6 05D0 005D 200B 200D 20D6;1;1;2 1 1 x x 1 1 1 x x 1;10 7 6 5 2 1 0
117
118 # Nested bracket pairs that reach and exceed the fixed capacity of the bracket stack
119 # a ( ( ... ( b ) ) ... ) with 62, 63, and 64 nested bracket pairs