BidiCharacterTest.txt - libgrapheme - unicode string library
git clone git://git.suckless.org/libgrapheme
BidiCharacterTest.txt (6880771B)

# BidiCharacterTest-17.0.0.txt
# Date: 2025-07-30
# © 2025 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
# For documentation, see https://www.unicode.org/reports/tr44/
#
# This file provides a conformance test for implementations of the
# Unicode Bidirectional Algorithm, specified in UAX #9: Unicode
# Bidirectional Algorithm, at https://www.unicode.org/reports/tr9/
#
# The test data has been generated with a few constraints. Each test case
# is a single paragraph, so the test data does not contain any characters
# with Bidi_Class property value Paragraph_Separator, and rule P1 of the
# algorithm is out of scope. Each test case further constitutes a single
# line of text; reordering is applied within a single line and independently
# of a rendering engine, and rules L3 and L4 are also out of scope.
# Therefore, the test data can be used for verifying conformance to the
# Unicode Bidirectional Algorithm implemented up to and including rule L2.
#
# The file contains test sequences of explicit character code points.
# Each line consists of five fields separated by semicolons.
#
# Field 0: A sequence of hexadecimal code point values separated by spaces
# Field 1: A value representing the paragraph direction, as follows:
#   0 represents left-to-right
#   1 represents right-to-left
#   2 represents auto-LTR according to rules P2 and P3 of the algorithm
# Field 2: The resolved paragraph embedding level
# Field 3: A list of resolved levels; characters removed in rule X9 are
#   indicated with an 'x'
# Field 4: A list of indices showing the resulting visual ordering from
#   left to right; characters with a resolved level of 'x' are skipped
#
# Comment lines start with '#'.
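A reader for the five-field line format described above could be sketched in C as follows. This is an illustration only, not libgrapheme's API: the struct `bidi_test_case`, the function `parse_bidi_test_line`, the per-line capacity `BIDI_TEST_MAX`, and the use of -1 (`BIDI_LEVEL_REMOVED`) to stand for 'x' are all hypothetical choices.

```c
#include <stdlib.h>
#include <string.h>

#define BIDI_TEST_MAX      256  /* hypothetical per-line capacity */
#define BIDI_LEVEL_REMOVED (-1) /* stands in for 'x' in field 3 */

struct bidi_test_case {
	unsigned long cp[BIDI_TEST_MAX]; /* field 0: code points */
	size_t cp_len;
	int direction;                   /* field 1: 0 LTR, 1 RTL, 2 auto-LTR */
	int para_level;                  /* field 2: paragraph embedding level */
	int level[BIDI_TEST_MAX];        /* field 3: resolved levels */
	size_t level_len;
	size_t order[BIDI_TEST_MAX];     /* field 4: visual order */
	size_t order_len;
};

/* Parses one data line in place; returns 0 on success, -1 if malformed. */
static int
parse_bidi_test_line(char *line, struct bidi_test_case *tc)
{
	char *field[5], *p = line, *tok, *end;
	size_t i;

	line[strcspn(line, "\r\n")] = '\0'; /* drop a trailing newline */

	/* split into exactly five ';'-separated fields */
	for (i = 0; i < 5; i++) {
		field[i] = p;
		p = strchr(p, ';');
		if (i == 4)
			break;
		if (p == NULL)
			return -1;
		*p++ = '\0';
	}
	if (p != NULL) /* a sixth field */
		return -1;

	/* field 0: hexadecimal code points */
	tc->cp_len = 0;
	for (tok = strtok(field[0], " "); tok != NULL; tok = strtok(NULL, " ")) {
		if (tc->cp_len == BIDI_TEST_MAX)
			return -1;
		tc->cp[tc->cp_len++] = strtoul(tok, &end, 16);
		if (*end != '\0')
			return -1;
	}

	/* fields 1 and 2: decimal integers */
	tc->direction = (int)strtol(field[1], &end, 10);
	if (end == field[1] || *end != '\0' || tc->direction < 0 || tc->direction > 2)
		return -1;
	tc->para_level = (int)strtol(field[2], &end, 10);
	if (end == field[2] || *end != '\0')
		return -1;

	/* field 3: levels, with 'x' for characters removed by rule X9 */
	tc->level_len = 0;
	for (tok = strtok(field[3], " "); tok != NULL; tok = strtok(NULL, " ")) {
		if (tc->level_len == BIDI_TEST_MAX)
			return -1;
		if (strcmp(tok, "x") == 0) {
			tc->level[tc->level_len++] = BIDI_LEVEL_REMOVED;
			continue;
		}
		tc->level[tc->level_len++] = (int)strtol(tok, &end, 10);
		if (*end != '\0')
			return -1;
	}

	/* field 4: indices into field 0, in visual order */
	tc->order_len = 0;
	for (tok = strtok(field[4], " "); tok != NULL; tok = strtok(NULL, " ")) {
		if (tc->order_len == BIDI_TEST_MAX)
			return -1;
		tc->order[tc->order_len++] = strtoul(tok, &end, 10);
		if (*end != '\0')
			return -1;
	}

	/* every code point must have a level (or 'x') */
	return tc->level_len == tc->cp_len ? 0 : -1;
}
```

Because the parser writes into the line buffer (and uses the non-reentrant `strtok`), callers should pass a writable copy of each line and skip lines that start with '#' or are empty.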

################################################################################
# Examples from UAX #9

# Examples from the "Resolving Neutral and Isolate Formatting Types" section of UAX #9
# (https://www.unicode.org/reports/tr9/#Resolving_Neutral_Types)
05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;0;0;1 1 0 1 1 0 0 0 0 0 0 0 0 0;1 0 2 4 3 5 6 7 8 9 10 11 12 13
05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;1;1;1 1 1 1 1 1 1 2 2 1 1 1 2 2;12 13 11 10 9 7 8 6 5 4 3 2 1 0
0061 0062 0063 0020 0028 0064 0065 0066 0020 0627 0628 062C 0029 0020 05D0 05D1 05D2;0;0;0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1;0 1 2 3 4 5 6 7 8 11 10 9 12 13 16 15 14
0061 0062 0063 0020 0028 0064 0065 0066 0020 0627 0628 062C 0029 0020 05D0 05D1 05D2;1;1;2 2 2 1 1 2 2 2 1 1 1 1 1 1 1 1 1;16 15 14 13 12 11 10 9 8 5 6 7 4 3 0 1 2
05D0 05D1 05D2 0020 0028 0064 0065 0066 0020 0627 0628 062C 0029 0020 0061 0062 0063;0;0;1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0;2 1 0 3 4 5 6 7 8 11 10 9 12 13 14 15 16
05D0 05D1 05D2 0020 0028 0064 0065 0066 0020 0627 0628 062C 0029 0020 0061 0062 0063;1;1;1 1 1 1 1 2 2 2 1 1 1 1 1 1 2 2 2;14 15 16 13 12 11 10 9 8 5 6 7 4 3 2 1 0
0061 0062 0063 0020 0028 0627 0628 062C 0020 0064 0065 0066 0029 0020 05D0 05D1 05D2;0;0;0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1;0 1 2 3 4 7 6 5 8 9 10 11 12 13 16 15 14
0061 0062 0063 0020 0028 0627 0628 062C 0020 0064 0065 0066 0029 0020 05D0 05D1 05D2;1;1;2 2 2 1 1 1 1 1 1 2 2 2 1 1 1 1 1;16 15 14 13 12 9 10 11 8 7 6 5 4 3 0 1 2
05D0 05D1 05D2 0020 0028 0627 0628 062C 0020 0064 0065 0066 0029 0020 0061 0062 0063;0;0;1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0;2 1 0 3 4 7 6 5 8 9 10 11 12 13 14 15 16
05D0 05D1 05D2 0020 0028 0627 0628 062C 0020 0064 0065 0066 0029 0020 0061 0062 0063;1;1;1 1 1 1 1 1 1 1 1 2 2 2 1 1 2 2 2;14 15 16 13 12 9 10 11 8 7 6 5 4 3 2 1 0
0627 0628 062C 0020 0062 006F 006F 006B 0028 0073 0029;0;0;1 1 1 0 0 0 0 0 0 0 0;2 1 0 3 4 5 6 7 8 9 10
0627 0628 062C 0020 0062 006F 006F 006B 0028 0073 0029;1;1;1 1 1 1 2 2 2 2 2 2 2;4 5 6 7 8 9 10 3 2 1 0
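Field 4 follows mechanically from field 3 by rule L2: starting at the highest level and working down to the lowest odd level, every maximal run of characters at that level or higher is reversed. The C sketch below does this, assuming the hypothetical encoding of -1 for 'x'; removed characters are simply left out, as field 4 does. The function name `bidi_visual_order` is illustrative, not a libgrapheme function.

```c
#include <stddef.h>

#define BIDI_LEVEL_REMOVED (-1) /* hypothetical encoding of 'x' */

/*
 * Rule L2: from the highest level down to the lowest odd level, reverse
 * every maximal run of characters at that level or higher. Characters
 * removed by X9 are left out, as in field 4. Writes up to n indices to
 * 'order' and returns how many were written.
 */
static size_t
bidi_visual_order(const int *level, size_t n, size_t *order)
{
	size_t len = 0, i, start, a, b, tmp;
	int max = 0, min_odd = -1, lvl;

	for (i = 0; i < n; i++) {
		if (level[i] == BIDI_LEVEL_REMOVED)
			continue;
		order[len++] = i; /* start from logical order */
		if (level[i] > max)
			max = level[i];
		if (level[i] % 2 == 1 && (min_odd == -1 || level[i] < min_odd))
			min_odd = level[i];
	}
	if (min_odd == -1) /* no odd level present: nothing to reverse */
		return len;

	for (lvl = max; lvl >= min_odd; lvl--) {
		for (i = 0; i < len;) {
			if (level[order[i]] < lvl) {
				i++;
				continue;
			}
			start = i;
			while (i < len && level[order[i]] >= lvl)
				i++;
			for (a = start, b = i - 1; a < b; a++, b--) { /* reverse run */
				tmp = order[a];
				order[a] = order[b];
				order[b] = tmp;
			}
		}
	}
	return len;
}
```

For the first example line above (levels `1 1 0 1 1 0 ...`), only the two level-1 runs are reversed, which yields the expected order `1 0 2 4 3 5 ...`.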

################################################################################
# Test cases for the algorithm changes and clarifications made in Unicode 8.0

# Explicit directional overrides applied to isolates tightly flanked by embeddings
202E 0061 202A 0062 202C 2066 0063 2069 202A 0064 202C 0065 202C;2;0;x 1 x 2 x 1 2 1 x 2 x 1 x;11 9 7 6 5 3 1
202E 0061 202A 0062 202C 2066 0063 2069 202A 0064 202C 0065 202C;1;1;x 3 x 4 x 3 4 3 x 4 x 3 x;11 9 7 6 5 3 1
202D 05D0 202B 05D1 202C 2068 05D2 2069 202B 05D3 202C 05D4 202C;2;1;x 2 x 3 x 2 3 2 x 3 x 2 x;1 3 5 6 7 9 11
202D 0661 202B 0662 202C 2068 0663 2069 202B 0664 202C 0665 202C;0;0;x 2 x 4 x 2 6 2 x 4 x 2 x;1 3 5 6 7 9 11
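Direction value 2 in field 1 leaves the paragraph level to rules P2 and P3: the first character of type L, R, or AL decides it (level 1 for R or AL, 0 for L), text between an isolate initiator and its matching PDI is skipped, and the level defaults to 0 when no strong character is found. The explicit override U+202E (RLO) at the start of the first line above is not itself a strong character, so the following 'a' yields level 0. A minimal C sketch, assuming a toy classifier that covers only ASCII letters, LRM/RLM, Hebrew letters, Arabic letters, and the isolate controls; `toy_bidi_class` and `bidi_auto_paragraph_level` are hypothetical names, and a real implementation would look up the full Bidi_Class property.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced classes: just enough for P2/P3. */
enum bidi_class { BC_L, BC_R, BC_AL, BC_ISOLATE_INIT, BC_PDI, BC_OTHER };

static enum bidi_class
toy_bidi_class(unsigned long cp)
{
	if ((cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A) || cp == 0x200E)
		return BC_L;            /* ASCII letters, LRM */
	if ((cp >= 0x05D0 && cp <= 0x05EA) || cp == 0x200F)
		return BC_R;            /* Hebrew letters, RLM */
	if (cp >= 0x0627 && cp <= 0x064A)
		return BC_AL;           /* Arabic letters */
	if (cp >= 0x2066 && cp <= 0x2068)
		return BC_ISOLATE_INIT; /* LRI, RLI, FSI */
	if (cp == 0x2069)
		return BC_PDI;
	return BC_OTHER;                /* everything else, e.g. AN digits */
}

/* P2/P3: paragraph level for the "auto-LTR" direction (field 1 = 2). */
static int
bidi_auto_paragraph_level(const unsigned long *cp, size_t n)
{
	size_t i, depth = 0; /* depth > 0 means inside an isolate */

	for (i = 0; i < n; i++) {
		switch (toy_bidi_class(cp[i])) {
		case BC_ISOLATE_INIT:
			depth++;
			break;
		case BC_PDI:
			if (depth > 0) /* an unmatched PDI is ignored */
				depth--;
			break;
		case BC_L:
			if (depth == 0)
				return 0;
			break;
		case BC_R:
		case BC_AL:
			if (depth == 0)
				return 1;
			break;
		default:
			break;
		}
	}
	return 0; /* no strong character found */
}
```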

# Explicit directional overrides applied to paired brackets
202A 05D0 0028 05D1 202C 202D 0029;2;1;x 3 3 3 x x 2;3 2 1 6
202A 05D0 0028 05D1 202C 202D 0029 202C;2;1;x 3 3 3 x x 2 x;3 2 1 6
202B 0061 0028 0062 202C 202E 0029;2;0;x 2 2 2 x x 1;6 1 2 3
202B 0061 0028 0062 202C 202E 0029 202C;2;0;x 2 2 2 x x 1 x;6 1 2 3
202A 202E 0061 202C 0028 05D0 202C 202D 0029 202C;2;0;x x 3 x 3 3 x x 2 x;5 4 2 8
202B 202D 05D0 202C 0028 0061 202C 202E 0029 202C;2;1;x x 4 x 4 4 x x 3 x;8 2 4 5
202A 202E 0061 202C 0028 005B 05D0 202C 202D 005D 0029 202C;2;0;x x 3 x 3 3 3 x x 2 2 x;6 5 4 2 9 10
202B 202D 05D0 202C 0028 005B 0061 202C 202E 005D 0029 202C;2;1;x x 4 x 4 4 4 x x 3 3 x;10 9 2 4 5 6
202D 0028 202C 202A 05D0 0029 05D1;2;1;x 2 x x 3 3 3;1 6 5 4
202D 0028 202C 202A 05D0 0029 05D1 202C;2;1;x 2 x x 3 3 3 x;1 6 5 4
202E 0028 202C 202B 0061 0029 0062;2;0;x 1 x x 2 2 2;4 5 6 1
202E 0028 202C 202B 0061 0029 0062 202C;2;0;x 1 x x 2 2 2 x;4 5 6 1
202D 202E 0061 202C 0028 202C 202A 05D0 0029 05D1;2;0;x x 3 x 2 x x 3 3 3;2 4 9 8 7
202E 202D 05D0 202C 0028 202C 202B 0061 0029 0062;2;1;x x 4 x 3 x x 4 4 4;7 8 9 4 2
202D 202E 0061 202C 0028 005B 202C 202A 05D0 005D 0029 05D1;2;0;x x 3 x 2 2 x x 3 3 3 3;2 4 5 11 10 9 8
202E 202D 05D0 202C 0028 005B 202C 202B 0061 005D 0029 0062;2;1;x x 4 x 3 3 x x 4 4 4 4;8 9 10 11 5 4 2
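The 'x' entries in field 3 mark the characters that rule X9 removes: the embedding and override controls U+202A through U+202E (LRE, RLE, PDF, LRO, RLO) and characters of Bidi_Class BN, such as ZWJ (U+200D) and ZWSP (U+200B). The isolate controls U+2066 through U+2069 are not removed and keep a level. The C sketch below predicts the 'x' positions, assuming a short, hand-picked list of BN code points is enough; `bidi_x9_removed` and `bidi_mark_removed` are hypothetical names, and a complete check would consult the Bidi_Class data.

```c
#include <stddef.h>

/* Nonzero if rule X9 removes 'cp' (partial BN list, for illustration). */
static int
bidi_x9_removed(unsigned long cp)
{
	if (cp >= 0x202A && cp <= 0x202E) /* LRE, RLE, PDF, LRO, RLO */
		return 1;
	switch (cp) {
	case 0x00AD: /* SOFT HYPHEN */
	case 0x200B: /* ZERO WIDTH SPACE */
	case 0x200C: /* ZERO WIDTH NON-JOINER */
	case 0x200D: /* ZERO WIDTH JOINER */
	case 0x2060: /* WORD JOINER */
	case 0xFEFF: /* ZERO WIDTH NO-BREAK SPACE */
		return 1;
	default:
		return 0; /* includes isolates, LRM, RLM, letters, marks */
	}
}

/* Sets removed[i] to 1 where field 3 should say 'x'; returns the count. */
static size_t
bidi_mark_removed(const unsigned long *cp, size_t n, unsigned char *removed)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		removed[i] = (unsigned char)bidi_x9_removed(cp[i]);
		count += removed[i];
	}
	return count;
}
```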

# Nonspacing marks applied to paired brackets
0061 0028 0062 0029 0331;1;1;2 2 2 2 2;0 1 2 3 4
0061 0028 0332 0062 0029 0333;1;1;2 2 2 2 2 2;0 1 2 3 4 5
05D0 0028 05D1 0029 0331;0;0;1 1 1 1 1;4 3 2 1 0
05D0 0028 0332 05D1 0029 0333;0;0;1 1 1 1 1 1;5 4 3 2 1 0
0661 0028 0662 0029 0331;0;0;2 1 2 1 1;4 3 2 1 0
0661 0028 0332 0662 0029 0333;0;0;2 1 1 2 1 1;5 4 3 2 1 0
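Once every character's type has been resolved to L, R, EN, or AN, the levels in field 3 come from the implicit rules I1 and I2. That is why U+0661 (an Arabic-Indic digit, class AN) sits at level 2 in a left-to-right paragraph above, while the bracket resolved to R sits at level 1, and why 'a' gets level 2 in a right-to-left paragraph. A small C sketch of just these two rules, assuming the earlier W and N rules have already run; `enum bidi_resolved_type` and `bidi_implicit_level` are hypothetical names.

```c
/* Types that remain after the W and N rules (hypothetical enum). */
enum bidi_resolved_type { RT_L, RT_R, RT_EN, RT_AN };

/* Rules I1 and I2: final level from embedding level and resolved type. */
static int
bidi_implicit_level(int embedding_level, enum bidi_resolved_type t)
{
	if (embedding_level % 2 == 0) {
		/* I1, even level: R goes up one; AN and EN go up two */
		if (t == RT_R)
			return embedding_level + 1;
		if (t == RT_AN || t == RT_EN)
			return embedding_level + 2;
		return embedding_level;
	}
	/* I2, odd level: L, EN and AN go up one */
	if (t == RT_L || t == RT_EN || t == RT_AN)
		return embedding_level + 1;
	return embedding_level;
}
```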

# Nonspacing marks applied to paired brackets [added to test cases for Unicode 14.0]
# These cases exercise the ignoring of bc=BN characters (such as ZWJ or ZWSP)
# that appear between the base bracket character and the nonspacing mark,
# in a context where the brackets have been forced to a strong R direction.
#
# Note that due to an implementation error in the N0 rule in the Bidi Reference C
# test code for UBA 8.0, versions of that reference test code through UBA 12.0
# will fail for precisely these newly added tests. The bug in the N0 implementation
# of the Bidi Reference C test code was fixed for Unicode 13.0, and the updated
# test code now performs correctly for all versions of UBA.
#
# These test cases first test a combining mark following a ZWJ after the trailing bracket of a pair:
0041 200F 005B 05D0 005D 200D 20D6;0;0;0 1 1 1 1 x 1;0 6 4 3 2 1
0041 200F 005B 05D0 005D 200D 20D6;1;1;2 1 1 1 1 x 1;6 4 3 2 1 0
# Then a combining mark following a ZWJ after the leading bracket of a pair:
0041 200F 005B 200D 20D6 05D0 005D;0;0;0 1 1 x 1 1 1;0 6 5 4 2 1
0041 200F 005B 200D 20D6 05D0 005D;1;1;2 1 1 x 1 1 1;6 5 4 2 1 0
# Then a combining mark following a ZWJ after both brackets of a pair:
0041 200F 005B 200D 20D6 05D0 005D 200D 20D6;0;0;0 1 1 x 1 1 1 x 1;0 8 6 5 4 2 1
0041 200F 005B 200D 20D6 05D0 005D 200D 20D6;1;1;2 1 1 x 1 1 1 x 1;8 6 5 4 2 1 0
# Then the intervention of a ZWSP in these same sequences.
# (The ZWSP formally breaks the combining character sequence, but should
# not block the identification of the combining mark for the application of rule N0.)
0041 200F 005B 200D 200B 20D6 05D0 005D 200B 200D 20D6;0;0;0 1 1 x x 1 1 1 x x 1;0 10 7 6 5 2 1
0041 200F 005B 200D 200B 20D6 05D0 005D 200B 200D 20D6;1;1;2 1 1 x x 1 1 1 x x 1;10 7 6 5 2 1 0
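These cases hinge on one step of rule N0: nonspacing marks that originally (before W1) followed a paired bracket take on the bracket's new L or R type, and BN characters such as ZWJ or ZWSP in between are skipped rather than ending that sequence. A minimal C sketch of only this propagation step, assuming N0 has already chosen the bracket's direction; the reduced `enum bidi_type` and the function `bidi_n0_propagate_to_nsm` are hypothetical and do not implement the rest of N0.

```c
#include <stddef.h>

/* Hypothetical, reduced set of bidi types for this sketch. */
enum bidi_type { BT_L, BT_R, BT_ON, BT_NSM, BT_BN };

/*
 * After N0 has changed the paired bracket at 'pos' to 'resolved' (BT_L or
 * BT_R), give the same type to the nonspacing marks that originally
 * followed it. 'orig' holds each character's class before W1; 'type'
 * holds the current types. BN characters (removed by X9) are skipped.
 */
static void
bidi_n0_propagate_to_nsm(const enum bidi_type *orig, enum bidi_type *type,
                         size_t n, size_t pos, enum bidi_type resolved)
{
	size_t i;

	type[pos] = resolved;
	for (i = pos + 1; i < n; i++) {
		if (orig[i] == BT_BN)
			continue;       /* e.g. ZWJ or ZWSP: look past it */
		if (orig[i] != BT_NSM)
			break;          /* first non-mark ends the sequence */
		type[i] = resolved;
	}
}
```

In the first line of this group, U+20D6 is separated from the closing bracket U+005D by a ZWJ, yet it still ends up with the bracket's R type, matching its level of 1 in both paragraph directions.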

# Nested bracket pairs that reach and exceed the fixed capacity of the bracket stack
# a ( ( ... ( b ) ) ... ) with 62, 63, and 64 nested bracket pairs
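These cases exercise bracket-pair identification (BD16), which runs on a stack of fixed capacity. UAX #9 sets that capacity at 63 elements, so 63 nested pairs fill the stack and a 64th opening bracket overflows it, at which point BD16 stops for the rest of the isolating run sequence. The C sketch below keeps any pairs recorded before the overflow. It is simplified under stated assumptions: only ASCII (), [] and {} count as paired brackets, whereas BD16 uses the Bidi_Paired_Bracket property, canonical equivalents, and only brackets whose current class is ON. `bd16_find_pairs`, `closing_for`, and `struct bracket_pair` are hypothetical names.

```c
#include <stddef.h>

#define BD16_STACK_DEPTH 63 /* fixed stack capacity from BD16 */

struct bracket_pair { size_t open, close; };

/* Hypothetical helper: closing bracket for an ASCII opener, else 0. */
static char
closing_for(unsigned long cp)
{
	switch (cp) {
	case '(': return ')';
	case '[': return ']';
	case '{': return '}';
	default:  return 0;
	}
}

static int
is_closing(unsigned long cp)
{
	return cp == ')' || cp == ']' || cp == '}';
}

/* Writes the pairs found to 'pairs', sorted by opening position. */
static size_t
bd16_find_pairs(const unsigned long *cp, size_t n, struct bracket_pair *pairs)
{
	struct { unsigned long closer; size_t pos; } stack[BD16_STACK_DEPTH];
	size_t depth = 0, npairs = 0, i, j, k;

	for (i = 0; i < n; i++) {
		if (closing_for(cp[i]) != 0) {
			if (depth == BD16_STACK_DEPTH)
				break; /* no room: stop processing BD16 */
			stack[depth].closer = (unsigned long)closing_for(cp[i]);
			stack[depth].pos = i;
			depth++;
		} else if (is_closing(cp[i])) {
			/* search downward from the top for a matching opener */
			for (j = depth; j > 0; j--) {
				if (stack[j - 1].closer == cp[i]) {
					pairs[npairs].open = stack[j - 1].pos;
					pairs[npairs].close = i;
					npairs++;
					depth = j - 1; /* pop through the match */
					break;
				}
			}
			/* no match: ignore this closer, keep the stack */
		}
	}

	/* sort by opening position (insertion sort) */
	for (i = 1; i < npairs; i++) {
		struct bracket_pair p = pairs[i];
		for (k = i; k > 0 && pairs[k - 1].open > p.open; k--)
			pairs[k] = pairs[k - 1];
		pairs[k] = p;
	}
	return npairs;
}
```

With this sketch, `a(b[c)d]` yields the single pair (1, 5): the ')' pops the unmatched '[' along with its own opener, so the final ']' finds nothing to match.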