lumidify.org/1/git/transliterate_data/file/Notes.gph

       Notes - transliterate_data - Data for Urdu<->Hindi transliteration
  HTML git clone git://lumidify.org/transliterate_data.git (fast, but not encrypted)
  HTML git clone https://lumidify.org/transliterate_data.git (encrypted, but very slow)
  HTML git clone git://4kcetb7mo7hj6grozzybxtotsub5bempzo4lirzc3437amof2c2impyd.onion/transliterate_data.git (over tor)
       ---
       Notes (4031B)
       ---
            1 NOTE REGARDING THE TABLES
            2 
            3 The tables of words are divided into two groups, nouns_adjectives and verbs. Within each group, the tables are divided according to the way in which the stems are inflected. The two 'irregular.txt' files (one in each group) are for any word that is not to be expanded/inflected.
            4 
            5 Note: When adding new words to the tables, it is important to understand WHAT to add. In the irregular.txt tables, the whole word is added. In all other tables, only the stem is added; the program then adds the inflections.
            6 
            7 An example from each table is given below. On the left is the stem, on the right one of the inflections/expansions. 
            8 
            9 VERBS
           10 
           11 irregular        سیوں گا        सियूँगा        > [no expansion]
           12 regular_consonant_ending        ابال        उबाल        > ابالنا        उबालना
           13 regular_ending_in_a_o        آزما        आज़मा        > آزمانا        आज़माना
           14 
           15 NOUNS/ADJECTIVES
           16 
           17 adjectiveregular_a_i        آدھ        आध        > آدھا        आधा
           18 irregular        آئین        आईन        > [no expansion]
           19 ahmasc        آلود        आलूद        > آلودہ        आलूदा
           20 aishortmasc        افع        अफ़        > افعی        अफ़इ
           21 amasc        آٹ        आट        > آٹا        आटा
           22 an        آٹھو        आठव        > آٹھواں        आठवाँ
           23 cfem        آتش        आतिश        > آتشیں        आतिशें
           24 cmasc        آبشار        आबशार        > آبشاروں        आबशारों
           25 ifem        آباد        आबाद        > آبادی        आबादी
           26 ifemshort        مورت        मूर्त        > مورتی        मूर्ति
           27 imasc        آدم        आदम        > آدمی        आदमी
           28 o_a_staysfem        ابتدا        इब्तिदा        > ابتداؤں        इब्तिदाओं
           29 u_staysfem        آرز        आरज़        > آرزو        आरज़ू
           30 o_a_staysmasc        دانا        दाना        > داناؤں        दानाओं
           31 u_staysmasc        آنس        आँस        > آنسو        आँसू
           32 ui_oi_ai_mascfem        ابتدا        इब्तिदा        > ابتدائی        इब्तिदाई
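
The stem/expansion relationship in the examples above can be sketched in a few lines of Python. This is only an illustration, not the program's actual code: the suffix pairs are read off the single example given for some of the tables in these notes, and the names EXAMPLE_SUFFIXES and expand_example are hypothetical.

```python
# Suffix pairs (Urdu, Hindi) read off the single example shown for each table
# in the notes. ASSUMPTION: the real program adds further inflections per stem;
# only the one listed expansion is reproduced here.
EXAMPLE_SUFFIXES = {
    "regular_consonant_ending": ("نا", "ना"),
    "regular_ending_in_a_o": ("نا", "ना"),
    "amasc": ("ا", "ा"),
    "imasc": ("ی", "ी"),
    "cmasc": ("وں", "ों"),
}


def expand_example(table, urdu_stem, hindi_stem):
    """Return the (Urdu, Hindi) expansion shown in the notes for one stem."""
    if table == "irregular":
        # Irregular tables hold whole words, so nothing is appended.
        return urdu_stem, hindi_stem
    ur_suffix, hi_suffix = EXAMPLE_SUFFIXES[table]
    return urdu_stem + ur_suffix, hindi_stem + hi_suffix


if __name__ == "__main__":
    print(expand_example("amasc", "آٹ", "आट"))
    print(expand_example("irregular", "آئین", "आईन"))
```

This also shows why only the stem belongs in the non-irregular tables: a full word placed there would receive the suffix a second time.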
           33 
           34 TABLES IN DATA FOLDER
           35 
           36 The data folder contains a number of further tables to cope with punctuation, exceptions and special cases:
           37 
           38 ignore: words that are ignored permanently.
           39 punctuation: for conversion of punctuation.
           40 misc_beginword.ur_hi: word parts ("prefixes") at the beginning of word compounds
           41 misc_endword: word parts ("suffixes") at the end of word compounds
           42 special: special cases (no beginword endword)
           43 exceptions_beginword_endword.ur_hi: override multiple choices for common words found in the preceding tables.
           44 exceptions_beginword.hi_ur: exceptions which need to be replaced before the following match statements.
           45 exceptions_beginword_endword.hi_ur: override multiple choices for common words found in the preceding tables.
           46 pairs_middle_e_o: The Persian Genitive े- (eg मुल्के-मिसर) conflicts with hyphenated word pairs whose first word ends in े, such as नवासे-नवासियाँ. These word pairs are regular inflections and do not contain a Persian Genitive, so in Urdu script the first word of the pair ends in ے + space and not ِ  + space. Word pairs conflicting with the Persian Genitive have been put into the new file 'pairs.middle_e_o'. Word pairs with و at the end of the first word have also been placed here, eg دو ایک        दो-एक, as these conflict with the rule regarding the copula و linking words in Urdu.
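
The conflict described for pairs_middle_e_o can be illustrated with a small Python sketch. It assumes that listed pairs are checked before the Persian Genitive rule is applied; the notes do not state the program's actual order of evaluation, and the function name classify_hindi_word is hypothetical.

```python
# Devanagari vowel sign e (U+0947) followed by a hyphen, as in मुल्के-मिसर.
GENITIVE_MARK = "\u0947-"


def classify_hindi_word(word, known_pairs):
    """Classify a Hindi token as 'pair', 'genitive' or 'plain'.

    ASSUMPTION: pairs listed in the pairs table take precedence over the
    Persian Genitive rule. This mirrors the purpose described in the notes,
    not necessarily the program's real matching order.
    """
    if word in known_pairs:
        return "pair"
    if GENITIVE_MARK in word:
        return "genitive"
    return "plain"


if __name__ == "__main__":
    pairs = {"नवासे-नवासियाँ"}
    for token in ("नवासे-नवासियाँ", "मुल्के-मिसर", "किताब"):
        print(token, classify_hindi_word(token, pairs))
```

Without the pairs entry, नवासे-नवासियाँ would match the Genitive mark as well, which is exactly the conflict the table exists to resolve.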
           47 
           48 CAREFUL: If you add the wrong words to these tables, you can mess up the conversion process!
           49 
           50 THE CONFIG FILES
           51 There are two config files.
           52 
           53 config.hi_ur: the config to use when converting Hindi to Urdu.
           54 config.ur_hi: the config to use when converting Urdu to Hindi.
           55 
           56 NOTE: The tables in the data folder relating only to one of these two configs are labelled accordingly, ie xxxxx.hi_ur.txt or xxxxx.ur_hi.txt
           57 
           58 Tables which are not labelled in either way relate to both config files.
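
The naming convention above can be expressed as a small filter. The following Python sketch is not part of the program; the function name tables_for_direction and the sample file names in the usage part are made up for illustration.

```python
def tables_for_direction(filenames, direction):
    """Keep the tables that apply to one conversion direction.

    direction is 'hi_ur' or 'ur_hi'. A table labelled for the other
    direction is dropped; unlabelled tables apply to both and are kept.
    """
    if direction not in ("hi_ur", "ur_hi"):
        raise ValueError("direction must be 'hi_ur' or 'ur_hi'")
    other = "ur_hi" if direction == "hi_ur" else "hi_ur"
    selected = []
    for name in filenames:
        # Accept names with or without a trailing .txt extension.
        base = name[:-4] if name.endswith(".txt") else name
        if base.endswith("." + other):
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    # Hypothetical listing of a data folder.
    names = ["ignore.txt", "misc_beginword.ur_hi.txt",
             "exceptions_beginword.hi_ur.txt"]
    print(tables_for_direction(names, "hi_ur"))
```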
           59 
           60 !!!THINGS TO KEEP IN MIND!!!
           61 
           62 * -से needs to be handled manually, as in most cases it is the postposition से and not the 'adjective' से. के-से can be converted through search/replace. It is better to find the remaining cases by reading through the text.
           63 
           64 * Also make sure you have gtk2-perl installed!
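
For the -से point above, a helper that lists the places to review by hand might look like the following Python sketch. It is a hypothetical aid, not part of the program; it skips के-से because that case can be handled through search/replace, and it only strips a small, assumed set of punctuation marks.

```python
def se_candidates(text):
    """Return (line number, word) for each word ending in -से, except के-से.

    The result is a list for manual review; it does not decide whether
    से is the postposition or the 'adjective'.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            # Strip common punctuation, including the danda, from both ends.
            word = token.strip("।,.;:!?\"'()")
            if word.endswith("-से") and word != "के-से":
                hits.append((lineno, word))
    return hits


if __name__ == "__main__":
    sample = "वह फूल-से कोमल है।\nउस के-से लोग"
    for lineno, word in se_candidates(sample):
        print(lineno, word)
```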
           65 