GBREL.TXT          Genetic Sequence Data Bank
                         August 15 2005

               NCBI-GenBank Flat File Release 149.0

                    Distribution Release Notes

 46947388 loci, 51674486881 bases, from 46947388 reported sequences

This document describes the format and content of the flat files that
comprise releases of the GenBank nucleotide sequence database. If you
have any questions or comments about GenBank or this document, please
contact NCBI via email at info@ncbi.nlm.nih.gov or:

   GenBank
   National Center for Biotechnology Information
   National Library of Medicine, 38A, 8N805
   8600 Rockville Pike
   Bethesda, MD 20894
   USA
   Phone:  (301) 496-2475
   Fax:    (301) 480-9241

==========================================================================
TABLE OF CONTENTS
==========================================================================

1. INTRODUCTION

1.1 Release 149.0
1.2 Cutoff Date
1.3 Important Changes in Release 149.0
1.4 Upcoming Changes
1.5 Request for Direct Submission of Sequence Data
1.6 Organization of This Document

2. ORGANIZATION OF DATA FILES

2.1 Overview
2.2 Files
     2.2.1 File Descriptions
     2.2.5 File Sizes
     2.2.6 Per-Division Statistics
     2.2.7 Selected Per-Organism Statistics
     2.2.8 Growth of GenBank

3. FILE FORMATS

3.1 File Header Information
3.2 Directory Files
     3.2.1 Short Directory File
3.3 Index Files
     3.3.1 Accession Number Index File
     3.3.2 Keyword Phrase Index File
     3.3.3 Author Name Index File
     3.3.4 Journal Citation Index File
     3.3.5 Gene Name Index
3.4 Sequence Entry Files
     3.4.1 File Organization
     3.4.2 Entry Organization
     3.4.3 Sample Sequence Data File
     3.4.4 LOCUS Format
     3.4.5 DEFINITION Format
          3.4.5.1 DEFINITION Format for NLM Entries
     3.4.6 ACCESSION Format
     3.4.7 VERSION Format
     3.4.8 KEYWORDS Format
     3.4.9 SEGMENT Format
     3.4.10 SOURCE Format
     3.4.11 REFERENCE Format
     3.4.12 FEATURES Format
          3.4.12.1 Feature Key Names
          3.4.12.2 Feature Location
          3.4.12.3 Feature Qualifiers
          3.4.12.4 Cross-Reference Information
          3.4.12.5 Feature Table Examples
     3.4.13 ORIGIN Format
     3.4.14 SEQUENCE Format
     3.4.15 CONTIG Format

4. ALTERNATE RELEASES

5. KNOWN PROBLEMS OF THE GENBANK DATABASE

5.1 Incorrect Gene Symbols in Entries and Index

6. GENBANK ADMINISTRATION

6.1 Registered Trademark Notice
6.2 Citing GenBank
6.3 GenBank Distribution Formats and Media
6.4 Other Methods of Accessing GenBank Data
6.5 Request for Corrections and Comments
6.6 Credits and Acknowledgments
6.7 Disclaimer

==========================================================================

1. INTRODUCTION

1.1 Release 149.0

  The National Center for Biotechnology Information (NCBI) at the National
Library of Medicine (NLM), National Institutes of Health (NIH) is
responsible for producing and distributing the GenBank Sequence Database.
NCBI handles all GenBank direct submissions and authors are advised to use
the address below. Submitters are encouraged to use the free Sequin
software package for sending sequence data, or the newly developed World
Wide Web submission form. See Section 1.5 below for details.

*****************************************************************************

The address for direct submissions to GenBank is:

       GenBank Submissions
       National Center for Biotechnology Information
       Bldg 38A, Rm. 8N-803
       8600 Rockville Pike
       Bethesda, MD 20894

       E-MAIL:  gb-sub@ncbi.nlm.nih.gov

Updates and changes to existing GenBank records:

       E-MAIL:  update@ncbi.nlm.nih.gov

URL for the new GenBank submission tool - BankIt - on the World Wide Web:

       http://www.ncbi.nlm.nih.gov/

(see Section 1.5 for additional details about submitting data to GenBank.)

*****************************************************************************

  GenBank Release 149.0 is a release of sequence data by NCBI in the
GenBank flatfile format. GenBank is a component of a tri-partite,
international collaboration of sequence databases in the U.S., Europe, and
Japan. The collaborating databases are the European Molecular Biology
Laboratory (EMBL) at Hinxton Hall, UK, and the DNA Database of Japan
(DDBJ) in Mishima, Japan.
Patent sequences are incorporated through arrangements with the U.S.
Patent and Trademark Office, and via the collaborating international
databases from other international patent offices. The database is
converted to various output formats, including the Flat File and Abstract
Syntax Notation 1 (ASN.1) versions. The ASN.1 and Flat File forms of the
data are available at NCBI's anonymous FTP server:

       ftp://ftp.ncbi.nih.gov

  A mirror of the GenBank FTP site at the NCBI is available at Indiana
University:

       ftp://bio-mirror.net/biomirror/genbank/

  Some users who experience slow FTP transfers of large files might
realize an improvement in transfer rates from this alternate site when the
volume of traffic at the NCBI is high.

1.2 Cutoff Date

  This full release, 149.0, incorporates data available to the
collaborating databases as of August 15, 2005 at approximately 1:30am EDT.
For more recent data, users are advised to:

  o Download GenBank Incremental Update (GIU) files by anonymous FTP from
    NCBI:

       ftp://ftp.ncbi.nih.gov/ncbi-asn1/daily-nc  (ASN.1 format)
       ftp://ftp.ncbi.nih.gov/genbank/daily-nc    (flatfile format)

  o Use the interactive Network-Entrez or Web-Entrez applications to query
    the 'Entrez: Nucleotides' database (see Section 6.4 of this document).

1.3 Important Changes in Release 149.0

1.3.0 GenBank Exceeds 100 Gigabases!

  GenBank reaches a milestone with 149.0, exceeding 100 gigabases of
sequence data. It is interesting to note that the Whole Genome Shotgun
(WGS) portion of the database has grown to exceed the non-WGS portion in
just 3.5 years.

1.3.1 Problems generating accession number and keyword indexes

  Continuing software problems again prevented the generation of the
gbacc.idx and gbkey.idx 'index' files which normally accompany GenBank
releases.

  A version of gbacc.idx was built manually. However, the first field
contains just an accession number rather than Accession.Version.
The gbkey.idx index could not be created without substantial additional
delays in release processing, so it is completely absent from 149.0.

  Our apologies for any inconvenience that this may cause.

1.3.2 Organizational changes

  The total number of sequence data files increased by 25 with this
release:

  - the EST division is now comprised of 413 files (+16)
  - the GSS division is now comprised of 151 files (+7)
  - the HTG division is now comprised of  68 files (+3)
  - the PRI division is now comprised of  29 files (+1)
  - the ROD division is now comprised of  20 files (+2)

1.3.3 GSS File Header Problem

  GSS sequences at GenBank are maintained in two different systems,
depending on their origin, and the dumps from those systems occur in
parallel. Because the second dump (for example) has no prior knowledge of
exactly how many GSS files will be dumped from the first, it does not know
how to number its own output files.

  There is thus a discrepancy between the filenames and file headers for
twenty-seven of the GSS flatfiles in Release 149.0. Consider gbgss125.seq:

GBGSS1.SEQ          Genetic Sequence Data Bank
                         August 15 2005

               NCBI-GenBank Flat File Release 149.0

                      GSS Sequences (Part 1)

   87189 loci, 64730609 bases, from 87189 reported sequences

  Here, the filename and part number in the header are both "1", though
the file has been renamed as "125" based on the number of files dumped
from the other system.

  We will work to resolve this discrepancy in future releases, but the
priority is certainly much lower than many other tasks.

1.4 Upcoming Changes

  Several changes related to the Feature Table were agreed to during the
May 2005 collaborative meeting among DDBJ, EMBL, and GenBank. The
descriptions of the changes provided below are preliminary; complete
definitions will appear in future release notes.

1.4.1 New qualifiers for the source feature

  A set of five new source feature qualifiers will be legal as of the
October 2005 release.

  /lat_lon         : GPS coordinates for the location at which a specimen,
                     from which the sequence was obtained, was collected.
                     Format: Decimal degrees (N/S, E/W).

  /collected_by    : Name of the person who collected the specimen.

  /collection_date : Date that the specimen was collected.
                     Format: DD-MMM-YYYY (two-digit day, three-letter
                     month abbreviation, four-digit year)

  /identified_by   : Name of the person who identified the specimen.

  /PCR_primers="fwd_name: XXX, fwd_seq: aaatttgggccc, rev_name: YYY, rev_seq: gggcccaaattt"

  Four separate primer-related qualifiers were initially proposed (and
announced), but in subsequent discussion it was decided to combine them
into a single structured /PCR_primers qualifier.

  fwd_seq and rev_seq are mandatory, and their values must be from the
IUPAC nucleotide alphabet. fwd_name and rev_name are both optional. The
primer names (if present) must be a single token, without whitespace. The
order of the elements within the /PCR_primers qualifier must always be as
shown above. Multiple /PCR_primers qualifiers may exist on a source
feature.

  These qualifiers will most likely see their first use in association
with environmental sampling projects and the BarCode project.

1.4.2 /evidence qualifier to be replaced

  Two new qualifiers designed to replace /evidence will be legal as of the
October 2005 GenBank release: /experiment and /inference.

  The current /evidence="not_experimental" qualifier will be replaced by
/inference. The /inference values will be from a controlled list which is
intended to capture several different classes of inferential methods.

  The current /evidence="experimental" qualifier will be replaced by
/experiment. This will be a free-text qualifier in which the submitter can
provide a brief description of the nature of the bench experiment which
supports the associated feature.
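
  The structural rules stated for /PCR_primers in Section 1.4.1 (mandatory
fwd_seq and rev_seq, IUPAC nucleotide letters only, optional single-token
names, fixed element order) can be sketched as a small validator. This is
an illustrative sketch only, not NCBI software. The function name
parse_pcr_primers, the reading of the value as comma-separated
"name: value" elements, and the exact letter set used for the IUPAC
alphabet (which here includes "u") are assumptions made for the example;
the complete definition of the qualifier is deferred to future release
notes.

```python
import re

# Assumed IUPAC nucleotide letters (case-insensitive, no gap symbols).
IUPAC_NUCLEOTIDES = set("acgtumrwsykvhdbn")

# Element names in the order required by the preliminary description.
ELEMENT_ORDER = ["fwd_name", "fwd_seq", "rev_name", "rev_seq"]
MANDATORY = {"fwd_seq", "rev_seq"}


def parse_pcr_primers(value):
    """Parse a /PCR_primers value into a dict; raise ValueError if invalid.

    Hypothetical helper: assumes elements are separated by commas and that
    each element has the form "name: value".
    """
    elements = []
    for part in value.split(","):
        name, sep, content = part.partition(":")
        if not sep:
            raise ValueError(f"element without ':' separator: {part!r}")
        elements.append((name.strip(), content.strip()))

    names = [name for name, _ in elements]
    for name in names:
        if name not in ELEMENT_ORDER:
            raise ValueError(f"unknown element name: {name!r}")
    if len(set(names)) != len(names):
        raise ValueError("an element name appears more than once")
    # Elements that are present must follow the documented order.
    if names != [n for n in ELEMENT_ORDER if n in names]:
        raise ValueError("elements are not in the required order")
    missing = MANDATORY - set(names)
    if missing:
        raise ValueError(f"missing mandatory element(s): {sorted(missing)}")

    for name, content in elements:
        if not content:
            raise ValueError(f"empty value for {name}")
        if name.endswith("_seq"):
            # Sequences may only use letters from the nucleotide alphabet.
            bad = set(content.lower()) - IUPAC_NUCLEOTIDES
            if bad:
                raise ValueError(f"non-IUPAC characters in {name}: {sorted(bad)}")
        elif re.search(r"\s", content):
            # Primer names must be a single token without whitespace.
            raise ValueError(f"{name} must not contain whitespace")

    return dict(elements)


if __name__ == "__main__":
    example = ("fwd_name: XXX, fwd_seq: aaatttgggccc, "
               "rev_name: YYY, rev_seq: gggcccaaattt")
    print(parse_pcr_primers(example))
```

  Because a source feature may carry multiple /PCR_primers qualifiers, a
caller would apply such a check to each qualifier value separately.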

1.4.3 New /organelle qualifier value

  As of the October 2005 GenBank release, a new value for the /organelle
qualifier will be legal:

       hydrogenosome

  This will support the annotation of sequences from anaerobic protozoa
and fungi, for which the hydrogenosome has a role in anaerobic
respiration.

1.4.4 Two new CDS qualifiers

  As of the October 2005 GenBank release, two new CDS feature qualifiers
will be introduced:

       /trans_splicing
       /ribosomal_slippage

  Coding regions involved in such processes will be more easily identified
with the addition of these qualifiers.

1.4.5 New /exception qualifier value

  Coding regions for which the conceptual protein translation differs from
the supplied /translation qualifier are flagged with an /exception
qualifier. The value:

       "rearrangement required for product"

will be legal for this qualifier as of the October 2005 GenBank release.

1.4.6 /repeat_unit qualifier to be replaced

  Two new qualifiers designed to replace /repeat_unit will be legal as of
the October 2005 GenBank release: /repeat_unit_seq and /repeat_unit_range.

  The current qualifier accommodates both integer ranges (e.g., "10..20")
and characters that represent a repeat unit pattern (e.g., (AT)2(AA)5).
Introducing a distinct qualifier for each of these representations will
make it easier to submit and validate them.

1.5 Request for Direct Submission of Sequence Data

  A successful GenBank requires that sequence data enter the database as
soon as possible after publication, that the annotations be as complete as
possible, and that the sequence and annotation data be accurate. All three
of these requirements are best met if authors of sequence data submit
their data directly to GenBank in a usable form. It is especially
important that these submissions be in computer-readable form.

  GenBank must rely on direct author submission of data to ensure that it
achieves its goals of completeness, accuracy, and timeliness.
To assist researchers in entering their own sequence data, GenBank
provides a WWW submission tool called BankIt, as well as a stand-alone
software package called Sequin. BankIt and Sequin are both easy-to-use
programs that enable authors to enter a sequence, annotate it, and submit
it to GenBank. Through the international collaboration of DNA sequence
databases, GenBank submissions are forwarded daily for inclusion in the
EMBL and DDBJ databases.

  SEQUIN. Sequin is an interactive, graphically-oriented program based on
screen forms and controlled vocabularies that guides you through the
process of entering your sequence and providing biological and
bibliographic annotation. Sequin is designed to simplify the sequence
submission process, and to provide increased data handling capabilities to
accommodate very long sequences, complex annotations, and robust error
checking. E-mail the completed submission file to:

       gb-sub@ncbi.nlm.nih.gov

  Sequin is provided for Macintosh, PC/Windows, UNIX and VMS computers. It
is available by anonymous ftp from ftp.ncbi.nih.gov; log in as anonymous
and use your e-mail address as the password. It is located in the sequin
directory. Or direct your web browser to this URL:

       ftp://ftp.ncbi.nih.gov/sequin

  BANKIT. BankIt provides a simple forms-based approach for submitting
your sequence and descriptive information to GenBank. Your submission is
sent directly to GenBank via the World Wide Web, and immediately forwarded
for inclusion in the EMBL and DDBJ databases. BankIt may be used with
Netscape, Internet Explorer, and other common WWW clients. You can access
BankIt from GenBank's home page:

       http://www.ncbi.nlm.nih.gov/

  AUTHORIN. Authorin sequence submissions are no longer accepted by
GenBank, and the Authorin application is no longer distributed by NCBI.

  If you have questions about GenBank submissions or any of the data
submission tools, contact NCBI at: info@ncbi.nlm.nih.gov or 301-496-2475.

1.6 Organization of This Document

  The second section describes the contents of GenBank releases. The third
section illustrates the formats of the flat files. The fourth section
describes other versions of the data, the fifth section identifies known
problems, and the sixth contains administrative details.

2. ORGANIZATION OF DATA FILES

2.1 Overview

  GenBank releases consist of a set of ASCII text files, most of which
contain sequence data. A few supplemental "index" files are also supplied,
containing comprehensive lists of author names, journal citations, gene
names, and keywords, along with the accession numbers of the records in
which they can be found (see Section 3.3). The line lengths of these files
are variable.

2.2 Files

  This GenBank flat file release consists of 816 files. The lists that
follow describe each of the files included in the distribution. Their
sizes and base pair content are also summarized.

2.2.1 File Descriptions

Files included in this release are:

1. gbacc.idx - Index of the entries according to accession number. 2. gbaut1.idx - Index of the entries according to author name, part 1. 3. gbaut10.idx - Index of the entries according to author name, part 10. 4. gbaut11.idx - Index of the entries according to author name, part 11. 5. gbaut12.idx - Index of the entries according to author name, part 12. 6. gbaut13.idx - Index of the entries according to author name, part 13. 7. gbaut14.idx - Index of the entries according to author name, part 14. 8. gbaut15.idx - Index of the entries according to author name, part 15. 9. gbaut16.idx - Index of the entries according to author name, part 16. 10. gbaut17.idx - Index of the entries according to author name, part 17. 11. gbaut18.idx - Index of the entries according to author name, part 18. 12. gbaut19.idx - Index of the entries according to author name, part 19. 13. gbaut2.idx - Index of the entries according to author name, part 2. 14.
gbaut20.idx - Index of the entries according to author name, part 20. 15. gbaut21.idx - Index of the entries according to author name, part 21. 16. gbaut22.idx - Index of the entries according to author name, part 22. 17. gbaut23.idx - Index of the entries according to author name, part 23. 18. gbaut24.idx - Index of the entries according to author name, part 24. 19. gbaut25.idx - Index of the entries according to author name, part 25. 20. gbaut26.idx - Index of the entries according to author name, part 26. 21. gbaut27.idx - Index of the entries according to author name, part 27. 22. gbaut28.idx - Index of the entries according to author name, part 28. 23. gbaut29.idx - Index of the entries according to author name, part 29. 24. gbaut3.idx - Index of the entries according to author name, part 3. 25. gbaut30.idx - Index of the entries according to author name, part 30. 26. gbaut31.idx - Index of the entries according to author name, part 31. 27. gbaut4.idx - Index of the entries according to author name, part 4. 28. gbaut5.idx - Index of the entries according to author name, part 5. 29. gbaut6.idx - Index of the entries according to author name, part 6. 30. gbaut7.idx - Index of the entries according to author name, part 7. 31. gbaut8.idx - Index of the entries according to author name, part 8. 32. gbaut9.idx - Index of the entries according to author name, part 9. 33. gbbct1.seq - Bacterial sequence entries, part 1. 34. gbbct10.seq - Bacterial sequence entries, part 10. 35. gbbct11.seq - Bacterial sequence entries, part 11. 36. gbbct2.seq - Bacterial sequence entries, part 2. 37. gbbct3.seq - Bacterial sequence entries, part 3. 38. gbbct4.seq - Bacterial sequence entries, part 4. 39. gbbct5.seq - Bacterial sequence entries, part 5. 40. gbbct6.seq - Bacterial sequence entries, part 6. 41. gbbct7.seq - Bacterial sequence entries, part 7. 42.
gbbct8.seq - Bacterial sequence entries, part 8. 43. gbbct9.seq - Bacterial sequence entries, part 9. 44. gbchg.txt - Accession numbers of entries updated since the previous release. 45. gbcon.seq - Constructed sequence entries. 46. gbdel.txt - Accession numbers of entries deleted since the previous release. 47. gbenv1.seq - Environmental sampling sequence entries, part 1. 48. gbenv2.seq - Environmental sampling sequence entries, part 2. 49. gbest1.seq - EST (expressed sequence tag) sequence entries, part 1. 50. gbest10.seq - EST (expressed sequence tag) sequence entries, part 10. 51. gbest100.seq - EST (expressed sequence tag) sequence entries, part 100. 52. gbest101.seq - EST (expressed sequence tag) sequence entries, part 101. 53. gbest102.seq - EST (expressed sequence tag) sequence entries, part 102. 54. gbest103.seq - EST (expressed sequence tag) sequence entries, part 103. 55. gbest104.seq - EST (expressed sequence tag) sequence entries, part 104. 56. gbest105.seq - EST (expressed sequence tag) sequence entries, part 105. 57. gbest106.seq - EST (expressed sequence tag) sequence entries, part 106. 58. gbest107.seq - EST (expressed sequence tag) sequence entries, part 107. 59. gbest108.seq - EST (expressed sequence tag) sequence entries, part 108. 60. gbest109.seq - EST (expressed sequence tag) sequence entries, part 109. 61. gbest11.seq - EST (expressed sequence tag) sequence entries, part 11. 62. gbest110.seq - EST (expressed sequence tag) sequence entries, part 110. 63. gbest111.seq - EST (expressed sequence tag) sequence entries, part 111. 64. gbest112.seq - EST (expressed sequence tag) sequence entries, part 112. 65. gbest113.seq - EST (expressed sequence tag) sequence entries, part 113. 66. gbest114.seq - EST (expressed sequence tag) sequence entries, part 114. 67. gbest115.seq - EST (expressed sequence tag) sequence entries, part 115. 68. gbest116.seq - EST (expressed sequence tag) sequence entries, part 116. 69. 
gbest117.seq - EST (expressed sequence tag) sequence entries, part 117. 70. gbest118.seq - EST (expressed sequence tag) sequence entries, part 118. 71. gbest119.seq - EST (expressed sequence tag) sequence entries, part 119. 72. gbest12.seq - EST (expressed sequence tag) sequence entries, part 12. 73. gbest120.seq - EST (expressed sequence tag) sequence entries, part 120. 74. gbest121.seq - EST (expressed sequence tag) sequence entries, part 121. 75. gbest122.seq - EST (expressed sequence tag) sequence entries, part 122. 76. gbest123.seq - EST (expressed sequence tag) sequence entries, part 123. 77. gbest124.seq - EST (expressed sequence tag) sequence entries, part 124. 78. gbest125.seq - EST (expressed sequence tag) sequence entries, part 125. 79. gbest126.seq - EST (expressed sequence tag) sequence entries, part 126. 80. gbest127.seq - EST (expressed sequence tag) sequence entries, part 127. 81. gbest128.seq - EST (expressed sequence tag) sequence entries, part 128. 82. gbest129.seq - EST (expressed sequence tag) sequence entries, part 129. 83. gbest13.seq - EST (expressed sequence tag) sequence entries, part 13. 84. gbest130.seq - EST (expressed sequence tag) sequence entries, part 130. 85. gbest131.seq - EST (expressed sequence tag) sequence entries, part 131. 86. gbest132.seq - EST (expressed sequence tag) sequence entries, part 132. 87. gbest133.seq - EST (expressed sequence tag) sequence entries, part 133. 88. gbest134.seq - EST (expressed sequence tag) sequence entries, part 134. 89. gbest135.seq - EST (expressed sequence tag) sequence entries, part 135. 90. gbest136.seq - EST (expressed sequence tag) sequence entries, part 136. 91. gbest137.seq - EST (expressed sequence tag) sequence entries, part 137. 92. gbest138.seq - EST (expressed sequence tag) sequence entries, part 138. 93. gbest139.seq - EST (expressed sequence tag) sequence entries, part 139. 94. gbest14.seq - EST (expressed sequence tag) sequence entries, part 14. 95. 
gbest140.seq - EST (expressed sequence tag) sequence entries, part 140. 96. gbest141.seq - EST (expressed sequence tag) sequence entries, part 141. 97. gbest142.seq - EST (expressed sequence tag) sequence entries, part 142. 98. gbest143.seq - EST (expressed sequence tag) sequence entries, part 143. 99. gbest144.seq - EST (expressed sequence tag) sequence entries, part 144. 100. gbest145.seq - EST (expressed sequence tag) sequence entries, part 145. 101. gbest146.seq - EST (expressed sequence tag) sequence entries, part 146. 102. gbest147.seq - EST (expressed sequence tag) sequence entries, part 147. 103. gbest148.seq - EST (expressed sequence tag) sequence entries, part 148. 104. gbest149.seq - EST (expressed sequence tag) sequence entries, part 149. 105. gbest15.seq - EST (expressed sequence tag) sequence entries, part 15. 106. gbest150.seq - EST (expressed sequence tag) sequence entries, part 150. 107. gbest151.seq - EST (expressed sequence tag) sequence entries, part 151. 108. gbest152.seq - EST (expressed sequence tag) sequence entries, part 152. 109. gbest153.seq - EST (expressed sequence tag) sequence entries, part 153. 110. gbest154.seq - EST (expressed sequence tag) sequence entries, part 154. 111. gbest155.seq - EST (expressed sequence tag) sequence entries, part 155. 112. gbest156.seq - EST (expressed sequence tag) sequence entries, part 156. 113. gbest157.seq - EST (expressed sequence tag) sequence entries, part 157. 114. gbest158.seq - EST (expressed sequence tag) sequence entries, part 158. 115. gbest159.seq - EST (expressed sequence tag) sequence entries, part 159. 116. gbest16.seq - EST (expressed sequence tag) sequence entries, part 16. 117. gbest160.seq - EST (expressed sequence tag) sequence entries, part 160. 118. gbest161.seq - EST (expressed sequence tag) sequence entries, part 161. 119. gbest162.seq - EST (expressed sequence tag) sequence entries, part 162. 120. gbest163.seq - EST (expressed sequence tag) sequence entries, part 163. 121. 
gbest164.seq - EST (expressed sequence tag) sequence entries, part 164. 122. gbest165.seq - EST (expressed sequence tag) sequence entries, part 165. 123. gbest166.seq - EST (expressed sequence tag) sequence entries, part 166. 124. gbest167.seq - EST (expressed sequence tag) sequence entries, part 167. 125. gbest168.seq - EST (expressed sequence tag) sequence entries, part 168. 126. gbest169.seq - EST (expressed sequence tag) sequence entries, part 169. 127. gbest17.seq - EST (expressed sequence tag) sequence entries, part 17. 128. gbest170.seq - EST (expressed sequence tag) sequence entries, part 170. 129. gbest171.seq - EST (expressed sequence tag) sequence entries, part 171. 130. gbest172.seq - EST (expressed sequence tag) sequence entries, part 172. 131. gbest173.seq - EST (expressed sequence tag) sequence entries, part 173. 132. gbest174.seq - EST (expressed sequence tag) sequence entries, part 174. 133. gbest175.seq - EST (expressed sequence tag) sequence entries, part 175. 134. gbest176.seq - EST (expressed sequence tag) sequence entries, part 176. 135. gbest177.seq - EST (expressed sequence tag) sequence entries, part 177. 136. gbest178.seq - EST (expressed sequence tag) sequence entries, part 178. 137. gbest179.seq - EST (expressed sequence tag) sequence entries, part 179. 138. gbest18.seq - EST (expressed sequence tag) sequence entries, part 18. 139. gbest180.seq - EST (expressed sequence tag) sequence entries, part 180. 140. gbest181.seq - EST (expressed sequence tag) sequence entries, part 181. 141. gbest182.seq - EST (expressed sequence tag) sequence entries, part 182. 142. gbest183.seq - EST (expressed sequence tag) sequence entries, part 183. 143. gbest184.seq - EST (expressed sequence tag) sequence entries, part 184. 144. gbest185.seq - EST (expressed sequence tag) sequence entries, part 185. 145. gbest186.seq - EST (expressed sequence tag) sequence entries, part 186. 146. gbest187.seq - EST (expressed sequence tag) sequence entries, part 187. 147. 
gbest188.seq - EST (expressed sequence tag) sequence entries, part 188. 148. gbest189.seq - EST (expressed sequence tag) sequence entries, part 189. 149. gbest19.seq - EST (expressed sequence tag) sequence entries, part 19. 150. gbest190.seq - EST (expressed sequence tag) sequence entries, part 190. 151. gbest191.seq - EST (expressed sequence tag) sequence entries, part 191. 152. gbest192.seq - EST (expressed sequence tag) sequence entries, part 192. 153. gbest193.seq - EST (expressed sequence tag) sequence entries, part 193. 154. gbest194.seq - EST (expressed sequence tag) sequence entries, part 194. 155. gbest195.seq - EST (expressed sequence tag) sequence entries, part 195. 156. gbest196.seq - EST (expressed sequence tag) sequence entries, part 196. 157. gbest197.seq - EST (expressed sequence tag) sequence entries, part 197. 158. gbest198.seq - EST (expressed sequence tag) sequence entries, part 198. 159. gbest199.seq - EST (expressed sequence tag) sequence entries, part 199. 160. gbest2.seq - EST (expressed sequence tag) sequence entries, part 2. 161. gbest20.seq - EST (expressed sequence tag) sequence entries, part 20. 162. gbest200.seq - EST (expressed sequence tag) sequence entries, part 200. 163. gbest201.seq - EST (expressed sequence tag) sequence entries, part 201. 164. gbest202.seq - EST (expressed sequence tag) sequence entries, part 202. 165. gbest203.seq - EST (expressed sequence tag) sequence entries, part 203. 166. gbest204.seq - EST (expressed sequence tag) sequence entries, part 204. 167. gbest205.seq - EST (expressed sequence tag) sequence entries, part 205. 168. gbest206.seq - EST (expressed sequence tag) sequence entries, part 206. 169. gbest207.seq - EST (expressed sequence tag) sequence entries, part 207. 170. gbest208.seq - EST (expressed sequence tag) sequence entries, part 208. 171. gbest209.seq - EST (expressed sequence tag) sequence entries, part 209. 172. gbest21.seq - EST (expressed sequence tag) sequence entries, part 21. 173. 
gbest210.seq - EST (expressed sequence tag) sequence entries, part 210. 174. gbest211.seq - EST (expressed sequence tag) sequence entries, part 211. 175. gbest212.seq - EST (expressed sequence tag) sequence entries, part 212. 176. gbest213.seq - EST (expressed sequence tag) sequence entries, part 213. 177. gbest214.seq - EST (expressed sequence tag) sequence entries, part 214. 178. gbest215.seq - EST (expressed sequence tag) sequence entries, part 215. 179. gbest216.seq - EST (expressed sequence tag) sequence entries, part 216. 180. gbest217.seq - EST (expressed sequence tag) sequence entries, part 217. 181. gbest218.seq - EST (expressed sequence tag) sequence entries, part 218. 182. gbest219.seq - EST (expressed sequence tag) sequence entries, part 219. 183. gbest22.seq - EST (expressed sequence tag) sequence entries, part 22. 184. gbest220.seq - EST (expressed sequence tag) sequence entries, part 220. 185. gbest221.seq - EST (expressed sequence tag) sequence entries, part 221. 186. gbest222.seq - EST (expressed sequence tag) sequence entries, part 222. 187. gbest223.seq - EST (expressed sequence tag) sequence entries, part 223. 188. gbest224.seq - EST (expressed sequence tag) sequence entries, part 224. 189. gbest225.seq - EST (expressed sequence tag) sequence entries, part 225. 190. gbest226.seq - EST (expressed sequence tag) sequence entries, part 226. 191. gbest227.seq - EST (expressed sequence tag) sequence entries, part 227. 192. gbest228.seq - EST (expressed sequence tag) sequence entries, part 228. 193. gbest229.seq - EST (expressed sequence tag) sequence entries, part 229. 194. gbest23.seq - EST (expressed sequence tag) sequence entries, part 23. 195. gbest230.seq - EST (expressed sequence tag) sequence entries, part 230. 196. gbest231.seq - EST (expressed sequence tag) sequence entries, part 231. 197. gbest232.seq - EST (expressed sequence tag) sequence entries, part 232. 198. gbest233.seq - EST (expressed sequence tag) sequence entries, part 233. 199. 
gbest234.seq - EST (expressed sequence tag) sequence entries, part 234.
200. gbest235.seq - EST (expressed sequence tag) sequence entries, part 235.
201. gbest236.seq - EST (expressed sequence tag) sequence entries, part 236.
202. gbest237.seq - EST (expressed sequence tag) sequence entries, part 237.
203. gbest238.seq - EST (expressed sequence tag) sequence entries, part 238.
204. gbest239.seq - EST (expressed sequence tag) sequence entries, part 239.
205. gbest24.seq - EST (expressed sequence tag) sequence entries, part 24.
206. gbest240.seq - EST (expressed sequence tag) sequence entries, part 240.
207. gbest241.seq - EST (expressed sequence tag) sequence entries, part 241.
208. gbest242.seq - EST (expressed sequence tag) sequence entries, part 242.
209. gbest243.seq - EST (expressed sequence tag) sequence entries, part 243.
210. gbest244.seq - EST (expressed sequence tag) sequence entries, part 244.
211. gbest245.seq - EST (expressed sequence tag) sequence entries, part 245.
212. gbest246.seq - EST (expressed sequence tag) sequence entries, part 246.
213. gbest247.seq - EST (expressed sequence tag) sequence entries, part 247.
214. gbest248.seq - EST (expressed sequence tag) sequence entries, part 248.
215. gbest249.seq - EST (expressed sequence tag) sequence entries, part 249.
216. gbest25.seq - EST (expressed sequence tag) sequence entries, part 25.
217. gbest250.seq - EST (expressed sequence tag) sequence entries, part 250.
218. gbest251.seq - EST (expressed sequence tag) sequence entries, part 251.
219. gbest252.seq - EST (expressed sequence tag) sequence entries, part 252.
220. gbest253.seq - EST (expressed sequence tag) sequence entries, part 253.
221. gbest254.seq - EST (expressed sequence tag) sequence entries, part 254.
222. gbest255.seq - EST (expressed sequence tag) sequence entries, part 255.
223. gbest256.seq - EST (expressed sequence tag) sequence entries, part 256.
224. gbest257.seq - EST (expressed sequence tag) sequence entries, part 257.
225. gbest258.seq - EST (expressed sequence tag) sequence entries, part 258.
226. gbest259.seq - EST (expressed sequence tag) sequence entries, part 259.
227. gbest26.seq - EST (expressed sequence tag) sequence entries, part 26.
228. gbest260.seq - EST (expressed sequence tag) sequence entries, part 260.
229. gbest261.seq - EST (expressed sequence tag) sequence entries, part 261.
230. gbest262.seq - EST (expressed sequence tag) sequence entries, part 262.
231. gbest263.seq - EST (expressed sequence tag) sequence entries, part 263.
232. gbest264.seq - EST (expressed sequence tag) sequence entries, part 264.
233. gbest265.seq - EST (expressed sequence tag) sequence entries, part 265.
234. gbest266.seq - EST (expressed sequence tag) sequence entries, part 266.
235. gbest267.seq - EST (expressed sequence tag) sequence entries, part 267.
236. gbest268.seq - EST (expressed sequence tag) sequence entries, part 268.
237. gbest269.seq - EST (expressed sequence tag) sequence entries, part 269.
238. gbest27.seq - EST (expressed sequence tag) sequence entries, part 27.
239. gbest270.seq - EST (expressed sequence tag) sequence entries, part 270.
240. gbest271.seq - EST (expressed sequence tag) sequence entries, part 271.
241. gbest272.seq - EST (expressed sequence tag) sequence entries, part 272.
242. gbest273.seq - EST (expressed sequence tag) sequence entries, part 273.
243. gbest274.seq - EST (expressed sequence tag) sequence entries, part 274.
244. gbest275.seq - EST (expressed sequence tag) sequence entries, part 275.
245. gbest276.seq - EST (expressed sequence tag) sequence entries, part 276.
246. gbest277.seq - EST (expressed sequence tag) sequence entries, part 277.
247. gbest278.seq - EST (expressed sequence tag) sequence entries, part 278.
248. gbest279.seq - EST (expressed sequence tag) sequence entries, part 279.
249. gbest28.seq - EST (expressed sequence tag) sequence entries, part 28.
250. gbest280.seq - EST (expressed sequence tag) sequence entries, part 280.
251. gbest281.seq - EST (expressed sequence tag) sequence entries, part 281.
252. gbest282.seq - EST (expressed sequence tag) sequence entries, part 282.
253. gbest283.seq - EST (expressed sequence tag) sequence entries, part 283.
254. gbest284.seq - EST (expressed sequence tag) sequence entries, part 284.
255. gbest285.seq - EST (expressed sequence tag) sequence entries, part 285.
256. gbest286.seq - EST (expressed sequence tag) sequence entries, part 286.
257. gbest287.seq - EST (expressed sequence tag) sequence entries, part 287.
258. gbest288.seq - EST (expressed sequence tag) sequence entries, part 288.
259. gbest289.seq - EST (expressed sequence tag) sequence entries, part 289.
260. gbest29.seq - EST (expressed sequence tag) sequence entries, part 29.
261. gbest290.seq - EST (expressed sequence tag) sequence entries, part 290.
262. gbest291.seq - EST (expressed sequence tag) sequence entries, part 291.
263. gbest292.seq - EST (expressed sequence tag) sequence entries, part 292.
264. gbest293.seq - EST (expressed sequence tag) sequence entries, part 293.
265. gbest294.seq - EST (expressed sequence tag) sequence entries, part 294.
266. gbest295.seq - EST (expressed sequence tag) sequence entries, part 295.
267. gbest296.seq - EST (expressed sequence tag) sequence entries, part 296.
268. gbest297.seq - EST (expressed sequence tag) sequence entries, part 297.
269. gbest298.seq - EST (expressed sequence tag) sequence entries, part 298.
270. gbest299.seq - EST (expressed sequence tag) sequence entries, part 299.
271. gbest3.seq - EST (expressed sequence tag) sequence entries, part 3.
272. gbest30.seq - EST (expressed sequence tag) sequence entries, part 30.
273. gbest300.seq - EST (expressed sequence tag) sequence entries, part 300.
274. gbest301.seq - EST (expressed sequence tag) sequence entries, part 301.
275. gbest302.seq - EST (expressed sequence tag) sequence entries, part 302.
276. gbest303.seq - EST (expressed sequence tag) sequence entries, part 303.
277. gbest304.seq - EST (expressed sequence tag) sequence entries, part 304.
278. gbest305.seq - EST (expressed sequence tag) sequence entries, part 305.
279. gbest306.seq - EST (expressed sequence tag) sequence entries, part 306.
280. gbest307.seq - EST (expressed sequence tag) sequence entries, part 307.
281. gbest308.seq - EST (expressed sequence tag) sequence entries, part 308.
282. gbest309.seq - EST (expressed sequence tag) sequence entries, part 309.
283. gbest31.seq - EST (expressed sequence tag) sequence entries, part 31.
284. gbest310.seq - EST (expressed sequence tag) sequence entries, part 310.
285. gbest311.seq - EST (expressed sequence tag) sequence entries, part 311.
286. gbest312.seq - EST (expressed sequence tag) sequence entries, part 312.
287. gbest313.seq - EST (expressed sequence tag) sequence entries, part 313.
288. gbest314.seq - EST (expressed sequence tag) sequence entries, part 314.
289. gbest315.seq - EST (expressed sequence tag) sequence entries, part 315.
290. gbest316.seq - EST (expressed sequence tag) sequence entries, part 316.
291. gbest317.seq - EST (expressed sequence tag) sequence entries, part 317.
292. gbest318.seq - EST (expressed sequence tag) sequence entries, part 318.
293. gbest319.seq - EST (expressed sequence tag) sequence entries, part 319.
294. gbest32.seq - EST (expressed sequence tag) sequence entries, part 32.
295. gbest320.seq - EST (expressed sequence tag) sequence entries, part 320.
296. gbest321.seq - EST (expressed sequence tag) sequence entries, part 321.
297. gbest322.seq - EST (expressed sequence tag) sequence entries, part 322.
298. gbest323.seq - EST (expressed sequence tag) sequence entries, part 323.
299. gbest324.seq - EST (expressed sequence tag) sequence entries, part 324.
300. gbest325.seq - EST (expressed sequence tag) sequence entries, part 325.
301. gbest326.seq - EST (expressed sequence tag) sequence entries, part 326.
302. gbest327.seq - EST (expressed sequence tag) sequence entries, part 327.
303. gbest328.seq - EST (expressed sequence tag) sequence entries, part 328.
304. gbest329.seq - EST (expressed sequence tag) sequence entries, part 329.
305. gbest33.seq - EST (expressed sequence tag) sequence entries, part 33.
306. gbest330.seq - EST (expressed sequence tag) sequence entries, part 330.
307. gbest331.seq - EST (expressed sequence tag) sequence entries, part 331.
308. gbest332.seq - EST (expressed sequence tag) sequence entries, part 332.
309. gbest333.seq - EST (expressed sequence tag) sequence entries, part 333.
310. gbest334.seq - EST (expressed sequence tag) sequence entries, part 334.
311. gbest335.seq - EST (expressed sequence tag) sequence entries, part 335.
312. gbest336.seq - EST (expressed sequence tag) sequence entries, part 336.
313. gbest337.seq - EST (expressed sequence tag) sequence entries, part 337.
314. gbest338.seq - EST (expressed sequence tag) sequence entries, part 338.
315. gbest339.seq - EST (expressed sequence tag) sequence entries, part 339.
316. gbest34.seq - EST (expressed sequence tag) sequence entries, part 34.
317. gbest340.seq - EST (expressed sequence tag) sequence entries, part 340.
318. gbest341.seq - EST (expressed sequence tag) sequence entries, part 341.
319. gbest342.seq - EST (expressed sequence tag) sequence entries, part 342.
320. gbest343.seq - EST (expressed sequence tag) sequence entries, part 343.
321. gbest344.seq - EST (expressed sequence tag) sequence entries, part 344.
322. gbest345.seq - EST (expressed sequence tag) sequence entries, part 345.
323. gbest346.seq - EST (expressed sequence tag) sequence entries, part 346.
324. gbest347.seq - EST (expressed sequence tag) sequence entries, part 347.
325. gbest348.seq - EST (expressed sequence tag) sequence entries, part 348.
326. gbest349.seq - EST (expressed sequence tag) sequence entries, part 349.
327. gbest35.seq - EST (expressed sequence tag) sequence entries, part 35.
328. gbest350.seq - EST (expressed sequence tag) sequence entries, part 350.
329. gbest351.seq - EST (expressed sequence tag) sequence entries, part 351.
330. gbest352.seq - EST (expressed sequence tag) sequence entries, part 352.
331. gbest353.seq - EST (expressed sequence tag) sequence entries, part 353.
332. gbest354.seq - EST (expressed sequence tag) sequence entries, part 354.
333. gbest355.seq - EST (expressed sequence tag) sequence entries, part 355.
334. gbest356.seq - EST (expressed sequence tag) sequence entries, part 356.
335. gbest357.seq - EST (expressed sequence tag) sequence entries, part 357.
336. gbest358.seq - EST (expressed sequence tag) sequence entries, part 358.
337. gbest359.seq - EST (expressed sequence tag) sequence entries, part 359.
338. gbest36.seq - EST (expressed sequence tag) sequence entries, part 36.
339. gbest360.seq - EST (expressed sequence tag) sequence entries, part 360.
340. gbest361.seq - EST (expressed sequence tag) sequence entries, part 361.
341. gbest362.seq - EST (expressed sequence tag) sequence entries, part 362.
342. gbest363.seq - EST (expressed sequence tag) sequence entries, part 363.
343. gbest364.seq - EST (expressed sequence tag) sequence entries, part 364.
344. gbest365.seq - EST (expressed sequence tag) sequence entries, part 365.
345. gbest366.seq - EST (expressed sequence tag) sequence entries, part 366.
346. gbest367.seq - EST (expressed sequence tag) sequence entries, part 367.
347. gbest368.seq - EST (expressed sequence tag) sequence entries, part 368.
348. gbest369.seq - EST (expressed sequence tag) sequence entries, part 369.
349. gbest37.seq - EST (expressed sequence tag) sequence entries, part 37.
350. gbest370.seq - EST (expressed sequence tag) sequence entries, part 370.
351. gbest371.seq - EST (expressed sequence tag) sequence entries, part 371.
352. gbest372.seq - EST (expressed sequence tag) sequence entries, part 372.
353. gbest373.seq - EST (expressed sequence tag) sequence entries, part 373.
354. gbest374.seq - EST (expressed sequence tag) sequence entries, part 374.
355. gbest375.seq - EST (expressed sequence tag) sequence entries, part 375.
356. gbest376.seq - EST (expressed sequence tag) sequence entries, part 376.
357. gbest377.seq - EST (expressed sequence tag) sequence entries, part 377.
358. gbest378.seq - EST (expressed sequence tag) sequence entries, part 378.
359. gbest379.seq - EST (expressed sequence tag) sequence entries, part 379.
360. gbest38.seq - EST (expressed sequence tag) sequence entries, part 38.
361. gbest380.seq - EST (expressed sequence tag) sequence entries, part 380.
362. gbest381.seq - EST (expressed sequence tag) sequence entries, part 381.
363. gbest382.seq - EST (expressed sequence tag) sequence entries, part 382.
364. gbest383.seq - EST (expressed sequence tag) sequence entries, part 383.
365. gbest384.seq - EST (expressed sequence tag) sequence entries, part 384.
366. gbest385.seq - EST (expressed sequence tag) sequence entries, part 385.
367. gbest386.seq - EST (expressed sequence tag) sequence entries, part 386.
368. gbest387.seq - EST (expressed sequence tag) sequence entries, part 387.
369. gbest388.seq - EST (expressed sequence tag) sequence entries, part 388.
370. gbest389.seq - EST (expressed sequence tag) sequence entries, part 389.
371. gbest39.seq - EST (expressed sequence tag) sequence entries, part 39.
372. gbest390.seq - EST (expressed sequence tag) sequence entries, part 390.
373. gbest391.seq - EST (expressed sequence tag) sequence entries, part 391.
374. gbest392.seq - EST (expressed sequence tag) sequence entries, part 392.
375. gbest393.seq - EST (expressed sequence tag) sequence entries, part 393.
376. gbest394.seq - EST (expressed sequence tag) sequence entries, part 394.
377. gbest395.seq - EST (expressed sequence tag) sequence entries, part 395.
378. gbest396.seq - EST (expressed sequence tag) sequence entries, part 396.
379. gbest397.seq - EST (expressed sequence tag) sequence entries, part 397.
380. gbest398.seq - EST (expressed sequence tag) sequence entries, part 398.
381. gbest399.seq - EST (expressed sequence tag) sequence entries, part 399.
382. gbest4.seq - EST (expressed sequence tag) sequence entries, part 4.
383. gbest40.seq - EST (expressed sequence tag) sequence entries, part 40.
384. gbest400.seq - EST (expressed sequence tag) sequence entries, part 400.
385. gbest401.seq - EST (expressed sequence tag) sequence entries, part 401.
386. gbest402.seq - EST (expressed sequence tag) sequence entries, part 402.
387. gbest403.seq - EST (expressed sequence tag) sequence entries, part 403.
388. gbest404.seq - EST (expressed sequence tag) sequence entries, part 404.
389. gbest405.seq - EST (expressed sequence tag) sequence entries, part 405.
390. gbest406.seq - EST (expressed sequence tag) sequence entries, part 406.
391. gbest407.seq - EST (expressed sequence tag) sequence entries, part 407.
392. gbest408.seq - EST (expressed sequence tag) sequence entries, part 408.
393. gbest409.seq - EST (expressed sequence tag) sequence entries, part 409.
394. gbest41.seq - EST (expressed sequence tag) sequence entries, part 41.
395. gbest410.seq - EST (expressed sequence tag) sequence entries, part 410.
396. gbest411.seq - EST (expressed sequence tag) sequence entries, part 411.
397. gbest412.seq - EST (expressed sequence tag) sequence entries, part 412.
398. gbest413.seq - EST (expressed sequence tag) sequence entries, part 413.
399. gbest42.seq - EST (expressed sequence tag) sequence entries, part 42.
400. gbest43.seq - EST (expressed sequence tag) sequence entries, part 43.
401. gbest44.seq - EST (expressed sequence tag) sequence entries, part 44.
402. gbest45.seq - EST (expressed sequence tag) sequence entries, part 45.
403. gbest46.seq - EST (expressed sequence tag) sequence entries, part 46.
404. gbest47.seq - EST (expressed sequence tag) sequence entries, part 47.
405. gbest48.seq - EST (expressed sequence tag) sequence entries, part 48.
406. gbest49.seq - EST (expressed sequence tag) sequence entries, part 49.
407. gbest5.seq - EST (expressed sequence tag) sequence entries, part 5.
408. gbest50.seq - EST (expressed sequence tag) sequence entries, part 50.
409. gbest51.seq - EST (expressed sequence tag) sequence entries, part 51.
410. gbest52.seq - EST (expressed sequence tag) sequence entries, part 52.
411. gbest53.seq - EST (expressed sequence tag) sequence entries, part 53.
412. gbest54.seq - EST (expressed sequence tag) sequence entries, part 54.
413. gbest55.seq - EST (expressed sequence tag) sequence entries, part 55.
414. gbest56.seq - EST (expressed sequence tag) sequence entries, part 56.
415. gbest57.seq - EST (expressed sequence tag) sequence entries, part 57.
416. gbest58.seq - EST (expressed sequence tag) sequence entries, part 58.
417. gbest59.seq - EST (expressed sequence tag) sequence entries, part 59.
418. gbest6.seq - EST (expressed sequence tag) sequence entries, part 6.
419. gbest60.seq - EST (expressed sequence tag) sequence entries, part 60.
420. gbest61.seq - EST (expressed sequence tag) sequence entries, part 61.
421. gbest62.seq - EST (expressed sequence tag) sequence entries, part 62.
422. gbest63.seq - EST (expressed sequence tag) sequence entries, part 63.
423. gbest64.seq - EST (expressed sequence tag) sequence entries, part 64.
424. gbest65.seq - EST (expressed sequence tag) sequence entries, part 65.
425. gbest66.seq - EST (expressed sequence tag) sequence entries, part 66.
426. gbest67.seq - EST (expressed sequence tag) sequence entries, part 67.
427. gbest68.seq - EST (expressed sequence tag) sequence entries, part 68.
428. gbest69.seq - EST (expressed sequence tag) sequence entries, part 69.
429. gbest7.seq - EST (expressed sequence tag) sequence entries, part 7.
430. gbest70.seq - EST (expressed sequence tag) sequence entries, part 70.
431. gbest71.seq - EST (expressed sequence tag) sequence entries, part 71.
432. gbest72.seq - EST (expressed sequence tag) sequence entries, part 72.
433. gbest73.seq - EST (expressed sequence tag) sequence entries, part 73.
434. gbest74.seq - EST (expressed sequence tag) sequence entries, part 74.
435. gbest75.seq - EST (expressed sequence tag) sequence entries, part 75.
436. gbest76.seq - EST (expressed sequence tag) sequence entries, part 76.
437. gbest77.seq - EST (expressed sequence tag) sequence entries, part 77.
438. gbest78.seq - EST (expressed sequence tag) sequence entries, part 78.
439. gbest79.seq - EST (expressed sequence tag) sequence entries, part 79.
440. gbest8.seq - EST (expressed sequence tag) sequence entries, part 8.
441. gbest80.seq - EST (expressed sequence tag) sequence entries, part 80.
442. gbest81.seq - EST (expressed sequence tag) sequence entries, part 81.
443. gbest82.seq - EST (expressed sequence tag) sequence entries, part 82.
444. gbest83.seq - EST (expressed sequence tag) sequence entries, part 83.
445. gbest84.seq - EST (expressed sequence tag) sequence entries, part 84.
446. gbest85.seq - EST (expressed sequence tag) sequence entries, part 85.
447. gbest86.seq - EST (expressed sequence tag) sequence entries, part 86.
448. gbest87.seq - EST (expressed sequence tag) sequence entries, part 87.
449. gbest88.seq - EST (expressed sequence tag) sequence entries, part 88.
450. gbest89.seq - EST (expressed sequence tag) sequence entries, part 89.
451. gbest9.seq - EST (expressed sequence tag) sequence entries, part 9.
452. gbest90.seq - EST (expressed sequence tag) sequence entries, part 90.
453. gbest91.seq - EST (expressed sequence tag) sequence entries, part 91.
454. gbest92.seq - EST (expressed sequence tag) sequence entries, part 92.
455. gbest93.seq - EST (expressed sequence tag) sequence entries, part 93.
456. gbest94.seq - EST (expressed sequence tag) sequence entries, part 94.
457. gbest95.seq - EST (expressed sequence tag) sequence entries, part 95.
458. gbest96.seq - EST (expressed sequence tag) sequence entries, part 96.
459. gbest97.seq - EST (expressed sequence tag) sequence entries, part 97.
460. gbest98.seq - EST (expressed sequence tag) sequence entries, part 98.
461. gbest99.seq - EST (expressed sequence tag) sequence entries, part 99.
462. gbgen.idx - Index of the entries according to gene symbols.
463. gbgss1.seq - GSS (genome survey sequence) sequence entries, part 1.
464. gbgss10.seq - GSS (genome survey sequence) sequence entries, part 10.
465. gbgss100.seq - GSS (genome survey sequence) sequence entries, part 100.
466. gbgss101.seq - GSS (genome survey sequence) sequence entries, part 101.
467. gbgss102.seq - GSS (genome survey sequence) sequence entries, part 102.
468. gbgss103.seq - GSS (genome survey sequence) sequence entries, part 103.
469. gbgss104.seq - GSS (genome survey sequence) sequence entries, part 104.
470. gbgss105.seq - GSS (genome survey sequence) sequence entries, part 105.
471. gbgss106.seq - GSS (genome survey sequence) sequence entries, part 106.
472. gbgss107.seq - GSS (genome survey sequence) sequence entries, part 107.
473. gbgss108.seq - GSS (genome survey sequence) sequence entries, part 108.
474. gbgss109.seq - GSS (genome survey sequence) sequence entries, part 109.
475. gbgss11.seq - GSS (genome survey sequence) sequence entries, part 11.
476. gbgss110.seq - GSS (genome survey sequence) sequence entries, part 110.
477. gbgss111.seq - GSS (genome survey sequence) sequence entries, part 111.
478. gbgss112.seq - GSS (genome survey sequence) sequence entries, part 112.
479. gbgss113.seq - GSS (genome survey sequence) sequence entries, part 113.
480. gbgss114.seq - GSS (genome survey sequence) sequence entries, part 114.
481. gbgss115.seq - GSS (genome survey sequence) sequence entries, part 115.
482. gbgss116.seq - GSS (genome survey sequence) sequence entries, part 116.
483. gbgss117.seq - GSS (genome survey sequence) sequence entries, part 117.
484. gbgss118.seq - GSS (genome survey sequence) sequence entries, part 118.
485. gbgss119.seq - GSS (genome survey sequence) sequence entries, part 119.
486. gbgss12.seq - GSS (genome survey sequence) sequence entries, part 12.
487. gbgss120.seq - GSS (genome survey sequence) sequence entries, part 120.
488. gbgss121.seq - GSS (genome survey sequence) sequence entries, part 121.
489. gbgss122.seq - GSS (genome survey sequence) sequence entries, part 122.
490. gbgss123.seq - GSS (genome survey sequence) sequence entries, part 123.
491. gbgss124.seq - GSS (genome survey sequence) sequence entries, part 124.
492. gbgss125.seq - GSS (genome survey sequence) sequence entries, part 125.
493. gbgss126.seq - GSS (genome survey sequence) sequence entries, part 126.
494. gbgss127.seq - GSS (genome survey sequence) sequence entries, part 127.
495. gbgss128.seq - GSS (genome survey sequence) sequence entries, part 128.
496. gbgss129.seq - GSS (genome survey sequence) sequence entries, part 129.
497. gbgss13.seq - GSS (genome survey sequence) sequence entries, part 13.
498. gbgss130.seq - GSS (genome survey sequence) sequence entries, part 130.
499. gbgss131.seq - GSS (genome survey sequence) sequence entries, part 131.
500. gbgss132.seq - GSS (genome survey sequence) sequence entries, part 132.
501. gbgss133.seq - GSS (genome survey sequence) sequence entries, part 133.
502. gbgss134.seq - GSS (genome survey sequence) sequence entries, part 134.
503. gbgss135.seq - GSS (genome survey sequence) sequence entries, part 135.
504. gbgss136.seq - GSS (genome survey sequence) sequence entries, part 136.
505. gbgss137.seq - GSS (genome survey sequence) sequence entries, part 137.
506. gbgss138.seq - GSS (genome survey sequence) sequence entries, part 138.
507. gbgss139.seq - GSS (genome survey sequence) sequence entries, part 139.
508. gbgss14.seq - GSS (genome survey sequence) sequence entries, part 14.
509. gbgss140.seq - GSS (genome survey sequence) sequence entries, part 140.
510. gbgss141.seq - GSS (genome survey sequence) sequence entries, part 141.
511. gbgss142.seq - GSS (genome survey sequence) sequence entries, part 142.
512. gbgss143.seq - GSS (genome survey sequence) sequence entries, part 143.
513. gbgss144.seq - GSS (genome survey sequence) sequence entries, part 144.
514. gbgss145.seq - GSS (genome survey sequence) sequence entries, part 145.
515. gbgss146.seq - GSS (genome survey sequence) sequence entries, part 146.
516. gbgss147.seq - GSS (genome survey sequence) sequence entries, part 147.
517. gbgss148.seq - GSS (genome survey sequence) sequence entries, part 148.
518. gbgss149.seq - GSS (genome survey sequence) sequence entries, part 149.
519. gbgss15.seq - GSS (genome survey sequence) sequence entries, part 15.
520. gbgss150.seq - GSS (genome survey sequence) sequence entries, part 150.
521. gbgss151.seq - GSS (genome survey sequence) sequence entries, part 151.
522. gbgss16.seq - GSS (genome survey sequence) sequence entries, part 16.
523. gbgss17.seq - GSS (genome survey sequence) sequence entries, part 17.
524. gbgss18.seq - GSS (genome survey sequence) sequence entries, part 18.
525. gbgss19.seq - GSS (genome survey sequence) sequence entries, part 19.
526. gbgss2.seq - GSS (genome survey sequence) sequence entries, part 2.
527. gbgss20.seq - GSS (genome survey sequence) sequence entries, part 20.
528. gbgss21.seq - GSS (genome survey sequence) sequence entries, part 21.
529. gbgss22.seq - GSS (genome survey sequence) sequence entries, part 22.
530. gbgss23.seq - GSS (genome survey sequence) sequence entries, part 23.
531. gbgss24.seq - GSS (genome survey sequence) sequence entries, part 24.
532. gbgss25.seq - GSS (genome survey sequence) sequence entries, part 25.
533. gbgss26.seq - GSS (genome survey sequence) sequence entries, part 26.
534. gbgss27.seq - GSS (genome survey sequence) sequence entries, part 27.
535. gbgss28.seq - GSS (genome survey sequence) sequence entries, part 28.
536. gbgss29.seq - GSS (genome survey sequence) sequence entries, part 29.
537. gbgss3.seq - GSS (genome survey sequence) sequence entries, part 3.
538. gbgss30.seq - GSS (genome survey sequence) sequence entries, part 30.
539. gbgss31.seq - GSS (genome survey sequence) sequence entries, part 31.
540. gbgss32.seq - GSS (genome survey sequence) sequence entries, part 32.
541. gbgss33.seq - GSS (genome survey sequence) sequence entries, part 33.
542. gbgss34.seq - GSS (genome survey sequence) sequence entries, part 34.
543. gbgss35.seq - GSS (genome survey sequence) sequence entries, part 35.
544. gbgss36.seq - GSS (genome survey sequence) sequence entries, part 36.
545. gbgss37.seq - GSS (genome survey sequence) sequence entries, part 37.
546. gbgss38.seq - GSS (genome survey sequence) sequence entries, part 38.
547. gbgss39.seq - GSS (genome survey sequence) sequence entries, part 39.
548. gbgss4.seq - GSS (genome survey sequence) sequence entries, part 4.
549. gbgss40.seq - GSS (genome survey sequence) sequence entries, part 40.
550. gbgss41.seq - GSS (genome survey sequence) sequence entries, part 41.
551. gbgss42.seq - GSS (genome survey sequence) sequence entries, part 42.
552. gbgss43.seq - GSS (genome survey sequence) sequence entries, part 43.
553. gbgss44.seq - GSS (genome survey sequence) sequence entries, part 44.
554. gbgss45.seq - GSS (genome survey sequence) sequence entries, part 45.
555. gbgss46.seq - GSS (genome survey sequence) sequence entries, part 46.
556. gbgss47.seq - GSS (genome survey sequence) sequence entries, part 47.
557. gbgss48.seq - GSS (genome survey sequence) sequence entries, part 48.
558. gbgss49.seq - GSS (genome survey sequence) sequence entries, part 49.
559. gbgss5.seq - GSS (genome survey sequence) sequence entries, part 5.
560. gbgss50.seq - GSS (genome survey sequence) sequence entries, part 50.
561. gbgss51.seq - GSS (genome survey sequence) sequence entries, part 51.
562. gbgss52.seq - GSS (genome survey sequence) sequence entries, part 52.
563. gbgss53.seq - GSS (genome survey sequence) sequence entries, part 53.
564. gbgss54.seq - GSS (genome survey sequence) sequence entries, part 54.
565. gbgss55.seq - GSS (genome survey sequence) sequence entries, part 55.
566. gbgss56.seq - GSS (genome survey sequence) sequence entries, part 56.
567. gbgss57.seq - GSS (genome survey sequence) sequence entries, part 57.
568. gbgss58.seq - GSS (genome survey sequence) sequence entries, part 58.
569. gbgss59.seq - GSS (genome survey sequence) sequence entries, part 59.
570. gbgss6.seq - GSS (genome survey sequence) sequence entries, part 6.
571. gbgss60.seq - GSS (genome survey sequence) sequence entries, part 60.
572. gbgss61.seq - GSS (genome survey sequence) sequence entries, part 61.
573. gbgss62.seq - GSS (genome survey sequence) sequence entries, part 62.
574. gbgss63.seq - GSS (genome survey sequence) sequence entries, part 63.
575. gbgss64.seq - GSS (genome survey sequence) sequence entries, part 64.
576. gbgss65.seq - GSS (genome survey sequence) sequence entries, part 65.
577. gbgss66.seq - GSS (genome survey sequence) sequence entries, part 66.
578. gbgss67.seq - GSS (genome survey sequence) sequence entries, part 67.
579. gbgss68.seq - GSS (genome survey sequence) sequence entries, part 68.
580. gbgss69.seq - GSS (genome survey sequence) sequence entries, part 69.
581. gbgss7.seq - GSS (genome survey sequence) sequence entries, part 7.
582. gbgss70.seq - GSS (genome survey sequence) sequence entries, part 70.
583. gbgss71.seq - GSS (genome survey sequence) sequence entries, part 71.
584. gbgss72.seq - GSS (genome survey sequence) sequence entries, part 72.
585. gbgss73.seq - GSS (genome survey sequence) sequence entries, part 73.
586. gbgss74.seq - GSS (genome survey sequence) sequence entries, part 74.
587. gbgss75.seq - GSS (genome survey sequence) sequence entries, part 75.
588. gbgss76.seq - GSS (genome survey sequence) sequence entries, part 76.
589. gbgss77.seq - GSS (genome survey sequence) sequence entries, part 77.
590. gbgss78.seq - GSS (genome survey sequence) sequence entries, part 78.
591. gbgss79.seq - GSS (genome survey sequence) sequence entries, part 79.
592. gbgss8.seq - GSS (genome survey sequence) sequence entries, part 8.
593. gbgss80.seq - GSS (genome survey sequence) sequence entries, part 80.
594. gbgss81.seq - GSS (genome survey sequence) sequence entries, part 81.
595. gbgss82.seq - GSS (genome survey sequence) sequence entries, part 82.
596. gbgss83.seq - GSS (genome survey sequence) sequence entries, part 83.
597. gbgss84.seq - GSS (genome survey sequence) sequence entries, part 84.
598. gbgss85.seq - GSS (genome survey sequence) sequence entries, part 85.
599. gbgss86.seq - GSS (genome survey sequence) sequence entries, part 86.
600. gbgss87.seq - GSS (genome survey sequence) sequence entries, part 87.
601. gbgss88.seq - GSS (genome survey sequence) sequence entries, part 88.
602. gbgss89.seq - GSS (genome survey sequence) sequence entries, part 89.
603. gbgss9.seq - GSS (genome survey sequence) sequence entries, part 9.
604. gbgss90.seq - GSS (genome survey sequence) sequence entries, part 90.
605. gbgss91.seq - GSS (genome survey sequence) sequence entries, part 91.
606. gbgss92.seq - GSS (genome survey sequence) sequence entries, part 92.
607. gbgss93.seq - GSS (genome survey sequence) sequence entries, part 93.
608. gbgss94.seq - GSS (genome survey sequence) sequence entries, part 94.
609. gbgss95.seq - GSS (genome survey sequence) sequence entries, part 95.
610. gbgss96.seq - GSS (genome survey sequence) sequence entries, part 96.
611. gbgss97.seq - GSS (genome survey sequence) sequence entries, part 97.
612. gbgss98.seq - GSS (genome survey sequence) sequence entries, part 98.
613. gbgss99.seq - GSS (genome survey sequence) sequence entries, part 99.
614. gbhtc1.seq - HTC (high throughput cDNA sequencing) sequence entries, part 1.
615. gbhtc2.seq - HTC (high throughput cDNA sequencing) sequence entries, part 2.
616. gbhtc3.seq - HTC (high throughput cDNA sequencing) sequence entries, part 3.
617. gbhtc4.seq - HTC (high throughput cDNA sequencing) sequence entries, part 4.
618. gbhtc5.seq - HTC (high throughput cDNA sequencing) sequence entries, part 5.
619. gbhtc6.seq - HTC (high throughput cDNA sequencing) sequence entries, part 6.
620. gbhtc7.seq - HTC (high throughput cDNA sequencing) sequence entries, part 7.
621. gbhtg1.seq - HTGS (high throughput genomic sequencing) sequence entries, part 1.
622. gbhtg10.seq - HTGS (high throughput genomic sequencing) sequence entries, part 10.
623. gbhtg11.seq - HTGS (high throughput genomic sequencing) sequence entries, part 11.
624. gbhtg12.seq - HTGS (high throughput genomic sequencing) sequence entries, part 12.
625. gbhtg13.seq - HTGS (high throughput genomic sequencing) sequence entries, part 13.
626. gbhtg14.seq - HTGS (high throughput genomic sequencing) sequence entries, part 14.
627. gbhtg15.seq - HTGS (high throughput genomic sequencing) sequence entries, part 15.
628. gbhtg16.seq - HTGS (high throughput genomic sequencing) sequence entries, part 16.
629. gbhtg17.seq - HTGS (high throughput genomic sequencing) sequence entries, part 17.
630. gbhtg18.seq - HTGS (high throughput genomic sequencing) sequence entries, part 18.
631. gbhtg19.seq - HTGS (high throughput genomic sequencing) sequence entries, part 19.
632. gbhtg2.seq - HTGS (high throughput genomic sequencing) sequence entries, part 2.
633. gbhtg20.seq - HTGS (high throughput genomic sequencing) sequence entries, part 20.
634. gbhtg21.seq - HTGS (high throughput genomic sequencing) sequence entries, part 21.
635. gbhtg22.seq - HTGS (high throughput genomic sequencing) sequence entries, part 22.
636. gbhtg23.seq - HTGS (high throughput genomic sequencing) sequence entries, part 23.
637. gbhtg24.seq - HTGS (high throughput genomic sequencing) sequence entries, part 24.
638. gbhtg25.seq - HTGS (high throughput genomic sequencing) sequence entries, part 25.
639. gbhtg26.seq - HTGS (high throughput genomic sequencing) sequence entries, part 26.
640. gbhtg27.seq - HTGS (high throughput genomic sequencing) sequence entries, part 27.
641. gbhtg28.seq - HTGS (high throughput genomic sequencing) sequence entries, part 28.
642. gbhtg29.seq - HTGS (high throughput genomic sequencing) sequence entries, part 29.
643. gbhtg3.seq - HTGS (high throughput genomic sequencing) sequence entries, part 3.
644. gbhtg30.seq - HTGS (high throughput genomic sequencing) sequence entries, part 30.
645. gbhtg31.seq - HTGS (high throughput genomic sequencing) sequence entries, part 31.
646. gbhtg32.seq - HTGS (high throughput genomic sequencing) sequence entries, part 32.
647. gbhtg33.seq - HTGS (high throughput genomic sequencing) sequence entries, part 33.
648. gbhtg34.seq - HTGS (high throughput genomic sequencing) sequence entries, part 34.
649. gbhtg35.seq - HTGS (high throughput genomic sequencing) sequence entries, part 35.
650. gbhtg36.seq - HTGS (high throughput genomic sequencing) sequence entries, part 36.
651. gbhtg37.seq - HTGS (high throughput genomic sequencing) sequence entries, part 37.
652. gbhtg38.seq - HTGS (high throughput genomic sequencing) sequence entries, part 38.
653. gbhtg39.seq - HTGS (high throughput genomic sequencing) sequence entries, part 39.
654. gbhtg4.seq - HTGS (high throughput genomic sequencing) sequence entries, part 4.
655. gbhtg40.seq - HTGS (high throughput genomic sequencing) sequence entries, part 40.
656. gbhtg41.seq - HTGS (high throughput genomic sequencing) sequence entries, part 41.
657. gbhtg42.seq - HTGS (high throughput genomic sequencing) sequence entries, part 42.
658. gbhtg43.seq - HTGS (high throughput genomic sequencing) sequence entries, part 43.
659. gbhtg44.seq - HTGS (high throughput genomic sequencing) sequence entries, part 44.
660. gbhtg45.seq - HTGS (high throughput genomic sequencing) sequence entries, part 45.
661. gbhtg46.seq - HTGS (high throughput genomic sequencing) sequence entries, part 46.
662. gbhtg47.seq - HTGS (high throughput genomic sequencing) sequence entries, part 47.
663. gbhtg48.seq - HTGS (high throughput genomic sequencing) sequence entries, part 48.
664. gbhtg49.seq - HTGS (high throughput genomic sequencing) sequence entries, part 49.
665. gbhtg5.seq - HTGS (high throughput genomic sequencing) sequence entries, part 5.
666. gbhtg50.seq - HTGS (high throughput genomic sequencing) sequence entries, part 50.
667. gbhtg51.seq - HTGS (high throughput genomic sequencing) sequence entries, part 51.
668. gbhtg52.seq - HTGS (high throughput genomic sequencing) sequence entries, part 52.
669. gbhtg53.seq - HTGS (high throughput genomic sequencing) sequence entries, part 53.
670. gbhtg54.seq - HTGS (high throughput genomic sequencing) sequence entries, part 54.
671. gbhtg55.seq - HTGS (high throughput genomic sequencing) sequence entries, part 55.
672. gbhtg56.seq - HTGS (high throughput genomic sequencing) sequence entries, part 56.
673. gbhtg57.seq - HTGS (high throughput genomic sequencing) sequence entries, part 57.
674. gbhtg58.seq - HTGS (high throughput genomic sequencing) sequence entries, part 58.
675. gbhtg59.seq - HTGS (high throughput genomic sequencing) sequence entries, part 59.
676. gbhtg6.seq - HTGS (high throughput genomic sequencing) sequence entries, part 6.
677. gbhtg60.seq - HTGS (high throughput genomic sequencing) sequence entries, part 60.
678. gbhtg61.seq - HTGS (high throughput genomic sequencing) sequence entries, part 61.
679. gbhtg62.seq - HTGS (high throughput genomic sequencing) sequence entries, part 62.
680. gbhtg63.seq - HTGS (high throughput genomic sequencing) sequence entries, part 63.
681. gbhtg64.seq - HTGS (high throughput genomic sequencing) sequence entries, part 64.
682. gbhtg65.seq - HTGS (high throughput genomic sequencing) sequence entries, part 65.
683. gbhtg66.seq - HTGS (high throughput genomic sequencing) sequence entries, part 66.
684. gbhtg67.seq - HTGS (high throughput genomic sequencing) sequence entries, part 67.
685. gbhtg68.seq - HTGS (high throughput genomic sequencing) sequence entries, part 68.
686. gbhtg7.seq - HTGS (high throughput genomic sequencing) sequence entries, part 7.
687. gbhtg8.seq - HTGS (high throughput genomic sequencing) sequence entries, part 8.
688. gbhtg9.seq - HTGS (high throughput genomic sequencing) sequence entries, part 9.
689. gbinv1.seq - Invertebrate sequence entries, part 1.
690. gbinv2.seq - Invertebrate sequence entries, part 2.
691. gbinv3.seq - Invertebrate sequence entries, part 3.
692. gbinv4.seq - Invertebrate sequence entries, part 4.
693. gbinv5.seq - Invertebrate sequence entries, part 5.
694. gbinv6.seq - Invertebrate sequence entries, part 6.
695. gbinv7.seq - Invertebrate sequence entries, part 7.
696. gbjou.idx - Index of the entries according to journal citation.
697. gbmam1.seq - Other mammalian sequence entries, part 1.
698. gbmam2.seq - Other mammalian sequence entries, part 2.
699. gbnew.txt - Accession numbers of entries new since the previous release.
700. gbpat1.seq - Patent sequence entries, part 1.
701. gbpat10.seq - Patent sequence entries, part 10.
702. gbpat11.seq - Patent sequence entries, part 11.
703. gbpat12.seq - Patent sequence entries, part 12.
704. gbpat13.seq - Patent sequence entries, part 13.
705. gbpat14.seq - Patent sequence entries, part 14.
706. gbpat15.seq - Patent sequence entries, part 15.
707. gbpat16.seq - Patent sequence entries, part 16.
708. gbpat17.seq - Patent sequence entries, part 17.
709. gbpat18.seq - Patent sequence entries, part 18.
710. gbpat2.seq - Patent sequence entries, part 2.
711. gbpat3.seq - Patent sequence entries, part 3.
712. gbpat4.seq - Patent sequence entries, part 4.
713. gbpat5.seq - Patent sequence entries, part 5.
714. gbpat6.seq - Patent sequence entries, part 6.
715. gbpat7.seq - Patent sequence entries, part 7.
716. gbpat8.seq - Patent sequence entries, part 8.
717. gbpat9.seq - Patent sequence entries, part 9.
718. gbphg.seq - Phage sequence entries.
719. gbpln1.seq - Plant sequence entries (including fungi and algae), part 1.
720. gbpln10.seq - Plant sequence entries (including fungi and algae), part 10.
721. gbpln11.seq - Plant sequence entries (including fungi and algae), part 11.
722. gbpln12.seq - Plant sequence entries (including fungi and algae), part 12.
723. gbpln13.seq - Plant sequence entries (including fungi and algae), part 13.
724. gbpln14.seq - Plant sequence entries (including fungi and algae), part 14.
725. gbpln15.seq - Plant sequence entries (including fungi and algae), part 15.
726. gbpln16.seq - Plant sequence entries (including fungi and algae), part 16.
727. gbpln2.seq - Plant sequence entries (including fungi and algae), part 2.
728. gbpln3.seq - Plant sequence entries (including fungi and algae), part 3.
729. gbpln4.seq - Plant sequence entries (including fungi and algae), part 4.
730. gbpln5.seq - Plant sequence entries (including fungi and algae), part 5.
731. gbpln6.seq - Plant sequence entries (including fungi and algae), part 6.
732. gbpln7.seq - Plant sequence entries (including fungi and algae), part 7.
733. gbpln8.seq - Plant sequence entries (including fungi and algae), part 8.
734. gbpln9.seq - Plant sequence entries (including fungi and algae), part 9.
735. gbpri1.seq - Primate sequence entries, part 1.
736. gbpri10.seq - Primate sequence entries, part 10.
737. gbpri11.seq - Primate sequence entries, part 11.
738. gbpri12.seq - Primate sequence entries, part 12.
739. gbpri13.seq - Primate sequence entries, part 13.
740. gbpri14.seq - Primate sequence entries, part 14.
741. gbpri15.seq - Primate sequence entries, part 15.
742. gbpri16.seq - Primate sequence entries, part 16.
743.
gbpri17.seq - Primate sequence entries, part 17. 744. gbpri18.seq - Primate sequence entries, part 18. 745. gbpri19.seq - Primate sequence entries, part 19. 746. gbpri2.seq - Primate sequence entries, part 2. 747. gbpri20.seq - Primate sequence entries, part 20. 748. gbpri21.seq - Primate sequence entries, part 21. 749. gbpri22.seq - Primate sequence entries, part 22. 750. gbpri23.seq - Primate sequence entries, part 23. 751. gbpri24.seq - Primate sequence entries, part 24. 752. gbpri25.seq - Primate sequence entries, part 25. 753. gbpri26.seq - Primate sequence entries, part 26. 754. gbpri27.seq - Primate sequence entries, part 27. 755. gbpri28.seq - Primate sequence entries, part 28. 756. gbpri29.seq - Primate sequence entries, part 29. 757. gbpri3.seq - Primate sequence entries, part 3. 758. gbpri4.seq - Primate sequence entries, part 4. 759. gbpri5.seq - Primate sequence entries, part 5. 760. gbpri6.seq - Primate sequence entries, part 6. 761. gbpri7.seq - Primate sequence entries, part 7. 762. gbpri8.seq - Primate sequence entries, part 8. 763. gbpri9.seq - Primate sequence entries, part 9. 764. gbrel.txt - Release notes (this document). 765. gbrod1.seq - Rodent sequence entries, part 1. 766. gbrod10.seq - Rodent sequence entries, part 10. 767. gbrod11.seq - Rodent sequence entries, part 11. 768. gbrod12.seq - Rodent sequence entries, part 12. 769. gbrod13.seq - Rodent sequence entries, part 13. 770. gbrod14.seq - Rodent sequence entries, part 14. 771. gbrod15.seq - Rodent sequence entries, part 15. 772. gbrod16.seq - Rodent sequence entries, part 16. 773. gbrod17.seq - Rodent sequence entries, part 17. 774. gbrod18.seq - Rodent sequence entries, part 18. 775. gbrod19.seq - Rodent sequence entries, part 19. 776. gbrod2.seq - Rodent sequence entries, part 2. 777. gbrod20.seq - Rodent sequence entries, part 20. 778. gbrod3.seq - Rodent sequence entries, part 3. 779. gbrod4.seq - Rodent sequence entries, part 4. 780. gbrod5.seq - Rodent sequence entries, part 5. 
781. gbrod6.seq - Rodent sequence entries, part 6. 782. gbrod7.seq - Rodent sequence entries, part 7. 783. gbrod8.seq - Rodent sequence entries, part 8. 784. gbrod9.seq - Rodent sequence entries, part 9. 785. gbsdr.txt - Short directory of the data bank. 786. gbsec.idx - Index of the entries according to secondary accession number. 787. gbsts1.seq - STS (sequence tagged site) sequence entries, part 1. 788. gbsts10.seq - STS (sequence tagged site) sequence entries, part 10. 789. gbsts11.seq - STS (sequence tagged site) sequence entries, part 11. 790. gbsts12.seq - STS (sequence tagged site) sequence entries, part 12. 791. gbsts13.seq - STS (sequence tagged site) sequence entries, part 13. 792. gbsts14.seq - STS (sequence tagged site) sequence entries, part 14. 793. gbsts2.seq - STS (sequence tagged site) sequence entries, part 2. 794. gbsts3.seq - STS (sequence tagged site) sequence entries, part 3. 795. gbsts4.seq - STS (sequence tagged site) sequence entries, part 4. 796. gbsts5.seq - STS (sequence tagged site) sequence entries, part 5. 797. gbsts6.seq - STS (sequence tagged site) sequence entries, part 6. 798. gbsts7.seq - STS (sequence tagged site) sequence entries, part 7. 799. gbsts8.seq - STS (sequence tagged site) sequence entries, part 8. 800. gbsts9.seq - STS (sequence tagged site) sequence entries, part 9. 801. gbsyn.seq - Synthetic and chimeric sequence entries. 802. gbuna.seq - Unannotated sequence entries. 803. gbvrl1.seq - Viral sequence entries, part 1. 804. gbvrl2.seq - Viral sequence entries, part 2. 805. gbvrl3.seq - Viral sequence entries, part 3. 806. gbvrl4.seq - Viral sequence entries, part 4. 807. gbvrl5.seq - Viral sequence entries, part 5. 808. gbvrt1.seq - Other vertebrate sequence entries, part 1. 809. gbvrt2.seq - Other vertebrate sequence entries, part 2. 810. gbvrt3.seq - Other vertebrate sequence entries, part 3. 811. gbvrt4.seq - Other vertebrate sequence entries, part 4. 812. gbvrt5.seq - Other vertebrate sequence entries, part 5. 
813. gbvrt6.seq - Other vertebrate sequence entries, part 6.
814. gbvrt7.seq - Other vertebrate sequence entries, part 7.
815. gbvrt8.seq - Other vertebrate sequence entries, part 8.
816. gbvrt9.seq - Other vertebrate sequence entries, part 9.

The gbcon.seq data file provides an alternative representation for complex
sequences, such as segmented sets and complete genomes that have been split
into pieces. These "CON" records do not contain sequence data; they use a
CONTIG linetype with a join() statement that describes how the component
sequences can be assembled to form the larger sequence. The contents of the
CON division are not reflected in the 'index', 'new', 'chg', and 'del' files
that accompany GenBank releases, nor in release statistics (Sections 2.2.6,
2.2.7, and 2.2.8). The GenBank README describes the CON division of GenBank
in more detail:

ftp://ftp.ncbi.nih.gov/genbank/README.genbank

2.2.5 File Sizes

Uncompressed, the Release 149.0 flatfiles require roughly 179 GB (sequence
files only) or 195 GB (including the 'short directory', 'index' and the
*.txt files).

The following table contains the approximate sizes of the individual files
in this release. Since minor changes to some of the files might have
occurred after these release notes were written, these sizes should not be
used to determine file integrity; they are provided as an aid to planning
only.
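For planning purposes, the per-file sizes listed in this section can be
totaled with a short script. The Python sketch below is illustrative only
and is not part of the GenBank distribution: the function names
parse_size_table and total_bytes are hypothetical, and the file-name
pattern (gb*.seq, gb*.idx, gb*.txt) is an assumption based on the names
used in this release. It reads "size name" pairs in any whitespace layout.

```python
import re

# One "size filename" pair, e.g. "250904878 gbbct1.seq".
# The name pattern is an assumption drawn from the file names in this
# document; adjust it if a release uses other extensions.
PAIR = re.compile(r"(\d+)\s+(gb\w+\.(?:seq|idx|txt))")


def parse_size_table(text):
    """Return {file name: size in bytes} for every pair found in text."""
    return {name: int(size) for size, name in PAIR.findall(text)}


def total_bytes(sizes, suffix=None):
    """Sum the sizes, optionally restricted to names ending in suffix."""
    return sum(
        size
        for name, size in sizes.items()
        if suffix is None or name.endswith(suffix)
    )


# A few rows copied from the File Sizes table in this section.
sample = """
1447983457     gbacc.idx
 250904878     gbbct1.seq
 250003322     gbbct10.seq
   7313576     gbchg.txt
"""

sizes = parse_size_table(sample)
seq_total = total_bytes(sizes, ".seq")
all_total = total_bytes(sizes)
print(f"{len(sizes)} files: {seq_total} bytes in .seq files, {all_total} bytes overall")
```

Passing the whole table instead of the four sample rows gives a planning
estimate only; as noted above, the listed sizes are approximate and are not
suitable for verifying file integrity.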
File Size      File Name

1447983457     gbacc.idx
 500613038     gbaut1.idx
 512854608     gbaut10.idx
 512814334     gbaut11.idx
 500643770     gbaut12.idx
 500023244     gbaut13.idx
 502522192     gbaut14.idx
 500371670     gbaut15.idx
 501686283     gbaut16.idx
 504030549     gbaut17.idx
 540236765     gbaut18.idx
 503209163     gbaut19.idx
 509020875     gbaut2.idx
 511749291     gbaut20.idx
 503033675     gbaut21.idx
 502990526     gbaut22.idx
 519443346     gbaut23.idx
 501292529     gbaut24.idx
 502375153     gbaut25.idx
 503181942     gbaut26.idx
 500575922     gbaut27.idx
 500708255     gbaut28.idx
 514131475     gbaut29.idx
 501506608     gbaut3.idx
 501241517     gbaut30.idx
 128949038     gbaut31.idx
 500897443     gbaut4.idx
 505416878     gbaut5.idx
 517402667     gbaut6.idx
 506948872     gbaut7.idx
 500143531     gbaut8.idx
 501445577     gbaut9.idx
 250904878     gbbct1.seq
 250003322     gbbct10.seq
 199646399     gbbct11.seq
 250200479     gbbct2.seq
 250166514     gbbct3.seq
 250000501     gbbct4.seq
 252863805     gbbct5.seq
 250000698     gbbct6.seq
 250430152     gbbct7.seq
 262535345     gbbct8.seq
 258107798     gbbct9.seq
   7313576     gbchg.txt
 837534463     gbcon.seq
    175139     gbdel.txt
 250000698     gbenv1.seq
 208465584     gbenv2.seq
 230688425     gbest1.seq
 230687597     gbest10.seq
 230689431     gbest100.seq
 230690088     gbest101.seq
 230690607     gbest102.seq
 230688508     gbest103.seq
 230688706     gbest104.seq
 230691285     gbest105.seq
 230689341     gbest106.seq
 230687853     gbest107.seq
 230689166     gbest108.seq
 230690242     gbest109.seq
 230687816     gbest11.seq
 230689405     gbest110.seq
 230038277     gbest111.seq
 230688011     gbest112.seq
 230689781     gbest113.seq
 230687553     gbest114.seq
 230688637     gbest115.seq
 230689180     gbest116.seq
 230690357     gbest117.seq
 230688459     gbest118.seq
 230690404     gbest119.seq
 230687685     gbest12.seq
 230690980     gbest120.seq
 230689361     gbest121.seq
 230688118     gbest122.seq
 230690592     gbest123.seq
 230688807     gbest124.seq
 230688699     gbest125.seq
 229578878     gbest126.seq
 230688031     gbest127.seq
 230688176     gbest128.seq
 230690909     gbest129.seq
 230688069     gbest13.seq
 230688698     gbest130.seq
 230687841     gbest131.seq
 230687460     gbest132.seq
 230690356     gbest133.seq
 230690185     gbest134.seq
 230691130     gbest135.seq
 230687926
gbest136.seq 230688100 gbest137.seq 230688622 gbest138.seq 230687858 gbest139.seq 230689004 gbest14.seq 230689410 gbest140.seq 230688972 gbest141.seq 230689580 gbest142.seq 230689035 gbest143.seq 230688346 gbest144.seq 230689115 gbest145.seq 230689479 gbest146.seq 230689744 gbest147.seq 230688095 gbest148.seq 230689412 gbest149.seq 230688790 gbest15.seq 230689427 gbest150.seq 230687731 gbest151.seq 230687873 gbest152.seq 230688866 gbest153.seq 230688285 gbest154.seq 230688017 gbest155.seq 230688436 gbest156.seq 230689555 gbest157.seq 230690116 gbest158.seq 230687867 gbest159.seq 230690286 gbest16.seq 230687688 gbest160.seq 230687522 gbest161.seq 230689477 gbest162.seq 230687633 gbest163.seq 230690466 gbest164.seq 230687827 gbest165.seq 230689151 gbest166.seq 230689188 gbest167.seq 230688217 gbest168.seq 230687871 gbest169.seq 230689264 gbest17.seq 230689583 gbest170.seq 230688299 gbest171.seq 230688881 gbest172.seq 230689569 gbest173.seq 230688576 gbest174.seq 230688967 gbest175.seq 230687593 gbest176.seq 230688761 gbest177.seq 230688865 gbest178.seq 227848744 gbest179.seq 230690149 gbest18.seq 230690343 gbest180.seq 230689884 gbest181.seq 230688141 gbest182.seq 230687883 gbest183.seq 230689263 gbest184.seq 230688335 gbest185.seq 230690076 gbest186.seq 230688654 gbest187.seq 230688088 gbest188.seq 230690167 gbest189.seq 230689659 gbest19.seq 230688179 gbest190.seq 230689860 gbest191.seq 230688331 gbest192.seq 230688431 gbest193.seq 230690542 gbest194.seq 230690361 gbest195.seq 230689948 gbest196.seq 230689443 gbest197.seq 230689673 gbest198.seq 230687740 gbest199.seq 230689482 gbest2.seq 230689537 gbest20.seq 230687600 gbest200.seq 230688032 gbest201.seq 230687539 gbest202.seq 230689504 gbest203.seq 230687641 gbest204.seq 230688464 gbest205.seq 230688250 gbest206.seq 230687962 gbest207.seq 230689626 gbest208.seq 230687506 gbest209.seq 230689303 gbest21.seq 230688191 gbest210.seq 230689703 gbest211.seq 230689751 gbest212.seq 230690508 gbest213.seq 230689714 
gbest214.seq 230689197 gbest215.seq 230690172 gbest216.seq 230688792 gbest217.seq 230688894 gbest218.seq 225709343 gbest219.seq 230688100 gbest22.seq 163048051 gbest220.seq 162117314 gbest221.seq 168883587 gbest222.seq 168115990 gbest223.seq 167914583 gbest224.seq 165340119 gbest225.seq 164904270 gbest226.seq 164506162 gbest227.seq 164344405 gbest228.seq 164313641 gbest229.seq 230690317 gbest23.seq 164442025 gbest230.seq 163438216 gbest231.seq 167035062 gbest232.seq 164200632 gbest233.seq 162064704 gbest234.seq 164941687 gbest235.seq 168693038 gbest236.seq 166759995 gbest237.seq 165586124 gbest238.seq 165878491 gbest239.seq 230689028 gbest24.seq 165187409 gbest240.seq 164736380 gbest241.seq 164239725 gbest242.seq 164516702 gbest243.seq 165189851 gbest244.seq 173115083 gbest245.seq 177059916 gbest246.seq 172875640 gbest247.seq 230422257 gbest248.seq 230689563 gbest249.seq 230689106 gbest25.seq 230689477 gbest250.seq 230690167 gbest251.seq 230690230 gbest252.seq 230689421 gbest253.seq 230691486 gbest254.seq 230689851 gbest255.seq 230690356 gbest256.seq 230689150 gbest257.seq 230688627 gbest258.seq 230690206 gbest259.seq 230689379 gbest26.seq 230687600 gbest260.seq 230688263 gbest261.seq 230689293 gbest262.seq 230688933 gbest263.seq 230688693 gbest264.seq 230688927 gbest265.seq 230689864 gbest266.seq 230690447 gbest267.seq 230687805 gbest268.seq 230689774 gbest269.seq 230688361 gbest27.seq 230690015 gbest270.seq 230689754 gbest271.seq 230687807 gbest272.seq 230688397 gbest273.seq 230689701 gbest274.seq 230691142 gbest275.seq 230689578 gbest276.seq 213043079 gbest277.seq 230689419 gbest278.seq 230688018 gbest279.seq 230689525 gbest28.seq 230687860 gbest280.seq 230689012 gbest281.seq 230688451 gbest282.seq 230690433 gbest283.seq 230689145 gbest284.seq 230690339 gbest285.seq 230689141 gbest286.seq 230689019 gbest287.seq 230688837 gbest288.seq 230687752 gbest289.seq 230688677 gbest29.seq 230689813 gbest290.seq 230689916 gbest291.seq 230688295 gbest292.seq 230688440 
gbest293.seq 230687531 gbest294.seq 230687481 gbest295.seq 230693073 gbest296.seq 230689800 gbest297.seq 230689399 gbest298.seq 230690426 gbest299.seq 230687569 gbest3.seq 230690414 gbest30.seq 230689857 gbest300.seq 230688982 gbest301.seq 230689744 gbest302.seq 230688674 gbest303.seq 212985883 gbest304.seq 230690276 gbest305.seq 230691988 gbest306.seq 230689945 gbest307.seq 230689291 gbest308.seq 230688343 gbest309.seq 230687782 gbest31.seq 230690498 gbest310.seq 153058802 gbest311.seq 167292541 gbest312.seq 230688152 gbest313.seq 230689854 gbest314.seq 230690198 gbest315.seq 230688257 gbest316.seq 230690606 gbest317.seq 230690356 gbest318.seq 230690910 gbest319.seq 230690273 gbest32.seq 230688776 gbest320.seq 230690152 gbest321.seq 230688023 gbest322.seq 230688407 gbest323.seq 230689647 gbest324.seq 230690516 gbest325.seq 230688626 gbest326.seq 230690135 gbest327.seq 230689600 gbest328.seq 230690003 gbest329.seq 230688642 gbest33.seq 230687752 gbest330.seq 230688623 gbest331.seq 230688592 gbest332.seq 230687627 gbest333.seq 230688563 gbest334.seq 230690110 gbest335.seq 230689613 gbest336.seq 230689156 gbest337.seq 230691729 gbest338.seq 230687491 gbest339.seq 230688067 gbest34.seq 230687521 gbest340.seq 230688330 gbest341.seq 230689156 gbest342.seq 230688630 gbest343.seq 230689621 gbest344.seq 230687499 gbest345.seq 230687970 gbest346.seq 230688007 gbest347.seq 230688481 gbest348.seq 230687615 gbest349.seq 230690075 gbest35.seq 230687871 gbest350.seq 230691451 gbest351.seq 230688333 gbest352.seq 230689282 gbest353.seq 230689713 gbest354.seq 223853357 gbest355.seq 230690103 gbest356.seq 230688533 gbest357.seq 230689814 gbest358.seq 230691209 gbest359.seq 230688742 gbest36.seq 230688790 gbest360.seq 230479386 gbest361.seq 230687466 gbest362.seq 230691939 gbest363.seq 230689064 gbest364.seq 230687857 gbest365.seq 230688846 gbest366.seq 230688190 gbest367.seq 230690437 gbest368.seq 230690271 gbest369.seq 230688756 gbest37.seq 230689096 gbest370.seq 230693369 
gbest371.seq 230689983 gbest372.seq 230688453 gbest373.seq 230689506 gbest374.seq 230687924 gbest375.seq 230689036 gbest376.seq 230691685 gbest377.seq 230691163 gbest378.seq 230689935 gbest379.seq 230688606 gbest38.seq 230689750 gbest380.seq 230687599 gbest381.seq 230687984 gbest382.seq 230688546 gbest383.seq 230688181 gbest384.seq 230689504 gbest385.seq 230691744 gbest386.seq 230689401 gbest387.seq 230689236 gbest388.seq 230694065 gbest389.seq 224237215 gbest39.seq 230689345 gbest390.seq 230688744 gbest391.seq 230691756 gbest392.seq 230687822 gbest393.seq 230691168 gbest394.seq 230690607 gbest395.seq 230690223 gbest396.seq 230688406 gbest397.seq 230689537 gbest398.seq 230689679 gbest399.seq 230687828 gbest4.seq 192376921 gbest40.seq 230689688 gbest400.seq 230690286 gbest401.seq 230689343 gbest402.seq 230688927 gbest403.seq 230688328 gbest404.seq 230690623 gbest405.seq 230690720 gbest406.seq 230688570 gbest407.seq 230689512 gbest408.seq 230688730 gbest409.seq 191780848 gbest41.seq 230689595 gbest410.seq 217255265 gbest411.seq 230689613 gbest412.seq 165099141 gbest413.seq 214826010 gbest42.seq 216207178 gbest43.seq 215842026 gbest44.seq 216969187 gbest45.seq 230688216 gbest46.seq 230688030 gbest47.seq 226202383 gbest48.seq 230689055 gbest49.seq 164950269 gbest5.seq 230687756 gbest50.seq 230688728 gbest51.seq 230689672 gbest52.seq 230688253 gbest53.seq 230691005 gbest54.seq 230687969 gbest55.seq 230688333 gbest56.seq 230690776 gbest57.seq 230690553 gbest58.seq 230688372 gbest59.seq 177138923 gbest6.seq 230689460 gbest60.seq 230690188 gbest61.seq 230691214 gbest62.seq 230688880 gbest63.seq 230689138 gbest64.seq 209877969 gbest65.seq 209438885 gbest66.seq 208755262 gbest67.seq 209111908 gbest68.seq 209544758 gbest69.seq 230689868 gbest7.seq 210716530 gbest70.seq 209572983 gbest71.seq 208452746 gbest72.seq 210239815 gbest73.seq 209303316 gbest74.seq 205126453 gbest75.seq 208215758 gbest76.seq 207426498 gbest77.seq 210022451 gbest78.seq 222451254 gbest79.seq 230690240 
gbest8.seq 230690513 gbest80.seq 230692454 gbest81.seq 224145001 gbest82.seq 215852406 gbest83.seq 213771000 gbest84.seq 227292664 gbest85.seq 230688823 gbest86.seq 230690359 gbest87.seq 230688275 gbest88.seq 230688026 gbest89.seq 230689340 gbest9.seq 230687766 gbest90.seq 230690136 gbest91.seq 230687990 gbest92.seq 230689512 gbest93.seq 230689142 gbest94.seq 230687888 gbest95.seq 230690179 gbest96.seq 230690708 gbest97.seq 230689653 gbest98.seq 230689496 gbest99.seq 48222731 gbgen.idx 230689539 gbgss1.seq 230687499 gbgss10.seq 227152373 gbgss100.seq 228508251 gbgss101.seq 228733056 gbgss102.seq 228616567 gbgss103.seq 227899677 gbgss104.seq 227421937 gbgss105.seq 230688530 gbgss106.seq 230688097 gbgss107.seq 230688308 gbgss108.seq 230690167 gbgss109.seq 230691316 gbgss11.seq 230688976 gbgss110.seq 230687567 gbgss111.seq 230689704 gbgss112.seq 230688743 gbgss113.seq 230687767 gbgss114.seq 230688844 gbgss115.seq 230689836 gbgss116.seq 230688014 gbgss117.seq 230688965 gbgss118.seq 230687876 gbgss119.seq 230688172 gbgss12.seq 230688703 gbgss120.seq 230688308 gbgss121.seq 230688898 gbgss122.seq 230688365 gbgss123.seq 113228653 gbgss124.seq 250001108 gbgss125.seq 250000539 gbgss126.seq 250002531 gbgss127.seq 250001647 gbgss128.seq 250003748 gbgss129.seq 230688583 gbgss13.seq 250001626 gbgss130.seq 250000398 gbgss131.seq 250000981 gbgss132.seq 250000560 gbgss133.seq 250000466 gbgss134.seq 250001745 gbgss135.seq 250000977 gbgss136.seq 250001470 gbgss137.seq 250002479 gbgss138.seq 250003175 gbgss139.seq 230689806 gbgss14.seq 250001657 gbgss140.seq 250004137 gbgss141.seq 250000539 gbgss142.seq 250000004 gbgss143.seq 250000816 gbgss144.seq 250002150 gbgss145.seq 250000950 gbgss146.seq 250000365 gbgss147.seq 250001441 gbgss148.seq 250000359 gbgss149.seq 230687747 gbgss15.seq 250002344 gbgss150.seq 93172196 gbgss151.seq 230688886 gbgss16.seq 230689013 gbgss17.seq 230690153 gbgss18.seq 230687966 gbgss19.seq 230689883 gbgss2.seq 230689352 gbgss20.seq 230688687 gbgss21.seq 
230689885 gbgss22.seq 230689578 gbgss23.seq 230688192 gbgss24.seq 230689705 gbgss25.seq 230689250 gbgss26.seq 230687894 gbgss27.seq 230690186 gbgss28.seq 230689619 gbgss29.seq 230688931 gbgss3.seq 230687910 gbgss30.seq 230689592 gbgss31.seq 230689547 gbgss32.seq 230689976 gbgss33.seq 230689284 gbgss34.seq 230688552 gbgss35.seq 230689696 gbgss36.seq 230689247 gbgss37.seq 230688531 gbgss38.seq 230687465 gbgss39.seq 230689635 gbgss4.seq 230688188 gbgss40.seq 230687849 gbgss41.seq 230689966 gbgss42.seq 230688523 gbgss43.seq 230688935 gbgss44.seq 230690360 gbgss45.seq 230689807 gbgss46.seq 230690232 gbgss47.seq 230688899 gbgss48.seq 230690507 gbgss49.seq 230689060 gbgss5.seq 230688252 gbgss50.seq 230687651 gbgss51.seq 230690526 gbgss52.seq 230689363 gbgss53.seq 230689525 gbgss54.seq 230689516 gbgss55.seq 230688846 gbgss56.seq 230687892 gbgss57.seq 230690101 gbgss58.seq 230689534 gbgss59.seq 230690103 gbgss6.seq 230690641 gbgss60.seq 229393941 gbgss61.seq 230688508 gbgss62.seq 230687757 gbgss63.seq 230687737 gbgss64.seq 230689108 gbgss65.seq 230687463 gbgss66.seq 230688698 gbgss67.seq 230689392 gbgss68.seq 230688119 gbgss69.seq 230688253 gbgss7.seq 230689096 gbgss70.seq 230688451 gbgss71.seq 230689398 gbgss72.seq 230688133 gbgss73.seq 230687770 gbgss74.seq 230687476 gbgss75.seq 230687704 gbgss76.seq 230689609 gbgss77.seq 230689327 gbgss78.seq 230689218 gbgss79.seq 230688823 gbgss8.seq 230688320 gbgss80.seq 197964465 gbgss81.seq 194762549 gbgss82.seq 221685584 gbgss83.seq 230688320 gbgss84.seq 230689904 gbgss85.seq 230687591 gbgss86.seq 230689970 gbgss87.seq 230688660 gbgss88.seq 230689589 gbgss89.seq 230690336 gbgss9.seq 230689791 gbgss90.seq 230689802 gbgss91.seq 230687510 gbgss92.seq 230688176 gbgss93.seq 230687595 gbgss94.seq 230687975 gbgss95.seq 230689668 gbgss96.seq 230688138 gbgss97.seq 230688229 gbgss98.seq 230689837 gbgss99.seq 250005376 gbhtc1.seq 250000866 gbhtc2.seq 250003853 gbhtc3.seq 250002695 gbhtc4.seq 250003329 gbhtc5.seq 250001858 gbhtc6.seq 133348561 
gbhtc7.seq 250063914 gbhtg1.seq 250019536 gbhtg10.seq 250024110 gbhtg11.seq 250012719 gbhtg12.seq 250041897 gbhtg13.seq 250299934 gbhtg14.seq 250314905 gbhtg15.seq 250267778 gbhtg16.seq 250045541 gbhtg17.seq 250138830 gbhtg18.seq 250002541 gbhtg19.seq 250005030 gbhtg2.seq 250016110 gbhtg20.seq 250273887 gbhtg21.seq 250283293 gbhtg22.seq 250086593 gbhtg23.seq 250041270 gbhtg24.seq 250096183 gbhtg25.seq 250148018 gbhtg26.seq 250186228 gbhtg27.seq 250168277 gbhtg28.seq 250158229 gbhtg29.seq 250071408 gbhtg3.seq 250014134 gbhtg30.seq 250312562 gbhtg31.seq 250080558 gbhtg32.seq 250151230 gbhtg33.seq 250036885 gbhtg34.seq 250223822 gbhtg35.seq 250241528 gbhtg36.seq 250156045 gbhtg37.seq 250096076 gbhtg38.seq 250201712 gbhtg39.seq 250077931 gbhtg4.seq 250141386 gbhtg40.seq 250056907 gbhtg41.seq 250021086 gbhtg42.seq 250102786 gbhtg43.seq 250352212 gbhtg44.seq 250216325 gbhtg45.seq 250035703 gbhtg46.seq 250156340 gbhtg47.seq 250143222 gbhtg48.seq 250050358 gbhtg49.seq 250084487 gbhtg5.seq 250003229 gbhtg50.seq 250617941 gbhtg51.seq 250125926 gbhtg52.seq 250052516 gbhtg53.seq 250231599 gbhtg54.seq 250163142 gbhtg55.seq 250022983 gbhtg56.seq 250105117 gbhtg57.seq 250292331 gbhtg58.seq 250083887 gbhtg59.seq 250211487 gbhtg6.seq 250056908 gbhtg60.seq 250043757 gbhtg61.seq 250006494 gbhtg62.seq 250391528 gbhtg63.seq 250260684 gbhtg64.seq 250141488 gbhtg65.seq 250022760 gbhtg66.seq 250084325 gbhtg67.seq 118259505 gbhtg68.seq 250150851 gbhtg7.seq 250119872 gbhtg8.seq 250038777 gbhtg9.seq 250048074 gbinv1.seq 250313705 gbinv2.seq 250007633 gbinv3.seq 250005261 gbinv4.seq 250018063 gbinv5.seq 250228201 gbinv6.seq 244683053 gbinv7.seq 1206882022 gbjou.idx 250000575 gbmam1.seq 94040508 gbmam2.seq 27343273 gbnew.txt 250000603 gbpat1.seq 250002509 gbpat10.seq 250001599 gbpat11.seq 250000866 gbpat12.seq 250000791 gbpat13.seq 250003365 gbpat14.seq 250001184 gbpat15.seq 250001623 gbpat16.seq 250000785 gbpat17.seq 133968349 gbpat18.seq 250000020 gbpat2.seq 250000343 gbpat3.seq 250010087 
gbpat4.seq 250000523 gbpat5.seq 250000863 gbpat6.seq 250000787 gbpat7.seq 250005046 gbpat8.seq 250001321 gbpat9.seq 40991080 gbphg.seq 250127424 gbpln1.seq 250001065 gbpln10.seq 250004402 gbpln11.seq 250000216 gbpln12.seq 250002252 gbpln13.seq 250011138 gbpln14.seq 250000421 gbpln15.seq 203552675 gbpln16.seq 250058361 gbpln2.seq 250000536 gbpln3.seq 250001904 gbpln4.seq 250003459 gbpln5.seq 250037141 gbpln6.seq 250095747 gbpln7.seq 258565371 gbpln8.seq 250570165 gbpln9.seq 250082933 gbpri1.seq 250272893 gbpri10.seq 250244127 gbpri11.seq 250023879 gbpri12.seq 250086229 gbpri13.seq 250064540 gbpri14.seq 250103447 gbpri15.seq 250004989 gbpri16.seq 250062970 gbpri17.seq 250145477 gbpri18.seq 250149724 gbpri19.seq 250127198 gbpri2.seq 250098641 gbpri20.seq 250119728 gbpri21.seq 250027726 gbpri22.seq 250000893 gbpri23.seq 250151467 gbpri24.seq 250001206 gbpri25.seq 250005684 gbpri26.seq 250084717 gbpri27.seq 250000242 gbpri28.seq 40439311 gbpri29.seq 250043993 gbpri3.seq 250012952 gbpri4.seq 250151866 gbpri5.seq 250200111 gbpri6.seq 250062749 gbpri7.seq 250086546 gbpri8.seq 250167634 gbpri9.seq 196340 gbrel.txt 250021812 gbrod1.seq 250011379 gbrod10.seq 250062136 gbrod11.seq 250106566 gbrod12.seq 250164476 gbrod13.seq 250047578 gbrod14.seq 250105598 gbrod15.seq 250090172 gbrod16.seq 250001064 gbrod17.seq 250000803 gbrod18.seq 250066893 gbrod19.seq 250209318 gbrod2.seq 119688237 gbrod20.seq 250155610 gbrod3.seq 250164443 gbrod4.seq 250009999 gbrod5.seq 250186151 gbrod6.seq 250134519 gbrod7.seq 250083875 gbrod8.seq 250252884 gbrod9.seq 3755841797 gbsdr.txt 1563739 gbsec.idx 250002224 gbsts1.seq 250000488 gbsts10.seq 250004141 gbsts11.seq 250002207 gbsts12.seq 250001763 gbsts13.seq 14123992 gbsts14.seq 250000838 gbsts2.seq 250000517 gbsts3.seq 250000048 gbsts4.seq 250001168 gbsts5.seq 250000738 gbsts6.seq 250002573 gbsts7.seq 250001086 gbsts8.seq 250002868 gbsts9.seq 102451677 gbsyn.seq 416265 gbuna.seq 250002795 gbvrl1.seq 250001686 gbvrl2.seq 250000995 gbvrl3.seq 
 250002744     gbvrl4.seq
  60232698     gbvrl5.seq
 250000875     gbvrt1.seq
 250237833     gbvrt2.seq
 250000215     gbvrt3.seq
 250053723     gbvrt4.seq
 250024974     gbvrt5.seq
 250144183     gbvrt6.seq
 250000802     gbvrt7.seq
 250043627     gbvrt8.seq
 194018932     gbvrt9.seq

2.2.6 Per-Division Statistics

The following table provides a per-division breakdown of the number of
sequence entries and the total number of bases of DNA/RNA in each non-WGS
sequence data file:

Division     Entries    Bases

BCT1         24178      102385489
BCT10        24732      99733479
BCT11        25846      72882800
BCT2         7477       107448276
BCT3         297        114527223
BCT4         28078      103290660
BCT5         29125      98262105
BCT6         64424      85233641
BCT7         6321       114735921
BCT8         664        98640047
BCT9         2194       115416062
ENV1         96941      71366664
ENV2         77937      66628936
EST1         68034      26258363
EST10        76493      29723216
EST100       73824      44049389
EST101       71798      43057367
EST102       70073      44735908
EST103       74383      36698498
EST104       69852      31133723
EST105       65196      31111523
EST106       66871      40104586
EST107       70422      36527708
EST108       69801      45534787
EST109       75910      34468867
EST11        74698      28608637
EST110       74488      27738339
EST111       75675      25952974
EST112       73420      40107286
EST113       63837      31083361
EST114       78570      47302196
EST115       79182      44966931
EST116       72496      45237784
EST117       68421      43364912
EST118       73106      47053943
EST119       67665      44923009
EST12        77211      30612340
EST120       73830      43811396
EST121       75342      44907957
EST122       73000      48061879
EST123       75391      48431121
EST124       72962      46772737
EST125       78255      37666394
EST126       78089      26440409
EST127       80296      48717674
EST128       74799      43674319
EST129       66653      37413205
EST13        76235      29134866
EST130       70869      35181357
EST131       67170      37819020
EST132       66640      36214297
EST133       69543      42578969
EST134       72971      41824812
EST135       68583      43021823
EST136       69196      41625596
EST137       73928      42504761
EST138       68500      41647351
EST139       58516      33402747
EST14        77983      31413784
EST140       96311      52226110
EST141       88941      51170012
EST142       78986      40667090
EST143       107359     58031849
EST144       108294     57938594
EST145       98546      54331141
EST146       73480      45116763
EST147       94382      55478877
EST148       99608      59590670
EST149       90515      51954405
EST15        74272      31341755
EST150       72154      43593260
EST151       68336
33788088 EST152 64120 29275779 EST153 54776 24940338 EST154 64461 31275788 EST155 61640 31276270 EST156 66627 35767588 EST157 72583 51625277 EST158 64936 44365065 EST159 82703 48868456 EST16 75803 33403926 EST160 58287 29715231 EST161 62402 31767812 EST162 67093 40383354 EST163 65263 37445491 EST164 63096 42549470 EST165 88313 44511613 EST166 91978 45616284 EST167 96762 57348837 EST168 102616 59531905 EST169 95617 53123833 EST17 80848 33051568 EST170 94510 40936127 EST171 92439 45441241 EST172 93315 43575347 EST173 90270 37114505 EST174 85641 41299562 EST175 65195 44632407 EST176 70698 38417501 EST177 57987 37457677 EST178 70100 38595514 EST179 73980 26349464 EST18 81510 32879782 EST180 74071 47067228 EST181 76814 38733829 EST182 66987 33263412 EST183 69106 40880062 EST184 69450 54419961 EST185 67604 36830218 EST186 68895 34681246 EST187 70102 57138611 EST188 67883 41914996 EST189 72328 36311227 EST19 78262 31520557 EST190 69757 52026485 EST191 69604 55699263 EST192 64621 47731024 EST193 64481 47103364 EST194 65110 47645777 EST195 64926 45573255 EST196 64145 48298596 EST197 64501 41127279 EST198 63717 34665249 EST199 63415 36681500 EST2 74718 28667916 EST20 74204 30887609 EST200 76928 41948302 EST201 88270 54794572 EST202 69369 44549117 EST203 97606 59861190 EST204 107335 66515004 EST205 109001 64122456 EST206 106255 65470202 EST207 103390 66486829 EST208 110756 50996925 EST209 90969 54906334 EST21 74911 34225358 EST210 109888 49020731 EST211 91917 52711377 EST212 67881 37173118 EST213 69906 52415454 EST214 68675 58767931 EST215 76132 48360786 EST216 80986 40180554 EST217 76420 51905039 EST218 71480 50881710 EST219 62411 36740306 EST22 73177 29510851 EST220 27793 10457982 EST221 27863 10451790 EST222 26737 9811173 EST223 26892 9455364 EST224 26972 9223617 EST225 27225 10198511 EST226 27181 9688414 EST227 27227 10157218 EST228 27213 11539959 EST229 27189 11441373 EST23 77453 32919525 EST230 27236 10552452 EST231 27538 9538875 EST232 26999 8860128 EST233 27550 
11198465 EST234 27880 10750305 EST235 27440 11063389 EST236 26629 11740960 EST237 26869 11778906 EST238 27118 10575470 EST239 26961 11486940 EST24 74075 31991301 EST240 27097 11758417 EST241 27239 11301811 EST242 27270 10779993 EST243 27236 10765352 EST244 27172 10545170 EST245 25749 15488933 EST246 25051 16641646 EST247 28671 18091177 EST248 101145 34542629 EST249 80668 41232413 EST25 75075 32928853 EST250 68808 44316226 EST251 69183 44957002 EST252 69099 44312476 EST253 64471 40028417 EST254 80494 44600995 EST255 62166 43168916 EST256 68625 42422626 EST257 61613 30941824 EST258 102723 48009091 EST259 84086 44535073 EST26 73678 29979037 EST260 72188 38281534 EST261 65216 31383595 EST262 67197 40330310 EST263 72141 34903678 EST264 71003 33500930 EST265 75171 46187587 EST266 64172 36067021 EST267 67876 39318249 EST268 76779 37711417 EST269 82139 45491356 EST27 75034 32413096 EST270 73970 49243351 EST271 78099 53679016 EST272 101033 49725627 EST273 93634 45156776 EST274 76909 43836067 EST275 70342 46344210 EST276 75648 33520853 EST277 69912 30485034 EST278 72177 37621364 EST279 64245 37628657 EST28 106370 50117310 EST280 60125 38717914 EST281 73655 43147910 EST282 62448 43290519 EST283 61400 33791025 EST284 84476 44106316 EST285 83632 40778711 EST286 71967 46503943 EST287 104006 58580132 EST288 121550 47797797 EST289 83266 38744849 EST29 97903 45867154 EST290 65894 35397962 EST291 71631 45887358 EST292 80879 41111842 EST293 71229 42051174 EST294 70563 40686705 EST295 63114 35348347 EST296 61305 37305320 EST297 54171 30868846 EST298 72005 51794159 EST299 73676 43971191 EST3 73590 29860872 EST30 99107 53271438 EST300 71149 41650632 EST301 69143 36211301 EST302 77724 48089524 EST303 94110 61600526 EST304 71144 40917019 EST305 77967 43665829 EST306 64156 55609759 EST307 64244 46665835 EST308 65231 41968418 EST309 62038 35147744 EST31 75765 39042340 EST310 73114 50974876 EST311 49968 26991508 EST312 52959 26284652 EST313 71859 39120346 EST314 60343 32458681 EST315 62212 
31548596 EST316 64864 41760286 EST317 85764 43581198 EST318 71791 42391644 EST319 67949 43629244 EST32 66727 61911354 EST320 66252 37697379 EST321 67696 44255276 EST322 88762 49992996 EST323 84867 50644508 EST324 75469 41681079 EST325 63213 36591335 EST326 67843 34930725 EST327 69311 38636066 EST328 47238 24161605 EST329 68791 39800853 EST33 80323 48570533 EST330 74554 42465092 EST331 98572 46385232 EST332 72653 50118178 EST333 74589 47944011 EST334 70738 43650584 EST335 61204 34114280 EST336 88433 38243011 EST337 68433 40720794 EST338 51682 36585688 EST339 63159 37139945 EST34 91639 44852446 EST340 68989 44731836 EST341 100942 55384274 EST342 72388 37226232 EST343 80438 44419304 EST344 61013 37843415 EST345 42469 22577388 EST346 57000 34279942 EST347 85207 46155861 EST348 77522 46219157 EST349 72637 44964250 EST35 85023 46131514 EST350 72941 39319921 EST351 61362 39810035 EST352 72996 40263199 EST353 57392 35339695 EST354 75958 31429946 EST355 80049 26432341 EST356 70047 41447550 EST357 66539 36989028 EST358 69924 38836351 EST359 81270 48917671 EST36 99045 51467534 EST360 63342 43601708 EST361 74188 43907342 EST362 60684 34253635 EST363 67564 41098021 EST364 79934 51321725 EST365 82790 48715038 EST366 69327 41343734 EST367 59416 42013020 EST368 57540 41432257 EST369 57330 41014166 EST37 97977 49109135 EST370 66849 40561438 EST371 68445 41342564 EST372 58969 37142827 EST373 58301 38889661 EST374 59277 40737662 EST375 60567 41736246 EST376 56043 42106401 EST377 46205 33080514 EST378 83310 38976556 EST379 52349 40858478 EST38 100289 48671542 EST380 67511 45522344 EST381 66275 37751607 EST382 69162 40531867 EST383 121392 60834961 EST384 74872 46669728 EST385 79278 57895929 EST386 39326 21790726 EST387 50800 55023208 EST388 46852 58191065 EST389 67291 46518190 EST39 92829 35930195 EST390 71712 42112449 EST391 69976 44346343 EST392 66817 43893566 EST393 66858 42853831 EST394 58801 49362214 EST395 60483 39730897 EST396 56137 40840614 EST397 55766 37974688 EST398 66450 
47739547 EST399 67367 47534339 EST4 73953 28155396 EST40 68625 18213698 EST400 60772 44978818 EST401 52124 37716642 EST402 52693 38639486 EST403 59742 42606740 EST404 61134 36465591 EST405 68030 44290464 EST406 71100 30863255 EST407 69738 25503185 EST408 73127 27355610 EST409 73953 26008459 EST41 68571 18332216 EST410 77446 27363749 EST411 71114 24840435 EST412 70857 29369982 EST413 55018 20110904 EST42 63818 19559428 EST43 43335 11801739 EST44 43053 11912554 EST45 42854 11374871 EST46 74517 29590770 EST47 96513 44266651 EST48 92531 46180172 EST49 89529 42636250 EST5 48552 15462195 EST50 102012 51711431 EST51 101988 52118113 EST52 75825 32673831 EST53 66286 28740252 EST54 73246 32539213 EST55 70612 29761701 EST56 80145 32673221 EST57 75426 29674971 EST58 70847 28606339 EST59 64970 29785511 EST6 55001 17440741 EST60 73504 32964443 EST61 78668 35010515 EST62 73533 29364583 EST63 75184 25456996 EST64 87102 42165342 EST65 41338 11596994 EST66 40036 11015959 EST67 40050 12274199 EST68 40384 12475751 EST69 40447 12063051 EST7 74090 29152490 EST70 40341 13083220 EST71 40307 12594129 EST72 40316 12297306 EST73 39953 11871555 EST74 40661 12779122 EST75 41341 11642291 EST76 41024 13175949 EST77 40897 12680974 EST78 41474 13259494 EST79 45327 12970218 EST8 75313 30469369 EST80 39759 25208291 EST81 42040 21413185 EST82 46384 18746216 EST83 50008 22810516 EST84 51384 21339916 EST85 58216 23996625 EST86 75973 31735272 EST87 75430 28517845 EST88 72942 36011126 EST89 77071 44840196 EST9 77448 29829010 EST90 77267 42899903 EST91 76078 36758282 EST92 76024 43097547 EST93 72725 38088939 EST94 73141 29999448 EST95 74774 46931078 EST96 70192 30012833 EST97 77232 49090218 EST98 64797 35332837 EST99 71811 36873360 GSS1 90825 38838374 GSS10 75205 43833923 GSS100 75306 40823369 GSS101 73732 44781763 GSS102 73471 45439590 GSS103 73601 45109294 GSS104 73207 45084917 GSS105 73612 43942445 GSS106 79969 52598358 GSS107 85771 58476444 GSS108 81501 53730881 GSS109 84481 52656059 GSS11 69998 
35693198 GSS110 84194 49571606 GSS111 81202 66229335 GSS112 88144 46327712 GSS113 92496 55610317 GSS114 74642 48789051 GSS115 87689 62378256 GSS116 82530 61393627 GSS117 78321 44275723 GSS118 86221 51035778 GSS119 92119 54251476 GSS12 73548 38676139 GSS120 85323 54951919 GSS121 84677 56154774 GSS122 77201 64532837 GSS123 76624 68996277 GSS124 38201 32283960 GSS125 87189 64730609 GSS126 83935 63019542 GSS127 101312 46889474 GSS128 68991 58743647 GSS129 68487 58900293 GSS13 76489 38776094 GSS130 69543 56768164 GSS131 69807 56256304 GSS132 70565 55587938 GSS133 86110 73189295 GSS134 85143 41078693 GSS135 70890 47208187 GSS136 110589 72587266 GSS137 85505 35974685 GSS138 90317 68090200 GSS139 72983 60685566 GSS14 71399 32053200 GSS140 70268 59091050 GSS141 66045 63246089 GSS142 76546 56958716 GSS143 120678 73151584 GSS144 118120 75833491 GSS145 104470 58807190 GSS146 87785 55611458 GSS147 93584 57308817 GSS148 84855 45789306 GSS149 89457 61585540 GSS15 70902 35386698 GSS150 107826 53936602 GSS151 36002 16069131 GSS16 77588 45718653 GSS17 70917 33380301 GSS18 58098 27809721 GSS19 56846 28928903 GSS2 89386 39537205 GSS20 57739 26520778 GSS21 61874 30956779 GSS22 64594 36359309 GSS23 57554 27013298 GSS24 67323 42717207 GSS25 67065 27611124 GSS26 58099 25563057 GSS27 66921 32502872 GSS28 63807 32259711 GSS29 79745 40297235 GSS3 87699 41940144 GSS30 81858 39884690 GSS31 74231 40338257 GSS32 70483 48496351 GSS33 80018 37638203 GSS34 75905 40292108 GSS35 74680 39228957 GSS36 87602 58164580 GSS37 87588 58187598 GSS38 85666 44866144 GSS39 85576 49293006 GSS4 79303 41207047 GSS40 87563 38537051 GSS41 82644 34586485 GSS42 81262 56910251 GSS43 79484 57507202 GSS44 72038 47481719 GSS45 72039 47425921 GSS46 77903 45328444 GSS47 78067 39654289 GSS48 83666 58381510 GSS49 86562 66336101 GSS5 79041 40731282 GSS50 82870 53037723 GSS51 93465 59769305 GSS52 88505 58833581 GSS53 75140 39946971 GSS54 74721 43056824 GSS55 86001 45533102 GSS56 88387 58835734 GSS57 76126 67457297 GSS58 72027 
79237914 GSS59 88526 68371381 GSS6 78255 38895462 GSS60 86528 57613619 GSS61 63554 45301219 GSS62 69612 47963729 GSS63 89663 67257209 GSS64 86239 57970521 GSS65 87443 53960643 GSS66 87639 56716628 GSS67 97285 57999342 GSS68 100603 54541239 GSS69 100689 54430819 GSS7 77539 39418357 GSS70 101342 53604611 GSS71 102376 52291710 GSS72 102372 52297240 GSS73 102698 51883085 GSS74 102115 52621929 GSS75 98275 57526639 GSS76 90158 69774444 GSS77 89834 70780761 GSS78 88016 69598605 GSS79 87824 69678345 GSS8 76330 38105833 GSS80 88777 62604882 GSS81 80874 23716207 GSS82 77788 24527825 GSS83 88355 36460139 GSS84 84331 50966209 GSS85 80712 49105320 GSS86 89514 64699622 GSS87 78425 62241475 GSS88 78225 79463030 GSS89 76581 56011726 GSS9 72576 37287728 GSS90 94927 50984059 GSS91 75757 40496969 GSS92 85431 50782635 GSS93 70486 60652216 GSS94 85643 52902404 GSS95 82610 61003156 GSS96 87349 57520670 GSS97 83304 57006537 GSS98 89868 55793604 GSS99 80412 66365103 HTC1 32151 55611690 HTC2 31639 66256003 HTC3 75274 40529673 HTC4 78751 70885966 HTC5 65519 62777201 HTC6 67755 70959762 HTC7 29110 54604072 HTG1 1315 188873586 HTG10 1239 186540769 HTG11 1427 184142515 HTG12 880 191849995 HTG13 751 192251270 HTG14 741 192399127 HTG15 776 192381500 HTG16 803 192172812 HTG17 764 192209605 HTG18 2009 171976103 HTG19 1030 188145586 HTG2 2562 185987793 HTG20 1070 188102921 HTG21 778 192106674 HTG22 925 190500494 HTG23 880 190905963 HTG24 830 191140525 HTG25 772 192114442 HTG26 843 191432407 HTG27 851 191581690 HTG28 964 189868196 HTG29 919 190713006 HTG3 2450 185370355 HTG30 943 190424142 HTG31 867 191521230 HTG32 957 189832466 HTG33 898 190949533 HTG34 845 191596013 HTG35 833 191899949 HTG36 864 191353870 HTG37 929 190544493 HTG38 968 190396243 HTG39 932 190741139 HTG4 2527 188428356 HTG40 1115 188374887 HTG41 1112 188261954 HTG42 1380 186505114 HTG43 1139 188941560 HTG44 1131 188742320 HTG45 1144 191560408 HTG46 1275 190671410 HTG47 1181 191521698 HTG48 1167 191358793 HTG49 1075 191138880 HTG5 
1281 185666662 HTG50 937 189485804 HTG51 997 190703476 HTG52 1026 189987121 HTG53 989 189943966 HTG54 1038 190573242 HTG55 945 190180674 HTG56 1021 190606797 HTG57 968 189447378 HTG58 956 189876028 HTG59 1267 188754233 HTG6 1274 185376276 HTG60 1715 185821991 HTG61 1184 192127291 HTG62 1172 190720629 HTG63 1487 189008132 HTG64 1040 194127227 HTG65 1030 194041439 HTG66 1039 193769647 HTG67 831 182248760 HTG68 472 83647771 HTG7 1247 185590985 HTG8 1288 185020375 HTG9 1183 187103109 INV1 16967 162393862 INV2 1581 165923317 INV3 36312 113475369 INV4 74536 75532490 INV5 78385 73625712 INV6 43999 103961193 INV7 46323 99864729 MAM1 54570 112505471 MAM2 22989 36403099 PAT1 222684 70173292 PAT10 120554 49027493 PAT11 95749 59861177 PAT12 140683 55749253 PAT13 145625 60098102 PAT14 140442 92611565 PAT15 115549 113135050 PAT16 109093 113203901 PAT17 133099 89625935 PAT18 101340 32714762 PAT2 194376 84713809 PAT3 172017 95883630 PAT4 149101 105539595 PAT5 142101 87656121 PAT6 113035 112366763 PAT7 142044 93702514 PAT8 132580 95531649 PAT9 135150 83335936 PHG 2671 16358873 PLN1 34379 122796967 PLN10 3442 173930019 PLN11 60873 63498301 PLN12 77379 78475276 PLN13 72843 77438163 PLN14 27465 125853960 PLN15 45143 113259657 PLN16 38333 79422393 PLN2 1373 176711614 PLN3 10555 155914143 PLN4 77563 77933470 PLN5 66824 65852757 PLN6 32157 57531572 PLN7 1276 164417819 PLN8 1407 182032896 PLN9 6 197759100 PRI1 21107 141783161 PRI10 1438 181584991 PRI11 1297 179103986 PRI12 1540 178279456 PRI13 1604 179607466 PRI14 1483 186451691 PRI15 20624 157857392 PRI16 43770 101131296 PRI17 18746 132623458 PRI18 1608 183755654 PRI19 1721 183432183 PRI2 1461 173265669 PRI20 2086 181633278 PRI21 1993 183846481 PRI22 22439 146214995 PRI23 49662 81060983 PRI24 25590 103496042 PRI25 9551 171193402 PRI26 20712 154881998 PRI27 51453 116005781 PRI28 48816 112932240 PRI29 10661 15212562 PRI3 1278 186100680 PRI4 1330 184316729 PRI5 1193 181067963 PRI6 1204 179442596 PRI7 1233 180072893 PRI8 1370 174608747 PRI9 
          1232   177946047
ROD1      8865   170006573
ROD10     1004   183299455
ROD11      964   183756659
ROD12     1040   187391858
ROD13      970   181926136
ROD14    27773   143100191
ROD15     1159   190071161
ROD16     1212   193286217
ROD17     4088   189208228
ROD18    42985    72767659
ROD19    24552   131618989
ROD2       925   174159213
ROD20    31558    41816586
ROD3       902   174453768
ROD4       908   174104148
ROD5       930   175114001
ROD6       980   180477162
ROD7       979   180572821
ROD8       991   181593783
ROD9      1012   183609098
STS1     82071    35434098
STS10    58209    44439069
STS11    57991    43708087
STS12    77107    37977210
STS13    88391    38933928
STS14     6303     3090630
STS2     83575    43817876
STS3     78138    31777979
STS4     64795    33318894
STS5     54886    31928904
STS6     54705    32286995
STS7     54689    32212962
STS8     56491    39089600
STS9     58083    44556609
SYN      23702    32276700
UNA        213      114659
VRL1     72431    65643269
VRL2     72034    65678979
VRL3     71763    68845127
VRL4     75546    63646019
VRL5     14009    20511323
VRT1     55730   108726857
VRT2     18203   163184657
VRT3     71269    83953381
VRT4     38173    76225145
VRT5      1187   193516089
VRT6      1283   193176113
VRT7      8196   182306744
VRT8     16951   169328344
VRT9     32905   100281003

2.2.7 Selected Per-Organism Statistics

The following table provides the number of entries and bases of DNA/RNA
for the twenty most sequenced organisms in Release 149.0 (chloroplast and
mitochondrial sequences not included, and Whole Genome Shotgun sequences
not included):

 Entries        Bases  Species

 8589480  11201918270  Homo sapiens
 6217605   7490526361  Mus musculus
 1015248   5668751953  Rattus norvegicus
  840452   2101785993  Danio rerio
 1129487   1947985221  Bos taurus
 2565244   1703766839  Zea mays
  354924   1166526669  Oryza sativa (japonica cultivar-group)
 1102903    891062906  Xenopus tropicalis
 1407513    805167252  Canis familiaris
  530740    782116888  Drosophila melanogaster
  752090    650155377  Gallus gallus
  979323    647137057  Arabidopsis thaliana
  207699    527756706  Pan troglodytes
  784574    462578028  Sorghum bicolor
  627530    423993557  Sus scrofa
  693814    419652405  Ciona intestinalis
  596261    404130458  Brassica oleracea
  387807    403249617  Medicago truncatula
   65232    383498886  Macaca mulatta
  609078    346620305
Triticum aestivum 2.2.8 Growth of GenBank The following table lists the number of bases and the number of sequence records in each release of GenBank, beginning with Release 3 in 1982. From 1982 to the present, the number of bases in GenBank has doubled approximately every 18 months. Release Date Base Pairs Entries 3 Dec 1982 680338 606 14 Nov 1983 2274029 2427 20 May 1984 3002088 3665 24 Sep 1984 3323270 4135 25 Oct 1984 3368765 4175 26 Nov 1984 3689752 4393 32 May 1985 4211931 4954 36 Sep 1985 5204420 5700 40 Feb 1986 5925429 6642 42 May 1986 6765476 7416 44 Aug 1986 8442357 8823 46 Nov 1986 9615371 9978 48 Feb 1987 10961380 10913 50 May 1987 13048473 12534 52 Aug 1987 14855145 14020 53 Sep 1987 15514776 14584 54 Dec 1987 16752872 15465 55 Mar 1988 19156002 17047 56 Jun 1988 20795279 18226 57 Sep 1988 22019698 19044 57.1 Oct 1988 23800000 20579 58 Dec 1988 24690876 21248 59 Mar 1989 26382491 22479 60 Jun 1989 31808784 26317 61 Sep 1989 34762585 28791 62 Dec 1989 37183950 31229 63 Mar 1990 40127752 33377 64 Jun 1990 42495893 35100 65 Sep 1990 49179285 39533 66 Dec 1990 51306092 41057 67 Mar 1991 55169276 43903 68 Jun 1991 65868799 51418 69 Sep 1991 71947426 55627 70 Dec 1991 77337678 58952 71 Mar 1992 83894652 65100 72 Jun 1992 92160761 71280 73 Sep 1992 101008486 78608 74 Dec 1992 120242234 97084 75 Feb 1993 126212259 106684 76 Apr 1993 129968355 111911 77 Jun 1993 138904393 120134 78 Aug 1993 147215633 131328 79 Oct 1993 157152442 143492 80 Dec 1993 163802597 150744 81 Feb 1994 173261500 162946 82 Apr 1994 180589455 169896 83 Jun 1994 191393939 182753 84 Aug 1994 201815802 196703 85 Oct 1994 217102462 215273 86 Dec 1994 230485928 237775 87 Feb 1995 248499214 269478 88 Apr 1995 286094556 352414 89 Jun 1995 318624568 425211 90 Aug 1995 353713490 492483 91 Oct 1995 384939485 555694 92 Dec 1995 425860958 620765 93 Feb 1996 463758833 685693 94 Apr 1996 499127741 744295 95 Jun 1996 551750920 835487 96 Aug 1996 602072354 920588 97 Oct 1996 651972984 1021211 98 Dec 1996 
730552938 1114581 99 Feb 1997 786898138 1192505 100 Apr 1997 842864309 1274747 101 Jun 1997 966993087 1491069 102 Aug 1997 1053474516 1610848 103 Oct 1997 1160300687 1765847 104 Dec 1997 1258290513 1891953 105 Feb 1998 1372368913 2042325 106 Apr 1998 1502542306 2209232 107 Jun 1998 1622041465 2355928 108 Aug 1998 1797137713 2532359 109 Oct 1998 2008761784 2837897 110 Dec 1998 2162067871 3043729 111 Apr 1999 2569578208 3525418 112 Jun 1999 2974791993 4028171 113 Aug 1999 3400237391 4610118 114 Oct 1999 3841163011 4864570 115 Dec 1999 4653932745 5354511 116 Feb 2000 5805414935 5691170 117 Apr 2000 7376080723 6215002 118 Jun 2000 8604221980 7077491 119 Aug 2000 9545724824 8214339 120 Oct 2000 10335692655 9102634 121 Dec 2000 11101066288 10106023 122 Feb 2001 11720120326 10896781 123 Apr 2001 12418544023 11545572 124 Jun 2001 12973707065 12243766 125 Aug 2001 13543364296 12813516 126 Oct 2001 14396883064 13602262 127 Dec 2001 15849921438 14976310 128 Feb 2002 17089143893 15465325 129 Apr 2002 19072679701 16769983 130 Jun 2002 20648748345 17471130 131 Aug 2002 22616937182 18197119 132 Oct 2002 26525934656 19808101 133 Dec 2002 28507990166 22318883 134 Feb 2003 29358082791 23035823 135 Apr 2003 31099264455 24027936 136 Jun 2003 32528249295 25592865 137 Aug 2003 33865022251 27213748 138 Oct 2003 35599621471 29819397 139 Dec 2003 36553368485 30968418 140 Feb 2004 37893844733 32549400 141 Apr 2004 38989342565 33676218 142 Jun 2004 40325321348 35532003 143 Aug 2004 41808045653 37343937 144 Oct 2004 43194602655 38941263 145 Dec 2004 44575745176 40604319 146 Feb 2005 46849831226 42734478 147 Apr 2005 48235738567 44202133 148 Jun 2005 49398852122 45236251 149 Aug 2005 51674486881 46947388 The following table lists the number of bases and the number of sequence records for WGS sequences processed at GenBank, beginning with Release 129.0 in April of 2002. Please note that WGS data are not distributed in conjunction with GenBank releases. 
Rather, per-project data files are continuously available in the WGS areas
of the NCBI FTP site:

    ftp://ftp.ncbi.nih.gov/ncbi-asn1/wgs
    ftp://ftp.ncbi.nih.gov/genbank/wgs

Release  Date       Base Pairs    Entries

  129    Apr 2002     692266338     172768
  130    Jun 2002    3267608441     397502
  131    Aug 2002    3848375582     427771
  132    Oct 2002    3892435593     434224
  133    Dec 2002    6702372564     597042
  134    Feb 2003    6705740844     597345
  135    Apr 2003    6897080355     596818
  136    Jun 2003    6992663962     607155
  137    Aug 2003    7144761762     593801
  138    Oct 2003    8662242833    1683437
  139    Dec 2003   14523454868    2547094
  140    Feb 2004   22804145885    3188754
  141    Apr 2004   24758556215    4112532
  142    Jun 2004   25592758366    4353890
  143    Aug 2004   28128611847    4427773
  144    Oct 2004   30871590379    5285276
  145    Dec 2004   35009256228    5410415
  146    Feb 2005   38076300084    6111782
  147    Apr 2005   39523208346    6685746
  148    Jun 2005   46767232565    8711924
  149    Aug 2005   53346605784   10276161

3. FILE FORMATS

The flat file examples included in this section, while not always from
the current release, are usually fairly recent. Any differences compared
to the actual records are the result of updates to the entries involved.

3.1 File Header Information

With the exception of the index files, gbcon.seq, and the lists of new,
changed, and deleted accession numbers, each of the files of a GenBank
release begins with the same header. Only the first line, which contains
the file name, and the sixth line, which contains the title of the file,
differ from file to file.

The first line of the file contains the file name in character positions
1 to 9 and the full database name (Genetic Sequence Data Bank, also known
as 'GenBank') starting in column 22. The brief names of the files in this
release are listed in section 2.2.

The second line contains the date of the current release in the form
`day month year', beginning in position 27. The fourth line contains the
current GenBank release number. The release number appears in positions
48 to 52 and consists of two numbers separated by a decimal point (for
example, 149.0).
The number to the left of the decimal is the major release number. The
digit to the right of the decimal indicates the version of the major
release; it is zero for the first version.

The sixth line contains a title for the file. The eighth line lists the
number of entries (loci), number of bases (or base pairs), and number of
reports of sequences (equal to number of entries in this case). These
numbers are right-justified at fixed positions. The number of entries
appears in positions 1 to 8, the number of bases in positions 16 to 26,
and the number of reports in positions 40 to 47. The third, fifth,
seventh, and ninth lines are blank.

1       10        20        30        40        50        60        70       79
---------+---------+---------+---------+---------+---------+---------+---------
GBBCT1.SEQ           Genetic Sequence Data Bank
                          15 August 2005

                NCBI-GenBank Flat File Release 149.0

                    Bacterial Sequences (Part 1)

   37811 loci,    97585608 bases, from    37811 reported sequences

---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

Example 1. Sample File Header

3.2 Directory Files

3.2.1 Short Directory File

The short directory file contains brief descriptions of all of the
sequence entries contained in this release. These descriptions are in
fifteen groups, one group for each of the fifteen sequence entry data
files. The first record at the beginning of a group of entries contains
the name of the group in uppercase characters, beginning in position 21.
The organism groups are PRIMATE, RODENT, OTHER MAMMAL, OTHER VERTEBRATE,
INVERTEBRATE, PLANT, BACTERIAL, STRUCTURAL RNA, VIRAL, PHAGE, SYNTHETIC,
UNANNOTATED, EXPRESSED SEQUENCE TAG, PATENT, or SEQUENCE TAGGED SITE. The
second record is blank.

Each record in the short directory contains the sequence entry name
(LOCUS) in the first 12 positions, followed by a brief definition of the
sequence beginning in column 13.
The definition is truncated (at the end of a word) to leave room at the
right margin for at least one space, the sequence length, and the letters
`bp'. The length of the sequence is printed right-justified to column 77,
followed by the letters `bp' in columns 78 and 79. The next-to-last
record for a group has `ZZZZZZZZZZ' in its first ten positions (where the
entry name would normally appear). The last record is a blank line.

An example of the short directory file format, showing the descriptions
of the last entries in the Other Vertebrate sequence data file and the
first entries of the Invertebrate sequence data file, is reproduced
below:

1       10        20        30        40        50        60        70       79
---------+---------+---------+---------+---------+---------+---------+---------
ZEFWNT1G3   B.rerio wnt-1 gene (exon 3) for wnt-1 protein.                266bp
ZEFWNT1G4   B.rerio wnt-1 gene (exon 4) for wnt-1 protein.                647bp
ZEFZF54     Zebrafish homeotic gene ZF-54.                                246bp
ZEFZFEN     Zebrafish engrailed-like homeobox sequence.                   327bp
ZZZZZZZZZZ

                    INVERTEBRATE

AAHAV33A    Acanthocheilonema viteae pepsin-inhibitor-like-protein       1048bp
ACAAC01     Acanthamoeba castelani gene encoding actin I.                1571bp
ACAACTPH    Acanthamoeba castellanii actophorin mRNA, complete cds.       671bp
ACAMHCA     A.castellanii non-muscle myosin heavy chain gene, partial    5894bp
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

Example 2. Short Directory File

3.3 Index Files

There are six files containing indices to the entries in this release:

    Accession number index file (Accession and Version)
    Secondary accession number index file
    Keyword phrase index file
    Author name index file
    Journal citation index file
    Gene name index file

The index keys (accession numbers, keywords, authors, journals, and gene
symbols) of an index are sorted alphabetically. All index keys appear in
uppercase characters even though they appear in mixed case in the
sequence entries.
Following each index key, the identifiers of the sequence entries
containing that key are listed (LOCUS name, division abbreviation, and
primary accession number). The division abbreviations are:

 1. PRI - primate sequences
 2. ROD - rodent sequences
 3. MAM - other mammalian sequences
 4. VRT - other vertebrate sequences
 5. INV - invertebrate sequences
 6. PLN - plant, fungal, and algal sequences
 7. BCT - bacterial sequences
 8. VRL - viral sequences
 9. PHG - bacteriophage sequences
10. SYN - synthetic sequences
11. UNA - unannotated sequences
12. EST - EST sequences (expressed sequence tags)
13. PAT - patent sequences
14. STS - STS sequences (sequence tagged sites)
15. GSS - GSS sequences (genome survey sequences)
16. HTG - HTGS sequences (high throughput genomic sequences)
17. HTC - HTC sequences (high throughput cDNA sequences)
18. ENV - Environmental sampling sequences

A line-oriented, TAB-delimited format is used for the gbaut.idx,
gbgen.idx, gbjou.idx, gbkey.idx, and gbsec.idx indexes. Each index key is
presented on its own line, and is followed by a LOCUS/Division/Accession
triplet for every record containing the key:

Indexed-Term
        LOCUS-name1     Div-code1       Accession1
        LOCUS-name2     Div-code2       Accession2
        LOCUS-name3     Div-code3       Accession3
        ....

Here is an example of the format, in which TAB characters are displayed
as ^I, and carriage-returns/newlines as $:

(H+,K+)-ATPASE BETA-SUBUNIT$
^IRATHKATPB^IROD^IM55655$
^IMUSATP4B1^IROD^IM64685$
^IMUSATP4B2^IROD^IM64686$
^IMUSATP4B3^IROD^IM64687$
^IMUSATP4B4^IROD^IM64688$
^IDOGATPASEB^IMAM^IM76486$

When viewed by a file browser such as 'less' or 'more':

(H+,K+)-ATPASE BETA-SUBUNIT
        RATHKATPB       ROD     M55655
        MUSATP4B1       ROD     M64685
        MUSATP4B2       ROD     M64686
        MUSATP4B3       ROD     M64687
        MUSATP4B4       ROD     M64688
        DOGATPASEB      MAM     M76486

Note that the index keys can be distinguished from LOCUS/DIV/ACCESSION by
the fact that they do not start with a TAB character.
One can therefore extract just the terms with simple text processing:

    perl -ne 'print unless /^\s+/' < gbkey.idx > terms.gbkey

The format of the primary accession number index file is slightly
different, with each indexed key (Accession.Version) present on the same
line as the LOCUS/Division/Accession triplet:

Accession1.Version1     Locus-name1     Div-code1       Accession1
Accession2.Version2     Locus-name2     Div-code2       Accession2
....

Here is an example of the format, in which TAB characters are displayed
as ^I, and carriage-returns/newlines as $:

AC000102.1^IAC000102^IPRI^IAC000102$
AC000103.1^IAC000103^IPLN^IAC000103$
AC000104.1^IF19P19^IPLN^IAC000104$
AC000105.40^IAC000105^IPRI^IAC000105$
AC000106.1^IF7G19^IPLN^IAC000106$
AC000107.1^IAC000107^IPLN^IAC000107$
AC000108.1^IAC000108^IBCT^IAC000108$
AC000109.1^IHSAC000109^IPRI^IAC000109$
AC000110.1^IHSAC000110^IPRI^IAC000110$

When viewed by a file browser such as 'less' or 'more':

AC000102.1      AC000102        PRI     AC000102
AC000103.1      AC000103        PLN     AC000103
AC000104.1      F19P19          PLN     AC000104
AC000105.40     AC000105        PRI     AC000105
AC000106.1      F7G19           PLN     AC000106
AC000107.1      AC000107        PLN     AC000107
AC000108.1      AC000108        BCT     AC000108
AC000109.1      HSAC000109      PRI     AC000109
AC000110.1      HSAC000110      PRI     AC000110

3.3.1 Accession Number Index File - gbacc.idx

Accession numbers are unique six-character or eight-character
alphanumeric identifiers of GenBank database entries. The six-character
accession number format consists of a single uppercase letter, followed
by 5 digits. The eight-character accession number format consists of two
uppercase letters, followed by 6 digits. Accessions provide an unchanging
identifier for the data with which they are associated, and we encourage
you to cite accession numbers whenever you refer to data from GenBank.

GenBank entries can have both 'primary' and 'secondary' accessions
associated with them (see Section 3.4.6). Only primary accessions are
present in the gbacc.idx index.
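As an illustration only, the index formats and accession number formats
described in Section 3.3 can be read with a few lines of Python. This is
a minimal sketch, not NCBI software: the function names (is_accession,
parse_acc_index_line, parse_term_index) are hypothetical, and the code
assumes the files are exactly TAB-delimited as shown in the examples
above.

```python
import re

# Section 3.3.1: one uppercase letter + 5 digits, or two uppercase
# letters + 6 digits. (Assumption: no other accession shapes occur.)
ACCESSION_RE = re.compile(r"[A-Z][0-9]{5}|[A-Z]{2}[0-9]{6}")


def is_accession(text):
    """Return True if text has the six- or eight-character accession form."""
    return ACCESSION_RE.fullmatch(text) is not None


def parse_acc_index_line(line):
    """Split one gbacc.idx line into its parts.

    Expected layout: Accession.Version TAB LOCUS TAB Division TAB Accession
    """
    key, locus, division, primary = line.rstrip("\r\n").split("\t")
    accession, _, version = key.partition(".")
    return accession, int(version), locus, division, primary


def parse_term_index(lines):
    """Group a term-style index (e.g. gbkey.idx) by index key.

    Index keys start in column 1; LOCUS/Division/Accession triplets start
    with a TAB and belong to the most recent index key.
    """
    index = {}
    current = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("\t"):
            if current is None:
                raise ValueError("triplet line found before any index key")
            locus, division, accession = line[1:].split("\t")
            index[current].append((locus, division, accession))
        else:
            current = line
            index.setdefault(current, [])
    return index
```

For example, parse_term_index(open("gbkey.idx")) would return a
dictionary keyed by keyword phrase, each value being the list of
LOCUS/Division/Accession triplets for that phrase.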
3.3.2 Keyword Phrase Index File - gbkey.idx

Keyword phrases consist of names for gene products and other
characteristics of sequence entries.

3.3.3 Author Name Index File - gbaut*.idx

The author name index files list all of the author names that appear in
the references within sequence records.

3.3.4 Journal Citation Index File - gbjou.idx

The journal citation index file lists all of the citations that appear in
the references within sequence records. All citations are truncated to 80
characters.

3.3.5 Gene Name Index - gbgen.idx

The /gene qualifiers of many GenBank entries contain values other than
official gene symbols, such as the product or the standard name of the
gene. Hence, NCBI has chosen to build an index (gbgen.idx) more like a
keyword index for this field, using both the GenBank /gene qualifier and
the 'Gene.locus' fields from the NCBI internal database as keys.

3.4 Sequence Entry Files

GenBank releases contain one or more sequence entry data files, one for
each "division" of GenBank.

3.4.1 File Organization

Each of these files has the same format and consists of two parts: header
information (described in section 3.1) and sequence entries for that
division (described in the following sections).

3.4.2 Entry Organization

In the second portion of a sequence entry file (containing the sequence
entries for that division), each record (line) consists of two parts. The
first part is found in positions 1 to 10 and may contain:

1. A keyword, beginning in column 1 of the record (e.g., REFERENCE is a
   keyword).

2. A subkeyword beginning in column 3, with columns 1 and 2 blank (e.g.,
   AUTHORS is a subkeyword of REFERENCE). Or a subkeyword beginning in
   column 4, with columns 1, 2, and 3 blank (e.g., PUBMED is a subkeyword
   of REFERENCE).

3. Blank characters, indicating that this record is a continuation of the
   information under the keyword or subkeyword above it.

4.
   A code, beginning in column 6, indicating the nature of an entry
   (feature key) in the FEATURES table; these codes are described in
   Section 3.4.12.1 below.

5. A number, ending in column 9 of the record. This number occurs in the
   portion of the entry describing the actual nucleotide sequence and
   designates the numbering of sequence positions.

6. Two slashes (//) in positions 1 and 2, marking the end of an entry.

The second part of each sequence entry record contains the information
appropriate to its keyword, in positions 13 to 80 for keywords and
positions 11 to 80 for the sequence.

The following is a brief description of each entry field. Detailed
information about each field may be found in Sections 3.4.4 to 3.4.15.

LOCUS       - A short mnemonic name for the entry, chosen to suggest the
              sequence's definition. Mandatory keyword/exactly one record.

DEFINITION  - A concise description of the sequence. Mandatory
              keyword/one or more records.

ACCESSION   - The primary accession number is a unique, unchanging code
              assigned to each entry. (Please use this code when citing
              information from GenBank.) Mandatory keyword/one or more
              records.

VERSION     - A compound identifier consisting of the primary accession
              number and a numeric version number associated with the
              current version of the sequence data in the record. This is
              followed by an integer key (a "GI") assigned to the sequence
              by NCBI. Mandatory keyword/exactly one record.

NID         - An alternative method of presenting the NCBI GI identifier
              (described above). The NID is obsolete and was removed from
              the GenBank flatfile format in December 1999.

KEYWORDS    - Short phrases describing gene products and other
              information about an entry. Mandatory keyword in all
              annotated entries/one or more records.

SEGMENT     - Information on the order in which this entry appears in a
              series of discontinuous sequences from the same molecule.
              Optional keyword (only in segmented entries)/exactly one
              record.
SOURCE      - Common name of the organism or the name most frequently
              used in the literature. Mandatory keyword in all annotated
              entries/one or more records/includes one subkeyword.

  ORGANISM  - Formal scientific name of the organism (first line) and
              taxonomic classification levels (second and subsequent
              lines). Mandatory subkeyword in all annotated entries/two
              or more records.

REFERENCE   - Citations for all articles containing data reported in this
              entry. Includes seven subkeywords and may repeat. Mandatory
              keyword/one or more records.

  AUTHORS   - Lists the authors of the citation. Optional subkeyword/one
              or more records.

  CONSRTM   - Lists the collective names of consortiums associated with
              the citation (e.g., International Human Genome Sequencing
              Consortium), rather than individual author names. Optional
              subkeyword/one or more records.

  TITLE     - Full title of citation. Optional subkeyword (present in all
              but unpublished citations)/one or more records.

  JOURNAL   - Lists the journal name, volume, year, and page numbers of
              the citation. Mandatory subkeyword/one or more records.

  MEDLINE   - Provides the Medline unique identifier for a citation.
              Optional subkeyword/one record.

  PUBMED    - Provides the PubMed unique identifier for a citation.
              Optional subkeyword/one record.

  REMARK    - Specifies the relevance of a citation to an entry. Optional
              subkeyword/one or more records.

COMMENT     - Cross-references to other sequence entries, comparisons to
              other collections, notes of changes in LOCUS names, and
              other remarks. Optional keyword/one or more records/may
              include blank records.

FEATURES    - Table containing information on portions of the sequence
              that code for proteins and RNA molecules and information on
              experimentally determined sites of biological significance.
              Optional keyword/one or more records.

BASE COUNT  - Summary of the number of occurrences of each basepair code
              (a, c, t, g, and other) in the sequence. Optional
              keyword/exactly one record. NOTE: Obsolete as of release
              ????
CONTIG      - This linetype provides information about how individual
              sequence records can be combined to form larger-scale
              biological objects, such as chromosomes or complete
              genomes. Rather than presenting actual sequence data, a
              special join() statement on the CONTIG line provides the
              accession numbers and basepair ranges of the underlying
              records which comprise the object. As of August 2005, the
              2L chromosome arm of Drosophila melanogaster (accession
              number AE014134) provided a good example of CONTIG use.

ORIGIN      - Specification of how the first base of the reported
              sequence is operationally located within the genome. Where
              possible, this includes its location within a larger
              genetic map. Mandatory keyword/exactly one record.

            - The ORIGIN line is followed by sequence data (multiple
              records).

//          - Entry termination symbol. Mandatory at the end of an
              entry/exactly one record.

3.4.3 Sample Sequence Data File

An example of a complete sequence entry file follows. (This example has
only two entries.) Note that in this example, as throughout the data
bank, numbers in square brackets indicate items in the REFERENCE list.
For example, in ACARR58S, [1] refers to the paper by Mackay, et al.

1       10        20        30        40        50        60        70       79
---------+---------+---------+---------+---------+---------+---------+---------
GBSMP.SEQ            Genetic Sequence Data Bank
                          15 December 1992

                      GenBank Flat File Release 74.0

                        Structural RNA Sequences

       2 loci,         236 bases, from        2 reported sequences

LOCUS       AAURRA                   118 bp ss-rRNA             RNA 16-JUN-1986
DEFINITION  A.auricula-judae (mushroom) 5S ribosomal RNA.
ACCESSION   K03160
VERSION     K03160.1  GI:173593
KEYWORDS    5S ribosomal RNA; ribosomal RNA.
SOURCE      A.auricula-judae (mushroom) ribosomal RNA.
  ORGANISM  Auricularia auricula-judae
            Eukaryota; Fungi; Eumycota; Basidiomycotina;
            Phragmobasidiomycetes; Heterobasidiomycetidae; Auriculariales;
            Auriculariaceae.
REFERENCE   1  (bases 1 to 118)
  AUTHORS   Huysmans,E., Dams,E., Vandenberghe,A. and De Wachter,R.
  TITLE     The nucleotide sequences of the 5S rRNAs of four mushrooms and
            their use in studying the phylogenetic position of
            basidiomycetes among the eukaryotes
  JOURNAL   Nucleic Acids Res. 11, 2871-2880 (1983)
FEATURES             Location/Qualifiers
     rRNA            1..118
                     /note="5S ribosomal RNA"
BASE COUNT       27 a     34 c     34 g     23 t
ORIGIN      5' end of mature rRNA.
        1 atccacggcc ataggactct gaaagcactg catcccgtcc gatctgcaaa gttaaccaga
       61 gtaccgccca gttagtacca cggtggggga ccacgcggga atcctgggtg ctgtggtt
//
LOCUS       ABCRRAA                  118 bp ss-rRNA             RNA 15-SEP-1990
DEFINITION  Acetobacter sp. (strain MB 58) 5S ribosomal RNA, complete sequence.
ACCESSION   M34766
VERSION     M34766.1  GI:173603
KEYWORDS    5S ribosomal RNA.
SOURCE      Acetobacter sp. (strain MB 58) rRNA.
  ORGANISM  Acetobacter sp.
            Prokaryotae; Gracilicutes; Scotobacteria; Aerobic rods and
            cocci; Azotobacteraceae.
REFERENCE   1  (bases 1 to 118)
  AUTHORS   Bulygina,E.S., Galchenko,V.F., Govorukhina,N.I., Netrusov,A.I.,
            Nikitin,D.I., Trotsenko,Y.A. and Chumakov,K.M.
  TITLE     Taxonomic studies of methylotrophic bacteria by 5S ribosomal
            RNA sequencing
  JOURNAL   J. Gen. Microbiol. 136, 441-446 (1990)
FEATURES             Location/Qualifiers
     rRNA            1..118
                     /note="5S ribosomal RNA"
BASE COUNT       27 a     40 c     32 g     17 t      2 others
ORIGIN
        1 gatctggtgg ccatggcggg agcaaatcag ccgatcccat cccgaactcg gccgtcaaat
       61 gccccagcgc ccatgatact ctgcctcaag gcacggaaaa gtcggtcgcc gccagayy
//
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

Example 9. Sample Sequence Data File

3.4.4 LOCUS Format

The items of information contained in the LOCUS record are always found
in fixed positions. The locus name (or entry name), which is always
sixteen characters or less, begins in position 13.
The locus name is designed to help group entries with similar sequences: the first three characters usually designate the organism; the fourth and fifth characters can be used to show other group designations, such as gene product; for segmented entries the last character is one of a series of sequential integers.

The number of bases or base pairs in the sequence ends in position 40. The letters `bp' are in positions 42 to 43. Positions 45 to 47 provide the number of strands of the sequence. Positions 48 to 53 indicate the type of molecule sequenced. Topology of the molecule is indicated in positions 56 to 63.

GenBank sequence entries are divided among many different 'divisions'. Each entry's division is specified by a three-letter code in positions 65 to 67. See Section 3.3 for an explanation of division codes.

Positions 69 to 79 of the record contain the date the entry was entered or underwent any substantial revisions, such as the addition of newly published data, in the form dd-MMM-yyyy.

The detailed format of the LOCUS line is as follows:

Positions  Contents
---------  --------
01-05      'LOCUS'
06-12      spaces
13-28      Locus name
29-29      space
30-40      Length of sequence, right-justified
41-41      space
42-43      bp
44-44      space
45-47      spaces, ss- (single-stranded), ds- (double-stranded), or
           ms- (mixed-stranded)
48-53      NA, DNA, RNA, tRNA (transfer RNA), rRNA (ribosomal RNA),
           mRNA (messenger RNA), uRNA (small nuclear RNA), snRNA,
           snoRNA. Left justified.
54-55      space
56-63      'linear' followed by two spaces, or 'circular'
64-64      space
65-67      The division code (see Section 3.3)
68-68      space
69-79      Date, in the form dd-MMM-yyyy (e.g., 15-MAR-1991)

Although each of these data values can be found at column-specific positions, we encourage those who parse the contents of the LOCUS line to use a token-based approach. This will prevent the need for software changes if the spacing of the data values ever has to be modified.
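
The token-based approach recommended above can be sketched as follows. This is a minimal illustration in Python, not an NCBI-supplied parser: the names Locus and parse_locus are hypothetical, and the sketch assumes that the molecule type is always present and that the strandedness prefix (ss-, ds-, ms-), when present, is attached to the molecule type as a single token. The topology token is treated as optional so that older LOCUS lines without it are also accepted.

```python
import re
from typing import NamedTuple, Optional


class Locus(NamedTuple):
    """Fields of a LOCUS line (hypothetical container, for illustration)."""
    name: str
    length: int
    strandedness: Optional[str]   # 'ss', 'ds', 'ms', or None if absent
    molecule: str                 # e.g. 'DNA', 'mRNA'
    topology: Optional[str]       # 'linear', 'circular', or None if absent
    division: str                 # three-letter division code
    date: str                     # dd-MMM-yyyy


def parse_locus(line: str) -> Locus:
    """Parse a LOCUS line by whitespace-separated tokens, not by column."""
    tokens = line.split()
    if len(tokens) < 7 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError("not a recognizable LOCUS line: %r" % line)

    name, length = tokens[1], int(tokens[2])

    # Assumption: a strandedness prefix is glued to the molecule type.
    strandedness, molecule = None, tokens[4]
    match = re.fullmatch(r"(ss|ds|ms)-(.+)", molecule)
    if match:
        strandedness, molecule = match.group(1), match.group(2)

    # Topology is optional; division and date must be the last two tokens.
    rest = tokens[5:]
    topology = None
    if rest and rest[0] in ("linear", "circular"):
        topology, rest = rest[0], rest[1:]
    if len(rest) != 2:
        raise ValueError("unexpected trailing tokens in LOCUS line: %r" % line)
    division, date = rest

    return Locus(name, length, strandedness, molecule, topology, division, date)


if __name__ == "__main__":
    print(parse_locus(
        "LOCUS       AY277550                1440 bp    DNA     linear   BCT 17-JUN-2003"
    ))
```

Because the fields are located by token order rather than by column, a change in the spacing of the LOCUS line would not require a change to this code.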
3.4.5 DEFINITION Format The DEFINITION record gives a brief description of the sequence, proceeding from general to specific. It starts with the common name of the source organism, then gives the criteria by which this sequence is distinguished from the remainder of the source genome, such as the gene name and what it codes for, or the protein name and mRNA, or some description of the sequence's function (if the sequence is non-coding). If the sequence has a coding region, the description may be followed by a completeness qualifier, such as cds (complete coding sequence). There is no limit on the number of lines that may be part of the DEFINITION. The last line must end with a period. 3.4.5.1 DEFINITION Format for NLM Entries The DEFINITION line for entries derived from journal-scanning at the NLM is an automatically generated descriptive summary that accompanies each DNA and protein sequence. It contains information derived from fields in a database that summarize the most important attributes of the sequence. The DEFINITION lines are designed to supplement the accession number and the sequence itself as a means of uniquely and completely specifying DNA and protein sequences. The following are examples of NLM DEFINITION lines: NADP-specific isocitrate dehydrogenase [swine, mRNA, 1 gene, 1585 nt] 94 kda fiber cell beaded-filament structural protein [rats, lens, mRNA Partial, 1 gene, 1873 nt] inhibin alpha {promoter and exons} [mice, Genomic, 1 gene, 1102 nt, segment 1 of 2] cefEF, cefG=acetyl coenzyme A:deacetylcephalosporin C o-acetyltransferase [Acremonium chrysogenum, Genomic, 2 genes, 2639 nt] myogenic factor 3, qmf3=helix-loop-helix protein [Japanese quails, embryo, Peptide Partial, 246 aa] The first part of the definition line contains information describing the genes and proteins represented by the molecular sequences. This can be gene locus names, protein names and descriptions that replace or augment actual names. Gene and gene product are linked by "=". 
Any special identifying terms are presented within braces, such as: {promoter}, {N-terminal}, {EC 2.13.2.4}, {alternatively spliced}, or {3' region}.

The second part of the definition line is delimited by square brackets, '[]', and provides details about the molecule type and length. It also gives the biological source, i.e., genus and species or common name as cited by the author. Developmental stage, tissue type and strain are included if available. The molecule types include: Genomic, mRNA, Peptide, and Other Genomic Material. Genomic molecules are assumed to be partial sequence unless "Complete" is specified, whereas mRNA and peptide molecules are assumed to be complete unless "Partial" is noted.

3.4.6 ACCESSION Format

This field contains a series of six-character and/or eight-character identifiers called 'accession numbers'. The six-character accession number format consists of a single uppercase letter, followed by 5 digits. The eight-character accession number format consists of two uppercase letters, followed by 6 digits. The 'primary', or first, of the accession numbers occupies positions 13 to 18 (6-character format) or positions 13 to 20 (8-character format). Subsequent 'secondary' accession numbers (if present) are separated from the primary, and from each other, by a single space. In some cases, multiple lines of secondary accession numbers might be present, starting at position 13.

The primary accession number of a GenBank entry provides a stable identifier for the biological object that the entry represents. Accessions do not change when the underlying sequence data or associated features change.

Secondary accession numbers arise for a number of reasons. For example, a single accession number may initially be assigned to a sequence described in a publication.
If it is later discovered that the sequence must be entered into the database as multiple entries, each entry would receive a new primary accession number, and the original accession number would appear as a secondary accession number on each of the new entries.

In the event that a large number of continuous secondary accession numbers exist, a range can be employed:

	SecAccession1-SecAccession2

In such cases, the alphabetic prefix letters of the initial and terminal accession numbers within the range *MUST* be identical. For example:

	AE000111-AE000510
	^^       ^^

Additionally, the value of the numeric portion of the initial secondary within the range must be less than the value of the numeric portion of the terminal secondary.

3.4.7 VERSION Format

This line contains two types of identifiers for a GenBank database entry: a compound accession number and an NCBI GI identifier.

LOCUS       AF181452     1294 bp    DNA             PLN       12-OCT-1999
DEFINITION  Hordeum vulgare dehydrin (Dhn2) gene, complete cds.
ACCESSION   AF181452
VERSION     AF181452.1  GI:6017929
            ^^^^^^^^^^  ^^^^^^^^^^
            Compound    NCBI GI
            Accession   Identifier
            Number

A compound accession number consists of two parts: a stable, unchanging primary-accession number portion (see Section 3.4.6 for a description of accession numbers), and a sequentially increasing numeric version number. The accession and version numbers are separated by a period. The initial version number assigned to a new sequence is one. Compound accessions are often referred to as "Accession.Version".

An accession number allows one to retrieve the same biological object in the database, regardless of any changes that are made to the entry over time. But those changes can include changes to the sequence data itself, which is of fundamental importance to many database users. So a numeric version number is associated with the sequence data in every database entry.
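
As an illustration of the VERSION line layout shown in the example above, the compound accession can be split into its accession and version parts, and the GI read from the 'GI:' token. The following Python sketch is only a hedged example: the names Version and parse_version are hypothetical, and the accession pattern (uppercase letters followed by digits) is deliberately looser than the six- and eight-character formats of Section 3.4.6, on the assumption that other accession lengths may be encountered.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    """Identifiers carried by a VERSION line (hypothetical container)."""
    accession: str   # stable primary accession, e.g. 'AF181452'
    version: int     # sequence version number, starting at 1
    gi: int          # NCBI GI identifier


# Assumed pattern: letters then digits, a period, the version, then 'GI:<digits>'.
_VERSION_RE = re.compile(r"^VERSION\s+([A-Z]+\d+)\.(\d+)\s+GI:(\d+)\s*$")


def parse_version(line: str) -> Version:
    """Split a VERSION line into accession, version number, and GI."""
    match = _VERSION_RE.match(line)
    if match is None:
        raise ValueError("not a recognizable VERSION line: %r" % line)
    return Version(match.group(1), int(match.group(2)), int(match.group(3)))


if __name__ == "__main__":
    print(parse_version("VERSION     AF181452.1  GI:6017929"))
```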
If an entry (for example, AF181452) undergoes two sequence changes, its compound accession number on the VERSION line would start as AF181452.1 . After the first sequence change this would become: AF181452.2 . And after the second change: AF181452.3 . The NCBI GI identifier of the VERSION line also serves as a method for identifying the sequence data that has existed for a database entry over time. GI identifiers are numeric values of one or more digits. Since they are integer keys, they are less human-friendly than the Accession.Version system described above. Returning to our example for AF181452, it was initially assigned GI 6017929. If the sequence changes, a new integer GI will be assigned, perhaps 7345003 . And after the second sequence change, perhaps the GI would become 10456892 . Why are both these methods for identifying the version of the sequence associated with a database entry in use? For two reasons: - Some data sources processed by NCBI for incorporation into its Entrez sequence retrieval system do not version their own sequences. - GIs provide a uniform, integer identifier system for every sequence NCBI has processed. Some products and systems derived from (or reliant upon) NCBI products and services prefer to use these integer identifiers because they can all be processed in the same manner. GenBank Releases contain only the most recent versions of all sequences in the database. However, older versions can be obtained via GI-based or Accession.Version-based queries with NCBI's web-Entrez and network-Entrez applications. A sequence revision history web page is also available: http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/girevhist NOTE: All the version numbers for the compound Accession.Version identifier system were initialized to a value of one in February 1999, when that system was introduced. 3.4.8 KEYWORDS Format The KEYWORDS field does not appear in unannotated entries, but is required in all annotated entries. 
Keywords are separated by semicolons; a "keyword" may be a single word or a phrase consisting of several words. Each line in the keywords field ends in a semicolon; the last line ends with a period. If no keywords are included in the entry, the KEYWORDS record contains only a period.

3.4.9 SEGMENT Format

The SEGMENT keyword is used when two (or more) entries of known relative orientation are separated by a short (<10 kb) stretch of DNA. It is limited to one line of the form `n of m', where `n' is the segment number of the current entry and `m' is the total number of segments.

3.4.10 SOURCE Format

The SOURCE field consists of two parts. The first part is found after the SOURCE keyword and contains free-format information including an abbreviated form of the organism name followed by a molecule type; multiple lines are allowed, but the last line must end with a period. The second part consists of information found after the ORGANISM subkeyword. The formal scientific name for the source organism (genus and species, where appropriate) is found on the same line as ORGANISM. The records following the ORGANISM line list the taxonomic classification levels, separated by semicolons and ending with a period.

3.4.11 REFERENCE Format

The REFERENCE field consists of the keyword REFERENCE and up to seven subkeywords: AUTHORS, CONSRTM (optional), TITLE (optional), JOURNAL, MEDLINE (optional), PUBMED (optional), and REMARK (optional).

The REFERENCE line contains the number of the particular reference and (in parentheses) the range of bases in the sequence entry reported in this citation. Additional prose notes may also be found within the parentheses. The numbering of the references does not reflect publication dates or priorities.

The AUTHORS line lists the authors in the order in which they appear in the cited article. Last names are separated from initials by a comma (no space); there is no comma before the final `and'. The list of authors ends with a period.
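
The AUTHORS conventions just described (comma between last name and initials, no comma before the final `and') lend themselves to a simple split. The Python sketch below is an illustration under stated assumptions, not a complete parser: the function name parse_authors is hypothetical, continuation lines are assumed to have been joined into one string already, and no attempt is made to handle suffixes or other irregular name forms.

```python
from typing import List, Tuple


def parse_authors(field: str) -> List[Tuple[str, str]]:
    """Split an AUTHORS field into (last name, initials) pairs.

    The final period of the list is also the period of the last
    author's initials, so it is left in place.
    """
    text = " ".join(field.split())          # normalize wrapped whitespace

    # The last author is introduced by ' and ' with no preceding comma.
    head, sep, last = text.rpartition(" and ")
    items = (head.split(", ") if sep else []) + [last]

    authors = []
    for item in items:
        # Last name and initials are separated by a comma with no space.
        surname, _, initials = item.partition(",")
        authors.append((surname, initials))
    return authors


if __name__ == "__main__":
    print(parse_authors(
        "Huysmans,E., Dams,E., Vandenberghe,A. and De Wachter,R."
    ))
```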
The TITLE line is an optional field, although it appears in the majority of entries. It does not appear in unpublished sequence data entries that have been deposited directly into the GenBank data bank, the EMBL Nucleotide Sequence Data Library, or the DNA Data Bank of Japan. The TITLE field does not end with a period.

The JOURNAL line gives the appropriate literature citation for the sequence in the entry. The word `Unpublished' will appear after the JOURNAL subkeyword if the data did not appear in the scientific literature, but was directly deposited into the data bank. For published sequences the JOURNAL line gives the Thesis, Journal, or Book citation, including the year of publication, the specific citation, or In press.

For Book citations, the JOURNAL line is specially formatted, and includes:

	editor name(s)
	book title
	page number(s)
	publisher-name/publisher-location
	year

For example:

LOCUS       AY277550                1440 bp    DNA     linear   BCT 17-JUN-2003
DEFINITION  Stenotrophomonas maltophilia strain CSC13-6 16S ribosomal RNA gene,
            partial sequence.
ACCESSION   AY277550
....
REFERENCE   1  (bases 1 to 1440)
  AUTHORS   Gonzalez,J.M., Laiz,L. and Saiz-Jimenez,C.
  TITLE     Classifying bacterial isolates from hypogean environments:
            Application of a novel fluorimetric method for the estimation of
            G+C mol% content in microorganisms by thermal denaturation
            temperature
  JOURNAL   (in) Saiz-Jimenez,C. (Ed.);
            MOLECULAR BIOLOGY AND CULTURAL HERITAGE: 47-54;
            A.A. Balkema, The Netherlands (2003)

The presence of "(in)" signals the fact that the reference is for a book rather than a journal article. A semi-colon signals the end of the editor names. The next semi-colon signals the end of the page numbers, and the colon that immediately *precedes* the page numbers signals the end of the book title. The publisher name and location are a free-form text string. Finally, the year appears at the very end of the JOURNAL line, enclosed in parentheses.
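
The delimiter rules for Book citations can be applied in the order given above. The following Python sketch is a hedged illustration only: the function name parse_book_journal and the dictionary keys are hypothetical, continuation lines are assumed to have been joined into a single string, and editor names are assumed not to contain semi-colons.

```python
import re
from typing import Dict, Union


def parse_book_journal(journal: str) -> Dict[str, Union[str, int]]:
    """Split a Book-style JOURNAL value into its documented parts."""
    text = " ".join(journal.split())        # normalize wrapped whitespace
    if not text.startswith("(in)"):
        raise ValueError("not a Book citation: missing '(in)' prefix")
    body = text[len("(in)"):].strip()

    # The year is enclosed in parentheses at the very end of the line.
    year_match = re.search(r"\((\d{4})\)$", body)
    if year_match is None:
        raise ValueError("no trailing '(year)' found")
    body = body[:year_match.start()].rstrip()

    # First semi-colon ends the editor names; the next ends the pages.
    editors, sep1, rest = body.partition(";")
    title_pages, sep2, publisher = rest.partition(";")
    # The colon immediately preceding the pages ends the book title,
    # so split on the LAST colon (the title itself may contain colons).
    title, sep3, pages = title_pages.rpartition(":")
    if not (sep1 and sep2 and sep3):
        raise ValueError("expected 'editors; title: pages; publisher (year)'")

    return {
        "editors": editors.strip(),
        "title": title.strip(),
        "pages": pages.strip(),
        "publisher": publisher.strip(),
        "year": int(year_match.group(1)),
    }


if __name__ == "__main__":
    print(parse_book_journal(
        "(in) Saiz-Jimenez,C. (Ed.); MOLECULAR BIOLOGY AND CULTURAL "
        "HERITAGE: 47-54; A.A. Balkema, The Netherlands (2003)"
    ))
```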
The MEDLINE line provides the National Library of Medicine's Medline unique identifier for a citation (if known). Medline UIs are 8 digit numbers. The PUBMED line provides the PubMed unique identifier for a citation (if known). PUBMED ids are numeric, and are record identifiers for article abstracts in the PubMed database : http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed Citations in PubMed that do not fall within Medline's scope will have only a PUBMED identifier. Similarly, citations that *are* in Medline's scope but which have not yet been assigned Medline UIs will have only a PUBMED identifier. If a citation is present in both the PubMed and Medline databases, both a MEDLINE and a PUBMED line will be present. The REMARK line is a textual comment that specifies the relevance of the citation to the entry. 3.4.12 FEATURES Format GenBank releases use a feature table format designed jointly by GenBank, the EMBL Nucleotide Sequence Data Library, and the DNA Data Bank of Japan. This format is in use by all three databases. The most complete and accurate Feature Table documentation can be found on the Web at: http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html Any discrepancy between the abbreviated feature table description of these release notes and the complete documentation on the Web should be resolved in favor of the version at the above URL. The Feature Table specification is also available as a printed document: `The DDBJ/EMBL/GenBank Feature Table: Definition'. Contact GenBank at the address shown on the first page of these Release Notes if you would like a copy. The feature table contains information about genes and gene products, as well as regions of biological significance reported in the sequence. The feature table contains information on regions of the sequence that code for proteins and RNA molecules. 
It also enumerates differences between different reports of the same sequence, and provides cross-references to other data collections, as described in more detail below.

The first line of the feature table is a header that includes the keyword `FEATURES' and the column header `Location/Qualifiers'. Each feature consists of a descriptor line containing a feature key and a location (see sections below for details). If the location does not fit on this line, a continuation line may follow. If further information about the feature is required, one or more lines containing feature qualifiers may follow the descriptor line.

The feature key begins in column 6 and may be no more than 15 characters in length. The location begins in column 22. Feature qualifiers begin on subsequent lines at column 22. Location, qualifier, and continuation lines may extend from column 22 to 80.

Feature tables are required, due to the mandatory presence of the source feature. The sections below provide a brief introduction to the feature table format.

3.4.12.1 Feature Key Names

The first column of the feature descriptor line contains the feature key. It starts at column 6 and can continue to column 20. The list of valid feature keys is shown below.
Remember, the most definitive documentation for the feature table can be found at:

  http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html

allele          Obsolete; see variation feature key
attenuator      Sequence related to transcription termination
C_region        Span of the C immunological feature
CAAT_signal     `CAAT box' in eukaryotic promoters
CDS             Sequence coding for amino acids in protein (includes
                stop codon)
conflict        Independent sequence determinations differ
D-loop          Displacement loop
D_segment       Span of the D immunological feature
enhancer        Cis-acting enhancer of promoter function
exon            Region that codes for part of spliced mRNA
gene            Region that defines a functional gene, possibly
                including upstream (promoter, enhancer, etc)
                and downstream control elements, and for which
                a name has been assigned.
GC_signal       `GC box' in eukaryotic promoters
iDNA            Intervening DNA eliminated by recombination
intron          Transcribed region excised by mRNA splicing
J_region        Span of the J immunological feature
LTR             Long terminal repeat
mat_peptide     Mature peptide coding region (does not include stop codon)
misc_binding    Miscellaneous binding site
misc_difference Miscellaneous difference feature
misc_feature    Region of biological significance that cannot be described
                by any other feature
misc_recomb     Miscellaneous recombination feature
misc_RNA        Miscellaneous transcript feature not defined by other RNA keys
misc_signal     Miscellaneous signal
misc_structure  Miscellaneous DNA or RNA structure
modified_base   The indicated base is a modified nucleotide
mRNA            Messenger RNA
mutation        Obsolete: see variation feature key
N_region        Span of the N immunological feature
old_sequence    Presented sequence revises a previous version
polyA_signal    Signal for cleavage & polyadenylation
polyA_site      Site at which polyadenine is added to mRNA
precursor_RNA   Any RNA species that is not yet the mature RNA product
prim_transcript Primary (unprocessed) transcript
primer          Primer binding region used with PCR
primer_bind     Non-covalent primer binding site
promoter        A region involved in transcription initiation
protein_bind    Non-covalent protein binding site on DNA or RNA
RBS             Ribosome binding site
rep_origin      Replication origin for duplex DNA
repeat_region   Sequence containing repeated subsequences
repeat_unit     One repeated unit of a repeat_region
rRNA            Ribosomal RNA
S_region        Span of the S immunological feature
satellite       Satellite repeated sequence
scRNA           Small cytoplasmic RNA
sig_peptide     Signal peptide coding region
snRNA           Small nuclear RNA
source          Biological source of the sequence data represented by
                a GenBank record. Mandatory feature, one or more per record.
                For organisms that have been incorporated within the
                NCBI taxonomy database, an associated /db_xref="taxon:NNNNN"
                qualifier will be present (where NNNNN is the numeric
                identifier assigned to the organism within the NCBI taxonomy
                database).
stem_loop       Hair-pin loop structure in DNA or RNA
STS             Sequence Tagged Site; operationally unique sequence that
                identifies the combination of primer spans used in a PCR assay
TATA_signal     `TATA box' in eukaryotic promoters
terminator      Sequence causing transcription termination
transit_peptide Transit peptide coding region
transposon      Transposable element (TN)
tRNA            Transfer RNA
unsure          Authors are unsure about the sequence in this region
V_region        Span of the V immunological feature
variation       A related population contains stable mutation
-               (hyphen) Placeholder
-10_signal      `Pribnow box' in prokaryotic promoters
-35_signal      `-35 box' in prokaryotic promoters
3'clip          3'-most region of a precursor transcript removed in processing
3'UTR           3' untranslated region (trailer)
5'clip          5'-most region of a precursor transcript removed in processing
5'UTR           5' untranslated region (leader)

3.4.12.2 Feature Location

The second column of the feature descriptor line designates the location of the feature in the sequence. The location descriptor begins at position 22. Several conventions are used to indicate sequence location.
Base numbers in location descriptors refer to numbering in the entry, which is not necessarily the same as the numbering scheme used in the published report. The first base in the presented sequence is numbered base 1. Sequences are presented in the 5' to 3' direction.

Location descriptors can be one of the following:

1. A single base;

2. A contiguous span of bases;

3. A site between two bases;

4. A single base chosen from a range of bases;

5. A single base chosen from among two or more specified bases;

6. A joining of sequence spans;

7. A reference to an entry other than the one to which the feature belongs (i.e., a remote entry), followed by a location descriptor referring to the remote sequence.

A site between two residues, such as an endonuclease cleavage site, is indicated by listing the two bases separated by a caret (e.g., 23^24).

A single residue chosen from a range of residues is indicated by the number of the first and last bases in the range separated by a single period (e.g., 23.79). The symbols < and > indicate that the end point of the range is beyond the specified base number.

A contiguous span of bases is indicated by the number of the first and last bases in the range separated by two periods (e.g., 23..79). The symbols < and > indicate that the end point of the range is beyond the specified base number. Starting and ending positions can be indicated by base number or by one of the operators described below.

Operators are prefixes that specify what must be done to the indicated sequence to locate the feature. The following are the operators available, along with their most common format and a description.

complement (location): The feature is complementary to the location indicated. Complementary strands are read 5' to 3'.

join (location, location, .. location): The indicated elements should be placed end to end to form one contiguous sequence.

order (location, location, .. location): The elements are found in the specified order in the 5' to 3' direction, but nothing is implied about the rationality of joining them.

3.4.12.3 Feature Qualifiers

Qualifiers provide additional information about features. They take the form of a slash (/) followed by a qualifier name and, if applicable, an equal sign (=) and a qualifier value. Feature qualifiers begin at column 22.

Qualifiers convey many types of information. Their values can, therefore, take several forms:

1. Free text;
2. Controlled vocabulary or enumerated values;
3. Citations or reference numbers;
4. Sequences;
5. Feature labels.

Text qualifier values must be enclosed in double quotation marks. The text can consist of any printable characters (ASCII values 32-126 decimal). If the text string includes double quotation marks, each set must be `escaped' by placing a double quotation mark in front of it (e.g., /note="This is an example of ""escaped"" quotation marks").

Some qualifiers require values selected from a limited set of choices. For example, the `/direction' qualifier has only three values `left,' `right,' or `both.' These are called controlled vocabulary qualifier values. Controlled qualifier values are not case sensitive; they can be entered in any combination of upper- and lowercase without changing their meaning.

Citation or published reference numbers for the entry should be enclosed in square brackets ([]) to distinguish them from other numbers.

A literal sequence of bases (e.g., "atgcatt") should be enclosed in quotation marks. Literal sequences are distinguished from free text by context. Qualifiers that take free text as their values do not take literal sequences, and vice versa.

The `/label=' qualifier takes a feature label as its qualifier. Although feature labels are optional, they allow unambiguous references to the feature.
The feature label identifies a feature within an entry; when combined with the accession number and the name of the data bank from which it came, it is a unique tag for that feature. Feature labels must be unique within an entry, but can be the same as a feature label in another entry. Feature labels are not case sensitive; they can be entered in any combination of upper- and lowercase without changing their meaning.

The following is a partial list of feature qualifiers.

/anticodon      Location of the anticodon of tRNA and the amino acid
                for which it codes
/bound_moiety   Moiety bound
/citation       Reference to a citation providing the claim of or
                evidence for a feature
/codon          Specifies a codon that is different from any found in the
                reference genetic code
/codon_start    Indicates the first base of the first complete codon
                in a CDS (as 1 or 2 or 3)
/cons_splice    Identifies intron splice sites that do not conform to
                the 5'-GT... AG-3' splice site consensus
/db_xref        A database cross-reference; pointer to related information
                in another database. A description of all cross-references
                can be found at:
                http://www.ncbi.nlm.nih.gov/collab/db_xref.html
/direction      Direction of DNA replication
/EC_number      Enzyme Commission number for the enzyme product of the
                sequence
/evidence       Value indicating the nature of supporting evidence
/frequency      Frequency of the occurrence of a feature
/function       Function attributed to a sequence
/gene           Symbol of the gene corresponding to a sequence region (usable
                with all features)
/label          A label used to permanently identify a feature
/map            Map position of the feature in free-format text
/mod_base       Abbreviation for a modified nucleotide base
/note           Any comment or additional information
/number         A number indicating the order of genetic elements
                (e.g., exons or introns) in the 5' to 3' direction
/organism       Name of the organism that is the source of the
                sequence data in the record.
/partial        Differentiates between complete regions and partial ones
/phenotype      Phenotype conferred by the feature
/product        Name of a product encoded by a coding region (CDS)
                feature
/pseudo         Indicates that this feature is a non-functional
                version of the element named by the feature key
/rpt_family     Type of repeated sequence; Alu or Kpn, for example
/rpt_type       Organization of repeated sequence
/rpt_unit       Identity of repeat unit that constitutes a repeat_region
/standard_name  Accepted standard name for this feature
/transl_except  Translational exception: single codon, the translation
                of which does not conform to the reference genetic code
/translation    Amino acid translation of a coding region
/type           Name of a strain if different from that in the SOURCE field
/usedin         Indicates that feature is used in a compound feature
                in another entry

3.4.12.4 Cross-Reference Information

One type of information in the feature table lists cross-references to the annual compilation of transfer RNA sequences in Nucleic Acids Research, which has kindly been sent to us on CD-ROM by Dr. Sprinzl. Each tRNA entry of the feature table contains a /note= qualifier that includes a reference such as `(NAR: 1234)' to identify code 1234 in the NAR compilation. When such a cross-reference appears in an entry that contains a gene coding for a transfer RNA molecule, it refers to the code in the tRNA gene compilation. Similar cross-references in entries containing mature transfer RNA sequences refer to the companion compilation of tRNA sequences published by D.H. Gauss and M. Sprinzl in Nucleic Acids Research.

3.4.12.5 Feature Table Examples

In the first example a number of key names, feature locations, and qualifiers are illustrated, taken from different sequences. The first table entry is a coding region consisting of a simple span of bases and including a /gene qualifier. In the second table entry, an NAR cross-reference is given (see the previous section for a discussion of these cross-references).
The third and fourth table entries use the symbols `<`and `>' to indicate that the beginning or end of the feature is beyond the range of the presented sequence. In the fifth table entry, the symbol `^' indicates that the feature is between bases. 1 10 20 30 40 50 60 70 79 ---------+---------+---------+---------+---------+---------+---------+--------- CDS 5..1261 /product="alpha-1-antitrypsin precursor" /map="14q32.1" /gene="PI" tRNA 1..87 /note="Leu-tRNA-CAA (NAR: 1057)" /anticodon=(pos:35..37,aa:Leu) mRNA 1..>66 /note="alpha-1-acid glycoprotein mRNA" transposon <1..267 /note="insertion element IS5" misc_recomb 105^106 /note="B.subtilis DNA end/IS5 DNA start" conflict 258 /replace="t" /citation=[2] ---------+---------+---------+---------+---------+---------+---------+--------- 1 10 20 30 40 50 60 70 79 Example 10. Feature Table Entries The next example shows the representation for a CDS that spans more than one entry. 1 10 20 30 40 50 60 70 79 ---------+---------+---------+---------+---------+---------+---------+--------- LOCUS HUMPGAMM1 3688 bp ds-DNA PRI 15-OCT-1990 DEFINITION Human phosphoglycerate mutase (muscle specific isozyme) (PGAM-M) gene, 5' end. ACCESSION M55673 M25818 M27095 KEYWORDS phosphoglycerate mutase. SEGMENT 1 of 2 . . . FEATURES Location/Qualifiers CAAT_signal 1751..1755 /gene="PGAM-M" TATA_signal 1791..1799 /gene="PGAM-M" exon 1820..2274 /number=1 /EC_number="5.4.2.1" /gene="PGAM-M" intron 2275..2377 /number=1 /gene="PGAM2" exon 2378..2558 /number=2 /gene="PGAM-M" . . . // LOCUS HUMPGAMM2 677 bp ds-DNA PRI 15-OCT-1990 DEFINITION Human phosphoglycerate mutase (muscle specific isozyme) (PGAM-M), exon 3. ACCESSION M55674 M25818 M27096 KEYWORDS phosphoglycerate mutase. SEGMENT 2 of 2 . . . 
FEATURES Location/Qualifiers exon 255..457 /number=3 /gene="PGAM-M" intron order(M55673:2559..>3688,<1..254) /number=2 /gene="PGAM-M" mRNA join(M55673:1820..2274,M55673:2378..2558,255..457) /gene="PGAM-M" CDS join(M55673:1861..2274,M55673:2378..2558,255..421) /note="muscle-specific isozyme" /gene="PGAM2" /product="phosphoglycerate mutase" /codon_start=1 /translation="MATHRLVMVRHGESTWNQENRFCGWFDAELSEKGTEEAKRGAKA IKDAKMEFDICYTSVLKRAIRTLWAILDGTDQMWLPVVRTWRLNERHYGGLTGLNKAE TAAKHGEEQVKIWRRSFDIPPPPMDEKHPYYNSISKERRYAGLKPGELPTCESLKDTI ARALPFWNEEIVPQIKAGKRVLIAAHGNSLRGIVKHLEGMSDQAIMELNLPTGIPIVY ELNKELKPTKPMQFLGDEETVRKAMEAVAAQGKAK" . . . // ---------+---------+---------+---------+---------+---------+---------+--------- 1 10 20 30 40 50 60 70 79 Example 11. Joining Sequences 3.4.13 ORIGIN Format The ORIGIN record may be left blank, may appear as `Unreported.' or may give a local pointer to the sequence start, usually involving an experimentally determined restriction cleavage site or the genetic locus (if available). The ORIGIN record ends in a period if it contains data, but does not include the period if the record is left empty (in contrast to the KEYWORDS field which contains a period rather than being left blank). 3.4.14 SEQUENCE Format The nucleotide sequence for an entry is found in the records following the ORIGIN record. The sequence is reported in the 5' to 3' direction. There are sixty bases per record, listed in groups of ten bases followed by a blank, starting at position 11 of each record. The number of the first nucleotide in the record is given in columns 4 to 9 (right justified) of the record. 3.4.15 CONTIG Format As an alternative to SEQUENCE, a CONTIG record can be present following the ORIGIN record. 
A join() statement utilizing a syntax similar to that of feature locations
(see the Feature Table specification mentioned in Section 3.4.12) provides
the accession numbers and basepair ranges of other GenBank sequences which
contribute to a large-scale biological object, such as a chromosome or
complete genome. Here is an example of the use of CONTIG:

CONTIG      join(AE003590.3:1..305900,AE003589.4:61..306076,
            AE003588.3:61..308447,AE003587.4:61..314549,AE003586.3:61..306696,
            AE003585.5:61..343161,AE003584.5:61..346734,AE003583.3:101..303641,

            [ lines removed for brevity ]

            AE003782.4:61..298116,AE003783.3:16..111706,AE002603.3:61..143856)

However, the CONTIG join() statement can also utilize a special operator
which is *not* part of the syntax for feature locations:

   gap()     : Gap of unknown length.

   gap(X)    : Gap with an estimated integer length of X bases.

               To be represented as a run of n's of length X
               in the sequence that can be constructed from
               the CONTIG line join() statement.

   gap(unkX) : Gap of unknown length, which is to be represented
               as an integer number (X) of n's in the sequence that
               can be constructed from the CONTIG line join()
               statement.

               The value of this gap operator consists of the
               literal characters 'unk', followed by an integer.

Here is an example of a CONTIG line join() that utilizes the gap() operator:

CONTIG      join(complement(AADE01002756.1:1..10234),gap(1206),
            AADE01006160.1:1..1963,gap(323),AADE01002525.1:1..11915,gap(1633),
            AADE01005641.1:1..2377)

The first and last elements of the join() statement may be a gap() operator.
But if so, then those gaps should represent telomeres, centromeres, etc.

Consecutive gap() operators are illegal.


4. ALTERNATE RELEASES

NCBI is supplying sequence data in the GenBank flat file format to
maintain compatibility with existing software which requires that
particular format.
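
As one illustration of what software based on the flat file format has to
handle, the CONTIG join() and gap() rules of Section 3.4.15 can be sketched
in Python. This is a minimal sketch under stated assumptions, not NCBI code:
the names parse_contig and constructed_length, the accession.version pattern,
and the dictionary layout of each element are all hypothetical choices made
for the example.

```python
import re

# Assumed token shapes: gap(), gap(X), gap(unkX), and accession.version spans
# optionally wrapped in complement(...).
GAP = re.compile(r"gap\((unk)?(\d*)\)")
SPAN = re.compile(r"(complement\()?([A-Za-z0-9_]+\.\d+):(\d+)\.\.(\d+)(\))?")


def parse_contig(text):
    """Split a CONTIG join() statement into span and gap elements."""
    compact = "".join(text.split())            # drop blanks and line breaks
    if compact.startswith("CONTIG"):
        compact = compact[len("CONTIG"):]
    if not (compact.startswith("join(") and compact.endswith(")")):
        raise ValueError("expected a join(...) statement")
    elements = []
    for token in compact[len("join("):-1].split(","):
        gap = GAP.fullmatch(token)
        span = SPAN.fullmatch(token)
        if gap:
            unk, digits = gap.groups()
            if unk and not digits:
                raise ValueError("gap(unkX) needs an integer after 'unk'")
            if elements and elements[-1]["kind"] == "gap":
                raise ValueError("consecutive gap() operators are illegal")
            elements.append({"kind": "gap",
                             "length": int(digits) if digits else None,
                             "estimated": bool(digits) and not unk})
        elif span and bool(span.group(1)) == bool(span.group(5)):
            elements.append({"kind": "span",
                             "accession": span.group(2),
                             "start": int(span.group(3)),
                             "end": int(span.group(4)),
                             "complement": bool(span.group(1))})
        else:
            raise ValueError(f"unrecognized element: {token!r}")
    return elements


def constructed_length(elements):
    """Length of the constructed sequence, or None if a gap() has no length."""
    total = 0
    for element in elements:
        if element["kind"] == "gap":
            if element["length"] is None:
                return None
            total += element["length"]          # represented as a run of n's
        else:
            total += element["end"] - element["start"] + 1
    return total


if __name__ == "__main__":
    line = ("CONTIG join(complement(AADE01002756.1:1..10234),gap(1206),"
            "AADE01006160.1:1..1963,gap(323),AADE01002525.1:1..11915,"
            "gap(1633),AADE01005641.1:1..2377)")
    for element in parse_contig(line):
        print(element)
```

The sketch rejects consecutive gap() operators, as the rule above requires,
but it does not attempt to check whether a leading or trailing gap really
represents a telomere or centromere, since that cannot be determined from
the CONTIG line alone.
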
Although we have made every effort to ensure that these data are
presented in the traditional flat file format, if you encounter any
problems in using these data with software which is based upon the
flat file format, please contact us at:

              info@ncbi.nlm.nih.gov

The flat file is just one of many possible report formats that can be
generated from the richer representation supported by the ASN.1 form of
the data. Developers of new software tools should consider using the
ASN.1 form directly to take advantage of those features. Documentation
and a Software Developer's Toolkit for ASN.1 are available through NCBI.
You may call NCBI at (301) 496-2475, or subscribe to a developers'
electronic newsgroup by sending your name, address, affiliation, and
e-mail address to:

              bits-request@ncbi.nlm.nih.gov

The Software Developer's Toolkit and PostScript documentation for UNIX,
VMS, Ultrix, AIX, MacOS, DOS, and Microsoft Windows systems are available
in a compressed UNIX tar file by anonymous ftp from 'ftp.ncbi.nih.gov',
in the toolbox/ncbi_tools directory. The file is 'ncbi.tar.Z'.


5. KNOWN PROBLEMS OF THE GENBANK DATABASE

5.1 Incorrect Gene Symbols in Entries and Index

The /gene qualifier for many GenBank entries contains values other than
the official gene symbol, such as the product or the standard name of the
gene. The gene symbol index (gbgen.idx) is created from the data in the
/gene qualifier and therefore may contain data other than official gene
symbols.


6. GENBANK ADMINISTRATION

The National Center for Biotechnology Information (NCBI), National Library
of Medicine, National Institutes of Health, is responsible for the production
and distribution of the NIH GenBank Sequence Database. NCBI distributes
GenBank sequence data by anonymous FTP, e-mail servers and other
network services. For more information, you may contact NCBI at the
e-mail address: info@ncbi.nlm.nih.gov or by phone: 301-496-2475.

6.1 Registered Trademark Notice

GenBank (R) is a registered trademark of the U.S. Department of Health
and Human Services for the Genetic Sequence Data Bank.

6.2 Citing GenBank

If you have used GenBank in your research, we would appreciate it if
you would include a reference to GenBank in all publications related
to that research.

When citing data in GenBank, it is appropriate to give the sequence
name, primary accession number, and the publication in which the
sequence first appeared. If the data are unpublished, we urge you to
contact the group which submitted the data to GenBank to see if there
is a recent publication or if they have determined any revisions or
extensions of the data.

It is also appropriate to list a reference for GenBank itself. The
following publication, which describes the GenBank database, should
be cited:

   Benson D.A., Karsch-Mizrachi I., Lipman D.J., Ostell J., Rapp B.A.,
   Wheeler D.L. GenBank. Nucl. Acids Res. 28(1):15-18 (2000)

The following statement is an example of how you may cite GenBank
data. It cites the sequence, its primary accession number, the group
who determined the sequence, and GenBank. The numbers in parentheses
refer to the GenBank citation above and to the REFERENCE in the
GenBank sequence entry.

   `We scanned the GenBank (1) database for sequence similarities and
   found one sequence (2), GenBank accession number J01016, which showed
   significant similarity...'

   (1) Benson, D.A. et al. Nucl. Acids Res. 28(1):15-18 (2000)
   (2) Nellen, W. and Gallwitz, D. J. Mol. Biol. 159, 1-18 (1982)

6.3 GenBank Distribution Formats and Media

Complete flat file releases of the GenBank database are available via
NCBI's anonymous ftp server:

   ftp://ftp.ncbi.nih.gov

Each release is cumulative, incorporating all previous GenBank data.
No retrieval software is provided. GenBank distribution via CD-ROM
ceased as of GenBank Release 106.0 (April, 1998).

A mirror of the GenBank FTP site at the NCBI is available at the
University of Indiana:

   ftp://bio-mirror.net/biomirror/genbank/

6.4 Other Methods of Accessing GenBank Data

Entrez is a molecular biology database system that presents an integrated
view of DNA and protein sequence data, 3D structure data, complete genomes,
and associated MEDLINE entries. The system is produced by the National
Center for Biotechnology Information (NCBI), and is available only via
the Internet (using the Web-Entrez and Network-Entrez applications).

Accessing Entrez is easy: if you have a World Wide Web browser, such as
Netscape or Internet-Explorer, simply point your browser to:

   http://www.ncbi.nlm.nih.gov/

The Web version of Entrez has all the capabilities of the network version,
but with the visual style of the World Wide Web. If you prefer the "look and
feel" of Network-Entrez, you may download Network-Entrez from the NCBI's
FTP server:

   ftp://ftp.ncbi.nih.gov/

Versions are available for PC/Windows, Macintosh and several Unix variants.

For information about Network-Entrez, Web-Entrez or any other NCBI
services, you may contact NCBI by e-mail at info@ncbi.nlm.nih.gov or by
phone at 301-496-2475.

6.5 Request for Corrections and Comments

We welcome your suggestions for improvements to GenBank. We are
especially interested to learn of errors or inconsistencies in the
data. BankIt or Sequin can be used to submit revisions to previous
submissions. In addition, suggestions and corrections can be sent by
electronic mail to: update@ncbi.nlm.nih.gov. Please be certain to
indicate the GenBank release number (e.g., Release 149.0) and the
primary accession number of the entry to which your comments apply; it
is helpful if you also give the entry name and the current contents of
any data field for which you are recommending a change.

6.6 Credits and Acknowledgments

Credits -

GenBank Release Coordination
	Mark Cavanaugh

GenBank Submission Coordination
	Ilene Mizrachi

GenBank Annotation Staff
	Michael Baxter, Lori Black, Larissa Brown, Larry Chlumsky,
	Karen Clark, Christina Couldrey, Irene Fang, Linda Frisse,
	Michael Fetchko, Anjanette Johnston, Richard McVeigh,
	Leonie Misquitta, Ilene Mizrachi, DeAnne Olsen Cravaritis,
	Chris O'Sullivan, Leigh Riley, Gert Roosen, Susan Schafer,
	and Linda Yankie

Data Management and Preparation
	Vladimir Alekseyev, Serge Bazhin, Mark Cavanaugh, WonHee Jang,
	Jonathan Kans, Michael Kimelman, Jim Ostell, Lynn Schriml,
	Carolyn Shenmen, Karl Sirotkin, Vladimir Soussov, Elena Starchenko,
	Hanzhen Sun, Tatiana Tatusova, Aleksey Vysokolov, Lukas Wagner,
	Jane Weisemann, Eugene Yaschenko

Database Administration
	Slava Khotomliansky, Joe Pepersack, Tony Stearman

User Support
	Masoumeh Assadi, Medha Bhagwat, Peter Cooper, Susan Dombrowski,
	Andrei Gabrielian, Renata Geer, Chuong Huynh, Emir Khatipov,
	Hanguan Liu, Wayne Matten, Scott McGinnis, Rana Morris,
	Steve Pechous, Vyvy Pham, Monica Romiti, Eric Sayers, Tao Tao,
	Majda Valjavec-Gratian, David Wheeler

Project Direction
	David Lipman

Acknowledgments -

Contractor support for GenBank production and distribution has been
provided by Management Systems Designers, Inc., ComputerCraft Corporation,
and The KEVRIC Company, Inc.

6.7 Disclaimer

The United States Government makes no representations or warranties
regarding the content or accuracy of the information. The United States
Government also makes no representations or warranties of merchantability
or fitness for a particular purpose or that the use of the sequences will
not infringe any patent, copyright, trademark, or other rights. The
United States Government accepts no responsibility for any consequence
of the receipt or use of the information.

For additional information about GenBank releases, please contact
NCBI by e-mail at info@ncbi.nlm.nih.gov, by phone at (301) 496-2475,
or by mail at:

  GenBank
  National Library of Medicine
  Bldg. 38A Rm. 8N-809
  8600 Rockville Pike
  Bethesda, MD 20894
  FAX: (301) 480-9241