       tventi.7 - plan9port - [fork] Plan 9 from user space
       ---
       tventi.7 (11062B)
       ---
            1 .TH VENTI 7
            2 .SH NAME
            3 venti \- archival storage server
            4 .SH DESCRIPTION
            5 Venti is a block storage server intended for archival data.
            6 In a Venti server, the SHA1 hash of a block's contents acts
            7 as the block identifier for read and write operations.
            8 This approach enforces a write-once policy, preventing
            9 accidental or malicious destruction of data.  In addition,
           10 duplicate copies of a block are coalesced, reducing the
           11 consumption of storage and simplifying the implementation
           12 of clients.
           13 .PP
           14 This manual page documents the basic concepts of
           15 block storage using Venti as well as the Venti network protocol.
           16 .PP
           17 .MR Venti (1)
           18 documents some simple clients.
           19 .MR Vac (1) ,
           20 .MR vacfs (4) ,
           21 and
           22 .MR vbackup (8)
           23 are more complex clients.
           24 .PP
           25 .MR Venti (3)
           26 describes a C library interface for accessing
           27 Venti servers and manipulating Venti data structures.
           28 .PP
           29 .MR Venti (8)
           30 describes the programs used to run a Venti server.
           31 .PP
           32 .SS "Scores
           33 The SHA1 hash that identifies a block is called its
           34 .IR score .
           35 The score of the zero-length block is called the
           36 .IR "zero score" .
           37 .PP
           38 Scores may have an optional 
           39 .IB label :
           40 prefix, typically used to
           41 describe the format of the data.
           42 For example, 
           43 .MR vac (1)
           44 uses a
           45 .B vac:
           46 prefix, while
           47 .MR vbackup (8)
           48 uses prefixes corresponding to the file system
           49 types: 
           50 .BR ext2: ,
           51 .BR ffs: ,
           52 and so on.
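
As a minimal sketch of the ideas in this section, the following Python computes a score and splits off an optional label prefix. The function names are illustrative only and are not part of any Venti library; the label is assumed to be everything before the last colon.

```python
import hashlib

def score(block: bytes) -> str:
    """Return the score of a block: the SHA1 hash of its contents, in hex."""
    return hashlib.sha1(block).hexdigest()

# The zero score is the score of the zero-length block.
ZERO_SCORE = score(b"")

def split_label(text: str):
    """Split an optionally labelled score such as 'vac:<40 hex digits>'.

    Returns (label, score); label is None when there is no prefix.
    Assumption: the label is everything before the last colon.
    """
    label, sep, rest = text.rpartition(":")
    return (label if sep else None), rest
```

For example, split_label("vac:" + ZERO_SCORE) yields the label "vac" and the bare score.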
           53 .SS "Files and Directories
           54 Venti accepts blocks up to 56 kilobytes in size.  
           55 By convention, Venti clients use hash trees of blocks to
           56 represent arbitrary-size data
           57 .IR files .
           58 The data to be stored is split into fixed-size
           59 blocks and written to the server, producing a list
           60 of scores.
           61 The resulting list of scores is split into fixed-size pointer
           62 blocks (using only an integral number of scores per block)
           63 and written to the server, producing a smaller list
           64 of scores.
           65 The process continues, eventually ending with the
           66 score for the hash tree's top-most block.
           67 Each file stored this way is summarized by
           68 a
           69 .B VtEntry
           70 structure recording the top-most score, the depth
           71 of the tree, the data block size, and the pointer block size.
           72 One or more 
           73 .B VtEntry
           74 structures can be concatenated
           75 and stored as a special file called a
           76 .IR directory .
           77 In this
           78 manner, arbitrary trees of files can be constructed
           79 and stored.
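
The hash tree construction described above can be sketched as follows. This is an illustration under stated assumptions, not the code used by any Venti client: a Python dict stands in for the server, build_tree is a hypothetical name, and zero truncation and the VtEntry encoding are left out.

```python
import hashlib

SCORE_SIZE = 20  # a SHA1 hash is 20 bytes long

def build_tree(data: bytes, dsize: int, psize: int, store: dict):
    """Write data as a hash tree and return (top-most score, depth).

    dsize is the data block size and psize the pointer block size.
    store is a stand-in for a Venti server: it maps score -> block.
    """
    per_block = psize // SCORE_SIZE  # integral number of scores per pointer block
    if dsize < 1 or per_block < 2:
        raise ValueError("block sizes too small to build a tree")

    def write(block: bytes) -> bytes:
        s = hashlib.sha1(block).digest()
        store[s] = block  # identical blocks coalesce under one score
        return s

    # Level 0: split the data into fixed-size blocks.
    scores = [write(data[i:i + dsize]) for i in range(0, len(data), dsize)]
    if not scores:
        scores = [write(b"")]

    # Higher levels: pack scores into pointer blocks until one score remains.
    depth = 0
    while len(scores) > 1:
        groups = [scores[i:i + per_block] for i in range(0, len(scores), per_block)]
        scores = [write(b"".join(g)) for g in groups]
        depth += 1
    return scores[0], depth
```

Because blocks are addressed by their hash, writing four identical data blocks in this sketch stores only one copy of the data block.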
           80 .PP
           81 Scores passed between programs conventionally refer
           82 to
           83 .B VtRoot
           84 blocks, which contain descriptive information
           85 as well as the score of a directory block containing a small number
           86 of directory entries.
           87 .PP
           88 Conventionally, programs do not mix data and directory entries
           89 in the same file.  Instead, they keep two separate files, one with
           90 directory entries and one with metadata referencing those
           91 entries by position.
           92 Keeping this parallel representation is a minor annoyance
           93 but makes it possible for general programs like
           94 .I venti/copy
           95 (see
           96 .MR venti (1) )
           97 to traverse the block tree without knowing the specific details
           98 of any particular program's data.
           99 .SS "Block Types
          100 To allow programs to traverse these structures without
          101 needing to understand their higher-level meanings,
          102 Venti tags each block with a type.  The types are:
          103 .PP
          104 .nf
          105 .ft L
          106     VtDataType     000  \f1data\fL
          107     VtDataType+1   001  \fRscores of \fPVtDataType\fR blocks\fL
          108     VtDataType+2   002  \fRscores of \fPVtDataType+1\fR blocks\fL
          109     \fR\&...\fL
          110     VtDirType      010  VtEntry\fR structures\fL
          111     VtDirType+1    011  \fRscores of \fLVtDirType\fR blocks\fL
          112     VtDirType+2    012  \fRscores of \fLVtDirType+1\fR blocks\fL
          113     \fR\&...\fL
          114     VtRootType     020  VtRoot\fR structure\fL
          115 .fi
          116 .PP
          117 The octal numbers listed are the type numbers used
          118 by the commands below.
          119 (For historical reasons, the type numbers used on
          120 disk and on the wire are different from the above.
          121 They do not distinguish
          122 .BI VtDataType+ n
          123 blocks from
          124 .BI VtDirType+ n
          125 blocks.)
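
The type numbering in the table can be illustrated with a small Python sketch. The constants mirror the octal values listed above; the helper names are hypothetical, and the limit of seven pointer levels is inferred from the gap between 000 and 010 rather than stated in this page. The on-disk and on-wire numbering is not modelled here.

```python
VT_DATA_TYPE = 0o00
VT_DIR_TYPE = 0o10
VT_ROOT_TYPE = 0o20

def pointer_type(base: int, depth: int) -> int:
    """Return the type of a block `depth` levels above a base-type block."""
    if base not in (VT_DATA_TYPE, VT_DIR_TYPE):
        raise ValueError("base must be VtDataType or VtDirType")
    if not 0 <= depth <= 7:  # inferred limit: one octal digit of depth
        raise ValueError("depth out of range")
    return base + depth

def describe(t: int) -> str:
    """Describe a block type number in the words used by the table."""
    if not 0 <= t <= VT_ROOT_TYPE:
        raise ValueError("unknown block type")
    if t == VT_ROOT_TYPE:
        return "VtRoot structure"
    base, depth = t & 0o70, t & 0o07
    name = "VtDataType" if base == VT_DATA_TYPE else "VtDirType"
    if depth == 0:
        return "data" if base == VT_DATA_TYPE else "VtEntry structures"
    below = name if depth == 1 else f"{name}+{depth - 1}"
    return f"scores of {below} blocks"
```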
          126 .SS "Zero Truncation
          127 To avoid storing the same short data blocks padded with
          128 differing numbers of zeros, Venti clients working with fixed-size
          129 blocks conventionally
          130 `zero truncate' the blocks before writing them to the server.
          131 For example, if a 1024-byte data block contains the 
          132 11-byte string 
          133 .RB ` hello " " world '
          134 followed by 1013 zero bytes,
          135 a client would store only the 11-byte block.
          136 When the client later read the block from the server,
          137 it would append zero bytes to the end as necessary to
          138 reach the expected size.
          139 .PP
          140 When truncating pointer blocks
          141 .RB ( VtDataType+ \fIn
          142 and
          143 .BI VtDirType+ n
          144 blocks),
          145 trailing zero scores are removed
          146 instead of trailing zero bytes.
          147 .PP
          148 Because of the truncation convention,
          149 any file consisting entirely of zero bytes,
          150 no matter what its length, will be represented by the zero score:
          151 the data blocks contain all zeros and are thus truncated
          152 to the empty block, and the pointer blocks contain all zero scores
          153 and are thus also truncated to the empty block, 
          154 and so on up the hash tree.
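
A minimal Python sketch of the truncation convention follows. The function names are illustrative and are not taken from the Venti library; pointer blocks are assumed to hold a whole number of 20-byte scores.

```python
import hashlib

SCORE_SIZE = 20
ZERO_SCORE = hashlib.sha1(b"").digest()  # score of the zero-length block

def zero_truncate(block: bytes) -> bytes:
    """Drop trailing zero bytes from a data block before writing it."""
    return block.rstrip(b"\0")

def zero_extend(block: bytes, size: int) -> bytes:
    """Pad a data block read from the server back to its expected size."""
    return block.ljust(size, b"\0")

def truncate_pointer_block(block: bytes) -> bytes:
    """Drop trailing zero scores (not zero bytes) from a pointer block."""
    if len(block) % SCORE_SIZE != 0:
        raise ValueError("pointer block is not a whole number of scores")
    n = len(block)
    while n >= SCORE_SIZE and block[n - SCORE_SIZE:n] == ZERO_SCORE:
        n -= SCORE_SIZE
    return block[:n]
```

With these helpers, a 1024-byte block holding `hello world' and 1013 zero bytes truncates to 11 bytes, and a pointer block made up only of zero scores truncates to the empty block.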
          155 .SS Network Protocol
          156 A Venti session begins when a
          157 .I client
          158 connects to the network address served by a Venti
          159 .IR server ;
          160 the conventional address is 
          161 .BI tcp! server !venti
          162 (the
          163 .B venti
          164 port is 17034).
          165 Both client and server begin by sending a version
          166 string of the form
          167 .BI venti- versions - comment \en \fR.
          168 The
          169 .I versions
          170 field is a list of acceptable versions separated by
          171 colons.
          172 The protocol described here is version
          173 .BR 02 .
          174 The client is responsible for choosing a common
          175 version and sending it in the
          176 .B VtThello
          177 message, described below.
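
The version exchange might be handled along these lines. This is a sketch only: the function names are hypothetical, the version string is assumed to be ASCII, and the client's list is assumed to be ordered by preference.

```python
def parse_version_line(line: bytes):
    """Parse b'venti-<versions>-<comment>\\n' into a list of version strings."""
    text = line.decode("ascii")
    if not text.endswith("\n"):
        raise ValueError("version string must end with a newline")
    # The comment may itself contain '-', so split at most twice.
    prefix, versions, _comment = text[:-1].split("-", 2)
    if prefix != "venti":
        raise ValueError("not a Venti version string")
    return versions.split(":")

def choose_version(ours, theirs):
    """Pick the first version in our list that the server also accepts."""
    for v in ours:
        if v in theirs:
            return v
    raise ValueError("no common protocol version")
```

The chosen version is what the client would then place in the version field of its VtThello message.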
          178 .PP
          179 After the initial version exchange, the client transmits
          180 .I requests
          181 .RI ( T-messages )
          182 to the server, which subsequently returns
          183 .I replies
          184 .RI ( R-messages )
          185 to the client.
          186 The combined act of transmitting (receiving) a request
          187 of a particular type, and receiving (transmitting) its reply
          188 is called a
          189 .I transaction
          190 of that type.
          191 .PP
          192 Each message consists of a sequence of bytes.
          193 Two-byte fields hold unsigned integers represented
          194 in big-endian order (most significant byte first).
          195 Data items of variable lengths are represented by
          196 a one-byte field specifying a count,
          197 .IR n ,
          198 followed by
          199 .I n
          200 bytes of data.
          201 Text strings are represented similarly,
          202 using a two-byte count with
          203 the text itself stored as a UTF-encoded sequence
          204 of Unicode characters (see
          205 .MR utf (7) ).
          206 Text strings are not
          207 .SM NUL\c
          208 -terminated:
          209 .I n
          210 counts the bytes of UTF data, which include no final
          211 zero byte.
          212 The
          213 .SM NUL
          214 character is illegal in text strings in the Venti protocol.
          215 The maximum string length in Venti is 1024 bytes.
          216 .PP
          217 Each Venti message begins with a two-byte size field 
          218 specifying the length in bytes of the message,
          219 not including the length field itself.
          220 The next byte is the message type, one of the constants
          221 in the enumeration in the include file
          222 .BR <venti.h> .
          223 The next byte is an identifying
          224 .IR tag ,
          225 used to match responses to requests.
          226 The remaining bytes are parameters of different sizes.
          227 In the message descriptions, the number of bytes in a field
          228 is given in brackets after the field name.
          229 The notation
          230 .IR parameter [ n ]
          231 where
          232 .I n
          233 is not a constant represents a variable-length parameter:
          234 .IR n [1]
          235 followed by
          236 .I n
          237 bytes of data forming the
          238 .IR parameter .
          239 The notation
          240 .IR string [ s ]
          241 (using a literal
          242 .I s
          243 character)
          244 is shorthand for
          245 .IR s [2]
          246 followed by
          247 .I s
          248 bytes of UTF-8 text.
          249 The notation
          250 .IR parameter []
          251 where 
          252 .I parameter
          253 is the last field in the message represents a 
          254 variable-length field that comprises all remaining
          255 bytes in the message.
          256 .PP
          257 All Venti RPC messages are prefixed with a field
          258 .IR size [2]
          259 giving the length of the message that follows
          260 (not including the
          261 .I size
          262 field itself).
          263 The message bodies are:
          264 .ta \w'\fLVtTgoodbye 'u
          265 .IP
          266 .ne 2v
          267 .B VtThello
          268 .IR tag [1]
          269 .IR version [ s ]
          270 .IR uid [ s ]
          271 .IR strength [1]
          272 .IR crypto [ n ]
          273 .IR codec [ n ]
          274 .br
          275 .B VtRhello
          276 .IR tag [1]
          277 .IR sid [ s ] 
          278 .IR rcrypto [1]
          279 .IR rcodec [1]
          280 .IP
          281 .ne 2v
          282 .B VtTping
          283 .IR tag [1]
          284 .br
          285 .B VtRping
          286 .IR tag [1]
          287 .IP
          288 .ne 2v
          289 .B VtTread
          290 .IR tag [1]
          291 .IR score [20]
          292 .IR type [1]
          293 .IR pad [1]
          294 .IR count [2]
          295 .br
          296 .B VtRread
          297 .IR tag [1]
          298 .IR data []
          299 .IP
          300 .ne 2v
          301 .B VtTwrite
          302 .IR tag [1]
          303 .IR type [1]
          304 .IR pad [3]
          305 .IR data []
          306 .br
          307 .B VtRwrite
          308 .IR tag [1]
          309 .IR score [20]
          310 .IP
          311 .ne 2v
          312 .B VtTsync
          313 .IR tag [1]
          314 .br
          315 .B VtRsync
          316 .IR tag [1]
          317 .IP
          318 .ne 2v
          319 .B VtRerror
          320 .IR tag [1]
          321 .IR error [ s ]
          322 .IP
          323 .ne 2v
          324 .B VtTgoodbye
          325 .IR tag [1]
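
The version 02 encoding rules above can be sketched in Python as follows. The helper names are illustrative. The numeric message types are assumptions made for this example; the authoritative values are the constants in <venti.h>.

```python
import struct

# Assumed message type numbers; check <venti.h> before relying on them.
VT_TPING = 2
VT_THELLO = 4
VT_TREAD = 12

def pack_string(text: str) -> bytes:
    """Encode string[s]: a two-byte big-endian count, then UTF-8 text."""
    raw = text.encode("utf-8")
    if len(raw) > 1024 or b"\0" in raw:
        raise ValueError("string too long or contains NUL")
    return struct.pack(">H", len(raw)) + raw

def pack_var(data: bytes) -> bytes:
    """Encode parameter[n]: a one-byte count, then n bytes of data."""
    if len(data) > 255:
        raise ValueError("variable-length parameter too long")
    return bytes([len(data)]) + data

def frame(msg_type: int, tag: int, params: bytes = b"") -> bytes:
    """Prefix type[1] tag[1] params with size[2], which excludes itself."""
    body = bytes([msg_type, tag]) + params
    return struct.pack(">H", len(body)) + body

def pack_thello(tag, version, uid, strength=0, crypto=b"", codec=b""):
    params = (pack_string(version) + pack_string(uid) + bytes([strength])
              + pack_var(crypto) + pack_var(codec))
    return frame(VT_THELLO, tag, params)

def pack_tread(tag, score: bytes, block_type: int, count: int) -> bytes:
    if len(score) != 20:
        raise ValueError("score must be 20 bytes")
    # score[20] type[1] pad[1] count[2]
    params = score + bytes([block_type, 0]) + struct.pack(">H", count)
    return frame(VT_TREAD, tag, params)
```

Under these assumptions, frame(VT_TPING, 7) produces the four bytes 00 02 02 07: a size of two, the type, and the tag.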
          326 .PP
          327 Each T-message has a one-byte
          328 .I tag
          329 field, chosen and used by the client to identify the message.
          330 The server will echo the request's
          331 .I tag
          332 field in the reply.
          333 Clients should arrange that no two outstanding
          334 messages have the same tag field so that responses
          335 can be distinguished.
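
One way a client could keep outstanding tags distinct is a small pool of free tags, sketched below. TagPool is a hypothetical helper, and the sketch assumes that all 256 one-byte values are usable as tags.

```python
class TagPool:
    """Hand out one-byte tags so that no two outstanding requests share one."""

    def __init__(self):
        self._free = list(range(256))  # assumption: every byte value is a valid tag
        self._pending = {}             # tag -> request awaiting its reply

    def send(self, request):
        """Reserve a tag for a request that is about to be transmitted."""
        if not self._free:
            raise RuntimeError("too many outstanding requests")
        tag = self._free.pop()
        self._pending[tag] = request
        return tag

    def receive(self, tag: int):
        """Match a reply's tag to its request and release the tag."""
        request = self._pending.pop(tag)  # raises KeyError for an unknown tag
        self._free.append(tag)
        return request
```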
          336 .PP
          337 The type of an R-message will either be one greater than
          338 the type of the corresponding T-message or
          339 .BR VtRerror ,
          340 indicating that the request failed.
          341 In the latter case, the
          342 .I error
          343 field contains a string describing the reason for failure.
          344 .PP
          345 Venti connections must begin with a 
          346 .B hello
          347 transaction.
          348 The
          349 .B VtThello
          350 message contains the protocol
          351 .I version
          352 that the client has chosen to use.
          353 The fields
          354 .IR strength ,
          355 .IR crypto ,
          356 and
          357 .IR codec
          358 could be used to add authentication, encryption,
          359 and compression to the Venti session
          360 but are currently ignored.
          361 The 
          362 .I rcrypto
          363 and
          364 .I rcodec
          365 fields in the 
          366 .B VtRhello
          367 response are similarly ignored.
          368 The
          369 .IR uid 
          370 and
          371 .IR sid
          372 fields are intended to be the identity
          373 of the client and server but, given the lack of
          374 authentication, should be treated only as advisory.
          375 The initial
          376 .B hello
          377 should be the only
          378 .B hello
          379 transaction during the session.
          380 .PP
          381 The
          382 .B ping
          383 message has no effect and 
          384 is used mainly for debugging.
          385 Servers should respond immediately to pings.
          386 .PP
          387 The
          388 .B read
          389 message requests a block with the given
          390 .I score
          391 and
          392 .IR type .
          393 Use
          394 .I vttodisktype
          395 and
          396 .I vtfromdisktype
          397 (see
          398 .MR venti (3) )
          399 to convert between a block type enumeration value
          400 .RB ( VtDataType ,
          401 etc.)
          402 and the 
          403 .I type
          404 used on disk and in the protocol.
          405 The
          406 .I count
          407 field specifies the maximum expected size
          408 of the block.
          409 The
          410 .I data
          411 in the reply is the block's contents.
          412 .PP
          413 The
          414 .B write
          415 message writes a new block of the given
          416 .I type
          417 with contents
          418 .I data
          419 to the server.
          420 The response includes the
          421 .I score
          422 to use to read the block,
          423 which should be the SHA1 hash of 
          424 .IR data .
          425 .PP
          426 The Venti server may buffer written blocks in memory,
          427 waiting until after responding to the
          428 .B write
          429 message before writing them to
          430 permanent storage.
          431 The server will delay the response to a
          432 .B sync
          433 message until after all blocks in earlier
          434 .B write
          435 messages have been written to permanent storage.
          436 .PP
          437 The
          438 .B goodbye
          439 message ends a session.  There is no
          440 .BR VtRgoodbye :
          441 upon receiving the
          442 .BR VtTgoodbye
          443 message, the server terminates the connection.
          444 .PP
          445 Version
          446 .B 04
          447 of the Venti protocol is similar to version
          448 .B 02
          449 (described above)
          450 but has two changes to accommodate larger payloads.
          451 First, it replaces the leading 2-byte packet size with
          452 a 4-byte size.
          453 Second, the
          454 .I count
          455 in the
          456 .B VtTread
          457 packet may be either 2 or 4 bytes;
          458 the total packet length distinguishes the two cases.
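
The two version 04 differences can be sketched as follows. The function names are hypothetical; the sketch assumes that params holds the VtTread fields after the tag, and that the 4-byte forms are big-endian like the 2-byte ones.

```python
import struct

def frame(body: bytes, version: str = "02") -> bytes:
    """Prefix a message body with its size: 2 bytes in 02, 4 bytes in 04."""
    fmt = ">I" if version == "04" else ">H"
    return struct.pack(fmt, len(body)) + body

def tread_count(params: bytes) -> int:
    """Extract count from VtTread parameters: score[20] type[1] pad[1] count.

    The length of what follows pad tells the two encodings apart.
    """
    rest = params[22:]
    if len(rest) == 2:
        return struct.unpack(">H", rest)[0]
    if len(rest) == 4:
        return struct.unpack(">I", rest)[0]
    raise ValueError("malformed VtTread parameters")
```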
          459 .SH SEE ALSO
          460 .MR venti (1) ,
          461 .MR venti (3) ,
          462 .MR venti (8)
          463 .br
          464 Sean Quinlan and Sean Dorward,
          465 ``Venti: a new approach to archival storage'',
          466 .I "Usenix Conference on File and Storage Technologies" ,
          467 2002.