URI: 
       tLinuxHDEncSettings.txt - tomb - the crypto undertaker
  HTML git clone git://parazyd.org/tomb.git
   DIR Log
   DIR Files
   DIR Refs
   DIR README
   DIR LICENSE
       ---
       tLinuxHDEncSettings.txt (20410B)
       ---
            1 
            2 Linux hard disk encryption settings
            3 
            4    This page intends to educate the reader about the existing weaknesses
            5    of the public-IV on-disk format commonly used with cryptoloop and
            6    dm-crypt (used in IV-plain mode). This page aims to facilitate risk
            7    calculation when utilising Linux hard disk encryption. The attacks
            8    presented on this page may pose a thread to you, but at the same time
            9    may be totally irrelevant for others. At the end of this document, the
           10    reader should be able to make a good choice according to his security
           11    needs.
           12 
           13    A good quote with respect to this topic is ''All security involves
           14    trade-offs'' from Beyond Fear (Bruce Schneier). You should keep in mind
           15    that perfect security is unachievable and by all means shouldn't be
           16    your goal. For instance, when using pass phrase based cryptography, you
           17    have to trust in that the underlying system is secure, the computer
           18    system has not been tampered with, and nobody is watching you. The most
           19    obvious weakness is the last one, but even if you make sure nobody nor
           20    any camera is around, how about the keyboard you're typing on? Has it
           21    been manipulated while you have been getting your lunch?
           22 
           23    So security comes for a price, and the price when designing
           24    cryptography security algorithms is performance. You will be introduced
           25    to the fastest of all setups available, the "public-IV", which
           26    sacrifices security properties for speed. After that we will talk about
           27    ESSIV, the newest of IV modes implemented. It comes for a small price,
           28    but it can deal with watermarking for a relatively small price. Then
           29    you'll be introduced to the draft specifications of the Security in
           30    Storage Working Group ([18]SISWG). Currently SISWG is considering EME
           31    and LRW for standardisation. EME along with it's cousin CMC seems to
           32    provide the best security level, but imposes additional encryption
           33    steps. Plumb-IV is discussed only for reference, because it has the
           34    same performance penalty as CMC, but in constrast suffers from
           35    weaknesses of CBC encryption.
           36 
           37    As convention, this document will use the term "blocks", when it
           38    referes to a single block of plain or cipher text (usually 16 byte),
           39    and will use the term "sectors", when it refers to a 512-byte wide hard
           40    disk block.
           41 
           42 CBC Mode: The basic level
           43 
           44    Most hard disk encryption systems utilise CBC to encrypt bulk data.
           45    Good descriptions on CBC and other common cipher modes are available at
           46      * [19]Wikipedia
           47      * [20]Connected: An Internet Encyclopedia
           48      * [21]NIST: Recommendation for Block Cipher Modes of Operation (CBC
           49        is at PDF Page 17)
           50 
           51    Please make sure you're familiar with CBC before proceeding.
           52 
           53    Since CBC encryption is a recursive algorithm, the encryption of the
           54    n-th block requires the encryption of all preceding blocks, 0 till n-1.
           55    Thus, if we would run the whole hard disk encryption in CBC mode, one
           56    would have to re-encrypt the whole hard disk, if the first computation
           57    step changed, this is, when the first plain text block changed. Of
           58    course, this is an undesired property, therefore the CBC chaining is
           59    cut every sector and restarted with a new initialisation vector (IV),
           60    so we can encrypt sectors individually. The choice of the sector as
           61    smallest unit matches with the smallest unit of hard disks, where a
           62    sector is also atomic in terms of access.
           63 
           64    For reference, I will give a formal definition of CBC encryption and
           65    decryption. Note, that decryption is not recursive, in contrast to
           66    encryption, since it's a function only of C[n-1] and C[n].
           67    Encryption:
           68    C[-1] = IV
           69    C[n] = E(P[n] ⊕ C[n-1])
           70    Decryption:
           71    C[-1] = IV
           72    P[n] = C[n-1] ⊕ D(C[n])
           73    The next sections will deal with how this IV is chosen.
           74 
           75 The IV Modes
           76 
           77 The "public-IV"
           78 
           79    The IV for sector n is simply the 32-bit version of the number n
           80    encoded in little-endian padded with zeros to the block-size of the
           81    cipher used, if necessary. This is the most simple IV mode, but at the
           82    same the most vulnerable.
           83 
           84 ESSIV
           85 
           86    E(Sector|Salt) IV, short ESSIV, derives the IV from key material via
           87    encryption of the sector number with a hashed version of the key
           88    material, the salt. ESSIV does not specify a particular hash algorithm,
           89    but the digest size of the hash must be an accepted key size for the
           90    block cipher in use. As the IV depends on a none public piece of
           91    information, the key, the sequence of IV is not known, and the attacks
           92    based on this can't be launched.
           93 
           94 plumb IV
           95 
           96    The IV is computed by hashing (or MAC-ing) the plain text from the
           97    second block till the last. Additionally, the sector number and the key
           98    are used as input as well. If a byte changes in the plain text of the
           99    blocks 2 till n, the first block is influenced by the change of the IV.
          100    As the first encryption effects all subsequent encryption steps due to
          101    the nature of CBC, the whole sector is changed.
          102 
          103    Decryption is possible because CBC is not recursive for decryption. The
          104    prerequisites for a successful CBC decryption are two subsequent cipher
          105    blocks. The former one is decrypted and the first one is XOR-ed into
          106    the decryption result yielding the original plain text. Therefore
          107    independent of the IV scheme, decryption is possible from the 2nd to
          108    the last block. After the recovery of these plain text blocks, the IV
          109    can be computed, and finally the first block can be decrypted as well.
          110 
          111    The only weakness of this scheme is it's performance. It has to process
          112    data twice: first for obtaining the IV, and then to produce the CBC
          113    encryption with this IV. With the same performance penalty CMC is able
          114    to achieve better security properties (CMC is discussed later), thus
          115    plumb-IV will remain unimplemented.
          116 
          117 The attack arsenal
          118 
          119 Content leaks
          120 
          121    This attack can be mounted against any system operating in CBC Mode. It
          122    rests on the property, that in CBC decryption, the preceding cipher
          123    block's influence is simple, that is, it's XORed into the plain text.
          124    The preceding cipher block, C[n-1], is readily available on disk (for n
          125    > 0) or may be deduced from the IV (for n = 0). If an attacker finds
          126    two blocks with identical cipher text, he knows that both cipher texts
          127    have been formed according to:
          128    C[m] = E(P[m] ⊕ C[m-1] )
          129    C[n] = E(P[n] ⊕ C[n-1] )
          130    Since he found that C[m] = C[n], it holds
          131    P[m] ⊕ C[m-1] = P[n] ⊕ C[n-1]
          132    which can be rewritten as
          133    C[m-1] ⊕ C[n-1] = P[n] ⊕ P[m]
          134    The left hand side is known to the attacker by reading the preceding
          135    cipher text from disk. If one of the blocks is the first block of a
          136    sector, the IV must be examined instead (when it's available as it is
          137    in public-IV). The attacker is now able to deduce the difference
          138    between the plain texts by examining the difference of C[m-1] and
          139    C[n-1]. If one of the plain text blocks happens to be zero, the
          140    difference yields the original content of the other related plain text
          141    block.
          142 
          143    Another information is available to the attacker. Any succeeding
          144    identical pair of cipher text, that follows the initial identical
          145    cipher pair, is equal. No information about the content of those pairs
          146    can be extracted, since the information is extracted from the
          147    respective preceding cipher blocks, but those are all required to be
          148    equal.
          149 
          150    Let's have a look at the chance of succeeding with this attack.
          151    Assuming the output of a cipher forms an uniform distribution, the
          152    chance, p, of finding an identical block is 2^-blocksize. For instance,
          153    p = 1/2^128 for a 128-bit cipher. Because the number of possible pairs
          154    develops as an arithmetic series in n, the number of sectors, the
          155    chance of not finding two identical blocks is given by
          156    (1-p)^n(n-1)/2
          157    As p is very small, but in contrast the power is very big, we apply the
          158    logarithm to get meaningful answers, that is
          159    n(n-1)/2 ln (1-p)
          160    An example: The number of cipher blocks available on 200GB disk with
          161    known C[n-1] is 200GB × 1024^2 KB/GB × 64/1KB ^1. Or in other words, a
          162    128-bit block is 16 bytes, so the number of 16-byte blocks in a 200GB
          163    hard disk is 13.4 billion. Therefore, n = 1.342e10. For a 128-bit
          164    cipher, p = 2^-128. Hence,
          165    ln(1-p) = -2.939e-39
          166    n(n-1)/2 = 9.007e19
          167 
          168    n(n-1)/2 ln (1-p) = -2.647e-19
          169    1-e^-2.776e-13 = 2.647e-19
          170    The last term is the chance of finding at least one pair of identical
          171    cipher blocks. But how does this number grow in n? Obviously
          172    exponentially. Plotting a few a decimal powers shows that the chance
          173    for finding at least on identical cipher pair flips to 1 around n =
          174    10^20 (n = 10^40 for a 256-bit cipher). This inflexion point is reached
          175    for a 146 million TB storage (or a hundered thousand trillion trillions
          176    TB storage for a 256-bit cipher).
          177 
          178    ^1The blocks with available preceding cipher blocks is 62/1KB for all
          179    non-public IV schemes, i.e. ESSIV/plumb IV
          180 
          181 Data existence leak: The Watermark
          182 
          183    No IV format discussed on this page allows the user to deny the
          184    existence of encrypted data. Neither cryptoloop nor dm-crypt is an
          185    implementation of a deniable cryptography system. But the problem is
          186    more serious with public-IV.
          187 
          188    With public IV and the predicable difference it introduces in the first
          189    blocks of a sequence of plain text, data can be watermarked, which
          190    means, the watermarked data is detectable even when the key has not
          191    been recovered. As shown in the paragraph above, the existence of two
          192    blocks with identical cipher text is very unlikely and coincidence can
          193    be excluded, which is relevant when somebody tries to demonstrate
          194    before the law that certain data is in an encrypted partition.
          195 
          196    As the IV progresses with a foreseeable pattern and is guaranteed to
          197    change the least significant bit ever step, we can build identical pair
          198    of cipher text by writing three consecutive sectors each with a flipped
          199    LSB relative to the previous. (The reason it's three instead of two is,
          200    that the second least significant bit might change as well.) This
          201    "public-IV"-driven CBC encryption will output exactly the same cipher
          202    text for two consecutive sectors. An attacker can search the disk for
          203    identical consecutive blocks to find the watermark. This can be done in
          204    a single pass, and is much more feasible than finding to identical
          205    blocks, that are scattered on the disk, as in the previous attack. A
          206    few bits of information can be encoded into the watermarks, which might
          207    serve as tag to prove the existence copyright infringing material.
          208 
          209    A complete description of watermarking can be found in [22]Encrypted
          210    Watermarks and Linux Laptop Security. The attack can be defeated by
          211    using ESSIV.
          212 
          213 Data modification leak
          214 
          215    CBC encryption is recursive, so the n-th block depends on all previous
          216    blocks. But the other way round would also be nice. Why? The weakness
          217    becomes visible, if storage on a remote computer is used, or more
          218    likely, the hard disk exhibits good forensic properties. The point is,
          219    the attacker has to have access to preceding (in time) cipher text of a
          220    sector, either by recording it from the network, or by using forensic
          221    methods.
          222 
          223    An attacker can now guess data modification patterns by examining the
          224    historic data. If a sector is overwritten with a partial changed plain
          225    text, there is an amount of bytes at the beginning, which are
          226    unchanged. This point of change^2 is directly reflected in the cipher
          227    text. So an attacker can deduce the point of the change in plain text
          228    by finding the point where the cipher text starts to differ.
          229 
          230    This weakness is present in public-IV and ESSIV.
          231 
          232    ^2aligned to the cipher block size boundaries
          233 
          234 Malleable plain text
          235 
          236    The decryption structure of CBC is the source of this weakness.
          237    Malleability (with respect to cryptography) is defined as a
          238    modification of the cipher text that will resulting in a predictable
          239    change in plain text. To put it formally, there is a function f(C),
          240    that, if applied to the cipher text, C' = f(C), will result in a known
          241    function f', which will predict the resulting plain text, P' = D(C'),
          242    correctly assuming P is known, that is P' = f'(P).
          243 
          244    As we can see in it's definition, CBC decryption depends on C[n-1]. An
          245    attacker can flip arbitrary bits in the plain text by flipping bit in
          246    C[n-1]. More formally^3, if
          247    P = P[1] || P[2] || ... || P[i] || ... || P[n]
          248    C = E[CBC](P)
          249    C = C[1] || C[2] || ... || C[i-1] || ... || C[n]
          250    the function
          251    f(C[1] || ... || C[n]) = C[1] || ... || C[i-1] XOR M || ... || C[n]
          252    follows the function f', which predicts the resulting plain text
          253    correctly as,
          254    f'(P[1] || ... || P[n]) = P[1] || ... || P[i] XOR M || ... || P[n]
          255    The first block of the CBC cipher text stream is not malleable, because
          256    it depends on the IV, which is not modifiable for an attacker.
          257 
          258    ^3The IV parameter for E[CBC] has been intentionally omitted.
          259 
          260 Movable
          261 
          262    On the expense of one block decrypting to garbage, an attacker can move
          263    around plain text as he likes. CBC decryption depends on two variables,
          264    C[n-1] and C[n]. Both can be modified at free will. To make meaningful
          265    modifications, an attacker has to replace the pair C[n-1] and C[n] with
          266    other cipher text pair from disk. The first block C[n-1] will decrypt
          267    to garbage, but the second block C[n] will yield a copy of the plain
          268    text of the copied cipher block. This attack is also known as
          269    copy&paste attack. This attack is mountable against any CBC setup. The
          270    only limitation is, the first block, C[0], can't be replaced with
          271    something meaningful, as C[-1] can't be modified, because it's the IV.
          272 
          273 CMC and EME: Tweakable wide block cipher modes
          274 
          275    CMC is a new chaining mode. It stands for ''CBC-Mask-CBC''. It works by
          276    processing the data in three steps, first CBC, then masking the cipher
          277    text, and then another CBC step, but this time backwards. The last step
          278    introduces a dependency from the last block to the first block. The
          279    authors of the CMC paper provide a prove for the security of this mode,
          280    making a secure 128-bit cipher a secure 4096-bit cipher (sector size).
          281    As in normal CBC, this scheme also takes an IV, but the authors call it
          282    tweak.
          283 
          284    EME is CMC's cousin. EME has also been authored by Haveli and Rogaway
          285    as well been authored for the same purpose. The difference to CMC is,
          286    that EME is parallelizable, that is, all operations of the underlying
          287    cipher can be evaluated in parallel. To introduce an interdependency
          288    among the resulting cipher blocks, the encryption happens in two
          289    stages. Between these stages a mask is computed from all intermediate
          290    blocks and applied to each intermediate block. This step causes an
          291    interdependency among the cipher blocks. After applying the mask,
          292    another encryption step diffuses the mask.
          293 
          294    The interdependency among the resulting blocks allow CMC and EME to be
          295    nonmovable, nonmalleable, to prevent content leaks and in-sector data
          296    modification patterns. The tweaks are encrypted by both cipher modes,
          297    thus both are nonwatermarkable.
          298 
          299    For simplicity, the EME description above omitted the pre- and post-
          300    whitening steps as well as the multiplications in GF(2^128). An
          301    in-depth specification can be found at the [23]Cryptology ePrint
          302    Archive. An applicable draft specification for EME-32-AES can be found
          303    at [24]SISWG. I have written an EME-32-AES test implementation for
          304    Linux 2.6. It's available [25]here. The CMC paper is available from the
          305    [26]Cryptology ePrint Archive as well.
          306 
          307 LRW: A tweakable narrow block cipher mode
          308 
          309    EME as well as CMC are comparatively secure cipher modes, but heavy in
          310    terms of performance. LRW tries to cope with most of security
          311    requirements, and at the same time provide a good performance. LRW is a
          312    narrow block cipher mode, that is, it operates only on one block,
          313    instead of a whole sector. To make a cipher block tied to a location on
          314    disk (to make it unmovable), a logical index is included in the
          315    computation. For LRW you have to provide two keys, one for the cipher
          316    and one for the cipher mode. The second key is multiplied with a
          317    logical index under GF(2^128) and used as pre- and post- whitening for
          318    encryption. With those whitening steps the block is effectively tied to
          319    a logical index. The logical index is usually the absolute position on
          320    disk measured with the block size of the cipher algorithm. The
          321    different choice of the measuring unit is the only different between
          322    the logical index and the public-IV.
          323 
          324    The LRW draft is available from the [27]SISWG mailing list archive.
          325 
          326 Summarising
          327 
          328    The following table shows a comparison between the security properties
          329    of different encryption setups and their computational costs. The
          330    number of cipher calls, XOR operations and additional operations are
          331    stated in terms of encryption blocks, n.
          332    IV mode cipher mode content leaks watermarkable malleable movable
          333    modification detection^5 cipher calls XOR ops additional op.
          334    public-IV CBC Yes Yes Yes Yes Yes n n None
          335    ESSIV CBC Yes No Yes Yes Yes n+1 n None
          336    Plumb-IV1^4 CBC Yes No Yes Yes No 2n-1 2n None
          337    public-IV CMC No No No No No 2n+1 2n+1 1 LW GF ⊗
          338    public-IV EME No No No No No 2n+1 5n 3n-1 LW GF ⊗
          339    public-IV LRW No No No No Yes n 2n n HW GF ⊗
          340 
          341    Legend:
          342      * LW GF ⊗: light-weight Galois field multiplication, that is, a
          343        multiplication with a constant x^2^i, which can be computed in
          344        θ(1).
          345      * HW GF ⊗: heavy-weight Galois field multiplication, that is, a
          346        multiplication with an arbitrary constant, which can be computed in
          347        θ(bits).
          348 
          349    ^4plumb-IV1 uses CBC-MAC instead of hashing, so we can make a good
          350    comparison with other ciphers in terms of cipher/XOR calls.
          351    ^5detectable partial in-sector modification
          352      __________________________________________________________________
          353 
          354    Clemens Fruhwirth, , also author of LUKS and ESSIV, porter of
          355    cryptoloop, aes-i586 for 2.6., twofish-i586, and implementor of
          356    EME-32-AES. This text is an excerpt of my diploma thesis.
          357 
          358 This page has been reviewed by
          359 
          360    Dr. Ernst Molitor
          361    Arno Wagner
          362    James Hughes , "Security in Storage Working Group" chair
          363 
          364    Additional thanks to Pascal Brisset, for pointing out an error in the
          365    Bernoulli estimation in an earlier version of this document, further
          366    Adam J. Richter for pointing out an error in the KB/GB ratio.
          367 
          368    Content and design, Copyright © 2004-2008 Clemens Fruhwirth, unless
          369    stated otherwise
          370    Original design by [28]haran | Additional art by [29]LinuxArt | | Blog
          371    by [30]NanoBlogger
          372 
          373 References
          374 
          375    1. http://clemens.endorphin.org/
          376    2. http://clemens.endorphin.org/credits
          377    3. http://clemens.endorphin.org/aboutme
          378    4. http://clemens.endorphin.org/cryptography
          379    5. http://blog.clemens.endorphin.org/
          380    6. http://clemens.endorphin.org/patches
          381    7. http://clemens.endorphin.org/archive
          382    8. http://clemens.endorphin.org/Cryptoloop_Migration_Guide
          383    9. http://clemens.endorphin.org/LUKS
          384   10. http://clemens.endorphin.org/AFsplitter
          385   11. http://clemens.endorphin.org/lo-tracker
          386   12. http://blog.clemens.endorphin.org/2008/12/luks-on-disk-format-revision-111.html
          387   13. http://blog.clemens.endorphin.org/2008/11/xmonad-gridselect.html
          388   14. http://blog.clemens.endorphin.org/2008/11/workaround-for-bittorrent-traffic.html
          389   15. http://blog.clemens.endorphin.org/2008/09/i-love-lolcat-meme.html
          390   16. http://blog.clemens.endorphin.org/2008/09/counter-steganography-research.html
          391   17. http://clemens.endorphin.org/cryptography
          392   18. http://www.siswg.org/
          393   19. http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
          394   20. http://www.freesoft.org/CIE/Topics/143.htm
          395   21. http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf
          396   22. http://www.tcs.hut.fi/~mjos/doc/wisa2004.pdf
          397   23. http://eprint.iacr.org/2003/147/
          398   24. http://grouper.ieee.org/groups/1619/email/pdf00011.pdf
          399   25. http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/544
          400   26. http://eprint.iacr.org/2003/148/
          401   27. http://grouper.ieee.org/groups/1619/email/msg00160.html
          402   28. http://www.oswd.org/user/profile/id/3013
          403   29. http://www.linuxart.com/
          404   30. http://nanoblogger.sourceforge.net/