URI: 
       Title: How to check your data integrity?
       Author: Solène
       Date: 17 March 2017
       Tags: unix security
       Description: 
       
       Today, the topic is data degradation, bit rot, bitrotting, damaged
       files or whatever you call it. It's when your data get corrupted over
       time, due to a disk fault or some unknown reason.
       
       # What is data degradation? #
       
       I shamelessly paste one line from Wikipedia: "*Data degradation is the
       gradual corruption of computer data due to an accumulation of
       non-critical failures in a data storage device. The phenomenon is also
       known as data decay or data rot.*"
       
       [Data degradation on
       Wikipedia](https://en.wikipedia.org/wiki/Data_degradation)
       
       So, how do we know we have encountered bit rot?
       
           bit rot = (checksum changed) && NOT (modification time changed)
       
       While updating a file could be mistaken for bit rot, there is a
       difference: an update also changes the modification time.
       
           update = (checksum changed) && (modification time changed)
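The two rules above fit in a tiny Python function. This is only an illustration: `FileState` and `classify` are hypothetical names, and treating an unchanged checksum as "unchanged" whatever the mtime is my own assumption, since the rules only cover the case where the checksum changed.

```python
from typing import NamedTuple


class FileState(NamedTuple):
    """What a checker remembers about a file: its checksum and mtime."""
    checksum: str
    mtime: float


def classify(old: FileState, new: FileState) -> str:
    """Apply the bit rot / update rules to two snapshots of the same file."""
    checksum_changed = old.checksum != new.checksum
    mtime_changed = old.mtime != new.mtime
    if checksum_changed and not mtime_changed:
        return "bitrot"     # content changed while the mtime stayed the same
    if checksum_changed and mtime_changed:
        return "update"     # content changed through a normal write
    return "unchanged"      # same content (assumption: mtime alone is ignored)


before = FileState(checksum="aaaa", mtime=100.0)
print(classify(before, FileState(checksum="bbbb", mtime=100.0)))  # bitrot
print(classify(before, FileState(checksum="bbbb", mtime=200.0)))  # update
```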
       
       # How to check for bitrot? #
       
       There is no way to prevent bitrot, but there are ways to detect it, so
       you can restore a corrupted file from a backup or repair it with the
       right tool (you can't repair a file with a hammer, unless it's some
       kind of HammerFS! :D)
       
       In the following, I describe the software I found to check (or even
       repair) bitrot. If you know other tools that are not in this list, I
       would be happy to hear about them; please mail me.
       
       In the following examples, I will use this method to generate bitrot
       on a file:
       
           % touch -d "2017-03-16T21:04:00" my_data/some_file_that_will_be_corrupted
           % generate_checksum_database_with_tool
           % echo "a" >> my_data/some_file_that_will_be_corrupted
           % touch -d "2017-03-16T21:04:00" my_data/some_file_that_will_be_corrupted
           % start_tool_for_checking
       
       We generate the checksum database, then alter the file by appending a
       line containing "a" and restore its modification and access times.
       Then we start the tool to check for data corruption.
       
       The first **touch** is only for convenience: we could instead read the
       original modification time with the **stat** command and pass the same
       value to **touch** after modifying the file.
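The stat-based variant of this trick can be sketched in Python with `os.stat` and `os.utime`: remember the timestamps, append the same bytes `echo "a"` would, then put the timestamps back. `simulate_bitrot` is a hypothetical helper meant only for generating test data, never for real files.

```python
import os
import tempfile


def simulate_bitrot(path: str) -> None:
    """Change a file's content while keeping its access and modification times."""
    st = os.stat(path)                  # like reading the times with stat(1)
    with open(path, "ab") as f:
        f.write(b"a\n")                 # same bytes as: echo "a" >> path
    # Like the second touch: restore the original atime and mtime (nanoseconds)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))


# Usage on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "some_file_that_will_be_corrupted")
    with open(target, "wb") as f:
        f.write(b"original content\n")
    before = os.stat(target)
    simulate_bitrot(target)
    after = os.stat(target)

print(after.st_mtime_ns == before.st_mtime_ns)  # True: mtime looks untouched
print(after.st_size - before.st_size)           # 2: content did change
```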
       
       ## bitrot ##
       
       This is a Python script, and it's **very** easy to use. It scans a
       directory and creates a database with the checksum of each file and
       its modification date.
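To make the idea concrete, here is a minimal Python sketch of what such a checker does: scan a directory, store each file's SHA-1 (the hash bitrot reports in its error message) and modification time, then flag files whose checksum changed while their mtime did not. This is not bitrot's actual code: `scan` and `find_bitrot` are hypothetical names, the in-memory dictionary stands in for its database file, and renames or missing files are not handled.

```python
import hashlib
import os
import tempfile
from typing import Dict, List, Tuple

# Hypothetical database layout: relative path -> (SHA-1 hex digest, mtime in ns)
Database = Dict[str, Tuple[str, int]]


def sha1_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large files don't have to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root: str) -> Database:
    """Record the checksum and modification time of every file under root."""
    db: Database = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            db[rel] = (sha1_of(full), os.stat(full).st_mtime_ns)
    return db


def find_bitrot(root: str, db: Database) -> List[str]:
    """Return files whose checksum changed while their mtime did not."""
    current = scan(root)
    suspects = []
    for rel, (old_sum, old_mtime) in db.items():
        if rel not in current:
            continue  # deleted or renamed files are skipped in this sketch
        new_sum, new_mtime = current[rel]
        if new_sum != old_sum and new_mtime == old_mtime:
            suspects.append(rel)
    return sorted(suspects)


# Usage: build a database, corrupt a.txt without changing its mtime, re-check
with tempfile.TemporaryDirectory() as tmp:
    for name, data in (("a.txt", b"hello\n"), ("b.txt", b"world\n")):
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    db = scan(tmp)

    victim = os.path.join(tmp, "a.txt")
    st = os.stat(victim)
    with open(victim, "ab") as f:
        f.write(b"a\n")
    os.utime(victim, ns=(st.st_atime_ns, st.st_mtime_ns))

    errors = find_bitrot(tmp, db)

exit_status = 1 if errors else 0  # like bitrot: non-zero when errors are found
print(errors, exit_status)  # ['a.txt'] 1
```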
       
       **Initialization usage:**
       
           % cd /home/my_data/
           % bitrot
           Finished. 199.41 MiB of data read. 0 errors found.
           189 entries in the database, 189 new, 0 updated, 0 renamed, 0 missing.
           Updating bitrot.sha512... done.
           % echo $?
           0
       
       **Verify usage (case OK):**
       
           % cd /home/my_data/
           % bitrot
           Checking bitrot.db integrity... ok.
           Finished. 199.41 MiB of data read. 0 errors found.
           189 entries in the database, 0 new, 0 updated, 0 renamed, 0 missing.
           % echo $?
           0
       
       The exit status is 0, so our data are not damaged.
       
       **Verify usage (case Error):**
       
       
           % cd /home/my_data/
           % bitrot
           Checking bitrot.db integrity... ok.
           error: SHA1 mismatch for ./sometextfile.txt: expected 17b4d7bf382057dc3344ea230a595064b579396f, got db4a8d7e27bb9ad02982c0686cab327b146ba80d. Last good hash checked on 2017-03-16 21:04:39.
           Finished. 199.41 MiB of data read. 1 errors found.
           189 entries in the database, 0 new, 0 updated, 0 renamed, 0 missing.
           error: There were 1 errors found.
           % echo $?
           1
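Since bitrot reports its verdict through the exit status (0 when nothing is found, non-zero on errors), it is easy to wrap in a periodic job such as a cron task. A minimal Python sketch, assuming the `bitrot` command is installed and on PATH; `data_is_intact` is a hypothetical helper, and the checker command is a parameter so another tool can be substituted.

```python
import subprocess
from typing import Sequence


def data_is_intact(directory: str, command: Sequence[str] = ("bitrot",)) -> bool:
    """Run the checker from inside `directory` and trust its exit status.

    In the sessions above, bitrot exits with 0 when no errors are found and
    with 1 when it finds some, so any non-zero status means "corruption".
    Raises FileNotFoundError if the command is not installed.
    """
    result = subprocess.run(list(command), cwd=directory)
    return result.returncode == 0


# Example (same directory as in the sessions above):
#   if not data_is_intact("/home/my_data/"):
#       print("bitrot found errors: restore the damaged files from backup")
```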