Title: How to check your data integrity?
Author: Solène
Date: 17 March 2017
Tags: unix security
Description:
Today, the topic is data degradation, also known as bit rot, bitrotting,
damaged files or whatever you call it. It is what happens when your data
gets corrupted over time, because of a disk fault or some unknown reason.
# What is data degradation? #
I shamelessly paste one line from Wikipedia: "*Data degradation is the
gradual corruption of computer data due to an accumulation of
non-critical failures in a data storage device. The phenomenon is also
known as data decay or data rot.*".
[Data degradation on
Wikipedia](https://en.wikipedia.org/wiki/Data_degradation)
So, how do we know we have encountered bit rot?

    bit rot = (checksum changed) && NOT (modification time changed)

An ordinary update of a file could be mistaken for bit rot, but there is
a difference: an update also changes the modification time.

    update = (checksum changed) && (modification time changed)
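
The two rules above can be written as a small function. This is only a sketch: the name `classify_change` and its four arguments (the stored and the current checksum and modification time) are assumptions made for illustration, not part of any tool described in this article.

```python
def classify_change(old_checksum, new_checksum, old_mtime, new_mtime):
    """Classify a file by comparing its stored and current checksum and mtime."""
    checksum_changed = old_checksum != new_checksum
    mtime_changed = old_mtime != new_mtime

    if checksum_changed and not mtime_changed:
        # Content differs although nobody (officially) touched the file.
        return "bit rot"
    if checksum_changed and mtime_changed:
        # Content differs and the file system recorded a modification.
        return "update"
    # Same content; a changed mtime alone (for example after touch) is harmless.
    return "unchanged"
```

Note that the rule is a heuristic: a program that rewrites a file and then restores its timestamps would be reported as bit rot too, which is exactly what the test method below relies on.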
# How to check if we encounter bitrot? #
There is no way you can prevent bitrot. But there are some ways to
detect it, so you can restore a corrupted file from a backup, or
repair it with the right tool (you can't repair a file with a hammer,
except if it's some kind of HammerFS! :D )
In the following I will describe software I found to check for (or even
repair) bitrot. If you know other tools which are not in this list, I
would be happy to hear about them, please mail me.
In the following examples, I will use this method to generate bitrot
on a file:

    % touch -d "2017-03-16T21:04:00" my_data/some_file_that_will_be_corrupted
    % generate_checksum_database_with_tool
    % echo "a" >> my_data/some_file_that_will_be_corrupted
    % touch -d "2017-03-16T21:04:00" my_data/some_file_that_will_be_corrupted
    % start_tool_for_checking

We generate the checksum database, then we alter a file by appending an
"a" to it and we restore the modification and access times of the file.
Then, we start the tool to check for data corruption.
The first **touch** is only for convenience: we could instead read the
modification time with the **stat** command and pass the same value to
**touch** after modifying the file.
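
The same idea (read the timestamps first, modify the file, then put the timestamps back) can be sketched in Python, which avoids the differences between the **stat** flags of the various systems. The function name `corrupt_keeping_times` and its `payload` parameter are hypothetical names chosen for this example.

```python
import os


def corrupt_keeping_times(path, payload=b"a\n"):
    """Append bytes to a file, then restore its original atime and mtime."""
    # Remember the timestamps before touching the content.
    before = os.stat(path)

    # Same effect as: echo "a" >> path
    with open(path, "ab") as handle:
        handle.write(payload)

    # Restore access and modification times; the ns= form avoids the
    # rounding that float timestamps can introduce.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```

After calling it on a file, the content has changed but the modification time has not, so a checking tool should see the file as rotten rather than updated.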
## bitrot ##
This is a Python script, and it's **very** easy to use. It will scan a
directory and create a database with the checksum of the files and
their modification date.
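
To make the principle concrete, here is a minimal sketch of such a scan. It is not the code of bitrot: the function name `build_database` is made up for this example, the result is kept in a plain dictionary instead of a database file, and SHA-1 is used only because the error output shown in this article reports a SHA1 mismatch.

```python
import hashlib
import os


def build_database(root):
    """Map each file's path (relative to root) to (SHA-1 hex digest, mtime in ns)."""
    database = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            digest = hashlib.sha1()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files do not fill the memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            relative = os.path.relpath(path, root)
            database[relative] = (digest.hexdigest(), os.stat(path).st_mtime_ns)
    return database
```

Building the dictionary once, keeping it, and building it again later gives the two checksums and the two modification times needed to tell an update from bit rot.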
**Initialization usage:**

    % cd /home/my_data/
    % bitrot
    Finished. 199.41 MiB of data read. 0 errors found.
    189 entries in the database, 189 new, 0 updated, 0 renamed, 0 missing.
    Updating bitrot.sha512... done.
    % echo $?
    0

**Verify usage (case OK):**

    % cd /home/my_data/
    % bitrot
    Checking bitrot.db integrity... ok.
    Finished. 199.41 MiB of data read. 0 errors found.
    189 entries in the database, 0 new, 0 updated, 0 renamed, 0 missing.
    % echo $?
    0

Exit status is 0, so our data are not damaged.
**Verify usage (case Error):**

    % cd /home/my_data/
    % bitrot
    Checking bitrot.db integrity... ok.
    error: SHA1 mismatch for ./sometextfile.txt: expected 17b4d7bf382057dc3344ea230a595064b579396f, got db4a8d7e27bb9ad02982c0686cab327b146ba80d. Last good hash checked on 2017-03-16 21:04:39.
    Finished. 199.41 MiB of data read. 1 errors found.
    189 entries in the database, 0 new, 0 updated, 0 renamed, 0 missing.
    error: There were 1 errors found.
    % echo $?
    1

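
Since the exit status is 0 when everything is fine and 1 when an error was found, a scheduled job can act on it without parsing the output. The sketch below assumes only that behaviour, as seen in the sessions above; the function name `data_is_intact` and its `command` parameter are hypothetical.

```python
import subprocess


def data_is_intact(directory, command=("bitrot",)):
    """Run the checker inside `directory`; True means it exited with status 0."""
    # cwd= plays the role of the "cd /home/my_data/" step.
    result = subprocess.run(list(command), cwd=directory)
    return result.returncode == 0
```

A cron script could call `data_is_intact("/home/my_data/")` and send a mail when it returns `False`.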