# The wrong sysadmin

28 April, 2015

*NOTE: This was replicated from the [Unix
Diary](http://nixers.net/showthread.php?tid=1539&pid=11836#pid11836) thread at
[http://nixers.net](http://nixers.net)*

Dear Unix diary,

today I've been a bad sysadmin.
It just happened. I host my own git repository, and earlier this evening I was working on my crux port tree when I decided to commit and push my work. But this time something went wrong, and git refused to let me push any reference. Amongst all the messages returned by git, I saw this one:

    remote: fatal: write error: No space left on device

Fucking shit. I instantly imagined what was happening: my /var partition wasn't correctly sized upon creation. This is where I host my website, gopherhole, git repo, pictures, videos... every 'production' service. And after serving me well for several years, it's now full.

Fortunately, I had set up all my partitions on top of LVM, and left around 200GiB available, just in case things went wrong. And they did.

So here I am, staring at my red prompt, typing a few commands:

    root ~# df -h
    Filesystem              Size      Used  Available  Use%  Mounted on
    mdev                    1.0M         0       1.0M    0%  /dev
    shm                   499.4M         0     499.4M    0%  /dev/shm
    /dev/dm-1               4.0G    797.9M       3.2G   20%  /
    tmpfs                  99.9M    208.0K      99.7M    0%  /run
    cgroup_root            10.0M         0      10.0M    0%  /sys/fs/cgroup
    /dev/sda1              96.8M     14.5M      77.3M   16%  /boot
    /dev/mapper/vg0-var    50.0G     50.0G      20.0K  100%  /var
    /dev/mapper/vg0-home  100.0G     12.9G      85.2G   13%  /home
    /dev/mapper/vg0-data  600.0G    346.7G     252.1G   58%  /data
    tmpfs                 499.4M         0     499.4M    0%  /tmp
    tmpfs                 499.4M     32.4M     467.0M    6%  /home/z3bra/tmp
    /dev/mapper/vg0-data  600.0G    346.7G     252.1G   58%  /var/lib/mpd/music

    root ~# mount | grep /var
    /dev/mapper/vg0-var on /var type xfs (rw,relatime,attr2,inode64,noquota)

    root ~# lvs
    LV    VG   Attr        LSize
    data  vg0  -wi-ao----  600.00g
    home  vg0  -wi-ao----  100.00g
    root  vg0  -wi-ao----    4.00g
    swap  vg0  -wi-ao----    1.00g
    var   vg0  -wi-ao----   50.00g

    root ~# vgs
    VG   #PV  #LV  #SN  Attr    VSize    VFree
    vg0    1    5    0  wz--n-  931.41g  176.41g

Ok, so this isn't the first time this has happened, remember? You already grew your /home partition, and it went well! Just do the same with /var! It even works without a reboot!

What were those commands again?

    root ~# lvextend -L +20G vg0/var
    Extending logical volume var to 70.00 GiB
    63e74d07f000-63e74d2c1000 r-xp 00000000 fd:01 8430401 /lib/libdevmapper.so.1.02: mlock failed: Out of memory
    63e74d2c6000-63e74d4cb000 r-xp 00000000 fd:01 8430404 /lib/libdevmapper-event.so.1.02: mlock failed: Out of memory
    Logical volume var successfully resized
    Internal error: Reserved memory (9064448) not enough: used 9084928. Increase activation/reserved_memory?

    root ~# lvs
    LV    VG   Attr        LSize    Pool  Origin  Data%  Meta%  Move  Log  Cpy%Sync  Convert
    data  vg0  -wi-ao----  600.00g
    home  vg0  -wi-ao----  100.00g
    root  vg0  -wi-ao----    4.00g
    swap  vg0  -wi-ao----    1.00g
    var   vg0  -wi-ao----   70.00g

    root ~# xfs_growfs -d /var
    meta-data=/dev/mapper/vg0-var  isize=256    agcount=4, agsize=3276800 blks
             =                     sectsz=4096  attr=2, projid32bit=1
             =                     crc=0
    data     =                     bsize=4096   blocks=13107200, imaxpct=25
             =                     sunit=0      swidth=0 blks
    naming   =version 2            bsize=4096   ascii-ci=0 ftype=0
    log      =internal             bsize=4096   blocks=6400, version=2
             =                     sectsz=4096  sunit=1 blks, lazy-count=1
    realtime =none                 extsz=4096   blocks=0, rtextents=0
    data blocks changed from 13107200 to 18350080

    root ~# df -h
    Filesystem              Size      Used  Available  Use%  Mounted on
    mdev                    1.0M         0       1.0M    0%  /dev
    shm                   499.4M         0     499.4M    0%  /dev/shm
    /dev/dm-1               4.0G    797.9M       3.2G   20%  /
    tmpfs                  99.9M    208.0K      99.7M    0%  /run
    cgroup_root            10.0M         0      10.0M    0%  /sys/fs/cgroup
    /dev/sda1              96.8M     14.5M      77.3M   16%  /boot
    /dev/mapper/vg0-var    70.0G     50.0G      20.0G   71%  /var
    /dev/mapper/vg0-home  100.0G     12.9G      85.2G   13%  /home
    /dev/mapper/vg0-data  600.0G    346.7G     252.1G   58%  /data
    tmpfs                 499.4M         0     499.4M    0%  /tmp
    tmpfs                 499.4M     32.4M     467.0M    6%  /home/z3bra/tmp
    /dev/mapper/vg0-data  600.0G    346.7G     252.1G   58%  /var/lib/mpd/music

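For the record, lvextend can apparently do both steps at once: the -r (--resizefs) flag grows the filesystem right after the logical volume, through fsadm. Something to try next time:

    root ~# lvextend -r -L +20G vg0/var
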
Phew... I'm safe now! So what the hell was going on? I decided to investigate a bit further, to see what I should watch for next time.
That's how I realised that I had made a **HUGE** mistake...

    root ~# cd /var/
    root var# du -sh *
    48.5G   backup
    156.7M  cache
    0       db
    0       empty
    228.8M  git
    5.7M    gopher
    4.5G    lib
    0       local
    0       lock
    7.9M    log
    0       mail
    0       run
    40.0K   spool
    0       tmp
    1.1G    www

    root var# cd backup/

    root backup# du -sh *
    12.0K   bin
    20.0K   etc
    48.5G   out
    20.0K   usr
    84.0K   var

    root backup# mountpoint out
    out is not a mountpoint

    root backup# cd out/

    root out# ll
    total 50841516
    drwxr-sr-x 2 backup users  4.0K Apr 28 02:11 ./
    drwxr-sr-x 8 backup users  4.0K Feb  2 20:24 ../
    -rw-r--r-- 1 backup users  5.3G Apr 25 07:43 data
    -rw-r--r-- 1 backup users     0 Apr 25 07:43 data.0.BAK
    -rw-r--r-- 1 backup users 12.0G Apr 26 04:37 homedir
    -rw-r--r-- 1 backup users 12.0G Apr 22 04:43 homedir.0.BAK
    -rw-r--r-- 1 backup users 12.0G Apr 25 05:00 homedir.1.BAK
    -rw-r--r-- 1 backup users 44.0K Apr 26 04:42 homedir.2.BAK
    -rw-r--r-- 1 backup users  1.2G Apr 28 02:11 production
    -rw-r--r-- 1 backup users  1.2G Apr 21 02:10 production.0.BAK
    -rw-r--r-- 1 backup users  1.2G Apr 22 02:11 production.1.BAK
    -rw-r--r-- 1 backup users  1.2G Apr 23 02:11 production.2.BAK
    -rw-r--r-- 1 backup users  1.2G Apr 24 02:11 production.3.BAK
    -rw-r--r-- 1 backup users  1.2G Apr 25 02:12 production.4.BAK
    -rw-r--r-- 1 backup users     0 Apr 26 02:11 production.5.BAK
    -rw-r--r-- 1 backup users  5.3M Apr 27 02:12 production.6.BAK
    -rw-r--r-- 1 backup users     0 Apr 28 02:11 production.7.BAK

My backup system doesn't check whether it saves to a mountpoint or not. Shit.
For a whole week, all my backups were created on my /var partition instead of the backup USB drive meant for this purpose. And they filled it up pretty quickly.

My backup system sends me a mail after each backup, explaining how it went. Whether or not it's saving to a mountpoint is written in there. I just stopped reading them. Silly me.
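
Something as simple as this at the top of the backup script would have saved my week. A rough sketch; the path is obviously made up, and the mail goes to local root:

    #!/bin/sh
    # refuse to run if the backup drive isn't mounted where it should be
    dest=/var/backup/out

    if ! mountpoint -q "$dest"; then
        echo "$dest is not a mountpoint, backup aborted" \
            | mail -s 'backup FAILED' root
        exit 1
    fi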

I realise that this issue could have been easily solved by mounting my backup disk elsewhere, moving the stray files over, then remounting it where it should be. But I didn't. Instead, I grew a partition that didn't need to grow (the backups filled 48GiB out of the 50GiB allocated to /var), and this partition can't be shrunk anymore, as it's an XFS filesystem.
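
Something like this would have done it, assuming the USB drive shows up as /dev/sdb1:

    root ~# mount /dev/sdb1 /mnt
    root ~# mv /var/backup/out/* /mnt/
    root ~# umount /mnt
    root ~# mount /dev/sdb1 /var/backup/out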

So today I learnt two things, the hard way:

1. Don't do anything until you know what's going on
2. Configure system checks, and READ THEM

I hope you'll learn from my mistakes. For now, I think I'll just keep this printed on my desktop, as a reminder:

    root ~# df -h /var/
    Filesystem           Size   Used  Available  Use%  Mounted on
    /dev/mapper/vg0-var  70.0G  1.5G      68.5G    2%  /var