            1 # The wrong sysadmin
            2 
            3 28 April, 2015
            4 
            5 *NOTE: This was replicated from the [Unix
            6 Diary](http://nixers.net/showthread.php?tid=1539&pid=11836#pid11836) thread at
            7 [http://nixers.net](http://nixers.net)*
            8 
            9 Dear Unix diary,
           10 
           11 today I've been a bad sysadmin.
           12 It just happened. I host my own git repository, and earlier this evening I was working on my crux port tree, when I decided to commit and push my work. But this time, something went wrong and git didn't let me push any reference. Amongst all the messages returned by git, I saw this one:
           13 
           14     remote: fatal: write error: No space left on device
           15 
           16 Fucking shit. I instantly imagine what's happening: my /var partition wasn't correctly sized upon creation. This is where I host my website, gopherhole, git repo, pictures, videos, ... Every 'production' service. And after serving me well for several years, it's now full.
           17 
           18 Luckily, I had set up all my partitions on top of LVM, and left around 200GiB available, just in case things went wrong. And they did.
           19 
           20 So here I am, staring at my red prompt, typing a few commands:
           21 
           22     root ~# df -h
           23     Filesystem                Size      Used Available Use% Mounted on
           24     mdev                      1.0M         0      1.0M   0% /dev
           25     shm                     499.4M         0    499.4M   0% /dev/shm
           26     /dev/dm-1                 4.0G    797.9M      3.2G  20% /
           27     tmpfs                    99.9M    208.0K     99.7M   0% /run
           28     cgroup_root              10.0M         0     10.0M   0% /sys/fs/cgroup
           29     /dev/sda1                96.8M     14.5M     77.3M  16% /boot
           30     /dev/mapper/vg0-var      50.0G     50.0G     20.0K 100% /var
           31     /dev/mapper/vg0-home    100.0G     12.9G     85.2G  13% /home
           32     /dev/mapper/vg0-data    600.0G    346.7G    252.1G  58% /data
           33     tmpfs                   499.4M         0    499.4M   0% /tmp
           34     tmpfs                   499.4M     32.4M    467.0M   6% /home/z3bra/tmp
           35     /dev/mapper/vg0-data    600.0G    346.7G    252.1G  58% /var/lib/mpd/music
           36 
           37     root ~# mount | grep /var
           38     /dev/mapper/vg0-var on /var type xfs (rw,relatime,attr2,inode64,noquota)
           39 
           40     root ~# lvs
           41       LV   VG   Attr       LSize
           42       data vg0  -wi-ao---- 600.00g
           43       home vg0  -wi-ao---- 100.00g
           44       root vg0  -wi-ao----   4.00g
           45       swap vg0  -wi-ao----   1.00g
           46       var  vg0  -wi-ao----  50.00g
           47 
           48     root ~# vgs
           49       VG   #PV #LV #SN Attr   VSize   VFree
           50       vg0    1   5   0 wz--n- 931.41g 176.41g
           51 
           52 Ok, so it's not the first time this has happened, remember? You already grew your /home partition, and it went well! Just do the same with /var! It works without a reboot!
           53 
           54 What were those commands again?
           55 
           56     root ~# lvextend -L +20G vg0/var
           57       Extending logical volume var to 70.00 GiB
           58       63e74d07f000-63e74d2c1000 r-xp 00000000 fd:01 8430401                    /lib/libdevmapper.so.1.02: mlock failed: Out of memory
           59       63e74d2c6000-63e74d4cb000 r-xp 00000000 fd:01 8430404                    /lib/libdevmapper-event.so.1.02: mlock failed: Out of memory
           60       Logical volume var successfully resized
           61       Internal error: Reserved memory (9064448) not enough: used 9084928. Increase activation/reserved_memory?
           62 
           63     root ~# lvs
           64       LV   VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
           65       data vg0  -wi-ao---- 600.00g
           66       home vg0  -wi-ao---- 100.00g
           67       root vg0  -wi-ao----   4.00g
           68       swap vg0  -wi-ao----   1.00g
           69       var  vg0  -wi-ao----  70.00g
           70 
           71     root ~# xfs_growfs -d /var
           72     meta-data=/dev/mapper/vg0-var    isize=256    agcount=4, agsize=3276800 blks
           73              =                       sectsz=4096  attr=2, projid32bit=1
           74              =                       crc=0
           75     data     =                       bsize=4096   blocks=13107200, imaxpct=25
           76              =                       sunit=0      swidth=0 blks
           77     naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
           78     log      =internal               bsize=4096   blocks=6400, version=2
           79              =                       sectsz=4096  sunit=1 blks, lazy-count=1
           80     realtime =none                   extsz=4096   blocks=0, rtextents=0
           81     data blocks changed from 13107200 to 18350080
           82 
           83     root ~# df -h
           84     Filesystem                Size      Used Available Use% Mounted on
           85     mdev                      1.0M         0      1.0M   0% /dev
           86     shm                     499.4M         0    499.4M   0% /dev/shm
           87     /dev/dm-1                 4.0G    797.9M      3.2G  20% /
           88     tmpfs                    99.9M    208.0K     99.7M   0% /run
           89     cgroup_root              10.0M         0     10.0M   0% /sys/fs/cgroup
           90     /dev/sda1                96.8M     14.5M     77.3M  16% /boot
           91     /dev/mapper/vg0-var      70.0G     50.0G     20.0G  71% /var
           92     /dev/mapper/vg0-home    100.0G     12.9G     85.2G  13% /home
           93     /dev/mapper/vg0-data    600.0G    346.7G    252.1G  58% /data
           94     tmpfs                   499.4M         0    499.4M   0% /tmp
           95     tmpfs                   499.4M     32.4M    467.0M   6% /home/z3bra/tmp
           96     /dev/mapper/vg0-data    600.0G    346.7G    252.1G  58% /var/lib/mpd/music
           97 
           98 Phew... I'm safe now! So what the hell was going on? I decided to investigate a bit further, to see what I should watch for next time.
           99 That's how I realised that I had made a **HUGE** mistake...
          100 
          101     root ~# cd /var/
          102     root var# du -sh *
          103     48.5G   backup
          104     156.7M  cache
          105     0       db
          106     0       empty
          107     228.8M  git
          108     5.7M    gopher
          109     4.5G    lib
          110     0       local
          111     0       lock
          112     7.9M    log
          113     0       mail
          114     0       run
          115     40.0K   spool
          116     0       tmp
          117     1.1G    www
          118 
          119     root var# cd backup/
          120 
          121     root backup# du -sh *
          122     12.0K   bin
          123     20.0K   etc
          124     48.5G   out
          125     20.0K   usr
          126     84.0K   var
          127 
          128     root backup# mountpoint out
          129     out is not a mountpoint
          130 
          131     root backup# cd out/
          132 
          133     root out# ll
          134     total 50841516
          135     drwxr-sr-x    2 backup   users       4.0K Apr 28 02:11 ./
          136     drwxr-sr-x    8 backup   users       4.0K Feb  2 20:24 ../
          137     -rw-r--r--    1 backup   users       5.3G Apr 25 07:43 data
          138     -rw-r--r--    1 backup   users          0 Apr 25 07:43 data.0.BAK
          139     -rw-r--r--    1 backup   users      12.0G Apr 26 04:37 homedir
          140     -rw-r--r--    1 backup   users      12.0G Apr 22 04:43 homedir.0.BAK
          141     -rw-r--r--    1 backup   users      12.0G Apr 25 05:00 homedir.1.BAK
          142     -rw-r--r--    1 backup   users      44.0K Apr 26 04:42 homedir.2.BAK
          143     -rw-r--r--    1 backup   users       1.2G Apr 28 02:11 production
          144     -rw-r--r--    1 backup   users       1.2G Apr 21 02:10 production.0.BAK
          145     -rw-r--r--    1 backup   users       1.2G Apr 22 02:11 production.1.BAK
          146     -rw-r--r--    1 backup   users       1.2G Apr 23 02:11 production.2.BAK
          147     -rw-r--r--    1 backup   users       1.2G Apr 24 02:11 production.3.BAK
          148     -rw-r--r--    1 backup   users       1.2G Apr 25 02:12 production.4.BAK
          149     -rw-r--r--    1 backup   users          0 Apr 26 02:11 production.5.BAK
          150     -rw-r--r--    1 backup   users       5.3M Apr 27 02:12 production.6.BAK
          151     -rw-r--r--    1 backup   users          0 Apr 28 02:11 production.7.BAK
          152 
          153 My backup system doesn't check whether it saves to a mountpoint or not. Shit.
          154 For a whole week, all my backups were created on my /var partition instead of the backup USB drive meant for this purpose. And they filled it up pretty quickly.
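
A guard along these lines would have caught it. This is only a sketch, not what my backup system does: the function name `require_mountpoint` is made up for illustration, and it assumes the same `mountpoint` utility used in the session above is available.

```shell
#!/bin/sh
# Hypothetical guard: refuse to write backups unless the target
# directory is really a mountpoint.
require_mountpoint() {
    # mountpoint -q prints nothing and exits 0 only for a mountpoint
    if mountpoint -q "$1"; then
        return 0
    fi
    echo "refusing to back up: $1 is not a mountpoint" >&2
    return 1
}

# Possible use at the top of a backup script:
# require_mountpoint /var/backup/out || exit 1
```

With the USB drive unplugged, the script would stop right there instead of quietly filling /var.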
          155 
          156 My backup system sends me a mail after each backup, explaining how it went. Whether it saved to a mountpoint or not is written in it. I just stopped checking. Silly me.
          157 
          158 I realise that this issue could have been easily solved by mounting my backup disk elsewhere, then moving the files, and remounting it where it should be. But I didn't. Instead, I grew a partition that didn't need it (the backups filled 48GiB out of the 50GiB allocated to /var), and this partition can't be shrunk anymore, as it's an XFS filesystem.
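
The mount, move, remount sequence described above could look roughly like the sketch below. It is a dry run that only prints the commands. The device `/dev/sdb1` and the temporary mountpoint `/mnt/rescue` are placeholders I made up, and `plan_rescue` is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# Dry run: print the rescue steps instead of executing them.
# Nothing in this function touches a disk.
plan_rescue() {
    dev=$1      # backup disk (placeholder, e.g. /dev/sdb1)
    stray=$2    # directory that received the backups by mistake
    tmp=$3      # temporary mountpoint for the backup disk

    echo "mkdir -p $tmp"
    echo "mount $dev $tmp"      # mount the backup disk elsewhere
    echo "mv $stray/* $tmp/"    # move the stray files onto it
    echo "umount $tmp"
    echo "mount $dev $stray"    # remount where it should be
}

plan_rescue /dev/sdb1 /var/backup/out /mnt/rescue
```

Reading the printed plan before typing anything as root fits lesson 1 below rather well.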
          159 
          160 So today I learnt two things, the hard way:
          161 
          162 1. Don't do anything until you know what's going on
          163 2. Configure system checks and READ THEM
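
As an example of the second point, a disk usage check can be very small. The sketch below is an assumption about what such a check might look like: `usage_pct`, `check_usage` and the 90% limit are invented for illustration, and the parsing relies on the POSIX output format of `df -P`.

```shell
#!/bin/sh
# usage_pct PATH: print the Use% of the filesystem holding PATH,
# as a bare number (df -P keeps each filesystem on a single line).
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_usage PATH LIMIT: warn and return 1 when usage reaches LIMIT.
check_usage() {
    pct=$(usage_pct "$1")
    [ -n "$pct" ] || return 2   # df failed or printed nothing usable
    if [ "$pct" -ge "$2" ]; then
        echo "WARNING: $1 is ${pct}% full (limit $2%)"
        return 1
    fi
    echo "OK: $1 is ${pct}% full"
}

# Possible use from cron, mailing only the warnings:
# check_usage /var 90 || echo "look at /var NOW"
```

A check like this only helps if its output gets read, which is the part I failed at.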
          164 
          165 I hope you'll learn from my mistakes. For now I think I'll just print this over my desktop, as a reminder:
          166 
          167     root ~# df -h /var/
          168     Filesystem                Size      Used Available Use% Mounted on
          169     /dev/mapper/vg0-var      70.0G      1.5G     68.5G   2% /var