Can you suggest improvements to this process of maintaining a large hosts file?
When I find an objectionable site (advertisement, redirect, flash content, etc.) I simply add the domain to my hosts file and never have to see that content again.
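For context, each blocked site is just a line mapping the domain to the loopback address; the domain below (ads.example.com) is a made-up stand-in, and the example appends to a scratch file rather than the live /etc/hosts:

```shell
# Hypothetical example: ads.example.com stands in for any unwanted domain.
# Pointing it at 127.0.0.1 makes lookups for it resolve to nowhere useful.
# Shown against a scratch file rather than the live /etc/hosts.
echo '127.0.0.1 ads.example.com' >> /tmp/hosts.demo
cat /tmp/hosts.demo
```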
But every few months, a few more unwanted parasites slip through, so I pick up a new (huge) hosts file from, say:
What I do is save the downloaded hosts.txt file to /tmp/hosts.txt and then strip out all but the redirect lines (squeeze double spaces, remove comments, and remove the localhost line).
My first problem is getting the syntax right for the removal of the localhost line (I actually have to delete that one line manually, because some of the valid lines I want to keep also contain localhost in the domain name).
grep -v \# hosts.txt | sed -e 's/  / /g' | grep -v 127.0.0.1 localhost\$ | sort -u >> /etc/hosts
NOTE: The sed is removing redundant spaces so it's consistent with my existing hosts file.
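For illustration, one way the exact match might be written is as a single anchored pattern passed to grep -E, so only the bare localhost redirect is dropped; a sketch, using a made-up sample list in place of the real download and writing to a scratch file rather than straight into /etc/hosts:

```shell
# Sample input standing in for the downloaded list (hypothetical domains).
cat > /tmp/hosts.txt <<'EOF'
# ad-server list
127.0.0.1  localhost
127.0.0.1  ads.example.com
127.0.0.1  localhost.tracker.example.net
EOF

# Strip comment lines, squeeze runs of spaces, drop only the line whose
# hostname field is exactly "localhost" (the anchors ^ and $ keep domains
# that merely contain "localhost" safe), then de-duplicate.
grep -v '#' /tmp/hosts.txt \
  | tr -s ' ' \
  | grep -vE '^127\.0\.0\.1 localhost$' \
  | sort -u > /tmp/hosts.clean
```

Note that the domain localhost.tracker.example.net survives the filter, because the trailing $ requires the line to end immediately after "localhost".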
Then I try to cull out duplicates, using:
sort -u /etc/hosts -o /etc/hosts
Unfortunately, that sort moves the localhost and the other top-of-file lines down into the list, and I have to move them back to the top manually to keep a semblance of the original hosts-file order.
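One possible way around the reshuffling is to sort only the portion below a marker comment (such as the "# begin parasitic blocks" line), leaving the header untouched; a sketch, run against a scratch copy with made-up entries:

```shell
# Scratch copy standing in for /etc/hosts, with the marker comment
# separating the hand-maintained header from the block entries.
cat > /tmp/hosts.work <<'EOF'
127.0.0.1 machine localhost.localdomain localhost
::1 localhost ip6-localhost ip6-loopback
# begin parasitic blocks
127.0.0.1 z.example.com
127.0.0.1 a.example.com
127.0.0.1 a.example.com
EOF

marker='# begin parasitic blocks'
# The header, up to and including the marker, keeps its original order...
sed -n "1,/^${marker}\$/p" /tmp/hosts.work > /tmp/hosts.head
# ...while only the entries below it get sorted and de-duplicated.
sed "1,/^${marker}\$/d" /tmp/hosts.work | sort -u > /tmp/hosts.tail
cat /tmp/hosts.head /tmp/hosts.tail > /tmp/hosts.work
```

This keeps the localhost lines pinned at the top no matter how often the block section is re-sorted.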
This process works - but it could use improvement. For reference, the top of my hosts file looks like this:
127.0.0.1 machine localhost.localdomain localhost
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
# begin parasitic blocks
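Putting the pieces together, the whole refresh could be scripted; a sketch, where the sample files and domains are stand-ins and everything runs against scratch copies rather than the live /etc/hosts:

```shell
#!/bin/sh
# Sketch of the whole refresh in one pass. All paths and domains are
# stand-ins; it runs against scratch files, not the live /etc/hosts.
set -e

# Freshly downloaded blocklist (sample).
cat > /tmp/hosts.txt <<'EOF'
# ad-server list
127.0.0.1 localhost
127.0.0.1 ads.example.com
127.0.0.1 tracker.example.net
EOF

# Scratch copy of the maintained hosts file (sample).
cat > /tmp/hosts.work <<'EOF'
127.0.0.1 machine localhost.localdomain localhost
::1 localhost ip6-localhost ip6-loopback
# begin parasitic blocks
127.0.0.1 ads.example.com
EOF

marker='# begin parasitic blocks'

# 1. Clean the download: drop comments, squeeze spaces, drop the bare
#    localhost redirect (anchored, so real domains survive).
grep -v '#' /tmp/hosts.txt \
  | tr -s ' ' \
  | grep -vE '^127\.0\.0\.1 localhost$' > /tmp/hosts.new

# 2. Keep the header, up to and including the marker, untouched.
sed -n "1,/^${marker}\$/p" /tmp/hosts.work > /tmp/hosts.head

# 3. Merge old and new block entries, sorted and de-duplicated.
sed "1,/^${marker}\$/d" /tmp/hosts.work \
  | cat - /tmp/hosts.new | sort -u > /tmp/hosts.tail

cat /tmp/hosts.head /tmp/hosts.tail > /tmp/hosts.work
```

An entry already present (ads.example.com above) ends up in the file exactly once, and the header order survives the merge.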