Linux Blog

Remove lines that are in another file

Filed under: Shell Script Sundays — TheLinuxBlog.com at 6:52 pm on Sunday, February 7, 2010

I had an issue this week where I needed to remove lines from one file if they existed in another file. Looking back, it was frustrating, as such a task should be simple.

I tried all sorts of things: differencing the two files, then using grep to grab the lines I wanted. Whatever I tried just did not produce the expected results. Thanks to a buddy, I found the solution, which ended up being to sort the two files before using diff.

Example:
Assume two files exist, File_1.txt and File_2.txt. File_1.txt contains the lines a, b, c and d. File_2.txt contains b and d. If we want to remove b and d from File_1.txt because they exist in File_2.txt, we could use something like this:

owen@linuxblog:~$ cat File_1.txt
a
b
c
d
owen@linuxblog:~$ cat File_2.txt
b
d
 
owen@linuxblog:~$ diff File_1.txt File_2.txt | grep \< | cut -d \  -f 2
a
c

That’s all fine and dandy until File_2.txt contains the same lines in a different order. diff compares the files in sequence, so running the same command produces different results. See below:

owen@linuxblog:~$ cat File_2.txt
d
b
 
owen@linuxblog:~$ diff File_1.txt File_2.txt | grep \< | cut -d \  -f 2
a
b
c

The solution, as noted above, is to sort both files beforehand and then difference them:

owen@linuxblog:~$ sort File_1.txt > File_1-sorted; sort File_2.txt > File_2-sorted
owen@linuxblog:~$ diff File_1-sorted File_2-sorted | grep \< | cut -d \  -f 2
a
c
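
One thing to watch with the pipeline above: cut -d \  -f 2 keeps only the second space-separated field, so any line that contains spaces gets truncated, and grep \< matches a "<" anywhere in the line. Here is a rough sketch of a variant that uses sed to strip diff's "< " marker instead. The scratch directory and the sample lines are made up for illustration and are not from the original example.

```shell
# Hypothetical setup: sample files whose lines contain spaces.
workdir=$(mktemp -d)
printf '%s\n' 'a one' 'b two' 'c three' > "$workdir/File_1.txt"
printf '%s\n' 'b two' > "$workdir/File_2.txt"

# Sort both files first, as in the example above.
sort "$workdir/File_1.txt" > "$workdir/File_1-sorted"
sort "$workdir/File_2.txt" > "$workdir/File_2-sorted"

# sed -n with the p flag prints only lines where the substitution succeeded,
# i.e. lines that begin with diff's "< " marker, with that marker removed.
robust_result=$(diff "$workdir/File_1-sorted" "$workdir/File_2-sorted" | sed -n 's/^< //p')
printf '%s\n' "$robust_result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

With these sample files the lines left over are "a one" and "c three", printed whole rather than cut at the first space.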

Obviously the example has been simplified. When dealing with thousands of lines, the sort could take a while. With that said, I’m sure there are more efficient ways to achieve the same results, and I wouldn’t doubt there being a command better suited to do this. Have at it in the comments.
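
For what it's worth, two standard tools look like good candidates for this job: grep with a pattern file, and comm. The sketch below is not the method used above, just a possible alternative. It recreates the example files in a scratch directory (the workdir variable and the result variable names are made up for illustration).

```shell
# Hypothetical setup: recreate the example files, with File_2 out of order.
workdir=$(mktemp -d)
printf '%s\n' a b c d > "$workdir/File_1.txt"
printf '%s\n' d b > "$workdir/File_2.txt"

# grep: -f reads patterns from File_2, -F treats them as fixed strings,
# -x requires a whole-line match, -v inverts the match.
# No sorting is needed and the original order of File_1 is kept.
grep_result=$(grep -vxFf "$workdir/File_2.txt" "$workdir/File_1.txt")

# comm: needs both inputs sorted. -23 suppresses column 2 (lines only in
# the second file) and column 3 (lines in both), leaving lines only in the first.
sort "$workdir/File_1.txt" > "$workdir/File_1-sorted"
sort "$workdir/File_2.txt" > "$workdir/File_2-sorted"
comm_result=$(comm -23 "$workdir/File_1-sorted" "$workdir/File_2-sorted")

printf 'grep:\n%s\ncomm:\n%s\n' "$grep_result" "$comm_result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Both should leave a and c for the example files. The grep version skips the sort entirely, which may matter on large files. The comm version still sorts, but it does not need diff's output to be parsed afterwards.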