Linux Blog

Remove lines that are in another file

Filed under: Shell Script Sundays — TheLinuxBlog.com at 6:52 pm on Sunday, February 7, 2010

I had an issue this week where I needed to remove lines from one file if they existed in another file. Looking back, it was frustrating because such a task should be simple.

I tried all sorts of things, such as differencing the two files and using grep to grab the lines I wanted. Whatever I tried just did not produce the expected results. Thanks to a buddy I found the solution, which ended up being to sort the two files before using diff.

Example:
Assume two files exist, File_1 and File_2. File_1 contains the lines a, b, c and d. File_2 contains b and d. If we want to remove b and d from File_1 because they exist in File_2, we could use something like this:

owen@linuxblog:~$ cat File_1.txt
a
b
c
d
owen@linuxblog:~$ cat File_2.txt
b
d
 
owen@linuxblog:~$ diff File_1.txt File_2.txt | grep \< | cut -d \  -f 2
a
c

That’s all fine and dandy until File_2.txt contains the same lines in a different order. diff compares the two files line by line in sequence, so running the same command on reordered input produces different results. See below:

owen@linuxblog:~$ cat File_2.txt
d
b
 
owen@linuxblog:~$ diff File_1.txt File_2.txt | grep \< | cut -d \  -f 2
a
b
c

The solution, as noted above, is to sort both files beforehand and then difference them:

owen@linuxblog:~$ sort File_1.txt > File_1-sorted; sort File_2.txt > File_2-sorted
owen@linuxblog:~$ diff File_1-sorted File_2-sorted | grep \< | cut -d \  -f 2
a
c
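
The same steps can be wrapped up in a small function. This is only a sketch, assuming mktemp is available; the name lines_only_in_first and the sample files are made up for illustration. It strips the "< " marker with sed instead of cut, so lines that contain spaces should come through whole.

```shell
# lines_only_in_first is a hypothetical helper name, not from the original post.
# Usage: lines_only_in_first FIRST SECOND
lines_only_in_first() {
    sorted_dir=$(mktemp -d) || return 1
    # '>' overwrites the sorted copies, so nothing is left over from earlier runs
    sort "$1" > "$sorted_dir/first"
    sort "$2" > "$sorted_dir/second"
    # diff marks lines found only in the first file with "< "; print those lines
    # with the marker removed
    diff "$sorted_dir/first" "$sorted_dir/second" | sed -n 's/^< //p'
    rm -r "$sorted_dir"
}

# Sample data mirroring the example above (File_2.txt deliberately out of order)
demo=$(mktemp -d)
printf '%s\n' a b c d > "$demo/File_1.txt"
printf '%s\n' d b > "$demo/File_2.txt"

result=$(lines_only_in_first "$demo/File_1.txt" "$demo/File_2.txt")
printf '%s\n' "$result"   # prints a and c, one per line

rm -r "$demo"
```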

Obviously the example has been simplified. When dealing with thousands of lines, the sort could take a while. With that said, I’m sure there are more efficient ways to achieve the same results, and I wouldn’t doubt there being a command better suited to do this. Have at it in the comments.
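
For what it’s worth, two commands that seem better suited are grep and comm. The sketch below is not the method used above, just two alternatives run against made-up sample files. grep -vxFf reads the lines of File_2.txt as fixed, whole-line patterns and prints every line of File_1.txt that matches none of them, so no sorting is needed and the original order is kept. comm -23 prints the lines found only in the first file, but it expects both inputs to be sorted already.

```shell
# Sample data mirroring the example above (File_2.txt deliberately out of order)
demo=$(mktemp -d)
printf '%s\n' a b c d > "$demo/File_1.txt"
printf '%s\n' d b > "$demo/File_2.txt"

# Option 1: grep
#   -f FILE  read patterns from FILE, one per line
#   -F       treat patterns as fixed strings, not regular expressions
#   -x       only match whole lines
#   -v       invert the match, i.e. print lines that match no pattern
grep_result=$(grep -vxFf "$demo/File_2.txt" "$demo/File_1.txt")

# Option 2: comm, which needs sorted input
#   -2 hides lines found only in the second file, -3 hides lines found in both
sort "$demo/File_1.txt" > "$demo/File_1-sorted"
sort "$demo/File_2.txt" > "$demo/File_2-sorted"
comm_result=$(comm -23 "$demo/File_1-sorted" "$demo/File_2-sorted")

printf 'grep:\n%s\ncomm:\n%s\n' "$grep_result" "$comm_result"

rm -r "$demo"
```

With this sample data both options print a and c.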

Man Pages for commands in this post »

sort
diff
grep
cat
cut

7 Comments »

Comment by Kaleb

February 7, 2010 @ 7:24 pm

Well, yours may work, but I would use sed for this function. I think it would probably make it simpler.

Comment by TheLinuxBlog.com

February 7, 2010 @ 7:25 pm

@Kaleb Show me how to do it!

Comment by root jerais

February 7, 2010 @ 7:36 pm

check join or paste commands within shell script, might help:
http://bit.ly/d03fBU

Comment by kernel.net

March 29, 2010 @ 12:40 pm

@root thx I didn’t know about join!

@thelinuxblog.com thanks for this article, the other day I spent like half an hour trying to figure out how to compare two files to get something like a set difference.
Your thing using cut just gave me the right idea ;)

Comment by showgood

August 24, 2010 @ 8:43 pm

First sort File1.txt and File2.txt, then
comm -23 File1.txt File2.txt should give you what you want (the lines that are only in File1.txt).

Comment by Linux tips

September 28, 2010 @ 3:58 am

Earlier I spent a lot of time trying to figure out how to compare two files to get something like a set difference, but this approach using cut put me on the right path.

Pingback by Remove Lines from File which appear in another File | Ask Programming & Technology

November 4, 2013 @ 12:54 pm

[…] I know this is a question that might have been asked more often, but I only found one command online that gave me an error with a bad […]
