Linux Blog

Recursive MD5 Sum Script

Filed under: Shell Script Sundays — TheLinuxBlog.com at 12:08 am on Sunday, December 9, 2007

This week I wrote a shell script that searches one level deep and MD5s all of the files. I did this because I had multiple images and wanted to see which ones were the same so that I could merge them. It's a pretty simple script, and the output is the same as running md5sum on a file, except that more than one sum is generated.

# MD5 the files in the directory named by $directory
md5Dir () {
    echo $directory;
    for x in $(ls -1 $directory); do
        md5sum $directory'/'$x;
    done;
}
# List the directories with "active" in their name
for i in $(ls | grep active); do
    directory=$i;
    md5Dir;
done;

It only goes one level deep, but that's good enough for now. I am going to make it search recursively to a depth given by the user. I would also like to make it display the files that are the same at the end.
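The "display files that are the same" idea can be roughed out by sorting the sums and grouping lines that share the same first 32 characters (the MD5 hex digest). This is only a sketch: the function name list_duplicates is made up for illustration, and it assumes GNU find, xargs and uniq (for -print0, -0, -r, -w and --all-repeated).

```shell
# Hypothetical helper: print groups of files under "$1" (default: current
# directory) whose MD5 sums are identical. Assumes GNU find/xargs/uniq.
list_duplicates () {
    find "${1:-.}" -type f -print0 |
        xargs -0 -r md5sum |
        sort |
        # Compare only the first 32 characters (the digest) and keep
        # every line that belongs to a repeated group.
        uniq -w32 --all-repeated=separate
}
```

Calling list_duplicates on a directory of images should print each set of identical files as a block, with a blank line between blocks. File names containing backslashes or newlines are escaped by md5sum and may not group correctly.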

It gets the job done for small directories, but if I wanted to run it on multiple large directories with lots of files in them, I would definitely redirect the output to a file, because it can be quite overwhelming. To run it, just copy the code into a file and do the following:

sh [filename]

I hope this helps someone who is trying to MD5 multiple files in different directories!

24 Comments »

Comment by brian

February 21, 2008 @ 6:34 pm

try this:

for x in $(find);
do if [ ! -d $x ]; then md5sum $x; fi;
done

Comment by Owen

February 22, 2008 @ 11:10 am

for x in $(find);
do if [ ! -d $x ]; then md5sum $x; fi;
done;

This works great for all files in all directories, except for directory or file names with spaces in them. There is a way to correct this using the find command. I’ll have to dig up an old post to remember what it is.
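One way to keep a loop and still survive spaces is to let find hand the names to a small inline shell, so they are never re-split on whitespace. A minimal sketch, not necessarily the method from the old post; the function name md5_each is hypothetical.

```shell
# Hypothetical helper: hash every non-directory under "$1" (default: current
# directory). find passes each name as one argument, spaces included.
md5_each () {
    find "${1:-.}" ! -type d -exec sh -c '
        for f in "$@"; do
            md5sum "$f"
        done
    ' sh {} +
}
```

The loop body is the place to add per-file logic later (skipping by size, for example); if all you need is the sums, the loop can be dropped and md5sum run directly from -exec.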

380

Comment by PsyBlade

March 4, 2008 @ 6:26 pm

try

find $@ ! -type d -print0 | xargs -0 md5sum

works with spaces and accepts all find parameters including -maxdepth
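As a sketch of the "levels given by the user" idea from the post, the one-liner can be wrapped so the depth comes first and the start directories follow. The name md5_depth is made up here, and this assumes a find that supports -maxdepth and -print0 (GNU and BSD find do).

```shell
# Hypothetical helper: md5_depth DEPTH [DIR...]
md5_depth () {
    depth=$1
    shift
    # Fall back to the current directory when no start directory is given.
    [ "$#" -gt 0 ] || set -- .
    find "$@" -maxdepth "$depth" ! -type d -print0 | xargs -0 md5sum
}
```

For example, md5_depth 2 . would hash files in the current directory and one level below it, which is roughly what the original script does for a single directory.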

424

Comment by matthew

March 16, 2008 @ 7:42 pm

I was wondering if anyone was able to do the same thing, only the output being in a tree fashion
i.e.
|-- dir1
| |-- file1.txt d41d8cd98f00b204e9800998ecf8427e
| `-- file2.txt d41d8cd98f00b204e9800998ecf8427e
|-- dir2
| |-- file1.txt d41d8cd98f00b204e9800998ecf8427e
| `-- file2.txt d41d8cd98f00b204e9800998ecf8427e
|-- file1.txt d41d8cd98f00b204e9800998ecf8427e
`-- file2.txt d41d8cd98f00b204e9800998ecf8427e

448

Comment by Owen

March 23, 2008 @ 9:18 pm

Hey Matthew, I’ll look into doing something like this in a future article.

465

Comment by matthew

March 27, 2008 @ 11:56 am

great, I ended up giving up on tree, it was an awful lot of work to start taking apart each line of the output

instead I went a little uglier and improvised a tree like output


for i in $(find $@ ! -type d )
do
    file_id=`ls "$i"`
    md5sum_long=$(md5sum "$i")
    md5sum_solo=`echo "$md5sum_long" | cut -c 1-32 `
    file_size=$(stat -c%s "$i")
    dir_count=`echo "$file_id" | grep -o "/" | wc -l `

    counter="1"
    while [ "$counter" != "$dir_count" ]; do
        echo -n "| " \> "$file_name"
        counter=$(($counter+ 1))
    done

    echo -e "|--" "$file_id \t $file_size \t $md5sum_solo" \> "$file_name"
done

466

Comment by Owen

March 27, 2008 @ 12:27 pm

Nice script, although I edited it to show \> instead of >> as >> tries to echo to the file in my .sh script.

469

Comment by Ron

March 28, 2008 @ 8:07 pm

re:PsyBlade

>find $@ ! -type d -print0 | xargs -0 md5sum

What does the $@ yield as the path?

And what if you want to create md5 file per directory to see if any bit rot is happening.

697

Comment by touisk

June 9, 2008 @ 3:40 pm

What about…

- To compute the checksums:
$ find . -name '*' -exec md5sum {} >checksum.md5 \;

- To check:
$ md5sum -c checksum.md5

706

Comment by Owen

June 10, 2008 @ 1:34 pm

@Touisk

That’s probably the best method I’ve seen so far. I like the way it works but would like to add some more options: highlighting / color options, and options to show missing / corrupt files only. I would also like to add options to skip hidden directories / files, skip given directories / files, skip on file size, and some others. I think I’ll make all these features eventually and bundle them up into a convenient recursive md5 sum script.

I will use it on a daily basis, so it’s worth doing.

- Owen.

724

Comment by touisk

June 18, 2008 @ 2:22 pm

Hi

so, from the check …

$ md5sum -c checksum.md5 > log.txt

Note: lines for good checks end with ‘ OK’

so, we can …

list the good files :
$ grep 'OK$' log.txt | less

list the corrupt files :
$ grep -v 'OK$' log.txt | less

You can also look up a file (by a sub-string of its filename) to see whether it’s OK:
$ grep '<sub-string>' log.txt
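GNU md5sum can also do the "corrupt files only" filtering by itself: --quiet leaves out the OK lines. A small sketch, assuming GNU coreutils; the function name report_bad is made up, and the default file name follows the checksum.md5 used earlier in this thread.

```shell
# Hypothetical helper: print only the entries that fail verification.
# Assumes GNU md5sum; --quiet suppresses the "OK" lines, and the summary
# warning on stderr is discarded.
report_bad () {
    md5sum -c --quiet "${1:-checksum.md5}" 2>/dev/null
}
```

The exit status is non-zero when anything fails, so it can be used in a cron job or an if statement as well.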

Bye.

731

Comment by Owen

June 23, 2008 @ 7:33 am

@Touisk those are good ways to filter the output. I am still working on a tree output and the features that I mentioned in the above post. I’ll be sure to post them when I’m done. Thanks for your contributions!

774

Comment by Kveri

July 19, 2008 @ 2:30 am

what about this:

find * ! -type d -print0 | xargs -0 md5sum

Comment by log69

January 30, 2009 @ 9:55 am

And if you wanna get the md5sum of a directory to compare it with others, then PSYBLADE’s solution + cat is the solution. It gives only 1 md5sum:

find $@ ! -type d -print0 | xargs -0 cat | md5sum

Comment by log69

January 30, 2009 @ 10:04 am

or a better one ;)

find ! -type d -print0 | xargs -0 md5sum | md5sum
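One caveat with a single sum per directory: the result depends on the order in which find returns the names, and that order is not guaranteed to match between two copies. It also includes the paths, so both trees have to be hashed with the same relative names. A sketch that sorts first and runs from inside the directory; dir_digest is a hypothetical name, and sort -z assumes GNU sort.

```shell
# Hypothetical helper: print one MD5 for the whole tree under "$1".
dir_digest () {
    (
        # Work from inside the directory so names are relative (./...).
        cd "$1" || exit
        find . -type f -print0 |
            LC_ALL=C sort -z |      # fixed order, independent of locale
            xargs -0 md5sum |
            md5sum |
            cut -d' ' -f1           # keep just the digest
    )
}
```

Two directories can then be compared with something like [ "$(dir_digest a)" = "$(dir_digest b)" ].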

Comment by der Dennis

October 9, 2009 @ 3:03 am

Another solution would be the md5deep utility ->

http://md5deep.sourceforge.net/

Comment by aprotim

November 11, 2009 @ 5:13 pm

What’s wrong with:

md5sum $directory/*

It will print errors to stderr for non-plain files found, but stdout will have the hashes, as required…

Comment by TheLinuxBlog.com

November 12, 2009 @ 10:45 am

@APROTIM nothing is wrong with md5sum $directory/* for finding files within a directory, but it won’t go into the directories. It would basically do the same thing as “md5sum $directory'/'$x;”. The $x is there in case I wanted to extend it to multiple levels.

Comment by David E.

November 27, 2009 @ 11:23 am

I like the comment about using the following command: “find * ! -type d -print0 | xargs -0 md5sum”

You could accomplish all that by using the find command as follows: “find ./ -type f -exec md5sum $1 {} \;”

The important thing to note about the script in this post is that when parsing files by filename, it’s usually best to avoid using the “for” command, as it uses a space as the default field separator. This means that if it runs across a filename with a space in it, which is common if you are scripting against a file server, it will run into a lot of problems.

If you have an aversion to using find, it would be more advisable to do the following: “ls | while read FILE; do md5sum "$FILE"; done”
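A small footnote on the loop question: the trouble comes from word splitting the output of $(...), not from "for" as such. A for loop over a glob gets one word per file name, spaces and all, and avoids parsing ls altogether. A minimal sketch; the function name md5_here is made up.

```shell
# Hypothetical helper: hash the regular files directly inside "$1"
# (default: current directory) without parsing ls.
md5_here () {
    for file in "${1:-.}"/*; do
        # Skip directories, and the literal pattern left when nothing matches.
        [ -f "$file" ] || continue
        md5sum "$file"
    done
}
```

If you prefer the while loop, "while IFS= read -r FILE" keeps leading spaces and backslashes intact, which a plain "read FILE" does not.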

Comment by spargonaut

January 9, 2010 @ 10:00 pm

this is great.
thanks to everyone for the input.

I’m trying to accomplish something similar ( in addition to sharpening my shell chops ).
After creating an md5sum for each file, I want it to start back at the beginning of the file and search for duplicates, and write the possible duplicates to another file.

this thread gives me a great head start.

-js

Comment by Ben in Seattle

August 1, 2010 @ 12:11 am

I love seeing the evolution of this script in the comments as it gets tighter and better. It really shows you how Unix is a toolbox: even with only basic knowledge of the shell you can get your work done, but as you become an expert you find sharp, precise tools and the knowledge to handle them.

David E’s version looks well-nigh perfect, but it has a couple of extraneous parts that can be cut out. First is the slash in “./”, which can just be typed “.”. The second is the “$1”, which I think was a typo or a remnant of a shell script.

So, my tweak on his version would be:

find ./ -type f -exec md5sum $1 {} \; > checksums.md5

Or, if you’re using GNU/Linux (which you probably are if you’re using the Linux kernel), you can omit the period since GNU find defaults to searching the current working directory. Also, instead of using “\;”, which runs a separate md5sum process for every file, GNU find has the nifty plus syntax, “+”, which passes all the filenames found as command line arguments to one instance of md5sum. That will make it run faster and also has the benefit that plus is not a shell special character so it doesn’t have to be escaped by a backslash. So, my final recommendation (though I’m sure somebody will come along and improve upon this), is:

find -type f -exec md5sum {} + > checksums.md5

Of course, to check the sums, it’s still the same: md5sum -c checksums.md5
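One wrinkle worth knowing about: the shell creates checksums.md5 before find starts, so the file lands in its own list and then typically shows up as FAILED when checked. A sketch that leaves it out; the function names make_sums and check_sums are made up for illustration.

```shell
# Hypothetical helpers; the file name checksums.md5 follows the comment above.
make_sums () {
    # Exclude the output file so it is not hashed while being written.
    find . -type f ! -name checksums.md5 -exec md5sum {} + > checksums.md5
}

check_sums () {
    md5sum -c checksums.md5
}
```

Run make_sums once in the top directory, then check_sums from the same place whenever you want to verify.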

Comment by Ben in Seattle

August 1, 2010 @ 12:19 am

Uhh… duh…. I neglected to remove the “/” or “$1” from my tweak of David E’s version. Not a big deal, since I think my later version is better, but I feel silly now.

Comment by enars

July 28, 2012 @ 5:09 am

Another solution would be the md5deep utility ->

http://md5deep.sourceforge.net/

under terminal (ubuntu 10.10)

$ md5deep

$ sudo apt-get install md5deep

Comment by Jonathan

July 28, 2012 @ 7:52 pm

I’ve had a need for verifying integrity of backups/mirrors which contain a large number of files and ended up writing a command-line program called MassHash. It’s written in Python. A GTK+ Launcher is also available. You may want to check it out…

http://code.google.com/p/masshash/
