Linux Blog

Recursive MD5 Sum Script

Filed under: Shell Script Sundays — TheLinuxBlog.com at 12:08 am on Sunday, December 9, 2007

This week I wrote a shell script that searches one level deep and MD5s all of the files. I did this because I had multiple images and wanted to see which ones were the same so that I could merge them. It’s a pretty simple script, and the output is the same as running md5sum on a file, except that more than one sum is generated.

# MD5 the files in $directory
md5Dir () {
    echo "$directory"
    for x in $(ls -1 "$directory"); do
        md5sum "$directory/$x"
    done
}
# List the directories (here, the ones with "active" in their name)
for i in $(ls | grep active); do
    directory=$i
    md5Dir
done

It only goes one level deep, but that’s good enough for now. I am going to make it search recursively, to a depth given by the user. I would also like to make it display the files that are the same at the end.

It gets the job done for small directories, but if I wanted to run it on multiple large directories with lots of files in them, I would definitely redirect the output to a file, because it can be quite overwhelming. To run it, just copy the code into a file and do the following:

sh <filename>

I hope this helps someone who is trying to MD5 multiple files in different directories!

Man Pages for commands in this post »

md5sum
ls
grep

20 Comments »

361

Comment by brian

February 21, 2008 @ 6:34 pm

try this:

for x in $(find);
do if [ ! -d $x ]; then md5sum $x; fi;
done

362

Comment by Owen

February 22, 2008 @ 11:10 am

for x in $(find);
do if [ ! -d $x ]; then md5sum $x; fi;
done;

This works great for all files in all directories, except for directory or file names with spaces in them. There is a way to correct this using the find command. I’ll have to dig up an old post to remember what it is.

380

Comment by PsyBlade

March 4, 2008 @ 6:26 pm

try

find $@ ! -type d -print0 | xargs -0 md5sum

works with spaces and accepts all find parameters including -maxdepth
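PsyBlade’s one-liner could be wrapped in a small function that takes the depth as its first argument. This is only a sketch: the name md5_depth is made up for illustration, -maxdepth is an extension found in GNU and BSD find rather than part of POSIX, and -exec md5sum {} + stands in for the -print0 | xargs -0 pair.

```shell
#!/bin/sh
# md5_depth DEPTH [DIR...] (hypothetical helper, not from the post):
# hash the regular files found at most DEPTH levels below each DIR.
md5_depth() {
    depth=$1        # assumes the depth is always given as the first argument
    shift
    # Fall back to the current directory when no directory is given.
    if [ "$#" -eq 0 ]; then
        set -- .
    fi
    # "$@" expands to the remaining arguments, one word per argument,
    # so directory names that contain spaces stay intact.
    find "$@" -maxdepth "$depth" -type f -exec md5sum {} +
}
```

For example, md5_depth 2 . would hash the files in the current directory and in its immediate subdirectories.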

424

Comment by matthew

March 16, 2008 @ 7:42 pm

I was wondering if anyone was able to do the same thing, only the output being in a tree fashion
i.e.
|-- dir1
|   |-- file1.txt d41d8cd98f00b204e9800998ecf8427e
|   `-- file2.txt d41d8cd98f00b204e9800998ecf8427e
|-- dir2
|   |-- file1.txt d41d8cd98f00b204e9800998ecf8427e
|   `-- file2.txt d41d8cd98f00b204e9800998ecf8427e
|-- file1.txt d41d8cd98f00b204e9800998ecf8427e
`-- file2.txt d41d8cd98f00b204e9800998ecf8427e

448

Comment by Owen

March 23, 2008 @ 9:18 pm

Hey Matthew, I’ll look into doing something like this in a future article.

465

Comment by matthew

March 27, 2008 @ 11:56 am

great, I ended up giving up on tree, it was an awful lot of work to start taking apart each line of the output

instead I went a little uglier and improvised a tree like output


for i in $(find $@ ! -type d )
do
file_id=`ls "$i"`
md5sum_long=$(md5sum "$i")
md5sum_solo=`echo "$md5sum_long" | cut -c 1-32 `
file_size=$(stat -c%s "$i")
dir_count=`echo "$file_id" | grep -o "/" | wc -l `

counter="1"
while [ "$counter" != "$dir_count" ]; do
echo -n "| " \;> "$file_name"
counter=$(($counter+ 1))
done

echo -e "|--" "$file_id \t $file_size \t $md5sum_solo" \> "$file_name"
done

466

Comment by Owen

March 27, 2008 @ 12:27 pm

Nice script, although I edited it to show \> instead of >> as >> tries to echo to the file in my .sh script.
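For reference, a tree-like listing along the same lines could be sketched as below. It is an illustration rather than Matthew’s script: it prints to standard output instead of writing to a file, md5_tree is a made-up name, wc -c replaces stat -c%s for portability, and it assumes file names without newlines and a root path given without a trailing slash.

```shell
#!/bin/sh
# md5_tree [DIR] (hypothetical helper): print one line per file, indented
# by its depth below DIR, followed by its size in bytes and its MD5 sum.
md5_tree() {
    root=${1:-.}
    find "$root" -type f | LC_ALL=C sort | while IFS= read -r path; do
        # Reading from stdin keeps the file name out of md5sum's output.
        sum=$(md5sum < "$path" | cut -c 1-32)
        size=$(wc -c < "$path")
        # Path relative to the root; each "/" left in it is one level of depth.
        rel=${path#"$root"/}
        slashes=$(printf '%s' "$rel" | tr -cd '/' | wc -c)
        indent=''
        i=0
        while [ "$i" -lt "$slashes" ]; do
            indent="$indent|   "
            i=$((i + 1))
        done
        printf '%s|-- %s\t%s\t%s\n' "$indent" "$rel" "$((size))" "$sum"
    done
}
```

Running md5_tree somedir > tree.txt (with somedir standing in for any directory) would save the listing instead of printing it.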

469

Comment by Ron

March 28, 2008 @ 8:07 pm

re:PsyBlade

>find $@ ! -type d -print0 | xargs -0 md5sum

What does the $@ yield as the path?

And what if you want to create an md5 file per directory, to see if any bit rot is happening?
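On the first question: $@ expands to whatever arguments the script was called with, so those become the paths and options handed to find; with no arguments, GNU find falls back to the current directory. On the second, one possible approach is sketched below. The function names and the checksum.md5 file name are assumptions made for illustration, -maxdepth and md5sum --quiet are GNU options, and directory names containing newlines are not handled.

```shell
#!/bin/sh
# write_dir_sums [DIR] (hypothetical helper): leave a checksum.md5 in every
# directory under DIR, covering only the files directly inside it.
write_dir_sums() {
    find "${1:-.}" -type d | while IFS= read -r dir; do
        (
            cd "$dir" || exit
            # Skip any existing checksum file so it is not hashed itself.
            find . -maxdepth 1 -type f ! -name checksum.md5 \
                -exec md5sum {} + > checksum.md5
            # Directories without files would leave an empty list; drop it.
            [ -s checksum.md5 ] || rm -f checksum.md5
        )
    done
}

# check_dir_sums [DIR] (hypothetical helper): re-verify every checksum.md5
# under DIR and report the directories where verification fails.
check_dir_sums() {
    find "${1:-.}" -type f -name checksum.md5 | while IFS= read -r sums; do
        dir=$(dirname "$sums")
        ( cd "$dir" && md5sum -c --quiet checksum.md5 ) ||
            echo "FAILED in $dir"
    done
}
```

Running write_dir_sums once and check_dir_sums at some later date would flag any file whose contents have changed in the meantime.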

697

Comment by touisk

June 9, 2008 @ 3:40 pm

What about…

- To compute the checksums:
$ find . -name '*' -exec md5sum {} >checksum.md5 \;

- To check:
$ md5sum -c checksum.md5

706

Comment by Owen

June 10, 2008 @ 1:34 pm

@Touisk

That’s probably the best method I’ve seen so far. I like the way it works, but I would like to add some more options: highlighting / color, and showing missing / corrupt files only. I would also like to be able to skip hidden directories / files, skip named directories / files, skip by file size, and a few other things. I think I’ll build all these features eventually and bundle them up into a convenient recursive md5 sum script.

I will use it on a daily basis, so it’s worth doing.

- Owen.

724

Comment by touisk

June 18, 2008 @ 2:22 pm

Hi

so, from the check …

$ md5sum -c checksum.md5 > log.txt

Note: good checks end with ' OK'.

So we can …

list the good files:
$ grep 'OK$' log.txt | less

list the corrupt files:
$ grep -v 'OK$' log.txt | less

You can also look up a file (by a sub-string of its filename) to see whether it’s OK:
$ grep '<sub-string>' log.txt

Bye.

731

Comment by Owen

June 23, 2008 @ 7:33 am

@Touisk those are good ways to filter the output. I am still working on a tree output and the features that I mentioned in the above post. I’ll be sure to post them when I’m done. Thanks for your contributions!

774

Comment by Kveri

July 19, 2008 @ 2:30 am

what about this:

find * ! -type d -print0 | xargs -0 md5sum

Comment by log69

January 30, 2009 @ 9:55 am

And if you wanna get the md5sum of a directory to compare it with others, then PSYBLADE’s solution + cat is the solution. It gives only 1 md5sum:

find $@ ! -type d -print0 | xargs -0 cat | md5sum

Comment by log69

January 30, 2009 @ 10:04 am

or a better one ;)

find ! -type d -print0 | xargs -0 md5sum | md5sum
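One caveat with hashing the combined output: find does not promise any particular order, so two directories with identical contents could in principle yield different sums. A sketch that sorts the list before hashing it is shown below; dir_fingerprint is a made-up name, and the function changes into the directory first so that only relative paths end up in the hashed list.

```shell
#!/bin/sh
# dir_fingerprint [DIR] (hypothetical helper): print a single MD5 sum that
# covers the relative names and contents of all regular files below DIR.
dir_fingerprint() {
    (
        cd "${1:-.}" || exit
        # Sort in the C locale so the order is the same on every system.
        find . -type f -exec md5sum {} + | LC_ALL=C sort | md5sum | cut -c 1-32
    )
}
```

Two directories could then be compared with something like [ "$(dir_fingerprint a)" = "$(dir_fingerprint b)" ], where a and b are placeholder directory names.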

Comment by der Dennis

October 9, 2009 @ 3:03 am

Another solution would be the md5deep utility ->

http://md5deep.sourceforge.net/

Comment by aprotim

November 11, 2009 @ 5:13 pm

What’s wrong with:

md5sum $directory/*

It will print errors to stderr for non-plain files found, but stdout will have the hashes, as required…

Comment by TheLinuxBlog.com

November 12, 2009 @ 10:45 am

@APROTIM nothing is wrong with md5sum $directory/* for finding files within a directory, but it won’t go into the subdirectories. It would basically do the same thing as the md5sum line in the script’s loop; the $x is there in case I wanted to extend it to multiple levels.

Comment by David E.

November 27, 2009 @ 11:23 am

I like the comment about using the following command: “find * ! -type d -print0 | xargs -0 md5sum”

You could accomplish all that by using the find command as follows: “find ./ -type f -exec md5sum $1 {} \;”

The important thing to note about the script in this post is that when parsing files by filename, it’s usually best to avoid looping with “for” over command output, because the unquoted output is split on whitespace (the shell’s default field separators)… This means that if it runs across a filename with a space in it, which is common if you are scripting against a file server, it will run into a lot of problems.

If you have an aversion to using find, it would be more advisable to do the following: “ls | while read FILE; do md5sum "$FILE"; done”
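Strictly speaking, the splitting comes from the unquoted $(ls) expansion rather than from “for” itself, and “ls | while read” can still trip over names with backslashes or leading whitespace. A one-level variant that avoids both by looping over a glob might look like the sketch below; md5_here is a made-up name.

```shell
#!/bin/sh
# md5_here [DIR] (hypothetical helper): hash every regular file directly
# inside DIR, one level deep, without parsing the output of ls.
md5_here() {
    for f in "${1:-.}"/*; do
        # The glob yields one word per file name, spaces included;
        # the quotes around "$f" keep it that way.
        if [ -f "$f" ]; then
            md5sum "$f"
        fi
    done
}
```

Note that the glob skips hidden files, much as plain ls does.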

Comment by spargonaut

January 9, 2010 @ 10:00 pm

this is great.
thanks to everyone for the input.

I’m trying to accomplish something similar ( in addition to sharpening my shell chops ).
After creating an md5sum for each file, I want it to start back at the beginning of the file and search for duplicates, and write the possible duplicates to another file.

this thread gives me a great head start.

-js
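A possible starting point for the duplicate search is to sort the md5sum output so that identical sums sit next to each other, then print only the lines whose sum repeats. This is a sketch under a few assumptions: find_dupes is a made-up name, and file names containing newlines or backslashes (which md5sum escapes in its output) are not handled.

```shell
#!/bin/sh
# find_dupes [DIR...] (hypothetical helper): print the md5sum lines of all
# files whose sum occurs more than once below the given directories.
find_dupes() {
    if [ "$#" -eq 0 ]; then
        set -- .
    fi
    find "$@" -type f -exec md5sum {} + | LC_ALL=C sort | awk '
        {
            sum = $1
            if (sum == prev) {
                # First repeat of this sum: also print the line before it.
                if (!shown) print prevline
                print
                shown = 1
            } else {
                shown = 0
            }
            prev = sum
            prevline = $0
        }'
}
```

Redirecting the result, as in find_dupes . > dupes.txt, would write the possible duplicates to another file.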
