Tao Te KaChing
Workin' the cash register of the Great Tao

Duplicate Files, Hash Codes, SQLite, and Me...

My wife's been getting on my case about having a gazillion different hard drives with everything and our mothers on them all around the house.  I mean, come on everybody, she just wants her pictures in one @%$!# spot!  She also "misused" Picasa, and now has a bunch of duplicates on her laptop (she doesn't read my blog, so I ain't worried she'll read that).

So, we were out shopping for Little Liam last weekend and decided to pop into Circuit City's closing-its-doors blowout sale.  I grabbed her a 500 GB Western Digital external drive and, when we got home, got straight to work on a simple solution to shut her pie hole.

The result: MyPicturesConsolidator!  It is a WYSIWYG image grabber, duplicate detector, and file-copier-consolidator all in one gorgeous package!

OK, this program is NOT a work of art, but it may contain some good stuff you can use, and it works pretty solidly, so...


How it works:

First, you select where you want any pictures it finds to get copied to:


Second, select the logical drive you want to scan for pictures.  I included a Refresh Drive List button for changing between USB drives:


Third, click Find My Pictures!  And you're good!
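
For the curious, here is a rough sketch of how the drive list from step two could be filled in.  This is not lifted from the program; the class and method names are made up for illustration, and it only assumes the standard System.IO.DriveInfo API.  A Refresh Drive List button would simply call the method again.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of MyPicturesConsolidator itself.
public static class DriveLister
{
    // Returns the root paths (e.g. "C:\") of every drive that is ready to be read.
    public static List<string> GetReadyDriveRoots()
    {
        var roots = new List<string>();
        foreach (DriveInfo drive in DriveInfo.GetDrives())
        {
            // IsReady is false for things like an empty card reader or optical drive.
            if (drive.IsReady)
            {
                roots.Add(drive.RootDirectory.FullName);
            }
        }
        return roots;
    }
}
```

Skipping drives that are not ready avoids exceptions later when the scan tries to walk a drive with no media in it.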

Behind the scenes:

I wanted the program to maintain a "list" of the files it has already gone through.  I decided to use a SQLite database that holds the MD5 and SHA1 hashes of the pictures.  A good side effect: take the database file, along with the exe and the SQLite dll, to another computer together with your destination drive (or network share path, etc.), and the duplicates list maintained in the SQLite db should work golden for you.
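
To make the duplicate check concrete, here is a minimal sketch of the idea.  Loud assumption: it uses an in-memory HashSet as a stand-in for the SQLite table so the example stays self-contained, and the SeenHashes class and TryAdd method are hypothetical names, not the program's.  In the real thing the same test would be a lookup against the stored MD5 and SHA1 values.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the SQLite table of hashes (an assumption for this sketch).
public class SeenHashes
{
    // Hex strings are compared case-insensitively, so "AB12" and "ab12" match.
    private readonly HashSet<string> seen =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Returns true if this (MD5, SHA1) pair is new and records it;
    // returns false if the pair was already recorded, i.e. the file is a duplicate.
    public bool TryAdd(string md5Hex, string sha1Hex)
    {
        return seen.Add(md5Hex + ":" + sha1Hex);
    }
}
```

A file gets copied only when TryAdd returns true.  Requiring both hashes to match makes an accidental collision between two different pictures even less likely than with either hash alone.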

MD5 and SHA1 generation is, for lack of a better phrase, ridiculously easy in .NET.  An MD5 hash of a file, for instance, can be had in one line of code:

byte[] md5Hash = new System.Security.Cryptography.MD5CryptoServiceProvider().ComputeHash(System.IO.File.ReadAllBytes(filename));
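
The one-liner reads the whole file into memory, which is fine for snapshots but gets heavy for big files.  Here is a slightly longer sketch that streams the file instead and returns hex strings that are easy to store in a database.  The FileHasher class and its method names are my own for illustration; the hashing calls themselves (MD5.Create, SHA1.Create, ComputeHash on a Stream) are standard .NET.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for illustration.
public static class FileHasher
{
    // MD5 of the file's contents as a lowercase hex string.
    public static string Md5Hex(string filename)
    {
        using (MD5 md5 = MD5.Create())
        {
            return HashFile(md5, filename);
        }
    }

    // SHA1 of the file's contents as a lowercase hex string.
    public static string Sha1Hex(string filename)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            return HashFile(sha1, filename);
        }
    }

    // Streams the file through the algorithm rather than loading it all at once.
    private static string HashFile(HashAlgorithm algorithm, string filename)
    {
        using (FileStream stream = File.OpenRead(filename))
        {
            byte[] hash = algorithm.ComputeHash(stream);
            // BitConverter yields "AB-CD-..."; strip the dashes and lowercase it.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

The using blocks dispose both the hash object and the file handle, which the one-liner leaves to the garbage collector.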

The code is here.  Go ahead and take a look.  There are some dumb things I'm doing in there that deal with my wife's needs (e.g. Picasa uses file creation dates, ergo I try to find the earliest date for her when I can).
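
As a taste of the "earliest date" trick, here is a hedged sketch.  It assumes the candidates are the file's creation time and last-write time, which may not be exactly what the real code compares, and the EarliestDate class and its methods are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper; which timestamps to compare is an assumption of this sketch.
public static class EarliestDate
{
    // Returns whichever of the two timestamps comes first.
    public static DateTime Earliest(DateTime a, DateTime b)
    {
        return a < b ? a : b;
    }

    // Earliest of the file's creation time and last-write time.
    public static DateTime ForFile(string path)
    {
        return Earliest(File.GetCreationTime(path), File.GetLastWriteTime(path));
    }

    // Copies the file, then stamps the copy's creation time with the earliest date
    // found on the source. File.Copy throws if the destination already exists.
    public static void CopyKeepingEarliest(string source, string destination)
    {
        DateTime earliest = ForFile(source);
        File.Copy(source, destination);
        File.SetCreationTime(destination, earliest);
    }
}
```

The reason to bother: a plain copy gets a brand-new creation time on the destination drive, so anything that sorts by creation date would file every picture under the day of the copy.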