iPhoto AppleScript to Remove Duplicates

Short Story:

I had several years of photos that I needed to identify and remove the duplicate. Instead of manually combing through 12,000 (read Long Story below) and before carpal tunnel set in, I needed a script to help me out. My situation may or may not be unique, so this script may not work 100% out-of-the-box for you, but it should get you started.

This script will identify duplicate photos in your iPhoto library and mark them with a comment (keyword) of “duplicate”. It will not delete anything.

To use:

  1. Download and unzip the script
  2. Double-click the script to open in Script Editor
  3. Go into iPhoto and select a group of photos you want to compare
  4. Switch back to Script Editor and run the script
  5. Don’t Touch Anything! Just let the script finish, it could take a while if you are comparing a lot of photos
  6. After the script is done, go back into iphoto and search for “duplicate”
  7. You can highlight all the duplicates and delete them or move them some place safe

Photos are considered a duplicate if:

  1. both heights match
  2. both widths match
  3. the photo date in iPhoto match, this is typically the EXIF creation date
There are no error checks in this script and it presents no interface except an alert when it’s done. If you need help, just post a comment below and I’ll do my best.

Long Story:

About a year ago I was editing down my iPhoto library of about 6000 images, just gitting rid of those out-of-focus shots and the ones of my wife’s feet (a curiously large number of these). After a long night of editing, the next morning I awoke to start again, but when I ran iPhoto there was nothing in the library.
It was all gone!
I couldn’t find anything anywhere. Could I restore from a backup? Ooh nooo. I had erased my backup drive the day before in preparation for moving the unwanted photos onto the backup drive and then making a new backup of my iPhoto Library. So I had no backup.
Not really funny. These were all the shots of my boys being born, first steps, first birthdays, first everything. I was up sh*t creek and it put a serious hurt in my stomach. At least I knew what to do: do nothing on the computer, boot from the Mac OS X install DVD and use Disk Utility to make a byte-for-byte copy of my internal hard disk. I could use this disk image to recover the images, hopefully.
So I tried several image recovery utilities and finally settled on PhotoRescue for Mac. I mounted the disk image of my internal disk and set PhotoRescue to the task. About 9 hours later (not a typo), PhotoRescue gave me several folders of recovered JPEGs, TIFFs, GIFs and PNGs. I tossed all but the JPEGs. I felt a little better at this point.
But when I looked in the JPEG folder there was over 12,000 images! Huh? Well, PhotoRescue does not discriminate, it recovers ALL images, including thumnails, web graphics, pron (you’ve been warned). Frankly, it was unbelievable and overwhelming.
So I set about dividing the images into folder that I knew were junk images and ones that I may want to keep. First, I eliminated everything below about 120K. I knew that my oldest digital camera was around 3M pixels and it saved a file that was typically > 200K so those images below 120K were most likely thumbnails and web images. That cut my stack almost in half.
Next I looked for images > 3M. These were corrupted image files that while they looked ok in Preview, I knew there was no way a 1200×1600 images was 40M. Just a consequence of PhotoRescue’s recovery routine. I can live with that, believe me. So I tossed everything > 3M because my current 6M pixel camera images are under 2M in size.
This left me with about 6,000 images that I imported into a new iPhoto Library. From the looks of it, all my images were there! What a relief, but the bad news was nothing was rotated properly, and there were many, many duplicates. Thousands of duplicates to be exact. After I rotated all the images so that I could view them properly, I set about removing the duplicates.
The good news about removing the duplicates was that they were fairly easy to spot. When I imported all the recovered images into iPhoto it apparently used the EXIF data data to date stamp each photo instead of using the photo file’s creation date, which was set by PhotoRescue to the day I performed the recovery. So all my photo’s were dated properly, I just had to look at each photo that matches (they were sorted by date) the one next to it and delete one of them.
A closer look at the duplicate photos revealed that while they had the same height, width and date/time, they varied in size. I was not able to determine why the file sizes varied as the images themselves looked identical, but my best guess is that the size difference came about from iPhoto’s insistence that when you rotate an image iPhoto considers this an “edit” and makes a copy of the original and add’s some iPhoto specific data (no verification on this though). So hey, if you are going to keep one, why not keep the smaller of the image files? So that’s what I was doing.
After hours and days of removing duplicates, I decided there has to be a better way. A bit of searching for “applescript iphoto remove duplicates” let me to Brattoo Propaganda Software’s Duplicate Annihilator. I tried the demo and it works very well. But there was one thing I wanted to do that Duplicate Annihilator could not, and that is mark the larger of the duplicate files. Duplicate Annihilator marks duplicate files by date/time which I am sure is what most people want to do. So definitely check it out.
So Duplicate Annihilator minor missing feature led me to write my own AppleScript to do pretty much the same. The script is pretty simple and requires no additional libraries or command line voodoo. But I will say that coding in Ruby for the past year-and-a-half really reminds my why I don’t like AppleScript. AS gets the job done, but it’s so much more work, frankly it’s confusing, and if you don’t do it often it’s a lot of work getting your head around AS’s nomenclature.
For you fellow rubists, there is rb-appscript which would have made my pain a little easier, but it relies on ruby and having the rb-appscript gem installed and that would be too much for most casual Mac users. So AppleScript won this round, but only because I knew I wanted to share the script for others.
Good luck to all you photo recoverers. I’ve been down your road before.
Posted in Mac. Tagged with , .

121 Responses

Comments always appreciated...