Proper missing image reprocessing with Paperclip + Rails
As SNGTRKR is still in its infancy, we find ourselves tweaking the image sizes our models have on a semi-regular basis. Fortunately, Paperclip makes that a doddle: just a one-line change to the model definition and away we go.
Now all that's left to address is the missing copies of this new image size in production. We have well over 10,000 records with large images associated with them in the database, so the naïve approach of reprocessing every model isn’t going to suffice here. In theory, though, we still have a weapon in our arsenal: Paperclip provides the paperclip:refresh:missing_styles rake task, which will look for a YAML file in your project that stores the previously generated styles.
Unfortunately, that’s as far as that file goes. It does not track progress per record, so if that rake task should bail at some point through your 10,000+ records, well, start again! Haha, screw that. This actually happened to us, for reasons I am still not 100% sure of, but I can only assume it was an S3 connection issue or similar. To solve it, I wrote my own method to do roughly the same as what Paperclip does, but in a slightly more intelligent way: we check whether a potentially missing image exists for each record of a given ActiveRecord model (this is a Rails-centric solution) by actually trying to open the URL the image would be at.
Not only can this be stopped and effectively resumed (that is, it will never reprocess an image that has already been processed), but if an error is raised during the reprocessing step, it will also delete the image from the record entirely, as we assume there has been some kind of corruption between the record and its S3 files.
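A minimal sketch of that approach might look something like the following. Treat it as an illustration under assumptions rather than a definitive implementation: the names remote_file_exists? and reprocess_missing_style are hypothetical, the URL check assumes the images are publicly readable over HTTP(S), and the attachment is only expected to respond to Paperclip's file?, url and reprocess! methods.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: true if an HTTP HEAD request to +url+ returns 2xx.
# Any network or parsing error is treated as "the file is missing".
def remote_file_exists?(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri).is_a?(Net::HTTPSuccess)
  end
rescue StandardError
  false
end

# Hypothetical helper: reprocesses +style+ only for records whose image for
# that style cannot be fetched. Safe to stop and re-run, because records
# whose style already exists are skipped.
#
# records         - any enumerable of model instances (e.g. Model.find_each)
# attachment_name - name of the Paperclip attachment (e.g. :image)
# style           - the newly added style (e.g. :medium)
# checker         - callable taking a URL and returning true/false
def reprocess_missing_style(records, attachment_name, style,
                            checker: method(:remote_file_exists?))
  counts = { skipped: 0, reprocessed: 0, cleared: 0 }

  records.each do |record|
    attachment = record.public_send(attachment_name)
    next unless attachment.file? # nothing attached, nothing to do

    if checker.call(attachment.url(style))
      counts[:skipped] += 1
      next
    end

    begin
      attachment.reprocess!(style)
      counts[:reprocessed] += 1
    rescue StandardError
      # Assume the record and its stored files are out of sync:
      # assigning nil and saving removes the attachment from the record.
      record.public_send("#{attachment_name}=", nil)
      record.save
      counts[:cleared] += 1
    end
  end

  counts
end
```

In a Rails console this could be called as reprocess_missing_style(Artist.find_each, :image, :medium), where Artist, :image and :medium are placeholders for your own model, attachment and style. Passing the checker in as an argument keeps the existence test swappable, for instance if your bucket is private and a plain HTTP request would always fail.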
Feel free to use it, and if you make any improvements, please leave a comment!