Kerollos Magdy
Feb 10, 2022
Today I made a simple script that traverses a directory to find duplicate files by comparing their MD5 hash values.

github.com/kerolloz/same-…

I wanted to solve this problem because I have a directory with a huge number of PDF files (books). I was almost sure there were some duplicates among them, possibly under different names, which is why I compare hashes instead of filenames.

I started by implementing a simple and straightforward script in Node.js.
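
To make the idea concrete, here is a minimal sketch of what such a script could look like. It is not the code from the repository: the function names (`md5OfFile`, `listFiles`, `findDuplicates`) are made up for illustration, and it assumes every file is small enough to read into memory at once.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: MD5 hex digest of a file's contents.
// Reads the whole file into memory, which is fine for a sketch.
function md5OfFile(filePath) {
  return crypto.createHash('md5').update(fs.readFileSync(filePath)).digest('hex');
}

// Hypothetical helper: recursively collect all regular files under `dir`.
function listFiles(dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...listFiles(fullPath));
    } else if (entry.isFile()) {
      files.push(fullPath);
    }
  }
  return files;
}

// Group files by hash and keep only the groups with more than one file.
function findDuplicates(dir) {
  const byHash = new Map();
  for (const file of listFiles(dir)) {
    const hash = md5OfFile(file);
    if (!byHash.has(hash)) byHash.set(hash, []);
    byHash.get(hash).push(file);
  }
  return [...byHash.values()].filter((group) => group.length > 1);
}
```

Calling `findDuplicates('/path/to/books')` (a placeholder path) returns an array of groups, where each group lists the paths of files that share the same MD5 hash, regardless of their names.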