Hi,
Any chance you could help me adapt your find-dupes.awk script to work on a Linux system? Based on your notes, I was able to figure out the following changes:
- Instead of ls -lTR, use ls -l --full-time -R | grep -v ^d
- Use md5_exec = "md5sum"
- Change $9 to $8: file = substr($0,match($0, $8)+length($8)+1,length($0))
- Change $2 to $1 since we are using md5sum: hash = $1
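To make the field changes concrete, here is a rough sketch of how the file size and name could be pulled out of the Linux listing. It is only an illustration, not the original find-dupes.awk: the function name list_sizes is made up, it assumes the GNU ls --full-time layout (perms, links, owner, group, size, date, time, timezone, name), and it uses index() instead of match() on the assumption that the timezone field (e.g. +0000) should be treated as a literal string rather than a regex.

```shell
# Hypothetical helper (not part of find-dupes.awk): reads the output of
# `ls -l --full-time -R` on stdin and prints "size<TAB>name" for each
# regular-file line.
list_sizes() {
  awk '
    # Regular files start with "-"; directory lines, "total N" lines and
    # the "dir:" headers printed by -R are skipped.
    /^-/ && NF >= 9 {
      tz = $8                              # timezone offset, e.g. +0000
      pos = index($0, " " tz " ")          # literal search, no regex
      if (pos == 0) next
      # Skip the leading space, the timezone and the trailing space;
      # the rest of the line is the file name (spaces preserved).
      file = substr($0, pos + length(tz) + 2)
      printf "%s\t%s\n", $5, file
    }'
}
```

It would be used as: ls -l --full-time -R | list_sizes. Since the pattern already skips directory lines, the grep -v ^d step is not needed with this sketch.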
I couldn't figure out the rest, starting with the line sizes[$5], as I don't know awk. I would appreciate the help: I'm currently trying to find dupes using the md5sum approach from the Stack Exchange thread that you referenced, and it's still running after 1 day on 1.3TB worth of data.
Thanks in advance.