freedup(1)

freedup (utils)
September 2008

freedup

Links substantially identical, duplicate files to save file system space

NAME

freedup - find duplicate files and establish links between them while ignoring selectable tags

SYNOPSIS

freedup [OPTIONS] [<dir>]...

DESCRIPTION

Search all given file system trees for identical files and link them to the most frequently referenced inode or, if the inodes are equally referenced, to the inode of the file in the first file tree. If the devices differ, a symbolic link is used instead of a hard link. Symbolic links will not replace files when at least one of the directory trees does not start with a '/' (see -s).
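The linking rule above can be sketched for a single pair of files. This is only an illustration under stated assumptions, not freedup's implementation: link_if_identical is a hypothetical helper, GNU stat (with its -c format option) is assumed, and the choice of which file to keep is left to the caller instead of being derived from link counts.

```shell
#!/bin/sh
# Sketch only: link_if_identical is a hypothetical helper, not part of freedup.
# Assumes GNU stat (-c). The caller decides which file is kept.
link_if_identical() {
    keep=$1
    dup=$2
    # Only byte-identical files qualify for linking.
    cmp -s "$keep" "$dup" || return 1
    id_keep=$(stat -c '%d:%i' "$keep")
    id_dup=$(stat -c '%d:%i' "$dup")
    # Same device and inode: the files are already linked.
    [ "$id_keep" = "$id_dup" ] && return 0
    if [ "${id_keep%%:*}" = "${id_dup%%:*}" ]; then
        # Same device: replace the duplicate with a hard link.
        ln -f "$keep" "$dup"
    else
        # Different devices: fall back to a symbolic link.
        ln -sf "$keep" "$dup"
    fi
}
```

The symbolic link stores the value of keep literally, so a relative path is resolved relative to the directory of the new link and may dangle. This is the situation the description guards against for trees that do not start with a '/'.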
-a equivalent to -gup. It is provided for simple compatibility with freedups by William Stearns, where -a has the opposite meaning of -n in freedup.
-b Set the base directory, which is the current directory by default. Usually of interest when using freedup in scripts together with partial directory path names.
-c count file space savings per file. A message denotes the number of saved files.
-d requires the modification time stamps to be equal. Most frequently of interest for source files in combination with make. When used a second time (toggled to off) arbitrary time stamps are allowed.
-D <sec> requires the modification time stamps to be within the given range of seconds. Use -d to switch it off.
-e <env> reads a setup environment from the config file $HOME/.freedup.cfg and presets the variables to the values for that environment. If no such environment is registered, the variables are stored before execution. The directories are NOT stored. If you want to add them, do it manually as directory=<tree> and take care not to add unneeded trailing spaces.
-f requires the path-stripped file names to be equal. No real use for this except for paranoia and testing reasons.
-g requires groups of the files to be equal for linking. (see -p)
-h shows this help. All other options are then ignored.
-H Normally, when two or more files point to the same disk area they are not treated as duplicates and therefore not reported; this option will change this behaviour. Use it as in fdupes -rH.
-i activate interactive mode. You are prompted to decide for each file whether to delete or to link it. The list of files contains a number and a letter to reference each file. The number may be used to delete a file. Entering the corresponding letter links the file to the file referenced by the first given letter. The special characters '#', '@', '<', '>', '-', and '+' suggest linking all files to the first given, the first with the most links, the oldest, the newest, the smallest, or the biggest, respectively. The smallest and biggest reflect different sizes when using extra styles; otherwise they default to oldest and newest. In this case hard links on the same device and symbolic links for different devices are suggested. To ease your decisions, the file names are also preceded by the colon-separated device number, inode, permissions, and the link counter in square brackets. The interactive mode may be a replacement for fdupes -rd. The options -in make freedup behave like fdupes -r.
-k <key> sets the linking order, i.e. which file will replace the others. Depending on your options, this may influence timestamps, permissions, ownership, and, in case of more than two identical files, also which files are treated as linked and which are not. Allowed keys are the special characters from interactive mode.
-l Establish only those links that are hardlinks, do nothing if symlinks are required, e.g. caused by different file systems. Makes freedup do nothing in combination with -w.
-m <bytes>
deprecated: only link files larger than the amount of bytes given. Better performance will be achieved using -o "-size +<bytes>c".
-n do not execute the link command.
-o <opts>
pass an option string to the initially called find command. Be aware that the find command only applies to directory arguments, not to standard input. Since the option string usually contains spaces, take care to quote it correctly. Only the last given -o option will be used. This disables the internal tree scanning and replaces it with a pipe. The option string is appended to find . -type f. Use -o -true to enable the pipe instead of the internal routine.
-p requires file permissions to be equal. Security issues may be assumed. The author does not believe so, since the examined file is already present with the later permissions and linking only takes place with the same content (hash cheating will fail). On the other hand, this might be a way to propagate improper settings.
-P <mask> set the file permission mask that will be used when comparing permissions. The mask must be given as an octal number. The default is 7777. Setting the mask automatically forces -p on.
-q suppress any informational output (also that to stderr) except in case of severe errors.
-s generate symbolic links although some given paths are given without a leading slash. This might lead to broken links since it works well from the current directory but unlikely in all hierarchies. This only affects symbolic links, since the inode numbers of hard links are unique within each file system.
-t <type>
selects the hash method that will be used. Valid choices are sha512, sha384, sha256, sha224, sha1, md5, sum. If the type is incomplete the first matching type will be chosen automatically. If this option is not used and an internal hash method is compiled into the program, this will be used. Warnings and error messages are only available when running with -v.
-T when replacing a file by a link, the modification time of its directory changes. If you want to keep the time stamp of the directory as it was before linking, use this option. The access time is restored to the pre-linking value as well, but this might not last long when freedup continues to investigate the file tree.
-u requires users of the files to be equal for linking. (see -p)
-V show version and copyright.
-v runs in verbose mode, i.e. it displays shell commands to perform linking. Unlike the embedded commands, the displayed commands do not establish the link before deleting the former file.
-w Establish all links as weak symbolic links only. This alters the behaviour fundamentally. Please watch out when using this option in combination with -s. In consequence files may be hard to retrieve from different working directories.
-x <style>
selects the comparison style that will be used. Valid choices are auto, mp3, mp4, mpc, ogg or jpg. If the type is incomplete the first matching type will be chosen automatically. If this option is used only the internal hash method can be used, no external one. Please be aware, that this disables full comparison of the files and can result in loss of information.
-# <num> This option manages the way that hash sum evaluation influences the comparison algorithm.
0 (zero) means not to use the hash sum at all.
1 (one) enables "classic" hash sum usage, i.e. to evaluate or retrieve the hash sum before comparing two files. This mode is required for external hash functions.
2 (two) enables the new "on the fly" evaluation, which reduces the read overhead. [default since version 1.3-1]
other values are increased or lowered to match the above ones
Byte-by-byte comparison of all files cannot be disabled (only the extra mode limits it).
-0 Do not link files that have no content, i.e. whose file size is zero. This avoids large link clusters.
<dir> any directory to scan for duplicate files recursively. Recursion is active by default. Use -o to restrict the search to the initial file system (-xdev) or to a maximum tree depth (-maxdepth).
Many options (-cdfnpsuv) are implemented as toggle switches. All given options are processed before starting the program. The final state of each option applies.
Files in <dir> trees given later are linked to the files found in earlier ones. Since a sorting algorithm is applied, there is no use in adding one directory tree several times, unless certain additional options for find are provided.
With no <dir> tree given, a list of files (NOT dirs!) will be read from standard input. This is useful in conjunction with locate(1L) and find(1L). An example would be
find /usr/src -xdev -iname '*.h' -print | freedup -c
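Since file lists can be fed on standard input, and the deprecated -m <bytes> is documented as equivalent to the find test -size +<bytes>c, it may help to preview the candidate list before handing it to freedup. The following is a minimal sketch; list_candidates is a hypothetical helper name, not something freedup provides.

```shell
#!/bin/sh
# Sketch only: list_candidates is a hypothetical helper.
# Prints regular files below DIR that are larger than BYTES bytes,
# one path per line, in a form suitable for freedup's standard input.
list_candidates() {
    dir=$1
    bytes=$2
    find "$dir" -type f -size "+${bytes}c" -print
}
```

After inspecting the output, a command such as list_candidates /usr/src 1024 | freedup -n -c should report the possible savings without executing any link command, given the documented meaning of -n and -c.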

CONFIG FILE

The config file will be read from (and written to) $HOME/.freedup.cfg. The environment name is enclosed in brackets and is valid until the next environment is started. There are no sections without environment names. The keys and values are separated by = signs that may be surrounded by white space. The key words match the long option names without the leading minus signs. There are a few string options. Please note that for them trailing spaces will be assigned to the variable, too. Directory entries, one or more per environment, can only be added manually.
Config files can be generated automatically when using -e with an environment name that does not exist yet. All option settings are stored this way. Directory trees or file lists are not stored. This may also be used to copy environments using freedup. As an example, assume that the environment mp3 exists and the environment Music does not. When executing the following command
freedup -e mp3 -in -e Music . somedir
the environment mp3 will be copied to Music with the interact and noaction options toggled.
Environments are read at their position on the command line, i.e. they are likely to override all command line settings that have been made before. On the other hand, only settings that are present in the config file will be set; others remain unchanged. If each environment has only a few option entries, you may use multiple environments to combine different settings. Please be aware that this is not recommended due to the resulting complexity.
Environments are written after all settings are complete, i.e. only the last environment name will survive. Hence, there is no use in specifying more than one non-existing environment on one command line. And non-existing environments should not be followed by existing ones if you want to store the current settings.
Here is an example of a config file:
[freedups]
samegroup=1
sameuser=1
sameperm=1
[fdupes]
interact=1
noaction=1
nonzero=1
[mymusic]
basedir=/home/freedup
findoptions=-iname '*3'
interact=1
noaction=1
nonzero=1
extra=1
directory=/home/freedup/test
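To check what a given environment would set, the format described above can be read with a few lines of awk. This is a sketch under the stated rules only (bracketed environment names, = optionally surrounded by white space, trailing spaces kept in values); cfg_env is a hypothetical helper and not part of freedup.

```shell
#!/bin/sh
# Sketch only: cfg_env is a hypothetical helper.
# Usage: cfg_env FILE ENV
# Prints key=value for every entry of environment ENV in FILE.
cfg_env() {
    awk -v env="$2" '
        # A bracketed line starts a new environment.
        /^\[.*\]$/ { cur = substr($0, 2, length($0) - 2); next }
        cur == env && index($0, "=") {
            pos = index($0, "=")
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # white space around the key
            sub(/^[ \t]+/, "", val)            # leading only: trailing spaces are kept
            print key "=" val
        }
    ' "$1"
}
```

For example, cfg_env "$HOME/.freedup.cfg" mymusic lists the entries of the mymusic environment from the sample file above.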

OPERATING SYSTEMS

freedup is developed in POSIX-compliant C under Linux/AMD and tested with Linux/Intel, Cygwin/WindowsXP, AIX 4.3.3 and AIX 5.3. The sources contain a full test suite to check for correct execution. Beyond those tests, which you may easily verify, there is no warranty (or similar) that the program will behave as you or I expect. Please always try to use the newest release from http://freedup.org/ and keep the author informed of severe bugs, since he uses the program frequently.

COLLATERAL

freedup concentrates on providing an interface to reclaim space by replacing files with links to those with identical content. However, similar tools provide additional services, which are easy to achieve with single command lines.
An excess mode would list all files but one. Instead of providing one more option, you are kindly asked to use this command (with care):
freedup -in . | awk '{if(NF!=0)print x;x=$0}' | xargs rm
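The awk filter in this pipeline can be tried on its own before anything is removed. The sketch below assumes that the listing consists of groups of file names, one per line, separated by empty lines (as fdupes -r prints them); all_but_last is a hypothetical helper name, and the sed step only drops the empty lines that the awk program emits at the start of each group.

```shell
#!/bin/sh
# Sketch only: all_but_last is a hypothetical helper.
# Reads groups of file names separated by empty lines and prints
# every name except the last one of each group.
all_but_last() {
    awk '{ if (NF != 0) print x; x = $0 }' | sed '/^$/d'
}
```

Inspect the output first and pipe it to xargs rm only afterwards. Note that plain xargs splits on white space, so file names containing spaces would need separate handling.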
When working with Windows, there seem to be difficulties with retrieving linked files. In case you are looking for linked files you may want to use:
find . -type f -noleaf -links +1 -printf "%n %i %f %h\n" | sort | less
Using -type l instead of -links +1 with find retrieves symbolic links instead of hard links. You can therefore replace symbolic links by the files they reference using:
find test -type l -exec cp {} {}.tmp$$ \; -exec mv {}.tmp$$ {} \;
Assuming you have checked the list of empty files you want to delete, this command completes the task:
find ./ -type f -empty -print0 | xargs -0 rm
With -type d and rmdir in place of rm you can delete empty directories instead.

REPORTING BUGS

Report bugs to <>.

COPYRIGHT

Copyright © 2007 Andreas Neuper
FreeDup is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
FreeDup is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with FreeDup. If not, see <http://www.gnu.org/licenses/>.

FILES

$HOME/.freedup.cfg

SEE ALSO

qsort(3) , ln(1) , find(1L) , locate(1L)
The best documentation for freedup is maintained within the source code.
