The console always helps. :cool:
First, list all the media files by accessing your DokuWiki's
media directory:
[user@host] $ cd $PATH_TO_DOKUWIKI/data
[user@host] $ cd media
[user@host] $ find -not -type d | cut -c 2- | tr '/' ':' > /tmp/mediafiles.txt
[user@host] $ cd ..
This creates /tmp/mediafiles.txt, a text file listing all the media files, with slashes (directories) converted to colons (namespaces).
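To make the conversion concrete, here is a small sketch of what the `cut` and `tr` stages do to a single line of `find` output. The helper name `to_wikipath` and the sample path are made up for illustration.

```shell
# to_wikipath (hypothetical helper): the same cut/tr stages as the command above.
# cut drops the leading "." that find prints, tr turns slashes into colons.
to_wikipath() {
    cut -c 2- | tr '/' ':'
}

# Hypothetical sample path, as find prints it from inside data/media
printf '%s\n' './wiki/logo.png' | to_wikipath
# prints :wiki:logo.png
```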
Now, find all text files in the pages directory, and list all text patterns of the form {{:mediafile[...]}} (the leading colon is required so that external links are skipped).
[user@host] $ cd pages
[user@host] $ find -type f -name '*.txt' -print0 | xargs -0 grep -P -oh "\{\{[.]?\:.+?\..{3}(\|.+?)?\}\}" | sed -e 's/{{\./{{/' -e 's/|[^}]*//g' -e 's/[{{|}}]//g' > /tmp/mediareferences.txt
[user@host] $ cd ..
This creates /tmp/mediareferences.txt, a text file containing all the media file invocations, stripped of their markup. It requires that the media references begin with a colon (or a period), as absolute links do, but it should work for most media references in a wiki.
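As a rough illustration of the stripping step, the three sed expressions can be tried on their own against two made-up references. The helper name `strip_markup` and the sample references are assumptions for the demo; the sed expressions are the ones from the command above.

```shell
# strip_markup (hypothetical helper): the three sed expressions from the command above.
#  1. turn a leading "{{." into "{{"  (relative reference treated as absolute)
#  2. drop the "|caption" part
#  3. drop the remaining braces
strip_markup() {
    sed -e 's/{{\./{{/' -e 's/|[^}]*//g' -e 's/[{{|}}]//g'
}

# Hypothetical sample references, one per line, as grep -o would emit them
printf '%s\n' '{{:wiki:logo.png|Logo}}' '{{.:local.jpg}}' | strip_markup
# prints:
# :wiki:logo.png
# :local.jpg
```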
Now the only thing remaining is to find all files listed in /tmp/mediafiles.txt that do not appear in /tmp/mediareferences.txt:
[user@host] $ grep -v -F -f /tmp/mediareferences.txt /tmp/mediafiles.txt > orphanedmedia.txt
Voilà.
orphanedmedia.txt contains the wikipaths of all the media files that are never invoked. It should be possible to assemble this as a maintenance script...
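One possible way to bundle the three steps is a single shell function. This is only a sketch under a few assumptions: the function name `find_orphaned_media` is made up, the PCRE pattern is swapped for a simpler extended regex (so `grep -P` is not needed), spaces after `{{` and `?size` parameters are stripped before comparing, and `-x` is added so that only whole wikipaths count as a match.

```shell
# find_orphaned_media DATA_DIR (hypothetical name)
# Prints the wikipaths of media files that no page references.
find_orphaned_media() {
    data_dir=$1
    files=$(mktemp) || return 1
    refs=$(mktemp) || return 1

    # Step 1: every media file as a wikipath, e.g. ./ns/pic.png -> :ns:pic.png
    (cd "$data_dir/media" && find . -type f | cut -c 2- | tr '/' ':') > "$files"

    # Step 2: every {{:...}} or {{.:...}} reference in the pages, cut off at
    # the first "|" or "}", then reduced to its wikipath: drop the braces,
    # the leading spaces and dot, any ?parameters and trailing spaces.
    (cd "$data_dir/pages" && find . -type f -name '*.txt' -exec \
        grep -ohE '\{\{ *\.?:[^|}]+' {} +) |
        sed -e 's/^{{ *\.\{0,1\}//' -e 's/?.*$//' -e 's/[[:space:]]*$//' > "$refs"

    # Step 3: media files whose wikipath equals no reference.
    grep -v -x -F -f "$refs" "$files"

    rm -f "$files" "$refs"
}
```

It could then be called as `find_orphaned_media "$PATH_TO_DOKUWIKI/data" > orphanedmedia.txt`. The same limits apply as for the manual steps: references that do not start with a colon or a period are not seen.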
This is not 100% safe (see above), but it should locate most orphaned files if media references are always inserted through the media manager. Also note that I'm no Bash master; I just tinkered with the tools until it worked.