Plugin for detecting orphan media files
ramfree17 #1
Member since Apr 2009 · 3 posts · Location: Philippines
Group memberships: Members
Subject: Plugin for detecting orphan media files
Our wiki has some media files that are probably no longer referenced anywhere in the wiki. Is there a tool that will help identify these files? If not, please treat this as a request for a new plugin.

thanks!
onn #2
Member since Apr 2009 · 10 posts · Location: Warsaw, Poland
Group memberships: Members
I have the same problem :) Any help would be appreciated :)
Jeroen0611 #3
Member since Jul 2010 · 6 posts
Group memberships: Members
Have this problem too. Still no solution? Anyone?
J.
ryan.chappelle #4
User title: Chilean DW Fan
Member since May 2008 · 218 posts · Location: Temuco, Chile
Group memberships: Local Moderators, Members, Newsletter Team
Subject: Locating orphan (media) files
The console always helps.  :cool:

First, list all the media files by accessing your DokuWiki's media directory:

[user@host] $ cd $PATH_TO_DOKUWIKI/data
[user@host] $ cd media
[user@host] $ find . -not -type d | cut -c 2- | tr '/' ':' > /tmp/mediafiles.txt
[user@host] $ cd ..

This creates /tmp/mediafiles.txt, a text file listing all the media files, with slashes (directories) converted to colons (namespaces).
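
To make the conversion concrete, here is a minimal sketch of what the cut/tr part does to a single path. The function name path_to_media_id and the sample file name are only illustrations, not part of DokuWiki:

```shell
# Hypothetical helper: turn a path as printed by `find` (relative to
# data/media) into a wiki-style media ID.
path_to_media_id() {
    # cut -c 2- drops the leading "." that find prints;
    # tr turns each directory separator into a namespace colon.
    printf '%s\n' "$1" | cut -c 2- | tr '/' ':'
}

path_to_media_id './wiki/dokuwiki-128.png'
# prints :wiki:dokuwiki-128.png
```

Every entry in the list therefore starts with a colon, like an absolute media link.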

Now, find all text files in the pages directory and extract every pattern of the form {{:mediafile[...]}} (the leading colon is required so that external links are skipped).

[user@host] $ cd pages
[user@host] $ find . -type f | xargs grep -P -oh "\{\{[.]?\:.+?\..{3}(\|.+?)?\}\}" | sed -e 's/{{\./{{/' -e 's/|[^}]*//g' -e 's/[{|}]//g' > /tmp/mediareferences.txt
[user@host] $ cd ..

This creates /tmp/mediareferences.txt, a text file containing all the media file references, stripped of their wiki markup. It requires that the media references begin with a colon (or a period), as if they were absolute links, and that the file extension has three characters, but it should work for most media references in a wiki.

Now the only thing remaining is to find all files listed in /tmp/mediafiles.txt that do not appear in /tmp/mediareferences.txt:

[user@host] $ grep -v -F -f /tmp/mediareferences.txt /tmp/mediafiles.txt > orphanedmedia.txt

Voilà: orphanedmedia.txt contains the wiki paths of all the media files that are never referenced. It should be possible to assemble this as a maintenance script...

Not 100% safe (see above) but should locate most orphan files if media references are always inserted through the media manager. Also note I'm not a Bash master or something, just worked out some tools until it worked.
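
For what it's worth, the three steps can be put together roughly like this. It is only a sketch, not a tested maintenance tool: find_orphan_media is a made-up name, the argument is assumed to be the DokuWiki data directory, and GNU find, xargs and grep (for -P) are assumed. The reference list is used as the pattern file and the media list as the input, and the caption part of the pattern is matched lazily so that two references on one line stay separate.

```shell
# Hypothetical function: print the wiki IDs of media files that no page
# references. $1 is assumed to be the DokuWiki data directory.
find_orphan_media() {
    data_dir=$1
    files=$(mktemp) || return 1
    refs=$(mktemp) || return 1

    # Step 1: every media file as a wiki ID (":ns:file.ext").
    (cd "$data_dir/media" && find . -type f | cut -c 2- | tr '/' ':') > "$files"

    # Step 2: every {{:file.ext}} or {{:file.ext|caption}} found in the
    # pages, reduced to the bare ID.
    (cd "$data_dir/pages" && find . -type f -print0 |
        xargs -0 grep -P -oh '\{\{[.]?:.+?\..{3}(\|.+?)?\}\}' |
        sed -e 's/{{\./{{/' -e 's/|[^}]*//g' -e 's/[{|}]//g') > "$refs"

    # Step 3: media IDs that contain none of the references.
    if [ -s "$refs" ]; then
        grep -v -F -f "$refs" "$files"
    else
        # No references at all: every media file counts as an orphan.
        cat "$files"
    fi
    rm -f "$files" "$refs"
}
```

Something like find_orphan_media "$PATH_TO_DOKUWIKI/data" > orphanedmedia.txt would then write the list. Because -F matches substrings, a file is kept whenever any reference occurs anywhere in its ID, so the result errs on the side of not reporting a file. The limits mentioned above still apply.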
Chilean DW Fan!
my plugins for DokuWiki
GULIX, my area's LUG
Surviving earthquakes since Feb 2010!
turnermm (Moderator) #5
Member since Oct 2009 · 4544 posts · Location: Canada
Group memberships: Global Moderators, Members, Super Mods
Congrats, Ryan.   A masterful piece of unix magic!
Myron Turner
github: https://github.com/turnermm
plugins, templates: http://www.mturner.org/devel
gymnophoria #6
Member since Jan 2010 · 34 posts · Location: London, UK
Group memberships: Members
Nice, but this is no help for Windows users :) A plugin would be useful.