sync - merge - Dokuwiki file structure
Hi all!
I want to sync my DokuWiki with Unison. Just for the record: I've got a local DokuWiki instance at home (Linux only, web server nginx) as a personal wiki and another instance on my USB stick (used with different operating systems, web server nginx). They won't be altered at the same time (unfortunately I can't clone myself), so I don't have to be too careful about conflicting versions...
What I intend to do is sync the wiki at home with Unison when I plug the USB stick in, triggered by a udev rule. For removing the stick, I intend to have a simple desktop shortcut that triggers the sync script and umount. In this scenario, the sync plugin isn't what I want, because I'm usually not logged in to the wiki as administrator and the whole thing seems a little cumbersome to me... Anyway: I have had a look at the sync plugin.
By now, the script that is triggered by udev works fine and looks as follows:
(By the way: I included some cleanup work, as found on the DokuWiki wiki...)
----
#!/bin/bash
# When called by udev, no environment variables are present! Every command has
# to be preceded by the full path!
cleanup() {
# $1 ... full path to data directory of wiki
# $2 ... number of days after which old files are to be removed
# purge files older than $2 days from the attic (old revisions)
# find "$1"/attic/ -type f -mtime +$2 -print0 | xargs -0r rm -f
# remove stale lock files (files which are 1-2 days old)
find "$1"/locks/ -name '*.lock' -type f -mtime +1 -print0 | xargs -0r rm -f
# remove empty directories
find "$1"/pages/ -depth -type d -empty -print0 | xargs -0r rmdir
# remove files older than $2 days from the cache
find "$1"/cache/?/ -type f -mtime +$2 -print0 | xargs -0r rm -f
}
# set up variables...
HOME_DIR=/home/username
DEVICE_PATH=/media/mountpoint
SYNC_DIR_HOME=${HOME_DIR}/DokuWiki
SYNC_DIR_DEVICE=${DEVICE_PATH}/wwwroot
ATTACH_LOG=${HOME_DIR}/dokuwiki-attach.log
UNISON_STDOUT=${HOME_DIR}/dokuwiki-attach-unison.stdout
UNISON_STDERR=${HOME_DIR}/dokuwiki-attach-unison.stderr
USER=username
GROUP=username
# backup old logfile
if [ -f $ATTACH_LOG ]
then
/bin/cp $ATTACH_LOG ${ATTACH_LOG}.bak
fi
# remove old logfile anyways...
/bin/rm -f $ATTACH_LOG
# create new logfile
echo -e $(/bin/date) >> $ATTACH_LOG
echo >> $ATTACH_LOG
# mount USB device => IMPORTANT TO DO THAT NOW!!!
/bin/mount $DEVICE_PATH
# Apparently udev mounts the volume only AFTER this script has run, no matter
# what order the numeric rule prefixes specify... :-?
# This could, however, also be done via another udev rule... :-/
# cleanup DokuWiki installations: (path to datadir, number of days)
echo -ne "Clean up local DokuWiki installation..." >> $ATTACH_LOG
cleanup ${SYNC_DIR_HOME}/data 14 && echo " done." >> $ATTACH_LOG || echo " FAILED." >> $ATTACH_LOG
echo -ne "Clean up USB DokuWiki installation..." >> $ATTACH_LOG
cleanup ${SYNC_DIR_DEVICE}/data 14 && echo " done." >> $ATTACH_LOG || echo " FAILED." >> $ATTACH_LOG
# set environment variable $HOME => NEEDED BY UNISON!!!
export HOME=$HOME_DIR
# Unison run
echo -ne "Synchronizing USB and local wiki..." >> $ATTACH_LOG
/usr/local/bin/unison $SYNC_DIR_HOME $SYNC_DIR_DEVICE \
-auto -batch -silent -log -fat -ui 'text' \
-ignore 'Path data/cache' \
-ignore 'Path data/locks' \
-backuploc 'local' -maxbackups '5' -backup 'Name *' \
-merge 'Name *.changes -> sort -u CURRENT1 CURRENT2 > NEW' \
> $UNISON_STDOUT 2> $UNISON_STDERR
case "$?" in
"0") echo " done." >> $ATTACH_LOG ;;
*) echo " FAILED!" >> $ATTACH_LOG ;;
esac
echo >> $ATTACH_LOG
# Ownership & Permissions
echo -ne "Changing ownership of .unison-files (probably been changed by udev-invoked unison)..." >> $ATTACH_LOG
# use the absolute path instead of cd, so a missing directory cannot send chown elsewhere
find ${HOME_DIR}/.unison -type f -print0 | xargs -0r chown $USER:$GROUP && echo " done." >> $ATTACH_LOG || echo " FAILED." >> $ATTACH_LOG
echo >> $ATTACH_LOG
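# abort if the wiki directory is unreachable; the relative chmod/chown below must not run elsewhere
cd $SYNC_DIR_HOME || { echo "Cannot enter $SYNC_DIR_HOME, aborting." >> $ATTACH_LOG; exit 1; }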
echo -e "Setting up local permissions..." >> $ATTACH_LOG
echo -ne "\tFiles..." >> $ATTACH_LOG
find . -type f -print0 | xargs -0 chmod 0660 && echo " done." >> $ATTACH_LOG || echo " FAILED." >> $ATTACH_LOG
echo -ne "\tDirectories..." >> $ATTACH_LOG
find . -type d -print0 | xargs -0 chmod 0770 && echo " done." >> $ATTACH_LOG || echo " FAILED." >> $ATTACH_LOG
echo -e "Setting up local ownership..." >> $ATTACH_LOG
echo -ne "\tFiles..." >> $ATTACH_LOG
find . -type f -print0 | xargs -0 chown $USER:www-data && echo " done." >> $ATTACH_LOG || echo " FAILED." >> $ATTACH_LOG
echo -ne "\tDirectories..." >> $ATTACH_LOG
find . -type d -print0 | xargs -0 chown $USER:www-data && echo " done." >> $ATTACH_LOG || echo " FAILED." >> $ATTACH_LOG
cd $HOME_DIR
chown $USER:$GROUP $ATTACH_LOG ${ATTACH_LOG}.bak $UNISON_STDOUT $UNISON_STDERR
exit 0
----
The script does what I expected. I have to admit that I mount my USB stick via /etc/fstab, which isn't good style (udev would be better), but that's not the focus here...
Currently, I "merge" only the *.changes files, because they are the only ones I seem to understand. But there are other files that may need to be merged as well, so here are my notes on the files:
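For the removal side (the desktop shortcut mentioned at the top), I imagine something along these lines. It is only a rough sketch: the function name detach_wiki and the SYNC_CMD/UMOUNT_CMD variables are made up for illustration, the Unison options are simply the ones from the script above, and I haven't run it against the real stick yet.

```bash
#!/bin/bash
# Hypothetical detach helper: sync one last time, then unmount the stick.
# SYNC_CMD and UMOUNT_CMD are overridable only so the flow can be dry-run.
SYNC_CMD=${SYNC_CMD:-/usr/local/bin/unison}
UMOUNT_CMD=${UMOUNT_CMD:-/bin/umount}

detach_wiki() {
    # $1 ... local wiki directory
    # $2 ... wiki directory on the stick
    # $3 ... mount point of the stick
    local home_wiki=$1 device_wiki=$2 mountpoint=$3

    # Do not unmount if the sync failed, so no changes get stranded.
    if ! "$SYNC_CMD" "$home_wiki" "$device_wiki" \
            -auto -batch -silent -fat -ui text; then
        echo "Sync failed, leaving $mountpoint mounted." >&2
        return 1
    fi

    "$UMOUNT_CMD" "$mountpoint"
}
```

The shortcut would then call something like detach_wiki /home/username/DokuWiki /media/mountpoint/wwwroot /media/mountpoint. Refusing to unmount after a failed sync is a deliberate choice here; whether that is the right behaviour is debatable.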
*.changes
- meaning: changelog of pages
- lines start with an identifier (probably time-based and unique) => sortable!
- merge command: (merge, sort, remove duplicates)
-merge 'Name *.changes -> sort -u CURRENT1 CURRENT2 > NEW' \
- nice thing about that: I only use standard Unix command-line tools
*.idx
- meaning: No idea.
- Do they have to be merged, or is "prefer newer" enough?
- Or should I ignore these files and run an index rebuild script after the sync instead?
*.changes.trimmed
- meaning: No idea.
- So far I've only found a few of these (not one per page!), and they are all empty...?
*.indexed
- meaning: No idea.
- All the files I examined contained just the number "2"...?
- Is "prefer newer" sufficient?
*.txt
- meaning: current page files
- Is there already an identical version in the attic? Does DokuWiki create an attic archive file of the current page state each time a page is saved?
=> yes => "prefer newer"; older versions get imported anyway as attic archive files and via the corresponding *.changes files.
=> no => "prefer newer", but manually store the older version in the attic (Uaaahhh...!!)
- convention of names?
- modify page changelog?
*.meta
- I've read the documentation on metadata, but it's not that clear to me...
- If for the *.txt files one should just do "prefer newer", then this would apply here, too (wouldn't it?)?
- What if not?
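Coming back to the *.changes entry in these notes, here is a small stand-alone check of the "merge, sort, remove duplicates" idea using nothing but sort -u. The sample lines are invented (a numeric identifier first, then tab-separated fields); I'm only assuming that real changelog lines sort sensibly by their leading identifier. The function name merge_changes is made up as well.

```bash
#!/bin/bash
# Merge two versions of a changelog: concatenate, sort and drop duplicate
# lines in one step. LC_ALL=C forces plain byte-order sorting.
merge_changes() {
    # $1, $2 ... the two replicas' versions of the file
    # $3 ....... merged result
    LC_ALL=C sort -u "$1" "$2" > "$3"
}

# Invented sample data, NOT real wiki changelog lines.
tmp=$(mktemp -d)
printf '1300000000\t127.0.0.1\tC\tstart\n1300000100\t127.0.0.1\tE\tstart\n' \
    > "$tmp/home.changes"
printf '1300000000\t127.0.0.1\tC\tstart\n1300000200\t127.0.0.1\tE\tstart\n' \
    > "$tmp/usb.changes"

merge_changes "$tmp/home.changes" "$tmp/usb.changes" "$tmp/merged.changes"
cat "$tmp/merged.changes"
```

The line that both files share should show up only once in the result, and the two diverging edits should both survive, ordered by their leading identifier.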
I think this is a good start for a sync script. As you can see, I just need to understand DokuWiki's file structure a little better: which file serves which purpose and how each one is built. I couldn't find anything about this in the FAQ/Manual/Wiki, so can anybody explain the files, their meaning and content to me, or maybe just point me in the right direction?
Thanks a lot in advance
Wolle