bulk import
dglp #1
Member since Feb 2012 · 7 posts
Group memberships: Members
Subject: bulk import
I'm just getting started with DokuWiki, and am unclear about how to create pages from existing documents, such as the folder full of .txt files I have on my hard drive. Do I need to create a wiki elsewhere and import from that format? Do I use the media manager to upload a bunch of files? Do I create an SQL query?

I've read a few pages of documentation - how to create pages, that sort of thing - but don't see anything about bringing in existing data and tree structures.

Can you give me a pointer to the info I need?

HansBKK #2
Member since Nov 2011 · 104 posts · Location: Bangkok
Group memberships: Members
It's easiest if you have direct access to the DW's filesystem, and if you're going to muck around there directly as much as I do, I suggest keeping it under version control.

Create a couple of test pages and namespaces and you'll see where DW puts them. Just put your txt files in the same place with appropriate file and folder names; when you navigate to them the first time, DW will create indexing metafiles so the next time will be faster. You can use a spidering tool as well, or use the rebuild index command.
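To see the mapping for yourself once you've made a few test pages, you can list every .txt file under the pages folder and turn its path back into a page ID, with folders becoming colon-separated namespaces. A minimal sketch, assuming the default data/pages layout; pages_to_ids is just a name I made up:

```shell
#!/bin/sh
# List every page under a DokuWiki pages directory as a page ID.
# Assumes the default layout: ns1/ns2/page.txt  ->  ns1:ns2:page
pages_to_ids() (
    pages_dir=${1:-data/pages}
    cd "$pages_dir" || exit 1
    # Strip the leading "./" and trailing ".txt", then turn "/" into ":"
    find . -type f -name '*.txt' |
        sed -e 's|^\./||' -e 's|\.txt$||' -e 's|/|:|g' |
        sort
)
```

For example, running it against a pages folder containing wiki/syntax.txt prints `wiki:syntax` for that file.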

Sorry I don't have access at the moment to give you more details, hope this gets you started.
dglp #3
Member since Feb 2012 · 7 posts
Group memberships: Members
That sounds good!

At the moment, the few pages I've created are in /public_html/dokuwiki/data/pages, so it sounds like I can upload files there.

Whilst the page-by-page navigation sounds a bit piecemeal, it will work as I update a given page.

I haven't come across the rebuild index command - is there a description of it somewhere?

Will report how I get on once I've tried it!

HansBKK #4
Member since Nov 2011 · 104 posts · Location: Bangkok
Group memberships: Members
Here is the full contents of my notes on this topic as I was learning about DW - some might be a bit outdated by now. Googling will help with further details, or post more specific questions here if you like.

Note that much of this applies to a specific use case, where I completely wipe and regenerate my DW's content with outside tools. This assumes that all the meta-data is expendable, so if it's not in your case you will not be able to use that workflow.

filesystem locations and meanings

Don't rely on DW's built-in version control - possible to turn it off? (asked in the forum)

Do frequent commits to SVN

HTTrack as a spidering tool

rebuild the searchindex

Erase entire cache and/or old wiki revisions

flat files data storage

Datafiles are stored in plain-txt, so:
    can easily be read, modified, compared, and auto-generated via external (non-wiki) programs - owner must be www-data, or set permissions to 666?

    are readable even if your server goes down
    easy to back up, through server scripts or FTP/sFTP
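If DokuWiki can't write to files you've copied in (for example, to save an edit), it's usually ownership or permissions. Instead of a blanket 666, one option is group-writable modes; the 664/775 values here are my assumption, and they only help if the web server's group (www-data on Debian-style hosts) owns the tree. fix_wiki_perms is a hypothetical helper:

```shell
#!/bin/sh
# Make files/dirs under a DokuWiki data subfolder group-writable.
# Assumes the web server's group already owns the tree; changing
# ownership itself needs root, e.g.: chown -R www-data:www-data DIR
fix_wiki_perms() (
    dir=${1:?usage: fix_wiki_perms DIR}
    find "$dir" -type d -exec chmod 775 {} +
    find "$dir" -type f -exec chmod 664 {} +
)
```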

moving from staging/development to production/publishing server

copy the files inside conf/ and data/ to your new DokuWiki installation

> Just copy the files over. DokuWiki will automatically build the search and meta indexes.

Backups are really simple. All you need to do is copy the files. The easiest way is just to back up your whole DokuWiki directory.

If you want to save some space, you should at least back up the following directories:

Remember to back up all raw text files and data.
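Because a backup is just a copy of conf/ and data/, a dated tarball of those two folders does the job. A sketch, assuming you point it at the DokuWiki base directory; backup_wiki is a hypothetical name:

```shell
#!/bin/sh
# Write a dated .tar.gz of conf/ and data/ from a DokuWiki base directory
# into OUT_DIR (default: current directory) and print the archive path.
backup_wiki() (
    base=${1:?usage: backup_wiki BASE_DIR [OUT_DIR]}
    # Resolve OUT_DIR to an absolute path before tar changes directory
    out_dir=$(cd "${2:-.}" && pwd) || exit 1
    archive="$out_dir/dokuwiki-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -C "$base" -czf "$archive" conf data
    echo "$archive"
)
```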

Renaming/moving .txt files in the filesystem, or .gz files in the attic,
  will break links and metadata and prevent DokuWiki from keeping a coherent page history (old revisions).

Via FTP using wget

FTP-Login stored in $HOME/.netrc

machine HOST login USER password PW

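#!/bin/sh -e
# backup data from wiki (FTP); the ftp.example.com URLs are placeholders
wget -q --no-cache -nH -c -t0 --mirror -P"${backup:-./wiki-backup}" -i- <<EOF
ftp://ftp.example.com/dokuwiki/data/
ftp://ftp.example.com/dokuwiki/conf/
EOF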

cache, locks, and index - no problem
  meta and attic if you're willing to lose those
don't delete the folders themselves

  Note the empty files named ‘_dummy’: they're empty at installation time, but must still exist.

To start from a clean slate / blank state, do the following (again under Unix), but be warned that this will mean losing all your historical information (i.e. recent changes):

    cat /dev/null > /dokuwiki_base/data/changes.log
    rm -Rf /dokuwiki_base/data/attic/*
    rm -Rf /dokuwiki_base/data/cache/?
    rm -Rf /dokuwiki_base/data/cache/*.idx
    rm -Rf /dokuwiki_base/data/cache/purgefile
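The same clean-slate steps can be wrapped in a function that takes the base directory as an argument and refuses to run unless data/pages exists there. A sketch: the sanity check and recreating attic/_dummy (which attic/* would otherwise delete, despite the note above) are my additions; the deletions themselves are still irreversible:

```shell
#!/bin/sh
# Reset DokuWiki history/cache state under BASE (irreversible!).
# Mirrors the commands above; refuses to run unless BASE/data/pages exists.
reset_wiki_history() (
    base=${1:?usage: reset_wiki_history BASE_DIR}
    if [ ! -d "$base/data/pages" ]; then
        echo "no data/pages under $base" >&2
        exit 1
    fi
    cat /dev/null > "$base/data/changes.log"
    rm -Rf "$base"/data/attic/*
    touch "$base/data/attic/_dummy"   # placeholder that attic/* also matched
    rm -Rf "$base"/data/cache/?
    rm -Rf "$base"/data/cache/*.idx
    rm -Rf "$base"/data/cache/purgefile
)
```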

Problem with full-text search: it doesn't index short words (3 characters or less)
  Do not create pages with short names, including names made up of short words, like this: is_not_OK
  Backlinks search uses the full-text index, so short page names don't show up!
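To check a batch of files for this before importing, you could flag any page whose file name contains a word of 3 characters or less, splitting names on underscores. A sketch; the 3-character threshold is taken from the note above, and the indexer in your DokuWiki version may use a different cut-off:

```shell
#!/bin/sh
# Print .txt pages whose file name contains a word of <= 3 characters,
# splitting names on "_" (e.g. is_not_ok.txt -> is, not, ok).
find_short_names() (
    pages_dir=${1:-data/pages}
    find "$pages_dir" -type f -name '*.txt' | while IFS= read -r f; do
        name=$(basename "$f" .txt)
        # awk exits 0 if any "_"-separated word is 3 characters or shorter
        if printf '%s\n' "$name" |
            awk -F_ '{ for (i = 1; i <= NF; i++) if (length($i) <= 3) exit 0; exit 1 }'
        then
            printf '%s\n' "$f"
        fi
    done
)
```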

alternative ways to create pages ("standard" way being to create a link first)

enter the name of a new page directly in the search field
  Create this page

create a .txt file inside the directory (namespace) wherein you want it to reside. The name of the file is the name of the page.

type the new page's URL directly into your browser

plugins tagged with 'create'

NOTE: Make sure you properly link your newly created page from other pages, when using one of the alternative methods. Otherwise no one may find your page. (Though these pages may still be found by the index feature or through the search engine.)
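For the create-a-.txt-file method, a page ID maps onto the filesystem with each namespace as a folder. A minimal sketch, assuming the ID is already in DokuWiki's lowercase form and the default data/pages location; create_page is hypothetical:

```shell
#!/bin/sh
# Create a page file for a DokuWiki page ID such as "projects:alpha:notes".
# Namespaces become directories: PAGES_DIR/projects/alpha/notes.txt
create_page() (
    pages_dir=${1:?usage: create_page PAGES_DIR PAGE_ID [TEXT]}
    page_id=${2:?usage: create_page PAGES_DIR PAGE_ID [TEXT]}
    text=${3:-"====== $page_id ======"}
    path="$pages_dir/$(printf '%s' "$page_id" | tr ':' '/').txt"
    if [ -e "$path" ]; then
        echo "exists: $path" >&2
        exit 1
    fi
    mkdir -p "$(dirname "$path")"
    printf '%s\n' "$text" > "$path"
    echo "$path"
)
```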

Renaming pages

  Create a page with the new name, and copy the content. Instead of erasing the old page, it is better to replace the content by:
    This page has been moved to [[new_name]]
  Get a list of backlinks by clicking on the page name appearing at the top
    plugin for embedding list in page:

Just doing it in the filesystem will break meta information
  [Q?] Test, minimize impact - and history (/attic, no problem)
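If you do the copy-and-stub rename from the shell anyway, it looks roughly like this. A sketch with a hypothetical move_page helper, assuming the default layout; as noted above, the new page starts without the old one's metadata and history, and backlinks still point at the old name. The leading colon in the stub link makes it an absolute ID from the root namespace:

```shell
#!/bin/sh
# Copy OLD_ID's text to NEW_ID and replace OLD_ID with a "moved" notice.
# Assumes IDs are already in DokuWiki's lowercase form.
move_page() (
    pages_dir=${1:?usage: move_page PAGES_DIR OLD_ID NEW_ID}
    old_id=${2:?usage: move_page PAGES_DIR OLD_ID NEW_ID}
    new_id=${3:?usage: move_page PAGES_DIR OLD_ID NEW_ID}
    old="$pages_dir/$(printf '%s' "$old_id" | tr ':' '/').txt"
    new="$pages_dir/$(printf '%s' "$new_id" | tr ':' '/').txt"
    if [ ! -f "$old" ]; then echo "missing: $old" >&2; exit 1; fi
    if [ -e "$new" ]; then echo "exists: $new" >&2; exit 1; fi
    mkdir -p "$(dirname "$new")"
    cp "$old" "$new"
    printf 'This page has been moved to [[:%s]]\n' "$new_id" > "$old"
)
```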

Helper Tools for renaming pages
  seems more robust than

but neither fixes back-links to the old address to point to the new one. Editx leaves a redirect page in place.

check out this for auto-redirect
  not sure if this works with newer versions
      but it will allow redirect to specific anchor/section!

and you can edit the backlink source pages manually over time

  bash script for handling inline backlinks

    find . -type f -name "*.txt" -exec sed -i 's/\[\[namespace1:namespace_old:/[[namespace1:namespace-new:/g' {} +
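Before running the in-place sed, it's worth listing which pages it would touch. A sketch, reusing the same placeholder namespace names; it relies on grep's -r and --include options, which GNU grep supports:

```shell
#!/bin/sh
# List .txt pages containing links into an old namespace, e.g.
#   find_links_to data/pages 'namespace1:namespace_old:'
find_links_to() (
    pages_dir=${1:?usage: find_links_to PAGES_DIR ID_PREFIX}
    prefix=${2:?usage: find_links_to PAGES_DIR ID_PREFIX}
    # -F: match "[[" + prefix as a fixed string, so no regex escaping needed
    grep -rlF --include='*.txt' -- "[[$prefix" "$pages_dir"
)
```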

other examples here:
  you will lose your page revision history this way

Old Revisions, version control


$conf['mediadir'] = 'data/pages';
$conf['mediaweb'] = 'data/pages';
  allows ftp sync to the pages folder?

Remove empty directories **under** these (not e.g. "attic" itself):

   e.g. find "$1"/{attic,cache,index,locks,media,meta,pages,tmp}/ \
          -mindepth 1 -type d -empty -print0 | xargs -0r rmdir
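One catch with the find | xargs rmdir version: a directory that only contains empty subdirectories isn't empty yet when find tests it, so nested empties take several runs. find's -delete works depth-first (children before parents), so one pass handles the nesting. A sketch in plain sh (no bash brace expansion), with the same folder list; prune_empty_dirs is a made-up name:

```shell
#!/bin/sh
# Remove empty directories *below* the standard DokuWiki data folders,
# including nested ones, without removing the top-level folders themselves.
prune_empty_dirs() (
    data_dir=${1:?usage: prune_empty_dirs DATA_DIR}
    for sub in attic cache index locks media meta pages tmp; do
        [ -d "$data_dir/$sub" ] || continue
        # -delete implies -depth: children are removed before parents are tested
        find "$data_dir/$sub" -mindepth 1 -type d -empty -delete
    done
)
```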
dglp #5
Member since Feb 2012 · 7 posts
Group memberships: Members
Lots of potentially useful info there.
Overall, it implies that there are various ways one might go about importing a set of existing documents.

In some respects that might also mean I could contrive a method that suits my purposes.
So I've tried the following, but took some guesswork to succeed, so there are some gaps in the way I did it.

1. I installed the SearchIndex plugin, and saw it recognise new txt files I'd uploaded to the Pages directory.
2. Then I navigated to one of the uploaded pages using a string like ...dokuwiki/doku.php?id=NewFile1, but got an entirely new page instead.

This seemed like the fastest way of getting the job done, but clearly I had missed out on something.
What I expected to see was the uploaded pages indexed on the sitemap, or directly via the URL.

3. I renamed the txt file to newfile1, and reloaded the page. This worked!

So it seems there are naming conventions to be followed in the uploaded pages.
Is lowercase required? Are spaces allowed? What about other special characters like ( and ) ?
HansBKK #6
Member since Nov 2011 · 104 posts · Location: Bangkok
Group memberships: Members
Re pagename conventions, just create pages with whatever names you like through the normal UI. Have a look in the filesystem for the resulting filenames and you'll see what DW expects.

Re navigation, the usual wiki way is to start at a home/start/front page with topic links and from there, reference pages from links via the page name. There are also navigation plugins and themes with automatic menuing based on namespace indexing.

Are you sure this stuff isn't documented anywhere? I recommend reading all the docs as opposed to trial and error, much less posting questions here that have already been addressed there...
dglp #7
Member since Feb 2012 · 7 posts
Group memberships: Members
Looks like I haven't been quite as clear as needed in my descriptions.

I'm looking at the mass indexing of files uploaded by FTP, and wondering why some are indexed but not others.
It looks like DW wants the uploaded filenames to contain only lowercase, no spaces, no parenthetic brackets.
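A rough way to rename a folder of .txt files to fit those restrictions before uploading: lowercase the names and turn each run of characters other than letters, digits, '.', '-' and '_' into a single underscore. This only approximates DokuWiki's own page-name cleaning (it ignores non-ASCII letters and any related config options), so it's still worth comparing a result with a page created through the UI. The helper names are hypothetical:

```shell
#!/bin/sh
# Approximate a DokuWiki-friendly page file name:
#   "My Notes (draft).txt" -> "my_notes_draft.txt"
normalize_name() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |
        sed -e 's/\.txt$//' \
            -e 's/[^a-z0-9._-][^a-z0-9._-]*/_/g' \
            -e 's/__*/_/g' -e 's/^_//' -e 's/_$//' -e 's/$/.txt/'
}

# Rename every .txt file in DIR (not recursive) to its normalized name,
# skipping any file whose target name is already taken.
normalize_dir() (
    dir=${1:?usage: normalize_dir DIR}
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue
        new="$dir/$(normalize_name "$(basename "$f")")"
        [ "$f" = "$new" ] && continue
        if [ -e "$new" ]; then echo "skip, target exists: $new" >&2; continue; fi
        mv "$f" "$new"
    done
)
```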

I'd be happy if that could change, but will leave questions about that aside for now.

I understand the usual navigation process - and that navigating to a 'new' page creates a temporary version of that page for further editing. However, if a page already exists, navigating by URL should bring up the existing page. When that doesn't happen, it's because the indexer has not registered the page. It is a test of the page status, in other words. Some pages don't return...

I agree about reading the docs, and try to use the search engine - but sometimes a human is needed to provide a clearer sense of direction! For which, thank you! A prompt reply is always good motivation!
HansBKK #8
Member since Nov 2011 · 104 posts · Location: Bangkok
Group memberships: Members
I realize you're working directly in the filesystem, but rather than "bulk uploading" I would suggest just doing a few files at a time until you've finished learning and become used to the filename restrictions. If you create a link to an existing file it *will* come up, but only if you got the name right as per DW's rules.

What I usually do is work with a DW instance running locally as my "staging server", and then sync the appropriate folders up to the production one online. If you're on Windows, I highly recommend XAMPP, very easy to set up, even on a removable drive or USB stick.

This will also incorporate at least one layer of ongoing backups into your workflow.
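The staging-to-production sync can be done with rsync over SSH. A sketch, assuming rsync is installed on both ends and that pages/ and media/ are the folders you want to push (your "appropriate folders" may differ); cache and index are left out on the basis that DokuWiki rebuilds them itself. Note that --delete makes the destination mirror staging exactly, so try it with --dry-run first:

```shell
#!/bin/sh
# Push pages/ and media/ from a local (staging) DokuWiki data dir to another
# data dir, local or remote (e.g. user@example.com:/var/www/dokuwiki/data).
# An optional 3rd argument is passed to rsync, e.g. --dry-run.
push_wiki() (
    src=${1:?usage: push_wiki LOCAL_DATA_DIR DEST_DATA_DIR [RSYNC_OPTION]}
    dest=${2:?usage: push_wiki LOCAL_DATA_DIR DEST_DATA_DIR [RSYNC_OPTION]}
    for sub in pages media; do
        # Trailing slashes copy the folder's contents rather than the folder;
        # --delete removes destination files that no longer exist in staging.
        rsync -a --delete ${3:+"$3"} "$src/$sub/" "$dest/$sub/"
    done
)
```

For example: `push_wiki ./dokuwiki/data user@example.com:/var/www/dokuwiki/data --dry-run` (host and paths are placeholders).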
dglp #9
Member since Feb 2012 · 7 posts
Group memberships: Members
Good points.
In essence, I've been looking for a pairing of desktop and web-based wikis. I need to share docs with a group, but dislike working with textarea / browser-based editors. So I have tried to find a pair of web and desktop applications that work well together. I haven't found a pairing that uses the same syntax, but DokuWiki is fast, and as I'm finding, relatively easy to manage file transfers and indexing. Whilst I could set up XAMPP on a stick, it's still using a textarea editor, so I'm holding out for my preferred option!

For the desktop, I'm currently using Zim, and doing a bunch of syntax conversion things to make it DW-ready. That creates the backup I need. So whilst I'm trying to work out the intricacies of DokuWiki on one side, I'm also trying to find ways to set up a desktop wiki editor on the other. In doing that, I'm having to make compromises in the way I want to work, given the conditions required by each editor/system.

Under some ideal scheme, I'd be able to email the edited pages to DW, because then the other group members would feel comfortable making their own changes...

By the way - if I get this system to work, I'd be happy to write up a step-wise account for posting somewhere...
This post was edited 3 times, last on 2012-02-25, 09:34 by dglp.
HansBKK #10
Member since Nov 2011 · 104 posts · Location: Bangkok
Group memberships: Members
I didn't mean it so much for the editing side of things as for "getting to know DW". I find it very liberating to know I can experiment to my heart's content, and wipe the whole thing and start over if need be, while the production server isn't touched until I know everything's OK on the scratch/staging side. I also use a real VCS rather than relying on DW's version management.

For me, the whole point in both Zim and DW is being able to use whatever text tools, including editors, I prefer. I don't need WYSIWYG rendering myself, and set up my hotkeys for markup patterns at the OS level so they work in any app - including textareas if needed.

I'd also be **very** interested in anything you do to enable syntax conversion between Zim and DW - please take detailed notes on any differences you find.
dglp #11
Member since Feb 2012 · 7 posts
Group memberships: Members
Zim is idiosyncratic - and I would be happy to find a more consistent editor.

There are also some environmental differences between a desktop application and the web.
DW (textarea) doesn't like working with tab characters, and Zim seems to like removing leading space characters.
So when I'm trying to do a bulleted list in one and preserve it in the other, it's a constant battle.

Zim also does a weird thing with headings.
After one types in the desired number of equal signs and does a carriage return, Zim applies the heading style, but leaves the trailing symbols visible.

So in some respects, there's no consistent way of reformatting a Zim doc to DW.
HansBKK #12
Member since Nov 2011 · 104 posts · Location: Bangkok
Group memberships: Members
Again, the beauty for me of both Zim and DW is that users never have to use the Zim/DW UI for any editing if they prefer vim, emacs, or whatever coding editor is best for them. Both tools work just fine as a "transparent carrier" platform for plaintext files, no matter what arbitrary syntax they're marked up with.

It would just take a little work to create tools to convert from one syntax to the other, again, not talking about doing this within their UI or even interactively, just take a folder tree of .zim.txt files and convert it to one populated by .dw.txt files via scripting, basically a bulk import/export tool to go in either or both directions with round-trip fidelity.
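As a starting point for such a script, here is a sketch covering two of the differences that are easy to automate: Zim's header block and list indentation. It assumes Zim's usual page layout (a header of lines like Content-Type: text/x-zim-wiki, ended by a blank line) and tab-indented '* ' bullets, which DokuWiki wants as two spaces per level; everything else passes through unchanged, so numbered lists and other syntax still need checking by hand. The function names are hypothetical:

```shell
#!/bin/sh
# Convert one Zim page (stdin) to DokuWiki markup (stdout):
#  - drop the leading "Content-Type: text/x-zim-wiki" header block
#  - turn tab-indented "* " bullets into DokuWiki's 2-spaces-per-level lists
zim_to_dw() {
    awk '
        NR == 1 && /^Content-Type: text\/x-zim-wiki/ { in_header = 1 }
        in_header { if ($0 == "") in_header = 0; next }
        /^\t*\* / {
            match($0, /^\t*/)            # RLENGTH = number of leading tabs
            indent = ""
            for (i = 0; i <= RLENGTH; i++) indent = indent "  "
            print indent substr($0, RLENGTH + 1)
            next
        }
        { print }
    '
}

# Convert a folder tree of Zim .txt pages into a parallel DokuWiki tree.
convert_tree() (
    src=${1:?usage: convert_tree ZIM_DIR DW_DIR}
    dest=${2:?usage: convert_tree ZIM_DIR DW_DIR}
    mkdir -p "$dest" || exit 1
    dest=$(cd "$dest" && pwd)            # absolute, since we cd into src next
    cd "$src" || exit 1
    find . -type f -name '*.txt' | while IFS= read -r f; do
        mkdir -p "$dest/$(dirname "$f")"
        zim_to_dw < "$f" > "$dest/$f"
    done
)
```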

For such a tool to be of maximum utility, IMO it would be done as a pair of reader/writer modules within Pandoc. If I had the money I'd sponsor such a project, or if I were really rich, I'd learn to do it myself 8-)
dglp #13
Member since Feb 2012 · 7 posts
Group memberships: Members
I am pretty sure I see what you mean... and it's one of the reasons I like the flat-file, txt-based systems.

For me, the appeal of things like Zim is the instant appearance of results.
When I do **some text**, or '-some list' I see the result by the next line.
In contrast, many editors require a laborious process of saving, scrolling and searching for results among the larger work.

So while a bulk conversion and transfer of txt files is easy and effective, each file still requires some editing to make sure it conforms.
Once I have fully memorised DokuWiki syntax I should be able to edit directly in a text editor (Notepad++), but I still don't see the results of the markup until I go to a viewer/browser.

I found a MS Word macro that does some conversion to wiki format, but it's not set up to do the particular conversions I need.
I had also thought of trying a desktop script editor (AutoHotKey, MacroToolworks), but these become far too convoluted.
This post was edited on 2012-02-27, 12:54 by dglp.
This board is powered by the Unclassified NewsBoard software, 20150713-dev, © 2003-2015 by Yves Goergen