Large Wiki Setup-Advice needed, Server Caching/Indexing slows over time
jon_nfc #1
Member since Feb 2015 · 6 posts · Location: Australia
Hi, I'm Jon and I run nofusscomputing.com. No, I am not a huge company, quite the contrary: just an individual with an over-obsessive hobby with computers. Firstly I would like to apologize for the long post, but bear with me; I believe all of it is necessary.

I use DokuWiki (v2014-09-29) on my website, http://nofusscomputing.com/wiki/, mainly to document my own network (mostly closed) or to assist people with whatever service or feature I am providing (read-only). All wikis generate their own sitemap, which is referenced by my site's sitemap index. Some of these wikis will grow beyond 50,000 pages, and as far as I'm aware DokuWiki doesn't conform to the sitemap protocol for wikis that large; I will cross that bridge later. I use the Search Index Manager plugin to index and cache the site to speed up load times.
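(For reference, the sitemap protocol caps a single sitemap file at 50,000 URLs, which is why that number matters. A rough, untested sketch of a post-processing script that splits a URL list into multiple sitemaps plus a sitemap index follows; the file names and the flat URL list are placeholders, and this is not something DokuWiki does itself:)

```php
<?php
// Sketch only: split a flat list of page URLs into sitemap files of at most
// 50,000 entries each, then write a sitemap index referencing them.
// 'all-urls.txt' and the output names are placeholders, not DokuWiki output.
$urls   = file('all-urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$chunks = array_chunk($urls, 50000);   // the protocol limit per sitemap file
$index  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

foreach ($chunks as $i => $chunk) {
    $name = sprintf('sitemap-%03d.xml', $i);
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
          . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($chunk as $url) {
        $xml .= '  <url><loc>' . htmlspecialchars($url) . '</loc></url>' . "\n";
    }
    $xml .= '</urlset>' . "\n";
    file_put_contents($name, $xml);
    $index .= '  <sitemap><loc>http://nofusscomputing.com/' . $name
            . '</loc></sitemap>' . "\n";
}
$index .= '</sitemapindex>' . "\n";
file_put_contents('sitemap-index.xml', $index);
```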

My setup consists of the main wiki (http://nofusscomputing.com/wiki/) as a farm, with sub-wikis serving as information databases set up as animals, e.g. http://nofusscomputing.com/database/application/home. This setup enables me to open, close and release each component of my information databases as required. The current size is around 40 pages for the main wiki and 65 for the animal above. The main reason for using a wiki instead of creating my own site pages is quite simple: to keep costs down, and DokuWiki is simple, well developed and suits my needs nicely.
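(For readers unfamiliar with DokuWiki farming: a farm is typically wired up through inc/preload.php, which points DOKU_CONF at the animal's own conf directory. A minimal sketch follows; the farm directory and the URL-to-animal mapping are guesses at this particular setup, not a copy of it:)

```php
<?php
// inc/preload.php -- minimal farm-dispatch sketch, untested.
// DOKU_FARMDIR and the way the animal name is derived from the URL
// are assumptions about this setup, not DokuWiki defaults.
if (!defined('DOKU_FARMDIR')) define('DOKU_FARMDIR', '/var/www/farm/');

// e.g. /database/application/home -> animal "database.application"
$path   = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$parts  = array_slice(array_values(array_filter(explode('/', $path))), 0, 2);
$animal = implode('.', $parts);

if ($animal !== '' && is_dir(DOKU_FARMDIR . $animal . '/conf')) {
    define('DOKU_CONF', DOKU_FARMDIR . $animal . '/conf/');
}
```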

Background
In my spare time I write Windows software that will eventually be of assistance to any PC user. One part of this software collects a system inventory of the machine it is installed on; this data is stored in a MySQL database and then parsed by my own software to dynamically create wiki pages. Currently I have no issues with getting the data out of the database and programmatically creating the wiki pages, producing presentable and hopefully useful information.

Issue
This new wiki animal will provide file information. The initial size will be around 200,000 pages, and I expect it to grow into the millions.

Over the weekend I started to migrate data to a new information database. I wrote the scripts required to pull the data from MySQL and create the pages (the initial page creation was only 5,000 pages). It was a simple enough task, but indexing/caching was a little slow.
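(For context, page generation of this kind usually amounts to writing plain text files under data/pages/. A rough sketch of such a migration script follows; the database name, table, columns and credentials are hypothetical placeholders, and only the data/pages layout is DokuWiki's:)

```php
<?php
// Sketch: pull rows from MySQL and write them out as DokuWiki page files.
// Table/column names and credentials are invented for illustration.
$pdo   = new PDO('mysql:host=localhost;dbname=inventory', 'user', 'pass');
$pages = '/var/www/dokuwiki/data/pages/files/';   // target namespace dir

foreach ($pdo->query('SELECT hash, name, size FROM file_info') as $row) {
    $ns  = substr($row['hash'], 0, 2);            // shallow sharding (see below)
    $dir = $pages . $ns . '/';
    if (!is_dir($dir)) mkdir($dir, 0755, true);

    $text = '====== ' . $row['name'] . " ======\n\n"
          . '  * SHA-256: ' . $row['hash'] . "\n"
          . '  * Size: '    . $row['size'] . " bytes\n";
    // DokuWiki page IDs map directly to .txt files under data/pages/
    file_put_contents($dir . $row['hash'] . '.txt', $text);
}
```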

OK, I narrowed it down very quickly to the web server having very small hardware specs (a QNAP TS-110; don't laugh too loud, it has served me well). So I pulled out an HP DC7900 SFF (Intel Core 2 Duo E8400 3.00 GHz, 4 GB RAM, 1 TB HDD (non-RAID), Windows 7 with WAMP2) and my Intel barebones server (old Intel Xeon, probably 2.8 GHz quad core, 8 GB RAM, RAID 1, Windows Server 2003 with WAMP2). After many hours of testing the Intel server won hands down, but I was still happy with the performance of the HP DC7900. All machines shared the same degradation of indexing/caching over time: the indexing/caching rate drops to the point that if I were to index much more than 5,000 pages my wiki would be in a constant state of indexing/caching the new pages. Surely I have missed something.

The initial namespace setup was based on the filename, which is a SHA-256 hash of the file in question. The hash was broken up into two characters per subfolder, ending up around 19-20 folders deep. That is a no-no, by the way: there are far too many folders to scan during indexing, and it is slow, although on a positive note there would never be more than 256 objects per folder. In the end, just for testing, I decided to use the first two hex digits of the hash as the subfolder and place the file directly inside it. I still don't know how I'm going to set up the folder structure, other than trying to keep it to no more than around 16,000-17,000 objects per folder, since that is when I started to notice the machines slow down badly.
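(To make the two layouts concrete, here is a sketch of both schemes as page-ID builders; the helper names are made up, and the deep variant is capped at 20 levels to match the depth described above:)

```php
<?php
// Sketch: the two namespace layouts described above. Function names are
// illustrative only; DokuWiki uses ':' as the namespace separator.

// Deep layout: two hex chars per namespace level (capped here at 20 levels,
// as described above), full hash as the page name. At most 256 entries per
// folder, but a very deep directory tree to scan.
function deep_id($hash) {
    return implode(':', str_split(substr($hash, 0, 40), 2)) . ':' . $hash;
}

// Shallow layout: only the first two hex digits form the namespace, so the
// tree is flat but each of the 256 folders can grow very large.
function shallow_id($hash) {
    return substr($hash, 0, 2) . ':' . $hash;
}

$hash = hash('sha256', 'example.dll');
echo deep_id($hash) . "\n";     // aa:bb:cc:...:<hash>  (20 levels)
echo shallow_id($hash) . "\n";  // aa:<hash>            (1 level)
```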

Results
Times to index/cache the 5,000 pages, measured by counting web server log entries:

QNAP TS-110: I stopped indexing after 8 hours; it started out indexing around 1 page every 2-3 seconds.
HP DC7900: the first 10 minutes averaged around 1.381 pages/second; the full 5,000 pages took 4,630 seconds, or 1.0799 pages/second overall. Page load time nearly instant.
Intel server: the fastest of the three. The first 10 minutes averaged around 3.121 pages/second; the full 5,000 pages took 2,015 seconds, or 2.4813 pages/second overall. Page load time nearly instant.

Question
Both servers degrade over time. Is there any way to fix this, or does anyone know of a better way to achieve what I am attempting?
This post was edited on 2015-02-02, 08:27 by jon_nfc.
andi (Administrator) #2
User title: splitbrain
Member since May 2006 · 3450 posts · Location: Berlin Germany
For completeness: http://stackoverflow.com/questions/28274286/large-wiki-set…
andi (Administrator) #3
User title: splitbrain
Member since May 2006 · 3450 posts · Location: Berlin Germany
You mention in your Stack Overflow comment that search will be handled by Google CSE anyway. In that case it might be simplest to completely disable the creation of the search index.

However, I'm not quite sure how best to go about that. The page index is also used by other, non-search-related indexing tasks (e.g. metadata).

Maybe others have an idea.
jon_nfc #4
Member since Feb 2015 · 6 posts · Location: Australia
Quote by andi:
In that case it might be simplest to completely disable the creation of the search index.

OK, so pulling the info out of both your responses (Stack Overflow and above) reminded me that, no, I don't need to use indexing. Why index ahead of time when the end user can trigger it on first use, which would cache the page for me? I think that trade-off will alleviate my initial issue; I don't foresee it becoming a problem, but I will monitor it.
Basically, I won't be using any plugin that requires metadata or indexed data; all pages, including the wiki home URL, will be dynamically created by the page creation scripts.

Quote by andi:
However I'm not quite sure how to best go about that. The page index is also used by other non-search related indexing tasks (eg. meta data).

What is the metadata used for on a vanilla install of DokuWiki? (Yes, I'm lazy. Just kidding, I will be looking in the docs as well.)

If there is no config option to disable this, I may have to go source-code diving to create a patch so it can be disabled via a config option. Would anyone recommend this course of action, or is there a better way of doing it?
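(One less invasive route than patching core, sketched here as an untested assumption: a tiny action plugin that hooks DokuWiki's INDEXER_TASKS_RUN event and prevents the default indexer run. The plugin and class names below are invented for illustration:)

```php
<?php
// action.php of a hypothetical "noindexer" plugin -- a sketch, not tested.
// It hooks the INDEXER_TASKS_RUN event fired by the background taskrunner
// and stops the default action before the search index is rebuilt.
if (!defined('DOKU_INC')) die();

class action_plugin_noindexer extends DokuWiki_Action_Plugin {

    public function register(Doku_Event_Handler $controller) {
        $controller->register_hook('INDEXER_TASKS_RUN', 'BEFORE',
                                   $this, 'handle_indexer');
    }

    public function handle_indexer(Doku_Event $event, $param) {
        $event->preventDefault();   // skip index creation entirely
        $event->stopPropagation();  // keep other handlers from running it
    }
}
```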
Michitux #5
Member since Apr 2008 · 377 posts · Location: Karlsruhe, Germany
DokuWiki uses the metadata index for the backlinks feature and for media usage (displayed on the media details page, AFAIK at least in the development version). The text index could be deactivated independently of the metadata index (though there is no such configuration option), but I'm not sure that would solve all the issues. Furthermore, I think it would be a nice feature to make the search index replaceable, so it could be swapped for a more scalable index such as Apache Lucene (for the metadata index as well). That is an option where there are enough resources, like on a dedicated server, but probably not on a small shared webspace.

A nicer option would be to implement a simple but still more powerful index in pure PHP. I have ideas for that, but currently no time to realize them.