Hi, I'm Jon and I run nofusscomputing.com. No, I am not a huge company, quite the contrary: just an individual with an obsessive hobby with computers. Firstly, I would like to apologize for the long post, but bear with me; I believe all of it is necessary....
I use DokuWiki (V: 2014-09-29) on my website,
http://nofusscomputing.com/wiki/, mainly to document my own network (mainly closed) or to assist people with whatever service or feature I am providing (read only). All wikis generate their own sitemap, which is referenced by my site's sitemap index. Some of these wikis will be over 50,000 pages, and as far as I'm aware DokuWiki doesn't conform to the sitemap protocol for wikis that large (the protocol caps each sitemap file at 50,000 URLs); I will cross that bridge later. I use the Searchindex Manager plugin for the indexing and caching of this site to speed up load times.
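For what it's worth, the 50,000-URL cap in the sitemap protocol is normally handled by splitting the page list across several sitemap files and pointing a sitemap index at them. A minimal sketch of that split (URLs and file names here are purely illustrative):

```python
# Sketch: split a large wiki's URL list into sitemap-protocol-sized
# chunks and build a sitemap index referencing them. All names are
# illustrative, not DokuWiki's own sitemap code.
from xml.sax.saxutils import escape

SITEMAP_URL_LIMIT = 50000  # per-file cap from the sitemap protocol


def chunk_urls(urls, limit=SITEMAP_URL_LIMIT):
    """Yield the flat URL list in chunks of at most `limit` entries."""
    for i in range(0, len(urls), limit):
        yield urls[i:i + limit]


def sitemap_xml(urls):
    """Render one protocol-conformant sitemap file for a chunk of URLs."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>\n")


def sitemap_index_xml(sitemap_locs):
    """Render the sitemap index that references each chunk's sitemap file."""
    entries = "\n".join(f"  <sitemap><loc>{escape(u)}</loc></sitemap>"
                        for u in sitemap_locs)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</sitemapindex>\n")
```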
My setup consists of the main wiki (http://nofusscomputing.com/wiki/) as a farm, with sub-wikis set up as animals serving as information databases, e.g.
http://nofusscomputing.com/database/application/home. This setup enables me to open/close and release each component of my information databases when required. Current size is around 40 pages for the main wiki and 65 for the animal above. The main reason for using a wiki instead of creating my own site pages is quite simple.... to keep costs down, and DokuWiki is simple, well developed, and basically suited my needs nicely.
Background
In my spare time I write Windows software that will eventually be of assistance to any PC user. One part of this software collects a system inventory of the machine it is installed on; this data is stored in a MySQL database and then parsed by my own software to dynamically create wiki pages. Currently I have no issues getting the data from the database and programmatically creating the wiki pages, producing presentable and hopefully useful information.
Issue
This new wiki animal will provide file information. The initial size will be around 200,000 pages, and I'm expecting it to grow into the millions.
Over the weekend I started to migrate data to a new information database. I wrote the scripts required to pull the data from MySQL and create the pages (the initial run was only 5,000 pages). It was a simple enough task.... but.... indexing/caching was a little slow.
OK... I narrowed it down very quickly to the web server having very small hardware specs (a QNAP TS-110... don't laugh too loud, it has served me well). So I pulled out an HP DC7900 SFF (Intel Core 2 Duo E8400 3.00 GHz, 4 GB RAM, 1 TB HDD (non-RAID), Win7 w/WAMP2) and my Intel barebones server (old Intel Xeon, probably 2.8 GHz quad core, 8 GB RAM, RAID 1, Win Server 2003 + WAMP2). After many hours of testing, the Intel server won hands down, but I was still happy with the performance of the HP DC7900. All machines shared the same degradation: over time the indexing/caching rate drops to the point that if I were to index much more than 5,000 pages, my wiki would be in a constant state of indexing/caching the new pages. Surely I have missed something.......
The initial namespace setup was based on the filename, which is the SHA-256 hash of the file in question. The hash was broken up into two characters per subfolder, ending up around 19-20 folders deep. This is a no-no, by the way..... far too many folders to scan during indexing, and it is slow, though on the positive side there would be no more than 256 objects per folder. In the end, and just for testing, I decided to use the first two hex digits of the hash as the subfolder and place the file inside that. I still don't know how I'm going to set up the folder structure, apart from trying to keep it to no more than around 16,000-17,000 objects per folder, as this is when I started to notice the machines slow down heaps.
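To make the two layouts concrete, here is a rough sketch of both sharding schemes, assuming the page name is the file's SHA-256 hex digest (function names are mine, not DokuWiki's):

```python
# Sketch of the two namespace layouts described above, using DokuWiki's
# colon-separated namespace notation. Names are illustrative.
import hashlib


def deep_shard(digest, chars_per_level=2, levels=None):
    """Original layout: split the digest two characters per namespace
    level (optionally capped at `levels`), then the page itself.
    Few objects per folder, but very many folders to walk."""
    parts = [digest[i:i + chars_per_level]
             for i in range(0, len(digest), chars_per_level)]
    if levels is not None:
        parts = parts[:levels]
    return ":".join(parts + [digest])


def shallow_shard(digest, prefix_len=2):
    """Test layout: a single two-hex-digit prefix level, giving at most
    256 namespaces with the pages placed directly inside each."""
    return f"{digest[:prefix_len]}:{digest}"


digest = hashlib.sha256(b"example file contents").hexdigest()
```

The trade-off the post describes falls out of these two extremes: the deep scheme bounds objects per folder at 256 but multiplies the directories the indexer must traverse, while the shallow scheme keeps traversal cheap until individual folders grow past the ~16,000-17,000-object point where the machines slowed down.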
Results
Times to index/cache the 5,000 pages, counted from the web server log entries:
QNAP TS-110: I stopped indexing after 8 hours; it started out indexing around 1 page every 2-3 seconds.
HP DC7900: the first 10 minutes ran at around 1.381 pages/second; 5,000 pages took 4,630 seconds, or 1.0799 pages/second overall. Page load time nearly instant.
Intel server: clearly ahead. The first 10 minutes ran at around 3.121 pages/second; 5,000 pages took 2,015 seconds, or 2.4813 pages/second overall. Page load time nearly instant.
Question
Both servers degrade over time. Is there any way to fix this? Or does anyone know of a way to improve what I am trying to achieve?