Improved search for bigger wiki?
molefunk #1
User title: molefunk
Member since Jul 2010 · 50 posts · Location: Mayapur, India
Group memberships: Members
I have a wiki at http://iskconpress.com with approximately 60,000 pages. It should accommodate growth to at least ten times that size, with support for multilingual content. The main challenge I have is search. I tried installing two different search plugins (both based on Sphinx), and although I got them working in the end, they were not exactly what I was looking for.

After working on this project as a part-time hobby for the past five years, I'm getting closer to an official launch. But with a dead slow search, I will have to either disable the search completely, or limit its functionality to quicksearch and pagetitles.

I am curious to learn what others have done for search in larger DokuWiki installations. Right now I'm looking at Elasticsearch, but I have no clue how to interface with DokuWiki's search API (if there is such a thing). My initial wish list for improved search is small, but I can imagine it would grow exponentially with a more powerful search engine.

These are some of the questions that come to mind:

  * Would moving DokuWiki's search indexes to sqlite improve performance (like, drastically)?
  * For a multilingual DokuWiki, will it be possible to keep separate indexes for each language?
  * Is there a way to customise quicksearch, i.e. using something other than the page title for search?
  * How about a list of search terms that will go directly to specified pages, without any waiting?
  * How can I limit search to only quicksearch and pagetitles, just in case it doesn't work out?
  * And how to turn off fulltext page indexing? Will turning this off have unintended effects?
  * And where, if anywhere, can I read up on integrating e.g. Elasticsearch into DokuWiki?
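A list of search terms that go directly to specified pages is essentially a lookup table consulted before any full-text search runs, so a hit costs no index work at all. A minimal sketch in Python, for illustration only (in DokuWiki this would be PHP inside an action plugin); the `SHORTCUTS` table, its page IDs, and the `resolve` helper are all hypothetical:

```python
from typing import Optional

# Hypothetical curated shortcuts: normalised search term -> DokuWiki page ID.
SHORTCUTS = {
    "bhagavad gita": "books:bg",
    "dictionary": "sanskrit:dictionary",
}

def normalise(term: str) -> str:
    """Lower-case and collapse runs of whitespace so 'Bhagavad  Gita' matches."""
    return " ".join(term.lower().split())

def resolve(term: str) -> Optional[str]:
    """Return a page ID to redirect to, or None to fall through to normal search."""
    return SHORTCUTS.get(normalise(term))
```

For example, `resolve("Bhagavad  Gita")` gives `"books:bg"`, while `resolve("karma")` gives `None` and the query would continue to the regular search.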

I will be happy to hear if anyone has some input or output on these.

Thanks for giving this non-programmer a chance to build something meaningful!
This post was edited on 2016-12-06, 07:40 by molefunk.
andi (Administrator) #2
User title: splitbrain
Member since May 2006 · 3179 posts · Location: Berlin Germany
Group memberships: Administrators, Members
Search seems to be disabled on your site, and so is the check command. It would be interesting to see how the search currently performs.

Also it would be interesting to see more data about your wiki. What PHP version? What extensions are available? How large are the various index files etc.
Read this if you don't get any useful answers.
molefunk #3
User title: molefunk
Member since Jul 2010 · 50 posts · Location: Mayapur, India
Group memberships: Members
Hi Andi.

Thanks for taking the time to respond. I have enabled search, as well as check. I also sent the latest Popularity Feedback report (anon_id 0d7cab9251549e76de34453368a161d1). In case you don't have access to that data for some reason, here are the index highlights:

index_count      152
index_size       87117170
index_biggest    35700404
index_smallest   5
index_avg        573139.27631579

I am running Nginx on Ubuntu 14.04 (soon upgrading to 16.04 as part of a datacenter change), with PHP version 5.5.9. Active plugins:

acl, addnewpage, authplain, blockquote, bureaucracy, changes, comment, config, confmanager, croissant, csstimeline, data, definitionlist, description, extension, folded, gallery, hidepages, htmlmetatags, icons, imagebox, include, info, message, notfound, nslist, pagenav, pagequery, pagetitle, popularity, purplenumbers, qna, redirect, revealjs, safefnrecode, searchtext, simplenavi, sqlite, starred, styling, text, toctweak, translation, upgrade, usermanager, userpagecreate, vshare, wrap

While testing the Sphinx search engine, I found that larger searches took a very long time to display in DokuWiki, although the search itself was carried out quickly on the server. Unfortunately I didn't have time to troubleshoot it then; otherwise it might have worked out nicely with Sphinx. This is the one I tried last: https://github.com/abiliojr/bettersearch

One reason for looking into separate search engines is the ability to do advanced things like stemming and handling of diacritical characters (the latter being my main concern). There are many other areas of search I want to improve as well, but that will have to come later when I can afford someone to help me. For now I am very grateful for any insights you can share, especially on whether the built-in search can be improved in any way before I launch.
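Matching a query typed without diacritics (e.g. `krsna`) against transliterated text (`kṛṣṇa`) is usually done by diacritic folding rather than stemming proper: both the indexed words and the query are reduced to a base form before comparison. A minimal sketch of the technique in Python using Unicode NFD decomposition; the `fold_diacritics` name is hypothetical, and this does not describe what DokuWiki's own indexer does:

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Decompose characters (NFD), drop combining marks, then lower-case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()
```

The same function has to be applied on both sides, at indexing time and at query time; otherwise folded queries will not meet unfolded index entries.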

Thanks.
andi (Administrator) #4
User title: splitbrain
Member since May 2006 · 3179 posts · Location: Berlin Germany
Group memberships: Administrators, Members
Thanks for the feedback. The search is indeed a bit on the slow side. A newer PHP version might improve the speed slightly. The searchtext/text plugin might be responsible for some of the slowdown. Also, quick-searching in titles is probably somewhat slower than searching in page IDs.

The easiest way to increase your search speed without any (or much) programming is replacing the built in search with a Google Custom Search Engine (CSE).

Otherwise one of the mentioned sphinx plugins might be a good idea. Or building something completely new.

Regarding the slow display vs. the fast actual search: I guess adding pagination would be a must.
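Pagination keeps the rendering cost proportional to one page of results instead of the whole hit list, which is where the slow display described above comes from. A minimal sketch in Python; the `paginate` helper and the default of 20 hits per page are assumptions, not part of any existing plugin:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def paginate(hits: Sequence[T], page: int, per_page: int = 20) -> Tuple[List[T], int]:
    """Return the hits for the 1-based `page` and the total number of pages."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    total_pages = max(1, -(-len(hits) // per_page))  # ceiling division
    start = (page - 1) * per_page
    return list(hits[start:start + per_page]), total_pages
```

Only the returned slice would then get snippets, highlighting and HTML; the remaining hits are just counted.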
molefunk #5
User title: molefunk
Member since Jul 2010 · 50 posts · Location: Mayapur, India
Group memberships: Members
Thanks. I will revisit this thread after upgrading to PHP 7 and experimenting with the few things you mention.

I'm about to commit 40,000 more pages (Sanskrit dictionary files), which will push the total to well over 100,000. I guess I'm in for an even slower search. So disabling it is, for now at least.

CSE is unfortunately not a good option for us. I wrote to Google 10 years ago about several issues we had with their search engine related to diacritical characters (on a similar project built on MediaWiki). First, they gave me a semi-automated answer. Second, they didn't seem very interested in changing their search engine's behaviour! I doubt I would have better luck trying again today.

When I have launched my initial offering, I will invest some more time and energy into search engines. Sphinx seems fairly easy to work with, but I also keep hearing good things about Elasticsearch. Does anyone here have experience with these or other search engines, especially with connecting them to DokuWiki? (@Andi: Is there a DokuWiki page on search engine API/integration?)

I'm planning to make outsourcing the development and integration of a better search engine one of our fundraiser goals. Does anyone have experience with outsourcing DokuWiki development to "cheaper" countries, like India (where I'm living, btw)?
andi (Administrator) #6
User title: splitbrain
Member since May 2006 · 3179 posts · Location: Berlin Germany
Group memberships: Administrators, Members
I integrated DokuWiki with Elasticsearch for a customer a while ago. If paying a German company (not as cheap as India) is an option for you, contact me at dokuwiki [at] cosmocode.de

Sanskrit may also be a problem for the DokuWiki search engine, depending on how the language works (I don't know much about it). Does it separate words by spaces? If not, the search index will be less effective.
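To illustrate why word separation matters: an index keyed on whole words only finds a word that stands on its own. In a Sanskrit compound such as `ramacandra` (rāma + candra), a query for `rama` will not match the compound token, and only a much slower substring scan finds it. A toy sketch in Python; it splits on whitespace as a simplified stand-in for a word-based index and is not DokuWiki's actual tokenizer. The function names and sample pages are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Set

def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Toy inverted index: lower-cased, whitespace-separated token -> page IDs."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for token in text.lower().split():
            index[token].add(page_id)
    return index

def lookup(index: Dict[str, Set[str]], word: str) -> Set[str]:
    """Exact token match only, as a word-based index would do."""
    return set(index.get(word.lower(), set()))

def lookup_substring(pages: Dict[str, str], word: str) -> Set[str]:
    """Fallback that scans every page, so it also finds words inside compounds."""
    needle = word.lower()
    return {pid for pid, text in pages.items() if needle in text.lower()}
```

With `pages = {"a": "rama and sita", "b": "ramacandra"}`, the index lookup for `rama` returns only `{"a"}`, while the substring scan returns both pages at the cost of reading every page.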

For integrating your own engine, you will probably want to disable the normal indexing and do it yourself (using an action plugin), and you will also replace the normal search with your own command (also an action plugin). The existing plugins should provide some starting points.
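On the Elasticsearch side, such a plugin would need an index definition and a search query. A minimal sketch in Python that only builds the JSON request bodies (no client library, no network calls) and assumes Elasticsearch 7 or later with typeless mappings. The one-index-per-language naming scheme `wiki_<lang>`, the field names `title` and `content`, and the analyzer name `folded` are assumptions; the `lowercase` and `asciifolding` token filters and the `multi_match` query are standard Elasticsearch features:

```python
def index_name(lang: str) -> str:
    """Hypothetical naming scheme: one Elasticsearch index per wiki language."""
    return f"wiki_{lang.lower()}"

def index_settings() -> dict:
    """Body for creating an index whose text fields are lower-cased and diacritic-folded."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "folded": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "title": {"type": "text", "analyzer": "folded"},
                "content": {"type": "text", "analyzer": "folded"},
            }
        },
    }

def search_body(query: str, page: int = 1, per_page: int = 20) -> dict:
    """Body for the _search endpoint: title matches boosted, one page of hits."""
    return {
        "query": {"multi_match": {"query": query, "fields": ["title^2", "content"]}},
        "from": (page - 1) * per_page,
        "size": per_page,
    }
```

Serialised with `json.dumps`, the first body would be sent once to create e.g. `wiki_en`, and the second with each query to `wiki_en/_search`. Keeping one index per language lets each get its own analyzer, and `from`/`size` makes the engine return only one page of hits.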