robmacl wrote
I think people are specifically targeting Dokuwiki. I've also seen searches that are clearly targeting Dokuwiki. On Google webmaster tools, I have 700 impressions on "dokuwiki supports some simple markup language".
If this is actually how they find DokuWiki installations, and if it is the most effective way of finding them, there should be an easy way to disable it: change the robots header to disallow indexing of all pages that are shipped with DokuWiki. This is possible using the nftr plugin. However, I've just had a look at the Google webmaster tools, and what I see are 600 impressions (average position 430) for "Driven by DokuWiki", which is the alt text of the DokuWiki logo in the footer of the default templates. In addition to that I have 35 impressions for "dokuwiki.txt" (position 740). While disabling indexing of the default pages should prevent the second result, it won't help against "Driven by DokuWiki". Unfortunately, according to Wikipedia (noindex), Google doesn't support any way to suppress indexing of only certain parts of a page. One could of course remove that "Driven by" text, but I don't think that's a general solution.
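Just to illustrate the idea (this is not the nftr plugin's actual configuration, only a minimal sketch in Python; the list of shipped page IDs is my assumption about a default install):

```python
# Minimal sketch of the noindex idea, NOT the nftr plugin's real API.
# Assumption: these are the page IDs shipped with a default install.
SHIPPED_PAGES = {"wiki:dokuwiki", "wiki:syntax", "wiki:welcome"}

def robots_header(page_id):
    """Return the X-Robots-Tag header value for a given page ID."""
    if page_id in SHIPPED_PAGES:
        # Keep crawlers from indexing the stock pages spammers search for.
        return "noindex, follow"
    return "index, follow"

print(robots_header("wiki:syntax"))  # noindex, follow
print(robots_header("start"))        # index, follow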
robmacl wrote
Maybe this bot is targeting more than one wiki, but yesterday I had about thirty bogus new users who created various spam "profile" pages in random locations. That requires some knowledge of Dokuwiki syntax. I have captcha turned on too, though all my versions are old. I've turned off registration, and am updating now.
I think the easiest and most effective solution is to use reCAPTCHA. It will annoy users, but you can be sure that Google is working on making sure that spam bots can't easily solve the captchas. I fear it's really difficult to achieve the same thing with the captcha plugin when somebody is specifically targeting DokuWiki or the captcha plugin (I haven't tried image-only mode yet, but at least image + audio didn't solve the problem for me).
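For reference, the server-side part of reCAPTCHA is a single HTTP call against Google's siteverify endpoint. A sketch in Python (DokuWiki's actual recaptcha plugin does the equivalent in PHP; the secret and token here are placeholders):

```python
import requests  # third-party: pip install requests

def verify_recaptcha(secret, response_token, remote_ip=None):
    """Ask Google whether the user's captcha answer was accepted."""
    payload = {"secret": secret, "response": response_token}
    if remote_ip:
        payload["remoteip"] = remote_ip
    r = requests.post("https://www.google.com/recaptcha/api/siteverify",
                      data=payload, timeout=10)
    return r.json().get("success", False)
```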
A solution that doesn't annoy users might be to use IP blacklists and to add support for user registration approval (yes, it's already possible by not enabling edit access for new users, but one could definitely make that easier for admins).
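IP blacklist lookups are cheap because DNS-based blacklists encode the verdict in a DNS answer. A minimal sketch (zen.spamhaus.org is just one example of such a list; 127.0.0.2 is the conventional test address most DNSBLs list):

```python
import socket

def is_blacklisted(ip, dnsbl="zen.spamhaus.org"):
    """Check an IPv4 address against a DNS-based blacklist."""
    # DNSBLs are queried with the octets reversed, e.g. 1.2.3.4
    # becomes 4.3.2.1.zen.spamhaus.org.
    query = ".".join(reversed(ip.split("."))) + "." + dnsbl
    try:
        socket.gethostbyname(query)
        return True   # any A record means the IP is listed
    except socket.gaierror:
        return False  # NXDOMAIN: not listed

print(is_blacklisted("127.0.0.2"))  # True on most DNSBLs
```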
For world-editable wikis, and for wikis where moderating registrations is no solution, some intelligent techniques that recognize spam edits and give users scores might be worth a try (a rough sketch of the scoring idea is below). I imagine something like simple content analysis: check whether the content matches the rest of the wiki and punish external links. One can of course make that arbitrarily complex, using classifiers that are trained on spam/non-spam words and word pairs by manually marking edits as spam or non-spam (plus some additional checks for whether chunks of more than five words are copied from other pages). Users would then get positive scores for non-spam edits and negative scores for spam edits, and below a certain threshold saving would be denied or only possible after solving a strong captcha.
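To make that concrete, here is a toy sketch of such a scorer in Python. The class name, the link penalty of 5.0 and the zero threshold are all made-up illustration values, not a tested design:

```python
import math
import re
from collections import Counter

class EditScorer:
    """Toy naive-Bayes-style scorer for wiki edits."""

    def __init__(self):
        self.spam = Counter()  # token counts from edits marked as spam
        self.ham = Counter()   # token counts from edits marked as ok

    def train(self, text, is_spam):
        target = self.spam if is_spam else self.ham
        target.update(re.findall(r"[a-z']+", text.lower()))

    def score(self, text):
        """Positive = looks legitimate, negative = looks like spam."""
        score = 0.0
        for token in re.findall(r"[a-z']+", text.lower()):
            # Laplace-smoothed log-odds of the token being ham vs. spam.
            p_ham = (self.ham[token] + 1) / (sum(self.ham.values()) + 2)
            p_spam = (self.spam[token] + 1) / (sum(self.spam.values()) + 2)
            score += math.log(p_ham / p_spam)
        # Punish external links, which spam edits rely on heavily.
        score -= 5.0 * len(re.findall(r"https?://", text))
        return score

scorer = EditScorer()
scorer.train("cheap pills buy now http://spam.example", is_spam=True)
scorer.train("fixed a typo in the installation instructions", is_spam=False)
if scorer.score("buy cheap pills http://spam.example") < 0:
    print("deny save or require a strong captcha")
```

A per-user reputation would then just be the running sum of these per-edit scores, with the captcha requirement kicking in once it drops below the threshold.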