robmacl wrote
I think people are specifically targeting Dokuwiki. I've also seen searches that are clearly targeting Dokuwiki. On Google webmaster tools, I have 700 impressions on "dokuwiki supports some simple markup language".
If this is actually how they find DokuWiki installations, and if it is the most effective way of finding them, there should be an easy way to disable it: change the robots header to disallow indexing of all pages that are shipped with DokuWiki. This is possible using the nftr plugin. However, I've just had a look at the Google webmaster tools, and what I see are 600 impressions (average position 430) for "Driven by DokuWiki", which is the alt text of the DokuWiki logo in the footer of the default templates. In addition to that I have 35 impressions for "dokuwiki.txt" (position 740). While disabling indexing of the default pages should prevent the second result, it won't help against "Driven by DokuWiki". Unfortunately, according to Wikipedia (noindex), Google doesn't support any way to suppress indexing of only certain parts of a page. One could of course remove that "Driven by" text, but I don't think that's a general solution.
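Just to illustrate the idea (this is not the nftr plugin's actual configuration, only a minimal sketch in Python; the list of shipped page IDs is my assumption about a default install):

```python
# Minimal sketch of the noindex idea, NOT the nftr plugin's real API.
# Assumption: these are the page IDs shipped with a default install.
SHIPPED_PAGES = {"wiki:dokuwiki", "wiki:syntax", "wiki:welcome"}

def robots_header(page_id):
    """Return the X-Robots-Tag header value for a given page ID."""
    if page_id in SHIPPED_PAGES:
        # Keep crawlers from indexing the stock pages spammers search for.
        return "noindex, follow"
    return "index, follow"

print(robots_header("wiki:syntax"))  # noindex, follow
print(robots_header("start"))        # index, follow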
robmacl wrote
Maybe this bot is targeting more than one wiki, but yesterday I had about thirty bogus new users who created various spam "profile" pages in random locations. That requires some knowledge of Dokuwiki syntax. I have captcha turned on too, though all my versions are old. I've turned off registration, and am updating now.
I think the easiest and most effective solution is to use reCAPTCHA. It will annoy users, but you can be sure that Google is working on making sure that spam bots can't easily solve the captchas. I fear it's really difficult to achieve the same thing with the captcha plugin when somebody is specifically targeting DokuWiki or the captcha plugin (I haven't tried image-only mode yet, but at least image + audio didn't solve the problem for me).
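For reference, the server-side part of reCAPTCHA is a single HTTP call against Google's siteverify endpoint. A sketch in Python (DokuWiki's actual recaptcha plugin does the equivalent in PHP; the secret and token here are placeholders):

```python
import requests  # third-party: pip install requests

def verify_recaptcha(secret, response_token, remote_ip=None):
    """Ask Google whether the user's captcha answer was accepted."""
    payload = {"secret": secret, "response": response_token}
    if remote_ip:
        payload["remoteip"] = remote_ip
    r = requests.post("https://www.google.com/recaptcha/api/siteverify",
                      data=payload, timeout=10)
    return r.json().get("success", False)
```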
A solution that doesn't annoy users might be to use IP blacklists and to add support for user registration approval (yes, it's already possible by not enabling edit access for new users, but one could definitely make that easier for admins).
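IP blacklist lookups are cheap because DNS-based blacklists encode the verdict in a DNS answer. A minimal sketch (zen.spamhaus.org is just one example of such a list; 127.0.0.2 is the conventional test address most DNSBLs list):

```python
import socket

def is_blacklisted(ip, dnsbl="zen.spamhaus.org"):
    """Check an IPv4 address against a DNS-based blacklist."""
    # DNSBLs are queried with the octets reversed, e.g. 1.2.3.4
    # becomes 4.3.2.1.zen.spamhaus.org.
    query = ".".join(reversed(ip.split("."))) + "." + dnsbl
    try:
        socket.gethostbyname(query)
        return True   # any A record means the IP is listed
    except socket.gaierror:
        return False  # NXDOMAIN: not listed

print(is_blacklisted("127.0.0.2"))  # True on most DNSBLs
```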
For world-editable wikis, and for wikis where moderating registrations is no solution, some intelligent techniques that recognize spam edits and give users scores might be worth a try (a rough sketch of the scoring idea is below). I imagine something like simple content analysis: check whether the content matches the rest of the wiki and punish external links. One can of course make that arbitrarily complex, using classifiers that are trained on spam/non-spam words and word pairs by manually marking edits as spam or non-spam (plus some additional checks for whether chunks of more than five words are copied from other pages). Users would then get positive scores for non-spam edits and negative scores for spam edits, and below a certain threshold saving would be denied or only possible after solving a strong captcha.
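To make that concrete, here is a toy sketch of such a scorer in Python. The class name, the link penalty of 5.0 and the zero threshold are all made-up illustration values, not a tested design:

```python
import math
import re
from collections import Counter

class EditScorer:
    """Toy naive-Bayes-style scorer for wiki edits."""

    def __init__(self):
        self.spam = Counter()  # token counts from edits marked as spam
        self.ham = Counter()   # token counts from edits marked as ok

    def train(self, text, is_spam):
        target = self.spam if is_spam else self.ham
        target.update(re.findall(r"[a-z']+", text.lower()))

    def score(self, text):
        """Positive = looks legitimate, negative = looks like spam."""
        score = 0.0
        for token in re.findall(r"[a-z']+", text.lower()):
            # Laplace-smoothed log-odds of the token being ham vs. spam.
            p_ham = (self.ham[token] + 1) / (sum(self.ham.values()) + 2)
            p_spam = (self.spam[token] + 1) / (sum(self.spam.values()) + 2)
            score += math.log(p_ham / p_spam)
        # Punish external links, which spam edits rely on heavily.
        score -= 5.0 * len(re.findall(r"https?://", text))
        return score

scorer = EditScorer()
scorer.train("cheap pills buy now http://spam.example", is_spam=True)
scorer.train("fixed a typo in the installation instructions", is_spam=False)
if scorer.score("buy cheap pills http://spam.example") < 0:
    print("deny save or require a strong captcha")
```

A per-user reputation would then just be the running sum of these per-edit scores, with the captcha requirement kicking in once it drops below the threshold.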