Non-public pages, search index renewal, search engines, (re-)building the index: tips & tricks
Samana Johann #1
Member since Jun 2018 · 174 posts · Location: Aural/Cambodia
Dear DW team and users,

I would like to know whether non-public pages are found and searched by crawlers and spiders, and whether there is a setting to allow this.

Since visits to pages via links also keep the internal search index fresh, that is the primary purpose of this question.

Are pages that are not visible to @ALL searchable by machines? If not, can this be adjusted easily?

Any hints are of course appreciated.
This post was edited on 2019-03-26, 12:17 by Samana Johann.
andi (Administrator) #2
User title: splitbrain
Member since May 2006 · 3497 posts · Location: Berlin Germany
Quote by Samana Johann on 2019-02-28, 08:22:
Are pages that are not visible to @ALL searchable by machines? If not, can this be adjusted easily?

No. A page has to be visible to anonymous users to be crawled and indexed by public search engines such as Google.
Samana Johann #3
Sadhu for the clarification. Spiders are able to register in forums, and with the right settings they could be allowed to include otherwise inaccessible pages in their index. Is there no easy way to do the same for DW, Mr. Andreas?

(This is also, and mainly, about touching pages to maintain a good internal search index. Re-indexing a wiki of 10,000+ pages is hardly feasible: it requires about 36 hours of good access without interruptions.)
schplurtz (Moderator) #4
Member since Nov 2009 · 493 posts · Location: France, Finistère
Hi,

Quote by Samana Johann:
Re-indexing a wiki of 10,000+ pages is hardly feasible: it requires about 36 hours of good access without interruptions.

Maybe this post https://forum.dokuwiki.org/post/65161 will help.
Samana Johann #5
Much appreciated.

It may well help with manual indexing (though 20,000 pages currently need a good 36 hours), but I am not sure it helps with automatic indexing when pages get few or no visits.
Samana Johann #6
Since it is nearly impossible to rebuild or update the index by hand, would there be a way to use something like a "cron job" (though for now that is only a mystical word to me)? As far as I understand, such a job could run periodically and automatically.
Samana Johann #7
The issue seems to have several causes. One is that Thai script is broken up as "one character, one word", which seems to make it hard to index larger pages. Anybody generally interested in this issue may follow the case topic here.

Quote from inc/indexer.php file, line 18 and following:
// Asian characters are handled as words. The following regexp defines the
// Unicode-Ranges for Asian characters
// Ranges taken from http://en.wikipedia.org/wiki/Unicode_block
// I'm no language expert. If you think some ranges are wrongly chosen or
// a range is missing, please contact me
define('IDX_ASIAN1','[\x{0E00}-\x{0E7F}]'); // Thai

It is not clear why this is done. Admittedly, some Asian scripts do not always have clearly separate words, but when written properly the words are usually separated by zero-width spaces. Another issue, of course, is punctuation and "special" characters.
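
To illustrate the difference, here is a minimal Python sketch. It is not DokuWiki's actual tokenizer, only a simplified comparison of the two splitting strategies. The function names and the sample string are made up for illustration; the Thai range U+0E00 to U+0E7F is the one from the quoted `IDX_ASIAN1` definition.

```python
import re

# Thai block, same range as in the quoted IDX_ASIAN1 definition.
THAI = "\u0E00-\u0E7F"
ZWSP = "\u200b"  # zero-width space, used as a word separator in this sketch

# Hypothetical sample: two Thai words joined by a zero-width space,
# followed by a Latin word.
SAMPLE = (
    "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"  # first Thai word, 6 code points
    + ZWSP
    + "\u0e04\u0e23\u0e31\u0e1a"            # second Thai word, 4 code points
    + " wiki"
)


def split_per_character(text):
    """Every Thai code point becomes its own token; other runs stay whole."""
    pattern = rf"[{THAI}]|[^\s{ZWSP}{THAI}]+"
    return re.findall(pattern, text)


def split_on_separators(text):
    """Split only on whitespace and zero-width spaces."""
    return [tok for tok in re.split(rf"[\s{ZWSP}]+", text) if tok]


if __name__ == "__main__":
    print(len(split_per_character(SAMPLE)), "tokens with per-character splitting")
    print(len(split_on_separators(SAMPLE)), "tokens with separator splitting")
```

With per-character splitting, the number of index entries grows with the number of characters on a page rather than the number of words, which may be part of why large Thai pages are hard to index. Separator-based splitting only helps if the text really contains zero-width spaces between words.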
This post was edited on 2019-04-01, 14:27 by Samana Johann.