Server load for creating search index? [SOLVED]
waltergr #1
Member since Jun 2007 · 39 posts
Group memberships: Members
I have a few questions about the search index:

o  The Search page (http://wiki.splitbrain.org/wiki:search) says, "Information about a page's content is added and updated when a page is viewed by a user.  Each page includes an invisible image which calls the index update process if needed."  How is it determined if an index update is needed?
o  What's the server load for updating the index?
o  Some shared hosting companies have policies on running cron jobs, and techniques like this may be interpreted as attempting to circumvent their policies.  Is there a way to do index updates via cron rather than via a hidden image?

Thanks,

Walter

P.S. I'll add the answers into the DokuWiki documentation wiki...
My e-mail address: waltergr@aol.com
This post was edited on 2007-06-08, 14:45 by waltergr.
chi #2
Member since Jun 2006 · 1851 posts · Location: Munich Germany
Group memberships: Members, Super Mods, Wiki Managers
Quote by waltergr:
o  The Search page (http://wiki.splitbrain.org/wiki:search) says, "Information about a page's content is added and updated when a page is viewed by a user.  Each page includes an invisible image which calls the index update process if needed."  How is it determined if an index update is needed?
By checking whether the timestamp of the viewed page is newer than that of the corresponding file that keeps the metadata for the index. For in-depth details, have a look at http://dev.splitbrain.org/reference/dokuwiki/ -> lib/exe/indexer.php.
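
The timestamp comparison described above can be sketched roughly as follows. This is only an illustration in Python, not DokuWiki's actual code (that lives in lib/exe/indexer.php); the function name and the file paths are hypothetical.

```python
import os

def needs_index_update(page_file, index_marker):
    """Return True if the page should be (re)indexed.

    page_file    -- path to the page's text file (hypothetical layout)
    index_marker -- path to the file holding the index metadata for that page
    """
    # No marker yet: the page has never been indexed.
    if not os.path.exists(index_marker):
        return True
    # Re-index only when the page was modified after the marker was written.
    return os.path.getmtime(page_file) > os.path.getmtime(index_marker)
```

With a check like this, viewing an unchanged page costs little more than two file stat calls; the real indexing work only happens after an edit.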

Quote by waltergr:
o  What's the server load for updating the index?
You can call lib/exe/indexer.php directly, providing a page id via the URL, and measure it yourself.
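
One way to take that measurement is to time a few requests from a script. The sketch below assumes the page id is passed in an `id` query parameter (the post only says it is provided via the URL); `indexer_url` and `time_request` are hypothetical helper names, and the base URL is whatever your wiki uses.

```python
import time
import urllib.parse
import urllib.request

def indexer_url(base_url, page_id):
    """Build the indexer URL for one page (the 'id' parameter is an assumption)."""
    query = urllib.parse.urlencode({"id": page_id})
    return f"{base_url.rstrip('/')}/lib/exe/indexer.php?{query}"

def time_request(url, fetch=None, runs=3):
    """Call the URL `runs` times and return the elapsed seconds of each call."""
    if fetch is None:
        def fetch(u):
            # Default: a plain HTTP GET that reads the whole response.
            with urllib.request.urlopen(u, timeout=30) as resp:
                resp.read()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return timings
```

Wall-clock time includes network latency, so treat it as a rough proxy for server load. If the page is already indexed, the calls will probably return almost immediately, so edit the page first to time a real update.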

Quote by waltergr:
o  Some shared hosting companies have policies on running cron jobs, and techniques like this may be interpreted as attempting to circumvent their policies.  Is there a way to do index updates via cron rather than via a hidden image?
You can use the <dokuwiki>/bin/indexer.php command-line script along with cron jobs. To disable the automatic background indexing you only have to remove the tpl_indexerWebBug() call from the main.php file of your template.
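
For hosts that require cron jobs to run at reduced priority, a small wrapper around the command-line indexer might look like the sketch below. The wrapper itself, its function names, and the default niceness of 10 are assumptions for illustration; only the bin/indexer.php script comes from DokuWiki, and the `php` and `nice` binaries are assumed to be on the PATH.

```python
import subprocess

def indexer_command(dokuwiki_dir, niceness=10, php="php"):
    """Build the command that runs bin/indexer.php at lowered priority."""
    script = f"{dokuwiki_dir.rstrip('/')}/bin/indexer.php"
    return ["nice", "-n", str(niceness), php, script]

def run_indexer(dokuwiki_dir, niceness=10):
    """Run the indexer once and return its exit code."""
    completed = subprocess.run(indexer_command(dokuwiki_dir, niceness), check=False)
    return completed.returncode
```

A cron entry would then call a script that invokes `run_indexer()` with the path to your DokuWiki installation, at whatever interval your host permits.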
Please add [SOLVED] to the initial thread subject if you feel your question has been answered.
If my answer doesn't make sense maybe your question didn't either - just visit http://facepalm.org.
waltergr #3
Member since Jun 2007 · 39 posts
Group memberships: Members
Great, thanks.  Will add this information to the wiki.
andi (Administrator) #4
User title: splitbrain
Member since May 2006 · 3500 posts · Location: Berlin Germany
Group memberships: Administrators, Members
In reply to post #2
Quote by waltergr:
o  Some shared hosting companies have policies on running cron jobs, and techniques like this may be interpreted as attempting to circumvent their policies.  Is there a way to do index updates via cron rather than via a hidden image?

Huh? Running cron instead of circumventing the cron nonavailability? Doesn't make any sense to me.

Quote by chi:
You can use the <dokuwiki>/bin/indexer.php command-line script along with cron jobs. To disable the automatic background indexing you only have to remove the tpl_indexerWebBug() call from the main.php file of your template.

Don't do that. DokuWiki uses the webbug for all kinds of things that need to be done by an automated background job (e.g. sitemap generation). Removing the webbug voids your warranty ;-)
waltergr #5
Member since Jun 2007 · 39 posts
Group memberships: Members
Huh? Running cron instead of circumventing the cron nonavailability?

Not cron nonavailability, but policies about running cron jobs.  For example, cron jobs may run at most once an hour, each job must be "niced", and the nice value must be 10 or greater.  My hosting company's policies are available at .

I can't find the page that said kicking off cron-like actions when a user accesses a web page is a violation of the policy.  It may have been another hosting company's policies I was reading.
This post was edited on 2007-06-09, 13:52 by waltergr.
fusibal #6
Member since Mar 2009 · 3 posts
Group memberships: Members
In reply to post #4
Does it matter if we output the closing </html> tag first and then call the indexer?

<html ...>
  ... all the stuff to dump out to browser
</html>

<?php
  ob_flush(); flush(); //force everything to be sent to browser.
  tpl_indexerWebBug();
?>

Andi, isn't this better? And if not, then please explain.
chi #7
Member since Jun 2006 · 1851 posts · Location: Munich Germany
Group memberships: Members, Super Mods, Wiki Managers
Quote by fusibal:
and if not, then please explain.

It's not, because tpl_indexerWebBug() basically outputs a 1x1 pixel GIF which "calls" the indexer in the background. If you put that outside of </html>, you'd end up with an invalid XHTML document.
fusibal #8
Member since Mar 2009 · 3 posts
Group memberships: Members
Quote by chi:
It's not, because tpl_indexerWebBug() basically outputs a 1x1 pixel GIF which "calls" the indexer in the background. If you put that outside of </html>, you'd end up with an invalid XHTML document.

Chi, thanks for the answer.  I have two issues with this:
1. Does it not delay the output of the </body></html> tags for as long as the server runs tpl_indexerWebBug()?
2. I don't see a need for any output from tpl_indexerWebBug() at all.  It does not have to be flanked by a div element.  The server outputs nothing after the </html> tag.  We can just remove the 1x1 dummy pixel call from the function.

I expect that neither will affect performance much except possibly for very large pages.  Let me know what you think.
chi #9
Member since Jun 2006 · 1851 posts · Location: Munich Germany
Group memberships: Members, Super Mods, Wiki Managers
Quote by fusibal:
1. Does it not delay the output of the </body></html> tags for as long as the server runs tpl_indexerWebBug()?

No, your browser will "download" the fake image in the background, and therefore call the indexer in the background. That's the whole reason for doing it this way, to not delay the output of the page.
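
To make the mechanism concrete, here is a rough Python sketch of the kind of markup such a web bug boils down to. It is not the actual output of tpl_indexerWebBug(); the `id` query parameter, the attribute set, and the `web_bug_tag` name are assumptions for illustration.

```python
from html import escape
from urllib.parse import urlencode

def web_bug_tag(base_path, page_id):
    """Return an <img> tag whose src points at the indexer for one page."""
    src = f"{base_path.rstrip('/')}/lib/exe/indexer.php?{urlencode({'id': page_id})}"
    # The page itself contains only this tag. The indexing work is triggered
    # by the browser's separate request for the image, after the HTML arrived.
    return f'<img src="{escape(src, quote=True)}" width="1" height="1" alt="" />'

print(web_bug_tag("/dokuwiki", "start"))
```

Because the tag is ordinary page content, it has to sit inside the document body, and the server spends no time indexing while it is still sending the page.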
fusibal #10
Member since Mar 2009 · 3 posts
Group memberships: Members
Quote by chi:
No, your browser will "download" the fake image in the background, and therefore call the indexer in the background. That's the whole reason for doing it this way, to not delay the output of the page.

I see.  I looked at the code and the page output more closely, and now I understand better how the indexer works.  Thanks for the explanation, Chi.