insert links to new page everywhere
clemo #1
Subject: insert links to new page everywhere
Dear forum,
I am looking for a way to automatically insert links to new pages on every page where the new page's name is mentioned (just for the first occurrence).

I didn't find anything on this except for https://forum.dokuwiki.org/thread/11827 which I don't fully understand. Is there a plugin by chance?

If no plugin exists, I'd write myself a script that greps through all pages and updates the necessary ones. Probably not in PHP, but I'd share it if there's interest.

I expect this to get computationally costly if the wiki in question exceeds a certain size, but for my purposes it should work.
turnermm (Moderator) #2
I'm not sure I understand the rationale for this. If you are mentioning a page, isn't it a simple matter to embed the first (or any) occurrence in a link?
Myron Turner
github: https://github.com/turnermm
plugins, templates: http://www.mturner.org/devel
og #3
Hey turner, I think what he is looking for is some kind of automatic linking of words that match a page's name. Something like acronyms, but with real links.

@clemo: If you follow this, think about performance. This kind of dynamic linking implies that every time a new page is created, all pages need to be scanned and updated, or that every time a page is rendered, each word must be checked against the whole wiki for matching pages. Maybe this can be avoided with smart indexing and link building, but your wiki will definitely get slower the larger it gets.

What I could think of is an action which can be called additionally in the context of a page and does some find and replace. E.g. you open a page named "flyingbot" and click on "Replace word by link everywhere". The plugin then searches all wiki pages and replaces words which match the current page's name with a standard DokuWiki link to that page. This may take some time, and may not change pages that are currently being edited, but it can work.
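To make the replacement step concrete, here is a minimal standalone sketch in Python rather than a DokuWiki plugin, limited to the first occurrence per page as clemo wants; the page id, the link syntax and the skip-if-already-linked check are assumptions about how such an action could behave:

import re

def link_first_mention(text, page_id, title):
    """Replace the first occurrence of `title` in `text` that is not already
    inside a [[...]] link with a DokuWiki link to `page_id`.
    Returns the (possibly unchanged) text."""
    # Spans that are already links; matches inside them are skipped.
    link_spans = [m.span() for m in re.finditer(r'\[\[.*?\]\]', text, re.DOTALL)]
    pattern = re.compile(r'\b' + re.escape(title) + r'\b')
    for m in pattern.finditer(text):
        if not any(start <= m.start() < end for start, end in link_spans):
            return text[:m.start()] + '[[' + page_id + '|' + m.group(0) + ']]' + text[m.end():]
    return text

if __name__ == '__main__':
    sample = "The flyingbot docs are here; see [[projects:flyingbot]] and the flyingbot specs."
    # Only the first occurrence outside an existing link gets turned into a link.
    print(link_first_mention(sample, 'projects:flyingbot', 'flyingbot'))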
Oli...
clemo #4
Yes, og, I want to automatically link words that match a new page's name.

Of course this can get costly, as I mentioned, but for starting up a small company wiki with a lot of technical pages that should link to each other, the computational overhead is much better than the alternative, namely manual overhead or incomplete linking (at least in my plan). I also just want the first mention of the new page's name on another page linked, not every single one.

As I have full access to the server, the respective job can just as well run in the background. Using grep to find the first occurrence of the new page's name in all current pages shouldn't be expensive at all. Depending on the number of pages returned, the insertion of the links might take a while, but with dwpage.php running in the background, this won't change a user's experience surfing the wiki. All I need is a trigger to start the job.
One can also reduce the number of pages returned by taking only the difference against a second grep that looks for existing links.
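A rough Python sketch of that selection step, assuming the standard data/pages layout and the checkout/commit workflow of bin/dwpage.php (the paths and the exact dwpage.php arguments are assumptions on my side, so check its help output before relying on them):

import pathlib
import subprocess

PAGES_DIR = pathlib.Path('/var/www/dokuwiki/data/pages')  # adjust to your install
DWPAGE = '/var/www/dokuwiki/bin/dwpage.php'                # adjust to your install

def candidate_pages(title, page_id):
    """Pages that mention `title` but do not yet link to `page_id`:
    the difference between the 'name' grep and the 'link' grep."""
    mentions, linked = set(), set()
    for path in PAGES_DIR.rglob('*.txt'):
        text = path.read_text(encoding='utf-8', errors='replace')
        if title in text:
            mentions.add(path)
        if '[[' + page_id in text:
            linked.add(path)
    return sorted(mentions - linked)

def commit_via_dwpage(changed_file, target_id):
    """Hand the edited text back through dwpage.php so DokuWiki keeps its
    changelog in sync (the invocation is an assumption, verify it locally)."""
    subprocess.run(['php', DWPAGE, 'commit', '-m', 'auto-link new page',
                    str(changed_file), target_id], check=True)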

Having an additional button as og suggests is not my favorite solution because I want to prevent users from forgetting to link. If I used an additional button, they might forget to press it, and if the plan is that the button has to be clicked on every new page, it might as well be called automatically.
The advantage of the button would be that everyone could use it again at any time. That is also a possible disadvantage, because lazy people might abuse the button and restart the costly process every time they can't be bothered to insert links in the pages they're editing. I'd rather keep that possibility to myself and update the links periodically.
clemo #5
In reply to post #2
Hi turner,
I want to create links to new pages on the existing ones. So if a user creates a page called 'OrganizationZ' and there are 100 pages that contain 'OrganizationZ' somewhere in the text, the creator of the new page (or, more probably, the wiki admin in the context of a company wiki) doesn't have to visit each of the 100 pages by hand and change the first occurrence of 'OrganizationZ' into a link to the page OrganizationZ.
I hope that clarifies my intentions.
og #6
Hi clemo,
You make yourself clear and I think I fully understand what you want to do. This doesn't sound bad at all. Let's see how we can do this...

The first decision to make is whether it should change the page source or only make matching words look like links. The first option needs to update the pages containing matching words many times (once for each word found to have a corresponding page). Each page needs to be stored and rendered again. This process needs a huge amount of resources and takes a long time.

Letting the save of a NEW page trigger the update process will be sufficient.

There is a gap for pages that already exist at the time the plugin gets installed. This gap can be filled by an initial scan of all pages after installation, which treats each existing page in turn. This process could be called from the command line at low-usage times.

The process walks through all pages and replaces every mention of the saved page's name on every page of the wiki (except the saved page itself) with a link to that page. The page's name must be a single word, surrounded by stop-word characters, to be matched.
To avoid scanning each page every time, we could use the existing word index.

To speed things up, the trigger should not scan at save time, but add a job to a queue and execute it later or in the background. Conflicts must be handled as well, because users may edit pages in the meantime.
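A small sketch of that queue idea using a plain text file (the file location is made up for illustration); the save-time trigger only appends the page id, and a worker run later, e.g. from cron, drains the queue:

import fcntl
import pathlib

QUEUE_FILE = pathlib.Path('/var/www/dokuwiki/data/tmp/autolink.queue')  # made-up location

def enqueue(page_id):
    """Called when a new page is saved: just record the id, do no work yet."""
    with QUEUE_FILE.open('a') as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)   # avoid interleaved writes
        fh.write(page_id + '\n')

def drain(process):
    """Called later in the background: process and empty the queue."""
    if not QUEUE_FILE.exists():
        return
    with QUEUE_FILE.open('r+') as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        ids = [line.strip() for line in fh if line.strip()]
        fh.seek(0)
        fh.truncate()                    # queue is now empty
    for page_id in ids:
        process(page_id)                 # the expensive scan happens here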

There must also be a trigger on deleting a page, because the links must be removed then. Same process again.
Oli...
turnermm (Moderator) #7
Conflicts must be handled as well, because users may edit pages in the meantime.

See: https://www.dokuwiki.org/devel:locking

As I see it, there are three stages to this process. 
1. The first is initialization: the entire site has to be parsed for page links. The only feasible way to do this is from a script at the command line.
2. The second is to update links when new pages are created. When DW saves a page it creates or updates a metafile; new files are marked as C (created), edits are marked as E. When a page is loaded into the browser, the $REQUEST global will indicate whether it was a save. If so, the metafile can be checked, and if the page is newly created, you can implement a means of processing it, probably by saving the wiki id of the file and using it in a cron job (see the sketch below).
3. When a file is edited, marked as E in the metafile, it will have to be updated.  For this, I guess, you could keep a list of recently edited files and run a cron job against a list of all pages in your wiki.

This looks to me like a big undertaking.
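For stage 2, here is a rough Python sketch that works from outside DokuWiki (scanning the changelogs from cron rather than hooking the save in the browser): it looks through the per-page .changes files under data/meta for 'C' (created) entries newer than the last run and prints them as a work list. The tab-separated field layout is how I remember the changelog devel docs, so verify it against your own data/meta directory:

import pathlib
import time

META_DIR = pathlib.Path('/var/www/dokuwiki/data/meta')             # adjust to your install
STAMP = pathlib.Path('/var/www/dokuwiki/data/tmp/autolink.stamp')  # made-up marker file

def newly_created_pages():
    """Yield page ids whose .changes file records a 'C' (create) entry
    newer than the previous run of this script."""
    since = float(STAMP.read_text()) if STAMP.exists() else 0.0
    for changes in META_DIR.rglob('*.changes'):
        if changes.name.startswith('_'):
            continue                     # skip the global _dokuwiki.changes log
        for line in changes.read_text(encoding='utf-8').splitlines():
            fields = line.split('\t')    # date, ip, type, id, user, summary, ...
            if len(fields) >= 4 and fields[2] == 'C' and float(fields[0]) > since:
                yield fields[3]

if __name__ == '__main__':
    for page_id in newly_created_pages():
        print(page_id)                   # feed this list to the link job
    STAMP.write_text(str(time.time()))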
Myron Turner
github: https://github.com/turnermm
plugins, templates: http://www.mturner.org/devel
og #8
@turner: what about adding links at rendering time? Just like it is done with acronyms. Saving a page can add the page's name and link to a similar file. When rendering a page (which also stores it in the cache), all known names are changed to links.
There must be an initialization as well, but it will be less overhead because the page source remains unchanged.

To keep DW's index intact, I strongly recommend using DW functions for changing the page source. Simply replacing text in the page source with shell tools will result in corrupted indexes.

https://www.dokuwiki.org/devel:fulltextindex
https://www.dokuwiki.org/devel:metadata
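If the page source does get changed from outside DokuWiki anyway, one way to limit the damage afterwards (my assumption, not a substitute for using the proper functions) is to rebuild the index with the command line indexer that ships with DokuWiki:

import subprocess

DOKUWIKI = '/var/www/dokuwiki'   # adjust to your install

# Rebuild the search index after a batch of external edits. bin/indexer.php
# ships with DokuWiki; its flags can differ between versions, so check its
# help output first (-q should mean quiet, -c clear the index before rebuilding).
subprocess.run(['php', DOKUWIKI + '/bin/indexer.php', '-c', '-q'], check=True)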
Oli...
turnermm (Moderator) #9
Adding links before a save is ok for the current page.  But then there's the rest of the pages.  They all have to be updated.


As for the index, that can be handled with the searchindex plugin.
Myron Turner
github: https://github.com/turnermm
plugins, templates: http://www.mturner.org/devel
clemo #10
og and turnermm, thank you very much for your input. I'll start working on this sometime this week unless something urgent comes my way and get back to you with questions/a solution.
This post was edited on 2016-09-22, 09:52 by clemo.
clemo #11
I've managed to get something to run, but unfortunately can't share the code due to company policy. Nevertheless, I'd like to describe the approach in case somebody needs it.
The approach is not superfast, but I think it is not prohibitively slow when run regularly, as long as the wiki is not too large.
I've implemented it in R using shell tools (which also come with Rtools for Windows), so it doesn't fit perfectly into the DokuWiki universe.

It goes like this (a rough Python rendering of the inner replacement step follows after the list):
-find all new pages np and extract their first header nh as well
-for each np:
  -grep all other pages for np's name and nh
  -for each grepped old page op:
    -cut the text of op into a single-word list
    -from start to end of the word list:
      -keep track of environments (like '[[.+?]]', '{{.+?}}', '<[a-z]', '</')
      -outside environments, look for the name of np and nh and replace it with a link
      -stop after the first occurrence found
  -if something changed, use dwpage.php to check out op, insert the changed text, and commit op again.
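Here is a rough Python rendering of the inner replacement step. It tracks the same environments with regular expressions instead of a word-by-word walk; the names (np and nh), the link syntax and the environment patterns come from the list above, everything else is an assumption:

import re

# Regions where no link must be inserted: existing links, media, HTML-ish tags.
ENV_RE = re.compile(r'\[\[.*?\]\]|\{\{.*?\}\}|<[^>]*>', re.DOTALL)

def link_first_free_mention(text, page_id, names):
    """Replace the first occurrence of any name in `names` (the new page's
    name np and its first header nh) that lies outside the environments
    above with a [[page_id|matched text]] link. Returns (new_text, changed)."""
    protected = [m.span() for m in ENV_RE.finditer(text)]
    pattern = re.compile('|'.join(r'\b' + re.escape(n) + r'\b' for n in names))
    for m in pattern.finditer(text):
        if not any(a <= m.start() < b for a, b in protected):
            new = (text[:m.start()] + '[[' + page_id + '|' + m.group(0) + ']]'
                   + text[m.end():])
            return new, True
    return text, False

if __name__ == '__main__':
    op_text = "{{wiki:logo.png}} We met OrganizationZ at the <b>fair</b>."
    print(link_first_free_mention(op_text, 'organizationz',
                                  ['OrganizationZ', 'Organization Z']))

If something changed, the resulting text would then go back through dwpage.php as in the last step of the list.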

This of course does not look efficient at all and it isn't. In practice, the time strongly depends on the number of new pages and the corresponding pages grepped.
Running it on a weekend night for a company wiki should work fine, especially if one is running a group of maintenance scripts (cleaning up the attic, rebuilding the search index) anyway.

Finding all new pages is simple on R/Windows - just use ctime from file.info(); it also seems to stay unchanged when a wiki page is modified (yeah, this is the first time Windows has been useful to me). Finding all new pages on R/unixish systems is probably a bit more troublesome (but check http://stackoverflow.com/questions/23318695/how-to-get-tru…). Since you would expect the number of new pages to be much smaller than the number of new+edited pages, it is a rather good idea to be picky here.
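On the unixish side, one way around the ctime problem (an alternative approach, not what the R script does) is to compare modification times against a marker file left behind by the previous run:

import pathlib

PAGES_DIR = pathlib.Path('/var/www/dokuwiki/data/pages')               # adjust to your install
MARKER = pathlib.Path('/var/www/dokuwiki/data/tmp/last_autolink_run')  # made-up marker file

# Pages touched since the previous run. mtime also catches edited pages, so
# cross-check against the .changes metafiles if only genuinely new pages matter.
last_run = MARKER.stat().st_mtime if MARKER.exists() else 0.0
new_or_edited = [p for p in PAGES_DIR.rglob('*.txt') if p.stat().st_mtime > last_run]
print(new_or_edited)
MARKER.touch()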

It would be a smart idea to sanity-check the list of new page names and headers, in case somebody creates a page called something like 'the'. Maybe a length check, or a check against a dictionary that contains word probabilities in the respective language. Otherwise one could severely fuck up one's own wiki.
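For example, a very small guard of that kind (the length threshold and the stopword list are placeholders):

# Skip page names and headers that are too short or too common to be auto-linked.
STOPWORDS = {'the', 'and', 'for', 'with', 'not'}   # extend per language

def safe_to_autolink(name, min_length=4):
    return len(name) >= min_length and name.lower() not in STOPWORDS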
This post was edited 2 times, last on 2016-09-23, 16:02 by clemo.