linker (autolink)

c-jason-b

I know this idea has been batted around before, and I believe there is still no working solution. As such, I would like to request an autolinker and suggest a way of implementing one. Also note that my PHP skills are barely above NULL.
1. In each namespace there must exist a file (autolinktext.txt, or the same name with no extension, so as to stay off the namespace radar)
A) this file has the following format:
[10 words]
list of
pages with
10 words
in the
name, not
including namespace
or extension

[9 words]
See Above.

[8 words]
etc etc.

B) This file is not limited to titles of 10 words or fewer; it holds titles of any length. The key is that they are sorted by length (longest to shortest).

C) This file gets updated as new pages get created, or old pages get edited (if they aren't in the list already)

2. On page create, and edit (if not already in the list), a single line entry is added with the page's name (minus extension), in the relevant section.

3. A standalone PHP script (not part of DokuWiki) gets written that can be run as a cron job.

4. This script then opens each document in a namespace, and begins matching word lists (from longest to shortest) and outputting links into the source page in the form of [[namespace:page title|original words]] according to configuration settings.
A) It must ensure that the words do not occur between <pre> and </pre>, <code> and </code>, or whatever other special tags exist (like header === tags)
B) The namespace is locked for writing while it is being scanned.

5. The configuration script for this uses the same interface as ACL, but sets 6 boolean flags for each namespace (bonus points if they get set for each file as well!)
A) AutolinkFlag True/False Does this namespace get autolinked within itself.
B) LinkToLowerNamespaces True/False Do we autolink files to pages in lower namespaces (sub-namespaces)
C) ExcludeFromAutolink True/False If a parent NS is set to use auto link, this can be used to omit a lower NS. (Flags inherit)
D) ExcludeFromAutolinkLower True/False This excludes a NS from being linked to a higher order NS, but doesn't exclude it from linking within itself.
E) LinkToUpperNS True/False The opposite of LinkToLowerNamespaces
F) ExcludeFromAutolinkUpper True/False Isn't it obvious by now?
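
A minimal Python sketch of the matching in step 4, offered only as an illustration (the script proposed here would be PHP). It tries titles longest first, leaves <code>/<pre> blocks and header lines untouched, and also skips links that already exist, which is an added assumption. The function name autolink and the page-id rule (lowercase, spaces to underscores) are hypothetical.

```python
import re

# Spans that must never be altered: <code>/<pre> blocks, links that already
# exist (an added assumption), and header lines such as "=== Title ===".
PROTECTED = re.compile(
    r"<(code|pre)\b[^>]*>[\s\S]*?</\1>"   # tagged blocks, may span lines
    r"|\[\[[^\]\n]*\]\]"                  # existing [[links]]
    r"|^={2,}[^\n]*={2,}[ \t]*$",         # header lines
    re.MULTILINE,
)

def autolink(text, titles, namespace):
    """Replace page titles found in text with [[namespace:page|original words]]."""
    names = sorted(titles, key=len, reverse=True)  # longest titles win
    if not names:
        return text
    matcher = re.compile("|".join(re.escape(n) for n in names), re.IGNORECASE)

    def link(match):
        # Hypothetical page-id rule: lowercase, spaces become underscores.
        page = match.group(0).lower().replace(" ", "_")
        return f"[[{namespace}:{page}|{match.group(0)}]]"

    out, pos = [], 0
    for span in PROTECTED.finditer(text):
        out.append(matcher.sub(link, text[pos:span.start()]))  # linkable text
        out.append(span.group(0))                              # copied verbatim
        pos = span.end()
    out.append(matcher.sub(link, text[pos:]))
    return "".join(out)
```

Because each unprotected stretch is rewritten in a single pass, a shorter title cannot be linked again inside a link that was just created for a longer one.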

It is up to the site administrator to set up the cron job. Perhaps the cron script could keep a separate list of namespaces it has scanned, so it could do only one or two at a time (for very large sites) and, once all are done, reset the list and start again.
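
The "one or two at a time" idea could be sketched like this in Python. The state file, its JSON format, and the names next_batch and batch_size are all assumptions made for the example; nothing here comes from DokuWiki itself.

```python
import json
from pathlib import Path

def next_batch(all_namespaces, state_file, batch_size=2):
    """Return the next few namespaces to scan and remember them as done."""
    state_path = Path(state_file)
    done = json.loads(state_path.read_text()) if state_path.exists() else []
    pending = [ns for ns in all_namespaces if ns not in done]
    if not pending:
        # Every namespace has been scanned: reset the list and start again.
        done, pending = [], list(all_namespaces)
    batch = pending[:batch_size]
    state_path.write_text(json.dumps(done + batch))
    return batch
```

Each cron run would call next_batch once and process only the namespaces it returns, so a very large site is covered over several runs.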

Knowing a little PHP, I know this is doable and, apart from the exclusion portion, fairly easy, but it is a lot of work. So I realize I am asking much. To make it worse, I am not in a position to even offer to pay for this (in truth I could not even afford $5), but this would be really nice, and I think it would be very useful to many.

EDIT: Had to fix the tags part. Not to mention my typos.

c-jason-b

Thought of this after the fact,

The autolinktext files should not be broken into sections; this complicates reading them. Simply add the file name if it is not present (read the file in, if $newFileName !in $arrayOfNames then append file name to file).

Let the maint script read in the list and sort it by length.
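
A rough Python sketch of that simplification, assuming one page name per line in the list file. The function names register_page and load_sorted are made up for the example.

```python
from pathlib import Path

def register_page(list_file, page_name):
    """Append page_name to the list file unless it is already there."""
    path = Path(list_file)
    names = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    if page_name in names:
        return False
    with path.open("a", encoding="utf-8") as handle:
        handle.write(page_name + "\n")
    return True

def load_sorted(list_file):
    """Read the list for the maintenance script, longest names first."""
    names = Path(list_file).read_text(encoding="utf-8").splitlines()
    return sorted((n for n in names if n.strip()), key=len, reverse=True)
```

The file itself stays unsorted and append-only; the ordering is applied only when the maintenance script reads it.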

klap-in

What is the purpose of this?
I don't understand what you would link with this system.

c-jason-b

Klap-In, sorry for the delay in responding, real life issues.

The links that are being made would be to the namespace's pages (if within namespace), or the namespace:pagename (if cross namespace).

The overall purpose of this would be to automatically alter the content of existing pages, so that words in the pages which match the names of pages would be replaced with a link, either within or across namespaces as selected.

So if I had the following structure (config type page shown):

                              ========================= Boolean Flags =========================
namespace        page name    autolink  linkDown  linkUp  excludeAll  excludeUp  excludeDown
<default>        *            True      True      False   False       False      False
                 Welcome      ** Inherits values from namespace
<default>/test   *            ** Inherits values from namespace, then alters:
                                                  True
                 ExamplePg    ** Inherits and overrides none, although it can override

When the main script runs, it would add Welcome.txt to the <default_namespace>/autolinktext.txt file. This entry would look like:

welcome

and another entry to the <default_namespace>/test/autolinktext.txt file which would look like:

welcome<a tab separator character>..:welcome

The entries are different because one is referencing a namespace above, and the other is referencing the same namespace. In fact the first entry should probably be fully qualified.

Thus, when the main script opens welcome.txt, if it finds ExamplePg it would replace that text with [[<default_namespace>:test:examplepg|ExamplePg]]

In fact, that first example should be fully qualified no matter what.

So the entries in an autolinktext.txt file could be in two forms:
1 <text_to_match><tab_character><link_form>, or
2 <text_to_match>

If the entry is in form two, then a same namespace type link would be the replacement. It would in fact be easier to simply always use form one I would think.

All matches should be case insensitive, but the visible part of the link should be in the same case as original.

Word part matching should be allowed:
locate the string word inside Words, so the final result would be [[<namespace>:word|Word]]s. (Note the capital W as an example of the previous point.)
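
To make the two entry forms and the case rule concrete, here is a small Python sketch. It assumes a tab separates the text to match from the link form, and that a form-two entry links to the lowercase page name in the current namespace; parse_entries and link_text are hypothetical names.

```python
import re

def parse_entries(lines, namespace):
    """Map lowercase match text to a link target for both entry forms."""
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        text, _, target = line.partition("\t")
        # Form two has no tab: fall back to a same-namespace link.
        entries[text.lower()] = target or f"{namespace}:{text.lower()}"
    return entries

def link_text(text, entries):
    """Case-insensitive, word-part matching that keeps the original case visible."""
    if not entries:
        return text
    keys = sorted(entries, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys), re.IGNORECASE)

    def link(match):
        target = entries.get(match.group(0).lower())
        if target is None:  # unusual Unicode case mapping: leave text alone
            return match.group(0)
        return f"[[{target}|{match.group(0)}]]"

    return pattern.sub(link, text)
```

With the entries "word" (form two) and "welcome<tab>..:welcome" (form one) in namespace wiki, the text "Words of Welcome" becomes "[[wiki:word|Word]]s of [[..:welcome|Welcome]]".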

c-jason-b

Klap-In, your question got me thinking about this in more depth.

Section: Configuration Script
Configure script (the one inside dokuwiki's configuration section) would write two output files, one for namespaces, and one for pages.

If a namespace isn't configured directly, it inherits the settings from its parent namespace.
If there is no parent namespace it is set to all negative values. (see example 2 below)

Example 1: Namespace configured directly

namespace_name;autolink_flag_value;autolinkDown_flag_value;autolinkUp_flag_value;excludeDown_flag_value;excludeUp_flag_value

Example Two: an Un-configured Namespace with no parent to inherit from

namespace_name;0;0;0;1;1

Thus this namespace is excluded from all autolink procedures.

Example Three: A namespace inheriting from its parent

namespace_name;;;;;

If a page isn't configured directly, it inherits the settings from the namespace in which it is present.
Example Four : A file's configuration line

fully/qualified/file/name;autolink_flag_value;autolinkDown_flag_value;autolinkUp_flag_value;excludeDown_flag_value;excludeUp_flag_value

Note that the file's extension is not included. This way we can use the file name as our search parameter.

Example Five: a configured file's entry

path/to/file;1;1;1;0;0

This file is autolinked, linked to files in the directory(ies) above (if they link down), and linked to pages below (if they link up).

Example Six: A File inheriting from its namespace

path/to/file;;;;;
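
The line format and inheritance rules above could be read along these lines in Python. This is a sketch under assumptions: "1" means true, an empty field means "inherit", "/" separates path parts as in the examples, and namespace and file lines are kept in one dictionary for brevity. parse_config and resolve are hypothetical names.

```python
FLAGS = ("autolink", "autolinkDown", "autolinkUp", "excludeDown", "excludeUp")
# Example Two: no own settings and no parent means 0;0;0;1;1.
DEFAULTS = dict(zip(FLAGS, (False, False, False, True, True)))

def parse_config(lines):
    """Parse 'name;flag;flag;flag;flag;flag' lines; empty fields become None."""
    config = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, *values = line.split(";")
        config[name] = {
            flag: (None if value == "" else value == "1")
            for flag, value in zip(FLAGS, values)
        }
    return config

def resolve(name, config):
    """Fill every unset flag from the parent namespace, walking upwards."""
    own = config.get(name, {})
    parent = name.rpartition("/")[0]
    inherited = resolve(parent, config) if parent else DEFAULTS
    return {f: inherited[f] if own.get(f) is None else own[f] for f in FLAGS}
```

For instance, given "wiki;1;1;0;0;0", "wiki/test;;;1;;" and "wiki/test/page;;;;;", resolving wiki/test/page yields autolink, autolinkDown and autolinkUp true with both exclude flags false.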

Section: Maintenance Script

The maint script opens the namespace config file to see which namespaces it needs to scan.

Then it opens the files config to see which files are linked to and where they should be processed.

It then opens the text files in a namespace which is flagged for autolinking (all of them, one at a time), and scans each file's contents for words that match the files in the same namespace. If one is found, it checks that file's entry in the file config to see if it is excluded or not. If it is excluded, execution continues. If it is not excluded, a fully qualified two-part link is created like [[path:to:file|original_text]], and execution continues.

It then re-scans the file for linking to the namespace above (if this namespace is flagged AutolinkUp=True), and repeats the same process as above. This time the link is omitted if the namespace or file above is flagged autolinkDown=False.

A third pass is made, this time linking downwards. The flags again are altered.

My mind is going a million miles a minute in three directions here, so I may have misstated the flags a bit. They simply need to check their opposites for exclusion.

Doing it this way requires only the two config files, rather than the config files plus a file in each namespace, but the scripting might be a little more complex.

Personally I would use tabs as separators rather than semicolons, but not being good with PHP, and not knowing DokuWiki's code base, that choice would be up to the author.

As you can see, this script would be a monstrous drag on the system, and a long one to run. That's why it would have to be a cron job. It would probably even have to invoke PHP with some kind of override on script time/resource limits.

Section: Afterthoughts
As written, this only autolinks within a given namespace trunk. An additional flag (autolinkAllNamespaces) could be used with a fourth iteration to link across all namespaces.

klap-in

Is this not quite similar to CamelCase? https://www.dokuwiki.org/camelcase

c-jason-b

Sorry it took so long to reply. Been fighting with a database issue for days.

My understanding of CamelCase is that it creates links to documents which exist, and will create an open link to a document that does not exist yet. What I am after is something that can be periodically executed to re-examine existing documents, and update their content with links which didn't exist when they were created.
