
All posts by rschram (11)

topic: Navigating pages with the full text index · in the forum: General Help and Support / Features and Functionality
rschram #1
Member since Jul 2013 · 11 posts · Location: Sydney, NSW, Australia
Group memberships: Members
Subject: It could be possible... After all, Google....
Thanks, Andi,

The pagequery plugin is great and very well suited to this purpose. With it, you can query the whole wiki in a variety of ways (e.g. by tags or words) and sort and group the results by date. So on every page in the wiki, even in the sidebar, you can create what I've been thinking of as 'presence' and 'recency'. It's rather like the stacks of paper on my desk right now: I can be working on one task and still be aware, in my peripheral vision, of other things, with the more recent and more important ones nearest to me. Dokuwiki lets me brain-dump things I don't want to forget, and perhaps with pagequery it can also help me 'remember to remember.'

However, based on my quick glance at pagequery, I think one piece is missing. Basically it's a query of what's there, and just like any search query, you have to know what you are looking for (e.g. "stories about coconuts", "lecture notes on subjectivity"), or at least the parameters of what you're looking for ("lecture notes written in the last year, or this time last year..."), in order to find it. Likewise, its report page of the full inventory depends in part on having filed things in a meaningful way. My desk is a living testament to my inability to do this... :) And yet my desk is actually very well organized! It's just that I didn't organize it! Much as Conway's Game of Life shows us that cellular automata can bootstrap higher-order complexity, the complex assemblages piled before my eyes are also pregnant with an immanent ordering, one that is highly practical if a bit fuzzy. It has presence, recency and also relevance.

Would it be possible, then, for pagequery to access a similar kind of implicit information about the temporal and conceptual relationships in a collection of wiki articles? Yes, I think so. I now understand that this has been an area of active research for some decades, so I recognize that it's my ignorance that is freeing me to speak boldly. Anyways...

So, the docs on the full text index at https://www.dokuwiki.org/devel:fulltextindex describe the contents of the index files. Basically, w<wordlength>.idx lists the words of a given length, and the matching line of i<wordlength>.idx lists, for each word, the pages it occurs on (as numbers pointing to the page IDs stored in page.idx) and the number of occurrences of the word on each of those pages. Thus for each page we already know the term frequency, and counting the entries for a word gives its document frequency. The total number of documents is also known, so the TF*IDF weight is easy to calculate. Perhaps the search function already does this, or assigns some other weight. (The documentation seemed to imply that there was weighting, but only based on frequency.)
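To make the TF*IDF step concrete, here is a minimal Python sketch. It assumes this index layout (a reading of devel:fulltextindex that should be checked against the page): page.idx holds one page ID per line, w<n>.idx holds one word per line, and the same line of i<n>.idx holds `pageNumber*count` entries joined by `:`. The sample data and the names `parse_postings` and `tfidf_vectors` are made up for illustration; this is not DokuWiki's own code.

```python
import math

# Hypothetical miniature index in the assumed layout: page.idx lines,
# w5.idx lines (words of length 5), and the matching i5.idx lines.
PAGE_IDX = ["notes:coconuts", "notes:subjectivity", "notes:fieldwork"]
W5_IDX = ["story", "notes"]
I5_IDX = ["0*3:2*1", "0*1:1*2:2*4"]


def parse_postings(line):
    """Turn '0*3:2*1' into {0: 3, 2: 1} (page number -> occurrences)."""
    postings = {}
    for entry in line.split(":"):
        if entry:
            page_no, count = entry.split("*")
            postings[int(page_no)] = int(count)
    return postings


def tfidf_vectors(pages, words, index_lines):
    """Build one sparse TF*IDF vector (dict word -> weight) per page ID."""
    n_docs = len(pages)
    vectors = {page: {} for page in pages}
    for word, line in zip(words, index_lines):
        postings = parse_postings(line)
        df = len(postings)              # number of pages containing the word
        if df == 0:
            continue                    # word no longer on any page
        idf = math.log(n_docs / df)     # rarer words get larger weights
        for page_no, tf in postings.items():
            vectors[pages[page_no]][word] = tf * idf
    return vectors


if __name__ == "__main__":
    for page, vec in tfidf_vectors(PAGE_IDX, W5_IDX, I5_IDX).items():
        print(page, vec)
```

A word that appears on every page gets an IDF of zero, so it drops out of any similarity comparison on its own, which has the same effect as a stop-word list.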

A new development could be to take these weights for each term and build another idx table that maps each page ID to an array of TF*IDF weights, one per term. This array can be treated as a vector, and for two article vectors we can calculate the cosine of the angle between them. Based on my 72 hours of frantic reading on this topic, this is apparently a good measure of how similar the information in two documents is. A small angle (a cosine close to 1) means that the two documents have many words in common that are otherwise relatively rare in the context of the wiki. When you compute cosine similarities for every pair of documents in a collection, you get a matrix of cosines, a relative score of how close each pair is informationally.
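The cosine step can be sketched the same way. This hypothetical Python snippet treats each page's TF*IDF weights as a sparse vector (a dict of word -> weight) and computes the cosine for every pair of pages; the function names are illustrative only.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dicts word -> weight)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a page with no weighted terms is similar to nothing
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Cosine for every unordered pair of pages, stored in both directions."""
    matrix = {page: {} for page in vectors}
    for a, b in combinations(vectors, 2):
        score = cosine(vectors[a], vectors[b])
        matrix[a][b] = score
        matrix[b][a] = score
    return matrix


if __name__ == "__main__":
    vectors = {
        "a": {"coconut": 2.0, "island": 1.0},
        "b": {"coconut": 1.0, "island": 0.5},
        "c": {"lecture": 3.0},
    }
    m = similarity_matrix(vectors)
    print(m["a"]["b"], m["a"]["c"])
```

Because every pair is compared, the work grows as n(n-1)/2: 58,653 pairs for 343 notes, which is one reason to precompute the matrix in the background rather than at page-view time.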

I used Perl modules to do this for a collection of 343 research notes. It took about five minutes for the computer to create the matrix and then print out HTML pages with the content of each note and a list of related pages. I think it stood up really well. All of these notes also had a list of keywords in a separate field, and for a given note, the ranking of pages by cosine similarity of shared tags and the ranking by shared words were actually pretty close. If anything, the ranking of relevance based on the full text seemed better than my tagging of the documents. It brought together notes about the same people, which was not something I used as a keyword. Proper names are relatively rare, so they get a strong weighting. I found stuff I had forgotten about, or hadn't scrupulously tagged when I wrote it. It was awesome!
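A quick worked example shows why rare words such as proper names dominate. It assumes the common form idf = ln(N/df) (the Perl modules may well have used a variant) and takes hypothetical document frequencies of 3 notes for a name and 300 notes for an everyday word, in a collection of N = 343 notes:

```latex
\mathrm{idf}(t) = \ln\frac{N}{\mathrm{df}(t)}, \qquad N = 343

\mathrm{idf}(\text{name}) = \ln\frac{343}{3} \approx 4.74,
\qquad
\mathrm{idf}(\text{everyday word}) = \ln\frac{343}{300} \approx 0.13
```

So each occurrence of the name counts for roughly 35 times as much as each occurrence of the everyday word.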

But a lot of computational power was being applied. So, much as the indexer web bug silently updates the full text index idx files, that would be the better way to implement this kind of search engine in DW. indexer.php updates the existing idx files with new data on frequency and location. A modified indexer, if it does not do so already, could also update another set of records holding the vector of term weights for each page. When it has a free moment, it could compute the cosine similarity matrix and write it to another idx file.
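One way a modified indexer could split the work, sketched in Python under stated assumptions: store raw term counts per page, hold the IDF weights fixed between full rebuilds, and when a single page is re-indexed, recompute only that page's row and column of the matrix. The full rebuild is the "free moment" job. The class and method names are hypothetical and not part of DokuWiki.

```python
import math


class RelatedPagesIndex:
    """Hypothetical store a modified indexer might keep next to the .idx files."""

    def __init__(self):
        self.counts = {}   # page ID -> {word: occurrences}
        self.idf = {}      # word -> IDF weight as of the last full rebuild
        self.sims = {}     # page ID -> {other page ID: cosine}

    def _vector(self, page):
        # Words unseen at the last rebuild get weight 0 until the next rebuild.
        return {w: c * self.idf.get(w, 0.0) for w, c in self.counts[page].items()}

    @staticmethod
    def _cosine(u, v):
        dot = sum(x * v.get(t, 0.0) for t, x in u.items())
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def update_page(self, page, word_counts):
        """Called when one page is (re)indexed: refresh only its row and column."""
        self.counts[page] = dict(word_counts)
        mine = self._vector(page)
        self.sims[page] = {}
        for other in self.counts:
            if other != page:
                score = self._cosine(mine, self._vector(other))
                self.sims[page][other] = score
                self.sims.setdefault(other, {})[page] = score

    def rebuild(self):
        """The 'free moment' job: recompute IDF and the whole matrix."""
        n = len(self.counts)
        df = {}
        for words in self.counts.values():
            for w in words:
                df[w] = df.get(w, 0) + 1
        self.idf = {w: math.log(n / d) for w, d in df.items()}
        pages = list(self.counts)
        self.sims = {p: {} for p in pages}
        for i, a in enumerate(pages):
            va = self._vector(a)
            for b in pages[i + 1:]:
                score = self._cosine(va, self._vector(b))
                self.sims[a][b] = self.sims[b][a] = score
```

The trade-off is staleness: a single-page update reuses the IDF from the last rebuild, so brand-new words score zero until the next full pass. In exchange, each page save costs only one row of comparisons, while the expensive all-pairs pass runs occasionally.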

Then, when a user browses a page that's been indexed, something like pagequery could run, except that its query would be based on these additional pieces of information from the full text indexes. Say we place the pagequery double-curly-brace syntax on every page, and say it can grab the ID of the current page. It could then look up this page in the cosine similarity matrix, sort the entries according to its criteria and return a list of other pages ranked by relevance. Each article would be accompanied by a list of 'related articles', their degree of relevance already having been calculated when the page content was added to the full text indexes.
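The lookup at page-view time is then cheap. A hypothetical helper, assuming the precomputed matrix is available as a dict of dicts (page ID -> {other page ID: cosine}); the scores in the example are invented:

```python
def related_pages(matrix, page_id, limit=5, min_score=0.0):
    """Return up to `limit` (page ID, score) pairs most similar to page_id.

    `matrix` is assumed to map page ID -> {other page ID: cosine}; pages
    scoring at or below `min_score` are left out.
    """
    row = matrix.get(page_id, {})
    candidates = [
        (other, score)
        for other, score in row.items()
        if other != page_id and score > min_score
    ]
    # Highest score first; ties broken alphabetically so the list is stable.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:limit]


if __name__ == "__main__":
    # Invented scores, purely for illustration.
    matrix = {
        "notes:coconuts": {
            "notes:islands": 0.62,
            "notes:lectures": 0.05,
            "notes:fieldwork": 0.31,
        }
    }
    print(related_pages(matrix, "notes:coconuts", limit=2))
    # → [('notes:islands', 0.62), ('notes:fieldwork', 0.31)]
```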

I will keep reading about this, and also try to look at indexer.php and fulltext.php to see what's really going on there. Are weights assigned to words? Is this information stored? Are there other theories of the informational relationships between documents that would be worth considering, from either a practical or a conceptual perspective?

Cheers,
Ryan
rschram #2
In reply to post ID 50166
Subject: Clarification
Having now read up on NLP, I have a new question. Is it possible to use the full text index to calculate the cosine similarity of two articles based on the tf-idf vectors of their contents? For all the articles in a wiki?

Cheers,
Ryan
rschram #3
Subject: Navigating pages with the full text index
Hi everyone,

A Dokuwiki has a full text index of its pages, which it uses to find search results. Are there other ways to make use of it? For instance, could one generate a list of pages by word?

I'm interested in finding or creating new ways to navigate through a wiki. Search is good when you know what you are looking for. Metadata like tags, namespaces and subject categories can give a structure to a wiki, but only if the authors have thought of that. A keyword index, like the one in a book, lets one get a quick overview of what's in a wiki, and perhaps see connections not yet considered.

If one pulled the complete word list out to make an index page, then of course it would be so long as to be useless, even in a small wiki of 100 pages. But in preparing the list it would be easy to exclude the most common words, and maybe not so hard to look for really uncommon words, boiling the list down to meaningful entries. Further processing could combine related words, e.g. plurals, conjugations, etc.
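A rough sketch of that boiled-down keyword index in Python, assuming you already have the words for each page (for example, read back out of the full text index). The 50% cut-off and the plural-folding rule are arbitrary placeholders; a real stemmer (e.g. Porter's) would also handle conjugations.

```python
from collections import defaultdict


def crude_stem(word):
    """Very rough plural folding ('islands' -> 'island'); not a real stemmer."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def keyword_index(page_words, max_df_ratio=0.5, min_df=1):
    """Hypothetical book-style index: word -> sorted list of page IDs.

    page_words maps page ID -> iterable of words on that page. Words found
    on more than max_df_ratio of all pages are treated as too common to help.
    """
    pages_by_word = defaultdict(set)
    for page, words in page_words.items():
        for word in words:
            pages_by_word[crude_stem(word.lower())].add(page)

    n_pages = len(page_words)
    return {
        word: sorted(pages)
        for word, pages in sorted(pages_by_word.items())
        if min_df <= len(pages) <= max_df_ratio * n_pages
    }
```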

In general, I'm seeing the limits of what I can add as an author of wiki documents, and I now want to develop or use tools that let me analyze what's there, or better yet, tools that generate various kinds of analysis automatically, which can then serve as new overviews and points of entry. The full text indexes seem like a really good place to start.

On that point, if DW maintains a full text index, are there date indexes too? That would be another good way to analyze the contents of a wiki.

I welcome your thoughts. I feel like I should mention that I only have a faint idea of how you could implement this in a plugin, but I've decided to use DW more, and train myself in extending it, so I'm thinking really big right now.

Cheers,
Ryan
topic: Smallest Federated Wiki and Dokuwiki (Is federation of dokuwikis possible or desired?) · in the forum: Community / dokuwiki.org
rschram #4
In reply to post ID 50153
Subject: Federation is a movement
I agree that once I started to get my head around it, it sounded more like a political statement than anything. Everyone should individually maintain their own copy of wiki pages because FREEDOM!

My attraction to it was twofold. One was simply that browsing through pages as side-by-side panels made sense to me. It made me think (1) this is kind of what people imagined hypertext would be like in the olden days, and (2) I could really use this as a way to read through my own wiki of notes, especially because linked (related) information is presented in adjacent frames. You can read down to go in depth on one topic, and scan back and forth for context. The other attraction was that my own use of wikis is mostly as a solo author producing text for an audience to read. For various reasons, I can't open it up to collaborative authoring, but for other projects I see myself needing something of a conversation among separate authors.

Ryan
rschram #5
Subject: Smallest Federated Wiki and Dokuwiki
Hi everyone,

I have just been reading about federated wikis, a new approach to wikis developed by Ward Cunningham, the creator of the wiki concept. A federated wiki, to the extent I understand it, is a collection of wiki instances that can draw upon each other's edits and changes, in the same way that distributed version control systems allow multiple contributors to edit different parts of a collection of projects, take up each other's changes and additions, and fork versions that differ substantially from the original. Any user of one member of a federated wiki can copy a page from the other wikis, edit and extend it, and incorporate it into their local library of wiki pages, where it is available to all as a separate, parallel version. While centralized wikis were first created to bring people together to contribute to one document or library, a federated wiki suggests that people can borrow from each other as they develop their own wikis.

More info here:

  • http://fed.wiki.org/view/welcome-visitors
  • http://www.wired.com/2012/07/wiki-inventor/

In the current implementation, readers can browse, edit and fork pages through a JavaScript client interface that presents a series of pages as parallel columns. Aside from the federation concept, I thought this was pretty appealing too, and it helped me understand the different sources delivering the individual pages.

I was wondering if people have thought about bringing some of these ideas into Dokuwiki. Would it be possible for several Dokuwikis to share pages in this way? What would that require?

Cheers,
Ryan
topic: New Template Announcement: Benjamin (Announcing a new template based closely on Starter with a responsive multicolumn layout) · in the forum: General Help and Support / Templates and Layout
rschram #6
Subject: New Template Announcement: Benjamin
Hi everyone,

I'd like to announce a new template I've posted to the DW template directory, Benjamin: http://dokuwiki.org/template:benjamin. It's not all that original, but it fills some of my needs. It extends the Starter template in two main ways. First, it rearranges the main.php file so the page regions fall in a logical linear order, from most important to least important.[^1] Second, it puts @media queries into the structure.css and design.css files to create an adaptive layout: as the viewport widens, less important information moves up and over to the right in columns.

The one site using the template so far is my own project site, Sea of Islands: http://seaofislands.org.

It is most certainly a work in progress, and many people will probably find things that are missing, incomplete or inadequate. I'll be the first to admit I have no idea what I'm doing. I'm not even sure I set up my GitHub repository correctly, or whether it is configured correctly for remote installation and upgrading. But I figured, why not give it a go and see what happens? Feel free to chime in on any topic, either here or on GitHub.

Best wishes,
Ryan
http://rschram.org/

[^1]: http://accessibility.psu.edu/readorderhtml
topic: Anyone using DW/S5 for lectures and teaching? (New user is looking for experiences with DW and the S5 plugin(s) as a platform for slideshows for lectures.) · in the forum: General Help and Support / Plugins
rschram #7
Subject: Anyone using DW/S5 for lectures and teaching?
Hi everyone,

I'm an occasional user of DW, and I've recently returned to it to solve a problem. I am offering a first-year introductory course in anthropology for the first time this year. Part of teaching a class this big is that I have to create slides for lectures, as well as notes and study guides, and post it all online. I've resisted this for a long time. When I finally realized I needed it, rather than use PPT, I wanted to use an S5 approach, so that my on-screen slides could be posted as an outline later. When I found that DW had several S5-type plugins, I got excited, because I realized I could create a whole wiki of information on the subjects we cover, and my lecture notes could include links to more information and exploration of ideas.

The S5 plugins (S5, S5reloaded, Deckjs and others) all have different features, and none really seems to have everything I want. I can be flexible, since I'm not too hung up on slides in lecture as a teaching method anyway. It would be nice to be able to create good-looking PDF handouts of the slides to post to the University's Blackboard Learn site, since I think that will mollify people at my institution and placate some students. I am hosting the teaching wiki on my personal site, so I'm worried it will not be able to handle the load. What if I'm browsing my slides online in class and 100 students connect to the server at once to follow along?

I'm curious if other people on the forum have experiences with using DW and S5 software for teaching. Do you have suggestions for me?

Sincerely,
Ryan Schram
topic: Wish: User-agent checker (A plugin that checks the user agent and displays an error message via html_msgarea() for nonstandard browsers.) · in the forum: General Help and Support / Plugins / Plugin Wishlist
rschram #8
Subject: Wish: User-agent checker
Would it be possible to create a function that triggers an error to be displayed in the template's HTML message area (html_msgarea()) when a user's browser does not comply with W3C standards?

What would the steps in that process be?

Ryan

PS: This is half serious. I know that all browser makers consider their browsers 100% W3C-compliant, and different tests give different compliance results, so any standard applied would be the plugin owner's own creation. I just thought it would be interesting to treat the user's browser the same way you treat any other user-submitted information, like an invalid page name or an incorrect password, and notify them in the same way, rather than accommodating them. You could even give red cards to out-of-date browsers and yellow cards to iffy ones, each with a link to an upgrade site.
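The classification step of such a plugin might look like this Python sketch. The browser names and version thresholds are placeholders rather than any real compliance standard (which, as the PS says, the plugin owner would have to invent), and User-Agent strings are easy to spoof or misread. An actual DokuWiki plugin would still need to do this in PHP and queue the resulting message so that html_msgarea() displays it.

```python
import re

# Hypothetical policy: (major version with no warning, version below which
# the warning becomes a red card). Placeholder numbers, not a real standard.
POLICY = {
    "Firefox": (60, 45),
    "Chrome": (70, 50),
}


def browser_card(user_agent):
    """Classify a User-Agent string as 'ok', 'yellow', 'red' or 'unknown'."""
    for name, (ok_from, red_below) in POLICY.items():
        match = re.search(rf"{name}/(\d+)", user_agent)
        if match:
            major = int(match.group(1))
            if major >= ok_from:
                return "ok"
            return "red" if major < red_below else "yellow"
    return "unknown"  # unrecognized browsers get no card at all


if __name__ == "__main__":
    ua = "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0"
    print(browser_card(ua))  # 52 is below 60 but not below 45 → yellow
```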
topic: What does the empty div element (class=clearer) do? · in the forum: General Help and Support / Templates and Layout
rschram #9
In reply to post ID 39476
Subject: More specifically...
What I meant to ask was: what is the purpose of the empty element with this style? Is it just a way of creating a specific presentation or design, or does it have some other function that I might not know about as a new user? Some templates don't have them. But I did find one use of it in the core files, which is why I asked.

Ryan
rschram #10
Subject: What does the empty div element (class=clearer) do?
Hi,

I am thinking about making a new template for my personal wiki, and I am wondering what, if anything, the empty div elements (class=clearer) do. Do they ever have content in them? Why are they used?

They seem to be presentational gap-fillers, like one-pixel transparent GIFs (ca. 1999). It makes sense to see them in the template file main.php, after the template for the main content box and in a few other spots. But then, as I was tracing the template functions back to the core functions, I found another use of it: inc/html.php, ln 239, in html_show() [1]. It looks like when html_show() is called on an empty file, it returns a boilerplate XHTML snippet, including some presentational elements. Is that right?

Cheers,
Ryan

[1]: http://xref.dokuwiki.org/reference/dokuwiki/_functions/htm…
topic: New user dreams of becoming a developer (An introduction) · in the forum: Community / User Introductions
rschram #11
Subject: New user dreams of becoming a developer
Hi everyone,

I'm Ryan Schram. I just downloaded Docuwiki *ahem* Dokuwiki and a few plugins, and now have a testing site set up on my local network. I'm planning on using it as a personal wiki for now, but I hope to build a public wiki for researchers in the future. I've been tinkering on and off with WordPress and MediaWiki, and years ago with Blosxom (a Perl blogging script). I have learned the basics of Perl and other scripting languages, but I've never done anything really sophisticated with them. I always wanted to find a project in which the software was well documented and not overly complicated, so I could actually learn something from it and maybe contribute to it in a meaningful way. The Dokuwiki code is so literate that I feel I can learn a lot from reading it.

Cheers,
Ryan
This board is powered by the Unclassified NewsBoard software, 20150713-dev, © 2003-2015 by Yves Goergen
Current time: 2019-05-25, 05:36:10 (UTC +02:00)