Syntax support for Txt2tags
taking DW to the next level in interoperability
HansBKK #1
Member since Nov 2011 · 104 posts · Location: Bangkok
Group memberships: Members
Subject: Syntax support for Txt2tags
My outfit has standardized all its docs-related data on txt2tags markup syntax at the "snippet / chunk / node / posting / discussion" level. Once a given topic/dataset needs to be structured, it can be automatically transformed into AsciiDoc, which is the canonical format at the "article" through "book" levels (with parts / chapters / sections / subsections in between). Via DocBook XML, this format is easily output to just about all structured publishing formats, including both chunked and single static (X)HTML, HTMLhelp, EPUB, LaTeX, PDF etc.

We envision using DW as the "container / organizer" middleware to handle collaborative contributions, both within the group and with our various outside communities as well.

Ironically txt2tags outputs to DW syntax, so it's easy to output from our various editing/storage tools **to** DW, but I haven't found anything to go back the other way, so merging in new/edited nodes is currently very manual; even if the content doesn't change, the format changes trigger diff flags in merge tools.

We selected txt2tags over markdown because of its native ability to transform to so many targets, as well as its (IMO) more unobtrusive appearance when viewing plaintext source raw - its syntax is **very** close to DW's.

We've had a look at the Creole and Markdown plugins, but would still need to get a transform path back to txt2tags, and obviously would prefer to work with more current trunk/release versions of DW.

Thanks for your consideration.

PS As an aside, I would think DokuWiki's FOSS cred as well as corporate marketability would benefit from getting away from the "every wiki tool has its own syntax" syndrome, allowing for greater interoperability with other apps in your user-base's toolchains. It would also potentially eliminate the need for separate "publish website to xyz format" code.

PPS further background here: http://forum.dokuwiki.org/post/29467
This post was edited on 2011-12-10, 05:39 by HansBKK.
ach (Administrator) #2
Member since May 2006 · 1334 posts · Location: London, UK
Group memberships: Administrators, Members, Super Mods, Wiki Managers
Quote by HansBKK on 2011-12-10, 05:18:
Ironically txt2tags outputs to DW syntax, so it's easy to output from our various editing/storage tools **to** DW, but I haven't found anything to go back the other way [...]

We selected txt2tags over markdown because of its native ability to transform to so many targets [...]

We've had a look at the Creole and Markdown plugins, but would still need to get a transform path back to txt2tags [...]

There is a reason why all of those tools (txt2tags, MultiMarkdown, Pandoc, etc) export into many formats, but rarely import from any other format. It's generally easier to write one parser and let it output things in different ways than to write several parsers.

So, I would advise against adding a different syntax to DokuWiki (into the core or via a plugin). I'd rather recommend writing an *export functionality* instead. There are already a few export plugins and writing one that exports to txt2tags is much easier than writing an additional parser.
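
A minimal sketch of the "one parser, many outputs" point, in Python. The toy inline syntax, the token names and the two exporter tables are all made up for illustration; none of this is taken from txt2tags, Pandoc or DokuWiki.

```python
import re
from typing import Dict, List, Tuple

# A token is (kind, text); kind is "bold", "italic" or "text".
Token = Tuple[str, str]

# Toy inline syntax assumed for this sketch: **bold** and //italic//.
_INLINE = re.compile(r"\*\*(.+?)\*\*|//(.+?)//")


def parse(source: str) -> List[Token]:
    """Parse the source once into a format-neutral token list."""
    tokens: List[Token] = []
    pos = 0
    for match in _INLINE.finditer(source):
        if match.start() > pos:
            tokens.append(("text", source[pos:match.start()]))
        if match.group(1) is not None:
            tokens.append(("bold", match.group(1)))
        else:
            tokens.append(("italic", match.group(2)))
        pos = match.end()
    if pos < len(source):
        tokens.append(("text", source[pos:]))
    return tokens


# Each extra output format is only a small template table.
# (No escaping of special characters is done; this is a sketch.)
EXPORTERS: Dict[str, Dict[str, str]] = {
    "html": {"bold": "<b>{}</b>", "italic": "<i>{}</i>", "text": "{}"},
    "markdown": {"bold": "**{}**", "italic": "*{}*", "text": "{}"},
}


def export(tokens: List[Token], target: str) -> str:
    """Render the same token list with the templates of one target."""
    templates = EXPORTERS[target]
    return "".join(templates[kind].format(text) for kind, text in tokens)


tokens = parse("a **b** //c//")
print(export(tokens, "html"))
print(export(tokens, "markdown"))
```

Adding a third output here means adding one more table, whereas accepting a second input syntax would mean writing a second `parse` function, which is the asymmetry described above.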

Quote by HansBKK on 2011-12-10, 05:18:
PS As an aside, I would think DokuWiki's FOSS cred as well as corporate marketability would benefit from getting away from the "every wiki tool has its own syntax" syndrome, allowing for greater interoperability with other apps in your user-base's toolchains. It would also potentially eliminate the need for separate "publish website to xyz format" code.

That's more a political question and much more complex.
The problem is that *there is no standard*. Creole is one attempt to create one, but it hasn't had much success so far. Yes, txt2tags would be a good choice, but so would a few others. Why choose txt2tags over Markdown or Textile, or even DokuWiki syntax for that matter?
Even if there were one authority which "decides" the standard to use, it would only be useful if all the main players implemented it...
HansBKK #3
Member since Nov 2011 · 104 posts · Location: Bangkok
Group memberships: Members
Quote by ach:
[...] easier to write one parser and let it output things in different ways than to write several parsers [...] I'd rather recommend writing an *export functionality* instead. There are already a few export plugins and writing one that exports to txt2tags is much easier than writing an additional parser.

Great, thanks for that. I've been in touch with one of the main txt2tags contributors and gotten positive feedback about the idea of writing a plugin, I'll make sure he sees this.

Quote by ach:
Why choose txt2tags over Markdown or Textile, or even DokuWiki syntax for that matter? Even if there were one authority which "decides" the standard to use, it would only be useful if all the main players implemented it...

Yes, there is no "one markup to rule them all"; however, to my mind, choosing **any** standardized format is better than continuing the current Tower of Babel scenario. The "main players" are usually the last to fall in with any movement toward standards, but adopting one can give the smaller players a key competitive advantage with decision-makers who value long-term archiving of data in open formats and interoperability.

Important criteria for choosing a syntax - for me:
  • Active community of users and developers.
  • Supported output paths to the most important targets (HTML single vs chunked, HTMLhelp, the TeX family for pre-processing to pretty-print, then PDF, EPUB/mobi for ebooks and, for larger structured texts, DocBook (preferably via AsciiDoc)).
  • Source text remains readable, as "natural looking" as possible.
ach (Administrator) #4
Member since May 2006 · 1334 posts · Location: London, UK
Group memberships: Administrators, Members, Super Mods, Wiki Managers
Quote by HansBKK:
Yes there is no "one markup to rule them all", however to my mind, choosing **any** standardized format is better

Yes, that's sort of true. Someone definitely needs to make the first step...
I would rather go for Creole then, as that's a markup designed specifically with that scenario in mind: to create "one (wiki) markup to rule them all". But I can see how people would prefer others when they need it for non-wiki scenarios.

Quote by HansBKK:
Important criteria for choosing a syntax - for me:
[...]
Supported output paths to the most important targets (HTML single vs chunked, HTMLhelp, the tex family for pre-processing to pretty-print, then PDF, EPUB/mobi for ebooks and for larger structured texts, DocBook (preferably via AsciiDoc)).

That shouldn't be a reason for choosing a *syntax*. The possibility to export into different formats has nothing to do with a syntax itself.
Yes, it helps if it is implemented with a parser that makes it easier to export into different formats. And when the syntax is popular (or has a dedicated developer who cares for the exporting feature), it will automatically get a lot of export options.

For example, the only reason why DokuWiki hasn't got export options similar to txt2tags is that no-one has developed them yet. ;-)
HansBKK #5
Member since Nov 2011 · 104 posts · Location: Bangkok
Group memberships: Members
Quote by ach:
That shouldn't be a reason for choosing a *syntax*. The possibility to export into different formats has nothing to do with a syntax itself.

Yes, it helps if it is implemented with a parser that makes it easier to export into different formats. And when the syntax is popular (or has a dedicated developer who cares for the exporting feature), it will automatically get a lot of export options.

For example, the only reason why DokuWiki hasn't got export options similar to txt2tags is that no-one has developed them yet.

Txt2tags (and reST) were created specifically as software projects that, as part of their implementation, created their own syntax. Pandoc, I believe, is the only big player that started off by using only pre-existing syntaxes as input (markdown + extensions, now reST and others).

Those of us who aren't programmers depend on what's already out there on the virtual shelf, but even for coders I would think choosing a syntax with wide interoperability tools already in place would make sense.

This whole idea of "Wiki" vs "non-Wiki" worlds doesn't resonate with me at all - to me, wikis are an excellent way to allow for collaborative editing of text. Surely in the end it's the content (data) that's important - isn't DW's focus supposed to be supporting documentation, and unlike many other FOSS tools supportive of corporate environments? Many many people only connect to the Internet occasionally, sometimes they are the majority of the audience you're trying to reach. Also, often the purpose of the content itself requires good usability in the offline world.

Anyway, as I said, even Creole would be a step in the right direction toward "open content", and your comments about an exporter meeting my needs were very helpful; it doesn't actually matter to me what the wiki-users use to edit the content as long as it backs into a usable format.
ach (Administrator) #6
Member since May 2006 · 1334 posts · Location: London, UK
Group memberships: Administrators, Members, Super Mods, Wiki Managers
Quote by HansBKK:
Those of us who aren't programmers depend on what's already out there on the virtual shelf
Yes, that's definitely a point.

Quote by HansBKK:
even for coders I would think choosing a syntax with wide interoperability tools already in place would make sense.

Not necessarily. In an ideal world, I think it would be best if all syntaxes had the option to export to one specific syntax which would be supported by all the others. I can imagine that any XML-based syntax (e.g. DocBook) would be best for that. I cannot imagine that a "simple" syntax (like txt2tags) would be good because it wouldn't support many extra features some syntaxes might have.
A simple syntax is good for users to learn and use, but it's not good for "re-translating" one syntax to the other. If you "translate" from one complex syntax into a simple one which translates it to another more complex one, certain things are bound to get lost.
(Although I'm not an expert and might be wrong. Maybe the amount of information that gets lost is not so different from a re-translation through a more complex syntax?)

All in all, these are two different problems:
  • How to choose the easiest, most usable syntax that most users are happy with?
  • How to choose the best format to make importing and exporting between different systems the easiest?

For an ideal solution, I believe, those two cannot be the same. If you're happy with a subset of possible text formattings, then it could be the same (and txt2tags looks good for that purpose).
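
A small Python sketch of the "things get lost" argument. The style names and the two feature sets are hypothetical and do not describe any real markup. A document is a list of (style, text) spans, and a converter downgrades any style the target cannot express to plain text.

```python
from typing import List, Set, Tuple

Span = Tuple[str, str]  # (style, text)

# Hypothetical feature sets, chosen only for illustration.
RICH_STYLES: Set[str] = {"plain", "bold", "italic", "underline", "footnote"}
SIMPLE_HUB_STYLES: Set[str] = {"plain", "bold", "italic"}


def convert(spans: List[Span], supported: Set[str]) -> List[Span]:
    """Downgrade every style the target cannot express to plain text."""
    return [(style if style in supported else "plain", text)
            for style, text in spans]


def lost_styles(original: List[Span], result: List[Span]) -> Set[str]:
    """Return the styles of the original that did not survive conversion."""
    return {o[0] for o, r in zip(original, result) if o[0] != r[0]}


doc: List[Span] = [
    ("bold", "Note"),
    ("underline", "important"),
    ("footnote", "see p. 3"),
]

# Rich source -> simple hub -> rich target.
via_simple = convert(convert(doc, SIMPLE_HUB_STYLES), RICH_STYLES)
# Rich source -> rich hub -> rich target.
via_rich = convert(convert(doc, RICH_STYLES), RICH_STYLES)

print(sorted(lost_styles(doc, via_simple)))
print(sorted(lost_styles(doc, via_rich)))
```

In this toy model the hub only preserves what it can express itself, so a hub that covers a subset of the source's features is enough exactly when the documents stay within that subset.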

Quote by HansBKK:
This whole idea of "Wiki" vs "non-Wiki" worlds doesn't resonate with me at all
That's good. :) I actually feel the same, but assumed that most other people distinguish more between the two.
HansBKK #7
Member since Nov 2011 · 104 posts · Location: Bangkok
Group memberships: Members
Quote by ach:
I can imagine that any XML-based syntax (e.g. DocBook) would be best for that.

Yes! I also feel that's the most mature and stable format, most suitable for long-term archiving of longer works which have had work put into "structuring" them. However, it's not a good format for normal humans to actually work in. I personally think AsciiDoc is the way to go for that: it gives readable diffs and is therefore practical to store in a VCS.

Quote by ach:
I cannot imagine that a "simple" syntax (like txt2tags) would be good because it wouldn't support many extra features some syntaxes might have. A simple syntax is good for users to learn and use, but it's not good for "re-translating" one syntax to the other.

But each piece is just one part of the toolchain; there isn't going to be "one tool to rule them all" to handle all your needs, which of course will be different from mine. The point is selecting tools that enable interoperability and allow us to create automated and integrated knowledge management toolchains.

Regarding this particular issue, the key for me is to define an information architecture taxonomy and use the right syntax for the job. A fundamental distinction for me is between the 99% of content that is relatively unstructured "chunks" and "snippets" of useful reference information that need to be readily accessible, and everything above that. The next level up is the "article", composed of structured "sections" and "subsections", with (if substantial) a TOC, perhaps an index and some cross-referencing, footnotes etc. "Books", which can be "volumes" in a "set" and can have "parts", almost certainly have "chapters", which are then broken down into sections as with articles above.

At the "chunk/snippet" level through to "section" and "chapters", the simpler syntax allows for all the inline formatting required.

It's only when a given writer or group wants to go to the trouble of assembling these into larger more formal works that the higher-level structuring, indexing etc needs to be added. My current thinking is the easier low-level syntax (txt2tags) for the 99% of data that remains at the chunk/snippet level, and convert "up" to AsciiDoc for the works in the process of being structured more formally perhaps for "publication" whatever that might mean in a given context.

Obviously any decent programming editor, usually vim or emacs (org mode!), can also be programmed to process such content - and a fundamental goal of mine is to always keep the data, even while it's "in process" (which it always is) open and accessible to these basic tools.

However there are two tools I'm currently investigating that add more value out of the box. One is the Python outliner/programming editor Leo (search Leo-editor), and the other is DokuWiki. The former has great potential due to its flexible data model in handling multiple "views" of content via "cloned subtrees" and its data model's integration with scripting (I wish I were a programmer!). Coming from Python, its native syntax for markup is reST/Sphinx, which is relatively "open" for transformation via the Pandoc project.

However it's basically a tool for individuals rather than groups, and that's where DokuWiki comes in, enabled by its "transparent container" data model. I can of course keep txt2tags or reST or markdown or whatever syntax files in DW, but when seeking collaboration from the larger group, many of whom are non-technical, it would of course be better if they could just deal with the normal UI rather than having to learn the syntax.

Sorry to go on, but I'm hoping this conversation will allow for continued cross-fertilization of ideas...
farvardin #8
Member since Dec 2012 · 1 post · Location: France
Group memberships: Members
Hello, I've created such a plugin:
https://www.dokuwiki.org/plugin:txt2tags

The parser is the already existing PHP implementation of txt2tags, while the plugin itself comes 99% from the work done for the markdownextra plugin.

I agree with HansBKK, there should exist a Rosetta Stone for lightweight markup languages. This common point wouldn't itself need to be a lightweight markup, but it should be able to handle all possible syntax, so that it would make import and export easier. It would also be neutral, unlike markdown, which is found everywhere and has several flaws.

Creole or markdown couldn't be good candidates for this purpose because they lack support for specific cases (creole can't underline) or rely on HTML for syntax they don't have (for strikethrough, markdown uses <del> instead of --word--).

txt2tags is great because you can add regex for adding extra syntax, and it will always work this way (for example if you don't like txt2tags by default exports **bold** to <b>, you can make a rule so it can export to <strong> instead).
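
To illustrate the kind of rule meant here, a plain-Python sketch of a regex post-processing step applied to already generated HTML. It does not use txt2tags itself; the rule table and the `postprocess` helper are hypothetical names, and the real filter syntax is described in the txt2tags documentation.

```python
import re
from typing import List, Tuple

# A rule is (regex pattern, replacement), applied to the exported text.
Rule = Tuple[str, str]

# Hypothetical rule table: turn <b>...</b> into <strong>...</strong>.
POSTPROC_RULES: List[Rule] = [
    (r"<b>(.*?)</b>", r"<strong>\1</strong>"),
]


def postprocess(output: str, rules: List[Rule]) -> str:
    """Apply each regex rule in order to the exported text."""
    for pattern, replacement in rules:
        output = re.sub(pattern, replacement, output)
    return output


print(postprocess("<b>bold</b> and <b>more</b>", POSTPROC_RULES))
```

Because the rules run on the output text rather than inside the parser, changing how an existing construct is exported does not require touching the parser at all.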

Anyway, txt2tags alone wouldn't be sufficient as a common language. Even if XML is hideous, it could serve this purpose, provided there were only one way to handle each specific case.
HansBKK #9
Member since Nov 2011 · 104 posts · Location: Bangkok
Group memberships: Members
This is great news, and I hope to have an excuse to make the time to check this out, thanks for your contribution.

Regarding the "Rosetta Stone", I have great hope for Pandoc; it supports just about every mainstream syntax, and LOTS of great output options. However, it supports neither DW's syntax nor txt2tags, and customization requires programming in Haskell.

Edit - note the "Pandoc native" syntax is an extended markdown, but internally it uses a document representation that can be exposed from the command line as JSON data.
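
As a rough illustration: Pandoc can write its internal document tree as JSON (`pandoc -t json`), and any language with a JSON library can then walk it. The Python sketch below does not call Pandoc; `SAMPLE` is a small hand-written document in the shape that output takes, so treat the exact layout (which can differ between Pandoc versions) as an assumption. `node_types` is a hypothetical helper.

```python
import json
from typing import Any, List

# Hand-written sample in the general shape of Pandoc's JSON output.
# Assumption: nodes are objects with a type tag "t" and contents "c".
SAMPLE = """
{"pandoc-api-version": [1, 22],
 "meta": {},
 "blocks": [
   {"t": "Para", "c": [
     {"t": "Str", "c": "Hello"},
     {"t": "Space"},
     {"t": "Strong", "c": [{"t": "Str", "c": "world"}]}
   ]}
 ]}
"""


def node_types(node: Any) -> List[str]:
    """Collect the type tag of every node, depth first."""
    found: List[str] = []
    if isinstance(node, dict):
        if "t" in node:
            found.append(node["t"])
        for value in node.values():
            found.extend(node_types(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(node_types(item))
    return found


types = node_types(json.loads(SAMPLE)["blocks"])
print(types)
```

A converter for a syntax Pandoc does not read could in principle emit this JSON and leave the output side to Pandoc, without any Haskell; whether that is practical would need testing.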

Also, the author of the most excellent Zim desktop tool ("zim-wiki"), which uses a syntax quite similar to DW's, has implemented an "export to Pandoc" plug-in, and has expressed an intention to allow that and perhaps other syntaxes to be used internally via a yet-to-be-coded pluggable architecture.
This post was edited on 2012-12-17, 03:19 by HansBKK.