Doxter: A Docs from Sources Generator.

Welcome to Doxter, a DRY documentation-from-source generator. It leverages Asciidoctor tagged regions to allow content reuse across documents via selective inclusions, and a custom weights-based system to control the order in which regions are arranged in the final document.

Written in PureBasic for PureBasic, SpiderBasic and Alan IF (more languages coming soon).

https://github.com/tajmone/doxter

About Doxter

Doxter was conceived as a practical solution to simplify management of source code documentation. Specifically, its birth and growth are tied to the development of prototype tools for the PureBasic-CodeArchiv-Rebirth project, which has challenged me to keep the documentation of multiple modules always up to date with their current code.

Working on separate documentation and source code files is both tiring and a cause of redundancy — you’ll need to include some documentation in the source file comments, and you also need to include some code excerpts in the documentation. Why duplicate the effort when you can keep it all in one place?

Documentation generators are not a new idea (and surely not my original idea either); there are plenty of such tools and frameworks out there, but most of them are not language agnostic, don’t integrate well with PureBasic, or require a complex setup involving lots of dependencies.

Doxter was originally designed to work with PureBasic, leveraging the power of AsciiDoc and with simplicity in mind. It now supports SpiderBasic and Alan IF sources too and, ultimately, it will become a language agnostic tool usable with almost any language.

Who Needs Doxter?

Any PureBasic or SpiderBasic programmer who knows AsciiDoc and wants to include documentation of his/her code directly in the source files can benefit from Doxter by automating the task of producing always up-to-date documentation in various formats (HTML5, man pages, PDF, and any other output format supported by Asciidoctor backends).

Acknowledgements

Although quite different in design, Doxter was inspired by Lou Acresti’s Cod, an unassuming doc format (Documentus modestus). The simplicity of Cod sparked the idea that I could implement something similar, but exploiting AsciiDoc tagged regions instead.

My gratitude to Lou Acresti (aka @namuol) for having created such an inspiring tool as Cod.

Doxter Engine

At the core of the command line Doxter tool lies the Doxter Engine, which is available as an independent module that can be used by other applications too.

For more information see the Doxter Engine API Documentation.

Features

Doxter is a command line tool that parses a source file and extracts tag-delimited regions of code from it. These regions are then processed according to some very simple rules in order to produce a well-formed AsciiDoc source document, which can then be converted to HTML via Asciidoctor (Ruby).

Cross Documents Selective Inclusions

Every tagged region in the source file becomes an AsciiDoc tagged region in the output document. The following PureBasic source comments contain a simple Doxter region:

;>
;| I'm a Doxter _region_.
;<

... which, in the final document, Doxter will render as AsciiDoc:

// tag::region1[]
I'm a Doxter _region_.

// end::region1[]

Regions can be named in the source file, by providing an identifier after the ;> marker, allowing you to control regions' tag names in the AsciiDoc output:

;>intro
;| == Introduction
;|
;| This is a _named_ region.
;<

// tag::intro[]
== Introduction

This is a _named_ region.

// end::intro[]
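
The mapping from Doxter markers to AsciiDoc tag lines can be pictured with a small Python sketch. This is not Doxter's code (Doxter is written in PureBasic): the function name `region_to_adoc` is hypothetical, and the sketch assumes that unnamed regions are numbered `region1`, `region2`, and so on, and that a blank line precedes the closing tag, as in the sample output above. It handles comment-only regions and ignores weights.

```python
def region_to_adoc(lines, comment=";"):
    """Convert comment-only Doxter regions into AsciiDoc tagged regions."""
    out, tag, unnamed = [], None, 0
    for line in lines:
        if line.startswith(comment + ">"):
            tag = line[len(comment) + 1:].strip()
            if not tag:  # assumption: unnamed regions get a running number
                unnamed += 1
                tag = f"region{unnamed}"
            out.append(f"// tag::{tag}[]")
        elif line.startswith(comment + "|") and tag:
            # Strip the marker and one optional space after it.
            text = line[len(comment) + 1:]
            out.append(text[1:] if text.startswith(" ") else text)
        elif line.startswith(comment + "<") and tag:
            out.extend(["", f"// end::{tag}[]"])
            tag = None
    return out


source = [";>intro", ";| == Introduction", ";|", ";| A _named_ region.", ";<"]
print("\n".join(region_to_adoc(source)))
```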

This is a very practical feature for it allows other AsciiDoc documents to selectively include parts of a source file documentation using tag filtering.

For example, when documenting an application that relies on imported Modules, the main document can selectively include regions from the Doxter-generated modules' documentation. This makes it possible to maintain independent documentation for every Module API, as well as a main document that extracts parts from the modules' docs in a self-updating fashion.

Ordo ab Chao: Structured Docs from Scattered Comments

Each tagged region in the source file can be assigned a weight, so that in the final document the regions will be reordered in a specific way, forming a well structured document that presents contents in the right order.

;>sec1(200)
;| == Section One
;|
;| And this is Sec 1.
;<
For i= 1 To 10
  Debug "i = " + Str(i)
Next
;>premise(100)
;| == Premise
;|
;| This is an opening premise.
;<

// tag::premise[]
== Premise

This is an opening premise.

// end::premise[]
// tag::sec1[]
== Section One

And this is Sec 1.

// end::sec1[]
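
The reordering shown above amounts to a sort on the region weight. The following Python sketch illustrates the idea under stated assumptions: the dictionary layout and the name `order_by_weight` are hypothetical, and keeping source order for equal weights is an assumption, not something the documentation specifies.

```python
def order_by_weight(regions):
    """Return regions sorted by ascending weight; ties keep source order."""
    return sorted(regions, key=lambda region: region["weight"])


# Regions in the order they appear in the source file.
regions = [
    {"tag": "sec1", "weight": 200},
    {"tag": "premise", "weight": 100},
]
ordered_tags = [region["tag"] for region in order_by_weight(regions)]
print(ordered_tags)
```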

This feature lets you keep each paragraph near the code lines that it discusses, making the source code more readable and freeing the documentation from the constraints imposed by the order in which the code is organized.

Furthermore, regions with the same tag name in the source code will be merged into a single region in the final document. Each region fragment (aka subregion) can be assigned a subweight, which will be used to sort the fragments before merging them together. This allows you to control the number of regions in the final document, and keep related topics under the same region.

In the following example:

;>even_macro_intro(.2)
;| The following macro performs a bitwise AND operation to determine if an
;| integer is even or not.
Macro IsEven(num)
  (num & 1 = 0)
EndMacro
;<

;>macro_test(200)
;| Let's test that the macro actually works as expected.
For i = 1 To 5
  If isEven(i)
    Debug Str(i) +" is even."
  Else
    Debug Str(i) +" is odd."
  EndIf
Next
;<

;>even_macro_intro(100.1)
;| === The IsEven Macro
;|
;| Using bitwise operations instead of modulo (`%`) is much _much_ faster -- in
;| the order of hundreds of times faster!
;<
;>even_macro_intro(.3)
;| This works because `IsEven = ((i % 2) = 0)` equals `IsEven = ((i & 1) = 0)`.
;<

... all the regions named even_macro_intro are merged into a single region after being sorted according to their subweights (.1, .2 and .3):

// tag::even_macro_intro[]
=== The IsEven Macro

Using bitwise operations instead of modulo (`%`) is much _much_ faster -- in
the order of hundreds of times faster!

The following macro performs a bitwise AND operation to determine if an
integer is even or not.

[source,purebasic]
--------------------------------------------------------------------------------
Macro IsEven(num)
  (num & 1 = 0)
EndMacro
--------------------------------------------------------------------------------


This works because `IsEven = ((i % 2) = 0)` equals `IsEven = ((i & 1) = 0)`.

// end::even_macro_intro[]
// tag::macro_test[]
Let's test that the macro actually works as expected.

[source,purebasic]
--------------------------------------------------------------------------------
For i = 1 To 5
  If isEven(i)
    Debug Str(i) +" is even."
  Else
    Debug Str(i) +" is odd."
  EndIf
Next
--------------------------------------------------------------------------------


// end::macro_test[]

Keep your comments next to the code they belong to, so that the source file follows its natural course and provides meaningful snippets of in-code documentation, and use weighted tagged regions to ensure that these out-of-order fragments are collated in a meaningful, progressive order in the output document.
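
The merging behaviour can be sketched in Python as a grouping step followed by two sorts. This is an illustration, not Doxter's implementation: the tuple layout and the name `merge_fragments` are hypothetical, the subweight `1` given to `macro_test` is an assumption (its marker specifies none), and regions that never receive a weight are assumed to default to weight 1.

```python
from collections import defaultdict


def merge_fragments(fragments):
    """Merge (tag, weight, subweight, text) fragments into ordered regions.

    weight may be None for fragments that only give a subweight, e.g. (.2).
    """
    weights, parts = {}, defaultdict(list)
    for tag, weight, subweight, text in fragments:
        if weight is not None:
            weights[tag] = weight  # a later definition overrides an earlier one
        parts[tag].append((subweight, text))
    merged = []
    # Sort regions by weight, then each region's fragments by subweight.
    for tag in sorted(parts, key=lambda t: weights.get(t, 1)):
        texts = [text for _, text in sorted(parts[tag], key=lambda p: p[0])]
        merged.append((tag, texts))
    return merged


fragments = [
    ("even_macro_intro", None, 2, "The following macro ..."),
    ("macro_test", 200, 1, "Let's test ..."),
    ("even_macro_intro", 100, 1, "=== The IsEven Macro"),
    ("even_macro_intro", None, 3, "This works because ..."),
]
merged = merge_fragments(fragments)
for tag, texts in merged:
    print(tag, texts)
```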

Mix Text and Source Code in Your Documentation

Regions can contain both AsciiDoc comment markers and source code, which lets you include fragments of the original source code in the final documentation, along with AsciiDoc text.

AsciiDoc markers are comment lines with special symbols after the native language comment delimiters, which will be treated as normal comments by the source language, but which Doxter will strip of the comment delimiter and turn into AsciiDoc lines in the output document.

Any source code (i.e. non-AsciiDoc comments) inside a tagged region will be rendered in the final document as an AsciiDoc source code block set to the source language (e.g. PureBasic).

;>macro_test(200)
;| Let's test that the macro actually works as expected.
For i = 1 To 5
  If isEven(i)
    Debug Str(i) +" is even."
  Else
    Debug Str(i) +" is odd."
  EndIf
Next
;<

// tag::macro_test[]
Let's test that the macro actually works as expected.

[source,purebasic]
--------------------------------------------------------------------------------
For i = 1 To 5
  If isEven(i)
    Debug Str(i) +" is even."
  Else
    Debug Str(i) +" is odd."
  EndIf
Next
--------------------------------------------------------------------------------


// end::macro_test[]
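
The way text and code alternate inside a region can be approximated with the Python sketch below. It is not Doxter's code: `render_region_body` is a hypothetical name, Skip comments are not handled, and the placement of blank lines around the source block is inferred from the sample output above rather than from a specification.

```python
def render_region_body(lines, comment=";", language="purebasic"):
    """Render the lines found between Region Begin and Region End markers."""
    fence = "-" * 80
    out, in_code = [], False
    for line in lines:
        if line.startswith(comment + "|"):
            if in_code:  # close the pending source block
                out.extend([fence, ""])
                in_code = False
            text = line[len(comment) + 1:]
            out.append(text[1:] if text.startswith(" ") else text)
        else:
            if not in_code:  # open a source block before the first code line
                out.extend(["", f"[source,{language}]", fence])
                in_code = True
            out.append(line)
    if in_code:
        out.extend([fence, ""])
    return out


body = [";| Let's test the macro.", "For i = 1 To 5", "Next"]
print("\n".join(render_region_body(body)))
```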

Command Line Options

To invoke Doxter via command prompt/shell:

doxter <sourcefile>

… where <sourcefile> is a source of one of the languages supported by Doxter (PureBasic, SpiderBasic or Alan).

Command Line Usage

Doxter is a binary console application (a compiled executable). There is no installation: just place the binary file (Doxter.exe on Windows) in your working folder, or make it available system-wide by adding its folder to the system PATH environment variable.

Input File Validation

Doxter checks that the passed <sourcefile> parameter has a valid extension, and then sets the Doxter Engine language accordingly:

`pb`, `pbi`, `pbf`	→ PureBasic
`sb`, `sbi`, `sbf`	→ SpiderBasic
`alan`, `i`	→ Alan

If the file extension doesn’t match any of the supported extensions, Doxter will report an error and abort with Status Error 1.

Doxter will also check that the file exists, that it’s not a directory, and that it’s not 0 Kb in size; it will abort with Status Error if any of these checks fails.
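
The validation steps above can be summarised in a short Python sketch. It is only an illustration: the function names and error strings are hypothetical, and matching extensions case-insensitively is an assumption not stated in this document.

```python
import os

# Extension-to-language table, as listed above.
LANGUAGES = {
    "pb": "PureBasic", "pbi": "PureBasic", "pbf": "PureBasic",
    "sb": "SpiderBasic", "sbi": "SpiderBasic", "sbf": "SpiderBasic",
    "alan": "Alan", "i": "Alan",
}


def detect_language(path):
    """Return the language for a file extension, or None if unsupported."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()  # assumption: case-insensitive
    return LANGUAGES.get(ext)


def validate_source(path):
    """Return (language, error); exactly one of the two is None."""
    language = detect_language(path)
    if language is None:
        return None, "unsupported file extension"
    if not os.path.exists(path):
        return None, "file not found"
    if os.path.isdir(path):
        return None, "path is a directory"
    if os.path.getsize(path) == 0:
        return None, "file is empty"
    return language, None
```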

Depending on whether or not the source file contains an AsciiDoc Header, the output file will be named either <sourcefile>.asciidoc or <sourcefile>.adoc, respectively. When parsing completes, Doxter will inform the user whether it found a Header, and print the output filename to the console.

This differentiation in the extension used for the output file is due to the conventions and needs of the PureBasic CodeArchiv project, where files with the .asciidoc extension are considered stand-alone documents, which are subject to script-automated conversion to HTML, whereas files with the .adoc extension are considered snippet files which are imported by other docs. Besides the different file extensions, both types of output files are formatted as standard AsciiDoc documents (with Asciidoctor Ruby in mind).

This is in line with the AsciiDoc standard, which demands the presence of a document Header in a source file for it to be buildable as a standalone doc, and with the common practice of splitting large documents into smaller files, which are then imported into the main document and therefore don’t need a Header of their own.
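
As a rough Python illustration of the naming rule described above: the helper name `output_filename` is hypothetical, and the sketch assumes that the source extension is replaced rather than appended to, which this document does not state explicitly.

```python
import os


def output_filename(source_path, has_header):
    """Pick the output name: .asciidoc with a Header, .adoc without."""
    base, _ = os.path.splitext(source_path)  # assumption: extension is replaced
    return base + (".asciidoc" if has_header else ".adoc")


print(output_filename("Doxter.pb", has_header=True))
print(output_filename("mod.pbi", has_header=False))
```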

Parsing Live Preview During Execution

During execution, Doxter will output to the console a preview of the parsed lines that belong to tagged regions, showing their AsciiDoc-processed version together with extra lines added by the parser (e.g. source code delimiters, blank lines, etc.).

Although the lines shown are just an approximation of the final document (the regions will be post-processed, merged and reordered before being written to file), this feature is very useful for visually tracing the source of problems when the output is not as intended, as the log provides a human-friendly insight into Doxter’s parser.

Here’s an example of what the console output looks like:

|0099|4100|   1|region tag, which would split the text in multiple paragraphs in the final (1)
|0100|4100|   1|document.
|0101|4100|   1|// end::Comments_Marks[] (2)
|    |4100|   1| (3)
|0169|4101|  10|// tag::CLI_Usage[] (4)
|0170|4101|  10|=== Command Line Options

1	Continuation lines of a region with weight `4100` and subweight `1`.
2	AsciiDoc tagged region `end::` generated by Doxter when it encountered a `;<` marker.
3	Blank line added by Doxter; note that there is no corresponding line number, for it is not found in the source file.
4	Region Begin marker found at line 169, with weight `4101` and subweight `10` (probably the continuation of a fragmented region).

There are four columns in the preview, representing the line number in the source file, the region’s weight, its subweight, and a preview of the line converted to AsciiDoc.

The absence of a line number in the first column indicates that what you are seeing on the right-hand side is a line generated by Doxter and added to the output document for formatting purposes (e.g. a blank line, source code block delimiters, etc.).

The weight column is very useful when reading the logged output, for it makes it easy to spot where regions start and end, as each region should have a different weight (although this is not mandatory). Header lines will always show the text head in the second and third columns, instead of numbers, because the Header has no weight or subweight.
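
If you want to post-process a captured preview log, its four columns can be split as in the Python sketch below. The function name `parse_preview_line` is hypothetical; the sketch assumes the pipe-delimited layout shown in the example above and returns None for non-numeric fields (a missing line number, or the head text of Header lines).

```python
def parse_preview_line(line):
    """Split one preview row into (line_no, weight, subweight, text)."""
    # Only the first four pipes are delimiters; the text itself may contain "|".
    _, line_no, weight, subweight, text = line.split("|", 4)

    def to_int(field):
        field = field.strip()
        return int(field) if field.isdigit() else None

    return to_int(line_no), to_int(weight), to_int(subweight), text


print(parse_preview_line("|0169|4101|  10|// tag::CLI_Usage[]"))
print(parse_preview_line("|    |4100|   1|"))
```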

Documenting Your Source Files

Now comes the juicy part: how to incorporate documentation into your source files. The good news is that the system employed by Doxter is very easy to learn and simple to use.

Doxter Markers Primer

Doxter decides which parts of your source code to treat as documentation by means of PureBasic’s comment delimiter (;) immediately followed by a character which, combined with the delimiter, forms one of Doxter’s markers:

Table 1. Doxter’s Base Markers
`;=`	ADoc Header	Marks beginning of doc Header. (first line only)
`;>`	Region Begin	Marks beginning of a tagged region.
`;|`	ADoc Comment	Treat line as AsciiDoc text.
`;~`	Skip Comment	The whole line will be ignored and skipped.
`;<`	Region End	Marks end of a tagged region.

SpiderBasic uses the same comment delimiter as PureBasic. As for Alan, just use its native comment delimiter (--) instead of ; (i.e. -->, --|, --~, etc.). The same applies to any other languages that will be supported in the future.

In PureBasic and SpiderBasic you can freely use the special comment marks (;{/;}/;-) within Doxter’s markers (e.g. ;{>, ;}~, ;-|, etc.) except in the ADoc Header marker, which can only be ;=. This allows you to create regions which are foldable in PureBasic’s and SpiderBasic’s native IDEs.

The Tagged Region End marker has an alternative syntax to prevent Doxter from adding an empty line after closing the region:

Table 2. Region Markers Modifiers
`;<<`	Unspaced Region End	Don’t add empty line after closing tag.

This is useful when splitting a paragraph across multiple regions, in order to keep its text lines next to the code they belong to. Without the extra < modifier, Doxter’s default behavior would be to add an empty line after the closing region tag, which would split the text into multiple paragraphs in the final document.
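
The marker tables above can be condensed into a small Python classifier. This is a sketch rather than Doxter's parser: the names `MARKERS` and `classify` are hypothetical, and the optional `{`, `}` and `-` marks are meant for PureBasic and SpiderBasic sources only.

```python
import re

MARKERS = {
    "=": "ADoc Header",
    ">": "Region Begin",
    "|": "ADoc Comment",
    "~": "Skip Comment",
    "<<": "Unspaced Region End",
    "<": "Region End",
}


def classify(line, comment=";"):
    """Return the Doxter marker name for a line, or None for ordinary lines."""
    # Delimiter, optional special comment mark ({, } or -), then the symbol.
    match = re.match(re.escape(comment) + r"([{}-]?)(<<|[=>|~<])", line)
    if not match:
        return None
    special, symbol = match.groups()
    if symbol == "=" and special:
        return None  # the Header marker accepts no special comment marks
    return MARKERS[symbol]


for sample in [";>intro", ";{>", ";-| text", ";<<", "; plain comment"]:
    print(repr(sample), "->", classify(sample))
```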

That’s about all you’ll have to learn: memorize those five base symbols, their variants and modifiers, and learn how to use them correctly and wisely.

Doxter is a “dumb” tool — it doesn’t try to interpret or validate what comes after these markers, it just uses them to delimit and manipulate lines from your source file according to some simple predefined rules. It’s your responsibility to ensure that the contents of the tagged regions are AsciiDoc compliant.

But as you shall see, these five simple markers give you great freedom to document your source code. Thanks to some simple rules based on common-sense expectations of how text and source code should blend in documentation, Doxter will parse your source files smartly, with little effort on your side.

Doxter’s Parser

Understanding how Doxter’s parser works will give you a clearer picture of how source files are processed, and insight into the proper use of its markers. You can read a full description of the parser’s workflow in the Doxter Parsers section of Doxter Engine’s documentation. For the time being, you should just bear in mind a couple of things.

Doxter uses a two-step parsing approach when processing documents:

Header Parser: Scans the first lines of the source file looking for an AsciiDoc Header. Whether or not it finds a Header, once it has finished its job the Header Parser hands control over to the Regions Parser.
Regions Parser: Scans the remainder of the source file looking for tagged regions to extract.

These are two different parsers altogether, and Doxter always runs the fed source file against both of them, in the exact order specified above.

Each of these parsers obeys its own rules, and the way they interpret the comment markers (or ignore them) is slightly different. What you should keep in mind is that the two parsers are independent from each other, and so are their rules.

AsciiDoc Header

The very first line in your source code is special for Doxter. The Header Parser checks whether it starts with ;=. This marker is the telltale sign for Doxter that the first lines contain an AsciiDoc Header. Here’s an example from the very source of Doxter:

;= Doxter: A Docs from Sources Generator.
;| Tristano Ajmone, <tajmone@gmail.com>
;| v0.2.5-alpha, 2018-11-29: PureBASIC 5.62
;| :License: MIT License
;~------------------------------------------------------------------------------
;| :version-label: Doxter
;| :toclevels: 3
#DOXTER_VER$ = "0.2.5-alpha"
;{******************************************************************************
; ··············································································
; ······························ PureBasic Doxter ······························
; ··············································································
; ******************************************************************************

As you can easily guess by looking at the above code, the first three lines form a standard AsciiDoc Header (title, author and revision lines), followed by a custom attribute (:License: MIT License), a Skip line (ignored by Doxter) used as a horizontal divider, and some more Asciidoctor settings attributes (:version-label: and :toclevels:).

Everything is as it would be in a normal AsciiDoc Header, except that the Header lines are inside PureBasic comments. The remaining lines are just normal (non-Doxter) PB code and comments.

When Doxter encounters a ;= on the very first line, it will then parse all consecutive lines starting with ;| (the ADoc comment marker) as part of the Header, adding them to the stored Header data. Lines starting with ;~ (the Skip comment marker) are simply ignored, and they are not considered as the end of a Header. As soon as a line not starting with ;| or ;~ is encountered, Doxter will stop parsing the Header.

Separate handling of the Header is important for two reasons:

Documents which don’t contain an AsciiDoc Header will not be treated as standalone documents (and will be saved with the .adoc extension).
The Header lines must always be injected at the very beginning of the output file, before any of the tagged regions extracted from the source file (and regardless of their weights).

The latter point is important because it’s in compliance with how AsciiDoc looks for a Header in source files.

Whether or not Doxter found a Header in the source file, once it has dealt with it, it will carry on to the next parsing stage: scanning the source for tagged regions. The Header and Regions parsers are two distinct parsers that coexist in Doxter, and the latter picks up where the former left off.

The Header parser doesn’t consume the lines that don’t match its criteria: as soon as it encounters a non-Header line it rolls back to the last file position, so that the Regions parser can parse that line instead.
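
The Header rules described in this section can be sketched in Python as follows. This is an approximation, not the actual Doxter Engine code: `parse_header` is a hypothetical name, and the returned index stands in for the "rolled back" file position at which the Regions parser would resume.

```python
def parse_header(lines, comment=";"):
    """Return (header_lines, next_index) for a list of source lines."""
    if not lines or not lines[0].startswith(comment + "="):
        return [], 0  # no Header: the Regions parser starts at the first line
    header = [lines[0][len(comment):]]  # keep the leading "=" of the title
    index = 1
    while index < len(lines):
        line = lines[index]
        if line.startswith(comment + "|"):
            text = line[len(comment) + 1:]
            header.append(text[1:] if text.startswith(" ") else text)
        elif not line.startswith(comment + "~"):
            break  # first non-Header line: left for the Regions parser
        index += 1
    return header, index


source = [
    ";= My Module",
    ";| Jane Doe",
    ";~----------",
    ";| :toclevels: 3",
    "EnableExplicit",
]
header, resume_at = parse_header(source)
print(header, resume_at)
```

The sample source lines (`My Module`, `Jane Doe`, `EnableExplicit`) are made up for the example.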

Working With Regions

The full syntax of a Region Begin marker is:

;>tagname(<region weight>.<region subweight>)
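
As a hedged illustration of this syntax, the Python sketch below parses a PureBasic-style Region Begin marker with a regular expression. The pattern and the name `parse_region_begin` are hypothetical; the sketch assumes that tag names are made of word characters and that weights and subweights are non-negative integers, and it reports missing parts as None.

```python
import re

REGION_BEGIN = re.compile(
    r"^;[{}-]?>(?P<tag>\w*)"                           # marker and optional tag
    r"(?:\((?P<weight>\d*)(?:\.(?P<sub>\d+))?\))?\s*$"  # optional (weight.sub)
)


def parse_region_begin(line):
    """Return (tag, weight, subweight) or None; missing parts are None."""
    match = REGION_BEGIN.match(line)
    if not match:
        return None

    def to_int(value):
        return int(value) if value else None

    return (match["tag"] or None, to_int(match["weight"]), to_int(match["sub"]))


for marker in [";>tagname(100.99)", ";>sec1(200)", ";>even_macro_intro(.2)", ";>"]:
    print(marker, "->", parse_region_begin(marker))
```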

To Be Continued…

The documentation is not complete yet, as it lacks the part on practical examples.

The provided documentation should be enough to get started using Doxter; for examples, study its source code in the meantime, and use it, use it, and use it again, for it’s easier to use than it might seem from reading its documentation.

Also, by using it you can benefit from the live parsing preview log, which is an invaluable tool for learning.

Roadmap

Doxter is still a young application, and there is always room for improvement. Here is a list of upcoming features, waiting to be implemented.

Support More Languages. The ultimate goal is to make Doxter a language agnostic tool, usable with any language, by extending the set of natively supported languages and by adding command line options to specify a custom comment delimiter and to set the default language name used in AsciiDoc source code blocks.
Configuration Files. When a doxter.json file is found in the current folder, Doxter will use it to extract options and settings to use when doxterizing sources. Once this feature is implemented, it will open the door to more advanced features:
- Automated Documentation Tasks. If Doxter is invoked without specifying an input file, the settings file will be scanned to find "doxter tasks", i.e. a list of files which should be doxterized and (optionally) converted via Asciidoctor, allowing custom options and settings for both Doxter and Asciidoctor. This will make it possible to fully automate maintenance of Doxter documentation in projects.

License

Doxter is released under the MIT License.

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Doxter Changelog

Changelog of Doxter command line application.

v0.2.5-alpha (2018/11/29) — Documentation outsourcing: Move to external AsciiDoc files the documentation regions that are not strictly commenting parts of the source code:
- Move CHANGELOG region to CHANGELOG.adoc.
- Move to README.adoc the following commented regions: Acknowledgements, Roadmap.
The goal is to keep the source files shorter, so that doxtering each source module will produce a reference document strictly focusing on the module technicalities.

From now on, Doxter documentation will be handled in external AsciiDoc files that will then import from the doxterized sources the relevant tagged regions. Since the documentation is starting to grow larger, this approach is more flexible and allows reusing the same regions contents in multiple documents.
v0.2.4-alpha (2018/11/25) — Engine optimizations:
- The engine code has been slightly optimized to improve performance and code maintainability.
v0.2.3-alpha (2018/10/11) — BUG FIX:
- Read Alan sources as ISO-8859-1. Add dox::fileEnconding var to allow setting file read operations for Alan sources to Ascii and avoid breaking special characters that were being read as if encoded in UTF-8.
v0.2.2-alpha (2018/10/11) — BUG FIX:
- Corrupted filenames. A bug was corrupting output filenames of Alan source files with .i extension. Now fixed.
v0.2.1-alpha (2018/10/10) — Add Alan IF support.
- Now Doxter will detect from the input file’s extension whether it’s a PureBasic, SpiderBasic or Alan IF source file, and set the comment delimiter and base language (to use in ADoc source blocks) accordingly.
- The supported extensions, and associated languages now are:
  - pb, pbi, pbf → PureBasic
  - sb, sbi, sbf → SpiderBasic
  - alan, i → Alan
v0.2.0-alpha (2018/10/03) — Move Doxter Engine to separate module:
- Now the core engine of Doxter is in a separate module, so that it will be usable by other applications too (still needs some fixes to be usable in non-console applications).
v0.1.4-alpha (2018/10/01) — Documentation:
- AsciiDoc examples now syntax highlighted.
v0.1.3-alpha (2018/09/29) — Doxter engine improved:
- PureBasic special comments markers (;{, ;} and ;-) can now be used in all Doxter markers, except ADoc Header (;=).
- Regions merging feature introduced:
  - Tagged regions with same tag identifier are merged into a single region in the output document:
    
    All region fragments will be sorted by subweight before merging.
  - Region subweight:
    
    New subweight parameter (optional) introduced in Region Begin marker (e.g. ;>tag(100.99) or ;>(.99), where the subweight is 99).
    
    If the marker doesn’t provide a subweight, the last subweight value used with that tag will be automatically employed after incrementing it by 1.
  - When a weightless Region Begin marker is encountered, if a region with the same tag already exists, that region’s weight will be used for the new region fragment, otherwise it will be given weight 1.
  - If multiple weight definitions are given for a same region tag, the last one encountered will override the previous ones.
- Parsing Live Preview now shows subweight in new third column.
v0.1.2-alpha (2018/09/25) — Aesthetic changes.
v0.1.1-alpha (2018/09/25) — Created Doxter repository on GitHub.
v0.1.0-alpha (2018/09/21) — First public released Alpha: https://github.com/tajmone/PBCodeArcProto/blob/83c32cd/_assets/Doxter.pb