Doxter Engine

The Doxter Engine contains the core functionality behind the Doxter CLI application. It can be included as a module in other applications and used to doxterize source files programmatically through its API. Released under MIT License.

https://github.com/tajmone/doxter

About Doxter Engine

Doxter Engine is the module used by Doxter to parse source files, extract the documentation from them and write it to an AsciiDoc file — this process is also known as «doxterizing».

This module can also be included in applications other than the Doxter CLI tool, so you can use it to create your own doxterizing apps.

Currently, the module is still in Alpha, and there is a long way to go before it is optimized to work with other apps: custom options still need to be exposed publicly to allow fine-grained control over its behavior, and right now all output text is sent to the console, so it will only work in console applications.

This document deals with Doxter Engine’s API and internals.

Engine API

The Doxter Engine API is not yet documented. Until Doxter enters the Beta stage, you’ll have to refer to the source code to learn about the API.

Doxter Markers Primer

Doxter decides which parts of your source code to treat as documentation by looking for PureBasic’s comment delimiter (;) immediately followed by a character which, combined with the delimiter, forms one of Doxter’s markers:

Table 1. Doxter’s Base Markers
`;=`	ADoc Header	Marks beginning of doc Header. (first line only)
`;>`	Region Begin	Marks beginning of a tagged region.
`;|`	ADoc Comment	Treat line as AsciiDoc text.
`;~`	Skip Comment	The whole line will be ignored and skipped.
`;<`	Region End	Marks end of a tagged region.

SpiderBasic uses the same comment delimiter as PureBasic. As for Alan, just use its native comment delimiter (--) instead of ; (i.e. -->, --|, --~, etc.). The same applies to any other languages that will be supported in the future.

In PureBasic and SpiderBasic you can freely use the special comment marks (;{/;}/;-) within Doxter’s markers (e.g. ;{>, ;}~, ;-|, etc.) except in the ADoc Header marker, which can only be ;=. This allows you to create regions which are foldable in PureBasic’s and SpiderBasic’s native IDEs.

The Tagged Region End marker has an alternative syntax to prevent Doxter from adding an empty line after closing the region:

Table 2. Region Markers Modifiers
`;<<`	Unspaced Region End	Don’t add empty line after closing tag.

This is useful when splitting a paragraph across multiple regions, in order to keep its text lines next to the code they belong to. Without the < modifier, Doxter’s default behavior would be to add an empty line after the closing region tag, which would split the text into multiple paragraphs in the final document.
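
As an illustration, the markers from Tables 1 and 2 can be recognised with a single regular expression. The following Python snippet is only a minimal sketch, not the Engine’s actual code: the names MARKER_RE, MARKER_NAMES and classify_marker are hypothetical, and it covers the PureBasic/SpiderBasic delimiter (;) only. It also ignores the rule that the ADoc Header is honoured on the first line of the file only.

```python
import re

# ";" followed by an optional folding mark ({, } or -) and a marker character.
# "<<" is listed before "<" so that the Unspaced Region End is matched first.
MARKER_RE = re.compile(r"^\s*;[{}-]?(<<|[=>|~<])")

MARKER_NAMES = {
    "=": "ADoc Header",
    ">": "Region Begin",
    "|": "ADoc Comment",
    "~": "Skip Comment",
    "<": "Region End",
    "<<": "Unspaced Region End",
}


def classify_marker(line):
    """Return the marker name for a source line, or None if it has no marker."""
    match = MARKER_RE.match(line)
    if match is None:
        return None
    marker = match.group(1)
    # The ADoc Header marker can only be written as ";=", with no leading
    # space and no folding mark.
    if marker == "=" and not line.startswith(";="):
        return None
    return MARKER_NAMES[marker]
```

For example, classify_marker(";{>intro") reports a Region Begin, while an ordinary comment such as "; plain" yields None.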

Doxter Parsers

Understanding how Doxter’s parser works will help you grasp a clearer picture of how source files are processed, and gain insight into the proper use of its markers.

Doxter uses a two-step parsing approach when processing documents:

Header Parser: scans the first lines of the source file looking for an AsciiDoc Header. Whether or not it finds a Header, once its job is done the Header Parser hands control over to the Regions Parser.
Regions Parser: scans the remainder of the source file looking for tagged regions to extract.

These are two different parsers altogether, and Doxter always runs the fed source file against both of them, in the exact order specified above.

Each of these parsers obeys its own rules, and the way they interpret the comment markers (or ignore them) is slightly different. Here follow the simple rules by which each parser abides.

Header Parser Rules

The Header Parser has a single task: to detect whether the source contains an AsciiDoc Header and, if there is one, to extract it and store it in memory.

Check if the very first line of the source file starts with ;= (no leading space allowed):
- No? Reset file pointer position to beginning of file and relinquish control to the Regions Parser. (Quit Parsing)
- Yes? Then an AsciiDoc Header was found; strip away the ; and store the line in the Header’s data storage, then:
  - (loop entrypoint) Store current file position pointer and parse the next line:
    
    If an ADoc Comment line (;|) is found, strip it of the marker and add it to Header’s data storage, then carry on with parsing loop.
    
    If a Skip Comment line (;~) is found, ignore it and carry on with parsing loop.
    
    If the parsed line is none of the above, restore previous file position from stored pointer and relinquish control to the Regions Parser. (Exit Loop, Quit Parsing)
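
The Header Parser rules above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Engine’s implementation: parse_header is a hypothetical name, the source is passed as a list of lines, and the returned index stands in for the stored file position pointer. Two details are assumptions: markers are only matched at the very start of a line, and one space after the ;| marker is stripped (borrowed from the Regions Parser rules).

```python
def parse_header(lines):
    """Return (header_lines, next_index) for a list of source lines.

    next_index is the line at which the Regions Parser should resume.
    """
    # No ";=" on the very first line: no header, restart from the top.
    if not lines or not lines[0].startswith(";="):
        return [], 0

    header = [lines[0][1:]]  # strip only the ";" so that "= Title" is kept
    index = 1
    while index < len(lines):
        line = lines[index]
        if line.startswith(";|"):
            text = line[2:]
            # Assumption: drop one space following the marker, if present.
            header.append(text[1:] if text.startswith(" ") else text)
        elif line.startswith(";~"):
            pass  # Skip Comment: ignore the line and carry on
        else:
            break  # not a header line: leave it for the Regions Parser
        index += 1
    return header, index
```

Given the lines ";= My Title", ";| Author Name", ";~ note" and an empty line, the sketch returns the header ["= My Title", "Author Name"] and the index 3, so the Regions Parser resumes at the empty line.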

Regions Parser Rules

The task of the Regions Parser is to extract and process all lines that are enclosed between Region Start and Region End tags, and store them in memory.

The Regions Parser alternates between two mutually exclusive modalities: Seeking, and InsideRegion.

When in Seeking modality, the parser will scan every source line until it finds a line whose first non-whitespace characters are a Region Begin marker (;> or ;{>), and it will ignore anything else. Once it finds the Region Begin marker the parser switches to the InsideRegion modality.

When in InsideRegion modality, the parser’s behavior changes: every line that does not carry a Skip Comment marker (;~) is processed and becomes part of the output document, until the parser finds a Region End marker (;< and variants), at which point it reverts to Seeking modality, and so on, until the end of file is reached.

Furthermore, in InsideRegion modality the parser can enter and exit the InsideCode state. This state tracks the inclusion of source code lines in the region, as opposed to ADoc comment lines, because in the final document source code must be enclosed in an AsciiDoc source block, using source delimiters and setting the syntax to PureBasic. This ensures that code is shown as a verbatim block and enables syntax highlighting (if supported).

(Seeking Modality) this is the modality the parser starts off in:
- (loop entrypoint) Parse line and check if its first non-whitespace characters are a Region Begin Tag (;>):
  - No? Ignore line and carry on with the parsing loop in Seeking mode.
  - Yes?
    
    Process line and extract tag, weight and subweight (if present):
    
    if no tag was provided, use default fallback Id instead: region followed by a counter that increases at each use (e.g. region1, region2, etc.).
    
    if no weight was provided: if a region with the same tag already exists in memory, retrieve its weight and use it, otherwise assign the last used weight incremented by one (assume that the user wishes the new region to be contiguous with the preceding one).
    
    if no subweight was provided: if a region with the same tag already exists in memory, retrieve its last used subweight, increase it by 1 and use it, otherwise use value 1.
    
    Create new entry in memory for this region fragment and store its weight and subweight values.
    
    Enter InsideRegion modality (Switch Loop).
(InsideRegion Modality):
- (loop entrypoint) Parse line and check whether its first non-whitespace characters are one of Doxter’s markers:
  - No? Then the user wants to include source code lines in the region:
    
    Set parser’s state to InsideCode.
    
    Add to current region’s stored data a blank line followed by AsciiDoc markup to open a source block ([source,purebasic]) followed by a line with source block delimiter (---, 80 chars long).
    
    Add parsed line to current region’s data, as is.
    
    Carry on parsing loop in InsideRegion modality.
  - Yes? Depending on the found marker:
    
    It’s an ADoc Comment marker (;|):
    
    If parser is in InsideCode state, add to current region’s stored data an AsciiDoc line containing a source delimiter to end source code block, followed by a blank line. Carry on parsing loop.
    
    Strip marker away (together with following space character, if present) and add line to current region’s data storage in memory.
    
    Carry on parsing loop in InsideRegion modality.
    
    It’s a Skip Comment marker (;~):
    
    Ignore line and carry on parsing loop in InsideRegion modality.
    
    It’s a Region End marker (;<):
    
    If parser is in InsideCode state, add to current region’s stored data an AsciiDoc line containing a source delimiter to end source code block, followed by a blank line. Carry on parsing loop.
    
    Check if the Region End marker contains the < modifier (;<<); if it doesn’t, add a blank line to the current region.
    
    Revert to Seeking modality (Switch Loop).
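
The two modalities and the InsideCode state can be sketched in Python as a small state machine. This is a simplified illustration, not the Engine’s code: parse_regions, MARKER_RE and DELIMITER are hypothetical names, tag, weight and subweight extraction is left out, and each fragment is just a list of output lines. Some choices are assumptions, because the rules above do not cover them: the source block is opened only when the parser is not already in InsideCode state, a Region Begin marker met inside a region is skipped, and a region still open at end of file is left as it is.

```python
import re

# Markers handled inside a region, with an optional folding mark ({, } or -).
MARKER_RE = re.compile(r"^\s*;[{}-]?(<<|[>|~<])")
DELIMITER = "-" * 80  # source block delimiter, 80 chars long


def parse_regions(lines):
    """Return one list of output lines per region fragment found in `lines`."""
    fragments = []
    current = None       # None means the parser is in Seeking modality
    inside_code = False  # the InsideCode state

    for line in lines:
        match = MARKER_RE.match(line)
        marker = match.group(1) if match else None

        if current is None:  # Seeking: only a Region Begin marker matters
            if marker == ">":
                current = []
                fragments.append(current)
            continue

        if marker is None:  # no marker: a source code line
            if not inside_code:
                current.extend(["", "[source,purebasic]", DELIMITER])
                inside_code = True
            current.append(line)
            continue

        if marker in ("~", ">"):  # Skip Comment (skipping ";>" is an assumption)
            continue

        # ADoc Comment or Region End: both close an open source block first.
        if inside_code:
            current.extend([DELIMITER, ""])
            inside_code = False

        if marker == "|":
            text = line[match.end():]
            current.append(text[1:] if text.startswith(" ") else text)
        else:  # "<" or "<<"
            if marker == "<":
                current.append("")  # ";<<" suppresses this blank line
            current = None  # revert to Seeking modality

    return fragments
```

Keeping the current fragment as None while seeking makes the two modalities mutually exclusive by construction, which mirrors the description above.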

During the parsing stage no AsciiDoc tagged region begin/end lines are added to the regions stored in memory, because regions with the same tag still need to be sorted and merged together (the parser stores each region fragment separately, regardless of its tag). It will be the postprocessor’s job to handle all that, and once fragmented regions are merged together the AsciiDoc // tag::[] and // end::[] lines will be added at their start and end, respectively.

The AsciiDoc // tag::[] and // end::[] lines shown in the Live Preview are just for debugging purposes, so to speak, but they are not actually stored in memory at that point.
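
The fallback rules for tag, weight and subweight listed under the Seeking modality determine how fragments are later sorted and merged, so a small sketch may help. The Python code below is illustrative only: new_state and resolve_region are hypothetical names, and the values are assumed to have been extracted from the Region Begin line already (None when not provided). It further assumes that the last used weight starts at 0, that it is updated by every region, and that an explicit weight or subweight replaces the value remembered for that tag; none of this is stated in the rules.

```python
def new_state():
    """Bookkeeping shared by all regions of one source file."""
    return {"counter": 0, "last_weight": 0, "regions": {}}


def resolve_region(state, tag=None, weight=None, subweight=None):
    """Apply the default rules and return (tag, weight, subweight)."""
    if tag is None:
        # Fallback Id: "region" followed by a counter (region1, region2, ...).
        state["counter"] += 1
        tag = "region%d" % state["counter"]

    known = state["regions"].get(tag)  # (weight, last subweight) or None

    if weight is None:
        # Reuse the weight of a same-tag region, else continue after the
        # last used weight so the new region follows the preceding one.
        weight = known[0] if known else state["last_weight"] + 1

    if subweight is None:
        # Next subweight for a known tag, else start from 1.
        subweight = known[1] + 1 if known else 1

    state["regions"][tag] = (weight, subweight)
    state["last_weight"] = weight
    return tag, weight, subweight
```

With a fresh state, three calls for an untagged region, a region tagged "intro" and a second "intro" fragment yield ("region1", 1, 1), ("intro", 2, 1) and ("intro", 2, 2), so both "intro" fragments share a weight and are ordered by subweight.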

The above rules will be a useful reference once you’ve begun learning Doxter, and by studying them you can get the full picture of its inner workings. But studying Doxter’s main documentation and examples is a better starting point if you’re new to Doxter. Also, don’t forget to look at Doxter’s source code: it documents itself using its own system, so you can compare the source to the AsciiDoc output if you like to learn by example.

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

CHANGELOG

Changelog of Doxter engine module.

v0.0.5-alpha (2018/11/29) — Documentation outsourcing: Move to external AsciiDoc files the documentation regions that are not strictly commenting the Engine and its API.
- Move CHANGELOG region to CHANGELOG.adoc.
v0.0.4-alpha (2018/11/25) — Engine optimizations:
- The engine code has been slightly optimized to improve performance and code maintainability:
  - Reduced the number of RegExs used by the engine by optimizing reusability.
  - Source lines parsing has been optimized in both the Header Parser and the Regions Parser.
  - Deleted internal procedures:
    
    IsAdocComment()
    
    IsSkipComment()
    
    StripCommentLine()
v0.0.3-alpha (2018/10/11) — BUG FIX: Read Alan sources as ISO-8859-1:
- Add fileEnconding var to allow setting file read operations for Alan sources to Ascii (#PB_Ascii) to avoid breaking special characters that were being read as if encoded in UTF-8.
v0.0.2-alpha (2018/10/10) — Add support for Alan language, and improve SpiderBasic support:
- The Engine now exposes a dox::SetEngineLang(lang.s) procedure to allow setting the comment delimiter and the language of source blocks according to the selected language ("PureBasic", "SpiderBasic" or "Alan").
v0.0.1-alpha (2018/10/03) — First module engine release.