User:PerfektesChaos/js/WikiSyntaxTextMod/flow/tag
The second step of syntax polishing standardizes tags like <tag> (and also comments) and detects errors.
Scope
The goal is a common, uniform appearance of tags. Human authors should not be confused by varying formatting styles, and bots and scripts can identify structures in a reliable and simple manner.
Only well known elements will be processed:
a applet area audio b base bdi big blockquote body br button center code command dfn div em embed font form frame frameset gallery h1 h2 h3 h4 h5 h6 head hiddentext hiero hr html i iframe imagemap img includeonly input inputbox isindex kbd layer link map math meta noinclude nowiki object onlyinclude option pages poem pre rb rbc ref references rp rt rtc ruby s samp score script select small source span strike strong style sub sup syntaxhighlight templatedata textarea timeline title tt u wbr xml
Comments are considered here, too.
All unknown tags will be ignored.
Formatting
The following format is expected after polishing:
- A known tag opened by < is to be closed by >, and no other < or > is permitted inside.
- There is no whitespace directly after the opening < or before the closing >.
- All known tags as enumerated above consist of lowercase letters only.
- If a backslash \ is detected just after < or before >, a manual mistake is assumed and it is turned into a regular slash.
- An end tag is written in compact notation: </sup>.
- A unary tag (like <references />) is written with exactly one space between the name (or attribute) and the slash.
- Elements which HTML permits in unary form only (br, hr and wbr) are forced into a unary tag, wherever and in whatever form a slash might be present.
- Empty elements (like <nowiki></nowiki> and <references></references>) will be turned into one unary tag.
  - If there is only whitespace (spaces or line breaks) between the tags, they are regarded as empty, too. There is a visual effect of <pre>\n</pre>, but it is not meaningful except for the Whitespace language. However, <syntaxhighlight> keeps any content unchanged. In other cases an empty tag pair is to be filled with some content.
  - For <div></div> an exception is made.
- All attribute names are turned into lowercase letters.
- Every attribute is permitted only once; multiple occurrences cause an error message.
- Attribute assignments are written as attr="Val" in compact notation:
  - Whitespace around the equal sign will be removed.
  - The value is enclosed in quotation marks ".
  - If a " has been identified inside the value, the enclosing apostrophe ' is kept.
  - Both quotation mark and apostrophe are not expected to occur together in a wikitext value; in that case a syntax error (missing delimiter) is assumed, triggering an error message.
  - < or > enclosed in quotation marks are not accepted.
  - Leading and trailing whitespace within the value enclosed by quotation marks will be removed.
  - Assignments of empty values are invalid and cause an error message. This does not apply to occasional single attributes without equal sign (which are quite rare).
- Each attribute assignment is preceded by exactly one space.
- In case of multi-line tags, line breaks are kept.
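Several of the formatting rules above can be sketched for a single tag as follows. This is a minimal illustration, not the script's implementation: the function names polishTag and polishAttributes and the regular expressions are assumptions, the sketch expects the tag name to be a known one already, it does not collapse empty tag pairs, and it does not preserve line breaks in multi-line tags.

```javascript
// Hypothetical sketch; names and regular expressions are illustrative assumptions.
function polishAttributes(text) {
  const seen = new Set();
  const out = [];
  // name, optionally followed by = and a value in "...", '...' or unquoted
  const re = /([^\s=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"']+)))?/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    const name = m[1].toLowerCase();              // attribute names in lowercase
    if (seen.has(name)) throw new Error("attribute given twice: " + name);
    seen.add(name);
    const rawValue = [m[2], m[3], m[4]].find((v) => v !== undefined);
    if (rawValue === undefined) {                 // single attribute without "="
      out.push(name);
      continue;
    }
    const value = rawValue.trim();                // no leading/trailing whitespace
    if (value === "") throw new Error("empty value: " + name);
    if (/[<>]/.test(value)) throw new Error("< or > in value: " + name);
    // Keep the apostrophe as delimiter if the value contains a quotation mark.
    const quote = value.includes('"') ? "'" : '"';
    out.push(name + "=" + quote + value + quote);
  }
  return out;
}

function polishTag(raw) {
  // Parts: "<", optional (back)slash, name, attributes, optional (back)slash, ">"
  const re = /^<\s*([\/\\]?)\s*([A-Za-z][A-Za-z0-9]*)([^<>]*?)\s*([\/\\]?)\s*>$/;
  const m = re.exec(raw);
  if (!m) throw new Error("not a well-formed tag: " + raw);
  const name = m[2].toLowerCase();
  const unaryOnly = ["br", "hr", "wbr"].includes(name);
  if (m[1] !== "" && !unaryOnly) return "</" + name + ">";   // compact end tag
  const head = "<" + [name, ...polishAttributes(m[3])].join(" ");
  // br, hr and wbr always become unary, wherever a slash was found.
  return unaryOnly || m[4] !== "" ? head + " />" : head + ">";
}
```

Under these assumptions, polishTag turns `</SUP >` into `</sup>`, `<references/>` into `<references />` and `<ref  NAME = foo >` into `<ref name="foo">`.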
Nesting
Associated opening and closing tags are identified.
Correct nesting is checked; if end tags are missing or superfluous at some level, an error message is issued.
Some elements are processed immediately from opening until closing tag.
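A nesting check of this kind is commonly done with a stack. The following sketch shows the idea under stated assumptions: checkNesting is a hypothetical name, the input is expected to be polished already (lowercase tag names, no stray < or > inside tags), and the error texts are invented for illustration.

```javascript
// Hypothetical sketch of a stack-based nesting check; not the script's own code.
function checkNesting(wikitext) {
  const errors = [];
  const stack = [];                       // names of currently open elements
  const re = /<(\/?)([a-z][a-z0-9]*)[^<>]*?(\/?)>/g;
  let m;
  while ((m = re.exec(wikitext)) !== null) {
    const [, closing, name, unary] = m;
    if (unary) continue;                  // unary tags do not open a level
    if (!closing) {
      stack.push(name);
      continue;
    }
    const at = stack.lastIndexOf(name);
    if (at < 0) {
      errors.push("superfluous </" + name + ">");
    } else {
      // Everything opened after the matching start tag was never closed.
      for (const open of stack.splice(at).slice(1)) {
        errors.push("missing </" + open + ">");
      }
    }
  }
  for (const open of stack) errors.push("missing </" + open + ">");
  return errors;
}
```

An empty result means that every start tag found its end tag at the same level.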
Content analysis
- nowiki ranges and some (unary) elements will be protected immediately after regions which are commented out.
- syntaxhighlight areas will be protected next and entirely.
  - If possible (keyword "syntaxhighlight" not within range) the obsolete source is turned into syntaxhighlight. By the way, the strike tag is standardized as <s>.
- For security reasons HTML elements with URL links out of wiki projects (like <a href= or <img src=) are blocked in the generated HTML page. Within wikitext the script will deactivate them by transforming the leading < into &lt;, which yields the same visual appearance.
- If typographical tags which are meaningful in binary mode only are met in unary shape (like <b />, <em />, <i />, <span /> etc.), a certain bad habit is assumed and they are turned into <nowiki />. Parameters would be pointless and will be removed.
- For <br /> tags which use the CSS property style="clear: … or contain the non-standard clear= …, only the block element <div /> is possible, and br will be transformed accordingly. Non-standard forms in <div /> are interpreted, and the proper style="clear:both" etc. will be assigned according to the intention.
  - In order to ensure valid HTML, <div … /> is written as empty <div …></div>.[1]
- If an attribute assignment is mandatory or might not be permitted, an error message is shown.
  - With the elements gallery, ref and references only well-known parameters are tolerated.
- If the kind of element suggests more specific processing, whitespace formatting, syntax analysis or possibly content protection, this is done or scheduled.
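Two of the transformations above can be sketched with regular expressions. Both functions are hypothetical illustrations: deactivateLinkElements does not test whether the URL really points outside the wiki projects, and in brClearToDiv the mapping of the legacy value all to both is an assumption about what "according to the intention" means.

```javascript
// Hypothetical sketches; function names and patterns are assumptions.

// Replace only the leading "<" of <a href=... and <img src=... by "&lt;".
function deactivateLinkElements(wikitext) {
  const re = /<(?=a\s[^<>]*\bhref\s*=|img\s[^<>]*\bsrc\s*=)/gi;
  return wikitext.replace(re, "&lt;");
}

// Turn a <br> that carries a clear setting into an empty div pair.
function brClearToDiv(tag) {
  const m = /^<br\b[^<>]*\bclear\s*[:=]\s*["']?\s*(\w+)/i.exec(tag);
  if (!m) return tag;                     // plain <br /> stays as it is
  const given = m[1].toLowerCase();
  // Assumption: the non-standard value "all" is meant as CSS "both".
  const side = given === "all" ? "both" : given;
  // Written as an empty pair rather than <div ... /> to keep the HTML valid.
  return '<div style="clear:' + side + '"></div>';
}
```

With these assumptions `<br clear=all>` and `<br style="clear:both" />` both end up as `<div style="clear:both"></div>`.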
Comments
- For the beginning of a comment <!-- the adjacent end --> is searched. If the end cannot be found, or a space is detected within the beginning of a comment, an error message is displayed.
- A comment may be subject to a user-defined comment modification.
- All comments will be protected against any further searching and replacement.
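Comment handling along these lines can be sketched as follows. The placeholder scheme (a number wrapped in NUL characters) and the names protectComments and restoreComments are assumptions made for this illustration; the source does not say how protection is implemented.

```javascript
// Hypothetical sketch: comments are swapped for placeholders so that later
// search-and-replace steps cannot touch them.
function protectComments(wikitext) {
  const comments = [];
  const errors = [];
  let text = "";
  let pos = 0;
  for (;;) {
    const start = wikitext.indexOf("<!--", pos);
    if (start < 0) break;
    const end = wikitext.indexOf("-->", start + 4);
    if (end < 0) {
      errors.push("comment not closed, starting at offset " + start);
      break;
    }
    text += wikitext.slice(pos, start) + "\u0000" + comments.length + "\u0000";
    comments.push(wikitext.slice(start, end + 3));
    pos = end + 3;
  }
  text += wikitext.slice(pos);
  // A space inside the comment start, such as "<! --", is reported as an error.
  if (/<\s+!--|<!\s+--|<!-\s+-/.test(text)) {
    errors.push("space within the beginning of a comment");
  }
  return { text, comments, errors };
}

// Put the stored comments back in place of their placeholders.
function restoreComments(text, comments) {
  return text.replace(/\u0000(\d+)\u0000/g, (_, i) => comments[Number(i)]);
}
```

A user-defined comment modification could be applied to the entries of comments before restoreComments is called.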
Remarks
[1] The inner tags of wikisyntax are not kept in the HTML document and may be provided as unary XML.