User:Cscott/Opportunities for Content Transform Team
This is a list of future "features" that the content transform team has considered or discussed for the MediaWiki wikitext parser infrastructure. These could serve as the basis of proposals for future annual planning, or as seeds for milestone or other planning for wikitext as a product.
This list is in no particular order. I've tried to provide a brief "product pitch" for each and a link to further discussion.
- Native Script Editing: For languages with multiple writing systems, MediaWiki already allows users to read the article in their preferred writing system, but this would allow them to edit the article in their preferred writing system as well. First presented at Wikimania 2017.
- Global Templates: Everyone means something slightly different by this, but the product pitch from the wikitext perspective usually revolves around better semantic information ("types") for wikitext functions (templates/parser functions/extensions) to allow mapping function and argument names across languages. Multilingual extensions to Scribunto to support "localizable modules" could possibly also be considered within the parsing team's scope (see next item).
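A minimal sketch of the name-mapping idea, assuming a hypothetical per-template registry that records canonical parameter names alongside per-language aliases (the registry layout and the resolve_call helper are invented for illustration; they are not an existing MediaWiki or TemplateData API):

```python
# Hypothetical registry of "typed" template signatures with per-language
# aliases for the template name and its parameters.  Nothing here is an
# existing MediaWiki data structure; it only illustrates the mapping idea.
REGISTRY = {
    "Infobox person": {
        "aliases": {"de": "Personendaten", "es": "Ficha de persona"},
        "params": {
            "name":       {"type": "string", "aliases": {"de": "Name", "es": "nombre"}},
            "birth_date": {"type": "date",   "aliases": {"de": "Geburtsdatum", "es": "fecha de nacimiento"}},
        },
    },
}


def resolve_call(local_name: str, local_args: dict, lang: str) -> tuple[str, dict]:
    """Map a localized template invocation onto its canonical signature."""
    for canonical, sig in REGISTRY.items():
        if local_name == canonical or sig["aliases"].get(lang) == local_name:
            resolved = {}
            for arg, value in local_args.items():
                for pname, pinfo in sig["params"].items():
                    if arg == pname or pinfo["aliases"].get(lang) == arg:
                        resolved[pname] = value
                        break
                else:
                    resolved[arg] = value  # unknown parameter: pass through unchanged
            return canonical, resolved
    raise KeyError(f"no canonical template found for {local_name!r}")


# A German invocation resolves to the canonical (English) signature:
print(resolve_call("Personendaten", {"Name": "Ada Lovelace"}, "de"))
# -> ('Infobox person', {'name': 'Ada Lovelace'})
```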
- Scribunto refresh: Scribunto is running an ancient version of Lua and has effectively been unmaintained for years. A Scribunto refresh could include updating Lua, adding VisualEditor support (T205197, T54607), adding DOM-based manipulation methods or input/output types (T133543), improving multilingual programming, adding alternative module languages (say, JavaScript), integrating Scribunto and templates via Visual Template Editing, or some combination of these.
- Fragment Rendering: There are two variants of this product feature. Both share an essential underlying framework, namely a fragment cache and a composition service (a rough sketch follows the two variants below).
- Incremental Parsing. This variant focuses on parsing performance: by allowing reuse of parsed fragments we can more efficiently reparse pages when edits are made. This could allow freer editing of certain templates that are currently locked because edits to them would invalidate too many pages; instead, only the template (rather than the entire page) would be re-rendered for every article where it occurs. There are subtleties around what exactly is cached: template expansions may be cached and re-inserted while the main article wikitext is reparsed, or the parsed article may be cached and new template expansions inserted into it, or both. In any case, this variant's main driver is parse performance, with eased editing restrictions incidental. (See T352518 for a specific example where incremental parsing would help.)
- Asynchronous Fragments. The other variant is driven by Abstract Wikipedia and the need to allow "long" computation of fragments. The composition service is separated in time from the initial parse, but many of the fundamental mechanisms are shared. The main driver here is allowing the inclusion of expensive fragments, which is not possible in our current parsing framework.
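Both variants rest on the same two pieces: a cache of rendered fragments and a service that composes cached fragments into a full page. A minimal sketch of that shared framework, assuming fragments are keyed by template name plus arguments (the class and function names are invented for illustration and are not Parsoid's actual interfaces):

```python
import hashlib

class FragmentCache:
    """Cache of rendered template fragments, keyed by template name + arguments."""

    def __init__(self, render_fn):
        self.render_fn = render_fn          # expensive template expansion
        self.store: dict[str, str] = {}
        self.keys_by_template: dict[str, set[str]] = {}

    def key(self, template: str, args: dict) -> str:
        digest = hashlib.sha256(repr(sorted(args.items())).encode()).hexdigest()
        return f"{template}:{digest}"

    def get(self, template: str, args: dict) -> str:
        k = self.key(template, args)
        if k not in self.store:
            self.store[k] = self.render_fn(template, args)
            self.keys_by_template.setdefault(template, set()).add(k)
        return self.store[k]

    def invalidate_template(self, template: str) -> None:
        """On a template edit, drop only that template's fragments,
        not the cached page skeletons that embed them."""
        for k in self.keys_by_template.pop(template, set()):
            del self.store[k]


def compose(page_skeleton: list, cache: FragmentCache) -> str:
    """Compose a cached page skeleton (literal HTML interleaved with
    (template, args) placeholders) into final output."""
    out = []
    for part in page_skeleton:
        if isinstance(part, tuple):
            template, args = part
            out.append(cache.get(template, args))
        else:
            out.append(part)
    return "".join(out)


# Usage: editing {{Population}} only re-renders that fragment; the page
# skeleton stays cached and is recomposed cheaply.
cache = FragmentCache(lambda t, a: f"<span>{t}({a})</span>")
skeleton = ["<p>Berlin has ", ("Population", {"city": "Berlin"}), " residents.</p>"]
print(compose(skeleton, cache))
cache.invalidate_template("Population")
print(compose(skeleton, cache))  # only the Population fragment is re-rendered
```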
- Parser function refresh: Some options here include unifying orthogonal syntax for parser functions, magic words, and extensions (T204370, T204283, T204371), improved types or TemplateData for parser functions, adding an implicit data context for parser functions to act on, etc.
- Typed templates: Using TemplateData or other means to expand the type system used for template arguments and results. See Typed_Templates.
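As a flavor of what richer types could buy, here is a sketch that validates and coerces raw string arguments against a TemplateData-like declaration extended with a type field (the SIGNATURE schema and the coerce_args helper are hypothetical):

```python
from datetime import date

# Hypothetical TemplateData-style declaration, extended with a "type" field.
SIGNATURE = {
    "population": {"type": "number", "required": True},
    "as_of":      {"type": "date",   "required": False},
}

COERCERS = {
    "number": lambda s: float(s.replace(",", "")),
    "date":   date.fromisoformat,
    "string": str,
}


def coerce_args(raw_args: dict, signature: dict) -> dict:
    """Check and convert raw wikitext arguments (all strings) into typed values."""
    typed = {}
    for name, spec in signature.items():
        if name not in raw_args:
            if spec.get("required"):
                raise ValueError(f"missing required argument {name!r}")
            continue
        try:
            typed[name] = COERCERS[spec["type"]](raw_args[name])
        except ValueError as e:
            raise ValueError(f"argument {name!r} is not a valid {spec['type']}: {e}")
    return typed


print(coerce_args({"population": "3,850,809", "as_of": "2021-12-31"}, SIGNATURE))
# -> {'population': 3850809.0, 'as_of': datetime.date(2021, 12, 31)}
```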
- Balanced templates: Prevents accidental "leakage" of unbalanced template contents into the containing article, and implicitly allows improved editing, faster parsing, etc. T114445
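One way to state the guarantee: a template's rendered output must be a well-balanced DOM fragment, so stray open tags cannot leak into the surrounding article. A minimal, purely illustrative check using only the Python standard library (the check itself is a sketch; T114445 describes the actual proposal):

```python
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "meta", "link", "input", "wbr", "source", "track"}

class BalanceChecker(HTMLParser):
    """Reject fragments whose open/close tags do not nest and balance."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if not self.stack or self.stack.pop() != tag:
            self.balanced = False


def is_balanced(fragment: str) -> bool:
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.balanced and not checker.stack


print(is_balanced("<div><p>fine</p></div>"))   # True
print(is_balanced("<div><p>leaks into page"))  # False: unclosed tags would escape the template
```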
- Heredoc arguments: Better argument quoting for wikitext. T114432
- New Wikitext for discussions: Improved list syntax, a parser function for signatures, etc. Aims at making discussions easier to author and easier for tools like DiscussionTools to parse and build on top of. More in this brief Wikimania talk: New Wikitext for Chat Pages
- Semantic mapping between revisions: Parsoid's selective serialization mechanism is based on a DOM-level diff between revisions. Many tools for understanding article history and authorship can be based on this foundation; it is also useful in remapping annotations as an article is edited.
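A toy illustration of how a revision-to-revision mapping could be used to remap annotations, assuming annotations are anchored to block-level nodes matched by content fingerprint (the fingerprinting scheme is invented for illustration; Parsoid's DOM diff is considerably more sophisticated):

```python
import hashlib

def fingerprint(block: str) -> str:
    """Stable identity for a block of content, independent of its position."""
    return hashlib.sha256(block.strip().encode()).hexdigest()[:12]


def remap_annotations(old_blocks, new_blocks, annotations):
    """Move annotations keyed by old block index to the matching new block index."""
    new_index = {fingerprint(b): i for i, b in enumerate(new_blocks)}
    remapped, orphaned = {}, []
    for old_i, note in annotations.items():
        fp = fingerprint(old_blocks[old_i])
        if fp in new_index:
            remapped[new_index[fp]] = note
        else:
            orphaned.append(note)  # block was edited or removed; needs review
    return remapped, orphaned


old = ["== History ==", "Founded in 1901.", "Expanded in 1950."]
new = ["== Overview ==", "A short intro.", "== History ==", "Founded in 1901."]
print(remap_annotations(old, new, {1: "needs citation", 2: "disputed"}))
# -> ({3: 'needs citation'}, ['disputed'])
```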
- Annotations: Many types of content that would be possible on Wikipedia are discouraged because the resulting annotations make the article wikitext "too noisy" for normal editors. Some examples are: inline discussions, phonetic markup for text-to-speech, cross-language mapping, and even certain reference information. Detailed lists are among the links at Amazing Article Annotations. A general-purpose annotation service would unlock many areas of content authorship without "disturbing" the canonical readable wikitext for the article.
- TemplateData extensions: Adding semantic/type/UX information to what is already available about a given template/Scribunto module/parser function/extension/etc.; mentioned inter alia among many of the other items on this list.
- Real time collaboration: Mostly an editing-team project (T112984) but some of the ideas about authorship and expanding the history model of wikitext could involve our team.
- LanguageConverter / Glossaries: Native script editing was discussed above. In addition, fundamental work on LanguageConverter is overdue. (In fact, this component is currently unowned by any team.) The mechanism by which new writing systems are added is difficult and ad hoc. We could adopt a rule-based engine such as that used by libicu in order to expand the set of writing systems we support; we could also add more robust support for "glossaries" of topic-specific rules, used extensively in written Chinese (a toy rule-based sketch follows below).
- This work should be done jointly with the Language team, as our team doesn't have the language expertise or the connection with the user communities needed to independently drive this work.
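For flavor, a toy rule-based converter in the spirit of LanguageConverter, where a topic-specific glossary layers on top of general per-character rules and longer matches win (the rules shown are made up; a production engine, whether home-grown or built on libicu transforms, would be far richer):

```python
def make_converter(*rule_tables: dict):
    """Build a converter from layered rule tables; later tables (e.g. a
    topic-specific glossary) take precedence over earlier, general ones."""
    rules: dict[str, str] = {}
    for table in rule_tables:
        rules.update(table)
    # Longest-match-first so multi-character glossary terms win over
    # single-character script rules.
    ordered = sorted(rules, key=len, reverse=True)

    def convert(text: str) -> str:
        out, i = [], 0
        while i < len(text):
            for src in ordered:
                if text.startswith(src, i):
                    out.append(rules[src])
                    i += len(src)
                    break
            else:
                out.append(text[i])
                i += 1
        return "".join(out)

    return convert


# Made-up example: general Cyrillic->Latin rules plus a glossary entry
# that converts a whole term at once.
general = {"в": "v", "и": "i", "к": "k"}
glossary = {"вики": "wiki"}
convert = make_converter(general, glossary)
print(convert("викик"))  # -> "wikik": glossary term first, then per-character rules
```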
- Wikitext 2.0: This is a broad topic including a number of disparate proposals to remove corner cases from wikitext processing and make authoring wikitext more predictable. Inspired by the success of Parsoid/VisualEditor, most proposals include a round-trip mapping from existing wikitext, often via the HTML-based MediaWiki DOM Spec, so that pages can be edited/stored in either dialect.
- Markdown support: In the broader ecosystem of markup languages, wikitext has been eclipsed by Markdown. Markdown lacks certain features widely used in Wikimedia projects (especially support for a template-like inclusion mechanism), but with the addition of a suitable extension it may be possible to reduce the barrier to entry for new editors by allowing the use of Markdown as an alternative authoring language. This could involve bidirectional translation, as with the "wikitext 2.0" proposals, or else a new content model which certain pages could opt into.
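A rough sketch of the "suitable extension" idea: expand wikitext-style {{...}} inclusions in a Markdown page before handing the result to an ordinary Markdown renderer (the inclusion syntax, the TEMPLATES store, and the expand_inclusions helper are all hypothetical):

```python
import re

# Hypothetical template store; in practice this would be the wiki's
# Template: namespace (or its Markdown equivalent).
TEMPLATES = {
    "welcome": "Welcome, **{name}**! See the [help pages](https://example.org/help).",
}

INCLUSION = re.compile(r"\{\{(\w+)((?:\|\w+=[^|}]*)*)\}\}")


def expand_inclusions(markdown_text: str) -> str:
    """Expand {{template|arg=value}} inclusions before normal Markdown rendering."""
    def replace(match: re.Match) -> str:
        name, raw_args = match.group(1), match.group(2)
        args = dict(part.split("=", 1) for part in raw_args.split("|") if part)
        body = TEMPLATES.get(name)
        if body is None:
            return match.group(0)  # leave unknown inclusions untouched
        return body.format(**args)
    return INCLUSION.sub(replace, markdown_text)


page = "# Hello\n\n{{welcome|name=Ada}}\n\nRegular *Markdown* continues here."
print(expand_inclusions(page))
# The expanded text can then be fed to any standard Markdown renderer.
```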
- Visual Template Editing: VisualEditor currently allows editing the main wikitext for an article, but does not allow editing pages in the Template: namespace, even for very simple uses of templates. T114454 proposed extensions to allow VisualEditor to edit mustache-like templates.
- Implicit data objects: Inspired by the "arrayfunctions" work of Semantic MediaWiki, this would associate every page with an implicit data object, perhaps created by one or more Wikidata queries or Scribunto module invocations. Parser functions on the page would operate on the implicit data object to extract properties or iterate over components. T122934#9196348, talk at SMWCon 2022.
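A sketch of how parser functions might address an implicit per-page data object, here faked with a plain dict produced once per page (the page_data object and the prop/foreach functions are invented for illustration; the real design would presumably draw on Wikidata queries or Scribunto):

```python
# Pretend this was produced once per page, e.g. by a Wikidata query or a
# Scribunto module invocation declared at the top of the article.
page_data = {
    "label": "Berlin",
    "population": 3_850_809,
    "mayors": ["Michael Müller", "Franziska Giffey", "Kai Wegner"],
}

# Hypothetical parser functions that operate on the implicit data object
# rather than on explicitly passed arguments.

def prop(name: str) -> str:
    """{{#prop:population}} -> value of that property for the current page."""
    return str(page_data.get(name, ""))


def foreach(name: str, fmt: str) -> str:
    """{{#foreach:mayors|* %s}} -> one formatted line per list element."""
    items = page_data.get(name, [])
    return "\n".join(fmt % item for item in items)


print(prop("population"))          # 3850809
print(foreach("mayors", "* %s"))   # a bulleted list, one mayor per line
```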
- Media/layout refresh: Figure formatting and layout, as well as the "intrusion" of UX elements into the main article content area, have been recurring issues. Among the possible product solutions might be stronger media formatting options or even the inclusion of a separate "page layout specification" which is composed with multiple "slots" of article content to create the final page. T113004.