Wikipedia:Prosesize

Prosesize
Description: Add a toolbox link to show the size of and number of words in a page
Author(s):
Updated: August 18, 2024
Skins:
  • Vector
  • Monobook
  • Modern
  • Timeless
  • MinervaNeue
Source: GitHub repository (prosesize)

Prosesize is a gadget for adding a toolbox link to show the size of and number of words in a page. It is a rewrite of User:Dr_pda/prosesize.js.

Installation and removal

As with most Wikipedia tools, you must be logged in to install or use the Prosesize gadget.

To install it, go to Preferences → Gadgets → Browsing, tick "Prosesize: add a toolbox link to show the size of and number of words in a page", and save. To remove the gadget, disable it in your preferences. If you installed User:Dr pda/prosesize.js in your Special:MyPage/common.js or Special:MyPage/skin.js, remove that too.

Usage instructions

Once you are logged in and have installed the gadget, go to the left panel of the Wikipedia page (or the right panel if you use Vector2022). Under "Tools", click "Page size"; this link does not appear if you are not logged in. The gadget's output is shown at the top left of the page, below the title of the article.

Sample output

  • HTML document size: 270 kB
  • Prose size (including all HTML code): 88 kB
  • References (including all HTML code): 65 kB
  • Wiki text: 83 kB
  • Prose size (text only): 56 kB (9412 words) "readable prose size"
  • References (text only): 8241 B

Meaning of output

Summary

  • HTML document size: Size of the HTML downloaded by your browser
  • Prose size (including all HTML code): Size of HTML within <p> tags
  • References (including all HTML code): Size of reference HTML
  • Wiki text: Size of wikitext (seen when editing)
  • Prose size (text only): Size and word count of text within <p> tags (called "readable prose size")
  • References (text only): Size of reference text

HTML document size

This is the total size of the HTML document. If you went to View → Page Source (or the equivalent) in your browser and saved the resulting output to your computer, the size of the saved file would equal this number. This number does not include any images.
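As a rough illustration of what this number measures, the sketch below computes the size in bytes of an HTML string. The function name, the sample markup, and the choice of UTF-8 encoding are assumptions made for the example; the gadget itself may measure the document differently.

```python
def html_document_size(html: str) -> int:
    """Return the size in bytes of the HTML source, assuming UTF-8 encoding."""
    return len(html.encode("utf-8"))


# Hypothetical sample page. The <img> tag itself is counted as markup,
# but the image file it points to is not part of the HTML document.
sample = '<html><body><p>Caf\u00e9</p><img src="photo.jpg"></body></html>'
size = html_document_size(sample)
```

Note that non-ASCII characters take more than one byte in UTF-8, so the byte size can exceed the character count.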

Prose size

The script counts the text within <p> tags in the HTML source of the document, which corresponds almost exactly to the definition of "readable prose". This method is not perfect, however: it may include text that isn't prose, or exclude text that is (e.g. text in {{cquote}}, or prose written in bullet-point form). The text counted as prose is highlighted in yellow, so it is easy to see whether the prose size is over- or underestimated.

Two numbers are given for the prose size: HTML and text only. The HTML size is the size of the HTML code contained within <p> tags. This number can be compared to the HTML document size to see how much of the document consists of readable prose. The text-only size is the size of just the words, without any formatting. (This is what you would get if you copied and pasted the prose from the article into a plain-text editor such as Notepad, which strips out all the formatting.) The word count is obtained by splitting the text on spaces.
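The text-only measurement described above can be approximated as follows. This is a minimal sketch, not the gadget's own code: the class and function names are hypothetical, the size is taken as UTF-8 bytes, and words are split on any whitespace rather than on spaces only.

```python
from html.parser import HTMLParser


class ProseCollector(HTMLParser):
    """Collect the text found inside <p> elements (a simplification)."""

    def __init__(self):
        super().__init__()
        self.in_p = False      # True while the parser is inside a <p> element
        self.paragraphs = []   # one text string per <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Text inside inline tags such as <b> still belongs to the paragraph.
        if self.in_p:
            self.paragraphs[-1] += data


def prose_stats(html):
    """Return (size in bytes, word count) of the text inside <p> tags."""
    parser = ProseCollector()
    parser.feed(html)
    parser.close()
    size = sum(len(p.encode("utf-8")) for p in parser.paragraphs)
    words = sum(len(p.split()) for p in parser.paragraphs)
    return size, words


# Hypothetical page fragment: the heading and the bullet list are not counted.
page = ("<h2>Title</h2><p>Hello <b>big</b> world.</p>"
        "<ul><li>not counted</li></ul><p>Second one.</p>")
print(prose_stats(page))  # (27, 5)
```

The bullet-list text is skipped here for the same reason the gadget can underestimate prose written in bullet-point form: only <p> content is counted.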

References size

The HTML references size is the size of what is produced by the <references/> tag, plus the size of the HTML that produces the markers (e.g. [1]). The text-only size is likewise just the text of the references, plus the text of the markers. Note that the contribution of the markers is explicitly subtracted from both prose size numbers. The markers should also not affect the word count, since there should be no spaces between them and the preceding word or punctuation.
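To see why subtracting the markers changes the prose size but not the word count, consider the sketch below. It works on plain text and assumes markers have the shape [1], [2], and so on; the regular expression and the function name are illustrative and are not taken from the gadget.

```python
import re

# Assumed marker shape: a number in square brackets, such as [1] or [23].
MARKER = re.compile(r"\[\d+\]")


def split_markers(text):
    """Return (text with markers removed, total bytes of marker text)."""
    markers = MARKER.findall(text)
    prose = MARKER.sub("", text)
    marker_bytes = sum(len(m.encode("utf-8")) for m in markers)
    return prose, marker_bytes


text = "The sky is blue.[1] Grass is green.[2][3]"
prose, marker_bytes = split_markers(text)

print(prose)         # The sky is blue. Grass is green.
print(marker_bytes)  # 9
# Markers attach directly to the preceding word, so the word count is unchanged.
print(len(text.split()), len(prose.split()))  # 7 7
```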

Wikitext size

In addition to the above numbers, which are calculated from the HTML source of the page, there is also the size of the text plus wiki markup which appears in the edit box when you edit a page. This number is shown next to each revision on the History tab. The script queries the API to retrieve this value for the current article.
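A query of this kind might look like the sketch below, which builds a MediaWiki API request for the size of a page's current revision and reads the value from the JSON reply. Which API module and parameters the gadget actually uses is not stated here, so the choice of prop=revisions with rvprop=size is an assumption, and the sample response is made-up data in the formatversion=2 layout.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_size_query(title):
    """Build a URL asking for the size in bytes of the current revision."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "size",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_size(response_text):
    """Read the wikitext size from a formatversion=2 JSON response."""
    data = json.loads(response_text)
    page = data["query"]["pages"][0]
    return page["revisions"][0]["size"]


# Illustrative response body only; the title and the number are not real data.
sample_response = (
    '{"query": {"pages": [{"title": "Example", '
    '"revisions": [{"size": 83000}]}]}}'
)

url = build_size_query("Example")
size = extract_size(sample_response)
```

The sketch deliberately stops short of sending the request, so it can be run offline; fetching `url` with any HTTP client would return a reply of the same general shape.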