User:TomTheHand/Unit tests for AWB regexes/General

From Wikipedia, the free encyclopedia

This section contains regular expressions that make general fixes, not limited to a particular topic or type of unit.

Replace incorrect or poorly supported characters

Replace non-breaking hyphen with regular hyphen

Description
Replace non-breaking hyphen with regular hyphen. The non-breaking hyphen is poorly supported in browsers, so it probably shouldn't be used on Wikipedia.
Find
‑
Replace with
-
Regular expression? Case sensitive?
N N/A
Text this regex should modify: Intended result:

16‑inch guns

16-inch guns

Replace degree-like symbols with proper degree symbol

Description
Replace the masculine ordinal indicator (º) or ring above (˚) with the degree sign (°). Be careful about false positives! Ensure that the alternate symbols aren't actually intended. In most ship articles, they're probably a mistake, but they do have legitimate uses!
Find
[º˚]
Replace with
°
Regular expression? Case sensitive?
Y N/A
Text this regex should modify:
  1. º
  2. ˚
Intended result:
  1. °
  2. °
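
A minimal Python sketch of this rule follows. AWB itself uses .NET regular expressions, so this is only an approximation for trying the pattern outside AWB; the function name `fix_degree_signs` is made up for illustration. The second example shows the kind of false positive the description warns about.

```python
import re

# U+00BA (masculine ordinal indicator) and U+02DA (ring above),
# both replaced by U+00B0 (degree sign).
DEGREE_LIKE = re.compile("[\u00ba\u02da]")


def fix_degree_signs(text):
    """Replace degree look-alikes with the real degree sign."""
    return DEGREE_LIKE.sub("\u00b0", text)


print(fix_degree_signs("a 45\u00ba arc"))  # a 45° arc
# False positive: "nº 5" (an abbreviation for "number 5") is changed too.
print(fix_degree_signs("n\u00ba 5"))       # n° 5
```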

Use correct Unicode symbol for micro

Description
Use the correct Unicode symbol for micro- (the micro sign µ rather than the Greek letter mu μ), and insert non-breaking spaces per MoS.
Find
\b(\d+)(?:\s|&nbsp;|-)*μ(m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|lm|lx|Bq|Gy|Sv|kat|M|l)\b
Replace with
$1&nbsp;µ$2
Regular expression? Case sensitive?
Y Y
Text this regex should modify:
  1. 5 μm
  2. 5μg
  3. 5-μW
  4. 5 μHz
Intended result:
  1. 5 µm
  2. 5 µg
  3. 5 µW
  4. 5 µHz
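
The four cases above can be checked with a short Python sketch. It assumes the middle branch of the separator group is the literal `&nbsp;` entity and that the replacement inserts `&nbsp;`, as in the kilo- rule further down; Python writes group references as `\1` where AWB's .NET syntax uses `$1`. The name `fix_micro` is hypothetical.

```python
import re

UNITS = ("m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|"
         "lm|lx|Bq|Gy|Sv|kat|M|l")
GREEK_MU = "\u03bc"    # what the rule searches for
MICRO_SIGN = "\u00b5"  # what the rule writes back

# Case-sensitive on purpose: unit symbols such as "M" and "m" differ.
MICRO = re.compile(rf"\b(\d+)(?:\s|&nbsp;|-)*{GREEK_MU}({UNITS})\b")


def fix_micro(text):
    """Swap Greek mu for the micro sign and normalise the separator."""
    return MICRO.sub(rf"\1&nbsp;{MICRO_SIGN}\2", text)


for sample in ["5 \u03bcm", "5\u03bcg", "5-\u03bcW", "5 \u03bcHz"]:
    print(fix_micro(sample))
```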

Format <br /> tags

Description
Give <br /> tags proper XHTML format.
Find
</?br\s*/?>
Replace with
<br />
Regular expression? Case sensitive?
Y N
Text this regex should modify: Intended result:
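
A minimal Python sketch of this rule, using hypothetical inputs of my own rather than the page's test cases. The function name `normalize_br` is also an assumption.

```python
import re

# Case-insensitive, matching the "Case sensitive? N" setting above.
BR_TAG = re.compile(r"</?br\s*/?>", re.IGNORECASE)


def normalize_br(text):
    """Rewrite every bare <br> variant as <br />."""
    return BR_TAG.sub("<br />", text)


for sample in ["<br>", "<BR/>", "</br>", "<br   />"]:
    print(normalize_br(sample))  # <br /> each time

# Tags carrying attributes do not match the pattern and are left alone.
print(normalize_br("<br clear=all>"))  # <br clear=all>
```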






Make general SI fixes

Make k for kilo- lower-case

Description
Make k for kilo- lower-case, and insert non-breaking spaces per MoS.
Find
\b(\d+)(?:\s|&nbsp;|-)*K(m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|lm|lx|Bq|Gy|Sv|kat|M|l)\b
Replace with
$1&nbsp;k$2
Regular expression? Case sensitive?
Y Y
Text this regex should modify:
  1. 15 KV
  2. 15Kg
  3. 15 Km
  4. 15-KW
Intended result:
  1. 15 kV
  2. 15 kg
  3. 15 km
  4. 15 kW
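
As a rough check, here is the same rule in Python (`\1` in place of .NET's `$1`; `fix_kilo` is a made-up name). Because a unit symbol must follow the K, a bare kelvin value such as "300 K" is left alone.

```python
import re

UNITS = ("m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|"
         "lm|lx|Bq|Gy|Sv|kat|M|l")

# Case-sensitive: only an upper-case K directly before a unit symbol matches.
KILO = re.compile(rf"\b(\d+)(?:\s|&nbsp;|-)*K({UNITS})\b")


def fix_kilo(text):
    """Lower-case the kilo- prefix and normalise the separator."""
    return KILO.sub(r"\1&nbsp;k\2", text)


for sample in ["15 KV", "15Kg", "15 Km", "15-KW"]:
    print(fix_kilo(sample))

print(fix_kilo("300 K"))  # 300 K (kelvin, no unit after the K)
print(fix_kilo("15 kV"))  # 15 kV (already lower-case, not matched)
```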

In some cases plain text or HTML may be preferable to Unicode; watch for those situations.

I feel that in many cases vulgar fractions from sources are worth retaining as Unicode symbols rather than converting to a decimal. For historical articles, vulgar fractions feel appropriate, and they give a level of precision that is lost on conversion to a decimal. For example, converting 5⅞ to 5.875 implies precision to the thousandth when you only actually have precision to the eighth. If you were to convert to 5.9 instead, you would lose information and still imply higher precision than the measurement actually provides.

Unicodify 1/2

Description
Replace 1/2 with the Unicode symbol ½
Find
\b1/2\b
Replace with
½
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

1/2

½

Text this regex should not modify:
  1. 1/20
  2. 11/2
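
The word boundaries are what keep 1/20 and 11/2 safe. A small Python sketch (with the hypothetical name `unicodify_half`) shows that, along with one case the boundaries do not protect against: a slash is not a word character, so a date-like string still matches.

```python
import re

HALF = re.compile(r"\b1/2\b")


def unicodify_half(text):
    """Replace a standalone 1/2 with the vulgar fraction U+00BD."""
    return HALF.sub("\u00bd", text)


print(unicodify_half("1/2"))       # ½
print(unicodify_half("1/20"))      # 1/20 (digit follows, no boundary)
print(unicodify_half("11/2"))      # 11/2 (digit precedes, no boundary)
print(unicodify_half("1/2/2007"))  # ½/2007 (false positive on a date)
```

The same pattern shape applies to the other fraction rules below, so the date caveat is worth keeping in mind for all of them.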

Unicodify 1/3

Description
Replace 1/3 with the Unicode symbol ⅓
Find
\b1/3\b
Replace with
⅓
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

1/3

⅓

Text this regex should not modify:
  1. 1/30
  2. 11/3

Unicodify 2/3

Description
Replace 2/3 with the Unicode symbol ⅔
Find
\b2/3\b
Replace with
⅔
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

2/3

⅔

Text this regex should not modify:
  1. 2/30
  2. 12/3

Unicodify 1/4

Description
Replace 1/4 with the Unicode symbol ¼
Find
\b1/4\b
Replace with
¼
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

1/4

¼

Text this regex should not modify:
  1. 1/40
  2. 11/4

Unicodify 3/4

Description
Replace 3/4 with the Unicode symbol ¾
Find
\b3/4\b
Replace with
¾
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

3/4

¾

Text this regex should not modify:
  1. 3/40
  2. 13/4

Unicodify 1/5

Description
Replace 1/5 with the Unicode symbol ⅕. Browser support for fifths isn't great, so you may not want to use this one.
Find
\b1/5\b
Replace with
⅕
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

1/5

⅕

Text this regex should not modify:
  1. 1/50
  2. 11/5

Unicodify 2/5

Description
Replace 2/5 with the Unicode symbol ⅖. Browser support for fifths isn't great, so you may not want to use this one.
Find
\b2/5\b
Replace with
⅖
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

2/5

⅖

Text this regex should not modify:
  1. 2/50
  2. 12/5

Unicodify 3/5

Description
Replace 3/5 with the Unicode symbol ⅗. Browser support for fifths isn't great, so you may not want to use this one.
Find
\b3/5\b
Replace with
⅗
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

3/5

⅗

Text this regex should not modify:
  1. 3/50
  2. 13/5

Unicodify 4/5

Description
Replace 4/5 with the Unicode symbol ⅘. Browser support for fifths isn't great, so you may not want to use this one.
Find
\b4/5\b
Replace with
⅘
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

4/5

⅘

Text this regex should not modify:
  1. 4/50
  2. 14/5

Unicodify 1/6

Description
Replace 1/6 with the Unicode symbol ⅙. Browser support for sixths isn't great, so you may not want to use this one.
Find
\b1/6\b
Replace with
⅙
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

1/6

⅙

Text this regex should not modify:
  1. 1/60
  2. 11/6

Unicodify 5/6

Description
Replace 5/6 with the Unicode symbol ⅚. Browser support for sixths isn't great, so you may not want to use this one.
Find
\b5/6\b
Replace with
⅚
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

5/6

⅚

Text this regex should not modify:
  1. 5/60
  2. 15/6

Unicodify 1/8

Description
Replace 1/8 with the Unicode symbol ⅛. Support for eighths is better than fifths or sixths, so this one is probably safe to use.
Find
\b1/8\b
Replace with
⅛
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

1/8

⅛

Text this regex should not modify:
  1. 1/80
  2. 11/8

Unicodify 3/8

Description
Replace 3/8 with the Unicode symbol ⅜. Support for eighths is better than fifths or sixths, so this one is probably safe to use.
Find
\b3/8\b
Replace with
⅜
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

3/8

⅜

Text this regex should not modify:
  1. 3/80
  2. 13/8

Unicodify 5/8

Description
Replace 5/8 with the Unicode symbol ⅝. Support for eighths is better than fifths or sixths, so this one is probably safe to use.
Find
\b5/8\b
Replace with
⅝
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

5/8

⅝

Text this regex should not modify:
  1. 5/80
  2. 15/8

Unicodify 7/8

Description
Replace 7/8 with the Unicode symbol ⅞. Support for eighths is better than fifths or sixths, so this one is probably safe to use.
Find
\b7/8\b
Replace with
⅞
Regular expression? Case sensitive?
Y N/A
Text this regex should modify: Intended result:

7/8

⅞

Text this regex should not modify:
  1. 7/80
  2. 17/8

En dash

Description
Replace &ndash; HTML entity with the Unicode symbol –.
Find
&ndash;
Replace with
–
Regular expression? Case sensitive?
N N
Text this regex should modify: Intended result:

Em dash

Description
Replace &mdash; HTML entity with the Unicode symbol —, and remove spaces from around em dashes.
Find
[ \t]*(?:&mdash;|—)[ \t]*
Replace with
—
Regular expression? Case sensitive?
Y N
Text this regex should modify:
  1. The em dash indicates a parenthetical thought — like this one — or some similar interpolation.
Intended result:
  1. The em dash indicates a parenthetical thought—like this one—or some similar interpolation.
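
A Python sketch of this rule follows. The pattern here is my reading of the description (match either the `&mdash;` entity or a literal em dash, plus any spaces or tabs on both sides), so treat it as an assumption rather than the exact AWB expression. `tighten_em_dashes` is a hypothetical name.

```python
import re

EM_DASH = "\u2014"

# Optional spaces/tabs, then the entity or the literal dash, then
# optional spaces/tabs again. Newlines are deliberately not consumed.
EM_DASH_RULE = re.compile(rf"[ \t]*(?:&mdash;|{EM_DASH})[ \t]*")


def tighten_em_dashes(text):
    """Convert &mdash; to U+2014 and strip surrounding spaces."""
    return EM_DASH_RULE.sub(EM_DASH, text)


sample = ("a parenthetical thought \u2014 like this one \u2014 "
          "or some similar interpolation")
print(tighten_em_dashes(sample))
print(tighten_em_dashes("before &mdash; after"))
```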

Superscripts

Please read this section of the Manual of Style on Mathematics before using these regular expressions. If the article you are editing uses higher powers as well, use <sup></sup> tags, because these Unicode symbols will not match superscripts for higher numbers. If the article only contains ² and ³, and will never contain higher powers, using Unicode symbols can be more compact and easier to understand. An article completely unrelated to mathematics which happens to include an area in km² has no need to support higher powers.

<sup>2</sup>

Description
Replace <sup>2</sup> with the Unicode symbol ².
Find
<sup>2</sup>
Replace with
²
Regular expression? Case sensitive?
N N
Text this regex should modify: Intended result:

<sup>2</sup>

²

<sup>3</sup>

Description
Replace <sup>3</sup> with the Unicode symbol ³.
Find
<sup>3</sup>
Replace with
³
Regular expression? Case sensitive?
N N
Text this regex should modify: Intended result:

<sup>3</sup>

³
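
The guidance at the top of this section (only use ² and ³ when the article has no other superscripts) can be folded into the replacement itself. The Python sketch below is one possible way to do that, not something AWB does on its own; `unicodify_superscripts` is a hypothetical helper.

```python
import re

SUP_TAG = re.compile(r"<sup>(.*?)</sup>", re.IGNORECASE)
UNICODE_SUPERSCRIPTS = {"2": "\u00b2", "3": "\u00b3"}


def unicodify_superscripts(text):
    """Convert <sup>2</sup> and <sup>3</sup>, but only if the text
    contains no other <sup> content that would then look mismatched."""
    contents = SUP_TAG.findall(text)
    if any(c not in UNICODE_SUPERSCRIPTS for c in contents):
        return text  # mixed superscripts: leave every tag as it is
    return SUP_TAG.sub(lambda m: UNICODE_SUPERSCRIPTS[m.group(1)], text)


print(unicodify_superscripts("an area of 100 km<sup>2</sup>"))
print(unicodify_superscripts("x<sup>2</sup> + y<sup>4</sup>"))  # unchanged
```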

Other Unicode

Use times sign instead of x

Description
Use the Unicode times symbol instead of the letter x for multiplication, and provide correct spacing. Some WikiProjects prefer the letter x for ease of entry; make sure you don't step on anyone's toes.
Find
(\d)\s*[x×]\s*(\d)
Replace with
$1 × $2
Regular expression? Case sensitive?
Y N
Text this regex should modify:
  1. 6 x 5-inch guns
  2. 12×21" torpedo tubes
  3. 9 X 200-psi boilers
Intended result:
  1. 6 × 5-inch guns
  2. 12 × 21" torpedo tubes
  3. 9 × 200-psi boilers
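
A Python sketch of this rule, using plain spaces around the times sign as shown in the replacement above (`use_times_sign` is a made-up name). The last example shows a false positive to look out for: hexadecimal literals also have a digit on each side of an x.

```python
import re

TIMES_SIGN = "\u00d7"

# Case-insensitive so that an upper-case X is caught as well.
TIMES = re.compile(rf"(\d)\s*[x{TIMES_SIGN}]\s*(\d)", re.IGNORECASE)


def use_times_sign(text):
    """Replace x between two digits with a spaced times sign."""
    return TIMES.sub(rf"\1 {TIMES_SIGN} \2", text)


print(use_times_sign("6 x 5-inch guns"))          # 6 × 5-inch guns
print(use_times_sign('12\u00d721" torpedo tubes'))  # 12 × 21" torpedo tubes
print(use_times_sign("9 X 200-psi boilers"))      # 9 × 200-psi boilers
print(use_times_sign("0x1F"))                     # 0 × 1F (false positive)
```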