Template talk:Strlen quick

This is the talk page for Template:Strlen_quick.

Created

The fast string-length counter, Template:Strlen_quick, was created by long-term user Wikid77 on 30 January 2011 to provide a very fast string-length template, optimized for the string lengths found in actual Wikipedia data. It is also designed to consume few wiki-markup resources in the NewPP MediaWiki preprocessor: it uses an expansion depth of only 5 levels, rather than the 9 to 14 levels used by other string-length templates. -Wikid77 10:09, 30 January 2011 (UTC)

Optimizing for actual string lengths

30-Jan-2011: Template:Strlen_quick was created as a faster alternative to {{str_len}}, by optimizing for real string data as used in articles. From the strings actually used in existing Wikipedia articles, it is possible to determine the most likely string lengths, such as 17 or 18 characters for titles, and then optimize to match those lengths faster. For example, if the top 1,000 articles all used an infobox code of 9 letters, then checking for length 9 first would avoid checking other lengths. In the 353,000 articles using {{Italic_title}}, the string lengths range from 2 to 99 letters, with the most common lengths between 16 and 19, and 88% of all titles shorter than 30. The distribution of title lengths has been as follows:

84% > 10, 12% < 10, 51% in 10-19, 25% in 20-29, 7% in 30-39, 1.7% in 40-49, 0.6% >50.

For lengths 0-9, the increase is dramatic: almost no titles are 1 or 2 characters long, a few are 3, some are 4, and more have lengths 5, 6, 7 and 8, with 9 being 19 times more common than length 3. To match a title length quickly, check for the most common lengths first, which here means testing lengths 9 down to 1 in reverse order.
Among lengths 10-19, the most common are 17 and 18, with frequency falling off farther away, and 10 being the least frequent length in that range. Above 20, the lengths decrease in frequency, 21 through 29, as the reverse of 9 down to 1, so checking 21 first is 3 times more likely to match than 29. Among 30-39, titles are quite rare, with 31 being as rare as length 5, and 39 being 3 times rarer still, occurring in only 43 per 10,000 titles. By optimizing for the actual lengths of titles, those lengths can be matched perhaps twice as quickly. A pure binary search would give an unfair advantage to rare lengths, so the search should be prioritized in favor of the more common lengths.
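
As a rough illustration of why test order matters, here is a minimal Python sketch that computes the average number of comparisons a sequential scan needs when lengths are tested in a given order. The function name `expected_checks` and the weights in `toy_freq` are made up for illustration (weights that merely rise with length); they are not the measured title data described above.

```python
def expected_checks(order, freq):
    """Average comparisons for a scan that tests lengths in `order`.

    `freq` maps length -> relative weight. A length missing from `order`
    is treated as the default case and costs len(order) comparisons.
    """
    total = sum(freq.values())
    position = {length: i + 1 for i, length in enumerate(order)}
    weighted = sum(weight * position.get(length, len(order))
                   for length, weight in freq.items())
    return weighted / total


# Hypothetical weights for lengths 1-9: longer is more common (NOT real data).
toy_freq = {n: n for n in range(1, 10)}

common_first = list(range(9, 0, -1))  # test 9, 8, ..., 1
rare_first = list(range(1, 10))       # test 1, 2, ..., 9

print(round(expected_checks(common_first, toy_freq), 2))  # 3.67
print(round(expected_checks(rare_first, toy_freq), 2))    # 6.33
```

With these toy weights, testing the common lengths first needs fewer comparisons on average than testing the rare lengths first. How large the saving is for real titles depends on the actual distribution.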

The markup logic below uses prioritized steps (the actual markup handles lengths over 70):

LOGIC to match 1-to-60 lengths in order of most common real data:
{{
#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|20}}
| {{#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|30}}
  | {{#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|40}}
    | {{#switch: x{{{1}}}
      | {{padleft:|41|x{{{1}}}}} = 40
      | {{padleft:|42|x{{{1}}}}} = 41
      | {{padleft:|43|x{{{1}}}}} = 42
      | {{padleft:|44|x{{{1}}}}} = 43
      | {{padleft:|45|x{{{1}}}}} = 44
      | {{padleft:|46|x{{{1}}}}} = 45
      | {{padleft:|47|x{{{1}}}}} = 46
      | {{padleft:|48|x{{{1}}}}} = 47
      | {{padleft:|49|x{{{1}}}}} = 48
      | {{padleft:|50|x{{{1}}}}} = 49
      | {{padleft:|51|x{{{1}}}}} = 50
      | {{padleft:|52|x{{{1}}}}} = 51
      | {{padleft:|53|x{{{1}}}}} = 52
      | {{padleft:|54|x{{{1}}}}} = 53
      | {{padleft:|55|x{{{1}}}}} = 54
      | {{padleft:|56|x{{{1}}}}} = 55
      | {{padleft:|57|x{{{1}}}}} = 56
      | {{padleft:|58|x{{{1}}}}} = 57
      | {{padleft:|59|x{{{1}}}}} = 58
      | {{padleft:|60|x{{{1}}}}} = 59
      | #default= 60 <!--when >= 60 and none of the above-->
      }}<!--endsw 40's++ -->
    | {{#switch: x{{{1}}}
      | {{padleft:|31|x{{{1}}}}} = 30
      | {{padleft:|32|x{{{1}}}}} = 31
      | {{padleft:|33|x{{{1}}}}} = 32
      | {{padleft:|34|x{{{1}}}}} = 33
      | {{padleft:|35|x{{{1}}}}} = 34
      | {{padleft:|36|x{{{1}}}}} = 35
      | {{padleft:|37|x{{{1}}}}} = 36
      | {{padleft:|38|x{{{1}}}}} = 37
      | {{padleft:|39|x{{{1}}}}} = 38
      | #default= 39
      }}<!--endsw 30's-->
    }}<!--endifeq 40-->
  | {{#switch: x{{{1}}}
    | {{padleft:|21|x{{{1}}}}} = 20
    | {{padleft:|22|x{{{1}}}}} = 21
    | {{padleft:|23|x{{{1}}}}} = 22
    | {{padleft:|24|x{{{1}}}}} = 23
    | {{padleft:|25|x{{{1}}}}} = 24
    | {{padleft:|26|x{{{1}}}}} = 25
    | {{padleft:|27|x{{{1}}}}} = 26
    | {{padleft:|28|x{{{1}}}}} = 27
    | {{padleft:|29|x{{{1}}}}} = 28
    | #default= 29
    }}<!--endsw 20's-->
  }}<!--endifeq 30-->
| {{#ifeq: x{{{1}}}|x{{padleft:{{{1}}}|10}}
  | {{#switch: x{{{1}}}
    | {{padleft:|18|x{{{1}}}}} = 17
    | {{padleft:|19|x{{{1}}}}} = 18
    | {{padleft:|17|x{{{1}}}}} = 16
    | {{padleft:|20|x{{{1}}}}} = 19
    | {{padleft:|16|x{{{1}}}}} = 15
    | {{padleft:|15|x{{{1}}}}} = 14
    | {{padleft:|14|x{{{1}}}}} = 13
    | {{padleft:|13|x{{{1}}}}} = 12
    | {{padleft:|12|x{{{1}}}}} = 11
    | #default= 10 <!--when >= 10 and none of above-->
    }}<!--endsw 10's++ -->
  | {{#switch: x{{{1}}}
    | {{padleft:|10|x{{{1}}}}} = 9
    | {{padleft:|9|x{{{1}}}}} = 8
    | {{padleft:|8|x{{{1}}}}} = 7
    | {{padleft:|7|x{{{1}}}}} = 6
    | {{padleft:|6|x{{{1}}}}} = 5
    | {{padleft:|5|x{{{1}}}}} = 4
    | {{padleft:|4|x{{{1}}}}} = 3
    | {{padleft:|3|x{{{1}}}}} = 2
    | #default= 1
    }}<!--endsw 1's-->
  }}<!--endifeq 10-->
}}<!--endifeq 20-->

Tests of the above code show that it does, in fact, process actual title lengths about twice as fast as the binary-search markup logic that has been used in template {{str_len}}. -Wikid77 10:09, 30 January 2011, revised 01:21, 22 February 2011 (UTC)
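
For readers unfamiliar with the padleft trick, the 1-to-60 markup listing on this page can be modeled in Python roughly as follows. This is a sketch under assumptions, not the template itself: `padleft` is a simplified stand-in for the MediaWiki parser function (default pad "0", target length capped at 500, no whitespace trimming), and the helper names `at_least`, `has_length` and `strlen_quick` are invented here.

```python
def padleft(text, length, pad="0"):
    """Simplified model of {{padleft:text|length|pad}}."""
    if not pad:
        return text
    missing = min(length, 500) - len(text)
    if missing <= 0:
        return text  # already long enough: returned unchanged
    # Repeat the pad string and truncate it to exactly the missing length.
    filler = (pad * (missing // len(pad) + 1))[:missing]
    return filler + text


def at_least(s, n):
    """Models #ifeq: x{{{1}}}|x{{padleft:{{{1}}}|n}} (true when len(s) >= n)."""
    return "x" + s == "x" + padleft(s, n)


def has_length(s, n):
    """Models the #switch case {{padleft:|n+1|x{{{1}}}}} (true when len(s) == n)."""
    key = "x" + s
    return key == padleft("", n + 1, key)


def strlen_quick(s):
    """Follows the branch and case order of the 1-to-60 listing."""
    if at_least(s, 20):
        if at_least(s, 30):
            if at_least(s, 40):
                cases, default = range(40, 60), 60
            else:
                cases, default = range(30, 39), 39
        else:
            cases, default = range(20, 29), 29
    elif at_least(s, 10):
        # Most common lengths first, as in the listing.
        cases, default = (17, 18, 16, 19, 15, 14, 13, 12, 11), 10
    else:
        cases, default = range(9, 1, -1), 1
    for n in cases:
        if has_length(s, n):
            return n
    return default


print(strlen_quick("Strlen quick"))  # 12
```

In this model, every string of 60 or more characters falls through to the final default of 60, which matches the listing's stated 1-to-60 scope.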

Zero length string returns length=1

testing:

{{Strlen quick|aaa}} → 3
{{Strlen quick|aa}} → 2
{{Strlen quick|a}} → 1
{{Strlen quick|}} → 0
{{Strlen quick|1=}} → 0
{{Strlen quick}} → 0

I think the last three are in error. -DePiep (talk) 07:54, 15 June 2012 (UTC)
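
One possible reading of the section heading, assuming the template at that time followed the 1-to-60 listing earlier on this page: an empty parameter matches none of the explicit cases in the lowest #switch, so it falls through to `#default= 1`. The Python sketch below models only that lowest switch. The `handle_empty` flag stands for a hypothetical extra case (something like `| x = 0`) and is an assumption, not the fix actually applied to the template.

```python
def padleft(text, length, pad="0"):
    """Simplified model of {{padleft:text|length|pad}} (no 500-character cap)."""
    missing = length - len(text)
    if not pad or missing <= 0:
        return text
    return (pad * (missing // len(pad) + 1))[:missing] + text


def lowest_switch(s, handle_empty=False):
    """Models the 1-9 #switch on x{{{1}}} from the listing."""
    key = "x" + s
    if handle_empty and key == "x":
        return 0  # hypothetical added case for the empty string
    for n in range(9, 1, -1):  # cases 9 down to 2
        if key == padleft("", n + 1, key):
            return n
    return 1  # "#default= 1": reached by length 1 and by the empty string


print(lowest_switch(""))                     # 1
print(lowest_switch("", handle_empty=True))  # 0
```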