Wikipedia talk:WikiProject Oregon/Readership
Data extraction technique
At Pete's instigation, this script creates the table in this article.
#!/bin/bash
# Print the view count for article $1 in month $2 (yyyymm), scraped from stats.grok.se.
wpor() {
  wget "http://stats.grok.se/en/$2/$1" -O - 2>/dev/null |
    grep " has been viewed " |
    sed 's#.*</a> has been viewed \([0-9]*\).*#\1#;'
}

echo '{| class="wikitable sortable"'
echo '! article !! importance !! rating !! Dec 2007 !! Jan 2008 !! Feb 2008 !! Mar 2008'
dates="200712 200801 200802 200803"
# Each input line: article (underscores, no spaces), importance, rating.
while read -r article importance rating
do
  x=""
  for month in $dates; do
    y=$(wpor "$article" "$month")
    x="$x || $y"
  done
  echo "|-"
  echo "| [[$article]] || $importance || $rating $x"
done
echo "|}"
Its input came from the automatically generated rating table that appears on the project page. I got the data from there, but for the life of me I can't figure out where that is now. The beginning of the data looks like this:
Oregon_State_Capitol Top FA
1980_eruption_of_Mount_St._Helens Mid FA
1984_Rajneeshee_bioterror_attack Mid FA
D._B._Cooper Mid FA
It was mildly reformatted from the magically generated article. —EncMstr (talk) 03:55, 21 April 2008 (UTC)
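The scraping step in wpor can be tried on its own, without a live download. Below is a minimal sketch that runs the same grep/sed pipeline on text from stdin; the name extract_views and the sample HTML line are hypothetical, and the sample only assumes what the original sed pattern implies about the stats page (a "</a>" followed by "has been viewed N").

```shell
#!/bin/bash
# Hypothetical helper: the grep/sed part of wpor() without the wget,
# so the extraction can be exercised on saved or sample HTML from stdin.
extract_views() {
  grep " has been viewed " |
    sed 's#.*</a> has been viewed \([0-9]*\).*#\1#;'
}

# Hypothetical sample line; the real page markup is assumed, not known.
sample='<p><a href="/wiki/D._B._Cooper">D._B._Cooper</a> has been viewed 12345 times in 200801.</p>'
printf '%s\n' "$sample" | extract_views
```

If the page layout ever changes, this is the place where the count comes back empty, so it is a quick way to check the pattern against a saved copy of a page.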
June 2008 updates
Summary
This is a summary of the steps, detailed below, that update this (Readership) page:
- Edit Wikipedia:Version 1.0 Editorial Team/Oregon articles by quality/1.
- Copy the wikitext and paste it into Vim running on Linux
- Run the two search-and-replace commands (below), changing "^I" to tab characters if necessary
- Remove header and trailer lines
- Save the resulting data as "file"
- Execute the script below, saved as "wpor", with
./wpor <file >result
- Paste the contents of "result" into the article. Preview, then fix any UTF-8 character problems, which show up as redlinked articles
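Because the fetch step is slow and UTF-8 problems only show up after previewing, it may help to sanity-check "file" before running wpor. A minimal sketch, assuming the tab-separated article/importance/rating layout produced by the Vim step: it reports lines that do not have exactly three fields (returning 1 if there are any), and lists non-ASCII article names as candidates for the redlink problem. The name checkinput is hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-flight check for the wpor input, read from stdin.
# Expects one article per line: article<TAB>importance<TAB>rating.
# Reports lines with the wrong field count (returns 1 if there are any)
# and lists non-ASCII article names, which deserve a close look in preview.
checkinput() {
  local tab=$'\t' line tabs fields n=0 status=0
  while IFS= read -r line; do
    n=$((n + 1))
    tabs=${line//[!$tab]/}            # keep only the tab characters
    fields=$(( ${#tabs} + 1 ))
    if (( fields != 3 )); then
      echo "line $n: $fields fields: $line"
      status=1
    fi
    if [[ $line == *[![:ascii:]]* ]]; then
      echo "line $n: non-ASCII name: ${line%%"$tab"*}"
    fi
  done
  return "$status"
}
```

Usage: checkinput <file && ./wpor <file >result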
Gory detail
The format of the article containing the assessments, Wikipedia:Version 1.0 Editorial Team/Oregon articles by quality/1, has changed. The wikisource of that article is trimmed to exclude the header and trailer text, then fed through these Vim commands to produce the article list that the script reads:
For most entries:
:%s/{{assessment | page=\[\[\(.*\)]].*importance={{\(.*\)-Class.*class={{\(.*\)-Class.*/\1^I\2^I\3
For entries with unknown importance:
:%s/^{{assessment | page=\[\[\(.*\)]].*class={{\(.*\)-Class.*/\1^I#na^I\2
The first vim command transforms a line like
{{assessment | page=[[Berkeley Lent]] [http://en.wikipedia.org/w/index.php?title=Berkeley_Lent&oldid=135741266 ] | importance={{Mid-Class}} | date=June 4, 2007 | class={{Start-Class}} | version= | comments= }}
into (fields separated by tabs)
Berkeley Lent Mid Start
The second command transforms a line with an empty importance field (... | importance= | date= ...) into
Berkeley Lent #na Start
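The two Vim substitutions can also be run non-interactively with sed, which avoids typing ^I by hand. This is a swapped-in alternative, not the method used above: a minimal sketch that reuses the same regular expressions (they are valid sed basic regular expressions as written) and inserts a real tab through a shell variable. It assumes the header and trailer lines have already been removed; the function name assess2tsv is hypothetical.

```shell
#!/bin/bash
# sed equivalent of the two Vim :%s commands (an alternative, not the original
# method). Reads assessment wikitext on stdin and writes
# "article<TAB>importance<TAB>rating" lines on stdout.
# The second expression only matches lines the first one left untouched,
# i.e. entries with an empty importance field, which get "#na".
tab=$'\t'
assess2tsv() {
  sed -e "s/{{assessment | page=\[\[\(.*\)]].*importance={{\(.*\)-Class.*class={{\(.*\)-Class.*/\1$tab\2$tab\3/" \
      -e "s/^{{assessment | page=\[\[\(.*\)]].*class={{\(.*\)-Class.*/\1$tab#na$tab\2/"
}
```

Usage: assess2tsv <trimmed-wikitext >file (the input file name is hypothetical).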
The result is fed into the script below on standard input (that is, <file):
#!/bin/bash
# Print the view count for article $1 in month $2 (yyyymm), scraped from stats.grok.se.
wpor() {
  wget "http://stats.grok.se/en/$2/$1" -O - 2>/dev/null |
    grep " has been viewed " |
    sed 's#.*</a> has been viewed \([0-9]*\).*#\1#;'
}

monthlist=(200712 200801 200802 200803 200804 200805)
n=${#monthlist[@]}
declare -a coltotals
echo '{| class="wikitable sortable" style="text-align:right"'
echo '! article !! importance !! rating !! Dec 2007 !! Jan 2008 !! Feb 2008 !! Mar 2008 !! Apr 2008 !! May 2008 !! Total'
for (( m = 0; m < n; ++m )); do
  coltotals[m]=0
done
rowcount=0
# Each input line: article<TAB>importance<TAB>rating (article names may contain spaces).
while IFS=$'\t' read -r article importance rating
do
  wikiarticle=${article// /_}      # URL form: spaces become underscores
  x=""
  linetot=0
  for (( m = 0; m < n; ++m )); do
    y=$(wpor "$wikiarticle" "${monthlist[m]}")
    linetot=$(( linetot + y ))     # an empty count adds 0
    coltotals[m]=$(( coltotals[m] + y ))
    x="$x || $y"
  done
  echo "|-"
  echo "| [[$article]] || $importance || $rating$x || $linetot"
  rowcount=$(( rowcount + 1 ))
done

# Final row: per-month column totals and the grand total.
linetot=0
x=""
for (( m = 0; m < n; ++m )); do
  y=${coltotals[m]}
  linetot=$(( linetot + y ))
  x="$x || $y"
done
echo "|-"
echo "| __Total__ $rowcount articles || || $x || $linetot"
echo "|}"
This script is based on the old one, but also calculates row and column totals. Its output can also be pasted in directly, whereas the old one's output needed some manual tweaking. The only remaining glitch is that some extended UTF-8 characters are garbled, affecting about six article names at present. —EncMstr (talk) 21:14, 3 June 2008 (UTC)
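The totals logic can be checked offline by swapping the fetch for a stub. A minimal sketch under stated assumptions: getviews is a hypothetical stand-in for wpor that returns made-up numbers, label and maketable are hypothetical names, and the month headings are derived from the month list (instead of being typed separately, as in the script above) so the two cannot drift apart. Unlike the original, which leaves a cell blank when no count is found, this sketch writes 0.

```shell
#!/bin/bash
# Offline sketch of the June 2008 table logic (row totals, column totals,
# article count). getviews is a HYPOTHETICAL stub standing in for wpor();
# it returns made-up numbers so the wikitable layout can be checked
# without fetching anything. Replace it with wpor to use real data.
months=(200712 200801 200802)
monthnames=(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)

getviews() {   # $1 = article (underscored), $2 = yyyymm; fake count
  echo $(( ${#1} * 10 + ${2: -1} ))
}

# "200712" -> "Dec 2007"; 10# stops "08" and "09" being read as octal.
label() {
  echo "${monthnames[10#${1:4:2} - 1]} ${1:0:4}"
}

maketable() {
  local article importance rating y m x linetot rowcount=0 ym
  local -a coltotals
  local header='! article !! importance !! rating'
  for ym in "${months[@]}"; do header+=" !! $(label "$ym")"; done
  for (( m = 0; m < ${#months[@]}; ++m )); do coltotals[m]=0; done

  echo '{| class="wikitable sortable" style="text-align:right"'
  echo "$header !! Total"
  while IFS=$'\t' read -r article importance rating; do
    x="" linetot=0
    for (( m = 0; m < ${#months[@]}; ++m )); do
      y=$(getviews "${article// /_}" "${months[m]}")
      y=${y:-0}                          # missing count: write 0
      linetot=$(( linetot + y ))
      coltotals[m]=$(( coltotals[m] + y ))
      x+=" || $y"
    done
    echo "|-"
    echo "| [[$article]] || $importance || $rating$x || $linetot"
    rowcount=$(( rowcount + 1 ))
  done

  # Totals row: one cell per month, then the grand total.
  x="" linetot=0
  for y in "${coltotals[@]}"; do
    x+=" || $y"
    linetot=$(( linetot + y ))
  done
  echo "|-"
  echo "| __Total__ $rowcount articles || || $x || $linetot"
  echo "|}"
}
```

To try it, source the file and run, for example: printf 'Berkeley Lent\tMid\tStart\n' | maketable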
Kudos!
Wow, EncMstr, thanks for the major expansion of data, and for all the documentation of how you did it! -Pete (talk) 23:59, 3 June 2008 (UTC)