
User:Pier4r/ProposedContributions


Overview


Proposed contributions (mostly to talk pages) that got removed. Instead of starting an edit war, which only costs me energy and is unproductive (especially when someone removes content first instead of discussing it), I prefer to collect those contributions here and maybe link them from the talk page.

For those who want to surf the wards wiki through random articles


I have made a collection of scripts: the first collects the list of wiki pages, and the second generates random links to them every day (a sketch of a possible daily setup is at the end of this page). More info:

  • To collect wiki links
#!/bin/sh

#Do not forget to disable word wrap,
#otherwise you get problems with some long lines.

#####
#const
true_value=0
false_value=1

base_url="http://c2.com/cgi/wiki?"
wiki_page_extension_url="WardsWiki"
temporary_dl_filename="temp.html"
temporary_filename="temp2.txt"

working_dir=/tmp/wards_wiki_index
downloaded_pages_dir=${working_dir}/downloaded
cache_dir=${working_dir}/cache
cache_wiki_url_dir=${cache_dir}/links
cache_wiki_urls_file=${cache_dir}/to_check_links.txt
wiki_camelcase_urls_dir=${working_dir}/wiki_links
wiki_downloaded_urls_file=${wiki_camelcase_urls_dir}/wiki_links.txt

seconds_between_downloads=5

#####
#input var
logical_name_dir="not used"
if test -z "${logical_name_dir}" ; then
  : #nop
fi

#####
#var
wiki_urls_to_check=1
checked_wiki_urls_line=1
wiki_dl_errors=1
seconds_now=0
seconds_last_download=$( date +"%s" )

#####
# functions

download_web_page(){
  #parameters
  local web_page_url=${1}
  local output_file_path=${2}
  #internal
  local was_successful=${true_value}

  for counter in $(seq 1 3); do
    wget "${web_page_url}" -O ${output_file_path}  --limit-rate=1k
      #because wardswiki does not like too many wget too quickly
      #let's slow down the download.
    was_successful=$?
    if test ${was_successful} -eq ${true_value} ; then
      break
    fi
    sleep 2
  done

  if test ${was_successful} -ne ${true_value} ; then
    echo "error in downloading"
    echo "${web_page_url}"
    exit 1
  fi
}

#####
# script
mkdir -p ${working_dir}
mkdir -p ${downloaded_pages_dir}
mkdir -p ${cache_dir}
mkdir -p ${cache_wiki_url_dir}
touch ${cache_wiki_urls_file}
mkdir -p ${wiki_camelcase_urls_dir}
touch ${wiki_downloaded_urls_file}

cd ${cache_dir}

#####
#gets the pages, analyze them, get the links and further pages.
while test ${wiki_urls_to_check} -gt 0 ; do
  if test $( grep -c "^${wiki_page_extension_url}\$" "${wiki_downloaded_urls_file}" ) -eq 0 ; then
    #test whether the wiki page was already visited; if not, continue.
    #in terms of load this does not have a big impact, because the file will have
    #a maximum of 50'000 lines, which is not so big for, let's say, an Asus 904HD
    #with a Celeron 900 and a not-so-fast hard disk.

    seconds_now=$( date +"%s" )
    if test $( expr ${seconds_now} "-" ${seconds_last_download} ) -lt ${seconds_between_downloads} ; then
      sleep ${seconds_between_downloads}
    fi
    seconds_last_download=$( expr ${seconds_now} "+" ${seconds_between_downloads} )
      #set before the download, which could take a lot of time;
      #calling 'date' again afterwards does not always work, not sure why.
    download_web_page "${base_url}${wiki_page_extension_url}" ${temporary_dl_filename}

    # grep -o 'title.*/title' wardsWiki | cut -c 7- | cut -d '<' -f 1
    # grep -o -E 'wiki\?[A-Z][a-zA-Z0-9]*' wardsWiki
    wiki_page_title=$( grep -o 'title.*/title' ${temporary_dl_filename} | cut -c 7- | cut -d '<' -f 1 )
      #grabbing the content within '<title>CamelCase</title>'

    if test -z "${wiki_page_title}" ; then
      wiki_page_title="wiki_dl_errors.${wiki_dl_errors}"
      let wiki_dl_errors+=1
    fi

    cp ${temporary_dl_filename} "${downloaded_pages_dir}/${wiki_page_title}.html"
      #copy the page to the 'downloaded page' with the title name

    echo "${wiki_page_extension_url}" >> "${wiki_downloaded_urls_file}"
      #save the wiki link name as downloaded

    #save the wiki_urls in the page
    grep -o -E 'wiki\?[A-Z][a-zA-Z0-9]*' ${temporary_dl_filename} | cut -d '?' -f 2 > ${temporary_filename}
      #grabbing something like 'wiki?CamelCase'

    #the following part could be compressed into a single 'cat >>',
    #but its impact for now is lower than that of other statements.
    while read wiki_link_line ; do
      echo "${wiki_link_line}" >> "${cache_wiki_urls_file}"
      let wiki_urls_to_check+=1
    done < ${temporary_filename}
  fi

  #get the next page to visit
  echo "${wiki_page_extension_url}" >> "${cache_wiki_urls_file}.checked"
    #put the checked line in a 'check' file db.
  wiki_page_extension_url=$( head -n 1 "${cache_wiki_urls_file}" )
    #extract the new line to check
  let wiki_urls_to_check-=1
  tail -n +2 "${cache_wiki_urls_file}" > "${cache_wiki_urls_file}.tailtmp"
    #http://unix.stackexchange.com/questions/96226/delete-first-line-of-a-file
    #remove the next page to visit from the remaining list
  mv "${cache_wiki_urls_file}.tailtmp" "${cache_wiki_urls_file}"


  #if test -z "${wiki_page_extension_url}" ; then
    #if no wiki page is retrieved, it means that we are finished.
    #even in case of loops, since we check each url at least once,
    #it will get deleted and never downloaded again.
    #Other pages can re-add it, but it will be bypassed.
  #  wiki_urls_to_check=0
  #fi
done

<<documentation

Todos {
  - get a wiki page, extract useful links according to observed patterns,
    then continue the exploration.
}

Assumptions {
  - wards wiki links have the characteristic part 'wiki?CamelCaseName'
}

Tested on {
  - cygwin on win xp with busybox interpreter
}
documentation
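
A small note between the two scripts: the collector above writes the page names to wiki_links.txt, and the generator below hardcodes that file's line count (file_lines_num=36631). A hedged alternative, assuming the file has been moved to the path the generator reads, is to count the lines at run time:

#sketch: derive the line count from the collected link file instead of hardcoding it
#(the path is the one the generator script below uses; adjust if the file lives elsewhere)
wiki_links_filepath='/sda3/c2_wiki_links/wiki_links.txt'
file_lines_num=$( wc -l < "${wiki_links_filepath}" )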


  • To create the random links page
#!/bin/sh
# actually I should use env in the shebang
# documentation at the end

#######
# constants
set -eu
  # stop at errors and undefined variables
wiki_links_filepath='/sda3/c2_wiki_links/wiki_links.txt'
file_lines_num=36631
random_links_number=20
  # note that repetitions can appear

html_result_random_links_filepath='/sda3/www/pier_pub/c2_wiki_links/random_links_c2wiki.html'

c2_wiki_base_url_string='http://www.c2.com/cgi/wiki?'

double_quote_string='"'

paragraph_open_html_string='<p>'
paragraph_closed_html_string='</p>'

hyperlink_closed_html_string='</a>'

#######
# variables
awk_command=""
wiki_page_selected_string=""
wiki_url_string=""


#######
# functions

generate_random_line_numbers() {
  awk_command='
    BEGIN{
      srand();
      for (i=0; i < draws; i++) {
        print( int(max_num*rand() ) + 1 );
      }
    } '
  awk -v draws=${random_links_number} -v max_num=${file_lines_num} "${awk_command}"
}

#######
# script

#clear the previous file
echo "" > "${html_result_random_links_filepath}"

#fill the file with new random links
for line_num in $( generate_random_line_numbers ) ; do
  # not efficient, but first make it effective, then efficient
  wiki_page_selected_string=$(awk -v line_number=${line_num} 'NR==line_number' "${wiki_links_filepath}")
  wiki_url_string="${c2_wiki_base_url_string}${wiki_page_selected_string}"
  echo "${paragraph_open_html_string}<a href=${double_quote_string}${wiki_url_string}${double_quote_string}>${wiki_page_selected_string}${hyperlink_closed_html_string}${paragraph_open_html_string}" >> "${html_result_random_links_filepath}"
done

<<documentation

Purpose {
- given a file with the page names of the c2.com wiki
  (the original wiki) in camel case
  (see the other mini project that downloads the wards wiki),
  create a random selection of those every day and use it to
  navigate that wiki, which for me has a lot of interesting
  "frozen discussions", in a random way. If I navigate according
  to my interests I need to rely heavily on bookmarks, and after
  a while I need an organization that is not easy to achieve on
  devices like the Nook.
}

Tools used {
- written in vim on OpenWrt 12.09
  on an Asus 500 v2.
  Normally I would have used Windows Notepad through WinSCP,
  but training vim skills is useful, not only to appreciate
  vim but to use it better in case of need, and maybe
  to consider it as the new main lightweight plain text / code editor.
- for a more efficient version I may use awk directly, because in busybox
  or bash I coordinate optimized tools, but when I mostly use one optimized
  tool I may as well use only that instead of an unneeded wrapper
  (a sketch of such a version follows this script).
}

Notes {
- I really have to write a function, and then maintain it, to
  generate arbitrarily long random integers from
  /dev/urandom. It could be that someone did it already,
  and just maintaining this type of stuff will take a lot
  of time (over the years) for a person like me,
  but having /dev/urandom almost everywhere and
  not having a function ready to copy and use
  is annoying. I used a very approximate function
  in the past, but I need to improve it and I do not
  want to use it now (a rough sketch is appended after this script).

  For now I will use srand from awk (thus loading the system
  with a lot of small processes starting and stopping),
  but I have to be careful about the seed to use.
}

documentation
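
Regarding the "use awk directly" note above: a minimal sketch of how the whole random selection could be done in a single awk pass. The file paths, the base url and the number of draws are taken from the script; everything else is my own guess, not the script's actual method.

#sketch: one awk pass instead of one awk call per selected line
awk -v draws=20 -v base_url='http://www.c2.com/cgi/wiki?' '
  { lines[NR] = $0 }                        # keep every page name in memory
  END {
    srand()
    for (i = 0; i < draws; i++) {
      picked = lines[int(NR * rand()) + 1]  # repetitions can still appear
      printf "<p><a href=\"%s%s\">%s</a></p>\n", base_url, picked, picked
    }
  }' /sda3/c2_wiki_links/wiki_links.txt > /sda3/www/pier_pub/c2_wiki_links/random_links_c2wiki.html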
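
And regarding the /dev/urandom note: a rough sketch of such a helper. The function name and interface are my invention; it relies on od (which busybox provides) and accepts a small modulo bias.

#random_int MAX -> prints an integer in the range 1..MAX
#sketch only: the two urandom bytes give 0..65535, so there is a
#noticeable modulo bias when MAX is not far below 65536
random_int() {
  max=${1}
  raw=$( od -An -N2 -tu2 /dev/urandom | tr -d ' ' )
  echo $(( raw % max + 1 ))
}

#example: pick a random line number out of 36631
random_int 36631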
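
Finally, the daily regeneration mentioned in the overview. This is only a sketch: the script file names, their location and the schedule are placeholders I made up; the crontab line would be added with 'crontab -e'.

#one-time: run the collector by hand (slow on purpose because of the rate limiting)
sh /sda3/c2_wiki_links/collect_wards_wiki_links.sh

#crontab entry (placeholder path), so the random links page is rebuilt
#every day at 06:00:
#  0 6 * * * sh /sda3/c2_wiki_links/generate_random_c2_links.sh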