User talk:Gnosygnu/Archives/2017/August

XOWA SQL to HTML

I've been looking for almost a year for an easy way to build an offline, static, bound HTML version of wiki sites, e.g. en.wikipedia. What I'm looking for is the old Wikimedia-style bound HTML directory tree (html-images/, html-a/b/c, etc.). I know it's possible, and so far your SQL-database approach looks best for this purpose.
My intention is to build an HTML directory from a relatively recent version of the complete en.wikipedia and a few other Fandom MediaWiki sites, images included, and then batch-convert the HTML to ePub, from which I can build an epdb file: a library of linked ebooks. I've figured out the HTML-to-ePub side and have it working for other site dumps.
My stumbling block has been getting from the Wikipedia dump files (XML/SQL/etc.) to usable HTML. Any suggestions? Lostinlodos (talk) 14:59, 23 August 2017 (UTC)

Hey, sorry for the late reply. There isn't an easy way to do a mass extraction of the HTML from XOWA's SQL databases. You basically have one of two approaches (rough sketches of both follow below):
  • Use the command-line option to fetch a single page's HTML. For example:
java -jar xowa_windows_64.jar --show_license n --show_args n --app_mode command --cmd_text "app.shell.fetch_page('en.wikipedia.org/wiki/Earth', 'html');"
  • Write your own script to read the page HTML directly out of the SQLite databases.

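For batch extraction with the first approach, a shell loop along these lines could work. This is a sketch, not an official XOWA feature: titles.txt is an assumed input file with one page title per line, and it assumes fetch_page writes the page HTML to stdout, which is worth verifying against your XOWA version first.

mkdir -p out
# titles.txt: one page title per line, e.g. "Earth" (titles containing '/'
# would need extra handling before being used as filenames)
while IFS= read -r title; do
  java -jar xowa_windows_64.jar --show_license n --show_args n \
    --app_mode command \
    --cmd_text "app.shell.fetch_page('en.wikipedia.org/wiki/$title', 'html');" \
    > "out/$title.html"
done < titles.txt

Note that this starts a fresh JVM per page, so it is only practical for modest page lists rather than all of en.wikipedia.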
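For the second approach, the sqlite3 command-line shell can inspect the databases directly. Everything named below is a placeholder, not XOWA's real schema: the database filename and the page_html/page_title/html_text identifiers are made up for illustration. Run .tables and .schema against your .xowa files to learn the actual layout, and note the stored HTML may be compressed rather than plain text.

# Placeholder names throughout; check the real schema with .tables and
# .schema in the sqlite3 shell before adapting this.
sqlite3 "en.wikipedia.org-html.000.xowa" \
  "SELECT page_title, html_text FROM page_html LIMIT 10;"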
Let me know if you need more information on either approach. Thanks! gnosygnu 00:59, 25 August 2017 (UTC)