User:RexxS/GCI-2018-Task07
Lua Task 7 - Date formatting
Prerequisite: Lua Task 6 - MediaWiki libraries. This task requires a lot of research and independent learning and is considerably more difficult than the introductory six tasks. It is not suitable for beginners to programming, although students new to Lua with previous experience in other programming languages should be able to produce acceptable work.
Background
On the English Wikipedia, we find 5 types of allowed date formats:
- "dmy" – e.g. 31 August 2013
- "mdy" – e.g. August 31, 2013
- "iso"-style – e.g. 2013-08-27
- year – e.g. 2013
- month and year – e.g. August 2013
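As a rough illustration of how the five formats differ, the sketch below assembles each one from numeric day, month and year values. The helper name `format_date` and the style key "my" (for month and year) are assumptions made for this example; "dmy", "mdy", "iso" and "year" follow the names used on this page.

```lua
-- Full English month names, indexed 1..12.
local MONTHS = {
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
}

-- format_date is a hypothetical helper. day and month are only used by the
-- styles that need them. The style key "my" is an assumption of this sketch.
local function format_date(day, month, year, style)
    if style == "dmy" then
        return string.format("%d %s %d", day, MONTHS[month], year)
    elseif style == "mdy" then
        return string.format("%s %d, %d", MONTHS[month], day, year)
    elseif style == "iso" then
        return string.format("%04d-%02d-%02d", year, month, day)
    elseif style == "year" then
        return string.format("%d", year)
    elseif style == "my" then
        return string.format("%s %d", MONTHS[month], year)
    end
    return nil -- unknown style
end

print(format_date(31, 8, 2013, "dmy"))  --> 31 August 2013
print(format_date(31, 8, 2013, "mdy"))  --> August 31, 2013
print(format_date(27, 8, 2013, "iso"))  --> 2013-08-27
```

The sketch does not validate its inputs or handle era suffixes such as BC or CE; those would need separate treatment.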
Text that is imported from outside of Wikipedia may be in any of a large number of formats, or sometimes malformed but still understandable. There is often a need to take a piece of text and extract a date from it, displaying it in the required format.
Examine the table below:
Text | Format | Date |
---|---|---|
31 august 2013 | | 31 August 2013 |
31 August 2013 | mdy | August 31, 2013 |
August 27, 2013 | iso | 2013-08-27 |
31 August 2013 (uncertain) | year | circa 2013 |
31 August 2013 | iso | 2013-08-31 |
29 February 2004 (uncertain) | mdy | circa February 29, 2004 |
29 February 2005 (uncertain) | mdy | Invalid entry |
27/08/2013 | | 2013-08-27 |
2013-08-27 | mdy | August 27, 2013 |
2013 (uncertain) | | circa 2013 |
27 | | 27 |
27 December | | 27 December |
27 2017 | | 2017 |
sometime around 27th August 2013 | | circa 27 August 2013 |
on the 16th of December in the year of our Lord 1770 | | 16 December 1770 |
99 red balloons | | 99 |
20/04/2013 | mdy | April 20, 2013 |
sometime around 3rd August 2013 | | circa 3 August 2013 |
31 August 103 AD | | 31 August 103 AD |
31 August 2013 BC | | 31 August 2013 BC |
31 August 2013 BCE | | 31 August 2013 BCE |
31 August 103 CE | | 31 August 103 CE |
2013-08-31 | | 2013-08-31 |
31 August 213 | | 31 August 213 |
213 | | 213 |
31 August 13 | | 31 August 13 |
31 August 13 BC | | 31 August 13 BC |
30 BCE | | 30 BCE |
3 may 2017 | | 3 May 2017 |
3 Jan 2017 | | 3 January 2017 |
3 jan 9 AD | | 3 January 9 AD |
31 February 2013 | mdy | Invalid entry |
the quick brown fox | | Invalid entry |
4 and 20 blackbirds ... | | Invalid entry |
Notes:
- The first column has examples of text that may contain dates.
- The second column contains the requested formats (from a parameter like |format=mdy). If there is no format parameter given, then the date output attempts to match the format of the text supplied.
- The third column shows the expected output. Certain words, e.g. "around", "uncertain", indicate that the date is approximate; we normally add "circa" before approximate dates.
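One possible way to "match the format of the text supplied" is to guess the style from the shape of the input. The sketch below is only a starting point: `guess_style` is a hypothetical helper, it covers just the dmy, mdy and ISO shapes, and it looks at the arrangement of words and numbers without checking that the word is really a month name.

```lua
-- guess_style is a hypothetical helper. It returns "iso", "mdy" or "dmy"
-- based only on the shape of the text, or nil if no shape is recognised.
local function guess_style(text)
    if text:match("%d%d%d%d%-%d%d%-%d%d") then
        return "iso"                       -- e.g. 2013-08-27
    elseif text:match("%a+%s+%d+,?%s+%d+") then
        return "mdy"                       -- word, number, optional comma, number
    elseif text:match("%d+%a*%s+%a+%s+%d+") then
        return "dmy"                       -- number (with optional "th"), word, number
    end
    return nil
end

print(guess_style("August 27, 2013"))                   --> mdy
print(guess_style("sometime around 27th August 2013"))  --> dmy
print(guess_style("2013-08-27"))                        --> iso
```

Because the word is not checked, text such as "4 and 20 blackbirds" also fits the dmy shape, so the guess needs to be combined with a month-name lookup and validation before it can be trusted.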
Requirements
This task requires you to create your own function which can take text such as may be found in the first column and an optional format parameter. It will output a date either in the requested format or in a format matching that of the text supplied. You should test your function against all of the text shown in the table above, at least.
To complete this task you will need to make use of the techniques you learned in the first six tasks, as well as doing further research on string-handling functions and patterns, and possibly making use of other libraries.
You must work in a fresh module sandbox and user sandbox. If I were doing the task, I would use Module:Sandbox/RexxS/Dates and User:RexxS/Sandbox/Dates.
Meeting the requirement to add "circa" may prove to be difficult, so get a function working without it to start with, and consider adding it later. A working function without "circa" is the minimum needed to complete the task, but please try to accomplish the entire task if time permits.
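A minimal sketch of one way to handle "circa" is to test the text for marker words separately from the date extraction, then prefix the finished date. The helper names `is_approximate` and `add_circa` are hypothetical, and the word list contains only the two examples mentioned in the notes above; a real list would probably need more entries.

```lua
-- Words that mark a date as approximate. Only "around" and "uncertain" come
-- from this page; any further entries are the implementer's choice.
local APPROX_WORDS = { "around", "uncertain" }

-- is_approximate is a hypothetical helper: true if any marker word appears
-- as a whole word in the text, ignoring case.
local function is_approximate(text)
    local lowered = text:lower()
    for _, word in ipairs(APPROX_WORDS) do
        -- %f[%a] and %f[%A] are frontier patterns. They anchor the match at
        -- word boundaries, so "around" does not match inside "turnaround".
        if lowered:find("%f[%a]" .. word .. "%f[%A]") then
            return true
        end
    end
    return false
end

-- add_circa is a hypothetical helper: prefix an already formatted date.
local function add_circa(date, text)
    if is_approximate(text) then
        return "circa " .. date
    end
    return date
end

print(add_circa("2013", "2013 (uncertain)"))  --> circa 2013
print(add_circa("2013", "2013"))              --> 2013
```

Keeping the check separate means the date extraction can be written and tested first, with the prefix added afterwards. An invalid date should be reported as invalid before this step, so that "circa" is never attached to an error message.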
Hints and tips
- A. Plan your work in two parts: (i) extracting day, month, year and format from the text; and (ii) validating and re-assembling those four values into the required format.
- B. You already know how to extract a simple dmy or mdy date from Task 5. Work out how to match an ISO-style date. Try these first, and only look at more complicated routines if they don't return a date.
- C. One technique is to have a list of month names and to look for those and their three-letter abbreviations.
- D. You may assume that if you can match three numbers, or two numbers and a month name, in the text, then there is probably a date. Numbers greater than 31 can only be years; numbers greater than 12 could be a day or a year; smaller numbers could be any of day, month or year.
- E. You may wish to use the os.date function to validate dates or you may prefer to write your own routines.
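Hints B, C and E could be sketched in plain Lua along the following lines. The helper names (`match_iso`, `find_month`, `is_valid_date`) are hypothetical, and the validation takes the "write your own routines" route rather than using os.date. It also assumes the Gregorian leap-year rule applies to every year, which is a simplification for early dates.

```lua
local MONTHS = {
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
}

-- Map each full name and its three-letter abbreviation to a month number.
local MONTH_LOOKUP = {}
for number, name in ipairs(MONTHS) do
    MONTH_LOOKUP[name] = number
    MONTH_LOOKUP[name:sub(1, 3)] = number
end

-- Hint B: return day, month, year for an ISO-style date, or nil if none.
local function match_iso(text)
    local y, m, d = text:match("(%d%d%d%d)%-(%d%d)%-(%d%d)")
    if y then
        return tonumber(d), tonumber(m), tonumber(y)
    end
    return nil
end

-- Hint C: return the number of the first month name or three-letter
-- abbreviation found in the text, or nil if there is none.
local function find_month(text)
    for word in text:lower():gmatch("%a+") do
        if MONTH_LOOKUP[word] then
            return MONTH_LOOKUP[word]
        end
    end
    return nil
end

local DAYS_IN_MONTH = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }

-- Assumption: the Gregorian rule is applied to all years.
local function is_leap_year(year)
    return year % 4 == 0 and (year % 100 ~= 0 or year % 400 == 0)
end

-- Hint E: check that the day exists in the given month and year.
local function is_valid_date(day, month, year)
    if month < 1 or month > 12 or day < 1 then
        return false
    end
    local limit = DAYS_IN_MONTH[month]
    if month == 2 and is_leap_year(year) then
        limit = 29
    end
    return day <= limit
end

print(match_iso("2013-08-27"))        --> 27   8   2013
print(find_month("3 jan 9 AD"))       --> 1
print(is_valid_date(29, 2, 2004))     --> true
print(is_valid_date(29, 2, 2005))     --> false
```

Note that `find_month` treats any occurrence of "may" as the month of May, so text where it is used as a verb would give a false match; checking that a number sits next to the word is one way to reduce that risk.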