User:Chicagohil/sandbox

Call Details

Date: 2023-03-21
Topic: LD4P3 (Linked data for production: closing the loop) team presenting on their integration of Wikidata information around musical works into a Cornell library catalog prototype with a focus on the Wikidata aspects of work
Presenters: Huda Khan, Cornell University; Steven Folsom, Cornell University; Kevin Kishimoto, Stanford University; Astrid Usong, Stanford University

Presentation Materials

Notes

Wizard of Oz example as theme! Is a journey taken with brain, heart & nerve

Background

LD4P3: Closing the loop: aims to create a working model of a complete cycle for library metadata creation, sharing & reuse.
Discovery:
- Using linked data to support & enhance discovery
- Work on integrating into production
- BAM-WOW: Trying to leverage work of MLA Linked Data Working Group
- Capturing thematic catalog identifiers in Wikidata: information not usually found in catalogs.

Music Library Association Linked Data Working Group (LDWG):

Emphasis on exploring & experimenting with linked data
15 members, mostly MLA music catalogers
Goals include
- build practical skills
- learn & understand concepts & theory
- build connections
Initially focused on BIBFRAME but then branched out into Wikidata
Projects focused on individual or institutional interest

One project: thematic catalog number concordance in Wikidata

A thematic catalog is a music reference book that aims to be a comprehensive list of a composer’s works, like catalogues raisonnés in art
Each work's entry includes other info, such as historical information and musical characteristics
Most important: thematic catalog authors often assign a number (identifier) to each work
Different authors assign numbers that do not line up
Antonio Vivaldi
- wrote more than 800 compositions, most of them instrumental (500 concertos titled “concerto” and 100 sonatas titled “sonata”)
- Different catalogs for Vivaldi have different numbering depending on who published
Example of one work: Vivaldi, Antonio, $d 1678-1741. $t Estro armonico. $n N. 6 (uniform title). Known by many designations! (op. 3, no. 6 ; RV 356, etc.)
Title pages for the same work show differing numbering systems, a frustration of many music catalogers!
Needed a way to look these numbers up easily as structured data, so they can be reused

Data Harvesting & conversion to Wikidata

Found list of Vivaldi works by RV on Internet: copy & paste into spreadsheet #1
Searched id.loc.gov for works by Vivaldi and exported results (Atom to CSV)
Batch searched LCCNs in OCLC Connexion & exported authority records
Imported batch NAR file into MarcEdit & converted fields/subfields into a spreadsheet (spreadsheet #2)
Queried Wikidata to find what items existed for Vivaldi works and exported to a spreadsheet (spreadsheet #3)
Imported the three spreadsheets into OpenRefine to join data on various match points
Humans verified/corrected data in final spreadsheet
Created/edited Wikidata items using OpenRefine Wikidata extension (most of the works didn’t have items)
Wikidata item for Violin concerto in A minor (RV 356): https://www.wikidata.org/wiki/Q116050394
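
The join step above was done in OpenRefine; as a rough illustration only, the same idea can be sketched in Python. The column names (`rv`, `variant_title`, `catalog_code`, `lccn`, `qid`), the LCCN `n00000000`, the titles marked "made-up", and `RV 9999` are all hypothetical placeholders, and the RV number is assumed to be the match point.

```python
import re

def normalize_rv(value):
    """Reduce strings such as 'RV 356', 'R.V.356' or 'rv356' to 'RV 356'."""
    match = re.search(r"\br\.?\s*v\.?\s*(\d+[a-z]?)", value, flags=re.IGNORECASE)
    return f"RV {match.group(1)}" if match else None

def index_by_rv(rows, column):
    """Index rows by normalized RV number, skipping rows without one."""
    index = {}
    for row in rows:
        key = normalize_rv(row[column])
        if key is not None:
            index[key] = row
    return index

def join_on_rv(rv_list, lc_rows, wd_rows):
    """Left-join LC authority rows and Wikidata rows onto the RV list."""
    lc_by_rv = index_by_rv(lc_rows, "variant_title")
    wd_by_rv = index_by_rv(wd_rows, "catalog_code")
    joined = []
    for row in rv_list:
        key = normalize_rv(row["rv"])
        qid = wd_by_rv.get(key, {}).get("qid")
        joined.append({
            "rv": key,
            "title": row["title"],
            "lccn": lc_by_rv.get(key, {}).get("lccn"),
            "qid": qid,
            "needs_new_item": qid is None,  # candidates for item creation
        })
    return joined

# Spreadsheet #1 (RV list), #2 (authority records), #3 (Wikidata export).
rv_list = [
    {"rv": "RV 356", "title": "Violin concerto in A minor"},
    {"rv": "RV 9999", "title": "made-up work with no matches"},
]
lc_rows = [{"lccn": "n00000000", "variant_title": "made-up heading, R.V.356"}]
wd_rows = [{"qid": "Q116050394", "catalog_code": "rv356"}]

rows = join_on_rv(rv_list, lc_rows, wd_rows)
```

Rows flagged `needs_new_item` correspond to works that would still need a Wikidata item after human review.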
Started working from spreadsheet from Kevin
Screenshots from catalog grabbing some information from Wikidata and supplementing the catalog
- Included works have work info buttons next to them that open a knowledge panel with some data; users can click through to the author/work browse page
- How does this work?
  - Solr index that sits behind the catalog has a field where you can obtain the author title headings associated with an item.
  - Then query id.loc.gov with LCCN to find Wikidata entity
  - Wikidata work items have catalog code numbers and the catalog (edition) each number comes from
  - Path to choose number: LCCN → Qnumber → P528 (catalog code) → ps:P528 catalog code value (414)
  - Path to choose label: LCCN → Qnumber → P972 (catalog) → Label
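
As a minimal sketch of the last two hops in that path, the function below pulls catalog codes and their catalog qualifiers out of a Wikidata entity in the standard entity JSON layout. It is not the prototype's code (which the notes describe as client-side); the sample entity is hand-built, and `Q999999991` is a placeholder catalog item, not a real one.

```python
def entity_data_url(qid):
    """URL that serves an item's JSON, e.g. for a call made at page render time."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

def catalog_codes(entity):
    """Return (code, catalog QID or None) for each P528 statement on an entity."""
    pairs = []
    for statement in entity.get("claims", {}).get("P528", []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "unknown value" / "no value" statements
        code = snak["datavalue"]["value"]
        catalogs = [
            qualifier["datavalue"]["value"]["id"]
            for qualifier in statement.get("qualifiers", {}).get("P972", [])
            if qualifier.get("snaktype") == "value"
        ]
        pairs.append((code, catalogs[0] if catalogs else None))
    return pairs

# Hand-built sample in the shape of Wikidata entity JSON (placeholder catalog QID).
sample_entity = {
    "claims": {
        "P528": [
            {
                "mainsnak": {"snaktype": "value", "property": "P528",
                             "datavalue": {"value": "RV 356", "type": "string"}},
                "qualifiers": {
                    "P972": [{"snaktype": "value", "property": "P972",
                              "datavalue": {"value": {"id": "Q999999991"},
                                            "type": "wikibase-entityid"}}]
                },
            },
            {
                "mainsnak": {"snaktype": "value", "property": "P528",
                             "datavalue": {"value": "op. 3, no. 6", "type": "string"}},
            },
        ]
    }
}

codes = catalog_codes(sample_entity)
```

Displaying the catalog's name would take one more lookup of the label for the returned catalog QID.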

Usability testing: Process

Goals:
- Design for incorporating information
- Usefulness of properties
Participants: 5 total (3 grad students, 2 staff members)
Timeline: December 2022
Think alouds with feedback questions
Virtual
Given tasks to find specific information

Usability testing: Outcomes:

Very well received and much appreciated
Easy to find included work knowledge panels
Wanted identifying info & labels higher up

Usability testing: Wikidata properties

Which would you find interesting?
- Catalog numbers (all)
- Instrumentation (three)
- Librettist (three)
- Tonality (two)
- Opus (one)
Caveat: this was not a survey and relied on participants’ memories, but it provides starting points for asking more questions

Lessons & questions: Design

Generating use cases: Would be great to display information, but what about search?
- Would require more focus and work: entire design/indexing work cycle on its own
- Generic (catalog numbers very helpful) vs distinctive title (incorporating multiple languages would be useful)
- Typical design: bio panel and then holdings to the right
  - Existing author buttons are driven by presence of an authority record, then they can look for equivalencies in external data
- Did design brainstorming
  - Add expandable section for each included work
  - Inline option?
  - Landed on work info button

Lessons & questions: Models

Prototype makes it look easy, but is jumping across multiple sources of information.
- Library catalog item
- LC item
- Wikidata item
There’s often not a one-to-one correspondence while jumping to multiple data sources: how to deal with discrepancies
Tag your WEMI levels: Follow the yellow brick road…To Where?

Lessons & questions: Data

Data connections are like yellow brick road
“Selections” in music uniform title is often a catch-all: a bucket that doesn’t always map to LC heading.
Goal: Find points of connections
Somewhere over the rainbow…

BAMWOW into production: models

Everything seen here is in prototype and the next step is to bring it in production
Is there ever a need for a work info button for the main entry or is inline integration preferable?
If in-line is preferred, should we commit to sorting fields into a WEMI-like order?
What will happen when expanding to non-music works (such as the Wizard of Oz uniform title example)?

BAMWOW into production: Data Quality

Catalogs are built in ways to disallow connections
Wikidata has qualifiers on many statements that we may want to take advantage of
Trying to figure out how to drop questionable statements if there are constant violations

Questions

Is it accessing Wikidata dynamically?
- Yes, it is accessing Wikidata dynamically. No Wikidata data is stored in Cornell’s catalog; when a page is rendered, there’s a call to Wikidata and the data is brought into the page
Have you determined the core set of statements for sheet music, in this case, Vivaldi? Every work item should/must have? Have you considered when you have the physical sheet in the collection to add a statement to highlight Cornell library’s collection? Or is this not a focus of the project?
- We’re not focusing on published manifestations, but rather the intellectual works themselves
Have you run up against inconsistently used qualifiers or properties in Wikidata? Did that create challenges for querying?
- Yes! Some of the inconsistencies we ‘correct’ and others we just leave and add our own data. Often the inconsistencies are misinterpretations of the property constraints
- Regarding the catalog numbers, the most inconsistent thing is which prefix is used. Our group is trying to use a language-agnostic form: take the number that is in the book (in most cases)
Is the question of useful information solely based on the question of searching/identifying for the work? I find all of the information useful, though not always necessary for searching
- Regarding Wikidata properties that were chosen, we basically had a group discussion in one of our LDWG meetings: “Which Wikidata properties would be cool to add to a MARC-based catalog?” These choices were based on properties we were using in our own project
- Don’t intend to be any sort of authority on topic, but are what they came up with
Can you show again how your catalog displayed the Wikidata info retrieved by the query?
How was the info button inserted in your catalog? Is it on the discovery layer only?
- It’s client-side code in the prototype, so yes. It checks for information in the Solr index and then matches with the included works list and places it appropriately
Did the question arise of what people want a library catalog to do? Some of these examples suggest to me that a kind of mini-Wikipedia page would be more useful. (And is my question constrained by the relatively primitive nature of current library catalogs?)
- A lot of what we focused on is identifying/disambiguation use cases, but there are some properties that lean more toward a broader context for the work
Some of these attributes are also in MARC authority records. Are you just ignoring them if they are there? What if they are there and not in Wikidata?
- Prototype just displays everything right now, but we considered similar questions for Discogs. In that case, the catalog actually checks for equivalent fields that are already populated, and only shows supplemental Discogs info when those fields are not populated
- There’s also Syndetics info in the Cornell catalog, it’s interesting how much smarter we can be with the Discogs vs. that
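
The "only show supplemental info when the equivalent field is empty" rule described for Discogs can be sketched as follows. This is an illustration under assumptions, not Cornell's implementation: the field names and the `equivalents` mapping are hypothetical.

```python
def supplemental_fields(catalog_record, external_record, equivalents):
    """Keep external values only where the equivalent catalog field is empty.

    equivalents maps an external field name to its catalog counterpart.
    """
    shown = {}
    for external_field, catalog_field in equivalents.items():
        value = external_record.get(external_field)
        if value and not catalog_record.get(catalog_field):
            shown[external_field] = value
    return shown

# Hypothetical field names, for illustration only.
equivalents = {"tracklist": "contents_note", "genre": "genre_heading"}
catalog_record = {"contents_note": "", "genre_heading": "Concertos"}
external_record = {"tracklist": ["Track 1", "Track 2"], "genre": "Classical"}

shown = supplemental_fields(catalog_record, external_record, equivalents)
```

Here only `tracklist` is shown, because the catalog already has a genre heading.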
Would love to know how you got Discogs in there!
- Documentation!!

Call details

Date: 2021-02-23
Topic: Adding Bibliographic Data to Wikidata
Presenters: Jason Evans, Wikimedian in Residence at the National Library of Wales

Presentation Materials

Library data as linked open data (Article)
Slides
Some queries used for visualisations -
- https://w.wiki/4u6 - subject by genre (Peniarth MSS collection)
- https://w.wiki/4u7 - scribes connections (Peniarth MSS collection)
- http://tinyurl.com/y74vkfuw - University of Wales press books on Wikidata
- https://w.wiki/FbG - Publisher works count

Notes

Bibliographic/book data into Wikidata
- At National Library of Wales for ~6 years
- Started sharing data for artworks, artists, etc.
- In 2019 had an opportunity to share catalog data
- Trial on a sizeable scale to determine what is possible
- Has about 100,000 items on Wikidata for items about or connected to the collection
- 50,000 items are about books in the Welsh bibliography
Example of book in bib in library catalog (MARC-21), fields and text strings
- Subjects and authors can link to authority files, but no guarantee it’s a unique item
- A lot of publishers (for example) are text strings, which need to be mapped to Wikidata “things”
- In Wikidata text strings => items (places, publishers, language, etc.) (example: Cardiff)
- Three editions in a library catalog are three records, no guarantee they’re presented the same way
- In Wikidata there is a central literary work item; editions and translations are separate items that all connect to it, giving more connectivity and structure than MARC-21 catalog records
Mapping Book Data
- Chose easiest data, 50,000 most popular authors
- Exported to CSV
- Needed to disambiguate authors and publishers
- Used OpenRefine to match authors to existing authors in Wikidata; there are multiple ways to do this (names, titles, dates, etc. are all possible matching points)
- VIAF etc. also allowed them to create items for authors
- Not possible to match for all authors, some items still have “author string”
- Then created Wikidata item for Works
- They then need to be connected to Editions/Translations/Etc.
- Did with a combo of uploading directly and QuickStatements to add additional data
- Lot of help from Simon Cobb, visiting Wikidata scholar
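
To make the work/edition step concrete, here is a minimal sketch that writes QuickStatements (V1, tab-separated) commands for a new edition item linked to an existing work item. The notes do not say which properties were used; the use of P31 with Q3331189 (version, edition or translation) and P629 (edition or translation of) is an assumption about the modeling, and `Q999999992` is a placeholder work item.

```python
def edition_commands(label, work_qid, lang="en"):
    """Build QuickStatements V1 lines that create an edition item for a work."""
    safe_label = label.replace('"', "'")  # keep the quoted string well-formed
    return "\n".join([
        "CREATE",
        f'LAST\tL{lang}\t"{safe_label}"',   # label in the given language
        "LAST\tP31\tQ3331189",              # instance of: version, edition or translation
        f"LAST\tP629\t{work_qid}",          # edition or translation of: the work item
    ])

batch = edition_commands("Example edition title", "Q999999992", lang="cy")
```

Each edition row in a spreadsheet would produce one such block, which can then be pasted into QuickStatements after review.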
Challenges
- Finding information on authors and publishers
- Looking at national bibliography of Wales (lots of modern books, earlier books don’t have unique identifiers, obscure publishers, etc.)
- Finding good information challenging, focused on 100 most common publishers
- Potential copyright and license issues with catalog data
- Some catalogers nervous about re-using data purchased from OCLC or other sources
- Third party data may have copyright issues (gray area)
- There may also be licensing issues
- Scale of project, Welsh bibliography has 1,000,000+ works
- Data maintenance? How do we automate catalog updates => Wikidata
- If we want to round trip that data how do we do it, and how do we monitor quality of data added in Wikidata
Benefits
- Having richer, more accurate, structured data
- Easily accessible and reusable data
- People can interact and explore a huge collection
- Connecting with and building a larger dataset
- Can crowdsource improvements to the data from the community
- Working towards Wikicite and sharing data
Some quite lovely diagrams and visualizations about publishers and publishing can be created
- Can begin to explore relationships between authors and items
- And authors/items/the world/history of publishing in Wales
Identifiers
- Can connect identifiers from different datasets
- One of the most powerful/useful things you can do, especially when advocating sharing data
- Useful for round-tripping, pulling identifiers back into own catalog
- The more institutions that match data to external datasets the more we can share and enrich catalogs
- Already seeing data being enriched in this way
- People adding identifiers
Wikidata is multi-lingual
- Can be very powerful for people working in a country with more than one language
- Encourages reuse of data
Potential future projects
- Recently created rich metadata for ~1000 manuscripts in a separate project
- Added subject and genre, which can allow visualization of manuscripts organized by subject and genre
- Shows how you could build very powerful search and discovery tools by linking to entities for particular genres and subjects
- A lot of books shared on Wikidata will be digitized and OCR’ed
- Once you have OCR data you can use AI to determine things like subject and genre
- Can also pull out entities from text (names, places, events, etc.)
- National Library of the Netherlands has done this for their newspapers
- Text can then be tagged and connected to items and external identifiers
- Use of IIIF can allow you to overlay information onto text
- Structured data can transform how libraries look at data

Questions

Saving time in MARC to Wikidata workflows
- People want a programmatic way to do this, but creating mapping for authors without unique identifiers or works can be difficult. Make the initial cataloging as clean as possible (example: adding ISNI identifiers)
Is modelling the manuscript extensively (slide) labor intensive?
- Were able to semi-automatically take out names and match them to people, many already in Wikidata. Fairly labor intensive but there may be ways to automate much of the work to a good degree of accuracy. Would be tricky for a giant collection of books.
Any plans to apply process to materials beyond books?
- Always trying new datasets, discussing musical scores. Would love to do sound recordings or video, but you can’t share the actual recordings (likely to have copyright issues) which takes away some benefits of sharing data.
Any advice on thesis and subject headings in Wikidata?
- University of Edinburgh has shared a thesis collection (Ewan McAndrew)
- Adam: Looking at converting ETDs into Wikidata, both proposing new subjects to Library of Congress, and also creating Wikidata items (often already items, but creating when needed). Trying to figure out if LCSH headings can be mapped, especially free-floating subdivisions (use main part, use entire field). If Wikidata URIs can go in MARC fields, that would document the exact Wikidata item.
- Library of Congress has done mapping work, some have links some don’t. So many subject headings don’t have items that could be created.
- National Library of Wales has volunteers tagging photographs, would be cleaner and easier with identifiers.
- When do you switch to Wikibase for things you can’t describe with Wikidata? Is this all scalable? Think about what you’re trying to achieve, are Wikibase or Wikidata the best option?
Advice for mapping books not in English?
- Tried to make sure the language was there, and labels were correct (English versus Welsh)
MARC to Wikidata mappings?
- Will do a lot of heavy lifting when it’s done, people are working on it. Universal mapping would be very useful, take care of basic stuff.
Any pushback from Wikidata folks?
- No pushback, asked in several areas if it would be acceptable to add that much information. No pushback or complaints, but uploads will get bigger and bigger and changes may be needed. One of the reasons this was done was to advocate for structured data generally, and Wikidata can be used to take a sample and show what could be possible in the future at a larger scale. That doesn’t mean everything goes in Wikidata, but it’s a fantastic showcase.
Are some of the visualizations online?
- Shared slides in the agenda, a couple may be on Commons
Some charts could be interpreted as music
- Did a hackathon where someone turned data into music
Has anyone considered putting preferred terms (over problematic subject headings) in Wikidata?
- Not something Jason has had to deal with on Wikidata
- Jim: would be interested in collaborating on how to open up these preferred labels. It seems the lists for preferred labels are closed or internally managed currently.

Working on Alison Turnbull:

She studied from 1975-1977 at the Academia Arjona in Madrid, from 1977-1978 at the West Surrey College of Art and Design, and from 1978-1981 at the Bath Academy of Art in Corsham.^[1]

"Ten Dollar Faces: On Photographic Portraiture and Paper Money in the 1860s". History of Photography. 45 (1). 2021. doi:10.1080/03087298.2021.1989904. ISSN 0308-7298. Wikidata Q112147132.


^ Jeffrey T. Williams; Kent E. Carpenter; James L. Van Tassell; Paul Hoetjes; Wes Toller; Peter Etnoyer; Michael Smith (21 May 2010). "Biodiversity Assessment of the Fishes of Saba Bank Atoll, Netherlands Antilles". PLOS One. 5 (5). Bibcode:2010PLoSO...510676W. doi:10.1371/JOURNAL.PONE.0010676. ISSN 1932-6203. PMC 2873961. PMID 20505760. Wikidata Q15625490.

