User:Spinster/A Wikimedians' guide to good collection websites

This is an essay.

It contains the advice or opinions of one or more Wikipedia contributors. This page is not an encyclopedia article, nor is it one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community. Some essays represent widespread norms; others only represent minority viewpoints.

Wikimedia volunteers^[1] - or Wikimedians - are expert users of collection websites^[2] of GLAMs (Galleries, Libraries, Archives, Museums). We intensively use collection websites to find and verify information about art, history and culture. We often link to collection websites, in all relevant Wikimedia projects, most importantly in Wikipedia, but also in Wikimedia Commons, Wikidata, Wikisource and more.

Wikimedians tend to use information from collection websites on a larger scale, too, for instance when mass uploading media to Wikimedia Commons or when adding whole datasets to Wikidata.

Over the years, we have found that we really like it if collection websites have the following characteristics. These will usually help make a collection website more findable, discoverable and re-usable by humans and machines - and, as a pleasant side effect, probably more in line with your country's (upcoming) open data legislation, too.

What to include in a collection website?

Publish as much as you can.

Even better: publish everything. As soon as possible.

Collection databases are messy and many of your objects have not yet been approved by your curators. This is the case at every GLAM - it happens to the best, including MoMA. Research has shown that end users - especially researchers - do want to see everything.^[3] Give us everything. Do you have records that have not been checked yet? Show them as soon as you can, and simply tell us that they still need to be looked at. Who knows, a Wikimedian might pass by, do some of that checking for you, and inform you about it!

The Museum of Modern Art publishes all its collection data on GitHub. This includes many records that have not been checked by curators yet. In a short article on medium.com, MoMA's Director of Digital Content & Strategy explains why and how this decision was made.
What do Wikimedians do with this information? A part of the MoMA collection has also been uploaded to Wikidata (mostly curator-checked paintings and installations) and interconnected with the rest of the information there, and MoMA has received its own Wikidata properties - MoMA artist ID and MoMA artwork ID.

How to design a collection website?

Keep it simple.

Oh yes, Wikimedians also love and admire great visualisations, interactive features of websites, and beautiful design. We don't want culture to appear in a boring environment!

But we appreciate (and use!) it most when all information on a collection website

can be found by us via search engines (so it's not hidden in the deep web)
is available immediately - we may for instance fail to find your permalinks if they're two clicks behind the image we see first...
is written in open web formats and not obfuscated by JavaScript or unnecessary frivolities that make it harder for humans or bots to process your information. Yes, we are looking at you accusingly, infinite scrolling!

Metadata is good. Metadata is not dirty.

Images of artworks deserve to be admired, preferably full screen. But the information behind them is extremely valuable too. Don't hide your metadata on a second screen. We want to see your credit lines, attribution information, your inventory numbers and acquisition history. Immediately. It helps us write better articles and describe your collections better on our side.

Museum Rotterdam does this right, in a simple but very effective manner: they have lovely images on their collection website, and the metadata sits immediately below each image - not hidden in another tab or on a next page. No problem with that at all!

Cooper-Hewitt even makes metadata fun, by turning that 'boring' metadata into stories. With emoji! Without dumbing it down! How cool is that! 👍 😀

Much better than that extra click or tab. 😉

Give us indexes.

When visiting a collection website, we often don't know very well what you hold. What can we look for? Do you have a great photo collection of your town in the 1940s? Funky surrealist sculptures? Interesting Roman coins? Highlights may help us a bit, but maybe - as we are curious Wikipedians - we are looking for that one interesting, little-known item in your collection. There's nothing wrong with good old indexes - those boring, alphabetical lists of things, people, places and times that are relevant to your collection. They give us the overview we want! And a good designer can make them look nice.

Example welcome!

Let us search and filter in crazy and boring ways.

Wikimedians are often looking for very specific things. They are made happy with advanced search options, with faceted search and the option to go berserk with filters. We often like to filter content on quite strange criteria (copyright of images? year of acquisition? Hell yeah!) and we are very happy if this feature is not dumbed down for us!
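To make the wish concrete, here is a minimal sketch of what we mean by faceted search: count how many records carry each value of a field, and let users combine as many filters as they like. The sample records and the field names (type, acquired, license) are invented for illustration and are not taken from any real collection system.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"title": "Harbour view", "type": "photograph", "acquired": 1948, "license": "CC0"},
    {"title": "Roman denarius", "type": "coin", "acquired": 1902, "license": "CC BY-SA"},
    {"title": "Untitled sculpture", "type": "sculpture", "acquired": 1948, "license": "in copyright"},
]


def filter_records(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def facet_counts(records, field):
    """Count how many records carry each value of one field (a facet)."""
    return Counter(r.get(field) for r in records)


# Filter on "strange" criteria, alone or combined.
acquired_1948 = filter_records(RECORDS, acquired=1948)
free_1948 = filter_records(RECORDS, acquired=1948, license="CC0")

# A facet shows which values exist before the user commits to a filter.
license_facet = facet_counts(RECORDS, "license")
```

The point of the sketch is that every field is a possible filter. Nothing is special about the fields chosen here.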

Example welcome!

Lots of great text? Make it available under a free license.

All content on Wikimedia projects is available under free (Creative Commons) licenses. We do this as a community because we want everyone to be able to re-use the information created by us as freely as possible - yes, also for commercial purposes.

We notice that GLAMs often write and publish excellent texts about their collections and their area of expertise. And we often hear that GLAMs wouldn't mind us re-using (part of) those texts on Wikipedia. If your texts are copyrighted, though, we cannot re-use them: doing that would be a copyright violation, because we would transfer your copyrighted text to an environment that is entirely licensed under free Creative Commons licenses.

In order to make it possible for us to re-use your texts, we recommend that you release them under Creative Commons licenses yourself - more specifically licenses that are compatible with Wikimedia projects. That's CC0, public domain, CC-BY and CC-BY-SA.

We advise against the use of the non-commercial clause in Creative Commons licenses. A good explanation of the reasons can be found in this brochure.

Rubensonline is a Flemish portal that describes artworks by Peter Paul Rubens. The data of the portal is released under a CC0 license, so the texts, metadata and images there can be re-used on Wikimedia projects. The texts are very similar to Wikipedia articles and several articles on Dutch Wikipedia about Rubens' works have already been written on the basis of this website. Of course, Rubensonline is explicitly mentioned as the source!

PDF or HTML?

This one should be obvious. If you are able to publish something as HTML rather than as a PDF, please, please do.

Give us permalinks.

All websites eventually become obsolete. And sometimes it's inevitable that - even though it's not good practice - your URLs will change.

In order to be able to deal with that, every collection website nowadays should have permalinks.

Please make them visible immediately, without us having to do an extra click to find/show them. They are not dirty. They are necessary and show that you care.
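One way to picture a permalink is as a stable address that your website looks up and forwards to wherever the record lives today. The sketch below assumes a made-up permalink scheme (/id/ followed by an identifier) and a made-up lookup table; a real site would keep this mapping in its collection system or web server configuration.

```python
# Hypothetical table: stable identifier -> where the record lives today.
# When the site is redesigned, only the right-hand side changes.
CURRENT_LOCATION = {
    "obj-1024": "/collection/2024/objects/harbour-view",
    "obj-2048": "/collection/2024/objects/roman-denarius",
}

PERMALINK_PREFIX = "/id/"  # assumed permalink scheme: /id/<identifier>


def resolve_permalink(path):
    """Return the current location for a permalink path, or None if unknown."""
    if not path.startswith(PERMALINK_PREFIX):
        return None
    identifier = path[len(PERMALINK_PREFIX):]
    return CURRENT_LOCATION.get(identifier)


target = resolve_permalink("/id/obj-1024")
missing = resolve_permalink("/id/does-not-exist")
```

Links from Wikipedia or Wikidata then point at the /id/ address only, so they keep working after the underlying pages move.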

Example welcome!

Give us unique identifiers.

Especially when we add external cultural data to Wikidata, it is very handy and useful if each piece of that data is identified with a unique number or code. In that way, we can easily point the information on Wikidata to exactly the right piece of data in the source website, for instance with a specific property. We are very happy if your website contains those unique identifiers and if they are visible in your web pages. It is not mandatory but extra handy if your unique identifiers are even part of your permalinks!
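Why identifiers inside permalinks are so handy can be shown with a small sketch. Wikidata properties for external identifiers usually come with a formatter URL in which "$1" stands for the identifier. The URL pattern and identifier below are invented for illustration; the two helper functions are ours and not part of any library.

```python
import re

# Hypothetical permalink pattern; "$1" marks where the identifier goes,
# the same placeholder convention Wikidata uses in formatter URLs.
FORMATTER_URL = "https://collection.example.org/id/$1"


def build_permalink(identifier, formatter=FORMATTER_URL):
    """Turn a bare identifier into a full permalink."""
    return formatter.replace("$1", identifier)


def extract_identifier(url, formatter=FORMATTER_URL):
    """Recover the identifier from a permalink, or None if the URL does not fit."""
    prefix, _, suffix = formatter.partition("$1")
    pattern = re.escape(prefix) + r"([^/?#]+)" + re.escape(suffix)
    match = re.fullmatch(pattern, url)
    return match.group(1) if match else None


link = build_permalink("obj-1024")
identifier = extract_identifier(link)
```

With such a pattern, storing only the identifier on Wikidata is enough to link back to the right page, and a list of your URLs can be turned into identifiers without anyone looking them up by hand.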

Example welcome!

How to deal with images?

Provide correct copyright information for all your images, on a per-image basis.

It is safe to include general copyright disclaimers on your collection website, but usually copyright status is different per image. Some images are copyrighted (for instance: images of recent artworks), some may be released under Creative Commons licenses (example: photographs of events or older three-dimensional objects made by your staff), some are public domain (faithful reproductions of two-dimensional artwork that is in the public domain itself).

For end users, and for 'linkers' like us Wikimedians, it is most helpful, precise and correct if every image has separate copyright information.
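As a rough sketch of what per-image information could look like as data: each image record carries its own rights statement, plus who made the image and who made the depicted work. The field names and sample records are assumptions made up for this example, and the set of compatible statuses simply repeats the options named earlier in this essay.

```python
from dataclasses import dataclass
from typing import Optional

# Statuses this essay lists as compatible with Wikimedia projects.
WIKIMEDIA_COMPATIBLE = {"CC0", "public domain", "CC-BY", "CC-BY-SA"}


@dataclass
class ImageRecord:
    identifier: str
    rights: str                            # copyright status of this one image
    image_creator: Optional[str]           # who made the photo or scan; None if unknown
    depicted_work_creator: Optional[str]   # who made the depicted thing; None if unknown


def reusable_on_wikimedia(image):
    """True if this image's own rights statement is Wikimedia-compatible."""
    return image.rights in WIKIMEDIA_COMPATIBLE


# Hypothetical records: three images, three different situations.
IMAGES = [
    ImageRecord("img-1", "public domain", None, "Peter Paul Rubens"),
    ImageRecord("img-2", "CC-BY-SA", "Museum staff photographer", None),
    ImageRecord("img-3", "in copyright", "Museum staff photographer", "A living artist"),
]

reusable = [image.identifier for image in IMAGES if reusable_on_wikimedia(image)]
```

A single site-wide disclaimer cannot express these three cases, which is why a statement per image helps so much.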

Free licenses and public domain

Do you have many images (like photographs) made by your own staff? Make them available under Wikimedia-compatible free licenses.

https://commons.wikimedia.org/wiki/Commons:Choosing_a_license

Do you have many images of public domain two-dimensional artworks? Make these available as public domain as well.

Highest resolution

Argumentation for the highest resolution. Example: Rijksmuseum.

Comment: With very high resolution you can go very close to a painting and see a lot of details.

Tell us who made the image itself and (if relevant) who made the thing that is depicted in the image.

Sometimes you don't know. That's fine too - just say so on your website. No shame in that. Much better than no info about authorship of any image at all.

Data export options

To API or not to API?

For developers/programmers, it is very handy and useful if your website provides an API. (Any tips on how a good API should be designed?)

But for regular end users, such as Wikipedians and volunteers on Wikidata, such an API is not a must. Actually, we find it very helpful if the data from your website is made available in a very simple way: as comma- or tab-separated text files (CSV, TSV). MoMA, for instance, does this on GitHub - easy to download, makes us very happy. The result is that we now have a considerable portion of the MoMA collection on Wikidata, too.
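To show how little a volunteer needs in order to work with such a file, here is a minimal sketch using only Python's standard library. The column names (object_id, title, artist, curator_approved) and the sample rows are invented for illustration and do not reflect the layout of any real export.

```python
import csv
import io

# Hypothetical export; a real file would be downloaded and opened instead.
SAMPLE_CSV = """object_id,title,artist,curator_approved
1,Harbour view,Jane Doe,Y
2,Untitled,,N
3,Roman denarius,Unknown,Y
"""


def load_records(text, delimiter=","):
    """Parse delimited text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


records = load_records(SAMPLE_CSV)

# Unchecked records stay in the data; users decide what to do with them.
approved = [row for row in records if row["curator_approved"] == "Y"]
unchecked = [row for row in records if row["curator_approved"] != "Y"]
```

A tab-separated file works the same way with load_records(text, delimiter="\t").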

CC0

CC0 for data

Footnotes

[1] Wikimedia volunteers are people who spend some of their free time editing Wikimedia projects. The most well-known Wikimedia project is Wikipedia, but many people also edit its sister projects, such as Wikimedia Commons, Wikidata, Wikisource, Wiktionary and many more.
[2] In the context of this essay, the term collection website means any structured website, maintained by one or more cultural institutions, that showcases cultural content. It can be a collection website of a museum, an online catalogue of a library, a website to search and discover an archive's holdings, a portal that brings together various cultural collections, an online database of buildings or monuments, a knowledge base of information about culture, and much more.
[3] "Discovering Physical Objects: Meeting Researchers' Needs" (PDF). Research Information Network. 2008-10-01. "What researchers need above all is online access to the records in museum and collection databases to be provided as quickly as possible, whatever the perceived imperfections or gaps in the records."
