Commons:WMF support for Commons/Commons community calls/Discussion 3 - Content organization
The main question explored in these calls was whether the Wikimedia Foundation should continue to develop Structured Data on Commons (SDC) or make the category system multilingual and easier to use. In both calls, there was broad agreement that we shouldn’t maintain two essentially separate systems. There was a preference for developing structured data, tempered by an awareness that the community has been using categories for so long that any switch would require significant technical and social commitment.
Categories have played an important role in maintenance workflows and also hold information that differs from what is added through structured data. One person speculated that we might end up with a smaller number of larger categories as we move more information into structured data. Others thought that the solution could be for categories to use structured data, since the major categories are already connected to Wikidata through sitelinks. However, most people were more focused on the outcome than the technical approach: “What matters to most users is the user interface, not the means of storage.”
The issues with categories that were discussed on the calls included:
- The dominance of English, which is a barrier to use.
- The difficulty of browsing the hierarchical structure, especially when the community is constantly diffusing categories to limit their size.
- The relationship between categories is not defined. For example, Category:Example Person will include ‘Category:Photographs by Example Person’ and ‘Category:Photographs of Example Person’, but there’s no way to distinguish those through the category tree.
- The display limit of 200 files per page, with no option to filter, which is especially challenging on mobile. There are workarounds, but they are not widely known or used.
Potential improvements included helping uploaders find appropriate categories.
The main advantages attributed to structured data were multilinguality; compatibility with other linked open data approaches; and reusability. However, there was also feedback that the implementation was incomplete: “I would love to work with structured data, but it’s half finished and it doesn’t work properly.”
The issues with structured data that were discussed on the calls included:
- It is currently harder to contribute structured data:
  - It is entered on a separate tab that no one sees.
  - For Commons contributors, there is a high barrier to creating a new Wikidata item to describe a file, especially for things like buildings and people that may not have credible references.
- There is a lack of clarity about using statements beyond ‘depicts’ and whether to describe the digital object or the thing that’s been digitized.
- There is no clear, agreed distinction between the structured data caption and wikitext description.
- There is no consensus about whether ‘depicts’ should use only the most specific item or include things further up the tree. (Existing tools depend on the latter.)
  - A suggestion that the interface should guide contributors to ‘refine’ a depicts statement to a more specific item by showing items that are further down the tree.
  - A reflection that we should be able to give real-time, contextual feedback to contributors, e.g. “You don’t need to add the ‘musical instrument’ tag because you already have ‘violin’.”
- There is a lack of effective and well-maintained tools, which was further attributed to:
  - An unusable query service.
  - Uneven contribution of structured data; tool builders can’t yet assume that most files will have structured data.
- Structured data is less visible to patrollers, even though structured data captions have been a target for vandalism.
  - There is no edit summary for edits made to structured data.
  - There are no tools for addressing vandalism (or just inaccurate and unhelpful metadata) in structured data, especially for edits made by anonymous editors.
  - Rollback is harder, especially in cases where there is a mix of edits.
  - Wikimedia Commons as a whole lacks some of the anti-vandalism features available on Wikipedia, such as ORES scoring.
  - It was suggested that approaches and tools from Wikidata could help if they were implemented on Commons.
- Some features are missing, such as the ability to preview files, and structured data is not pulled through to Wikipedia or to commonly used software like CropTool.
There was a strong demand for better search and faceted browsing, including the ability to search using concepts and across categories. For example, someone looking for a photo of a squid might also want to see bobtail squids and cuttlefish, but those are categorized elsewhere in the classification tree. In another example, someone might want to filter by ‘Venice’ and ‘canal’ and see results for the intersection of these terms. One person summarized this in the following way: “A category is, in principle, the result of a particular query.”
Other suggested improvements included queries that produce a category-like view or find similar photos, and a tool that could populate SDC from just a couple of fields like ‘name’ or ‘subset of’. Finally, it was recommended that photo contest organizers do more to encourage newcomers to describe their content better.