In my discussion with Stefan Niggemeier about robots.txt, I pointed to the dispute that has been running for years over the machine-readable representation of copyright information. The reactions to that post suggested that many readers are not yet familiar with the subject. So here is an interesting background paper from the USA that outlines the situation and the problem in brief. It is a current internal document in English, which I am allowed to publish here, though without naming the sender. I have checked the facts. To the best of my knowledge, they are all correct.
Introduction to IPTC, rNews and RightsML
The International Press Telecommunications Council (IPTC, www.iptc.org) is a London-based consortium of news agencies, news publishers and news industry vendors that develops and maintains technical standards for improved news exchange. Established in 1965, its standards are now used by virtually every major news organization in the world.
The IPTC has recently developed two important standards: 1) rNews, a protocol for using semantic markup to annotate news-specific descriptive data, known as metadata, in HTML documents, and 2) RightsML, an associated rights expression language that allows news content owners to express information about how the content can or cannot be licensed and used. The ultimate goal of these two IPTC standards is to enable a robust, autonomous, asynchronous, virtual content licensing marketplace in which content owners, publishers and users could gather to identify, license, distribute and use content based on machine-readable rights.
rNews arose out of a need to better convey information about news content to anyone interested in making downstream use of that content once it has been published, whether to present search results, analyze current events, integrate it into other news offerings or otherwise use the content or information about it. It includes vocabularies that are common to any creative work but also introduces terms such as byline and headline that are specific to multimedia news content. Work on the standard began in October 2010 and the first version was released about a year later, in September 2011.
RightsML, whose development began prior to the release of rNews, is a markup language designed to express usage terms in a machine-readable format. This rights information is what would be referenced by the rNews usageTerms property for machine-to-machine communication of usage terms. The rights language includes a consistent vocabulary for actions such as indexing, aggregating, translating or sharing digital content of any type, and for the permissions and obligations attached to those actions. While simply displaying the copyright notice and human-readable usage terms might be sufficient for consumer use, RightsML provides a rich framework for successful business-to-business communication of sophisticated rights information in an unambiguous manner. It is based on the Open Digital Rights Language (ODRL), an existing rights expression framework successfully employed by the mobile industry, among others. Since April 2012 it has been in an experimental phase for news usage, in which various publishing users have been testing applications of the language.
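To make the idea of machine-readable usage terms more concrete, here is a minimal Python sketch of a policy that pairs actions with permissions, prohibitions and obligations. The terms permission, prohibition, duty, target and assigner are borrowed from ODRL, but the class layout, the field names, the example identifiers and the rule that unlisted actions are not granted are assumptions made for illustration. This is not the RightsML schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Duty:
    """An obligation to fulfil before acting (hypothetical structure)."""
    action: str          # e.g. "pay" or "attribute"
    detail: str = ""


@dataclass(frozen=True)
class Permission:
    """Grants one action, optionally subject to duties."""
    action: str          # e.g. "index", "aggregate", "translate", "share"
    duties: tuple = ()


@dataclass
class Policy:
    target: str                                        # the content item the policy covers
    assigner: str                                      # the rights holder
    permissions: list = field(default_factory=list)
    prohibitions: list = field(default_factory=list)   # names of forbidden actions

    def check(self, action):
        """Return (allowed, duties) for a requested action."""
        if action in self.prohibitions:
            return False, ()
        for permission in self.permissions:
            if permission.action == action:
                return True, permission.duties
        # Assumption of this sketch: an action that is not mentioned is not granted.
        return False, ()


# Hypothetical policy for a single article.
policy = Policy(
    target="urn:example:article:1",
    assigner="Example News Agency",
    permissions=[
        Permission("index"),
        Permission("translate", (Duty("pay", "per-use fee"),)),
    ],
    prohibitions=["aggregate"],
)

for requested in ("index", "translate", "aggregate", "share"):
    allowed, duties = policy.check(requested)
    print(requested, allowed, [duty.action for duty in duties])
```

A consuming system could call check() before each downstream use of an item and act only when the action is allowed and the listed duties can be met.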
IPTC Interaction with Google and Schema.org
When questioned by publishers on the Schema.org development site, in online forums for semantic web practitioners and at industry panels, Mr. Guha has claimed that Schema.org does not see a benefit in including these properties and will not process them. The IPTC continues to lobby Schema.org for inclusion of the omitted properties, but has not been given clear information on how to build a successful case. Given search engines’ pivotal role in downstream communications, this stance clearly undermines content publishers’ interests in expressing copyright and developing an informed marketplace around the restrictions and permissions on reuse of their content, including capturing their fair share of revenue from downstream uses of that content.
To date, the Schema.org actions have created confusion in the market and hampered adoption of rNews, as news publishers struggle to understand whether including these properties in their publishing templates will adversely affect the indexing and search results display of their content. In the meantime, Google and Microsoft have both reported relatively rapid uptake of Schema.org vocabularies across industries in Schema.org’s first nine months, stating that 6-8% of the pages their massive systems index include Schema.org markup.
The Superiority of rNews and RightsML over Schema.org and Robots.txt
When Google was just a search engine with an indexing algorithm and a simple link back to originating sources, it may have been sufficient for a web page to give it a simple ‘yes’ or ‘no’ on being crawled through the binary robots.txt protocol. However, Google is now much more than that, with much more extensive use of content in its search results pages and on its own publishing platforms. As Google becomes more and more both a party and a conduit to a much more dynamic content marketplace, it is now imperative to convey what can and cannot be done with content after it is indexed. Otherwise, Google is able to exploit the structured data that publishers are now improving, to its further benefit and without any obligations.
Robots.txt is no substitute for RightsML. Robots.txt is not robust because it only specifies one “right” – the right to search. It does not address any traditional copyright permissions. It is not flexible because it only addresses large sections of websites, not individual published items that might make up a web page. It is not efficient and can be quite onerous because, in order to determine rights for any particular element of content, an entity needs to process the entire robots.txt file describing all rights for all content on the site rather than processing only the usage rights associated with a particular content item.
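The binary nature of robots.txt can be seen with Python's standard urllib.robotparser module. In the sketch below, the robots.txt content, the host news.example.com and the crawler name ExampleBot are made up for illustration. The only question the protocol can answer is whether a given crawler may fetch a given URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site: one rule covering a whole directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, agent="ExampleBot"):
    """Return the only answer robots.txt can give: fetch allowed, yes or no."""
    return parser.can_fetch(agent, url)


front_page_story = may_crawl("https://news.example.com/2012/story.html")
archived_story = may_crawl("https://news.example.com/archive/story.html")

print(front_page_story)   # True: the path is not covered by a Disallow rule
print(archived_story)     # False: everything under /archive/ is blocked
```

Nothing in this interface can say whether a fetched story may be translated, excerpted or republished, or on what terms.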
Copyright Notice and Usage Rights as Remedies
The needs of users, publishers, distributors and creators could be served best by a dynamic marketplace in which permissions, restrictions and financial obligations are set by the content supplier and the value derived is determined naturally by market demand. Such a marketplace is possible only when its infrastructure exists and the opportunities to locate and participate in it are made simple.
If Google were required to crawl and process any metadata markup established by a recognized standards body, including any associated, standards-based rights expression language description of rights, content creators could inform Google (and anyone else interested in making further use of someone else’s digital content) precisely how much of their content can be used, how Google can use it and, if relevant, what compensation Google would have to pay for particular uses. In addition, requiring Google to process this information would allow users or potential downstream publishers to search for content by the licenses available for it, as well as by the information in the content. This is similar to what Google already offers in its “Advanced Search” tab, which is currently limited to variants of free use. Furthermore, once Google starts to recognize and process metadata markups and rights expression languages, others can begin to establish new marketplaces in which content owners, content acquirers and others could gather to license content fairly and openly. The whole process would also be greatly aided if Google and others were more transparent about what they do and don’t do with the information they capture.
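Searching by available license as well as by content could look roughly like the following Python sketch. The Item structure, the action names and the sample catalog are hypothetical. A real index would draw on the rights metadata published with each item rather than on a hand-built list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    text: str
    licensed_actions: frozenset   # actions the owner offers, e.g. {"index", "share"}


def search(catalog, keyword, required_action=None):
    """Return titles matching the keyword and, optionally, a licensed action."""
    hits = []
    for item in catalog:
        if keyword.lower() not in item.text.lower():
            continue
        if required_action and required_action not in item.licensed_actions:
            continue
        hits.append(item.title)
    return hits


# Hypothetical catalog of indexed news items.
catalog = [
    Item("Election results", "Election night results and analysis",
         frozenset({"index", "share"})),
    Item("Election photo essay", "Photos from election night",
         frozenset({"index"})),
    Item("Market report", "Stocks closed higher",
         frozenset({"index", "share", "translate"})),
]

by_content = search(catalog, "election")
by_content_and_license = search(catalog, "election", required_action="share")

print(by_content)
print(by_content_and_license)
```

The second query narrows the same keyword results to items whose owners offer the requested use, which is the kind of filter the paragraph above describes.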
Google’s direct involvement with standards bodies that are working to advance machine-readable usage rights, such as the IPTC and education publishing’s Learning Resource Metadata Initiative, would also give a boost to these efforts and clearly signal Google’s unequivocal commitment to fully endorsing their developing metadata initiatives. As noted above, Schema.org’s vocabularies have already had an uptake of 6-8%. If Google and its search engine cohort are required to include the copyright and usage terms in Schema.org’s vocabularies for news articles and other creative works, and are required to process the information contained in those fields, the infrastructure on the internet to efficiently process and act on rights information could flourish. In this way, fair use can be reinforced and fair share can be returned to digital content publishers.