This guide outlines the requirements to achieve each certificate level.
A Bronze certificate acknowledges that the data meets the basic requirements of open data: it is legally available for anyone to access, use and share under an open licence. To acquire a Bronze certificate, you must provide:
and meet the following legal and technical requirements:
You must determine you have the legal right to publish the data. If you are the primary source of the data, this should be simple. In other cases, identifying whether you have a right to publish this data may be more complicated:
Once you have followed these steps, you can publish the data under an open licence or place it into the public domain with a waiver. We recommend using an existing open data licence. If you create your own licence, you must get it approved by the Open Definition Advisory Council to check that it is an open licence.
If any of the content is covered by copyright (for example if it contains descriptions that someone has written) then the content must also be licensed under an open licence.
Reusers must be able to find your data. A web page that describes your data, for example an entry in a repository such as data.gov.uk, is a simple way to make your data discoverable. If you don't have a general web page for the data, there are several options to make it easier to find depending on how you are publishing it:
A Silver certificate adds a number of requirements intended to ensure that a Silver release of open data is successful.
To achieve a Silver certificate, you must have a webpage describing the data. This can be on a repository such as data.gov.uk.
If your right to publish the data is complicated or unclear, to achieve a Silver certificate you must publish a document that describes the risks that reusers of the data would be taking on if they use and publicise the data. For example, there may be a risk of the data being removed because of a legal challenge.
If the data originally comes from elsewhere, you must publish a document that describes where the data comes from. This helps people who want to use the data to know how much to trust it.
You must publish a rights statement that describes any copyright or database rights in the data, how it is licensed and what kind of attribution people should give when they use it. Without this being explicitly stated it is difficult for people to have confidence in reusing the data.
If you are releasing data from which individuals could be identified (perhaps in combination with extra information available elsewhere), you must ensure you have addressed any privacy impact that releasing the data might have. That means:
and if you are subject to data protection laws like those in Europe, you must:
If you are publishing data that might go out of date, make sure the data contains timestamps. This ensures that people reusing the data don't unintentionally use out-of-date information.
If you are publishing a series of datasets over time:
If you provide an API to access the data, and you are publishing dumps of the data as well, then you should create a new dump every month or more frequently. This means that people who want the whole set of data can obtain a dump that isn't too out of date.
You must consider how long you intend to publish the data for. For a silver certificate, you should intend for the data to be available in the same form for at least a year. If it is not going to be available for at least a year, it is unlikely any reuser will invest time to use it except in proofs of concept.
If you are publishing a one-off release of a dataset, you must be able to provide a URL directly linking to that dataset so it can be downloaded.
If you are publishing a series of datasets, the URLs for the datasets in the series should follow a consistent pattern so that people can predict the URL for a particular month for example, or create a script that downloads them all.
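As an illustration of why a consistent pattern matters, here is a minimal sketch of a script that predicts the URL of each monthly release. The host, path and file naming in `URL_PATTERN` are hypothetical placeholders, not a pattern prescribed by the certificate; any scheme works as long as it is applied consistently.

```python
from datetime import date

# Hypothetical URL pattern: host, path and file name are placeholders.
URL_PATTERN = "https://example.org/data/spending-{year:04d}-{month:02d}.csv"


def monthly_urls(start: date, end: date) -> list[str]:
    """Return one predicted URL per month, from start to end inclusive."""
    urls = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        urls.append(URL_PATTERN.format(year=year, month=month))
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return urls


if __name__ == "__main__":
    for url in monthly_urls(date(2023, 11, 1), date(2024, 2, 1)):
        print(url)
```

A reuser could pass each predicted URL to a download tool of their choice. If the pattern changed between releases, a script like this would silently miss files.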
To achieve a silver certificate, the data must be published in a machine-readable format. Further to this:
To achieve a silver certificate, you must provide technical documentation that helps people understand the data, along with a URL for this documentation.
You must also provide instructions for people to contact you:
You will be asked for page URLs that describe how to get in contact with someone in each of these cases, either through a direct email address or through a web form.
A gold certificate is aimed at data that is being published as a business-as-usual activity. This should enable other people to rely on its consistent availability, so they can build products and services that use the data.
You must have a clear legal right to make the data available as open data. At the gold level you should have resolved any ambiguities about your rights over the data so that others can reuse it without exposing themselves to any legal risks.
If not all the data was originally created or gathered by you, you must provide a machine-readable list of data sources. This can be done using RDFa within the web page where you describe those sources (as required for the silver level certificate), or as a separate document.
You must provide machine-readable metadata about:
Providing this in a machine-readable form enables automated aggregation and attribution of data. You can use the Rights Statement Vocabulary to express this information.
If the data you are publishing has been anonymised through aggregating data (so that individuals cannot be distinguished from others within their group) you must have the anonymisation process independently audited to ensure protection of a person’s privacy.
If the data you're publishing is about individuals:
To achieve a gold certificate, you must provide at least one URL linking to the data from your website so people can find it. We recommend adding links to the data from pages on relevant topics.
You need to provide at least one URL that demonstrates the data appearing within a list of similar data: this might be a repository such as data.gov.uk or alongside other similar datasets with a relevant search term on Google.
You also need to provide the URL of at least one publication, such as a report or blog post, that references the data.
You can only achieve a gold level certificate for a one-off release of a single dataset, a collection of datasets, or an API operating over a static set of data, provided that the data is always up to date. For example, the results of an experiment will never go out of date, but the locations of bus stops will change and must therefore be regularly updated to achieve a gold level certificate.
If you are publishing a series of datasets over time, the delay between creating a dataset and publishing that dataset should be no more than half the gap between releases; for example if the data is released monthly, the data in that release should never be more than two weeks out of date. This helps to ensure that people work with up-to-date data.
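The half-the-gap rule can be checked with simple date arithmetic. The sketch below is only an illustration: the function names are hypothetical, and a monthly release is approximated as a 30-day interval, which gives a 15-day limit (roughly the two weeks mentioned above).

```python
from datetime import date, timedelta


def max_publication_delay(release_interval: timedelta) -> timedelta:
    """The longest acceptable delay is half the gap between releases."""
    return release_interval / 2


def meets_gold_delay(created: date, published: date,
                     release_interval: timedelta) -> bool:
    """True if the data was published within half the release interval."""
    return (published - created) <= max_publication_delay(release_interval)


if __name__ == "__main__":
    monthly = timedelta(days=30)  # assumption: a month approximated as 30 days
    print(max_publication_delay(monthly))
    print(meets_gold_delay(date(2024, 1, 1), date(2024, 1, 12), monthly))
    print(meets_gold_delay(date(2024, 1, 1), date(2024, 1, 20), monthly))
```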
If you are publishing an API, you must provide downloads of the data behind that API to achieve a gold level certificate. New dumps must be created within a week of any change to the data. If the data can be changed, you must also commit to correcting any errors people find within it.
You must provide documentation about any known quality issues with the data you are publishing. This helps reusers judge how much the data should be trusted.
Guarantees
You must take a regular offsite backup of the data. It is good practice for this to be a public backup (perhaps hosted by another organisation for you) in case access to your website is interrupted.
If you are providing an API, you must provide documentation that informs people of the service level available. For example, what guarantees can you give about the uptime of the service, what rate limits do you apply, and how long are outages likely to last? You will need to provide a URL to a document describing the service level that you are providing.
For a gold certificate, publication of the data must be a business-as-usual activity; you must ensure that its continued publication is incorporated into your medium-term plans.
If you are publishing a series of datasets, you must:
If you are publishing an API, you must:
You must publish your data using a standard open format such as XML, CSV or JSON. Further to this:
The data must include identifiers for the things described by the data. It must be possible to find out more data about those identifiers either by resolving them using an online service or by resolving them as URLs. These help to ensure that data about the same things can be merged together.
You must provide machine-readable metadata about the data, for example using DCAT with RDFa, including:
If you provide machine-readable metadata about individual files, these must include:
If the data format you are using is a generic data format that uses a data vocabulary (such as names of columns in CSV or properties in JSON), you must provide documentation about the vocabularies that are used in the data. Similarly, if the data contains codes you must provide documentation to explain them. This helps to ensure people using the data interpret it correctly.
To connect with people about the data, you must use social media and list the accounts you use, for example Twitter or Facebook.
You must have a designated place (for example a forum or mailing list) where people can discuss how to use the data and provide the page URL that informs people about it.
If the data can be corrected over time, you must:
A platinum-level certificate is designed for core reference data: data that many other datasets will reference to provide further information about the things they describe. The platinum-level requirements reflect the importance of this information.
You must also provide machine-readable versions of:
and in jurisdictions that include database rights:
Having this information in a machine-readable form enables automatic construction of citations for the data.
If you are publishing a series of datasets, you should have minimal or no delay between when the data is created or gathered and when you publish it. Similarly, if you are providing an API you should ensure you create a new dump of the underlying data within a day of any changes made. This ensures that people use up-to-date data.
You must provide documentation about how you ensure data quality, to help people reusing the data judge how much to trust it.
If you are providing an API over the data, you must provide an information page about the current status of the service. This informs API users of both anticipated and unplanned outages.
Data at platinum certificate level should be an essential part of your organisation’s operation, and you should provide a guarantee that the data will continue to be available long-term.
If you are providing an API, you must:
You should help reusers by enabling them to process the data with existing tools that are based on open standards:
You must use URLs as identifiers within your data, and those URLs must resolve to further information about the things that are being identified. If there are reliable third-party URLs you can use, you must use them rather than creating your own. Reusing URLs as identifiers helps people to link together different data and therefore get more value from it.
You must provide a machine-readable provenance trail describing how the data was created and processed. You can do this with the open standard PROV. This is particularly important if the data you are publishing is derived from other data, allowing people to understand where the data comes from and therefore how much to trust it.
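To give a sense of what a machine-readable provenance trail might contain, the sketch below builds a small record that loosely follows the PROV-JSON serialisation of PROV. This is an assumption for illustration only: the requirement does not mandate any particular serialisation, and the `ex:` identifiers, the `build_provenance` helper and the timestamps are all hypothetical.

```python
import json


def build_provenance(source_id, output_id, activity_id, started, ended):
    """Describe one processing step: an activity used a source entity
    and generated an output entity derived from that source."""
    return {
        # Hypothetical namespace for the identifiers used below.
        "prefix": {"ex": "https://example.org/"},
        "entity": {source_id: {}, output_id: {}},
        "activity": {
            activity_id: {"prov:startTime": started, "prov:endTime": ended}
        },
        "used": {
            "_:u1": {"prov:activity": activity_id, "prov:entity": source_id}
        },
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": output_id, "prov:activity": activity_id}
        },
        "wasDerivedFrom": {
            "_:d1": {"prov:generatedEntity": output_id,
                     "prov:usedEntity": source_id}
        },
    }


if __name__ == "__main__":
    doc = build_provenance(
        "ex:raw-survey", "ex:cleaned-survey", "ex:cleaning-run",
        "2024-01-05T09:00:00Z", "2024-01-05T09:30:00Z",
    )
    print(json.dumps(doc, indent=2))
```

Publishing a record like this alongside each release lets reusers trace a derived dataset back to its sources without reading prose documentation.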
You must also provide a mechanism for people who are reusing the data, perhaps having received it from third parties, to validate that what they have received is what you originally published. For example, you might digitally sign the data or provide an MD5 hash that people can check to see whether it has been tampered with.
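A reuser could check a downloaded file against a published hash along the following lines. This is a minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and it assumes the publisher lists the hash as a hexadecimal string. MD5 is shown because it is the example given above, but it mainly guards against accidental corruption; a stronger algorithm such as SHA-256, or a digital signature, gives better protection against deliberate tampering.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hash a file in chunks so large dumps need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path, published_hex, algorithm="md5"):
    """Compare the file's digest with the hex digest the publisher lists."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

For example, `matches_published("dump.csv", published_hash, algorithm="sha256")` would return `True` only if the local copy is byte-for-byte identical to the file the publisher hashed; here `dump.csv` and `published_hash` stand in for the reuser's own file and the value taken from the publisher's page.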
You must have a person or team who is actively building a data user community, and provide the URL of their homepage.
You must publish a list of software tools and libraries that can be used with the data, whether those are created yourself or by other people. This helps give people a head start in using the data.