How to achieve an Open Data Certificate?
This guide outlines the requirements to achieve each certificate level.
A Bronze certificate acknowledges that the data meets the basic requirements of open data: it is legally available for anyone to access, use and share under an open licence. To acquire a Bronze certificate, you must provide:
- a title for the open data
- the name of the publisher
and meet the following legal and technical requirements:
You must determine you have the legal right to publish the data. If you are the primary source of the data, this should be simple. In other cases, identifying whether you have a right to publish this data may be more complicated:
- If you are using data from elsewhere, you will need to check the licence of that data to ensure you can republish it. If it is open data, you're OK.
- If you are crowdsourcing data, you may need to check that the contributors to that data have given you permission to reuse it through a Contributor Licence Agreement.
- If the rights are complicated or unclear, for example, because the data includes other data which hasn't been clearly licensed itself, you can still achieve a Bronze certificate.
- There is more guidance in the ODI's Publisher's Guide to Licensing. If in doubt, we advise you to consult a lawyer.
Once you have followed these steps, you can publish the data under an open licence or place it into the public domain with a waiver. We recommend using an existing open data licence. If you create your own licence, you must get it approved by the Open Definition Advisory Council to check that it is an open licence.
If any of the content is covered by copyright (for example if it contains descriptions that someone has written) then the content must also be licensed under an open licence.
Reusers must be able to find your data. A web page that describes your data, such as a page on a repository like data.gov.uk, is a simple way to make your data discoverable. If you don't have a general web page for the data, there are several options to make it easier to find, depending on how you are publishing it:
- If you are publishing a one-off release of a single dataset, there should be a URL from which the dataset can be downloaded.
- If you are publishing a series of datasets, you should have one or more of:
  - a fixed URL from which the current version can always be downloaded
  - a URL pattern that developers can use to work out the URL for a given dataset in the series
  - a list of releases provided in a machine-readable format
- If you are publishing an API, you should have a URL for the endpoint of that API at which a description is also published.
A Silver certificate has a number of additional requirements to ensure a silver release of open data is successful.
To achieve a Silver certificate, you must have a webpage describing the data. This can be on a repository such as data.gov.uk.
If your right to publish the data is complicated or unclear, to achieve a Silver certificate you must publish a document that describes the risks that reusers of the data would be taking on if they use and publicise the data. For example, there may be a risk of the data being removed because of a legal challenge.
If the data originally comes from elsewhere, you must publish a document that describes where the data comes from. This helps people who want to use the data to know how much to trust it.
You must publish a rights statement that describes any copyright or database rights in the data, how it is licensed and what kind of attribution people should give when they use it. Without this being explicitly stated it is difficult for people to have confidence in reusing the data.
If you are releasing data from which individuals could be identified (perhaps in combination with extra information available elsewhere), you must ensure you have addressed any privacy impact that releasing the data might have. That means:
- only publishing personal data if it is anonymised, unless you are required or permitted to publish it by law
- performing a risk assessment around the release of the data
and if you have data protection laws like those in Europe, you must:
- publish the privacy notice that individuals affected by the data have given consent to
- consult with the person within your organisation who is responsible for data protection
If you are publishing data that might go out of date, make sure the data contains timestamps. This ensures that people reusing the data don't unintentionally use out-of-date information.
If you are publishing a series of datasets over time:
- the delay between creating a dataset and publishing that dataset should not be longer than the gap between releases; for example if there is a new release each month, the data in that release should not be more than a month out of date
- if the underlying data changes frequently (i.e. at least daily) then you should create a new release every month or more frequently
If you provide an API to access the data, and you are publishing dumps of the data as well, then you should create a new dump every month or more frequently. This means that people who want to get hold of the whole set of data can get hold of a dump that isn't too out of date.
You must consider how long you intend to publish the data for. For a silver certificate, you should intend for the data to be available in the same form for at least a year. If it is not going to be available for at least a year, it is unlikely any reuser will invest time to use it except in proofs of concept.
If you are publishing a one-off release of a dataset, you must be able to provide a URL directly linking to that dataset so it can be downloaded.
If you are publishing a series of datasets, the URLs for the datasets in the series should follow a consistent pattern so that people can predict the URL for a particular month for example, or create a script that downloads them all.
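As a minimal sketch of what a consistent pattern makes possible, the following assumes a monthly series published as CSV. The host, path and file naming in `URL_PATTERN` are hypothetical placeholders, not a required layout.

```python
from datetime import date

# Hypothetical URL pattern; the host and path are placeholders, not a real service.
URL_PATTERN = "https://data.example.org/bus-stops/{year:04d}-{month:02d}.csv"


def release_url(year: int, month: int) -> str:
    """Return the URL for the release covering the given month."""
    return URL_PATTERN.format(year=year, month=month)


def release_urls(start: date, end: date) -> list[str]:
    """Return the URLs of every monthly release from start to end, inclusive."""
    urls = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        urls.append(release_url(year, month))
        # Step forward one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return urls
```

A reuser could pass the resulting list to any download tool to fetch the whole series without guessing individual addresses.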
To achieve a silver certificate, the data must be published in a machine-readable format. Further to this:
- if you are publishing documents these must be published in a format for documents (which can include Word, OpenOffice or PDF at the silver level)
- if you are publishing statistics the data must be published in a format for tabular data (which can include Excel at the silver level)
- if you are publishing geographical data or any other kind of structured data, it must be published in a structured data format (which can include JSON, XML or CSV at the silver level)
To achieve a silver certificate, you must provide technical documentation about the data to help people understand it with a URL for this documentation.
You must also provide instructions for people to contact you:
- with any questions they have about the data
- to suggest improvements to how the data is published
- to raise any concerns about the disclosure of personal details
You will be asked for the URLs of pages that describe how to get in contact with someone in each of these cases, either through a direct email address or a web form.
A gold certificate is aimed at data that is being published as a business-as-usual activity. This should enable other people to rely on its consistent availability, so they can build products and services that use the data.
You must have a clear legal right to make the data available as open data. At the gold level you should have resolved any ambiguities about your rights over the data so that others can reuse it without exposing themselves to any legal risks.
If not all the data was originally created or gathered by you, you must provide a machine-readable list of data sources. This can be done using RDFa within the web page where you describe those sources (as required for the silver level certificate), or as a separate document.
You must provide machine-readable metadata about:
- the licence the data is available under
- the licence the content of the data is available under
- what attribution text should be used for the data (even if you don't require attribution within the licence that you're using)
- what link to use when providing attribution for this data
Providing this in a machine-readable form enables automated aggregation and attribution of data. You can use the Rights Statement Vocabulary to express this information.
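For illustration, the four items above could be expressed as follows. This sketch uses JSON-LD rather than RDFa to keep the example short, and the `odrs:` property names and namespace are given as best recalled from the Rights Statement Vocabulary; treat them as assumptions and check them against the published vocabulary before relying on them. The function name and its parameters are hypothetical.

```python
import json


def rights_statement(dataset_url: str, data_licence: str, content_licence: str,
                     attribution_text: str, attribution_url: str) -> str:
    """Build a JSON-LD rights statement for a dataset (property names are assumptions)."""
    doc = {
        "@context": {
            "odrs": "http://schema.theodi.org/odrs#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": dataset_url,
        "dct:rights": {
            "@type": "odrs:RightsStatement",
            # Licence covering the data itself.
            "odrs:dataLicense": {"@id": data_licence},
            # Licence covering any copyrightable content within the data.
            "odrs:contentLicense": {"@id": content_licence},
            # Text and link reusers should show when attributing the data.
            "odrs:attributionText": attribution_text,
            "odrs:attributionURL": {"@id": attribution_url},
        },
    }
    return json.dumps(doc, indent=2)
```

An aggregator that harvests this document can then read the licences and attribution details without a person having to interpret a rights page.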
Even if there are no rights within the content of the data (for example data that contains facts and numbers), to remove ambiguity for reusers, you should explicitly indicate the content is public domain.
If the data you are publishing has been anonymised through aggregating data (so that individuals cannot be distinguished from others within their group) you must have the anonymisation process independently audited to ensure protection of a person’s privacy.
If the data you're publishing is about individuals:
- If you have a legal right to publish it (for example because you are publishing a public register that is defined by law), you must publish a document that describes that legal right. This helps justify the publication both to people who want to use the data you are publishing and the individuals whose details are given in the data.
- You must publish your assessment of the impact of publishing the data on the privacy of the affected individuals.
- You must have your risk assessment independently audited to make sure that it's taken into account all the possible impacts.
- You must have your anonymisation process independently audited to make sure there is no inadvertent disclosure of personal information.
To achieve a gold certificate, you must provide at least one URL linking to the data from your website so people can find it. We recommend adding links to pages of relevant topics to the data.
You need to provide at least one URL that demonstrates the data appearing within a list of similar data: this might be a repository such as data.gov.uk or alongside other similar datasets with a relevant search term on Google.
You also need to provide the URL of at least one publication that references the data, such as a report or blog post.
You can only achieve a gold level certificate for a one-off release of a single dataset, a collection of datasets, or an API operating over a static set of data if that data is always up to date. For example, the results of an experiment will never go out of date, but the locations of bus stops will change and must therefore be regularly updated to achieve a gold level certificate.
If you are publishing a series of datasets over time, the delay between creating a dataset and publishing that dataset should be no more than half the gap between releases; for example if the data is released monthly, the data in that release should never be more than two weeks out of date. This helps to ensure that people work with up-to-date data.
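The half-the-gap rule can be checked mechanically. The sketch below is one possible check, not an official test; the function name and its parameters are hypothetical.

```python
from datetime import date, timedelta


def meets_gold_timeliness(created: date, published: date,
                          release_gap: timedelta) -> bool:
    """True if the publication delay is no more than half the gap between releases."""
    delay = published - created
    # A negative delay would mean publishing before creation, so reject it too.
    return timedelta(0) <= delay <= release_gap / 2
```

With a 30-day release gap, data created on 1 March and published on 10 March passes (a 9-day delay against a 15-day limit), while the same data published on 20 March does not.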
If you are publishing an API, you must provide downloads of the data behind that API to achieve a gold level certificate. New dumps must be created within a week of any change to the data. If the data can be changed, you must also commit to correcting any errors people find within it.
You must provide documentation about any known quality issues with the data you are publishing. This helps reusers judge how much the data should be trusted.
You must take a regular offsite backup of the data. It is good practice for this to be a public backup (perhaps hosted by another organisation for you) in case access to your website is interrupted.
If you are providing an API, you must provide documentation that informs people of the service level available. For example, what guarantees can you give about the uptime of the service, what rate limiting do you do, and how long are outages likely to last? You will need to provide a URL to a document describing the service level that you are providing.
For a gold certificate, publication of the data must be a business-as-usual activity; you must ensure that its continued publication is incorporated into your medium-term plans.
If you are publishing a series of datasets, you must:
- Have a single URL where people can always download the most recent issue of the dataset. This could be a redirection to the relevant dataset URL.
- Have a machine-readable document that lists all available releases of the data. This helps people to source them all without guessing URLs.
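One way to meet both points is a small JSON index that lists every release and names the latest one. The layout below (a `latest` key and a `releases` array with `issued` and `url` fields) is a hypothetical format chosen for this sketch; a feed or DCAT catalogue would serve equally well.

```python
import json


def build_release_index(releases: list[dict]) -> str:
    """Return a JSON index of releases, newest first, with a pointer to the latest.

    Each release is a dict with an ISO 8601 "issued" date and a "url".
    """
    # ISO 8601 dates sort correctly as plain strings.
    ordered = sorted(releases, key=lambda r: r["issued"], reverse=True)
    index = {
        "latest": ordered[0]["url"] if ordered else None,
        "releases": ordered,
    }
    return json.dumps(index, indent=2)
```

The fixed "most recent issue" URL can then simply redirect to whatever `latest` points at.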
If you are publishing an API, you must:
- Provide a machine-readable service description document or entry point for the API. This enables applications to detect and adjust to changes in the API which are likely to happen over time.
- Provide a single URL allowing people to download the most recent dump of the underlying data.
You must publish your data using a standard open format such as XML, CSV or JSON. Further to this:
- if you are publishing documents, these must use a semantic data format designed for documents, such as HTML, Docbook or Markdown
- if you are publishing statistics, these must be available in a standard tabular data format such as CSV or in a format specifically designed for statistics
The data must include identifiers for the things described by the data. It must be possible to find out more data about those identifiers either by resolving them using an online service or by resolving them as URLs. These help to ensure that data about the same things can be merged together.
You must provide machine-readable metadata about the data, for example using DCAT with RDFa, including:
- the title of the data
- a description of the data
- the release date of the data
- its modification date
- how frequently the data is updated
- an identifier for the data
- a pointer to a landing page for the data
- the language of the data
- the publisher of the data
- the spatial/geographical coverage of the data
- the temporal coverage of the data
- a theme for the data
- keywords or tags for the data
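As a rough sketch, the fields listed above map onto DCAT and Dublin Core terms as shown below. The example serialises to JSON-LD rather than the RDFa mentioned above, purely to keep it short; the plain field names on the left of the mapping and the function name are hypothetical.

```python
import json

# Maps plain field names (chosen for this sketch) to DCAT and Dublin Core terms.
FIELD_TO_PROPERTY = {
    "title": "dct:title",
    "description": "dct:description",
    "issued": "dct:issued",
    "modified": "dct:modified",
    "frequency": "dct:accrualPeriodicity",
    "identifier": "dct:identifier",
    "landing_page": "dcat:landingPage",
    "language": "dct:language",
    "publisher": "dct:publisher",
    "spatial": "dct:spatial",
    "temporal": "dct:temporal",
    "theme": "dcat:theme",
    "keywords": "dcat:keyword",
}

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def dataset_metadata(dataset_url: str, fields: dict) -> str:
    """Return JSON-LD for a dataset, refusing to build it if any field is missing."""
    missing = sorted(set(FIELD_TO_PROPERTY) - set(fields))
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    doc = {"@context": CONTEXT, "@id": dataset_url, "@type": "dcat:Dataset"}
    for name, prop in FIELD_TO_PROPERTY.items():
        doc[prop] = fields[name]
    return json.dumps(doc, indent=2)
```

Raising an error on a missing field makes it harder to publish an incomplete description by accident.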
If you provide machine-readable metadata about individual files, these must include:
- the title of the file
- a description of the file
- a release date for the file
- its modification date
- a link to the rights statement related to the data
If the data format you are using is a generic data format that uses a data vocabulary (such as names of columns in CSV or properties in JSON), you must provide documentation about the vocabularies that are used in the data. Similarly, if the data contains codes you must provide documentation to explain them. This helps to ensure people using the data interpret it correctly.
To connect with people about the data, you must use social media and list the accounts you use, for example Twitter or Facebook.
You must have a designated place (for example a forum or mailing list) where people can discuss how to use the data and provide the page URL that informs people about it.
If the data can be corrected over time, you must:
- Provide instructions to enable people to inform you about corrections needed and provide a URL of the page describing how to report these errors.
- Provide a subscribable mailing list or feed used to notify people about corrections to the data, with a URL of the page describing how to subscribe.
A platinum-level certificate is designed for core reference data: data that lots of other datasets will reference to provide further information about the things they describe. The platinum-level requirements reflect the importance of this information.
You must also provide machine-readable versions of:
- information in the rights statement as required in a Gold certificate
- a copyright notice or statement
- the copyright year
- the copyright holder
and in jurisdictions that include database rights:
- the database right year
- the database right holder
Having this information in a machine-readable form enables automatic construction of citations for the data.
If you are publishing a series of datasets, you should have minimal or no delay between when the data is created or gathered and when you publish it. Similarly, if you are providing an API you should ensure you create a new dump of the underlying data within a day of any changes made. This ensures that people use up-to-date data.
You must provide documentation about how you ensure data quality, to help people reusing the data judge how much to trust it.
If you are providing an API over the data, you must provide an information page about the current status of the service. This informs API users of both anticipated and unplanned outages.
Data at platinum certificate level should be an essential part of your organisation’s operation and should provide a guarantee that the data will continue to be available long-term.
If you are providing an API, you must:
- provide database dumps at dated URLs, enabling people to download a series of dumps (perhaps for comparison purposes)
- provide a list of the available database dumps in a machine-readable form such as a feed
You should help reusers by enabling them to process the data with existing tools that are based on open standards:
- if you are publishing statistical data, you must publish it in a standard statistical data format (possibly alongside other formats) such as SDMX or Data Cube which exposes the underlying structure of dimensions, measures and attributes
- if you are publishing geographical data, you must publish it in a standard geographical data format (possibly alongside other formats) such as KML or GeoJSON
You must use URLs as identifiers within your data, and those URLs must resolve to further information about the things that are being identified. If there are reliable third-party URLs you can use, you must use them rather than creating your own. Reusing URLs as identifiers helps people to link together different data and therefore get more value from it.
You must provide a machine-readable provenance trail describing how the data was created and processed. You can do this with the open standard PROV. This is particularly important if the data you are publishing is derived from other data, allowing people to understand where the data comes from and therefore how much to trust it.
You must also provide a mechanism for people who are reusing the data, perhaps having received it from third parties, to validate that what they have received is what you originally published. For example, you might digitally sign the data or provide an MD5 hash that people can check to see whether it has been tampered with.
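A minimal sketch of the hash-checking approach, using Python's standard `hashlib` module. The function names are hypothetical; the digest a reuser compares against is whatever value you publish alongside the data.

```python
import hashlib


def file_digest(path: str, algorithm: str = "md5") -> str:
    """Return the hex digest of a file, read in chunks so large dumps fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, published: str,
                             algorithm: str = "md5") -> bool:
    """True if the file's digest equals the one the publisher advertised."""
    return file_digest(path, algorithm) == published.strip().lower()
```

MD5 is enough to catch accidental corruption, but it is no longer considered resistant to deliberate collisions, so passing `algorithm="sha256"` or signing the data is the safer choice where tampering is a real concern.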
You must have a person or team who is actively building a data user community, and provide a URL link to their homepage.
You must publish a list of software tools and libraries that can be used with the data, whether those are created yourself or by other people. This helps give people a head start in using the data.