The first objective is that others can find your data easily.
- Upload the underlying data of your scientific publications to an online data repository. This might be a discipline-specific one or a generalist repository such as Zenodo.
- Include a link to the dataset in the data statement of your scientific publication, so readers can easily find it.
- Avoid broken links by choosing a data repository that gives datasets persistent identifiers (DOI, URN).
- Include keywords and rich metadata when uploading your data to the repository to ensure the dataset shows up in online searches.
- If you cannot share the data itself, publish the descriptive metadata so other researchers know what you’re working on and can get in touch to collaborate.
- Choose data repositories that are integrated with the wider national and international research infrastructures, so that information about your dataset travels as far as possible and can be found by as many people as possible.
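
As a rough illustration of the points above, the sketch below shows what a rich metadata record might look like and how a persistent identifier becomes a stable link. The field names, the example record and the DOI are placeholders chosen for illustration, not any particular repository's schema; the only external fact relied on is that DOIs resolve through `https://doi.org/`.

```python
import json

# Hypothetical field names, loosely modelled on typical repository upload forms.
RECOMMENDED_FIELDS = ("title", "creators", "description", "keywords", "identifier")


def missing_fields(record):
    """Return the recommended fields that are absent or empty in a record."""
    return [name for name in RECOMMENDED_FIELDS if not record.get(name)]


def doi_to_url(doi):
    """Turn a bare DOI into a link that resolves through doi.org."""
    return "https://doi.org/" + doi.strip()


record = {
    "title": "Example soil moisture measurements",
    "creators": ["Doe, Jane"],
    "description": "Hourly soil moisture readings from three test plots.",
    "keywords": ["soil moisture", "hydrology", "field measurements"],
    "identifier": "10.1234/example-dataset",  # placeholder DOI, not a real record
}

print(missing_fields(record))            # fields still to fill in before upload
print(doi_to_url(record["identifier"]))  # link to cite in the data statement
print(json.dumps(record, indent=2))
```

A check like `missing_fields` could be run before uploading, so that gaps in the metadata are noticed while they are still easy to fill.
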
The second objective is that once found, the data files or their metadata can be accessed with as few barriers as possible.
- Check that your chosen data repository allows the data files to be downloaded through a free and open protocol, such as HTTPS, directly from the repository’s web page.
- Make sure that your data files can be downloaded and opened without any specialist software. For example, if you have created software that is needed to read the data, include it with the dataset.
- Be clear about any access limitations to the data you might have in place. These might include embargoes or the need to request access from a committee. Accessible does not necessarily mean open or free.
- Include contact details such as an email address or phone number in the metadata, in case anyone would like to get in touch about the data or request access.
- Think about how long the data will be available. Here rich metadata is useful again, as it is often available for longer than the data files.
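
One way to be explicit about access conditions is to record them as structured fields alongside the contact details. The following is a minimal sketch under that assumption; the field names (`contact_email`, `embargo_until`, `request_required`), the address and the dates are all hypothetical and would need to be mapped onto whatever your repository actually offers.

```python
from datetime import date


def describe_access(record, today):
    """Summarise a dataset's access conditions in one sentence."""
    contact = record["contact_email"]
    embargo_until = record.get("embargo_until")
    if record.get("request_required"):
        return f"Restricted: request access via {contact}."
    if embargo_until and today < embargo_until:
        return f"Embargoed until {embargo_until.isoformat()}; contact {contact}."
    return f"Open: files can be downloaded directly; contact {contact} with questions."


record = {
    "contact_email": "data-steward@example.org",  # placeholder address
    "embargo_until": date(2030, 1, 1),            # hypothetical embargo end date
    "request_required": False,
}

print(describe_access(record, date(2029, 6, 1)))
```
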
The third objective is for the data to be in a format and form that is easily understood by others in your field as well as by machines.
- Find out the standards and vocabularies that are used to define common words, concepts or units in your field and use these to name and describe your data. This increases semantic interoperability and reduces the risk of others misinterpreting the data.
- If there is no standard ontology in your field, provide short guidance on how you have named and ordered your data. This could be done in a README file where you explain what abbreviations or measurement units were used.
- Upload the data in a file format that is easy to open and extract information from. It is easier to copy information from an Excel table than a PDF table.
- Most data repositories follow a standard vocabulary in describing their content. When you upload a dataset to Zenodo, the metadata the repository asks you to fill in is automatically stored in a structured format defined by a JSON Schema. This is also a part of interoperability.
- If reusing data, link the reused datasets to your own dataset in the data repository.
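
To make the README and file format advice concrete, here is a small sketch that writes a table as CSV and generates a matching README section from a single data dictionary. The column names, units and sample row are invented for illustration; in practice they would come from your field's standards or your own documented conventions.

```python
import csv
import io

# Hypothetical data dictionary: column name -> (unit, description).
COLUMNS = {
    "site_id": ("", "Sampling site identifier"),
    "temp_c": ("degrees Celsius", "Air temperature"),
    "precip_mm": ("millimetres", "Daily precipitation"),
}


def build_readme(title, columns):
    """Render a plain-text README section describing each column and its unit."""
    lines = [title, "", "Columns:"]
    for name, (unit, description) in columns.items():
        unit_text = f" [{unit}]" if unit else ""
        lines.append(f"- {name}{unit_text}: {description}")
    return "\n".join(lines)


def rows_to_csv(rows, columns):
    """Serialise rows (dicts) to CSV text with a header taken from the dictionary."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [{"site_id": "A1", "temp_c": 12.5, "precip_mm": 0.0}]
readme_text = build_readme("Example weather observations", COLUMNS)
csv_text = rows_to_csv(rows, COLUMNS)

print(readme_text)
print(csv_text)
```

Deriving both the CSV header and the README from the same dictionary keeps the documentation from drifting away from the data it describes.
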
The fourth objective is that the data can be readily reused by others.
- Assign your data a licence that clearly stipulates how the data can be reused. We recommend a Creative Commons licence for datasets and MIT or Apache 2.0 for software. Read more on choosing a software licence here.
- Mark the licence information clearly in the metadata.
- Consider what information would be useful to researchers reusing your data. Provide this information in your documentation, e.g. the dataset’s file naming conventions, folder structure, abbreviations used and units of measurement.
- Data provenance gives context to the conditions under which the research was carried out. Being explicit about how the data was collected and processed reduces the risk of the data being misinterpreted or misused. Document the workflow that led to the data and include it with your dataset.
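
A workflow can be documented in many ways. One lightweight option, sketched below, is a small provenance log that records each processing step together with checksums of its input and output, so that reusers can confirm they hold the same files. The step description, the script name `convert.py` and the sample data are hypothetical, and the log layout is an assumption rather than an established provenance standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of some file content."""
    return hashlib.sha256(data).hexdigest()


def provenance_step(description, tool, input_bytes, output_bytes):
    """Describe one processing step, with checksums and a UTC timestamp."""
    return {
        "description": description,
        "tool": tool,
        "input_sha256": checksum(input_bytes),
        "output_sha256": checksum(output_bytes),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical raw and processed file contents.
raw = b"site_id,temp_f\nA1,54.5\n"
processed = b"site_id,temp_c\nA1,12.5\n"

log = [
    provenance_step(
        "Converted temperature from Fahrenheit to Celsius",
        "convert.py (hypothetical script)",
        raw,
        processed,
    )
]

# The JSON text could be saved next to the data files, e.g. as provenance.json.
print(json.dumps(log, indent=2))
```
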