Metadata helps researchers understand the content, context, and structure of a dataset. It provides details about variables, units of measurement, data sources, and data collection methods. As interdisciplinary research becomes more common, metadata becomes even more critical, because datasets from different sources are often combined and analyzed together. Good metadata helps researchers in one field understand and use data produced in another.
Before a study begins, PIs and/or key research staff should plan data collection so that data is gathered and documented consistently throughout the project. Part of that preparation is identifying the specific data elements to be collected and deciding which standard(s) will apply to them.
Metadata is required for all shared datasets. Well-constructed metadata will:
- maintain compliance with the funder’s data sharing policy.
- assist others in understanding the data, including the method(s) of collection.
- enable others to identify the data they want and need.
- communicate data access processes, restrictions, and responsibilities for use.
Data Sharing: It’s All About the Metadata
Data sharing depends on three interconnected components:
1. Data Collection involves gathering information from multiple sources using methods such as surveys, interviews, instrument downloads, or manual data entry.
2. Data Annotation (Metadata) provides information about the context, structure, and attributes of the data and plays a crucial role in both data collection and data sharing. It documents the origin, format, and characteristics of the data, making the data easier for others to understand and use. Metadata should:
- Describe the items and content of the shared data for search and discovery, and provide important context about them, enabling users to search, browse, sort, and filter information.
- Explain the organization of the shared data and/or its relationship(s) to other data, including the structure and navigation of folders and files.
- Define the administrative properties of shared data, which can include elements such as origins/sources, data standards, technical rules, data retention, access rights, and use.
3. Data Sharing is the process of making data available to others, either within an organization or to external parties, to support collaboration, accelerate research, or foster innovation.
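To make the annotation bullets above more concrete, here is a minimal Python sketch of a small catalog of metadata records, each split into descriptive, structural, and administrative parts, with a function that filters records by keyword and access level. The `MetadataRecord` class, the `search` function, the field names, and the catalog entries are all hypothetical illustrations, not a formal metadata schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataRecord:
    # Descriptive metadata: what the data is about (search and discovery)
    title: str
    keywords: set[str]
    # Structural metadata: how the shared folders and files are organized
    files: list[str]
    # Administrative metadata: origin, standard used, and access rights
    source: str
    standard: str
    access: str  # hypothetical vocabulary: "open" or "controlled"


def search(catalog: list[MetadataRecord], keyword: str,
           access: Optional[str] = None) -> list[MetadataRecord]:
    """Filter records by keyword (case-insensitive) and, optionally, access level; sort by title."""
    term = keyword.lower()
    hits = [
        record for record in catalog
        if term in {k.lower() for k in record.keywords}
        and (access is None or record.access == access)
    ]
    return sorted(hits, key=lambda record: record.title)


# Hypothetical catalog entries for illustration only.
catalog = [
    MetadataRecord("Survey responses", {"sleep", "survey"},
                   ["survey/responses.csv", "survey/codebook.pdf"],
                   "Online questionnaire", "none", "controlled"),
    MetadataRecord("Clinic visit measurements", {"weight", "height"},
                   ["visits/visits_2024.csv", "visits/data_dictionary.csv"],
                   "Study site A", "hypothetical CDE set", "controlled"),
    MetadataRecord("Accelerometer sleep data", {"Sleep", "actigraphy"},
                   ["raw/", "derived/sleep_summary.csv"],
                   "Wrist-worn devices", "none", "open"),
]

for record in search(catalog, "sleep"):
    print(record.title, "|", record.access, "|", ", ".join(record.files))
```

Only the descriptive fields drive the search here; the structural and administrative fields travel with each hit so a user can see how the files are organized and whether access is restricted before requesting the data.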
5 Minute Videos
The videos below each cover a basic metadata concept in under five minutes.
5 Minute Metadata - What is metadata? (3:55)
5 Minute Metadata - What is a standard? (3:28)
Common Data Elements and Data Standards
When selecting common data elements or a data standard, look for one that is:
- recommended by a data policy from a journal, journal publisher, or funder.
- actively maintained by a representative of the resource.
- active and ready for use.
The README File
A README file accompanies the dataset and should document elements such as:
- Keywords: Terms or phrases that describe the subject, domain, and/or content of the data.
- Persistent Identifiers (PIDs): Unique, persistent identifiers such as ORCID iDs for researchers and DOIs (Digital Object Identifiers) for datasets and publications.
- Naming Conventions: Standards used to organize and identify folders and files, including how file versions are labeled for version control.
- Data Ownership: Details regarding the creator, ownership/source(s), and rights associated with the data.
- Data Content/Quality: Information on data validation, anomalies, accuracy, precision, and completeness.
- Time Intervals: Information about the time resolution and frequency of data collection or timestamps indicating when data was collected or recorded.
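The README elements listed above can be collected in one place and rendered as a plain-text file, which also makes it easy to spot sections that have not been filled in yet. The following is a minimal Python sketch under that assumption; `README_SECTIONS`, `missing_sections`, `render_readme`, and every example value (including the placeholder DOI and ORCID iD) are hypothetical, and the section names simply mirror the list above rather than any formal README standard.

```python
# Sections mirror the README elements listed above; this is not a formal standard.
README_SECTIONS = [
    "Keywords",
    "Persistent Identifiers (PIDs)",
    "Naming Conventions",
    "Data Ownership",
    "Data Content/Quality",
    "Time Intervals",
]


def missing_sections(fields: dict) -> list[str]:
    """Return README sections that are absent or empty."""
    return [section for section in README_SECTIONS if not fields.get(section)]


def render_readme(title: str, fields: dict) -> str:
    """Render the sections in a fixed order as plain text; empty sections show TODO."""
    lines = [title, "=" * len(title), ""]
    for section in README_SECTIONS:
        value = fields.get(section, "")
        lines.append(f"{section}:")
        if isinstance(value, dict):
            lines.extend(f"  {key}: {item}" for key, item in value.items())
        elif isinstance(value, list):
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"  {value or 'TODO'}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical placeholder values; replace with your project's details.
example = {
    "Keywords": ["sleep", "actigraphy", "adolescents"],
    "Persistent Identifiers (PIDs)": {
        "Dataset DOI": "10.xxxx/placeholder",
        "Creator ORCID iD": "0000-0000-0000-0000",
    },
    "Naming Conventions": "<site>_<participant>_<YYYYMMDD>_v<NN>.csv",
    "Data Ownership": "Created by the study team; rights described in LICENSE.txt",
    "Data Content/Quality": "Values range-checked at entry; anomalies listed in anomalies.csv",
}

print("Missing:", missing_sections(example))
text = render_readme("Example Study Dataset", example)
print(text)
```

Writing TODO into empty sections, rather than dropping them, keeps the gap visible to whoever reviews the README before the data is shared.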
The Data Dictionary
A data dictionary defines each data element in the dataset, typically including:
- Data Element Name: The name of the data element as it appears in the dataset.
- Definition/Description: A description of the data element, its purpose, and its context, including units (e.g., weight in kilograms, height in centimeters).
- Data Type: The type of data that can be stored in the field (e.g., text, numeric, or date, including the date format).
- Values and Anomalies: The permitted values or codes for the data element, and any deviations from standards, norms, or expected results.
- Data Structure/Groups: How data elements are grouped to describe a unit in the system, and any relationships between data elements.
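Keeping the data dictionary in a machine-readable form lets the same definitions be used to check records as they are collected. Below is a minimal Python sketch, assuming each entry holds the fields listed above plus an optional expected numeric range; the `DataElement` class, the `check_record` function, and the element names, codes, and ranges are all hypothetical examples, not a required format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataElement:
    name: str                              # Data Element Name
    description: str                       # Definition/Description, including units
    data_type: type                        # Data Type: expected Python type
    allowed_values: Optional[set] = None   # permitted codes, if the element is coded
    minimum: Optional[float] = None        # expected range; values outside it are anomalies
    maximum: Optional[float] = None
    group: str = ""                        # Data Structure/Group the element belongs to


# Hypothetical dictionary entries for illustration.
DICTIONARY = [
    DataElement("participant_id", "Study-assigned participant identifier", str,
                group="participant"),
    DataElement("visit_date", "Date of the study visit (YYYY-MM-DD)", date, group="visit"),
    DataElement("weight_kg", "Body weight in kilograms", float,
                minimum=20, maximum=300, group="visit"),
    DataElement("height_cm", "Standing height in centimeters", float,
                minimum=50, maximum=250, group="visit"),
    DataElement("smoking_status", "Self-reported smoking status", str,
                allowed_values={"current", "former", "never"}, group="participant"),
]


def check_record(record: dict, dictionary: list[DataElement]) -> list[str]:
    """Return human-readable problems found when checking one record against the dictionary."""
    problems = []
    for element in dictionary:
        if element.name not in record:
            problems.append(f"{element.name}: missing")
            continue
        value = record[element.name]
        if not isinstance(value, element.data_type):
            problems.append(f"{element.name}: expected {element.data_type.__name__}, "
                            f"got {type(value).__name__}")
            continue
        if element.allowed_values is not None and value not in element.allowed_values:
            problems.append(f"{element.name}: {value!r} is not an allowed value")
        if element.minimum is not None and value < element.minimum:
            problems.append(f"{element.name}: {value} is below the expected minimum {element.minimum}")
        if element.maximum is not None and value > element.maximum:
            problems.append(f"{element.name}: {value} is above the expected maximum {element.maximum}")
    return problems


good = {"participant_id": "P001", "visit_date": date(2024, 3, 5),
        "weight_kg": 72.5, "height_cm": 171.0, "smoking_status": "never"}
bad = {"participant_id": "P002", "visit_date": "2024-03-05",
       "weight_kg": 725.0, "smoking_status": "sometimes"}

print(check_record(good, DICTIONARY))
for problem in check_record(bad, DICTIONARY):
    print(problem)
```

The check reports problems instead of correcting them, so the original values stay intact and anomalies can be documented alongside the data.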
Third-Party Resources
Creating metadata manually can be confusing and time-consuming. Stanford University and Caltech offer guidance on the process, including tools that help researchers automate the creation of metadata.
Create metadata for your research project - Stanford University
The Research Data Management Workbook - California Institute of Technology
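As a rough, do-it-yourself illustration of automating part of this work (not a substitute for the tools in the resources above), a draft data dictionary can be generated by profiling each column of a CSV file. This minimal Python sketch uses only the standard library; the `infer_type` and `draft_dictionary` functions, the type-inference heuristics, and the sample data are hypothetical, and the descriptions still have to be written by someone who knows the study.

```python
import csv
import io
from datetime import date


def infer_type(values: list[str]) -> str:
    """Guess a column type from its non-empty string values (simple heuristics)."""
    def all_parse(parser) -> bool:
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "numeric"
    if all_parse(date.fromisoformat):
        return "date (YYYY-MM-DD)"
    return "text"


def draft_dictionary(csv_text: str, max_codes: int = 5) -> list[dict]:
    """Profile each column: inferred type, count of blanks, and distinct values if few."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        column = [(row[name] or "").strip() for row in rows]
        present = [value for value in column if value]
        distinct = sorted(set(present))
        entries.append({
            "name": name,
            "type": infer_type(present),
            "missing": len(column) - len(present),
            "values": distinct if len(distinct) <= max_codes else f"{len(distinct)} distinct values",
            "description": "TODO: describe purpose, context, and units",
        })
    return entries


# Hypothetical sample data for illustration.
sample = """participant_id,visit_date,weight_kg,smoking_status
P001,2024-03-05,72.5,never
P002,2024-03-12,,former
P003,2024-04-02,81.0,never
"""

for entry in draft_dictionary(sample):
    print(entry)
```

Listing distinct values only for low-cardinality columns keeps free-text and identifier columns from flooding the draft; the `max_codes` cutoff is an arbitrary choice to adjust per dataset.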
We will update this page as we gain more knowledge on this topic.