Research Guides: Open Science: Planning

Data Management Plans

During the planning stages of research, it is helpful to create a data management plan (DMP). The plan covers where data will be stored, how files will be named, and how the data will be defined and organized. A DMP typically describes:

What data will be produced as a part of the project
How each type of data will be organized, documented, standardized, stored, protected, shared and archived
Who will take responsibility for carrying out the activities listed above, and
When these activities will take place over the course of the project (and beyond)

Creating a Data Management Plan:

DMP Tool - a free, open-source application that helps researchers create data management plans (DMPs). Many funding agencies require these plans as part of the grant proposal submission process. The DMP Tool provides a click-through wizard for creating a DMP that complies with funder requirements. It also has direct links to funder websites, help text for answering questions, and data management best practices resources.
DMP Tool Funder Requirements - Templates for data management plans based on specific requirements listed in funder policy documents

Data Storage

Data storage: Choosing an appropriate drive or repository for your data is one of the first things to plan. The storage solution should be secure, offer the functionality and storage capacity the research team needs, and comply with all applicable data protection regulations. Data can be stored in a repository like OSF.io (Open Science Framework) or on a private drive like Google Drive or OneDrive.

See the UC Project Template on OSF

Use a data repository to provide for:

Persistent identifiers for your data (like DOI) that are unique and citable
Persistent access
Preservation
Backup
Management of access
Versioning
Licensing

Naming Conventions and Directory Structures

Data naming: Establishing a naming convention for files can also help researchers increase transparency and replicability. A meaningful, standardized file name structure makes data easier to both find and understand. There is no “best way” to create a naming convention, because conventions depend on the context of the research project, but a few common practices include:

Including dates in a standard format, such as YYYY-MM-DD, at the beginning of the file name. This way, files sort chronologically.

Using an underscore to separate each element of the name. For example, if the naming convention uses the date, the experiment, and the version, it could look like this: “2024-05-25_experiment1_v2”

Being consistent about casing, whether that is upper or lower case.

Data naming ideally goes from broadest to most specific. For example: “YYYY-MM-DD_Experiment1_Group2_JohnsonD_v1” indicates first the date that the data collection took place, then which research experiment this data is for, which group the participant is in, which participant the data is about, and finally the version of the file.
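As a rough illustration of the practices above, the following Python sketch builds and checks names that follow the “2024-05-25_experiment1_v2” pattern. The function names, the regular expression, and the choice of lower case are assumptions made for this example, not part of any standard; adapt them to the convention your project agrees on.

```python
import re
from datetime import date

# Hypothetical pattern for names like "2024-05-25_experiment1_v2":
# ISO date, experiment label, version number, separated by underscores.
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_(?P<experiment>[a-z0-9]+)_v(?P<version>\d+)$"
)


def build_name(collected: date, experiment: str, version: int) -> str:
    """Join the elements with underscores, broadest element first."""
    # Lower case is an arbitrary choice here; what matters is consistency.
    return f"{collected.isoformat()}_{experiment.lower()}_v{version}"


def parse_name(name: str) -> dict:
    """Split a name into its elements, or raise if it breaks the convention."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    parts = match.groupdict()
    parts["date"] = date.fromisoformat(parts["date"])
    parts["version"] = int(parts["version"])
    return parts


names = [
    build_name(date(2024, 5, 25), "experiment1", 2),
    build_name(date(2023, 11, 2), "experiment1", 1),
]
# Because the date comes first in YYYY-MM-DD form, a plain alphabetical
# sort is also a chronological sort.
ordered = sorted(names)
```

Running parse_name over every file in a folder is one simple way to spot names that have drifted from the agreed convention.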

Data dictionary: A data dictionary defines and describes the variables and values used in your data set, providing essential context. Data dictionaries are documents that include information such as variable names, descriptions, units of measurement, and any coding schemes. Maintaining one increases consistency in the collection and reporting of data across collaborators and helps streamline data analysis. Below is an example of the information that might be included in a data dictionary for each variable.

See: McGill Codebook Cookbook

Variable	Description	Units	Coding Scheme
Q1_AGE	Age of respondent	Years	N/A
Q2_GENDER	Gender of respondent	N/A	1 = Male, 2 = Female
Q3_INCOME	Annual household income	US Dollars	N/A
Q4_EDUCATION	Highest level of education	N/A	1 = High School, 2 = Bachelor’s, 3 = Master’s, 4 = Doctorate
Q5_OCCUPATION	Respondent’s occupation	N/A	N/A
Q6_SATISFACTION	Satisfaction with service	Likert Scale (1-5)	1 = Very Unsatisfied, 2 = Unsatisfied, 3 = Neutral, 4 = Satisfied, 5 = Very Satisfied
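A data dictionary can also be kept in a machine-readable form so that collected records can be checked against it. The sketch below expresses three rows of the example table as a Python mapping; the key names ("description", "units", "codes") and the helper functions are illustrative assumptions, not a standard format.

```python
# Three rows of the example data dictionary, expressed as a mapping.
# The key names used here are assumptions made for this sketch.
DATA_DICTIONARY = {
    "Q1_AGE": {
        "description": "Age of respondent",
        "units": "Years",
        "codes": None,
    },
    "Q2_GENDER": {
        "description": "Gender of respondent",
        "units": None,
        "codes": {1: "Male", 2: "Female"},
    },
    "Q6_SATISFACTION": {
        "description": "Satisfaction with service",
        "units": "Likert Scale (1-5)",
        "codes": {
            1: "Very Unsatisfied",
            2: "Unsatisfied",
            3: "Neutral",
            4: "Satisfied",
            5: "Very Satisfied",
        },
    },
}


def check_record(record: dict) -> list:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for variable, value in record.items():
        entry = DATA_DICTIONARY.get(variable)
        if entry is None:
            problems.append(f"{variable}: not defined in the data dictionary")
        elif entry["codes"] is not None and value not in entry["codes"]:
            problems.append(f"{variable}: {value!r} is not a valid code")
    return problems


def decode(record: dict) -> dict:
    """Replace coded values with their labels; leave other values as-is."""
    decoded = {}
    for variable, value in record.items():
        codes = DATA_DICTIONARY.get(variable, {}).get("codes")
        decoded[variable] = codes[value] if codes and value in codes else value
    return decoded
```

For example, check_record({"Q2_GENDER": 3}) reports that 3 is not a valid code, which catches data entry errors before analysis begins.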

Data directory: Once files are named and organized according to the data management plan, a logical directory structure helps anyone trying to navigate the place where the data is stored. A data directory makes it easier for collaborators and others to move through the different stages of a project. It is typically a simple text README file that lists everything in the drive or repository in one place.

For an example of a data set that includes a README file, see this Scoping Review Project on OSF.
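If the project lives in a local folder, the listing portion of such a README can be generated rather than typed by hand. The following is a minimal sketch using Python's standard pathlib module; the function names, the README.txt file name, and the layout of the output are assumptions made for this example, and a real README would usually add a description of each item.

```python
from pathlib import Path


def describe_tree(root: Path) -> str:
    """Return an indented listing of every folder and file under root."""
    lines = [f"{root.name}/"]
    # Sorting by path parts keeps each folder's contents directly below it.
    entries = sorted(root.rglob("*"), key=lambda p: p.relative_to(root).parts)
    for path in entries:
        depth = len(path.relative_to(root).parts)
        suffix = "/" if path.is_dir() else ""
        lines.append("  " * depth + path.name + suffix)
    return "\n".join(lines)


def write_readme(root: Path, summary: str) -> Path:
    """Write a plain-text README with a short summary and the listing."""
    readme = root / "README.txt"
    readme.write_text(
        f"{summary}\n\nContents:\n{describe_tree(root)}\n", encoding="utf-8"
    )
    return readme
```

Calling write_readme(Path("my_project"), "Survey data and analysis code") on a hypothetical my_project folder would place the README at the top level, where collaborators will see it first.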

Directory structures:

Organize directories (main folders) and subdirectories (nested folders) so that research materials are discoverable and understandable.
Create subdirectories for like materials: separate data, code, and results.
Location names should be distinctive, consistent, and informative, conveying:
- What it is
- Why it exists
- How it relates to other files
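One way to keep the structure consistent across projects is to create it with a short script at the start of each project. This sketch assumes a layout with one subdirectory each for data, code, and results, as suggested above; the folder names and the create_project function are illustrative choices, not a required standard.

```python
from pathlib import Path

# Assumed layout: one subdirectory per kind of material.
SUBDIRECTORIES = ("data", "code", "results")


def create_project(root: Path) -> list:
    """Create the project folder and its subdirectories; return their paths."""
    created = []
    for name in SUBDIRECTORIES:
        folder = root / name
        # exist_ok=True makes the script safe to run on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Starting every project from the same script means collaborators always know where to look for a given kind of file.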
For more information on data organization, see Karl Broman's work.