Creating a Data Management Plan:
During the planning stages of research, it is helpful to create a data management plan (DMP). This plan covers where data will be stored, how files will be named, and how the data will be defined and organized. A DMP typically describes:
Data storage: Choosing an appropriate drive or repository is one of the first things to plan. The storage should be secure, offer enough functionality and capacity for the research team’s needs, and comply with all applicable data protection regulations. Data can be stored in a repository like OSF.io (Open Science Framework) or on a private drive like Google Drive or OneDrive.
Use a data repository to provide for:
Data naming: Establishing a data naming convention for files can also assist researchers in increasing transparency and replicability. Creating a meaningful and standardized file name structure makes it easier to both find and understand data. There is no “best way” to create a data naming convention as these are often contextualized to the research project, but a few common practices include:
Including dates in a standard format, such as YYYY-MM-DD, at the beginning of the file name. This way, sorting files by name also sorts them by date.
Using an underscore to separate each element of the name. For example, if the naming convention uses the date, the experiment, and the version, a file name could look like this: “2024-05-25_experiment1_v2”
Being consistent about casing, whether that is upper or lower case.
File names ideally go from most broad to most specific. For example, “YYYY-MM-DD_Experiment1_Group2_JohnsonD_v1” indicates first the date that the data collection took place, then which research experiment the data is for, which group the participant is in, which participant the data is about, and finally the version of the file.
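The naming practices above can be sketched in a few lines of Python. This is only an illustration, not a required tool: the function name `build_file_name` and its parameters are hypothetical, and the element order (date, experiment, group, participant, version) simply follows the example above.

```python
from datetime import date
import re


def build_file_name(collected_on, experiment, group, participant, version):
    """Join name elements from broadest to most specific, separated by underscores."""
    parts = [
        collected_on.isoformat(),   # YYYY-MM-DD, so sorting by name sorts by date
        f"Experiment{experiment}",
        f"Group{group}",
        participant,
        f"v{version}",
    ]
    for part in parts:
        # A space or underscore inside an element would blur the separators.
        if re.search(r"[\s_]", part):
            raise ValueError(f"name element contains a space or underscore: {part!r}")
    return "_".join(parts)


print(build_file_name(date(2024, 5, 25), 1, 2, "JohnsonD", 2))
# 2024-05-25_Experiment1_Group2_JohnsonD_v2
```

Building names with one shared function, rather than typing them by hand, is one way to keep casing and element order consistent across collaborators.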
Data dictionary: A data dictionary defines and describes the variables and values used in your data set, providing essential context. Data dictionaries are documents that include information such as variable names, descriptions, units of measurement, and any coding schemes. Keeping one increases consistency in the collection and reporting of data across collaborators and helps streamline data analysis. Here is an example of the information a data dictionary might include for each variable:
| Variable | Description | Units | Coding Scheme |
| --- | --- | --- | --- |
| Q1_AGE | Age of respondent | Years | N/A |
| Q2_GENDER | Gender of respondent | N/A | 1 = Male, 2 = Female |
| Q3_INCOME | Annual household income | US Dollars | N/A |
| Q4_EDUCATION | Highest level of education | N/A | 1 = High School, 2 = Bachelor’s, 3 = Master’s, 4 = Doctorate |
| Q5_OCCUPATION | Respondent’s occupation | N/A | N/A |
| Q6_SATISFACTION | Satisfaction with service | Likert Scale (1-5) | 1 = Very Unsatisfied, 2 = Unsatisfied, 3 = Neutral, 4 = Satisfied, 5 = Very Satisfied |
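A data dictionary can also be kept in a machine-readable form so that incoming records can be checked against it. The sketch below is one possible approach, assuming the dictionary is stored as a plain Python mapping; the names `DATA_DICTIONARY`, `check_record`, and `decode` are hypothetical, and only three of the example variables are included to keep it short.

```python
# Hypothetical in-memory data dictionary covering three of the example variables.
# "codes" is None when a variable has no coding scheme.
DATA_DICTIONARY = {
    "Q1_AGE": {"description": "Age of respondent", "units": "Years", "codes": None},
    "Q2_GENDER": {
        "description": "Gender of respondent",
        "units": None,
        "codes": {1: "Male", 2: "Female"},
    },
    "Q6_SATISFACTION": {
        "description": "Satisfaction with service",
        "units": "Likert Scale (1-5)",
        "codes": {
            1: "Very Unsatisfied",
            2: "Unsatisfied",
            3: "Neutral",
            4: "Satisfied",
            5: "Very Satisfied",
        },
    },
}


def check_record(record, dictionary):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for variable, value in record.items():
        entry = dictionary.get(variable)
        if entry is None:
            problems.append(f"{variable}: not defined in the data dictionary")
        elif entry["codes"] is not None and value not in entry["codes"]:
            problems.append(f"{variable}: {value!r} is not a valid code")
    return problems


def decode(variable, value, dictionary):
    """Translate a coded value into its label; uncoded values pass through."""
    codes = dictionary[variable]["codes"]
    return codes[value] if codes else value


record = {"Q1_AGE": 34, "Q2_GENDER": 2, "Q6_SATISFACTION": 4}
print(check_record(record, DATA_DICTIONARY))           # []
print(decode("Q6_SATISFACTION", 4, DATA_DICTIONARY))   # Satisfied
```

Checking each record against the dictionary catches undefined variables and out-of-range codes early, before they reach analysis.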
Data directory: Once files are named and organized according to the data management plan, a logical directory structure helps anyone who needs to navigate the place where the data is stored. A data directory makes it easier for collaborators and others to move through the different stages of a project. It is typically a simple text README file that lists everything in the drive or repository in one view.
For an example of a data set that includes a README file, see this Scoping Review Project on OSF.
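If the project files live in an ordinary folder, the listing for a README can be generated rather than typed by hand. The following is a minimal sketch using Python's standard `pathlib` module; the function name `directory_listing` and the two-space indentation are assumptions made for illustration, not an established README format.

```python
from pathlib import Path


def directory_listing(root):
    """Return an indented text listing of every folder and file under root."""
    root = Path(root)
    lines = [f"{root.name}/"]
    # Sorting by path components keeps each folder directly above its contents.
    entries = sorted(root.rglob("*"), key=lambda p: p.relative_to(root).parts)
    for path in entries:
        depth = len(path.relative_to(root).parts)
        suffix = "/" if path.is_dir() else ""
        lines.append("  " * depth + path.name + suffix)
    return "\n".join(lines)
```

For example, `print(directory_listing("my_project"))` (where `my_project` is a placeholder folder name) prints the tree, which can then be pasted into the README and annotated with a short description of each folder.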
Directory structures:
University of Cincinnati Libraries
© 2021 University of Cincinnati