Table of Contents
- Finding, Using, and Analyzing Data: Services available to those working actively with data (or who would like to)
- Data Management Plans: Nuts and bolts
- Research Data Consultations: What happens when you make a request to the Research Data Consultation Group (RDCG)
- Best Practices for Research Data Management: Where to get training
- Data After Your Project: Resources for active and archival storage, sharing, and preservation
I need to locate available data or statistical information for my project. How do I do that?
The Center for Science and Social Science Information has a data librarian on staff, in addition to subject specialists with expertise in where to find information across the sciences and social sciences. Email email@example.com to request a consultation or email your subject specialist.
The data I have found has access restrictions because the data isn’t de-identified. Where do I go for assistance?
The Health Insurance Portability and Accountability Act (HIPAA) requires confidentiality for electronic health information that can be linked to an individual patient. Studies involving minors also have access restrictions. In addition, data on vulnerable populations may carry ethical and IRB limitations on how the data may be accessed, shared, and retained.
The data I found requires obsolete software from the Windows 3.1 era. Who can help?
The RDCG consultation team includes people in the Yale University Library’s digital preservation services who work with these software issues all the time. Feel free to email RDCG so they can help you!
I need to analyze my data, but have some questions about using my analysis tools. Where do I go for help?
RDCG’s membership includes representatives from the Yale Center for Research Computing, Center for Science and Social Science Information Statistical Lab (StatLab), and the Digital Humanities Lab. These groups have different areas of expertise. We recommend either contacting the best group for your needs directly or emailing RDCG to facilitate a connection.
The StatLab provides facilities, equipment, software, instruction, and in-depth consulting for data management, statistical software, quantitative methods, data resources, and emerging technologies. The StatLab also employs consultants who answer questions about statistical software and statistical methodologies, ranging from personal and departmental research to classroom activities.
Another resource is the Yale Center for Analytical Sciences. Its service model differs from those of the RDCG member organizations, so please contact the center directly for further information.
What is a data management plan?
A data management plan is a road map for how a researcher or research team will create, house, deliver, maintain, archive, and preserve their data. It is an essential component of responsible research conduct. Data management plans, often abbreviated “DMPs,” are short documents submitted as part of a grant application that address major themes in how a research group will collect, analyze, and store data during and after a project.
DMPs are required by many federal agencies due to the White House Office of Science & Technology Policy Memo on Expanding Public Access to the Results of Federally-Funded Research. Private foundations increasingly require these plans, too.
Depending on the type of research conducted, a DMP may address the following:
- Metadata/documentation standards the data will use.
- File formats, directory structures, and any anonymization protocols.
- Replication and reproducibility plans.
- Backup and retention policies.
- Access restrictions and data security protocols, especially for human subjects or endangered species research.
- Data sharing embargoes, or how long after collection the data will be kept private to the research team.
- Data repository or long-term storage plans.
RDCG consultants are available to work with you on finding the best answers to these questions for your lab, which may involve a consultation or a referral to other research data support services at Yale. There is no one-size-fits-all solution for research data management.
Do I need to have a funding agency requirement to meet with RDCG or another data service provider?
No. Data support services at Yale work across the research life cycle, although some are specifically grant-focused. RDCG’s consultants believe that good data management practices are for everyone and accept questions from everyone in the Yale community.
I need to write a data management plan due this Friday at 5 PM! Are there any tools that can help me write one?
Yale maintains institutional access to the Data Management Planning Tool from the California Digital Library. This tool is useful if you need general information about what a data management plan is or want to view the list of requirements that various funding agencies mandate. It also contains templates and a DMP creation tool.
What do you mean by consultation?
The Research Data Consultation Group (RDCG) provides consultations about your current data practices, research needs, and the options available to you. A consultation typically involves you and one or more consultants, but it may include your lab group and collaborators, too. Some consultations include follow-ups, because the consultants want to ensure that their advice is the right fit for your lab. At other times, the conversation is limited to email or involves a referral to the services you need.
How does a consultation work, and what happens behind the scenes?
When you use the Contact form, all of the RDCG consultants see your request. Consultants claim requests based on their individual expertise and your needs. A good consultation request states your discipline, the kind of data you work with, and the reason you are requesting a consultation. The consultants who claim the request will email you to set up a meeting or other follow-up.
How much do RDCG services cost?
Consultations are free of charge. However, the implementation of services or infrastructure from ITS or other partners may require funding.
Depending on your discipline, you can find on-demand data management best practices tutorials and courses online. Many come out of the United Kingdom and Europe, which have different funding contexts from the United States.
- Digital Humanities: Training materials from the University of Oxford
- Earth Sciences: Data Management Short Course for Scientists from the Earth Science Information Partners Commons
- Generic: Data Management Modules, Webinars, and Screencasts from DataONE
- Generic/Social Sciences: MANTRA Training from the University of Edinburgh
Are there workshops on metadata and file formats?
RDCG consultants have offered numerous workshops on metadata, file formats, and other practical aspects of research data management. Many research data workshop needs are perennial, such as:
- Organizing Data and Data Documentation — A session on good practices for data documentation, file organization, file naming, metadata for research data, data storage options available at Yale, and more.
- Where Did I Put That File? — A session for library staff on how to develop file name, directory, and other best practices for working with files alone or in groups.
- Research Data Management workshops — Offered 1-2 times per semester and during the summers; content often includes the basics of file naming and organization, sharing and ethics, and versioning/documentation.
RDCG works with lab groups, departments, and interested students or researchers to develop workshop content. The options you see above are modular.
The Yale Center for Research Computing, Digital Humanities Lab, and other units offer a multitude of workshops. These vary from year to year and semester to semester based on research trends across the disciplines — see the Training Calendar for details.
What resources are available at Yale for data storage?
The Storage Options page on the Yale ITS web site provides information on research data storage solutions.
What is a repository?
A repository is a destination that is both (a) designed for data storage and (b) a place where data may be submitted and made available to a community. Data repositories may be general (catering to a wide variety of data communities), specialized (where only genomic, astronomical, or other domain-specific data may be stored), or institutional (holding data from various communities whose members all share the same institutional affiliation). Some journals, like Science and Nature, have their own repositories.
Which repositories provide long-term archival storage?
A list of data repositories is available at re3data.org, and some of these may offer long-term archival storage.
How much metadata do I actually need to supply for others to use my data?
You should supply enough context for your data that others could reliably test it for reproducibility or understand the methodology behind a research article. If you use an online repository in your research area, this means practicing due diligence when completing the fields available to you during the data submission process; ensuring that column names in tabular data are meaningful and defined by a data dictionary; and properly documenting your code, at minimum.
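One way to check that every column in a tabular file is covered by a data dictionary is sketched below in Python. This is only an illustration: the function name `undocumented_columns`, the sample data, and the dictionary layout (a CSV with `column`, `description`, and `units` fields) are all assumptions made for this example, not an RDCG or repository standard.

```python
import csv
import io


def undocumented_columns(data_csv: str, dictionary_csv: str) -> list[str]:
    """Return the data columns that have no entry in the data dictionary."""
    # The first row of the data file holds the column names.
    header = next(csv.reader(io.StringIO(data_csv)))
    # Assumed layout: the dictionary has one row per column, keyed by "column".
    rows = csv.DictReader(io.StringIO(dictionary_csv))
    defined = {row["column"] for row in rows}
    return [name for name in header if name not in defined]


# Hypothetical sample data and data dictionary, for illustration only.
DATA = "site_id,temp_c,obs_date\nA1,14.2,2016-03-01\n"
DICTIONARY = (
    "column,description,units\n"
    "site_id,Identifier of the sampling site,\n"
    "temp_c,Air temperature at time of observation,degrees Celsius\n"
)

print(undocumented_columns(DATA, DICTIONARY))  # ['obs_date']
```

Here `obs_date` is reported because the sample dictionary never defines it. Running a check like this before submitting data can catch columns that were added late in a project and never documented.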
If you are hosting your data outside of an established repository, you can use README.txt files and metadata files. For specific recommendations on larger, self-hosted projects, ask for a consultation.
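As a rough illustration of what a README.txt might contain, the Python sketch below assembles one from a few descriptive fields. The function `build_readme`, the choice of fields, and the sample values are hypothetical. They are not a required template, and your discipline or repository may expect different or additional information.

```python
from datetime import date


def build_readme(title, creators, description, files, license_name, created=None):
    """Assemble plain-text README content from basic descriptive fields."""
    created = created or date.today()
    lines = [
        f"Title: {title}",
        f"Creators: {'; '.join(creators)}",
        f"Date created: {created.isoformat()}",
        f"License: {license_name}",
        "",
        "Description:",
        description,
        "",
        "Files:",
    ]
    # One indented line per file, giving its name and purpose.
    lines += [f"  {name}: {purpose}" for name, purpose in files.items()]
    return "\n".join(lines) + "\n"


# Hypothetical project details, for illustration only.
readme_text = build_readme(
    title="Stream temperature survey",
    creators=["A. Researcher", "B. Collaborator"],
    description="Hourly temperature readings from three sampling sites.",
    files={
        "readings.csv": "Raw sensor readings",
        "data_dictionary.csv": "Definitions and units for each column",
    },
    license_name="CC BY 4.0",
    created=date(2016, 3, 1),
)
print(readme_text)
```

To save the result next to your data, you could write it out with `pathlib.Path("README.txt").write_text(readme_text, encoding="utf-8")`.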
Does anyone offer consultations on preserving my data?
Yes. Several of the RDCG consultants have backgrounds in preservation and metadata, so if you have questions, they’d be happy to answer them.