Welcome to Yale Dataverse!
Yale Dataverse is a generalist data repository hosted by Yale University Library for Yale-affiliated researchers to support data sharing, access, reproducibility, and open data initiatives.
This Quick Start Guide is designed to walk you through the basic steps for getting started with Yale Dataverse. For additional guidance you can check our FAQs or email us at dataverseadmin@yale.edu.
Introduction and Terminology
The Dataverse Project is open-source research data repository software, originally created by Harvard University. Yale Dataverse is an instance of that software and refers to the whole repository, including all individual Yale users' dataverses. Individual researchers, departments, or teams/labs create their own dataverses, in which they can deposit datasets or create additional nested (sub)dataverses.
You may wish to consult with Yale’s Research Data Management Librarian, Dr. Brandon J. Miliate (brandon.miliate@yale.edu), for best practices and advice on your specific project.
1. Log in
Signing in for the first time automatically creates a new user account.
If you have an existing CAS login, go to https://dataverse.yale.edu, click “Log In” in the upper right, and then select the “Yale University” link above the drop-down menu. This will take you to a page where you can log in with your Yale NetID and password.
If you do not have Yale credentials but are collaborating with Yale researchers, submit this form to request an account. We will confirm your role in the project and provide a username and password.
2. Create a New Dataverse
Click on the “Add Data” button in the banner menu and select “New Dataverse.” You will then be able to assign a name to your dataverse, indicate the category, and customize the URL.
You will also be asked to decide what kinds of metadata you would like to have included when you or your team members add datasets. Broadly speaking, metadata should support the discoverability of your dataverse and datasets by providing identifying information (author name(s), topic, date, keywords, subject, etc.). We require a few items at minimum to ensure discoverability and accessibility, and we have provided several discipline-specific metadata collections. If you are unsure about the standard metadata recommendations for your field of study, you may wish to consult the Metadata Standards Catalogue.
The final step before creating your dataverse is to designate which browse/search facets should be included with your future datasets. These are a subset of the metadata fields that allow people to filter their search results in Dataverse, and they are listed as categories on the left-hand side of the page. After you have designated these categories, click “Create Dataverse.”
3. Add Collaborators
If other researchers or team members need access to your dataverse to deposit data or perform other administrative tasks, they will also need to create an account on Yale Dataverse. As the owner or admin, you can assign them specific roles according to their needs and permissions.
To add collaborators to your project, navigate to your dataverse, click on the “Edit” drop-down menu in the upper right, and select “Permissions.” You can change users and their roles in the second drop-down menu, “Users/Groups.” A collaborator can only be added once they have registered with Yale Dataverse by logging in at least once.
Dataverse recognizes 8 different roles, all with varying permissions. It is important to pay attention to the kinds of permissions given to each role, as many of these offer subtle distinctions that are not always necessary. In general, the following are the most useful roles to keep in mind:
Admin: A person who has all permissions for dataverses, datasets, and files, including approving requests to access restricted data. Users who create a new dataverse are automatically assigned this role.
Dataverse + Dataset Creator: A person who can add sub-dataverses and datasets within a dataverse.
4. Upload Datasets
Once you have created your dataverse, you can add datasets.
Navigate to your dataverse, select “Add Data” in the upper right, and then choose “New Dataset.” (Do not use the other “Add Data” button in the site banner, as that will create your dataset outside your newly created dataverse.)
Before uploading, you will be asked to provide the metadata that you marked as optional or required when setting up your dataverse. Unless you are working on an ISPS (Institution for Social and Policy Studies) or software-based project, you can ignore those two drop-down menus. You will then be able to select the files you wish to upload.
The definition of a dataset is very flexible and can consist of one or more data files or a full replication package, including code files, databases, README files, and other materials. You may choose to upload individual files/folders or compressed/zipped files. By default, Yale Dataverse unzips compressed files and displays the underlying file structure. The system also automatically processes and converts .csv and .xlsx files to .tab files. However, you have the option to download files in a range of file types or as zipped files after uploading.
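Because Yale Dataverse unzips compressed files by default and displays the underlying file structure, it can help to check which paths a zip file contains before you upload it. The short Python sketch below is one way to do that with the standard library. The function name `preview_zip_paths` is purely illustrative and is not part of Dataverse; it only lists the stored file paths (skipping folder entries) and does not reproduce Dataverse's own unzipping behavior.

```python
import zipfile


def preview_zip_paths(zip_path: str) -> list[str]:
    """Return the file paths stored in a zip archive, in archive order.

    Directory entries (names ending in "/") are skipped, so the result is
    roughly the list of files you would expect to see once the archive is
    unzipped. Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if not name.endswith("/")]
```

For example, calling `preview_zip_paths("replication_package.zip")` (a hypothetical file name) returns paths such as `data/survey.csv` or `README.txt`, which you can compare against the folder structure you expect to see after uploading.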
The default upload method accommodates individual zip/compressed files of up to 2.5GB. If you have individual files larger than 2.5GB, or if you prefer that your upload not be unzipped or converted, other upload options are available. Please contact dataverseadmin@yale.edu so we can discuss them.
After confirming that your files have uploaded properly and the file paths are correct, click “Save Dataset.” (This will NOT publish your dataset or make it discoverable. We will cover that in the next step.)
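Before uploading a large project folder, you may want to find any files that exceed the 2.5GB default limit so you know whether to contact dataverseadmin@yale.edu. The Python sketch below walks a local folder and reports such files. It assumes the limit means 2.5 × 10^9 bytes; this guide does not say whether GB is counted in decimal or binary units, and the decimal value is the smaller of the two, so it errs on the cautious side. The names `UPLOAD_LIMIT_BYTES` and `files_over_limit` are hypothetical and not part of Dataverse.

```python
import os

# Assumption: "2.5GB" is interpreted as 2.5 * 10**9 bytes (decimal gigabytes).
# The binary interpretation (2.5 * 2**30) is larger, so this flags more files.
UPLOAD_LIMIT_BYTES = int(2.5 * 10**9)


def files_over_limit(folder: str, limit: int = UPLOAD_LIMIT_BYTES) -> list[tuple[str, int]]:
    """Walk `folder` recursively and return (path, size_in_bytes) for every
    file strictly larger than `limit`, sorted by path."""
    too_big = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > limit:
                too_big.append((path, size))
    return sorted(too_big)
```

Running `files_over_limit("my_project")` on a hypothetical local folder returns an empty list when every file fits under the assumed limit; any entries it returns are candidates for the alternative upload options.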
5. Publish your data OR create a “Preview URL” to share without publishing
After you save the dataset, you will have the option to publish it. Publishing is essential to ensure that your research data is findable and accessible, and a necessary step in the data repository process.
However, if you would prefer that your dataset not be visible while it is under review at a journal, or while you finalize your datasets, you can create a “Preview URL,” which lets you share a private link to your data with reviewers:
- Upload your files and click on “Save Dataset.”
- On the next screen, instead of publishing, go to “Edit Dataset.”
- From the dropdown menu, select “Preview URL.”
This URL can be shared with journals and reviewers, but the data will remain invisible to anyone browsing Yale Dataverse. Once your paper is accepted or published, you can publish the dataset to make it publicly accessible, after which the Preview URL will be deactivated.
Once published, datasets are not easy to delete. Yale Dataverse is designed as a permanent repository for finding and sharing data. You can deaccession a dataset, which removes the actual data files but retains a record in your dataverse. In some cases, specific files can also be restricted. You can, however, replace files as needed for updates and revisions.
6. Versioning a Published Dataset
Versioning is central to The Dataverse Project. Yale Dataverse is specifically designed for sharing and storing data, not as a workspace, backup, or private storage option. However, datasets can be edited and new file versions uploaded as needed, and the system keeps each version in the version history.
To add a new version, use the three-dot drop-down menu to the right of a file and select “Replace,” or go to “Edit” and then “Files (Upload).” When you upload new or replacement files to an existing dataset, Dataverse automatically records this as a new version. All versions are visible under the “Versions” tab on the dataset page.
7. Next Steps
Yale Dataverse has many additional features to support research data management after uploading data to the repository. You can find more details on these intermediate/advanced use cases here.
If you need any additional guidance or are looking for more personalized suggestions, please reach out to dataverseadmin@yale.edu.