When working on a data-related project, it’s essential to test that each part of your work functions as it should. That verification is often impossible without test data sets.
The internet offers numerous places to get them, which can help keep your project on schedule and boost its chances of success. Here are six of them.
1. FiveThirtyEight
FiveThirtyEight is a current affairs website that provides the public with the data used for its articles and infographics. It got its start as a polling aggregator solely focused on political topics but has since branched out to cover sports, societal matters and more.
You can also visit the FiveThirtyEight GitHub page. The data there ranges from information about which states have the worst drivers to the economic worth of different college majors. The broad range of information makes it an excellent resource for continuously curious people.
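As a minimal sketch of pulling one of those files into a script, the Python below reads a CSV using only the standard library. The raw-file base URL, the `bad-drivers/bad-drivers.csv` path and the helper names are assumptions for illustration, so check the repository for the current layout before relying on them.

```python
import csv
import io
import urllib.request

# Assumed base location of the raw files; verify it against the repository.
BASE_URL = "https://raw.githubusercontent.com/fivethirtyeight/data/master"


def dataset_url(folder: str, filename: str) -> str:
    """Build the raw-file URL for a CSV in the repository (assumed layout)."""
    return f"{BASE_URL}/{folder}/{filename}"


def parse_csv(text: str) -> list[dict[str, str]]:
    """Turn CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_rows(url: str) -> list[dict[str, str]]:
    """Download a CSV file and parse it. Requires network access."""
    with urllib.request.urlopen(url) as response:
        return parse_csv(response.read().decode("utf-8"))
```

Calling `fetch_rows(dataset_url("bad-drivers", "bad-drivers.csv"))` should return one dictionary per row, assuming the file still lives at that path. Keeping the parsing step separate from the download makes it easy to try the code on a few pasted lines first.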
2. Kaggle
Kaggle has a wealth of information beyond data sets, but it’s easy to narrow down your search. After arriving at the Kaggle homepage, look for the search box at the top of the page. Then, use the “in: datasets” tag.
For example, to get data about shopping, enter “shopping in: datasets” into the search box. Alternatively, click the Datasets menu at the top of the homepage to browse instead of getting specific. There is also a search box at the top right of the primary data section.
Digging into a particular data set is simple. Click on the link associated with one of them. Then, choose the Data tab at the top of the page to get the necessary files.
3. Data.gov
Data.gov is a U.S. government initiative to make the data it collects more accessible to the public, and it offers free data sets to anyone who needs or wants them.
The site is refreshingly user-friendly and breaks down the data by topic in addition to enabling keyword searches. Also, Data.gov offers more than 100,000 data sets with more added every night.
Many people say machine learning is taking over our lives for the better. Whether or not your data science project involves machine learning, Data.gov highlights how much of the information collected today is associated with human existence and has the potential to improve it.
4. Software with sample data sets included
Some tools come with built-in data sets for you to use.
“By displaying location or address-based business data against an accurate map, the map viewer can visualize their typical business data in a new way,” says Geoffrey Ives, President of Map Business Online. “By including both location-based map layers and demographic data in Map Business Online we have increased the value of these map visualizations.”
If you need data related to geography or population, Map Business Online sources material from the U.S. Census Bureau and Geolytics, Inc. for its users. It also includes data from Canada and the United Kingdom. You can get statistics related to ethnicity, occupation, marital status and much more.
Similarly, people who purchase the Statistics and Machine Learning Toolbox from MathWorks get various sample data sets to work with. They include simulated data about hospitals, mileage information for particular kinds of cars and even statistics about popcorn.
If you’d rather not search for data to import into the tools you use, consider options like those discussed directly above. The built-in information they offer could streamline your data science processes.
5. GroupLens
There are data sets for numerous purposes, and you may need a particular type for a current project. If you’re making a tool that gives recommendations to people, the GroupLens site offers its MovieLens data sets that could help you.
As the name suggests, it has information about films — specifically, the ratings attributed to those movies by the people who watched them. One of the data sets offers 20 million ratings.
Most of the data sets mention the number of movies and ratings contained within. If you’re experimenting with big data, pay attention to those figures in your research.
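To check those figures yourself, a short sketch like the following counts the ratings, users and movies in a ratings file. It assumes the `userId,movieId,rating,timestamp` header used by the CSV releases of MovieLens, such as the 20-million-rating set. The function name and the tiny inline sample are made up for illustration.

```python
import csv
import io
from typing import TextIO


def summarize_ratings(handle: TextIO) -> dict[str, float]:
    """Count ratings, distinct users and distinct movies in one pass."""
    users, movies = set(), set()
    count, total = 0, 0.0
    for row in csv.DictReader(handle):
        users.add(row["userId"])
        movies.add(row["movieId"])
        total += float(row["rating"])
        count += 1
    return {
        "ratings": count,
        "users": len(users),
        "movies": len(movies),
        "mean_rating": total / count if count else 0.0,
    }


# Tiny made-up sample in the assumed column layout, not real MovieLens data.
SAMPLE = """userId,movieId,rating,timestamp
1,10,4.0,1000000000
1,20,3.0,1000000001
2,10,5.0,1000000002
"""

summary = summarize_ratings(io.StringIO(SAMPLE))
print(summary)
```

For a real file, pass `open("ratings.csv", newline="")` in place of the in-memory sample. Reading row by row avoids loading the whole file at once, which matters once the file holds millions of ratings.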
6. Climate data online
The information on Climate Data Online is organized into expandable sections covering seasonal temperatures, wind direction, hourly precipitation and other topics related to the Earth and its detectable characteristics.
Click on one of the topic headers to expand the information below it. Use the Documentation and Data Samples drop-down menu to get a spreadsheet’s worth of content for your project.
Reliable data without hassles
These sites show that valuable data is only an internet search away, and much of it is available for free, either through a trial period or through entirely open access.
Instead of relying on guesswork when working on data-centric projects, use these data sets to test that your work functions as intended.