In addition to the trove of books, the Institutional Data Initiative is working with the Boston Public Library to scan millions of newspaper articles now in the public domain, and it says it’s open to forming similar collaborations down the line. The exact way the books dataset will be released is not […]
Earlier this month the 2024 edition of the HTTP Archive’s Web Almanac was published – our industry’s bi-annual “State of the Web” report. This is the second edition with a chapter focussed on digital sustainability, and the second using our Green Domains dataset. In this post we share some of the key highlights and takeaways […]
Whether you’re starting a new project or expanding an existing one, as a data scientist, you’re always on the lookout for new material to explore. Knowing where to get data for data science projects can be challenging, and finding “good data” can be even more difficult. In this article, […]
Since 2006, we’ve been building the world’s largest database tracking which parts of the internet run on renewable power – the Green Web Dataset. The Dataset powers a lot of our open source work, including the Green Web Check, Green Web Directory, and our Greencheck API. Over 7 million checks per day are made against […]