BERKELEY — Ponder, a data science startup born out of UC Berkeley, announced $7 million in seed funding to develop scalable, enterprise-ready and easy-to-use machine learning and analytics tools. Lightspeed Venture Partners led the round with Intel Capital, 8VC, and The House Fund participating.
Today, organizations are pouring time and money into data science initiatives, only to find that they are not seeing returns on their investments. Take pandas, a Python library widely regarded as the most popular tool in data science, used by millions of data scientists to prepare, transform, and analyze data in machine learning workflows. Despite its widespread adoption, pandas becomes unusable on the large datasets that are now the norm. Extensive engineering cycles are then spent rewriting pandas workloads for big data frameworks, which leads to fewer models and insights in production. The resulting data pipelines also become difficult to maintain and debug.
Ponder is commercializing its popular open-source tools, Modin and Lux, to address pandas usability challenges at scale, all without changing the existing ways data teams work with their data. Modin is a scalable “drop-in replacement” for pandas, meaning that data scientists can seamlessly scale up to large datasets without having to change a single line of code. Lux, on the other hand, is a visualization tool for pandas that automatically identifies visual insights in large and complex datasets, once again without changing a single line of code.
“We’ve spoken to dozens of data teams at this point, and a universal sentiment was that they use pandas extensively, but run into performance problems at scale, causing them to have to redo their work from scratch,” said Doris Lee, CEO of Ponder. “With Ponder’s tools, data scientists no longer have to pick between convenience and scale: they can get both.”
Ponder’s technology is used by 10 of the Fortune 100 companies across a range of sectors, from pharmaceutical companies like Bristol Myers Squibb and GSK, to technology companies like Intel and VMware, to automotive companies like Ford and Tesla. The open-source tools have been downloaded over 2.5 million times. At one e-commerce company, Modin helped scale up data pre-processing pipelines to use 1,000 times more data with orders-of-magnitude improvements in performance. Lux has been used for insight discovery in a variety of settings, from detecting anomalies in mobile networks to exploring experimental data for drug discovery at one of the world’s largest pharmaceutical companies.
“Pandas is the de facto Swiss Army knife of data science, leveraged across industries for data exploration and machine learning. Unfortunately, it presents users with roadblocks when working with even moderately large datasets,” said Gaurav Gupta, partner at Lightspeed. “Ponder’s open-source technology addresses these issues, positioning the company to become a market leader in scalable data science.”
Ponder’s origins lie in the UC Berkeley RISELab, which has produced several successful Silicon Valley startups, including Databricks and Anyscale. “Ponder’s technology is based on many years of cutting-edge research that we did to bridge usability and scalability in data science tooling. And the impact is enormous: we are making scalable data science accessible to millions of data practitioners who live and breathe pandas,” said Aditya Parameswaran, president of Ponder and a professor at UC Berkeley.
With the new funding, the remote-first company plans to significantly scale its team in 2022 to meet rising demand from enterprise customers and to continue growing and supporting its open-source user community.