Data collection & cleaning

Friday, Dec 11, 2020
The aim of this project was to gather thousands of data points, clean them, and present them to a young startup in an actionable format.

The Client: A mid-size social impact startup.

About the Company: The company serves both NGOs and sponsors to amplify the impact of social spending. It leverages technology to solve the cross-cutting problems of the social sector.

The Challenge: Social sector organizations (NGOs) lack credible, rich data sources. The company was forced to devote a significant amount of time and resources to collecting preliminary information about NGOs in India. This effort had turned into a weekly exercise of going through 9 data repositories to collect data manually. The resulting data had a high rate of human error and required a second, final round of proofreading before it could confidently be called "actionable". Because the company was in its growth phase, it needed this process automated. The final data had to meet a very low error tolerance and be immediately actionable by the teams.

The P42L Solution: During our meetings and discussions with the relevant stakeholders at the company, it became apparent that they had a “data problem”.

We proposed a three-step solution to their business-critical problem:

1. Automated data collection (via scrapers, crawlers, and RPA interactions)
2. Setting up rules for automated cleaning of the collected data
3. Creating a data lake and a data pipeline to feed into their existing system
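The case study does not publish its actual cleaning rules. As a minimal sketch of step 2, assuming hypothetical NGO records with `registration_id`, `name`, `email`, and `state` fields scraped from several repositories, rule-based cleaning and cross-source merging might look like this (the field names, rules, and sample data are all illustrative assumptions, not the delivered system):

```python
import re
from typing import Dict, List, Optional

# Hypothetical schema and rules; the real project's fields and rules are not specified.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_record(raw: dict) -> Optional[dict]:
    """Apply simple cleaning rules to one scraped record; return None if it is unusable."""
    # Rule: normalize the registration ID (remove all whitespace, uppercase).
    reg_id = re.sub(r"\s+", "", raw.get("registration_id") or "").upper()
    if not reg_id:
        return None  # Rule: records without an ID cannot be merged or trusted.
    # Rule: collapse repeated whitespace and use title case for names and states.
    name = " ".join((raw.get("name") or "").split()).title()
    state = " ".join((raw.get("state") or "").split()).title()
    # Rule: lowercase emails and blank out malformed ones rather than keep bad data.
    email = (raw.get("email") or "").strip().lower()
    if email and not EMAIL_RE.match(email):
        email = ""
    return {"registration_id": reg_id, "name": name, "email": email, "state": state}


def merge_sources(*sources: List[dict]) -> List[dict]:
    """Clean records from every source and merge duplicates that share a registration ID."""
    merged: Dict[str, dict] = {}
    for source in sources:
        for raw in source:
            rec = clean_record(raw)
            if rec is None:
                continue
            existing = merged.setdefault(rec["registration_id"], rec)
            if existing is not rec:
                # Fill fields that are still blank with values from the later source.
                for key, value in rec.items():
                    if not existing[key] and value:
                        existing[key] = value
    return list(merged.values())


# Hypothetical sample data standing in for two of the scraped repositories.
repo_a = [
    {"registration_id": "mh 2019 001", "name": "  helping   hands foundation ",
     "email": "", "state": "maharashtra"},
    {"registration_id": "", "name": "No ID Trust"},
]
repo_b = [
    {"registration_id": "MH2019001", "name": "Helping Hands Foundation",
     "email": "Contact@HelpingHands.org ", "state": ""},
    {"registration_id": "KA2020777", "name": "green earth",
     "email": "not-an-email", "state": "Karnataka"},
]

result = merge_sources(repo_a, repo_b)
for row in result:
    print(row)
```

Keying the merge on a normalized identifier is what lets the same NGO, listed slightly differently in several repositories, collapse into a single record; a production pipeline would likely need fuzzier matching and per-field source priorities.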

Result: Developing and delivering the software to the client took a total of 110 working days. The organization received a total of 2,000,000 actionable data points, delivered in a NoSQL database along with a corresponding data-feeding API. The API currently feeds their website, CRM, project management, and data analysis software.

Key Takeaways

-Created a credible data source

Through data scraping and web crawling, huge amounts of the required data were extracted and validated. The resulting data was well structured and reliable. Eliminating manual data collection and cleansing produced data with very few errors and made it readily actionable.

By improving operational efficiency and optimizing its marketing, the company could gain a competitive advantage.

-Saved man-hours

The earlier manual data collection was replaced with automated collection using RPA technology. Automation made it possible to ingest, file, and prepare information with greater accuracy. This saved valuable man-hours and freed the team to focus on strategic, more productive tasks.