If you needed to analyze billions of highly customized data points, then turn around and contextualize, interpret and make the data actionable for a non-technical audience, where would you start?
That’s the question Yello’s engineering team set out to answer earlier this year. Yello, a Chicago-based recruiting software company that helps some of the world’s largest companies hire students on campuses every year, houses terabytes of clients’ data, including records of career fairs, resume submissions and interviews.
The reporting opportunities are limitless, according to team members, but the company has learned that busy campus recruiters would rather receive prescriptive, automated insights than analyze data on their own.
Led by an in-house data team looking to grow its skills in business intelligence, Yello started to rethink its reporting functionality from the ground up. The goal was to build a foundation for real-time, interactive data analysis dashboards that could distill those billions of data points into easily digestible takeaways for busy campus recruiters.
“For our clients, and for Yello, the purpose of this project is to enable data-driven decision-making,” Yello Data Services Manager Stella Nisenbaum said. “We want to expand our reporting to tell a story that allows clients to see the big picture.”
To do that, the team built new infrastructure — including a “data lake of data lakes” and multiple data warehouses — and tested out reporting and visualization tools that could increase flexibility and help scale the tool after launch.
But before they got started, the team got reacquainted with the basics: understanding user needs.
Enabling data-driven recruitment decisions
To make decisions about event planning, recruiters look at data on past event attendance, expenses and staffing, as well as more specialized metrics. For example, a recruiter might consider an event more valuable and worth repeating if it has high throughput, meaning that a large proportion of event registrants made it through the pipeline to the interview stage.
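As a rough illustration of the throughput metric described above, the sketch below divides the number of registrants who reached the interview stage by the total number of registrants. The `EventStats` class, its field names and the sample figures are all hypothetical and are not taken from Yello’s product.

```python
from dataclasses import dataclass


@dataclass
class EventStats:
    # Hypothetical record for one recruiting event (illustrative fields only).
    name: str
    registrants: int   # people who registered for the event
    interviewed: int   # registrants who later reached the interview stage


def throughput(event: EventStats) -> float:
    """Share of registrants who reached the interview stage (0.0 to 1.0)."""
    if event.registrants == 0:
        return 0.0  # avoid dividing by zero for events with no registrants
    return event.interviewed / event.registrants


# Made-up sample data for illustration.
events = [
    EventStats("Fall Career Fair", registrants=400, interviewed=60),
    EventStats("Engineering Info Session", registrants=80, interviewed=28),
]

# Rank events from highest to lowest throughput.
ranked = sorted(events, key=throughput, reverse=True)
for event in ranked:
    print(f"{event.name}: {throughput(event):.0%}")
```

In this made-up example, the smaller info session ranks above the larger career fair because a higher proportion of its registrants went on to interview.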
Recruiting managers also use event data to make determinations about resource allocation, like the quantity of materials and the number of staff members who need to be sent to each event. Business leaders may also pull from the same datasets to prove big-picture points, like showing the ROI of specific university relationships or attendance at particular industry conferences.
The development team at Yello saw an opportunity to take the company’s reporting to the next level by adding a self-service option that would empower clients to access that data more directly — but first, they’d have to build out some new infrastructure.
The first challenge the development team faced was a big one: Yello’s data is client-owned and must be kept private and contained. Pooling all of it together in one data lake — as is standard practice at most organizations — wasn’t an option.
“We follow compliance requirements to keep our clients’ data in isolation, which means each client’s data has to remain in its own database,” said Neelesh Gupta, a senior data engineer at Yello. The company accommodated these requirements by creating one giant operational-supply data lake to house a collection of client data lakes and databases — an approach that’s atypical in data warehousing.
Additionally, because Yello had given clients the flexibility to ingest different and completely customized datasets, the new reporting solution would also have to process each client’s database and reporting schema individually.
By implementing metadata-driven processing, Data Architect Ganesha Thatapura said they could create a common platform to process multiple clients’ data at scale while maintaining data isolation for compliance. The platform manages each client’s data lake and data warehouse based on a common template and provides for rule-driven customization, per clients’ needs.
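One way to picture metadata-driven processing is a shared template plus a small set of per-client rules, with each client resolved to its own separate database. The sketch below is only an assumption about how such a scheme could look. The template fields, client IDs, rule names and the `warehouse_<client>` naming are hypothetical and do not describe Yello’s actual platform.

```python
from copy import deepcopy

# Common template applied to every client (hypothetical field names).
BASE_TEMPLATE = {
    "tables": ["events", "candidates", "interviews"],
    "metrics": ["attendance", "throughput"],
    "custom_fields": [],
}

# Per-client metadata: only the differences from the template (hypothetical).
CLIENT_RULES = {
    "client_a": {"custom_fields": ["referral_source"]},
    "client_b": {"metrics": ["attendance", "throughput", "cost_per_hire"]},
}


def build_config(client_id: str) -> dict:
    """Merge the common template with one client's rule overrides."""
    config = deepcopy(BASE_TEMPLATE)  # never mutate the shared template
    config.update(deepcopy(CLIENT_RULES.get(client_id, {})))
    # Each client resolves to its own database, so data is never pooled.
    config["database"] = f"warehouse_{client_id}"
    return config


def plan_jobs(client_ids: list[str]) -> list[dict]:
    """Build one independent processing job per client."""
    return [build_config(client_id) for client_id in client_ids]


for job in plan_jobs(["client_a", "client_b", "client_c"]):
    print(job["database"], job["metrics"], job["custom_fields"])
```

The same code path serves every client, and onboarding a new one only requires adding metadata rather than writing new processing logic.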
The team also had to plan for large discrepancies in the size of their client databases, which range from a few dozen megabytes to hundreds of gigabytes.
“We had to figure out how to build our framework and our first couple of deliverables in such a way that we can easily expand and scale,” Nisenbaum said.
Assembling a user-friendly reporting toolkit
Once the team identified its core challenges (keeping data isolated while building a reporting tool that could run different schemas and scale across widely varying database sizes), they had to decide which tools to use.
“We started with several use cases to explore different tools,” Gupta said. “This allowed us to see how tools come up across different points and constraints that we use with our customers, like multi-tenancy support, security, support of dynamic fields and visualizations.”
After testing, Yello moved forward with Postgres, as well as AWS’s cloud infrastructure and additional services. Gupta said they used AWS Database Migration Service to copy data from existing operational systems to other databases, AWS Glue, an ETL service, to perform transformations on top of existing data, and AWS Lambda to orchestrate the flow of data.
Because integration and scaling were key considerations, along with user-friendly data visualizations, Yello chose Looker, which allows the team to define the underlying schema, metrics and the universe of what’s available for isolated, custom processing at scale.
Then, according to Nisenbaum, the team could drag and drop items to set the framework for a client-issued visualization, allowing for self-service business intelligence.
To improve performance and minimize dependence on any one reporting tool, the team embedded the complexities of metrics calculations in the data warehouse design, Gupta said. This ensured that most of the metrics computation happens during the batch processing window, which allows faster access to the metrics at run-time. In addition, Gupta said the approach makes metrics easier to port across multiple reporting tools.
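The idea of moving metric calculations into the batch window can be sketched as a summary table that is rebuilt by a batch job and then read with a simple lookup at run-time. The example below uses Python’s built-in `sqlite3` module as a stand-in for a warehouse database, purely so it can run anywhere. The table names, columns and sample rows are hypothetical and are not Yello’s schema.

```python
import sqlite3

# In-memory database standing in for one client's warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrations (
        event_id INTEGER,
        candidate_id INTEGER,
        interviewed INTEGER  -- 1 if the candidate reached the interview stage
    );
    INSERT INTO registrations VALUES
        (1, 101, 1), (1, 102, 0), (1, 103, 0), (1, 104, 1),
        (2, 201, 1), (2, 202, 1), (2, 203, 0), (2, 204, 1);
""")


def run_batch(conn: sqlite3.Connection) -> None:
    """Batch step: recompute per-event metrics into a summary table."""
    conn.executescript("""
        DROP TABLE IF EXISTS event_metrics;
        CREATE TABLE event_metrics AS
        SELECT event_id,
               COUNT(*) AS registrants,
               SUM(interviewed) AS interviewed,
               1.0 * SUM(interviewed) / COUNT(*) AS throughput
        FROM registrations
        GROUP BY event_id;
    """)


def get_throughput(conn: sqlite3.Connection, event_id: int):
    """Run-time step: a cheap lookup with no aggregation over raw rows."""
    row = conn.execute(
        "SELECT throughput FROM event_metrics WHERE event_id = ?",
        (event_id,),
    ).fetchone()
    return row[0] if row else None


run_batch(conn)
print(get_throughput(conn, 1), get_throughput(conn, 2))
```

Because the metric lives in a table rather than in a dashboard definition, any reporting tool that can query the database would read the same value, which is the portability benefit described above.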
‘Rave reviews’ on new dashboards
Now, with the efforts of Nisenbaum, Gupta, Thatapura and others on the team, Yello’s participating clients can harness the power of data themselves through self-service business intelligence. According to Yello, the new dashboards have earned rave reviews from early adopters who are excited about their potential to increase visibility and accelerate decision-making.
With their new data infrastructure and reporting tool already in place, the development team will continue to roll out dashboards for other categories of data, beyond just event metrics.
“This and other projects that take advantage of Yello’s newfound data capabilities will keep our team busy — and deeply connected to Yello’s mission of connecting recruiters and college students — for the foreseeable future,” Nisenbaum said.