At Wellthy, we use dbt to build and maintain our data transformations. dbt is an open source, command-line tool that lets data teams quickly and collaboratively deploy analytics code in SQL, following software engineering best practices. For more information about dbt you can reference the documentation here, or feel free to test it out by following their online tutorial, but please note this is not required for this interview.
For the purpose of this challenge, think of dbt as a tool that enables you to build and test tables (data models) using SELECT statements. Each data model created in a dbt project is stored in a .sql file. Since data models often depend on each other, dbt builds a directed acyclic graph (DAG) that allows you to see these dependencies. This is done by using Jinja, like so: {{ ref('some_model_name') }}, to reference tables in the FROM clause.
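As a minimal illustration of the pattern above (the file name, model name, and columns here are invented for the example and are not part of the actual project), a dbt model is just a SELECT statement in a .sql file, and the `ref()` call is what registers the dependency in the DAG:

```sql
-- models/example_model.sql (hypothetical names for illustration only)
select
    id,
    created_at
-- ref() tells dbt this model depends on some_model_name,
-- so dbt builds some_model_name first
from {{ ref('some_model_name') }}
```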
Data models can be configured and data tests and documentation can be added using YAML files. Click here for more information about dbt tests or here for more information about documentation.
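For instance, a minimal schema YAML (the model and column names below are hypothetical) might attach dbt's built-in `unique` and `not_null` generic tests and a description to a model:

```yaml
# A sketch of a dbt schema file, assuming a model named some_model_name
version: 2

models:
  - name: some_model_name
    description: "Illustrative model description."
    columns:
      - name: id
        description: "Primary key."
        tests:
          - unique
          - not_null
```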
At Wellthy, we support families in caring for their loved ones, whether it's for their spouse, parents, children, or even the next-door neighbor who's just like family. We do this by seamlessly integrating technology and personalized care support to help caregivers tackle the logistical and administrative tasks of caring for the ones they love, including themselves.
This process starts with a caregiver creating a care project on the Wellthy platform. The caregiver can then create or choose tasks based on their needs, and their care project will be assigned to a care coordinator who will complete their tasks. If you are interested in learning more about Wellthy’s service you can read more here.
Wellthy’s main source of revenue is through employer benefit programs. Companies (clients) will sign a deal to offer Wellthy as a benefit to their eligible employees. In order to increase revenue and retention, we want to make sure we are encouraging use of Wellthy and offering a great experience to caregivers and their families. Therefore, it is important that our care coordination team is staffed appropriately to meet demand. We aim to staff according to normal levels of activity rather than surges, which can be handled by offering overtime incentives or asking other Wellthy employees to help out with tasks. This helps us keep costs down while still ensuring we can complete caregiver tasks in a timely manner.
A data analyst comes to the analytics engineers and says that the Care Team is struggling to meet capacity. One manager hypothesizes that this is due to an increase in the number of care projects and they need to hire more people. However, another manager thinks that there is an efficiency problem and certain tasks or projects are causing bottlenecks in the process. The data analyst needs to be able to analyze the data and deliver insights to the Care Team so they can make a decision on how to handle the capacity problem.
- How many projects are starting each week?
- How many `in progress` tasks are in each project?
- How long is it taking to complete tasks and projects?
- Is there a particular phase of a task that takes longer than others?
- Is there a particular project type or task type that takes longer than others?
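As a sketch of how the first question might be answered (the table and column names here are assumptions, since the actual models live in the PR under review):

```sql
-- Weekly count of new care projects
-- (hypothetical table/column names: projects, started_at)
select
    date_trunc('week', started_at) as week_start,
    count(*) as projects_started
from projects
group by 1
order by 1
```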
An analytics engineer on the team has started modeling out this data to allow the analyst to answer these questions more easily. The analytics engineer has submitted a pull request (PR) for you to review.
Prior to your technical interview, please review the files in the base, intermediate, and prod subdirectories as well as the open PR. Make note of any questions you might have about the sample data model - you will have an opportunity to ask them during the interview.
During your technical interview, we will discuss your review of the PR in a collaborative session. You will not be asked to do any coding yourself, but please be prepared to share your screen and discuss what changes, suggestions, or questions you would include in your PR review. Please note: You do not need to submit anything ahead of the interview or prepare a presentation of any kind.
Our dbt structure is separated across multiple databases and schemas in Snowflake and transformations are performed in different "layers" to reduce the repetition of logic across multiple models.
- The `prep` database contains all of the early transformations that are not directly accessed by BI tools or external processes.
  - The `base` schema contains a 1-1 relationship with the source tables and includes the individual table transformations and column aliasing. Every source table should have a corresponding `base` model.
  - The `intermediate` schema contains early combinations of records that are used as stepping stones to produce additional downstream transformations. One example of this could be combining project information from the `projects` model and the `project_history` model into the more comprehensive `full_project_history` model. This separation allows the same models to be leveraged downstream without repeating the same joins multiple times.
- The `prod` database contains the "final" transformations to produce the models that are accessed directly by BI tools and external processes.
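Following the `full_project_history` example mentioned above, such an intermediate model might look like the sketch below. Only the model names `projects`, `project_history`, and `full_project_history` come from this document; the column names are assumptions for illustration:

```sql
-- intermediate/full_project_history.sql (sketch; columns assumed)
select
    p.project_id,
    p.project_type,       -- assumed column
    h.status,             -- assumed column
    h.status_changed_at   -- assumed column
from {{ ref('projects') }} as p
left join {{ ref('project_history') }} as h
    on p.project_id = h.project_id
```

Because downstream models reference `full_project_history` instead of repeating this join, the join logic lives in exactly one place.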
Since this is an incomplete dbt project, you are unable to run the models yourself to verify their accuracy. As such, please assume the following:
- `dbt run` results in a successful build of all models.
- `dbt test` results in a successful test of all models.
