Welcome to the StreamFlix technical assessment!
StreamFlix is a fictional video streaming service. We are looking to better understand our Monthly Recurring Revenue (MRR) and Churn rates. You have been given a raw dbt project and access to our raw data dumps.
You will find three CSV files in the seeds/ directory:
raw_users: User account information.user_id: Unique identifier (mostly).created_at: Account creation timestamp.country: User's country code.marketing_channel: How the user found us.
raw_subscriptions: Subscription history.subscription_id: Unique identifier.user_id: Foreign key to users.plan_type: Basic, Pro, or Premium.status: 'active' or 'cancelled'.start_date: When the subscription started.end_date: When it ended (NULL if active).
raw_payments: Payment transaction logs.payment_id: Unique identifier.subscription_id: Foreign key to subscriptions.amount: Transaction amount.payment_date: Date of payment.status: 'success' or 'failed'.
This project is configured to use DuckDB, so you don't need to set up an external database. If you are not confident running locally, use a github workspace with a blank template.
- Python 3.8+ installed.
- Clone the repository.
- Install the required dependencies:
This installs
pip install -r requirements.txt
dbt-duckdbfor the project andpandasfor running Python analysis.
To build the models and run the tests:
dbt build --profiles-dir .This will:
- Load the CSV seeds into a local
dbt.duckdbfile. - Run your models.
- Run your tests.
Please use dbt to model this data and answer the following business questions.
Create staging models (stg_) to clean the raw data.
- Note: The data engineering team says the source systems are a bit "messy". Watch out for duplicates, test accounts (User ID 999), and data inconsistencies.
Build the following models in your marts layer:
dim_users: A user dimension table showing the user's current subscription status and lifetime value.fct_mrr: A monthly snapshot fact table that shows the MRR (Monthly Recurring Revenue) for each user for each month.- Tip: Only successful payments count towards revenue.
Create a simple analysis (SQL file in analyses/ or a dashboard description) to answer:
- What is the Churn Rate for the last 3 months?
- Which Marketing Channel has the highest average Lifetime Value (LTV)?
Add dbt tests to ensure your models are reliable. We expect to see at least:
- Unique and Not Null tests on primary keys.
- Accepted values tests where appropriate.
Hint: You can define tests in a YAML file in the models/ directory. For example:
version: 2
models:
- name: dim_users
columns:
- name: user_id
tests:
- unique
- not_nullYou can query the DuckDB database directly using Python to perform advanced analysis or generate reports.
A script run_analysis.py is included. It executes compiled SQL files generated by dbt. This allows you to write your analysis in dbt/SQL (in the analyses/ directory) and run them via Python.
- Compile your dbt project:
This generates the executable SQL in
dbt compile
target/compiled/streamflix/analyses/. - Run the python script with the filename:
python run_analysis.py new_users_last_month.sql