StreamFlix Analytics Engineer Tech Test

Welcome to the StreamFlix technical assessment!

Scenario

StreamFlix is a fictional video streaming service. We are looking to better understand our Monthly Recurring Revenue (MRR) and Churn rates. You have been given a raw dbt project and access to our raw data dumps.

The Data

You will find three CSV files in the seeds/ directory:

raw_users: User account information.
- user_id: Unique identifier (mostly).
- created_at: Account creation timestamp.
- country: User's country code.
- marketing_channel: How the user found us.
raw_subscriptions: Subscription history.
- subscription_id: Unique identifier.
- user_id: Foreign key to users.
- plan_type: Basic, Pro, or Premium.
- status: 'active' or 'cancelled'.
- start_date: When the subscription started.
- end_date: When it ended (NULL if active).
raw_payments: Payment transaction logs.
- payment_id: Unique identifier.
- subscription_id: Foreign key to subscriptions.
- amount: Transaction amount.
- payment_date: Date of payment.
- status: 'success' or 'failed'.

Setup & Running

This project is configured to use DuckDB, so you don't need to set up an external database. If you are not comfortable running it locally, use a GitHub Codespace with a blank template.

Prerequisites

Python 3.8+ installed.

Installation

Clone the repository.
Install the required dependencies:
```
pip install -r requirements.txt
```
This installs dbt-duckdb for the project and pandas for running Python analysis.

Running the Project

To build the models and run the tests:

dbt build --profiles-dir .

This will:

Load the CSV seeds into a local dbt.duckdb file.
Run your models.
Run your tests.

Your Task

Please use dbt to model this data and answer the following business questions.

1. Data Cleaning & Staging

Create staging models (stg_) to clean the raw data.

Note: The data engineering team says the source systems are a bit "messy". Watch out for duplicates, test accounts (User ID 999), and data inconsistencies.
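
It can help to prototype the clean-up logic on a few rows before writing the staging model. The sketch below is only an illustration: it uses Python's built-in sqlite3 module rather than DuckDB, the sample rows and channel names are made up, and keeping the earliest row per user_id is an assumption rather than a stated requirement.

```python
import sqlite3

# Made-up sample rows: user 1 is duplicated and user 999 is the test account.
SAMPLE_USERS = [
    (1, "2023-01-05 10:00:00", "GB", "organic"),
    (1, "2023-01-05 10:00:00", "GB", "organic"),
    (2, "2023-02-11 09:30:00", "US", "paid_search"),
    (999, "2023-03-01 00:00:00", "GB", "internal"),
]

DEDUP_SQL = """
select user_id, created_at, country, marketing_channel
from (
    select
        *,
        row_number() over (
            partition by user_id
            order by created_at
        ) as row_num
    from raw_users
    where user_id <> 999  -- drop the test account named in the brief
)
where row_num = 1         -- assumption: keep the earliest row per user
order by user_id
"""


def clean_users(rows):
    """Return one row per real user from (user_id, created_at, country, channel) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table raw_users "
        "(user_id integer, created_at text, country text, marketing_channel text)"
    )
    con.executemany("insert into raw_users values (?, ?, ?, ?)", rows)
    cleaned = con.execute(DEDUP_SQL).fetchall()
    con.close()
    return cleaned


cleaned_users = clean_users(SAMPLE_USERS)
```

The same row_number() pattern should carry over to a stg_ model in DuckDB, though the real seeds may need a different tie-breaking rule.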

2. Data Modeling

Build the following models in your marts layer:

dim_users: A user dimension table showing the user's current subscription status and lifetime value.
fct_mrr: A monthly snapshot fact table that shows the MRR (Monthly Recurring Revenue) for each user for each month.
- Tip: Only successful payments count towards revenue.
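
As a rough illustration of the grain fct_mrr is asking for, the sketch below sums successful payments per user per calendar month. Treating MRR as the sum of successful payments in a month is an assumption (pricing MRR from plan_type is another reasonable reading), and the rows and the SUBSCRIPTION_OWNER lookup are made up.

```python
from collections import defaultdict
from datetime import date

# Made-up rows for illustration only.
SUBSCRIPTION_OWNER = {10: 1, 11: 2}  # subscription_id -> user_id
PAYMENTS = [
    # (payment_id, subscription_id, amount, payment_date, status)
    (100, 10, 9.99, date(2023, 1, 5), "success"),
    (101, 10, 9.99, date(2023, 2, 5), "success"),
    (102, 11, 14.99, date(2023, 1, 20), "failed"),
    (103, 11, 14.99, date(2023, 1, 21), "success"),
]


def monthly_revenue(payments, subscription_owner):
    """Sum successful payment amounts per (user_id, first day of month)."""
    totals = defaultdict(float)
    for _payment_id, subscription_id, amount, payment_date, status in payments:
        if status != "success":
            continue  # failed payments do not count towards revenue
        user_id = subscription_owner[subscription_id]
        month = payment_date.replace(day=1)
        totals[(user_id, month)] += amount
    return dict(totals)


mrr = monthly_revenue(PAYMENTS, SUBSCRIPTION_OWNER)
```

Each key of the result corresponds to one row of the monthly snapshot: one user, one month.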

3. Analysis

Create a simple analysis (SQL file in analyses/ or a dashboard description) to answer:

What is the Churn Rate for the last 3 months?
Which Marketing Channel has the highest average Lifetime Value (LTV)?
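
The brief does not define either metric, so the sketch below picks one common reading of each: churn rate as the share of users active at the start of a month who cancel during it, and average LTV as total lifetime value divided by user count per channel. Both definitions, the function names, and the sample values are assumptions for illustration.

```python
def churn_rate(active_at_start, churned_during_month):
    """Share of users active at the start of a month who cancelled during it."""
    if not active_at_start:
        return 0.0  # avoid dividing by zero when nobody was active
    return len(churned_during_month & active_at_start) / len(active_at_start)


def average_ltv_by_channel(ltv_by_user, channel_by_user):
    """Average lifetime value per marketing channel."""
    totals, counts = {}, {}
    for user_id, ltv in ltv_by_user.items():
        channel = channel_by_user[user_id]
        totals[channel] = totals.get(channel, 0.0) + ltv
        counts[channel] = counts.get(channel, 0) + 1
    return {channel: totals[channel] / counts[channel] for channel in totals}


# Made-up example: 4 users active at the start of the month, 1 of them cancels.
example_churn = churn_rate({1, 2, 3, 4}, {3})

# Made-up lifetime values and channels.
example_ltv = average_ltv_by_channel(
    {1: 30.0, 2: 10.0, 3: 50.0},
    {1: "organic", 2: "organic", 3: "paid_search"},
)
best_channel = max(example_ltv, key=example_ltv.get)
```

Whichever definitions you choose, stating them alongside your answer makes the numbers easier to review.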

4. Testing

Add dbt tests to ensure your models are reliable. We expect to see at least:

Unique and Not Null tests on primary keys.
Accepted values tests where appropriate.

Hint: You can define tests in a YAML file in the models/ directory. For example:

version: 2
models:
  - name: dim_users
    columns:
      - name: user_id
        tests:
          - unique
          - not_null
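
A dbt test is a query that selects the rows breaking a rule, and it passes when that query returns no rows. The sketch below imitates that idea with Python's built-in sqlite3 module; it is not the SQL dbt generates, and the current_status column name and sample rows are hypothetical.

```python
import sqlite3

# Each query selects the rows that violate a rule; zero rows means "pass".
CHECKS = {
    "unique_user_id": """
        select user_id from dim_users
        group by user_id having count(*) > 1
    """,
    "not_null_user_id": "select * from dim_users where user_id is null",
    "accepted_values_status": """
        select * from dim_users
        where current_status not in ('active', 'cancelled')
    """,
}


def run_checks(rows):
    """Return {check_name: number_of_failing_rows} for (user_id, current_status) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("create table dim_users (user_id integer, current_status text)")
    con.executemany("insert into dim_users values (?, ?)", rows)
    results = {name: len(con.execute(sql).fetchall()) for name, sql in CHECKS.items()}
    con.close()
    return results


clean = run_checks([(1, "active"), (2, "cancelled")])
dirty = run_checks([(1, "active"), (1, "active"), (None, "paused")])
```

Here clean reports zero failing rows for every check, while dirty reports one failing row for each.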

Running Analysis with Python

You can query the DuckDB database directly using Python to perform advanced analysis or generate reports.

Running the Analysis Script

A script, run_analysis.py, is included. It executes the compiled SQL files that dbt generates, so you can write each analysis in dbt/SQL (in the analyses/ directory) and run it via Python.

Compile your dbt project:
```
dbt compile
```
This generates the executable SQL in target/compiled/streamflix/analyses/.

Run the Python script, passing the filename of the compiled analysis:

python run_analysis.py new_users_last_month.sql
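
For orientation, a script of this kind could look roughly like the sketch below. This is not the contents of run_analysis.py: the function names are hypothetical, and the sketch only assumes the compiled-SQL directory and the dbt.duckdb file name mentioned in this README.

```python
from pathlib import Path

# Directory and database file name as described in this README.
COMPILED_DIR = Path("target/compiled/streamflix/analyses")
DATABASE_FILE = "dbt.duckdb"


def compiled_sql_path(filename, compiled_dir=COMPILED_DIR):
    """Return the path of a compiled analysis, e.g. new_users_last_month.sql."""
    return Path(compiled_dir) / filename


def run_compiled_analysis(filename, database=DATABASE_FILE):
    """Execute one compiled analysis against the local DuckDB file."""
    import duckdb  # imported lazily; installed as a dependency of dbt-duckdb

    sql = compiled_sql_path(filename).read_text()
    con = duckdb.connect(database, read_only=True)
    try:
        return con.execute(sql).fetchall()
    finally:
        con.close()
```

Opening the database read-only keeps an analysis run from modifying the tables that dbt built.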
