functional_python/notes.md at master · LucaJiang/functional_python

Good afternoon, everyone. I'm really happy to meet you here at PyConHK. I'm Jiang Wenxin. This work was done in collaboration with Yin Jian. We are both PhD students at CityUHK. While object-oriented design gets most of the attention in Python, there's another way of thinking that can make your code clear and reliable: functional programming. Today, I'm going to show you how a few simple ideas from functional programming, like writing predictable functions and keeping your data immutable, can make your Python code easy to read, simple to test, and enjoyable to work with. Let's get started.

A bit about myself: I'm Wenxin Jiang, a PhD student at CityUHK. I spend my time working with large genetics datasets to uncover insights in genomics and biology. In my research, I use both Python and R to analyze complex data. Like many of you, I'm always searching for ways to write code that's not just correct, but also clear and reliable. This passion is what drove me to explore functional programming in Python. And I'm truly excited to share what I've discovered with all of you today.

Let's start with a simple example: a table of student scores. Each row contains a student's name, class, subject, and score. But notice that one entry reads "Error", indicating that the data was missing or corrupted. This kind of issue is very common in real-world data. Sometimes values are NA, or have the wrong type, like strings instead of numbers. In real projects, I believe most of us have to start our analysis from messy datasets like this. Before diving into tasks like calculating grades or finding averages for each subject or class, we need to carefully clean the data and handle errors first.

This is the imperative style. You don't need to read through the code; it's just an example to illustrate the idea. This is the traditional way, and the key point is that it's very detailed and step-by-step. In imperative style, we tell the computer exactly how to do each task. For example, we write a for loop to process each row, use try-except to handle errors, and rely on if-elif chains to calculate grades. We also manually update statistics step by step. It works, but it's long and hard to follow, especially as things get more complex. You have to read through all the details to understand the overall logic. I always get lost in code like this.
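For readers of these notes without the slide in front of them, here is a minimal sketch of what such imperative code might look like. The rows, the grade thresholds, and the variable names are made up for illustration; they are not taken from the slides.

```python
# Hypothetical rows: (name, class, subject, score). "Error" marks a bad entry.
rows = [
    ("Alice", "A", "Math", "92"),
    ("Bob", "A", "Math", "Error"),
    ("Carol", "B", "Math", "78"),
    ("Dave", "B", "Math", "55"),
]

graded = []
total = 0
count = 0
for name, cls, subject, raw in rows:
    # Error handling is mixed into the loop body.
    try:
        score = int(raw)
    except ValueError:
        continue  # skip rows whose score cannot be parsed

    # Grading logic (thresholds are assumptions) is mixed in as well.
    if score >= 90:
        grade = "A"
    elif score >= 75:
        grade = "B"
    elif score >= 60:
        grade = "C"
    else:
        grade = "F"
    graded.append((name, cls, subject, score, grade))

    # Statistics are updated by hand, step by step.
    total += score
    count += 1

average = total / count if count else None
```

Parsing, grading, and statistics all live in one loop, so you cannot test or reuse any of them separately.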

Now, with the functional approach, we can focus directly on what our goal is, without worrying about the details of how to achieve it. We pack each step into a small, simple function, and each function handles one specific task, such as adding grades, filtering scores, or calculating averages. By linking these functions together with something like pipe, we create a clear and logical pipeline. This makes the main code short, readable, and simple to test and modify.
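The same idea can be sketched in plain Python. This is only an illustration under assumptions: `pipe` here is a small hypothetical helper written for this sketch (not necessarily the pipe shown on the slide), and the step functions, data, and grade thresholds are invented.

```python
from functools import reduce


def pipe(value, *funcs):
    """Pass `value` through each function in order, left to right."""
    return reduce(lambda acc, f: f(acc), funcs, value)


def parse_scores(rows):
    """Keep rows whose score is a non-negative integer string, converted to int."""
    return [
        {**row, "score": int(row["score"])}
        for row in rows
        if str(row["score"]).isdigit()
    ]


def add_grades(rows):
    """Return new rows with a 'grade' key (thresholds are assumptions)."""
    def grade(score):
        return "A" if score >= 90 else "B" if score >= 75 else "C" if score >= 60 else "F"
    return [{**row, "grade": grade(row["score"])} for row in rows]


def keep_passing(rows):
    """Drop rows with a failing grade."""
    return [row for row in rows if row["grade"] != "F"]


def average_score(rows):
    """Mean score; assumes at least one row is left."""
    return sum(row["score"] for row in rows) / len(rows)


rows = [
    {"name": "Alice", "score": "92"},
    {"name": "Bob", "score": "Error"},
    {"name": "Carol", "score": "78"},
    {"name": "Dave", "score": "55"},
]

# The main code reads as a list of what happens, not how.
result = pipe(rows, parse_scores, add_grades, keep_passing, average_score)
```

Each step builds new dictionaries instead of editing the input rows, so `rows` is unchanged after the pipeline runs and each function can be tested on its own.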

The core building block of functional programming is the function, and especially the pure function. A pure function is one that has no side effects: it doesn't change any data outside of itself. It also doesn't depend on global variables that can be changed from anywhere. This keeps our functions isolated and predictable. It's like a simple math equation: it takes some input and gives you an output, and if you give it the same input again, it will always return the same output. Each function should focus on doing just one thing well. Pure functions make your code predictable and easy to test and maintain.
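A small sketch of the difference, with function names invented for illustration:

```python
bonus = 10  # global state that any part of the program can change


def add_bonus_impure(score):
    # Depends on the global `bonus`, so the same input can give different outputs.
    return score + bonus


def add_bonus_pure(score, bonus):
    # Depends only on its arguments: same input, same output, every time.
    return score + bonus


first = add_bonus_impure(80)
bonus = 20  # someone changes the global somewhere else
second = add_bonus_impure(80)  # same argument, different result

pure_first = add_bonus_pure(80, 10)
pure_second = add_bonus_pure(80, 10)  # always equal to pure_first
```

To test the pure version you only need to pass arguments and check the return value. There is no hidden state to set up.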

This table highlights the two key principles of functional programming. On the function side, we want no side effects: functions shouldn't unexpectedly change other parts of your program. On the variable side, we want immutability: data shouldn't be changed once created. These principles are connected. If your data is immutable, your functions cannot modify it, which rules out a whole class of side effects. As I will mention in my other talk, recent Python versions offer an optional free-threaded build (experimental since Python 3.13) that allows true parallelism across threads, which is a really exciting development. Functional programming helps here too, because data that is never mutated is safe to share between threads.

As I have mentioned, immutability is a key idea in functional programming. But why does it matter so much? What's the problem with ignoring immutability? Let me show you a common pitfall. Imagine a simple task: we want to add a bonus to some scores. We have the original scores, and we have the bonus scores after adding 10 points. But this function somehow changes the original list! That's because the two score variables point to the same list in memory. Some students' scores might get an unexpected bonus, which is totally unfair!
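A minimal reproduction of this pitfall, with invented names and numbers:

```python
def add_bonus_in_place(scores, bonus):
    for i in range(len(scores)):
        scores[i] += bonus  # mutates the list the caller passed in
    return scores


original_scores = [70, 85, 90]
bonus_scores = add_bonus_in_place(original_scores, 10)

# Both names refer to the same list object, so the "original" changed too.
same_object = bonus_scores is original_scores
```

After the call, `original_scores` holds the bonus values as well, because no new list was ever created.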

Now let's look at the functional solution. Instead of changing the original data, this function creates a brand new list with the updated values. The original stays safe and unchanged. This is why functional programming gives me greater confidence in my code. Without it, I've often spent hours tracking down bugs caused by some other function hundreds of lines away. When I finally found one, it was like a goblin suddenly jumping out and saying "Surprise! I changed your data!".
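A sketch of the non-mutating version, again with invented names. Storing the original scores in a tuple is an extra safeguard I'm assuming here, not something the slide necessarily does.

```python
def add_bonus(scores, bonus):
    # Build and return a new list; the input is only read, never modified.
    return [score + bonus for score in scores]


original_scores = (70, 85, 90)  # a tuple cannot be modified in place
bonus_scores = add_bonus(original_scores, 10)
```

With a tuple, an accidental `original_scores[0] += 10` raises a `TypeError` instead of silently changing the data.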

So why choose functional programming? First, readability: focus on what you want, not how to do it. It's like telling someone to come to CityU without listing every step, such as "go to Kowloon Tong, then Festival Walk". Second, modularity: small functions act like LEGO blocks. You can build complex behavior by combining simple, well-defined pieces. Each function does one thing, and does it well. Third, testability: pure functions are predictable and isolated. You can easily write tests for them without worrying about the rest of your program. In the end, it's about writing code that's easier to read, test, and maintain. No surprises, no getting lost.

Well, if you are familiar with the R language, you must know the pipe operator. It makes data analysis code super clean and easy to read. If not, no worries; let me inspire you with a mathematical perspective on function composition. There are two common approaches: the traditional nested style and the pipeline style. Nested code is like a maze: you have to find the innermost function, start there, and work your way outward. It's really easy to get lost. The pipeline style, on the other hand, tells a much more straightforward story: first apply f, then pipe the result to g with argument a, then to h with argument b. The flow is simple, clean, and easy to follow.
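Python has no built-in pipe operator, so the following is only a sketch of how the two styles compare. `Pipe` is a hypothetical wrapper class written for this example, and `f`, `g`, and `h` are toy functions standing in for the ones on the slide.

```python
from functools import partial


def f(x):
    return x + 1


def g(x, a):
    return x * a


def h(x, b):
    return x - b


# Nested style: read from the inside out.
nested = h(g(f(3), 2), 1)


class Pipe:
    """Hypothetical wrapper so that `Pipe(x) | f | g` reads left to right."""

    def __init__(self, value):
        self.value = value

    def __or__(self, func):
        # Apply the function and wrap the result so the chain can continue.
        return Pipe(func(self.value))


# Pipeline style: first f, then g with a=2, then h with b=1.
piped = (Pipe(3) | f | partial(g, a=2) | partial(h, b=1)).value
```

Both expressions compute the same value. `partial` fixes the extra arguments `a` and `b`, so every step in the chain takes a single input.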

Another analogy for function composition is railway tracks. Imagine building a railway system where each piece of track serves a specific purpose. Functional programming is a practical way to build complex behavior from small, reliable pieces. Think of each function as a small piece of track: one turns a pineapple into an apple, another turns an apple into a banana. We can compose them because the output of one function matches the input of the next. Each piece is small, testable, and reusable. When requirements change, we can simply add or adjust some functions without rewriting the entire track.

Now let's add a second track to handle errors: the top green track for normal values, the bottom red track for errors. Each function acts like a switch: if it succeeds, the flow continues on the green track; if it fails, we drop to the red track, bypassing the rest of the pipeline. In Python, raising an exception automatically redirects the flow to the nearest matching except block, and that's our red track. We write small, composable steps where each function either returns a value or raises a specific error. All errors are then handled in one centralized place. As a result, our main pipeline stays clean and focused on the happy green path, and we don't need nested try-except blocks everywhere.
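A minimal sketch of the two tracks using plain exceptions. `ScoreError`, the step functions, and the thresholds are all hypothetical names and values chosen for this illustration.

```python
class ScoreError(Exception):
    """Hypothetical error type: anything that sends a score to the red track."""


def parse_score(raw):
    try:
        return int(raw)
    except ValueError as exc:
        raise ScoreError(f"not a number: {raw!r}") from exc


def check_range(score):
    if not 0 <= score <= 100:
        raise ScoreError(f"out of range: {score}")
    return score


def to_grade(score):
    # Thresholds are assumptions for this sketch.
    return "A" if score >= 90 else "B" if score >= 75 else "C" if score >= 60 else "F"


def grade_of(raw):
    # Green track: each step feeds the next, with no error handling in sight.
    return to_grade(check_range(parse_score(raw)))


def safe_grade(raw):
    # Red track: one central place where every ScoreError ends up.
    try:
        return grade_of(raw)
    except ScoreError as err:
        return f"invalid ({err})"


results = [safe_grade(raw) for raw in ["92", "Error", "150"]]
```

If any step raises, the later steps are skipped automatically. `grade_of` itself never needs to know that errors exist.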

There are also some other useful functional concepts in Python that can make our lives easier. Here's a brief overview. Iterators let you loop over data without loading it all into memory. Map, filter, and reduce let you apply functions to collections in a clean way. List comprehensions provide a concise way to create lists. Generators let you produce sequences of values on the fly. And lazy evaluation delays computation until it's really needed, which can save time and memory.
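A quick tour of these tools in one short sketch. The data and names are invented for illustration.

```python
from functools import reduce
from itertools import count, islice


def squares():
    """Generator: yields 0, 1, 4, 9, ... one value at a time, without end."""
    for n in count():
        yield n * n


# filter() returns a lazy iterator, so nothing is computed on this line,
# even though the underlying sequence is infinite.
even_squares = filter(lambda x: x % 2 == 0, squares())

# Values are produced only when asked for; islice takes the first five.
first_five = list(islice(even_squares, 5))

# map applies a function to each item; reduce folds a collection into one value.
doubled = list(map(lambda x: x * 2, [1, 2, 3]))
total = reduce(lambda a, b: a + b, [1, 2, 3, 4], 0)

# A list comprehension combines mapping and filtering in one expression.
odd_squares = [x * x for x in range(5) if x % 2 == 1]
```

Because the generator and `filter` are lazy, only as many squares are computed as `islice` actually requests.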

Thank you for your attention. You can find me on GitHub at LucaJiang, and the slides of this talk are available online.
