Tutorial: strings
This tutorial gives an overview of the string utilities in the thalesians.adiutor.strings module, with examples of how to sanitize strings and generate unique strings.
The module provides tools for cleaning, formatting, and generating unique strings. These utilities are particularly useful in data processing and for keeping string formatting consistent across applications.
The sanitize_str function takes a raw string and transforms it into a sanitized, lowercase string suitable for use as an identifier or file name.
- raw_str (parameter): The raw string to sanitize.
- sanitized_str (return value): The sanitized string, with special characters removed or replaced by underscores.
import thalesians.adiutor.strings as our_strings
raw_string = "Hello, Hélyette?! This is a__raw_str"
sanitary = our_strings.sanitize_str(raw_string)
print(sanitary)  # hello_helyette_this_is_a_raw_str
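To make the transformation concrete, here is a minimal sketch of one way such sanitization could work. It is not the thalesians.adiutor implementation: the function name sketch_sanitize_str and its rules (strip accents, lowercase, collapse every run of non-alphanumeric characters into a single underscore) are assumptions chosen only because they reproduce the example output above.

```python
import re
import unicodedata


def sketch_sanitize_str(raw_str: str) -> str:
    """Hypothetical stand-in for sanitize_str; the library's rules may differ."""
    # Decompose accented characters (e.g. "é" becomes "e" plus a combining
    # accent), then drop everything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", raw_str)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lowercase, then collapse each run of non-alphanumeric characters
    # (punctuation, spaces, repeated underscores) into a single underscore.
    collapsed = re.sub(r"[^a-z0-9]+", "_", ascii_only.lower())
    # Remove any leading or trailing underscores left by the substitution.
    return collapsed.strip("_")


print(sketch_sanitize_str("Hello, Hélyette?! This is a__raw_str"))
# hello_helyette_this_is_a_raw_str
```

Under these assumed rules the double underscore in a__raw_str collapses to one, which matches the documented result; edge cases such as empty input or non-Latin scripts may be handled differently by the real function.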
The make_unique_str function ensures that a given string is unique within a list of existing strings by appending a numeric suffix if necessary.
- base_str (parameter): The base string to make unique.
- existing_strs (parameter): A list of strings against which uniqueness is checked.
- unique_str (return value): A unique string derived from the base string.
import thalesians.adiutor.strings as our_strings
existing_strings = ["foo", "foo_1", "foo_3"]
unique_str = our_strings.make_unique_str("foo", existing_strings)
print(unique_str)  # foo_2
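The suffixing behaviour can be illustrated with a small sketch. This is an assumption about the logic, not the library's code: the name sketch_make_unique_str, the underscore separator, and the choice of the smallest unused suffix starting from 1 are inferred from the example above.

```python
import itertools


def sketch_make_unique_str(base_str: str, existing_strs: list[str]) -> str:
    """Hypothetical stand-in for make_unique_str; the real logic may differ."""
    taken = set(existing_strs)  # set membership checks are O(1)
    # If the base string is not already in use, return it unchanged.
    if base_str not in taken:
        return base_str
    # Otherwise try base_1, base_2, ... and return the first unused candidate.
    for suffix in itertools.count(1):
        candidate = f"{base_str}_{suffix}"
        if candidate not in taken:
            return candidate


print(sketch_make_unique_str("foo", ["foo", "foo_1", "foo_3"]))  # foo_2
print(sketch_make_unique_str("foo", ["bar"]))                    # foo
```

Note that the sketch does not modify existing_strs; a caller generating several names in a row would need to append each result to the list before the next call.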
Here are minimal unit tests to validate the functionality of the sanitize_str and make_unique_str functions:
import unittest
import thalesians.adiutor.strings as our_strings

class TestStringUtils(unittest.TestCase):

    def test_sanitize_str(self):
        self.assertEqual(
            our_strings.sanitize_str("Hello, Hélyette?! This is a__raw_str"),
            "hello_helyette_this_is_a_raw_str")

    def test_make_unique_str(self):
        self.assertEqual(our_strings.make_unique_str("foo", ["bar"]), "foo")
        self.assertEqual(
            our_strings.make_unique_str("foo", ["foo", "foo_1", "foo_3"]),
            "foo_2")


if __name__ == "__main__":
    unittest.main()

The thalesians.adiutor.strings module offers simple yet effective tools for string manipulation. The sanitize_str function ensures strings are formatted in a consistent, machine-friendly manner, while make_unique_str helps avoid naming conflicts in datasets or file systems.