Python: regex, concurrency

Regex:

Python built in re module is really handy.

Here are some usefull patterns and function which will come in handy:

# Usefull patterns:
URL_PATTERN = r"(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?"
MAN_PAGE_PATTERN = r"man\d/.+.html$"
GIT_REPO_NAME_PATTERN = r"[^/]+.git$"

# Useful wrapper around `re.search`
def get_match(pattern: str, string: str) -> str or None:
    """Scan through string looking for a match to the pattern, returning
    a Match sub string, or None if no match was found."""

    match = re.search(URL_PATTERN, tldr_summary)
    return match.group() if match else None

Concurrency:

In python with in built modules multiprocessing and concurrent converting your single process single threaded program has been pretty easy.

For example: Simple Example to convert any list comprehension to execute “concurrently”

# test_concurrency.py
import time
import timeit
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def parallelize(func, iterable, use_thread=True, max_workers=5):
    pool_executor = ThreadPoolExecutor if use_thread else ProcessPoolExecutor
    with pool_executor(max_workers=max_workers) as executor:
        data = list(executor.map(func, iterable))
    return data

def square(x: int) -> int:
    time.sleep(1e-3)
    return x ** 2

# For invoking multi processing we need __main__ module defined !! :)
if __name__ == "__main__":
    input_iter = list(range(100))
    number = 10
    repeat = 5
    print("1 prcess 1 thread", timeit.repeat(lambda: [square(x) for x in input_iter], repeat=repeat, number=number))
    print("1 prcess 5 thread", timeit.repeat(lambda: parallelize(square, input_iter), repeat=repeat, number=number))
    print("5 prcess 1 thread each", timeit.repeat(lambda: parallelize(square, input_iter, use_thread=False), repeat=repeat, number=number))

Execution:

$ python test_concurrency.py
1 prcess 1 thread [1.2861979760000002, 1.298199268, 1.2918409180000001, 1.281806194, 1.295781883]
1 prcess 5 thread [0.28324162100000017, 0.2827319930000005, 0.2845571439999999, 0.2883606469999993, 0.2812394300000003]
5 prcess 1 thread each [1.5489112939999998, 1.5780382430000017, 1.5727520679999998, 1.5113713640000004, 1.524860704]

Important Note:

Very usefull references: