Yao Lirong's Blog

Python Manual

2020/11/29

List

  • Find the median of a list: statistics.median(lst)
  • Union of several lists (removes all repetitions): res = list(set().union(lst1, lst2, lst3, ...)). Note that a set does not preserve the original order.
  • Concatenate two lists: lst1 + lst2
  • Hash a list: you cannot hash a list, because a list is mutable and only immutable objects are hashable. To hash its contents, convert it to a tuple first: hash(tuple([1, 2, 3])).
  • Take every x-th element to form a new list: lst[::x]. The third field of the slice is the step, just as in range.
  • Generate a range of floats: use np.arange(start, stop, step) or np.linspace(start, stop, num_wanted)
  • Unzip a zipped list: list(zip(*test_list)). Here * is the unpacking operator: it unpacks the iterable into separate elements that are then passed as arguments to zip().
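
The list recipes above can be tried together in one short sketch. The sample lists lst1 and lst2 are made up for illustration, and sorted() is applied to the union only to get a predictable order.

```python
import statistics

lst1 = [3, 1, 2]
lst2 = [2, 5, 4]

median = statistics.median(lst1 + lst2)      # median of the concatenated list
union = sorted(set().union(lst1, lst2))      # duplicates removed, then sorted
key = hash(tuple(lst1))                      # a tuple of ints is hashable
every_second = (lst1 + lst2)[::2]            # elements at indices 0, 2, 4

pairs = list(zip(lst1, lst2))                # zip the two lists into pairs
unzipped = list(zip(*pairs))                 # unzip back into two tuples
```

Note that unzipping gives tuples, not lists; wrap each in list() if you need lists back.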

Dict

  • Sort a dictionary: dct = dict(sorted(dct.items(), key=lambda item: item[0])) to sort by keys; change to item[1] to sort by values. This relies on dicts preserving insertion order (Python 3.7+).
  • Remove an item from dict by key: dct.pop(your_key)
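
A minimal sketch of both dict recipes, using a made-up dictionary. The second pop call passes a default, which is one way to avoid a KeyError when the key may be absent.

```python
dct = {"b": 2, "c": 1, "a": 3}

by_key = dict(sorted(dct.items(), key=lambda item: item[0]))
by_value = dict(sorted(dct.items(), key=lambda item: item[1]))

removed = by_key.pop("b")            # returns the value that was removed
missing = by_key.pop("zzz", None)    # the default avoids KeyError for absent keys
```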

Function

  • Access & change a global variable in a local function: you can read a global variable inside a function without any extra keyword. However, if you want to assign to the global variable inside the function, you have to declare it with the global keyword.

    With the global keyword, you can either create a new global variable from inside a function or bind to a global variable that already exists.

    x = "h"

    def myfunc():
        global x
        x = "fantastic"

    myfunc()
    print(x)  # Output: fantastic
  • Change a variable in an enclosing scope: similar to the global keyword, the nonlocal keyword serves this purpose.

    def foo():
        a = 1
        def bar():
            nonlocal a
            a = 2
        bar()
        print(a)  # Output: 2

    foo()
  • Variable number of arguments to a function:

    def foo(a, b, c, *others):
        print(a, b, c)
        print("And all the rest are ", list(others))
  • Import a module by name: __import__ does the same as the import statement, but the module name can be a variable instead of a fixed identifier.

    package_name = "numpy"
    package = __import__(package_name)
    package.array([1, 2, 3])
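
For importing by name, the standard library also offers importlib.import_module, which the Python documentation recommends over calling __import__ directly. A small sketch, using the stdlib modules math and os.path so that nothing extra needs to be installed:

```python
import importlib

module_name = "math"                      # any importable module name
module = importlib.import_module(module_name)
root = module.sqrt(16.0)

# For dotted names the two calls differ: __import__ returns the top-level
# package, while import_module returns the submodule itself.
top = __import__("os.path")
sub = importlib.import_module("os.path")
```

Here top is the os package and sub is os.path, which is usually what you wanted in the first place.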

Class

  • Print an object like Java’s toString:

    class Test:
        def __repr__(self):  # shown when the object is inspected in an interactive prompt
            return "Test()"
        def __str__(self):  # shown by print(Test()) and str(Test())
            return "member of Test"
  • Self-defined comparator:

    class CustomNumber:
        def __init__(self, value):
            self.value = value

        def __lt__(self, obj):
            """self < obj"""
            return self.value < obj.value

        # the remaining comparison methods are stubs here; implement them like __lt__
        def __le__(self, obj): """self <= obj"""
        def __eq__(self, obj): """self == obj"""
        def __ne__(self, obj): """self != obj"""
        def __gt__(self, obj): """self > obj"""
        def __ge__(self, obj): """self >= obj"""
  • Hash a custom object:

    class Emp:
        def __init__(self, emp_name, id):
            self.emp_name = emp_name
            self.id = id

        def __hash__(self):
            # to get the hash, call hash(instance_of_custom_object)
            return hash((self.emp_name, self.id))
  • Class variables vs. instance variables:

    • Names assigned outside the __init__ method are static elements; they belong to the class and are shared by all instances.
    • Names assigned to self inside the __init__ method are elements of the object; they belong to that instance, not to the class.

    class MyClass:
        static_elem = 123  # static, shared by the class
        def __init__(self):
            self.object_elem = 456  # specific to each instance
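
The comparator, hash, and class-variable notes above can be combined in one sketch. Version is a hypothetical class invented for this example. functools.total_ordering is swapped in so that only __eq__ and __lt__ have to be written by hand. A class that defines __eq__ without __hash__ becomes unhashable, so both are defined here and kept consistent.

```python
from functools import total_ordering


@total_ordering                      # derives <=, >, >= from __eq__ and __lt__
class Version:                       # hypothetical class, for illustration only
    count = 0                        # class attribute, shared by all instances

    def __init__(self, major, minor):
        self.major = major           # instance attributes
        self.minor = minor
        Version.count += 1

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __hash__(self):              # must agree with __eq__
        return hash(self._key())

    def __repr__(self):
        return f"Version({self.major}, {self.minor})"


versions = [Version(1, 10), Version(1, 2), Version(1, 2)]
ordered = sorted(versions)           # sorting relies on __lt__
unique = set(versions)               # sets rely on __hash__ and __eq__
```

Comparing tuples of fields keeps the ordering and the hash in sync with very little code.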

Exception

Self-specified exception:

class MyCustomError(Exception):
    def __init__(self, *args):
        if args:
            self.message = args[0]
        else:
            self.message = None

    def __str__(self):
        if self.message:
            return 'MyCustomError, {0} '.format(self.message)
        else:
            return 'MyCustomError has been raised'

try/except clause in Python (catch the specific exception you expect rather than using a bare except):

try:
    print(x)
except NameError:
    print("Exception thrown. x does not exist.")
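
A short sketch of raising and catching a custom exception, including the optional else and finally branches. ConfigError and read_setting are hypothetical names made up for this example.

```python
class ConfigError(Exception):          # hypothetical exception for illustration
    """Raised when a required setting is missing."""


def read_setting(settings, key):
    if key not in settings:
        raise ConfigError(f"missing setting: {key}")
    return settings[key]


log = []
try:
    read_setting({"host": "localhost"}, "port")
except ConfigError as err:             # catch only the error we expect
    log.append(f"caught: {err}")
else:
    log.append("no error")             # runs only if the try body succeeded
finally:
    log.append("cleanup")              # always runs
```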

String

  • Convert a string to an int: int(s)

  • Remove leading and trailing whitespace: my_string.strip()

  • Join a list of strings: "".join(str_lst)

  • Join with a separator: ",".join(str_lst)

  • Advanced split with re: re.split("split_on_what_in_regex", s)

  • Extract the letters from a string: "".join(re.findall("[a-zA-Z]+", s))

  • Convert a number into a list of its digits, either as integers or as characters:

    num = 2019

    # If you want a list of integers
    res = [int(x) for x in str(num)]
    # If a list of characters is enough
    res = list(str(num))
  • Format a float in scientific notation: print("a = %.2e" % num)

  • String formatting in general: f-strings are available since Python 3.6 and are the preferred formatting convention, e.g. f"iter: {i}"

    • Sign options for aligning numbers:

      '+' indicates that a sign should be used for both positive as well as negative numbers.
      '-' indicates that a sign should be used only for negative numbers (this is the default behavior).
      ' ' indicates that a leading space should be used on positive numbers, and a minus sign on negative numbers (most used)

    # scientific format with f-string
    f'{num:.5e}'
    # float number, use space to also align negative sign
    f'{num: .3f}'
    # align integers to have a fixed length
    f'{num:3d}'
    # f-string braced evaluation supports arbitrary expressions (including function calls);
    # before Python 3.12 the inner quotes must differ from the outer ones
    f"{'Eric Idle'.lower()} is funny."
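
The string recipes above, collected into one runnable sketch. The sample strings and numbers are made up; the three sign options are shown side by side so the alignment effect is visible.

```python
import re

raw = "  alpha, beta;gamma  "
parts = re.split(r"[,;]\s*", raw.strip())     # split on ',' or ';' plus optional spaces
joined = "-".join(parts)
letters = "".join(re.findall("[a-zA-Z]+", "a1b2-c3"))

values = [3.14159, -2.5]
plain = [f"{v:.2f}" for v in values]          # default: sign only on negatives
spaced = [f"{v: .2f}" for v in values]        # ' ' pads positives so columns line up
signed = [f"{v:+.2f}" for v in values]        # '+' always shows a sign
sci = f"{12345.678:.2e}"                      # scientific notation
```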

Data Structures

  • Queue: Python uses put and get rather than enqueue and dequeue

    import queue
    q = queue.Queue()
    q.put(s)
    v = q.get()
  • Priority Queue:

    from queue import PriorityQueue
    q = PriorityQueue()
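
A small usage sketch for both queue types. The task names are made up. Storing (priority, payload) tuples is one common convention for PriorityQueue, where the smallest priority comes out first.

```python
from queue import PriorityQueue, Queue

fifo = Queue()
for item in ["a", "b", "c"]:
    fifo.put(item)
first = fifo.get()                     # FIFO: the oldest item comes out first

pq = PriorityQueue()
pq.put((2, "write report"))            # (priority, payload) tuples
pq.put((1, "fix bug"))
pq.put((3, "refactor"))
order = [pq.get()[1] for _ in range(pq.qsize())]
```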

Functional Programming

IO

Profiling

Creating Profiling Data

import cProfile
profiler = cProfile.Profile()
profiler.enable()
# Code goes here
profiler.disable()
profiler.dump_stats("execution.stats")

Inspecting Profiling Data

import pstats
stats = pstats.Stats("execution.stats")  # the file written by dump_stats above
stats.print_stats()

The following columns will be shown:

  • ncalls: number of times function was called
  • tottime: amount of time spent in the function (not counting any time spent in subfunctions)
  • percall: tottime / ncalls
  • cumtime: all the time spent in the function and subfunctions
  • percall: cumtime / ncalls
  • filename:lineno(function): name of function that was called and where it is defined

A fairly common practice is to sort by one of the above attributes, or to look at a function’s callees to see where it wound up spending time. You can also do the inverse and look up a function’s callers. This can be helpful if you have a function that is taking a lot of time, but you don’t know who is calling it.

stats.sort_stats("cumtime").print_stats(2)  # print the 2 functions with the highest cumulative time
stats.print_callers("cprofile_example.py:7")  # 7 is the line number
stats.print_callees("cprofile_example.py:3")

You can also use the visualization tool snakeviz.
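
If you do not want to write a stats file at all, pstats.Stats also accepts the Profile object directly. A minimal sketch, where slow_sum and workload are hypothetical functions standing in for the code being profiled, and the report is captured in a string buffer instead of being printed:

```python
import cProfile
import io
import pstats


def slow_sum(n):                       # hypothetical workload to profile
    return sum(i * i for i in range(n))


def workload():
    return [slow_sum(20_000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

buffer = io.StringIO()                 # capture the report instead of printing
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumtime").print_stats(10)
report = buffer.getvalue()
```

The report string contains the same columns described above.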

Reference: Profiling Python Code with cProfile

Profile Memory

import tracemalloc

tracemalloc.start()
# Code goes here
print("maximum memory usage is " + str(tracemalloc.get_traced_memory()[1] / 1024 / 1024 / 1024) + " GiB")  # index 1 is the peak, in bytes
tracemalloc.stop()
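
A self-contained sketch of the same measurement. The list of byte strings is just a made-up allocation of roughly 1 MiB, so that there is something to trace.

```python
import tracemalloc

tracemalloc.start()
data = [bytes(1024) for _ in range(1000)]        # allocate roughly 1 MiB
current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
tracemalloc.stop()

peak_mib = peak / 1024 / 1024
```

get_traced_memory() returns a (current, peak) tuple, which is why the snippet above indexes it with [1].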

Pip

  • pip freeze to show all installed packages
  • pip show <package_name> to show a specific package

NumPy

  • Difference between max and maximum:

    • numpy.maximum(A,B) returns the element-wise bigger one of the two
    • numpy.max(A) returns the maximum value inside A
  • Matrix/Vector Multiplication:

    • np.matmul(A, B): Returns matrix product of A and B
    • np.multiply(A, B): Returns element-wise multiplication of A and B
    • np.dot(A, B): Returns dot product of A and B
  • numpy.diagonal(M): Returns the diagonal of a 2-D matrix M

  • numpy.tile(A, reps): repeats A reps times

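  • numpy.where(cond, A, B): element-wise selection on arrays. A really useful function; it is the array version of A if cond else B: each output element is taken from A where cond is True and from B otherwise.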

  • Solve "TypeError: only integer scalar arrays can be converted to a scalar index" when you execute a[a == b]: this happens because a is not a NumPy array. It is a list, and the message above comes from indexing the list. Convert it first with np.array(a). reference

  • Convert a scalar to an array of any shape: np.reshape(scalar, (1,1))

  • When your matrix operations involve an inverse $A^{-1}$, it is generally better to use the inverse implicitly than to form it explicitly, because forming it involves extra computation that may harm numerical stability. That is, use np.linalg.solve(A, b) instead of np.linalg.inv(A) @ b. reference

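  • np.frompyfunc to apply a Python function element-wise on NumPy arrays: it wraps an ordinary Python function into a NumPy universal function (ufunc) with a fixed number of inputs and outputs. Calling a plain Python function on an np.array passes the whole array in at once; if that output doesn’t meet your expectation, you can use np.frompyfunc to specify that the function should be applied to each element. Note that the result has dtype object.

    double = lambda x: 2 * x
    # general form: npfunction = np.frompyfunc(f, <input_number>, <output_number>)
    npf = np.frompyfunc(double, 1, 1)
    # npf(arr) gives the same values as double(arr) in this particular case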
  • For each row, extract the corresponding column: Qs = network(states)[np.arange(actions.shape[0]), actions]

    network(states) has shape $B \times dim_A$ and holds, for each sample, the value of taking each action. actions is a vector of length $B$ storing which action we actually took. With this command, we extract the value of the action actually taken at each state. Note that there are a total of $B$ (state, action) pairs.
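
The NumPy notes above, checked on tiny arrays. All values are made up; q_values stands in for the output of network(states), which is an assumption for illustration only.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 5.0], [6.0, 1.0]])

elementwise_max = np.maximum(A, B)       # compares A and B entry by entry
overall_max = np.max(A)                  # single largest value in A
matrix_product = np.matmul(A, B)         # same as A @ B
elementwise_product = np.multiply(A, B)  # same as A * B
clipped = np.where(A > 2, A, 0.0)        # keep entries > 2, zero the rest

# Solve A x = b without forming the inverse explicitly.
b = np.array([5.0, 11.0])
x = np.linalg.solve(A, b)

# For each row i, pick column actions[i] (hypothetical q_values and actions).
q_values = np.array([[0.1, 0.9], [0.7, 0.3], [0.4, 0.6]])
actions = np.array([1, 0, 1])
chosen = q_values[np.arange(actions.shape[0]), actions]
```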

Pandas

# Iterating over a df directly iterates over the column names
for col in df:
    print(col)

# To iterate over the rows, use iterrows()
# row = (row_index, data: pd.Series)
for row in df.iterrows():
    print(row)

# To read a single row, use loc[i], which returns a pd.Series
row0 = df.loc[0]

# When filtering with loc on two or more conditions, you must use & (using `and` raises an error),
# and each condition must be wrapped in parentheses
df.loc[(df["att1"] == "012") & (df["code"] == "2A")]
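
The same operations on a concrete frame. The column names att1 and code come from the snippet above; the qty column and all values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"att1": ["012", "012", "345"], "code": ["2A", "3B", "2A"], "qty": [1, 2, 3]}
)

columns = [col for col in df]                             # iterating yields column names
rows = [(idx, row["qty"]) for idx, row in df.iterrows()]  # (label, Series) pairs
row0 = df.loc[0]                                          # one row as a Series

# Combine conditions with & and wrap each one in parentheses.
match = df.loc[(df["att1"] == "012") & (df["code"] == "2A")]
```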

Matplotlib

import matplotlib.pyplot as plt

  • Change where y range starts in matplotlib: plt.ylim(bottom = x)

  • Rotate the x-axis labels by 90 degrees: this trick helps when your x-axis labels are too long. plt.xticks(rotation=90)

  • Output/Save Plot: plt.savefig('filename.png')

  • Change labels, ticks, …

    Changing the ticks is useful when your x-axis is discrete, like [1, 2, 5, 10], and you want every two neighboring values to be one unit apart instead of, say, 2 and 5 being three units apart.

    plt.xlabel('X axis', fontsize=15)
    plt.ylabel('Y axis', fontsize=15)

    plt.xticks(lst_of_tick_position, labels, color='blue', rotation=60)

    # disabling yticks by setting yticks to an empty list
    plt.yticks([])
  • Different Kinds of Plot:

    • scatter plot: plt.scatter(x,y)

    • histogram: plt.hist(x, bins=num_bins)

    • ordinary line plot:

      x = np.arange(-10,10,0.1)
      y = 2*x
      plt.plot(x,y)
  • reset plot: plt.clf()

  • Plot lines w/ custom line label:

    #plot individual lines with custom colors, styles, and widths
    plt.plot(df['leads'], label='Leads', color='green')
    plt.plot(df['prospects'], label='Prospects', color='steelblue', linewidth=4)
    plt.plot(df['sales'], label='Sales', color='purple', linestyle='dashed')

    plt.legend()
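
The evenly spaced ticks trick described above can be sketched as follows: plot against positions 0, 1, 2, ... and use the real x values only as tick labels. The data is made up, and the Agg backend is selected here only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")                   # non-interactive backend, no window needed
import matplotlib.pyplot as plt

x_values = [1, 2, 5, 10]                # discrete, unevenly spaced x values
y_values = [0.5, 0.7, 0.9, 1.0]
positions = list(range(len(x_values)))  # 0, 1, 2, 3: one unit apart

plt.plot(positions, y_values, label="score", color="steelblue")
plt.xticks(positions, [str(v) for v in x_values], rotation=90)
plt.xlabel("X axis", fontsize=15)
plt.ylim(bottom=0)
plt.legend()

tick_positions = list(plt.gca().get_xticks())
plt.clf()                               # reset the figure afterwards
```

Call plt.savefig('filename.png') before plt.clf() if you want to keep the figure.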

Json

  • Json doesn’t dump UTF-8: when your json output contains escapes like \u2019, it may not be your fault. By default, Python’s json module escapes all non-ASCII characters, even though this is not needed for a UTF-8 file. You can override this with the following command:

    import json

    with open('output.json', 'w', encoding='utf-8') as f:
        json.dump(posts, f, indent=4, ensure_ascii=False)
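
The effect of ensure_ascii is easy to see with json.dumps, which returns a string instead of writing a file. The posts data below is made up for illustration.

```python
import json

posts = [{"title": "It’s a café"}]                 # hypothetical data with non-ASCII text

escaped = json.dumps(posts)                        # default: ensure_ascii=True
readable = json.dumps(posts, ensure_ascii=False)   # keeps the characters as-is

same_data = json.loads(escaped) == json.loads(readable)
```

Both strings decode to the same data; only the on-disk representation differs.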