python | Hexo

Built-in Functions

name	description
abs	Return the absolute value of a number.
all	Return True if all elements of the iterable are true (or if the iterable is empty).
any	Return True if any element of the iterable is true.
ascii	Return a string containing a printable representation of an object.
bin	Convert an integer number to a binary string.
bool	Return a Boolean value, one of True or False.
breakpoint	Drops into the debugger at the call site.
bytearray	Return a new array of bytes: a mutable sequence of integers in the range 0 <= x < 256.
bytes	Return a new bytes object: an immutable sequence of integers in the range 0 <= x < 256.
callable	Return True if the object argument appears callable, False if not.
chr	Return the string representing a character whose Unicode code point is the integer i.
classmethod	Transform a method into a class method, callable on the class as `C.f()`.
compile	Compile the source into a code or AST object, which can then be executed by exec() or eval().
delattr	Deletes the named attribute, provided the object allows it.
dict	Create a new dictionary.
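dir	Without arguments, return the list of names in the current local scope; with an argument, attempt to return a list of valid attributes for that object.
divmod	Take two numbers and return the pair (quotient, remainder) from integer division.
enumerate	Return an enumerate object.
eval	Evaluate a Python expression given as a string or code object.
exec	Dynamic execution of Python code.
filter	Construct an iterator from those elements of the iterable for which the function returns true.
float	Return a floating-point number constructed from a number or string.
format	Convert a value to a formatted representation, as controlled by a format spec.
frozenset	Return a new frozenset object, an immutable set.
getattr	Return the value of the named attribute of an object.
globals	Return the dictionary implementing the current module namespace.
hasattr	Return True if the object has an attribute with the given name, False if not.
hash	Return the hash value of the object (if it has one).
help	Invoke the built-in help system.
hex	Convert an integer number to a lowercase hexadecimal string prefixed with "0x".
id	Return the identity of an object, an integer that is unique and constant during the object's lifetime.
input	Read a line from input, convert it to a string (stripping the trailing newline), and return it.
int	Return an integer object constructed from a number or string.
isinstance	Return True if the object is an instance of the given class (or of a subclass).
issubclass	Return True if the class is a subclass of the given class.
iter	Return an iterator object.
len	Return the length (the number of items) of an object.
list	Create a new list, a mutable sequence.
locals	Return a dictionary representing the current local symbol table.
map	Return an iterator that applies the function to every item of the iterable.
max	Return the largest item in an iterable or the largest of two or more arguments.
memoryview	Return a "memory view" object created from the given argument.
min	Return the smallest item in an iterable or the smallest of two or more arguments.
next	Retrieve the next item from the iterator.
object	Return a new featureless object; object is the base for all classes.
oct	Convert an integer number to an octal string prefixed with "0o".
open	Open a file and return a corresponding file object.
ord	Return the integer Unicode code point of a single character (the inverse of chr).
pow	Return base to the power exp; with a third argument, return the result modulo mod.
print	Print objects to the text stream file, separated by sep and followed by end.
property	Return a property attribute.
range	Create an immutable sequence of integers, commonly used for looping.
repr	Return a string containing a printable representation of an object.
reversed	Return a reverse iterator.
round	Return the number rounded to ndigits precision after the decimal point.
set	Return a new set object.
setattr	Assign a value to the named attribute, provided the object allows it.
slice	Return a slice object representing the set of indices specified by range(start, stop, step).
sorted	Return a new sorted list from the items in the iterable.
staticmethod	Transform a method into a static method.
str	Return a str version of an object.
sum	Sum the items of an iterable from left to right, starting from start, and return the total.
super	Return a proxy object that delegates method calls to a parent or sibling class.
tuple	Create a new tuple, an immutable sequence.
type	With one argument, return the type of an object; with three arguments, return a new type object.
vars	Return the __dict__ attribute of a module, class, instance, or any other object that has one.
zip	Iterate over several iterables in parallel, producing tuples with an item from each one.
__import__	Invoked by the import statement; importlib.import_module() is preferred for programmatic imports.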

Functional Programming Modules

itertools : Functions creating iterators for efficient looping

name	description	example
count	start, start+step, start+2*step, …	count(1) --> 1 2 3 …
cycle	p0, p1, … plast, p0, p1, …	cycle('ABC') --> A B C A B C …
repeat	elem, elem, elem, … endlessly or up to n times	repeat(10, 3) --> 10 10 10
accumulate	p0, p0+p1, p0+p1+p2, …	accumulate([1,2,3,4,5]) --> 1 3 6 10 15
chain	p0, p1, … plast, q0, q1, …	chain('ABC', 'DEF') --> A B C D E F
chain.from_iterable	p0, p1, … plast, q0, q1, …	chain.from_iterable(['AB', 'DE']) --> A B D E
compress	(d[0] if s[0]), (d[1] if s[1]), …	compress('ABCDE', [1,0,1,0,1]) --> A C E
dropwhile	seq[n], seq[n+1], starting when pred fails	dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
filterfalse	elements of seq where pred(elem) is false	filterfalse(lambda x: x%2, range(5)) --> 0 2 4
groupby	sub-iterators grouped by value of key(v)	[k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
islice	elements from seq[start:stop:step]	islice('ABCDEFG', 2, None) --> C D E F G
starmap	func(*seq[0]), func(*seq[1]), …	starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
takewhile	seq[0], seq[1], until pred fails	takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4
tee	it1, it2, … itn splits one iterator into n
zip_longest	(p[0], q[0]), (p[1], q[1]), …	zip_longest('AB', 'x', fillvalue='-') --> Ax B-
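
Several of the iterators in the table can be tried directly. The following is a minimal sketch using only the standard library; the sample data (the `words` list and the tuples passed to `starmap`) is made up for illustration.

```python
from itertools import (accumulate, chain, compress, groupby, islice,
                       starmap, takewhile, zip_longest)

# Running totals: p0, p0+p1, p0+p1+p2, ...
running = list(accumulate([1, 2, 3, 4, 5]))

# Concatenate two iterables into one stream.
letters = list(chain("ABC", "DEF"))

# Keep only the elements whose selector is truthy.
picked = list(compress("ABCDE", [1, 0, 1, 0, 1]))

# groupby only groups *consecutive* items, so sort by the same key first.
words = sorted(["apple", "avocado", "banana", "blueberry", "cherry"])
by_initial = {key: list(group) for key, group in groupby(words, key=lambda w: w[0])}

# islice behaves like slicing, but works on any iterator.
sliced = list(islice("ABCDEFG", 2, None))

# starmap unpacks each tuple into positional arguments: pow(2, 5), pow(3, 2), ...
powers = list(starmap(pow, [(2, 5), (3, 2), (10, 3)]))

# takewhile stops at the first element that fails the predicate.
head = list(takewhile(lambda x: x < 5, [1, 4, 6, 4, 1]))

# zip_longest pads the shorter iterable with fillvalue.
padded = list(zip_longest("AB", "x", fillvalue="-"))

print(running, letters, picked, by_initial, sliced, powers, head, padded, sep="\n")
```

Every call returns a lazy iterator, so `list(...)` is used here only to make the results visible.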

functools : Higher-order functions and operations on callable objects

name	description	example

operator : Standard operators as functions
| name | description | example |
| ---- | :---------- | :------ |
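
Since the functools and operator tables are still empty, here is a small sketch of how the two modules are commonly combined. The functions chosen (`reduce`, `partial`, `lru_cache`, `mul`, `add`, `itemgetter`) are just a representative sample, and the `fib` helper and `pairs` data are hypothetical.

```python
from functools import lru_cache, partial, reduce
from operator import add, itemgetter, mul

# reduce folds a sequence with a binary function;
# operator.mul stands in for lambda x, y: x * y.
product = reduce(mul, [1, 2, 3, 4, 5])

# partial pre-fills some arguments of an existing callable.
add_ten = partial(add, 10)

# lru_cache memoizes results, so each fib(n) is computed only once.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# itemgetter builds a key function that picks a field out of each record.
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_count = sorted(pairs, key=itemgetter(1))

print(product, add_ten(5), fib(30), by_count)
```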

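## IPython

IPython (interactive Python) is an enhanced Python interpreter that aims to provide "tools for the entire life cycle of research computing". If Python is the engine of a data science task, IPython is the interactive control panel. IPython is closely tied to the Jupyter project, which provides a browser-based notebook. Besides executing Python/IPython statements, the Jupyter Notebook lets users add formatted text, static and dynamic visualizations, mathematical formulas, JavaScript widgets, and more.

Accessing documentation with ?

Every Python object holds a reference to a string, known as its docstring, which contains a concise summary of the object and how to use it.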

In [1]: len?
Signature: len(obj, /)
Docstring: Return the number of items in a container.
Type:      builtin_function_or_method

In [2]: l = [1,2,3]

In [3]: l?
Type:        list
String form: [1, 2, 3]
Length:      3
Docstring:
Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list.
The argument must be an iterable if specified.

In [4]: l.append?
Signature: l.append(object, /)
Docstring: Append object to the end of the list.
Type:      builtin_function_or_method
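
The information that `?` displays is also reachable from plain Python. A minimal sketch, assuming a hypothetical `swap` function with a docstring added for illustration:

```python
import inspect

def swap(x, y):
    """Return the two arguments in reversed order."""
    return y, x

# The docstring is stored on the object itself, in __doc__.
print(len.__doc__)
print(swap.__doc__)

# inspect.getdoc cleans up indentation; inspect.signature shows the parameters.
print(inspect.getdoc(swap))
print(inspect.signature(swap))
```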

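Accessing source code with ??

IPython also provides a shortcut to the source code: a double question mark (??), for when you are a little more curious than the docstring can satisfy: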

In [13]: def swap(x, y): return y,x

In [14]: swap??
Signature: swap(x, y)
Docstring: <no docstring>
Source:    def swap(x, y): return y,x
File:      ~/<ipython-input-13-124656e4654a>
Type:      function

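Exploring modules with Tab completion

IPython uses the Tab key to auto-complete and explore the contents of objects, modules, and namespaces.

Wildcard matching

IPython supports wildcard matching of names with the * character.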

In [15]: *Warning?
BytesWarning
DeprecationWarning
FutureWarning
ImportWarning
PendingDeprecationWarning
ResourceWarning
RuntimeWarning
SyntaxWarning
UnicodeWarning
UserWarning
Warning
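
Outside IPython, a similar lookup can be approximated with the standard library. This is only a sketch of the idea, not how IPython implements it: it filters the names in `builtins` with `fnmatch.fnmatchcase`.

```python
import builtins
from fnmatch import fnmatchcase

# Collect every built-in name that ends in "Warning", much like `*Warning?`.
warnings_found = sorted(
    name for name in dir(builtins) if fnmatchcase(name, "*Warning")
)
print(warnings_found)
```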

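command	description
Ctrl + a	Move the cursor to the beginning of the line
Ctrl + e	Move the cursor to the end of the line
Ctrl + b	Move the cursor back one character
Ctrl + f	Move the cursor forward one character
Backspace	Delete the previous character
Ctrl + d	Delete the next character
Ctrl + k	Cut text from the cursor to the end of the line
Ctrl + u	Cut text from the beginning of the line to the cursor
Ctrl + y	Paste (yank) the text that was cut
Ctrl + t	Transpose (swap) the previous two characters
Ctrl + p	Access the previous command in history
Ctrl + n	Access the next command in history
Ctrl + r	Reverse-search through the command history
Ctrl + l	Clear the terminal screen
Ctrl + c	Interrupt the current Python command
Ctrl + d	Exit the IPython session (at an empty prompt)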

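Pasting code blocks: %paste and %cpaste

Running external code: %run

Timing code execution: %timeit

Help on magic functions: ?, %magic, and %lsmagic

IPython's In and Out objects

Underscore shortcuts and previous outputs

Timing code snippets: %timeit and %time

Profiling full scripts: %prun

Line-by-line profiling with %lprun

Profiling memory use with %memit and %mprun

Upgrading pip & setuptools

python -m pip install -U pip setuptools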

30 Minimal Python Snippets: Have You Picked Up These Tricks?

Checking that all elements are unique

def all_unique(lst):
    return len(lst) == len(set(lst))


x = [1,1,2,2,3,2,3,4,5,6]
y = [1,2,3,4,5]
all_unique(x) # False
all_unique(y) # True

Anagram check

from collections import Counter

def anagram(first, second):
    return Counter(first) == Counter(second)


anagram("abcd3", "3acdb") # True

Memory usage

import sys

variable = 30
print(sys.getsizeof(variable))  # e.g. 28 on 64-bit CPython 3; the value is implementation-dependent

Byte size

def byte_size(string):
    return len(string.encode('utf-8'))


byte_size('😀') # 4
byte_size('Hello World') # 11

Printing a string N times

n = 2
s = "Programming"

print(s * n)
# ProgrammingProgramming

Capitalizing the first letter of each word

s = "programming is awesome"

print(s.title())
# Programming Is Awesome

Chunking a list

from math import ceil

def chunk(lst, size):
    return list(
        map(lambda x: lst[x * size:x * size + size],
            list(range(0, ceil(len(lst) / size)))))



chunk([1,2,3,4,5],2)
# [[1, 2], [3, 4], [5]]

Compacting (removing falsy values)

def compact(lst):
    return list(filter(bool, lst))


compact([0, 1, False, 2, '', 3, 'a', 's', 34])
# [ 1, 2, 3, 'a', 's', 34 ]

Unpacking

array = [['a', 'b'], ['c', 'd'], ['e', 'f']]
transposed = list(zip(*array))  # zip returns a lazy iterator in Python 3
print(transposed)
# [('a', 'c', 'e'), ('b', 'd', 'f')]

Chained comparison

a = 3
print( 2 < a < 8) # True
print(1 == a < 2) # False

Joining with commas

hobbies = ["basketball", "football", "swimming"]

print("My hobbies are: " + ", ".join(hobbies))
# My hobbies are: basketball, football, swimming

Counting vowels

import re

def count_vowels(text):
    return len(re.findall(r'[aeiou]', text, re.IGNORECASE))

count_vowels('foobar') # 3
count_vowels('gym') # 0

Decapitalizing the first letter

def decapitalize(string):
    return string[:1].lower() + string[1:]


decapitalize('FooBar') # 'fooBar'

Flattening a nested list

def spread(arg):
    ret = []
    for i in arg:
        if isinstance(i, list):
            ret.extend(i)
        else:
            ret.append(i)
    return ret

def deep_flatten(lst):
    result = []
    result.extend(
        spread(list(map(lambda x: deep_flatten(x) if type(x) == list else x, lst))))
    return result


deep_flatten([1, [2], [[3], 4], 5]) # [1,2,3,4,5]
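
As an alternative to the `spread`/`deep_flatten` pair, the same result can be produced lazily with a recursive generator. This is a sketch of a different technique, not a rewrite of the original; the name `iter_flatten` is made up, and like the original it only descends into `list` instances.

```python
def iter_flatten(items):
    """Yield the leaves of arbitrarily nested lists, left to right."""
    for item in items:
        if isinstance(item, list):
            # Recurse into the sub-list and pass its leaves through.
            yield from iter_flatten(item)
        else:
            yield item

flat = list(iter_flatten([1, [2], [[3], 4], 5]))
print(flat)  # [1, 2, 3, 4, 5]
```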

List difference

def difference(a, b):
    set_a = set(a)
    set_b = set(b)
    comparison = set_a.difference(set_b)
    return list(comparison)


difference([1,2,3], [1,2,4]) # [3]

Difference by function

def difference_by(a, b, fn):
    b = set(map(fn, b))
    return [item for item in a if fn(item) not in b]


from math import floor
difference_by([2.1, 1.2], [2.3, 3.4],floor) # [1.2]
difference_by([{ 'x': 2 }, { 'x': 1 }], [{ 'x': 1 }], lambda v : v['x'])
# [{'x': 2}]

Chained function call

def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

a, b = 4, 5
print((subtract if a > b else add)(a, b)) # 9

Checking for duplicates

def has_duplicates(lst):
    return len(lst) != len(set(lst))


x = [1,2,3,4,5,5]
y = [1,2,3,4,5]
has_duplicates(x) # True
has_duplicates(y) # False

Merging two dictionaries

def merge_two_dicts(a, b):
    c = a.copy()   # make a copy of a
    c.update(b)    # overwrite keys and values in the copy with the ones from b
    return c


a = { 'x': 1, 'y': 2}
b = { 'y': 3, 'z': 4}
print(merge_two_dicts(a, b))
# {'x': 1, 'y': 3, 'z': 4}
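
On Python 3.5 and later the same merge can be written with dictionary unpacking. A minimal sketch using the same sample dictionaries:

```python
a = {'x': 1, 'y': 2}
b = {'y': 3, 'z': 4}

# Unpack both dictionaries into a new one; for duplicate keys the later value wins.
merged = {**a, **b}
print(merged)  # {'x': 1, 'y': 3, 'z': 4}
```

Neither input is modified. Python 3.9 and later also accept `a | b` for the same purpose.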

Converting two lists into a dictionary

def to_dictionary(keys, values):
    return dict(zip(keys, values))


keys = ["a", "b", "c"]    
values = [2, 3, 4]
print(to_dictionary(keys, values))
# {'a': 2, 'b': 3, 'c': 4}

Using enumerate

items = ["a", "b", "c", "d"]
for index, element in enumerate(items):
    print("Value", element, "Index", index)

# Value a Index 0
# Value b Index 1
# Value c Index 2
# Value d Index 3

Execution time

import time

start_time = time.time()

a = 1
b = 2
c = a + b
print(c) #3

end_time = time.time()
total_time = end_time - start_time
print("Time:", total_time)

# Time: 1.1205673217773438e-05 (the exact value varies from run to run)
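
For very short operations, `time.time()` can be too coarse. A hedged alternative is `time.perf_counter()` for a single run, or the `timeit` module for an average over many runs; the `work` function below is a hypothetical placeholder for the code being measured.

```python
import time
import timeit

def work():
    # Placeholder workload: sum the integers 0..999.
    return sum(range(1000))

# perf_counter is a high-resolution clock intended for measuring intervals.
start = time.perf_counter()
result = work()
elapsed = time.perf_counter() - start
print("single run:", elapsed)

# timeit calls the function repeatedly and returns the total time in seconds.
total = timeit.timeit(work, number=1000)
print("average per call:", total / 1000)
```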


***

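**Pandas Tips**

*Randomly sample a small fraction of the data and load it into memory for a quick first look.*

Pass a callable to the skiprows parameter: x > 0 ensures the header row is always read, and np.random.rand() > 0.01 randomly skips about 99% of the remaining rows, so each row has only a 1% chance of being loaded into memory.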

import pandas as pd
import numpy as np
df = pd.read_csv("big_data.csv", skiprows=lambda x: x > 0 and np.random.rand() > 0.01)
print("The shape of the df is {}. It has been reduced about 100 times!".format(df.shape))
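
To try the sampling trick without a large file on disk, the sketch below builds a hypothetical 10,000-row CSV in memory and uses a seeded NumPy generator so the sample is reproducible. The column names `id` and `value` and the seed are assumptions for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for big_data.csv: a header plus 10,000 data rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10_000))

rng = np.random.default_rng(seed=0)  # seeded so repeated runs pick the same rows

# Row 0 is the header and is always kept;
# every other row is skipped with probability 0.99.
sample = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i > 0 and rng.random() > 0.01,
)
print(sample.shape)
```

Because rows are dropped while the file is parsed, the full dataset never has to fit in memory.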

Use the replace method with a regular expression on a column to clean its values quickly.
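
A minimal sketch of that idea, assuming a hypothetical `price` column with inconsistent formatting; `Series.replace` with `regex=True` treats the pattern as a regular expression.

```python
import pandas as pd

# Hypothetical price column with currency symbols, thousands separators, and spaces.
df = pd.DataFrame({"price": ["$1,200", "USD 350", " 99 ", "$4,050.50"]})

# Strip everything except digits and the decimal point, then convert to float.
df["price_clean"] = df["price"].replace(r"[^\d.]", "", regex=True).astype(float)
print(df)
```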
