List - A Complete Guide

Python's list Data Type: A Deep Dive With Examples – Real Python

Characteristics of Lists

  • Ordered: They contain elements or items that are sequentially arranged according to their specific insertion order.

  • Zero-based: They allow you to access their elements by indices that start from zero.

  • Mutable: They support in-place mutations or changes to their contained elements.

  • Heterogeneous: They can store objects of different types.

  • Growable and dynamic: They can grow or shrink dynamically, which means that they support the addition, insertion, and removal of elements.

  • Nestable: They can contain other lists, so you can have lists of lists.

  • Iterable: They support iteration, so you can traverse them using a loop or comprehension while you perform operations on each of their elements.

  • Sliceable: They support slicing operations, meaning that you can extract a series of elements from them.

  • Combinable: They support concatenation operations, so you can combine two or more lists using the concatenation operators.

  • Copyable: They allow you to make copies of their content using various techniques.

Ordered

Lists are ordered, which means that they keep their elements in the order of insertion:

colors = [
    "red",
    "orange",
    "yellow",
    "green",
    "blue",
    "indigo",
    "violet"]

colors  #=['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

Access Items by Index (Position)

You can access an individual object in a list by its position, or index, in the sequence. Indices start from zero:

>>> colors[0]          #= 'red'
>>> colors[1]          #= 'orange'
>>> colors[2]          #= 'yellow'
>>> colors[3]          #= 'green'

heterogeneous

Lists can contain objects of different types. That’s why lists are heterogeneous collections:

[42, "apple", True, {"name": "John Doe"}, (1, 2, 3), [3.14, 2.78]]

Constructing Lists

Creating Lists Through Literals

[item_0, item_1, ..., item_n]
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
fruits = ["apple", "banana", "orange", "kiwi", "grape"]
cities = [
    "New York",
    "Los Angeles",
    "Chicago",
    "Houston",
    "Philadelphia"
]
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
inventory = [
    {"product": "phone", "price": 1000, "quantity": 10},
    {"product": "laptop", "price": 1500, "quantity": 5},
    {"product": "tablet", "price": 500, "quantity": 20}
]
functions = [print, len, range, type, enumerate]
empty = []

list() Constructor

list([iterable])
>>> list((0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list({"circle", "square", "triangle", "rectangle", "pentagon"})
['square', 'rectangle', 'triangle', 'pentagon', 'circle']

>>> list({"name": "John", "age": 30, "city": "New York"}.items())
[('name', 'John'), ('age', 30), ('city', 'New York')]

>>> list("Pythonista")
['P', 'y', 't', 'h', 'o', 'n', 'i', 's', 't', 'a']

>>> list()
[]
def fibonacci_generator(stop):
    current_fib, next_fib = 0, 1
    for _ in range(0, stop):
        fib_number = current_fib
        current_fib, next_fib = next_fib, current_fib + next_fib
        yield fib_number

fibonacci_generator(10)       #= <generator object fibonacci_generator at 0x10692f3d0>

list(fibonacci_generator(10))  #= [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

[*fibonacci_generator(10)]     #= [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Building Lists With List Comprehensions

[expression(item) for item in iterable]

Every list comprehension needs at least three components:

  1. expression() is a Python expression that returns a concrete value, and most of the time, that value depends on item. Note that it doesn’t have to be a function.

  2. item is the current object from iterable.

  3. iterable can be any Python iterable object, such as a list, tuple, set, string, or generator.

>>> [number ** 2 for number in range(1, 11)]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Accessing Items in a List: Indexing

Syntax:

list_object[index]

Example:

>>> languages = ["Python", "Java", "JavaScript", "C++", "Go", "Rust"]
>>> languages[0]    #=  'Python'
>>> languages[1]    #=  'Java'
>>> languages[2]    #=  'JavaScript'
>>> languages[3]    #=  'C++'
>>> languages[4]    #=  'Go'
>>> languages[5]    #=  'Rust'

Item:   "Python"  "Java"  "JavaScript"  "C++"  "Go"  "Rust"
Index:  0         1       2             3      4     5

>>> len(languages)  #= 6
>>> languages[6]
Traceback (most recent call last):
    ...
IndexError: list index out of range

>>> languages[-1]    #=  'Rust'
>>> languages[-2]    #=  'Go'
>>> languages[-3]    #=  'C++'
>>> languages[-4]    #=  'JavaScript'
>>> languages[-5]    #=  'Java'
>>> languages[-6]    #=  'Python'

Item:   "Python"  "Java"  "JavaScript"  "C++"  "Go"  "Rust"
Index:  -6        -5      -4            -3     -2    -1

>>> languages[-7]
Traceback (most recent call last):
    ...
IndexError: list index out of range

Nested sequences (here, a list of tuples):

>>> employees = [
...     ("John", 30, "Software Engineer"),
...     ("Alice", 25, "Web Developer"),
...     ("Bob", 45, "Data Analyst"),
...     ("Mark", 22, "Intern"),
...     ("Samantha", 36, "Project Manager")
... ]
list_of_sequences[index_0][index_1]...[index_n]
>>> employees[1][0]    #= 'Alice'
>>> employees[1][1]    #= 25
>>> employees[1][2]    #= 'Web Developer'

List of dicts:

>>> employees = [
...     {"name": "John", "age": 30, "job": "Software Engineer"},
...     {"name": "Alice", "age": 25, "job": "Web Developer"},
...     {"name": "Bob", "age": 45, "job": "Data Analyst"},
...     {"name": "Mark", "age": 22, "job": "Intern"},
...     {"name": "Samantha", "age": 36, "job": "Project Manager"}
... ]

>>> employees[3]["name"]    #=  'Mark'
>>> employees[3]["age"]     #=  22
>>> employees[3]["job"]     #=  'Intern'

Retrieving Multiple Items From a List: Slicing

list_object[start:stop:step]
  • start specifies the index at which you want to start the slicing. The resulting slice includes the item at this index.

  • stop specifies the index at which you want the slicing to stop extracting items. The resulting slice doesn’t include the item at this index.

  • step provides an integer value representing how many items the slicing will skip on each step. The resulting slice won’t include the skipped items.

Index    Default Value
start    0
stop     len(list_object)
step     1

letters = ["A", "a", "B", "b", "C", "c", "D", "d"]
upper_letters = letters[0::2] # Or [::2]
upper_letters                 #=  ['A', 'B', 'C', 'D']
lower_letters = letters[1::2]  #= ['a', 'b', 'c', 'd']
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
first_three = digits[:3]    #= [0, 1, 2]
middle_four = digits[3:7]   #= [3, 4, 5, 6]  
last_three = digits[-3:]    #= [7, 8, 9]
every_other = digits[::2]   #= [0, 2, 4, 6, 8]
every_three = digits[::3]   #= [0, 3, 6, 9]               

slice() syntax

slice(start, stop, step)
letters = ["A", "a", "B", "b", "C", "c", "D", "d"]
upper_letters = letters[slice(0, None, 2)]
upper_letters           #= ['A', 'B', 'C', 'D']
lower_letters = letters[slice(1, None, 2)]
lower_letters           #= ['a', 'b', 'c', 'd']

You can pass None to slice() to use the default value for that argument.

Handling start and stop values that are out of range:

>>> colors = [
...     "red",
...     "orange",
...     "yellow",
...     "green",
...     "blue",
...     "indigo",
...     "violet"
... ]

len(colors) #= 7
colors[-8:] #= ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
colors[8:]  #= []
colors[:8]  #= ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
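
Out-of-range slice bounds are clamped to the list's length instead of raising IndexError. As a small sketch, slice.indices() reveals the concrete bounds that end up being applied; the variable names low, high, and message are illustrative only:

```python
colors = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]

# slice.indices(length) returns the (start, stop, step) actually applied
# to a sequence of the given length.
low = slice(-8, None).indices(len(colors))   # start is clamped up to 0
high = slice(8, None).indices(len(colors))   # start is clamped down to 7

print(low)    # (0, 7, 1)
print(high)   # (7, 7, 1)

# Indexing, unlike slicing, is not forgiving.
try:
    colors[8]
except IndexError as error:
    message = str(error)
print(message)  # list index out of range
```

This is why colors[8:] quietly returns an empty list while colors[8] fails.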

Copy of a List

Aliases of a List

countries = ["United States", "Canada", "Poland", "Germany", "Austria"]
nations = countries             # alias
id(countries) == id(nations)    #= True
countries[0] = "United States of America"
nations                         #= ['United States of America', 'Canada', 'Poland', 'Germany', 'Austria']

An alias and the original are two different names for the same object.

Shallow Copy of a List

  1. The slicing operator, [:]

  2. The .copy() method

  3. The copy() function from the copy module

The [:] slice

countries = ["United States", "Canada", "Poland", "Germany", "Austria"]
nations = countries[:]
nations                         #= ['United States', 'Canada', 'Poland', 'Germany', 'Austria']
id(countries) == id(nations)    #= False

However, the elements in nations are aliases of the elements in countries:

id(nations[0]) == id(countries[0])  #= True
id(nations[1]) == id(countries[1])  #= True

How does this affect the behavior of both lists? When you modify one of them, does the other change too?

countries[0] = "United States of America"
countries   #= ['United States of America', 'Canada', 'Poland', 'Germany', 'Austria']
nations     #= ['United States', 'Canada', 'Poland', 'Germany', 'Austria']
id(countries[0]) == id(nations[0])  #= False
id(countries[1]) == id(nations[1])  #= True

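No, the other list doesn't change. Assigning to an index rebinds only that list's slot, so the items of a shallow copy can be reassigned independently.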

.copy() method

countries = ["United States", "Canada", "Poland", "Germany", "Austria"]
nations = countries.copy()          #///
nations     #= ['United States', 'Canada', 'Poland', 'Germany', 'Austria']
id(countries) == id(nations)        #= False
id(countries[0]) == id(nations[0])  #= True
id(countries[1]) == id(nations[1])  #= True
countries[0] = "United States of America"
countries   #= ['United States of America', 'Canada', 'Poland', 'Germany', 'Austria']
nations     #= ['United States', 'Canada', 'Poland', 'Germany', 'Austria']

copy.copy()

from copy import copy
countries = ["United States", "Canada", "Poland", "Germany", "Austria"]
nations = copy(countries)
nations     #= ['United States', 'Canada', 'Poland', 'Germany', 'Austria']
id(countries) == id(nations)        #= False
id(countries[0]) == id(nations[0])  #= True
id(countries[1]) == id(nations[1])  #= True
countries[0] = "United States of America"
countries   #= ['United States of America', 'Canada', 'Poland', 'Germany', 'Austria']
nations     #= ['United States', 'Canada', 'Poland', 'Germany', 'Austria']

Deep Copies of a List

When you create a deep copy of a list, Python constructs a new list object and then inserts copies of the objects from the original list recursively.

deep copy

from copy import deepcopy
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matrix_copy = deepcopy(matrix)
id(matrix) == id(matrix_copy)               #= False
id(matrix[0]) == id(matrix_copy[0])         #= False
id(matrix[1]) == id(matrix_copy[1])         #= False

At the second level of a 2D list, a shallow copy still holds only aliases, so changes show through in the original:

from copy import copy
matrix_copy = copy(matrix)
matrix_copy[0][0] = 100
matrix_copy[0][1] = 200
matrix_copy[0][2] = 300
matrix_copy     #= [[100, 200, 300], [4, 5, 6], [7, 8, 9]]
matrix          #= [[100, 200, 300], [4, 5, 6], [7, 8, 9]]

With a deep copy, the copy and the original are independent even at the second level of a 2D list:

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matrix_copy = deepcopy(matrix)
matrix_copy[0][0] = 100
matrix_copy[0][1] = 200
matrix_copy[0][2] = 300
matrix_copy     #= [[100, 200, 300], [4, 5, 6], [7, 8, 9]]
matrix          #= [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Finally, it’s important to note that when you have a list containing immutable objects, such as numbers, strings, or tuples, the behavior of deepcopy() mimics what copy() does:

countries = ["United States", "Canada", "Poland", "Germany", "Austria"]
nations = deepcopy(countries)
id(countries) == id(nations)        #= False
id(countries[0]) == id(nations[0])  #= True
id(countries[1]) == id(nations[1])  #= True

Extension: what happens if the list's items are dicts?
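
For a list whose items are dicts, the outcome follows the same rule as for nested lists, because dicts are mutable too. A minimal sketch, using made-up employee records:

```python
from copy import copy, deepcopy

employees = [{"name": "John", "age": 30}, {"name": "Alice", "age": 25}]

shallow = copy(employees)
deep = deepcopy(employees)

# The shallow copy shares the dict objects with the original list.
shallow[0]["age"] = 31
# The deep copy owns independent dicts.
deep[1]["age"] = 99

print(employees[0]["age"])   # 31: the change made through the shallow copy is visible
print(employees[1]["age"])   # 25: the change made through the deep copy is not
print(shallow[0] is employees[0], deep[0] is employees[0])  # True False
```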

Updating Items in Lists: Index Assignments

Updating an item by position: list_object[index] = new_value

numbers = [1, 2, 3, 4]
numbers[0] = "one"          #= ['one', 2, 3, 4]
numbers[1] = "two"          #= ['one', 'two', 3, 4]
numbers[-1] = "four"        #= ['one', 'two', 3, 'four']                 
numbers[-2] = "three"       #= ['one', 'two', 'three', 'four']                    

Updating an item by value, with .index(value)

fruits = ["apple", "banana", "orange", "kiwi", "grape"]
fruits[fruits.index("kiwi")] = "mango"
fruits                      #= ['apple', 'banana', 'orange', 'mango', 'grape']

Updating a slice: list_object[start:stop:step] = iterable

numbers = [1, 2, 3, 4, 5, 6, 7]
numbers[1:4] = [22, 33, 44]
numbers                     #= [1, 22, 33, 44, 5, 6, 7]

Growing the list:

numbers = [1, 5, 6, 7]
numbers[1:1] = [2, 3, 4]
numbers                     #= [1, 2, 3, 4, 5, 6, 7]

Shrinking the list:

numbers = [1, 2, 0, 0, 0, 0, 4, 5, 6, 7]
numbers[2:6] = [3]
numbers                     #= [1, 2, 3, 4, 5, 6, 7]

Growing and Shrinking Lists Dynamically

.append(single_value)

pets = ["cat", "dog"]
pets.append("parrot")       #= ['cat', 'dog', 'parrot']
pets.append("gold fish")    #= ['cat', 'dog', 'parrot', 'gold fish']
pets.append("python")       #= ['cat', 'dog', 'parrot', 'gold fish', 'python']

An equivalent, but harder to use, approach:

>>> pets[len(pets):] = ["hawk"]
>>> pets                #= ['cat', 'dog', 'parrot', 'gold fish', 'python', 'hawk']

Does .append() work with a list argument?

>>> pets.append(["hamster", "turtle"])
>>> pets
[   'cat',
    'dog',
    'parrot',
    'gold fish',
    'python',
    'hawk',
    ['hamster', 'turtle']
]

That's not the result we wanted: the whole list was appended as a single nested item.

.extend(another_list)

fruits = ["apple", "pear", "peach"]
fruits.extend(["orange", "mango", "banana"])
fruits   #= ['apple', 'pear', 'peach', 'orange', 'mango', 'banana']

An equivalent approach:

fruits = ["apple", "pear", "peach"]
fruits[len(fruits):] = ["orange", "mango", "banana"]
fruits   #= ['apple', 'pear', 'peach', 'orange', 'mango', 'banana']

.insert(position, value)

letters = ["A", "B", "F", "G"]
letters.insert(2, "C")  #= ['A', 'B', 'C', 'F', 'G']
letters.insert(3, "D")  #= ['A', 'B', 'C', 'D', 'F', 'G']
letters.insert(4, "E")  #= ['A', 'B', 'C', 'D', 'E', 'F', 'G']

Implementing insert with slice assignment:

list_object[index:index] = [item]

Deleting Items From a List

Method and description:

  • .remove(item): Removes the first occurrence of item from the list. It raises a ValueError if there’s no such item.

  • .pop([index]): Removes the item at index and returns it to the caller. If you don’t provide a target index, then .pop() removes and returns the last item in the list. Note that the square brackets around index mean that the argument is optional. The brackets aren’t part of the syntax.

  • .clear(): Removes all items from the list.

.remove(item)

sample = [12, 11, 10, 42, 14, 12, 42]
sample.remove(42) #= [12, 11, 10, 14, 12, 42]
sample.remove(42) #= [12, 11, 10, 14, 12]
sample.remove(42)
#==
# Traceback (most recent call last):
#     ...
# ValueError: list.remove(x): x not in list

.pop(index)

to_visit = [
    "https://realpython.com",
    "https://python.org",
    "https://stackoverflow.com",
]

visited = to_visit.pop()    #= 'https://stackoverflow.com'
to_visit                    #= ['https://realpython.com', 'https://python.org']
visited = to_visit.pop(0)   #= 'https://realpython.com'
to_visit                    #= ['https://python.org']
visited = to_visit.pop(-1)  #= 'https://python.org'
to_visit                    #= []

.clear()

cache = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
cache.clear()               #= []

An equivalent of .clear() using slice assignment:

cache = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
cache[:] = []               #= []

del lst[index]

colors = [
    "red",
    "orange",
    "yellow",
    "green",
    "blue",
    "indigo",
    "violet"
]

del colors[1]   #= ['red', 'yellow', 'green', 'blue', 'indigo', 'violet']
del colors[-1]  #= ['red', 'yellow', 'green', 'blue', 'indigo']
del colors[2:4] #= ['red', 'yellow', 'indigo']
del colors[:]   #= []

Performance While Growing Lists

When you create a list, Python allocates enough space to store the provided items. It also allocates extra space to host future items. When you use the extra space by adding new items to that list with .append(), .extend(), or .insert(), Python automatically creates room for additional new items.

A clever over-allocation strategy:

from sys import getsizeof
numbers = []
for value in range(100):
    print(getsizeof(numbers))
    numbers.append(value)
#==
# 56
# 88
# 88
# 88
# 88
# 120
# 120
# 120
# 120
# 184
# 184
# ...

The size appears to grow in steps of 32, 32, 64, ... bytes. The exact figures depend on the Python version and platform.

Concatenating and Repeating Lists

Concatenating Lists

[0, 1, 2, 3] + [4, 5, 6] + [7, 8, 9]    #= [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Whenever you use the concatenation operator, you get a new list object as a result:

digits = [0, 1, 2, 3, 4, 5]
id(digits)                     #= 4558758720
digits = digits + [6, 7, 8, 9]
id(digits)                     #= 4470412224

Both operands of + must be of the same type:

[0, 1, 2, 3, 4, 5] + (6, 7, 8, 9)
#==
# Traceback (most recent call last):
#     ...
# TypeError: can only concatenate list (not "tuple") to list

In-place concatenation with +=

digits = [0, 1, 2, 3, 4, 5]
id(digits)                   #= 
digits += [6, 7, 8, 9]
digits                       #= [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
id(digits)

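How does += differ from = combined with +? The += operator extends the existing list in place, so its identity stays the same: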

digits = [0, 1, 2, 3, 4, 5]
id(digits)                  #= 4699578112
digits += [6, 7, 8, 9]
id(digits)                  #= 4699578112
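
One more difference is worth a quick sketch: += accepts any iterable, much like .extend(), whereas + insists on another list. The names original_id and plus_rejects_tuple below are illustrative only:

```python
digits = [0, 1, 2, 3, 4, 5]
original_id = id(digits)

# += behaves like .extend(): any iterable is accepted, and the list is
# updated in place.
digits += (6, 7)
digits += range(8, 10)

print(digits)                      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(id(digits) == original_id)   # True

# The plain + operator still requires a list on both sides.
try:
    digits + (10,)
except TypeError:
    plus_rejects_tuple = True
print(plus_rejects_tuple)          # True
```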

Repeating the Content of a List

list * n

["A", "B", "C"] * 3         #= ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
3 * ["A", "B", "C"]         #= ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']

The order of the operands doesn't matter for repetition.

list *= n

>>> letters = ["A", "B", "C"]
>>> letters *= 3
>>> letters
['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']

Reversing and Sorting Lists

reversed()

digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
reversed(digits)        #= <list_reverseiterator object at 0x10b261a50>
list(reversed(digits))  #= [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
digits                  #= [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

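Note that reversed() doesn't return the result right away. It returns a lazy iterator that reads from the list as you consume it (an easy trap to fall into):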

numbers = [1, 2, 3]
reversed_numbers = reversed(numbers)
next(reversed_numbers)  #= 3
numbers[1] = 222
next(reversed_numbers)  #= 222
next(reversed_numbers)  #= 1

Open question: is reversed() implemented with yield? (To be verified.)
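
One way to check the yield question is to compare the object against the generator type. The sketch below assumes CPython, where the type name matches the repr shown earlier:

```python
import types

numbers = [1, 2, 3]
reversed_numbers = reversed(numbers)

# A generator function would produce a types.GeneratorType instance.
print(type(reversed_numbers).__name__)                      # list_reverseiterator
print(isinstance(reversed_numbers, types.GeneratorType))    # False

# It still follows the iterator protocol, so it can be consumed only once.
print(list(reversed_numbers))   # [3, 2, 1]
print(list(reversed_numbers))   # []
```

So the result is a dedicated iterator type rather than a generator, although it is just as lazy.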

.reverse()

digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
digits.reverse()
digits                  #= [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

.reverse() returns None:

digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
reversed_digits = digits.reverse()
reversed_digits is None             #= True

An equivalent using slicing, except that it returns a new list instead of reversing in place:

digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
digits[::-1]            #= [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

sorted()

sorted() on a list of numbers:

numbers = [2, 9, 5, 1, 6]
sorted(numbers)         #= [1, 2, 5, 6, 9]
numbers                 #= [2, 9, 5, 1, 6]

sorted() on a list of strings:

words = ["Hello,", "World!", "I", "am", "a", "Pythonista!"]
sorted(words)           #= ['Hello,', 'I', 'Pythonista!', 'World!', 'a', 'am']

https://realpython.com/python-sort/

Can sorted() handle a list of mixed types?

numbers = [2, "9", 5, "1", 6]
sorted(numbers)
#==
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: '<' not supported between instances of 'str' and 'int'

sorted() in reverse order

numbers = [2, 9, 5, 1, 6]
sorted(numbers, reverse=True)   #= [9, 6, 5, 2, 1]

Application: computing the median

def median(samples):
    n = len(samples)
    middle_index = n // 2
    sorted_samples = sorted(samples)
    # Odd number of values
    if n % 2:
        return sorted_samples[middle_index]
    # Even number of values
    lower, upper = middle_index - 1, middle_index + 1
    return sum(sorted_samples[lower:upper]) / 2

median([3, 5, 1, 4, 2])         #= 3
median([3, 5, 1, 4, 2, 6])      #= 3.5

sorted() with a lambda key

Sorting by the second column (age):

employees = [
    ("John", 30, "Designer", 75000),
    ("Jane", 28, "Engineer", 60000),
    ("Bob", 35, "Analyst", 50000),
    ("Mary", 25, "Service", 40000),
    ("Tom", 40, "Director", 90000)
]

sorted(employees, key=lambda employee: employee[1])
[
    ('Mary', 25, 'Service', 40000),
    ('Jane', 28, 'Engineer', 60000),
    ('John', 30, 'Designer', 75000),
    ('Bob', 35, 'Analyst', 50000),
    ('Tom', 40, 'Director', 90000)
]

.sort()

numbers = [2, 9, 5, 1, 6]
numbers.sort()
numbers                     #= [1, 2, 5, 6, 9]

.sort() returns None:

numbers = [2, 9, 5, 1, 6]
sorted_numbers = numbers.sort()
sorted_numbers is None      #= True

Traversing Lists

A plain for loop suits one-pass processing that doesn't modify the list:

colors = [ "red", "orange", "yellow", "green", "blue", "indigo", "violet"]

for color in colors:
    print(color)
#==
# red
# orange
# ...
# violet

A range(len(...)) loop suits modifying items and processing them by index:

for i in range(len(colors)):
    print(colors[i])
#==
# red
# orange
# ...
# violet

enumerate() suits counting while you process items, without modifying the original list:

for i, color in enumerate(colors):
    print(f"{i} is the index of '{color}'")
#==
# 0 is the index of 'red'
# 1 is the index of 'orange'
# ...
# 6 is the index of 'violet'

Traversing in reverse:

for color in reversed(colors):
    print(color)

#==
# violet
# indigo
# blue
# green
# yellow
# orange
# red

Processing items in sorted order:

numbers = [2, 9, 5, 1, 6]
for number in sorted(numbers):
    print(number)

#==
# 1
# 2
# 5
# 6
# 9

zip() traverses several lists at the same time:

integers = [1, 2, 3]
letters = ["a", "b", "c"]
floats = [4.0, 5.0, 6.0]
for i, l, f in zip(integers, letters, floats):
    print(i, l, f)

#==
# 1 a 4.0
# 2 b 5.0
# 3 c 6.0

Exercise: how can you traverse several lists at once while also counting?
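
One possible answer to the counting question is to wrap zip() in enumerate(). This is only a sketch; the rows list and the start=1 argument are illustrative choices:

```python
integers = [1, 2, 3]
letters = ["a", "b", "c"]
floats = [4.0, 5.0, 6.0]

rows = []
# zip() yields one tuple per position; enumerate() numbers those tuples.
# The inner parentheses unpack each tuple; start=1 is optional.
for count, (i, l, f) in enumerate(zip(integers, letters, floats), start=1):
    rows.append(f"{count}: {i} {l} {f}")

print(rows)  # ['1: 1 a 4.0', '2: 2 b 5.0', '3: 3 c 6.0']
```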

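The Problem of Removing Items While Traversing

Goal: remove the odd values from the list. The following approach is wrong, because each removal shifts the remaining items to the left and the loop then skips the next one: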

numbers = [2, 9, 5, 1, 6]
for number in numbers:
    if number % 2:
        numbers.remove(number)
        
numbers    #= [2, 5, 6]

The usual fix is to iterate over a copy of the list:

numbers = [2, 9, 5, 1, 6]
for number in numbers[:]:  # Iterate over a shallow copy
    if number % 2:
        numbers.remove(number)
        
numbers    #= [2, 6]
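
An alternative to iterating over a copy is to walk the indices from the end and delete as you go. A minimal sketch of that idea:

```python
numbers = [2, 9, 5, 1, 6]

# Walk the indices from last to first. Deleting an item only shifts the
# items after it, which this loop has already visited.
for index in range(len(numbers) - 1, -1, -1):
    if numbers[index] % 2:
        del numbers[index]

print(numbers)  # [2, 6]
```

This avoids the extra copy, at the cost of index-based code that is a little harder to read.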

Another approach: count down from the last index and delete items as you go.

A good practice for modifying list elements during iteration is to build a new list, which suits a wider range of scenarios:

numbers_as_strings = ["2", "9", "5", "1", "6"]
numbers_as_integers = []
for number in numbers_as_strings:
    numbers_as_integers.append(int(number))
numbers_as_integers     #= [2, 9, 5, 1, 6]

Building New Lists With Comprehensions

For example, a simple type conversion:

numbers = ["2", "9", "5", "1", "6"]
for i, number in enumerate(numbers):
    numbers[i] = int(number)
numbers                 #= [2, 9, 5, 1, 6]

The preferred way to write it:

numbers = ["2", "9", "5", "1", "6"]
numbers = [int(number) for number in numbers]
numbers                 #= [2, 9, 5, 1, 6]

comprehension with if

integers = [20, 31, 52, 6, 17, 8, 42, 55]
even_numbers = [number for number in integers if number % 2 == 0]
even_numbers            #= [20, 52, 6, 8, 42]

Processing Lists With Functional Tools

map(fun, list)

numbers = ["2", "9", "5", "1", "6"]
numbers = list(map(int, numbers))
numbers                 #= [2, 9, 5, 1, 6]

filter(fun, list)

integers = [20, 31, 52, 6, 17, 8, 42, 55]
even_numbers = list(filter(lambda number: number % 2 == 0, integers))
even_numbers            #= [20, 52, 6, 8, 42]

When to use list comprehension

Exploring Other Features of Lists

Finding Items in a List

Membership tests with in and not in:

usernames = ["john", "jane", "bob", "david", "eve"]
"linda" in usernames        #= False
"linda" not in usernames    #= True
"bob" in usernames          #= True
"bob" not in usernames      #= False
usernames = ["john", "jane", "bob", "david", "eve"]
usernames.index("eve")      #= 4

usernames.index("linda")
#==
# Traceback (most recent call last):
#     ...
# ValueError: 'linda' is not in list

.index()

sample = [12, 11, 10, 50, 14, 12, 50]
sample.index(12)            #= 0
sample.index(50)            #= 3

.count()

sample = [12, 11, 10, 50, 14, 12, 50]
sample.count(12)            #= 2
sample.count(11)            #= 1
sample.count(100)           #= 0

Getting the Length, Maximum, and Minimum of a List

grades = [80, 97, 86, 100, 98, 82]
n = len(grades)
sum(grades) / n             #= 90.5
min([3, 5, 9, 1, -5])       #= -5
max([3, 5, 9, 1, -5])       #= 9

Comparing Lists

[2, 3] == [2, 3]            #= True
[5, 6] != [5, 6]            #= False

[5, 6, 7] < [7, 5, 6]       #= True
[5, 6, 7] > [7, 5, 6]       #= False
[4, 3, 2] <= [4, 3, 2]      #= True
[4, 3, 2] >= [4, 3, 2]      #= True

[5, 6, 7] < [8]             #= True
[5, 6, 7] == [5]            #= False

Do < and > compare only the first items?
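
Not quite. Lists are compared lexicographically: Python compares items pairwise from the left, and the first unequal pair decides the result. A short sketch with made-up values:

```python
# Lists compare item by item; the first unequal pair decides the result.
print([5, 6, 7] < [5, 6, 8])   # True: decided by the third items, 7 < 8
print([5, 9, 0] < [5, 6, 8])   # False: decided by the second items, 9 > 6

# If one list is a prefix of the other, the shorter list is the smaller one.
print([5, 6] < [5, 6, 0])      # True

# So a long list can still be smaller than a short one.
print([5, 6, 7] < [8])         # True: decided by 5 < 8
```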

Common Gotchas of Python Lists

Common mistakes:

  • Confusing aliases of a list with copies: This can cause issues because changes to one alias affect others. Take a look at the Aliases of a List section for practical examples of this issue.

  • Forgetting that most list methods mutate the list in place and return None rather than a new list: This commonly leads to issues when you assign the return value of a list method to a variable, thinking that you have a new list, but you really get None. Check out the Reversing and Sorting Lists section for practical examples of this gotcha.

  • Confusing .append() with .extend(): This can cause issues because .append() adds a single item to the end of the list, while the .extend() method unpacks and adds multiple items. Have a look at the Growing and Shrinking Lists Dynamically section for details on how these methods work.

  • Using an empty list as a default argument value in function definitions: This can lead to unexpected behaviors because default argument values are evaluated only once, when the function is defined, not each time it’s called.

Doesn't the following result look strange?

def append_to(item, target=[]):
    target.append(item)
    return target

append_to(1)    #= [1]
append_to(2)    #= [1, 2]
append_to(3)    #= [1, 2, 3]

The signature clearly says target=[], so shouldn't target simply be a fresh local variable inside the function on every call?!
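
The surprise goes away once you look at where the default lives. As a small sketch, the function's __defaults__ attribute shows that one list object is stored on the function and reused across calls:

```python
def append_to(item, target=[]):
    target.append(item)
    return target

# The default list is created once, when the def statement runs, and is
# stored on the function object.
print(append_to.__defaults__)   # ([],)

first = append_to(1)
second = append_to(2)

print(append_to.__defaults__)   # ([1, 2],)
print(first is second)          # True: both calls returned the same list
```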

The fix:

def append_to(item, target=None):
    if target is None:
        target = []
    target.append(item)
    return target

append_to(1)    #= [1]
append_to(2)    #= [2]
append_to(3)    #= [3]

Subclassing the Built-In list Class

Give list a new feature that computes the average grade:

class GradeList(list):
    def average(self):
        return sum(self) / len(self)

grades = GradeList([80, 97, 86, 100, 98])
grades.append(82)
grades.average()        #= 90.5
grades[0] = 95
grades.average()        #= 93.0

Adding data validation (every item must lie in the range [0, 100]) means overriding many built-in list methods, which is very error-prone:

# grades.py

class GradeList(list):
    def __init__(self, grades):
        grades = [self._validate(grade) for grade in grades]
        super().__init__(grades)

    def __setitem__(self, index, grade):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            grades = [self._validate(item) for item in grade]
            return super().__setitem__(slice(start, stop, step), grades)
        super().__setitem__(index, self._validate(grade))

    def __add__(self, grades):
        grades = [self._validate(grade) for grade in grades]
        grades = super().__add__(grades)
        return self.__class__(grades)

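    def __radd__(self, grades):
        # Reflected addition (other + self) must keep the left operand first.
        grades = [self._validate(grade) for grade in grades]
        return self.__class__(grades + list(self))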

    def __iadd__(self, grades):
        grades = [self._validate(grade) for grade in grades]
        return super().__iadd__(grades)

    def append(self, grade):
        return super().append(self._validate(grade))

    def extend(self, grades):
        grades = [self._validate(grade) for grade in grades]
        return super().extend(grades)

    def average(self):
        return sum(self) / len(self)

    def _validate(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError("grades must be numeric")
        if not (0 <= value <= 100):
            raise ValueError("grade must be between 0 and 100")
        return value

The result after adding data validation:

>>> from grades import GradeList

>>> grades = GradeList([80, 97, 86, 200])
Traceback (most recent call last):
    ...
ValueError: grade must be between 0 and 100

>>> grades = GradeList([80, 97, 86, 100])
>>> grades.average()
90.75

>>> grades[0] = 955
Traceback (most recent call last):
    ...
ValueError: grade must be between 0 and 100

>>> grades[0] = 95
>>> grades
[95, 97, 86, 100]

>>> grades.append(-98)
Traceback (most recent call last):
    ...
ValueError: grade must be between 0 and 100

>>> grades.append(98)
>>> grades
[95, 97, 86, 100, 98]

>>> grades += [88, 100]
>>> grades
[95, 97, 86, 100, 98, 88, 100]

>>> grades[:3] = [100, 100, 100]
>>> grades
[100, 100, 100, 100, 98, 88, 100]

>>> grades.average()
98.0

Putting Lists Into Action

Removing Repeated Items From a List

Removing duplicate items from a list:

def get_unique_items(list_object):
    result = []
    for item in list_object:
        if item not in result:
            result.append(item)
    return result

get_unique_items([2, 4, 5, 2, 3, 5])    #= [2, 4, 5, 3]

The problem with the solution above is that not in is very slow on a large list, so a temporary set is introduced to speed up the membership test:

def get_unique_items(list_object):
    result = []
    unique_items = set()
    for item in list_object:
        if item not in unique_items:
            result.append(item)
            unique_items.add(item)
    return result
len(get_unique_items(range(100_000)))   #= 100000

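A shorter alternative, when the original order doesn't matter:

>>> list(set([2, 4, 5, 2, 3, 5]))       #= [2, 3, 4, 5]

This last approach relies entirely on set(). A set is unordered, so the order of the result isn't guaranteed.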
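
If the original order matters, another option is dict.fromkeys(), which keeps the first occurrence of each item in insertion order on Python 3.7 and later. A minimal sketch, reusing the get_unique_items name from above and assuming the items are hashable:

```python
def get_unique_items(iterable):
    # dict keys are unique and, since Python 3.7, keep insertion order.
    return list(dict.fromkeys(iterable))

print(get_unique_items([2, 4, 5, 2, 3, 5]))  # [2, 4, 5, 3]
```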

Creating Multidimensional Lists

Goal: create a 5x5 two-dimensional list filled with zeros to hold future results.

A solution that looks simple but is actually flawed:

>>> matrix = [[0] * 5] * 5
>>> matrix
[
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
]

>>> matrix[0][0] = 1
>>> matrix
[
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0]
]
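
All the rows change together because the outer * repeats references to a single inner list rather than copying it. A quick identity check, as a sketch:

```python
matrix = [[0] * 5] * 5

# Every row is the same list object, so there is really only one row.
print(all(row is matrix[0] for row in matrix))   # True
print(len({id(row) for row in matrix}))          # 1

matrix[0][0] = 1
print(matrix[4][0])                              # 1
```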

Other solutions:

1. Two nested for loops

>>> matrix = []
>>> for row in range(5):
...     matrix.append([])
...     for _ in range(5):
...         matrix[row].append(0)
...

>>> matrix
[
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
]

2. Two nested comprehensions

>>> [[0 for _ in range(5)] for _ in range(5)]
[
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
]

3. A comprehension combined with list repetition (*)

>>> [[0] * 5 for _ in range(5)]
[
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
]

Solution 3 combines a comprehension with list repetition. It works correctly and reads concisely.

Flattening Multidimensional Lists

Once a multidimensional list is flattened, looping over its items becomes more convenient. Requirement:

INPUT:  [[0, 1, 2], [10, 11, 12], [20, 21, 22]]

OUTPUT: [0, 1, 2, 10, 11, 12, 20, 21, 22]

Solution:

matrix = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
flattened_list = []
for row in matrix:
    flattened_list.extend(row)

flattened_list  #= [0, 1, 2, 10, 11, 12, 20, 21, 22]

This handles a known two-level nesting. What if the depth is unknown?
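
For an unknown depth, one common approach is recursion. The flatten() helper below is a hypothetical sketch, not part of the original tutorial code, and it only descends into list objects (tuples and other iterables are kept as single items):

```python
def flatten(nested):
    """Return a flat list from arbitrarily nested lists (hypothetical helper)."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            # Recurse into sublists, whatever their depth.
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

print(flatten([0, [1, [2, [3, 4]], 5], [[6]], 7]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Very deep nesting could hit Python's recursion limit, so an explicit stack may be a better fit in that case.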

More detail: How to Flatten a List of Lists in Python

Splitting Lists Into Chunks

This is the inverse of the flattening above. Requirement:

INPUT:  [0, 1, 2, 10, 11, 12, 20, 21, 22]

OUTPUT: [[0, 1, 2], [10, 11, 12], [20, 21, 22]]

Solution:

def split_list(list_object, chunk_size):
    chunks = []
    for start in range(0, len(list_object), chunk_size):
        stop = start + chunk_size
        chunks.append(list_object[start:stop])
    return chunks
split_list([1, 2, 3, 4, 5, 6, 7, 8, 9], 3) #= [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

How to Split a Python List or Iterable Into Chunks

Using a List as a Stack or Queue

stack = []
stack.append("Copy")
stack.append("Paste")
stack.append("Remove")
stack                   #= ['Copy', 'Paste', 'Remove']
stack.pop()             #= 'Remove'
stack.pop()             #= 'Paste'
stack.pop()             #= 'Copy'
stack                   #= []

How to Implement a Python Stack

queue = []
queue.append("John")
queue.append("Jane")
queue.append("Linda")
queue                   #= ['John', 'Jane', 'Linda']
queue.pop(0)            #= 'John'
queue.pop(0)            #= 'Jane'
queue.pop(0)            #= 'Linda'

Python Stacks, Queues, and Priority Queues in Practice

Deciding Whether to Use Lists

  • Keep your data ordered: Lists maintain the order of insertion of their items.

  • Store a sequence of values: Lists are a great choice when you need to store a sequence of related values.

  • Mutate your data: Lists are mutable data types that support multiple mutations.

  • Access random values by index: Lists allow quick and easy access to elements based on their index.

In contrast, avoid using lists when you need to:

  • Store immutable data: In this case, you should use a tuple. They’re immutable and more memory efficient.

  • Represent database records: In this case, consider using a tuple or a data class.

  • Store unique and unordered values: In this scenario, consider using a set or dictionary. Sets don’t allow duplicated values, and dictionaries can’t hold duplicated keys.

  • Run many membership tests where item order doesn’t matter: In this case, consider using a set. Sets are optimized for this type of operation.

  • Run advanced array and matrix operations: In these situations, consider using NumPy’s specialized data structures.

  • Manipulate your data as a stack or queue: In those cases, consider using deque from the collections module or Queue, LifoQueue, or PriorityQueue. These data types are thread-safe and optimized for fast inserting and removing on both ends.
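
As a sketch of the last recommendation, the earlier queue example could be rewritten with collections.deque, using the same sample names:

```python
from collections import deque

queue = deque()
queue.append("John")
queue.append("Jane")
queue.append("Linda")

# popleft() removes from the front in constant time; list.pop(0) has to
# shift every remaining item instead.
first = queue.popleft()
second = queue.popleft()

print(first, second)   # John Jane
print(queue)           # deque(['Linda'])
```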