The Counter class in Python's collections module
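I use Python all the time at work, but mostly for basic operations. For higher-level toolkits like this one, I only ever knew they existed and never actually called them, so every time I ended up building my own wheel.
Life is short, I use Python. So why keep reinventing the wheel and pretending I'm writing C?
Here is what the collections package's __init__ exports: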
__all__ = ['deque', 'defaultdict', 'namedtuple', 'UserDict', 'UserList',
'UserString', 'Counter', 'OrderedDict', 'ChainMap']
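Let's pick the one that caught my eye today and start tinkering.
I. Counter
Judging by the name, it has something to do with counting.
Here is the class docstring from the source: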
class Counter(dict):
    '''Dict subclass for counting hashable items.  Sometimes called a bag
    or multiset.  Elements are stored as dictionary keys and their counts
    are stored as dictionary values.'''
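Roughly speaking: it is a dict subclass that counts hashable items. In the resulting dict the elements are the keys and their counts are the values. The keys are not sorted; like any dict on Python 3.7+, they keep the order in which they were first seen. How the contents are displayed depends on the shell (the plain repr lists the most common elements first); the output below is shown as the session recorded it.
import collections
ret = collections.Counter("abbbbbccdeeeeeeeeeeeee")
ret
Out[12]: Counter({'a': 1, 'b': 5, 'c': 2, 'd': 1, 'e': 13})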
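A minimal sketch of the behavior described above, assuming Python 3.7 or later. The input string and the variable names are made up for illustration; they are not from the source.

```python
from collections import Counter

counts = Counter("mississippi")

# Counter is a dict subclass, so the usual dict API is available.
is_dict = isinstance(counts, dict)

# Keys come back in first-seen order, not sorted by key or by count.
keys_in_order = list(counts)

# Looking up a missing key gives 0 instead of raising KeyError,
# and the lookup does not add the key to the counter.
missing = counts["z"]

print(is_dict, keys_in_order, counts["i"], missing)
```

The zero default for missing keys is what makes `c[elem] += 1` safe without any prior initialization.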
Here are the examples given in the source:
>>> c = Counter('abcdeabcdabcaba')  # count elements from a string

# return the 3 most common keys together with their counts
>>> c.most_common(3)                # three most common elements
[('a', 5), ('b', 4), ('c', 3)]
>>> sorted(c)                       # list all unique elements
['a', 'b', 'c', 'd', 'e']
>>> ''.join(sorted(c.elements()))   # list elements with repetitions
'aaaaabbbbcccdde'
>>> sum(c.values())                 # total of all counts
15
>>> c['a']                          # count of letter 'a'
5

# updating the counts
>>> for elem in 'shazam':           # update counts from an iterable
...     c[elem] += 1                # by adding 1 to each element's count
>>> c['a']                          # now there are seven 'a'
7
>>> del c['b']                      # remove all 'b'
>>> c['b']                          # now there are zero 'b'
0
>>> d = Counter('simsalabim')       # make another counter
>>> c.update(d)                     # add in the second counter
>>> c['a']                          # now there are nine 'a'
9
>>> c.clear()                       # empty the counter
>>> c
Counter()

Note:  If a count is set to zero or reduced to zero, it will remain
in the counter until the entry is deleted or the counter is cleared:

>>> c = Counter('aaabbc')
>>> c['b'] -= 2                     # reduce the count of 'b' by two
>>> c.most_common()                 # 'b' is still in, but its count is zero
[('a', 3), ('c', 1), ('b', 0)]
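The note about lingering zero counts can be sketched as follows. This is only an illustration under my own variable names; it uses the unary plus operator, which builds a new Counter containing only the positive counts.

```python
from collections import Counter

c = Counter("aaabbc")
c["b"] -= 2                      # 'b' drops to zero, but the key stays

still_present = "b" in c         # the zero-count entry is still a key

# Unary plus returns a new Counter without zero or negative counts.
positive_only = +c

# Deleting the key explicitly removes the entry from the original.
del c["b"]

print(still_present, dict(positive_only), dict(c))
```

Unary plus leaves the original untouched, whereas `del` modifies it in place, so which one fits depends on whether the zero entries are still needed elsewhere.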
Commonly used APIs:
1. most_common(num): returns the num elements with the highest counts. If no argument is passed, it returns all elements.
def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.

    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]

    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.items(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
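As a quick usage sketch of most_common, here is a small word-frequency count. The sentence and the variable names are my own; the slicing trick for the least common entries relies only on most_common() returning a list ordered from highest to lowest count.

```python
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the fox".split()
freq = Counter(words)

# The two most frequent words with their counts.
top_two = freq.most_common(2)

# No argument: every (word, count) pair, highest count first.
everything = freq.most_common()

# The two least common entries, read backwards from the tail of the list.
least_common = freq.most_common()[:-3:-1]

print(top_two)
print(least_common)
```

Here `top_two` is `[('the', 3), ('fox', 2)]`; the remaining words all have a count of 1, so which of them land in `least_common` depends on how ties are ordered.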
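2. elements
Returns an iterator over all the elements, each one repeated as many times as its count. Equal elements are grouped together, and the groups come out in the order the elements first appeared in the original data.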
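>>> c = Counter('ABCABC')
# The result below is not sorted. The input just happens to be in A, B, C order, which makes the official example easy to misread.
>>> list(c.elements())
['A', 'A', 'B', 'B', 'C', 'C']
>>> d = Counter('ACBCDEFT')
>>> list(d.elements())
['A', 'C', 'C', 'B', 'D', 'E', 'F', 'T']
So if you want the result in sorted order, call sorted on it yourself:
>>> sorted(d.elements())
['A', 'B', 'C', 'C', 'D', 'E', 'F', 'T']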
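One more detail worth a small sketch: elements() skips any key whose count is below one. The counter below is constructed by hand purely for illustration.

```python
from collections import Counter

# Hand-built counter with a zero and a negative count.
c = Counter(a=2, b=0, c=-1, d=1)

# elements() repeats each key `count` times and skips counts below one.
expanded = list(c.elements())

# Order follows first insertion; sort explicitly when order matters.
ordered = sorted(Counter("ACBCDEFT").elements())

print(expanded)
print(ordered)
```

So after a subtraction that drives counts to zero or below, those keys silently disappear from elements() even though they are still keys of the counter.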
3. subtract: I have a feeling this one will come in handy when grinding LeetCode problems.
def subtract(*args, **kwds):
    '''Like dict.update() but subtracts counts instead of replacing them.
    Counts can be reduced below zero.  Both the inputs and outputs are
    allowed to contain zero and negative counts.

    Source can be an iterable, a dictionary, or another Counter instance.
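What does that mean? It updates your Counter object in place. How? It subtracts counts based on the argument you pass in, which can be an iterable, a dict, or another Counter.
Here is the official example, extended a little: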
c = Counter("which")
c
out:>> Counter({'w': 1, 'h': 2, 'i': 1, 'c': 1})

# subtract() works in place and returns None, so display c afterwards to see the result.
c.subtract('witch')  # pass an iterable: each of its elements is subtracted from the original; note that 't' is not in the original
c
out:>> Counter({'w': 0, 'h': 1, 'i': 0, 'c': 0, 't': -1})

c.subtract(Counter('watch'))  # pass another Counter
c
out:>> Counter({'w': -1, 'h': 0, 'i': 0, 'c': -1, 't': -2, 'a': -1})

c.subtract({'h': 3, 'q': 5})  # pass a dict: each value is how many of that key to subtract
c
out:>> Counter({'w': -1, 'h': -3, 'i': 0, 'c': -1, 't': -2, 'a': -1, 'q': -5})
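To make the LeetCode remark concrete, here is a rough sketch of a "can this note be built from that magazine" check using subtract. The function name `can_build` and its parameters are hypothetical, invented for this example. The last line contrasts subtract with the binary minus operator, which returns a new Counter and drops results that are not positive.

```python
from collections import Counter


def can_build(note: str, magazine: str) -> bool:
    """Hypothetical helper: True if `note` can be spelled using letters from `magazine`."""
    remaining = Counter(magazine)
    remaining.subtract(note)  # in place; the call itself returns None
    # A negative count means the note needs a letter the magazine lacks.
    return all(count >= 0 for count in remaining.values())


enough = can_build("aab", "baa")
not_enough = can_build("aab", "ab")

# Binary minus keeps only positive results, unlike subtract().
diff = Counter("which") - Counter("witch")

print(enough, not_enough, dict(diff))
```

Because subtract keeps zero and negative counts, it shows what is missing and by how much; the minus operator is the better fit when only the leftover surplus matters.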
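Nothing else in there looked like much fun. I thought I would get through everything in collections in a few minutes, which was wishful thinking. Counter goes first, and I'll fill in the others bit by bit.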