Day 2: Data and File Operations

1. List and Tuple Operations

Defining a list

names = ['felix','jack li','danie']

Access list elements by index, for example names[0].

Slicing: taking multiple elements

>>> names = ["felix","Tenglan","Eric","Rain","Tom","Amy"]
>>> names[1:4]  # elements from index 1 up to index 4: includes 1, excludes 4
['Tenglan', 'Eric', 'Rain']
>>> names[1:-1] # from index 1 up to, but not including, the last element
['Tenglan', 'Eric', 'Rain', 'Tom']
>>> names[0:3]
['felix', 'Tenglan', 'Eric']
>>> names[:3] # when slicing from the start, the 0 can be omitted; same result as above
['felix', 'Tenglan', 'Eric']
>>> names[3:] # to include the last element, leave the end index empty; writing -1 would exclude it
['Rain', 'Tom', 'Amy']
>>> names[3:-1] # with -1 as the end index, the last element is not included
['Rain', 'Tom']
>>> names[0::2] # the trailing 2 is the step: take every second element
['felix', 'Eric', 'Tom']
>>> names[::2] # same result as above
['felix', 'Eric', 'Tom']
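
As a complement to the session above, here is a small sketch of two slice behaviours the notes do not show: a negative step, and the fact that a slice is a new list. The variable names reversed_names, last_two and head are illustrative only.

```python
names = ["felix", "Tenglan", "Eric", "Rain", "Tom", "Amy"]

# A negative step walks the list backwards.
reversed_names = names[::-1]

# A negative start index counts from the end: the last two elements.
last_two = names[-2:]

# A slice is a new list, so changing it leaves the original untouched.
head = names[:3]
head[0] = "changed"

print(reversed_names)  # ['Amy', 'Tom', 'Rain', 'Eric', 'Tenglan', 'felix']
print(last_two)        # ['Tom', 'Amy']
print(names[0])        # felix
```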

Append

names = ['felix','tom','jack','xie','amy']
names.append("newcomer")
names
['felix', 'tom', 'jack', 'xie', 'amy', 'newcomer']

Insert

names
['felix', 'tom', 'jack', 'xie', 'amy', 'newcomer']
names.insert(2,"inserted before jack")
names
['felix', 'tom', 'inserted before jack', 'jack', 'xie', 'amy', 'newcomer']
names.insert(5,"inserted before amy")

Modify

names
['felix', 'tom', 'inserted before jack', 'jack', 'xie', 'inserted before amy', 'amy', 'newcomer']
names[2] = "wilson"
names
['felix', 'tom', 'wilson', 'jack', 'xie', 'inserted before amy', 'amy', 'newcomer']

Delete

names
['felix', 'tom', 'wilson', 'jack', 'xie', 'inserted before amy', 'amy', 'newcomer']
del names[1]   # delete by index
names
['felix', 'wilson', 'jack', 'xie', 'inserted before amy', 'amy', 'newcomer']
names.remove("inserted before amy")  # delete the first element equal to the given value
names
['felix', 'wilson', 'jack', 'xie', 'amy', 'newcomer']
names.pop()   # remove and return the last element
'newcomer'
names
['felix', 'wilson', 'jack', 'xie', 'amy']

Extend

names
['felix', 'wilson', 'jack', 'xie', 'amy']
test = [1,2,3,4,5]
names.extend(test)
names
['felix', 'wilson', 'jack', 'xie', 'amy', 1, 2, 3, 4, 5]

Copy

names
['felix', 'wilson', 'jack', 'xie', 'amy', 1, 2, 3, 4, 5]
aa = names.copy()   # shallow copy: a new list that holds the same element objects
aa
['felix', 'wilson', 'jack', 'xie', 'amy', 1, 2, 3, 4, 5]
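
list.copy() makes a shallow copy, which only matters once the list contains other mutable objects. The sketch below contrasts it with copy.deepcopy from the standard library; the nested list team is a made-up example, not part of the notes.

```python
import copy

team = ["felix", ["jack", "amy"]]

shallow = team.copy()        # new outer list, but the inner list is shared
deep = copy.deepcopy(team)   # fully independent copy, inner list included

team[0] = "wilson"           # rebinding a top-level slot does not affect the copies
team[1].append("xie")        # mutating the shared inner list shows up in the shallow copy

print(shallow)  # ['felix', ['jack', 'amy', 'xie']]
print(deep)     # ['felix', ['jack', 'amy']]
```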

Count

names
['felix', 'wilson', 'jack', 'felix', 'xie', 'amy', 'xie', 1, 2, 3, 4, 'felix', 5]
names.count('felix')
3

Sort and reverse

names   # the numbers are strings here: Python 3 cannot sort a list that mixes int and str
['amy', 'felix', 'felix', 'jack', 'wilson', 'xie', 'xie', '1', '2', '3', '4', 'felix', '5']
names.sort()
names
['1', '2', '3', '4', '5', 'amy', 'felix', 'felix', 'felix', 'jack', 'wilson', 'xie', 'xie']
names.reverse()
names
['xie', 'xie', 'wilson', 'jack', 'felix', 'felix', 'felix', 'amy', '5', '4', '3', '2', '1']
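
The listing above sorts a list of strings only. A minimal sketch of what happens with mixed types in Python 3, and of the sorted() built-in, which returns a new list instead of sorting in place. The lists mixed and names below are illustrative.

```python
mixed = ["amy", 3, "felix", 1]

try:
    mixed.sort()             # int and str cannot be compared in Python 3
    sort_failed = False
except TypeError:
    sort_failed = True

# Sorting by the string form of each element avoids the comparison error.
by_text = sorted(mixed, key=str)

# sorted() leaves the original list alone; reverse=True gives descending order.
names = ["xie", "amy", "jack"]
descending = sorted(names, reverse=True)

print(sort_failed)  # True
print(by_text)      # [1, 3, 'amy', 'felix']
print(descending)   # ['xie', 'jack', 'amy']
print(names)        # ['xie', 'amy', 'jack']
```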

Getting an index

names
['xie', 'xie', 'wilson', 'jack', 'felix', 'felix', 'felix', 'amy', '5', '4', '3', '2', '1']
names.index('jack')   # index of the first match; raises ValueError if the value is absent
3

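Tuples

A tuple is essentially a read-only list: once created it cannot be modified, and it has only two methods, count and index.

name
('xie', 'xie', 'wilson', 'jack', 'felix', 'felix', 'felix', 'amy', '5', '4', '3', '2', '1')
name.index('xie')
0
name.count('felix')
3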
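
To make "read-only" concrete, the sketch below shows that item assignment on a tuple raises TypeError, and adds two common idioms, unpacking and the one-element tuple. The tuple point is a made-up example.

```python
point = ("xie", "jack", "felix")

try:
    point[0] = "amy"          # tuples do not support item assignment
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Unpacking assigns each element to its own name.
first, second, third = point

# A one-element tuple needs a trailing comma.
single = ("felix",)

print(assignment_failed)  # True
print(second)             # jack
print(len(single))        # 1
```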

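2. String Operations

Property: strings are immutable, so every method below returns a new string. The examples assume name = "Alex Li", as the center output shows.

name.capitalize()  capitalizes the first letter and lowercases the rest
name.casefold()   lowercases aggressively, intended for case-insensitive comparison
name.center(50,"-")  returns '---------------------Alex Li----------------------'
name.count('lex')  counts the occurrences of 'lex'
name.encode()  encodes the string to bytes
name.endswith("Li")  checks whether the string ends with 'Li'
"Alex\tLi".expandtabs(10)  returns 'Alex      Li'; each \t is padded with spaces up to the next multiple of 10 columns
name.find('A')  searches for 'A'; returns its index if found, otherwise -1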

format :
    >>> msg = "my name is {}, and age is {}"
    >>> msg.format("alex",22)
    'my name is alex, and age is 22'
    >>> msg = "my name is {1}, and age is {0}"
    >>> msg.format("alex",22)
    'my name is 22, and age is alex'
    >>> msg = "my name is {name}, and age is {age}"
    >>> msg.format(age=22,name="ale")
    'my name is ale, and age is 22'
format_map
    >>> msg.format_map({'name':'alex','age':22})
    'my name is alex, and age is 22'


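msg.index('a')  returns the index of the first 'a'; unlike find, it raises ValueError if nothing is found
'9aA'.isalnum()   True: every character is a letter or a digit

'9'.isdigit()  whether every character is a digit
name.isnumeric()  whether every character is numeric
name.isprintable()  whether every character is printable
name.isspace()  whether the string consists only of whitespace
name.istitle()  whether the string is title-cased
name.isupper()  whether all cased characters are uppercase
"|".join(['alex','jack','rain'])
'alex|jack|rain'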


maketrans
    >>> intab = "aeiou"  # the characters to be replaced
    >>> outtab = "12345" # the replacement for each character, position by position
    >>> trantab = str.maketrans(intab, outtab)
    >>> s = "this is string example....wow!!!"  # named s so the built-in str is not shadowed
    >>> s.translate(trantab)
    'th3s 3s str3ng 2x1mpl2....w4w!!!'

msg.partition('is')  returns ('my name ', 'is', ' {name}, and age is {age}')

>>> "alex li, chinese name is lijie".replace("li","LI",1)
'alex LI, chinese name is lijie'

msg.swapcase()  swaps upper and lower case

>>> msg.zfill(40)
'00000my name is {name}, and age is {age}'



>>> n4 = "Hello 2orld"
>>> n4.ljust(40,"-")
'Hello 2orld-----------------------------'
>>> n4.rjust(40,"-")
'-----------------------------Hello 2orld'

>>> b = "ddefdsdff_哈哈"
>>> b.isidentifier() # checks whether the string is a valid identifier, i.e. follows the variable naming rules
True
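
Because strings are immutable, none of the methods in this section change the original; they all return a new string. A short sketch under that reading, with an illustrative value for name, plus split as the counterpart of join:

```python
name = "alex li"

title = name.title()          # returns a new string; name itself is unchanged
shouted = name.upper()

try:
    name[0] = "A"             # in-place modification is not allowed
    modified = True
except TypeError:
    modified = False

# split and join undo each other when the separator is the same.
parts = "alex|jack|rain".split("|")
rejoined = "|".join(parts)

print(name)      # alex li
print(title)     # Alex Li
print(modified)  # False
print(parts)     # ['alex', 'jack', 'rain']
```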

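3. Dictionary Operations

A dictionary is a key-value data type. Syntax:

info = {
    'stu1101': "TengLan Wu",
    'stu1102': "LongZe Luola",
    'stu1103': "XiaoZe Maliya",
}

Properties of dictionaries:

A dict has no positional index; values are looked up by key. Since Python 3.7 a dict preserves insertion order; earlier versions made no ordering guarantee.
Keys must be unique: assigning to an existing key overwrites its value.

Adding

info["stu1104"] = "空空空"
info
{'stu1101': 'TengLan Wu', 'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya', 'stu1104': '空空空'}

Modifying

info['stu1101'] = "李小龙"
info
{'stu1101': '李小龙', 'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya', 'stu1104': '空空空'}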

Deleting

info
{'stu01': 'linux ubuntu', 'stu02': 'linux centos', 'stu03': 'redhat', 'stu04': 'windows2012'}
info.pop('stu04')  # option 1: remove the key and return its value
'windows2012'
del info['stu02']   # option 2
info
{'stu01': 'linux ubuntu', 'stu03': 'redhat'}
info   # starting again from the original four entries
{'stu01': 'linux ubuntu', 'stu02': 'linux centos', 'stu03': 'redhat', 'stu04': 'windows2012'}
info.popitem()    # removes and returns the last inserted pair (an arbitrary pair before Python 3.7)
('stu04', 'windows2012')
info
{'stu01': 'linux ubuntu', 'stu02': 'linux centos', 'stu03': 'redhat'}

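Lookup

info
{'stu01': 'linux ubuntu', 'stu02': 'linux centos', 'stu03': 'redhat', 'stu04': 'windows2012'}
"stu02" in info  # the standard way to test whether a key exists
True
info.get('stu02')  # returns the value, or None if the key does not exist
'linux centos'
info["stu02"]  # returns the value, but raises KeyError if the key does not exist
'linux centos'
info["stu05"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'stu05'
info.get('stu05')  # returns None, so the interpreter prints nothing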
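
The three lookup styles above can be put side by side in a script. This is a minimal sketch with a shortened info dict; the fallback string "unknown" is an arbitrary choice.

```python
info = {"stu01": "linux ubuntu", "stu02": "linux centos"}

missing = info.get("stu05")               # None, no exception
fallback = info.get("stu05", "unknown")   # get() also accepts an explicit default

try:
    info["stu05"]                         # subscript access raises on a missing key
    raised = False
except KeyError:
    raised = True

print(missing)          # None
print(fallback)         # unknown
print(raised)           # True
print("stu05" in info)  # False
```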

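Nested dictionaries and their operations

catalog = {
    "Europe and America":{
        "www.youporn.com": ["old","average quality"],
        "www.pornhub.com": ["high-tech","slightly better quality than yourporn"],
        "letmedothistoyou.com": ["high-quality images","limited content, slow updates"],
        "x-art.com":["very high quality, really","all paid content, skip it if you are broke"]
    },
    "Japan and Korea":{
        "tokyo-hot":["quality unknown, no longer to my taste","reportedly paid"]
    },
    "Mainland China":{
        "1024":["all free","servers are overseas, slow"]
    }
}
catalog["Mainland China"]["1024"][1] += ", can be scraped with a crawler"
print(catalog["Mainland China"]["1024"])
['all free', 'servers are overseas, slow, can be scraped with a crawler']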

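Other methods

# view the keys and the values
info.keys()
info.values()

# setdefault() is similar to get(), but if the key is not in the dictionary it adds the key with the given default value
info.setdefault("stu04","win7")   # 'stu04' already exists, so its current value is returned
'windows2012'
info
{'stu01': 'linux ubuntu', 'stu02': 'linux centos', 'stu03': 'redhat', 'stu04': 'windows2012'}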
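
One place where setdefault() is handy is grouping values under a key that may not exist yet. A small sketch, assuming a made-up list of names grouped by first letter:

```python
names = ["felix", "jack", "amy", "fred", "alex"]

groups = {}
for name in names:
    # On the first occurrence of a letter, setdefault stores the empty list
    # and returns it; afterwards it returns the list that is already there.
    groups.setdefault(name[0], []).append(name)

print(groups)  # {'f': ['felix', 'fred'], 'j': ['jack'], 'a': ['amy', 'alex']}
```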

# update: merges another dict in; values for keys that already exist are overwritten
info
{'stu01': 'linux ubuntu', 'stu02': 'linux centos', 'stu03': 'redhat', 'stu04': 'windows2012'}
a
{1: 'a', 2: 'b', 'stu01': 'open bsd'}
info.update(a)
info
{'stu01': 'open bsd', 'stu02': 'linux centos', 'stu03': 'redhat', 'stu04': 'windows2012', 1: 'a', 2: 'b'}

# items() returns a view of (key, value) pairs that can be iterated over
for k,v in info.items():
    print(k,v)

stu01 open bsd
stu02 linux centos
stu03 redhat
stu04 windows2012
1 a
2 b

# looping over a dict iterates over its keys
for key in info:
    print(key,info[key])

stu01 open bsd
stu02 linux centos
stu03 redhat
stu04 windows2012
1 a
2 b

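4. Set Operations

A set is an unordered collection of unique elements. Its main uses are:

Deduplication: converting a list to a set removes duplicates automatically.
Relationship tests: computing the intersection, difference, union and so on between two groups of data.

Common methods: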

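s = set([3,5,9,10])      # create a set of numbers
t = set("Hello")         # create a set of the unique characters
a = t | s          # union of t and s
b = t & s          # intersection of t and s
c = t - s          # difference (items in t but not in s)
d = t ^ s          # symmetric difference (items in t or s, but not in both)
print(a,b,c,d)     # the order of elements inside each set may differ between runs
{3, 5, 'o', 9, 10, 'e', 'H', 'l'} set() {'o', 'H', 'e', 'l'} {3, 5, 9, 10, 'o', 'e', 'H', 'l'}
t.add('x')            # add one item
s.update([10,37,42])  # add several items to s
t.remove('H')         # remove one item; raises KeyError if it is absent
len(s)                # number of elements in the set
x in s                # test whether x is a member of s
x not in s            # test whether x is not a member of s
s.issubset(t)         # s <= t: test whether every element of s is in t
s.issuperset(t)       # s >= t: test whether every element of t is in s
s.union(t)            # s | t: new set with every element of s and t
s.intersection(t)     # s & t: new set with the elements common to s and t
s.difference(t)       # s - t: new set with the elements in s but not in t
s.symmetric_difference(t)  # s ^ t: new set with the elements in exactly one of s and t
s.copy()              # shallow copy of s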
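
The two uses named at the top of this section, deduplication and relationship tests, can be sketched as follows. The data is invented for illustration, and sorted() is used when printing only because a set has no guaranteed order.

```python
visitors = ["felix", "jack", "felix", "amy", "jack"]

unique = set(visitors)                           # duplicates collapse
# dict.fromkeys keeps first-seen order, which set() does not promise.
unique_ordered = list(dict.fromkeys(visitors))

linux_users = {"felix", "jack", "amy"}
python_users = {"jack", "amy", "xie"}

both = linux_users & python_users
only_linux = linux_users - python_users
exactly_one = linux_users ^ python_users

# discard() is like remove() but does not raise when the item is absent.
linux_users.discard("nobody")

print(sorted(unique))       # ['amy', 'felix', 'jack']
print(unique_ordered)       # ['felix', 'jack', 'amy']
print(sorted(both))         # ['amy', 'jack']
print(sorted(only_linux))   # ['felix']
print(sorted(exactly_one))  # ['felix', 'xie']
```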

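5. File Operations

The workflow for working with a file

Open the file, obtaining a file handle, and assign it to a variable.
Operate on the file through the handle.
Close the file.

f = open('lyrics') # open the file
first_line = f.readline() # read one line
print('first line:',first_line)
print('divider'.center(50,'-'))
data = f.read() # read everything that remains; avoid this for large files
print(data) # print the rest of the file
f.close() # close the file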

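The file open modes are:

r, read-only mode (the default).
w, write-only mode. [Not readable; creates the file if it does not exist; truncates it if it does.]
a, append mode. [Not readable; creates the file if it does not exist; writes are always added at the end.]

"+" means the file can be both read and written:

r+, read and write. [Readable and writable; the file must already exist and is not truncated.]
w+, write and read. [Truncates the file first, like w.]
a+, append and read. [Like a, but the file can also be read.]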
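
The differences between these modes are easiest to see by running them against a scratch file. The sketch below uses a temporary directory and an arbitrary file name, demo.txt, both chosen for illustration.

```python
import os
import tempfile

# A throwaway file in a temporary directory (the name demo.txt is arbitrary).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w", encoding="utf-8")    # w: create or truncate
f.write("first\n")
f.close()

f = open(path, "a", encoding="utf-8")    # a: always write at the end
f.write("second\n")
f.close()

f = open(path, "r+", encoding="utf-8")   # r+: read and write, no truncation
before = f.read()
f.close()

f = open(path, "a+", encoding="utf-8")   # a+: append, and reading is allowed
f.write("third\n")
f.seek(0)                                # go back to the start before reading
after = f.read()
f.close()

f = open(path, "w+", encoding="utf-8")   # w+: truncates first, then read/write
emptied = f.read()
f.close()

print(repr(before))   # 'first\nsecond\n'
print(repr(after))    # 'first\nsecond\nthird\n'
print(repr(emptied))  # ''
```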

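"U" means that \r, \n and \r\n are all converted to \n when reading (used together with r or r+). This is a Python 2 feature: Python 3 text mode does this conversion by default, and the "U" flag was removed in Python 3.11.

"b" means the file is handled as binary (for example, an ISO image uploaded over FTP). In Python 2 the flag only mattered on Windows; in Python 3, binary mode reads and writes bytes instead of str on every platform.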

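The with statement

To avoid forgetting to close a file after opening it, use a context manager:

with open('log','r') as f:
    ...

When the with block finishes, the file is closed and its resources are released automatically. A single with statement can also open several files at once.

read(), readline(), readlines()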

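I. The read method

Characteristic: reads the whole file and puts its content into a single string variable.

Drawback: if the file is very large, especially if it is larger than memory, read() cannot be used.

read() reads the content directly into a string, newline characters included.

II. The readline method

Characteristic: readline() reads one line per call and returns a string, so only the current line is held in memory.

Drawback: much slower than readlines when the whole file is needed.

readline() reads a whole line, including the line terminator, and returns it as a string.

III. The readlines method

Characteristic: reads the whole file in one go and splits its content into a list of lines.

readlines() reads all lines and returns them as a list of strings.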
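
The three methods can be compared on a small scratch file. The sketch below also shows a fourth option the notes do not list: iterating over the file object itself, which reads one line at a time. The file names are hypothetical, and the last with statement opens two files at once.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
src_path = os.path.join(folder, "lyrics")        # hypothetical file names
dst_path = os.path.join(folder, "lyrics.upper")

with open(src_path, "w", encoding="utf-8") as f:
    f.write("one\ntwo\nthree\n")

with open(src_path, encoding="utf-8") as f:
    whole = f.read()          # the entire file as one string

with open(src_path, encoding="utf-8") as f:
    first = f.readline()      # a single line, newline included
    rest = f.readlines()      # the remaining lines as a list

# Iterating over the file object yields one line at a time, so only the
# current line is held in memory. One with statement opens both files.
with open(src_path, encoding="utf-8") as src, \
        open(dst_path, "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line.upper())

with open(dst_path, encoding="utf-8") as f:
    upper = f.read()

print(repr(whole))   # 'one\ntwo\nthree\n'
print(repr(first))   # 'one\n'
print(rest)          # ['two\n', 'three\n']
print(repr(upper))   # 'ONE\nTWO\nTHREE\n'
```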

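6. Character Encoding and Transcoding

1. In Python 2 the default encoding is ASCII; in Python 3 str is Unicode and the default encoding is UTF-8.

2. Unicode has several encodings: UTF-32 (4 bytes per character), UTF-16 (2 or 4 bytes) and UTF-8 (1 to 4 bytes). UTF-16 is common as an in-memory representation, but files are usually stored as UTF-8 because it saves space.

3. In Python 3, encode converts the text and also turns the str into bytes; decode converts the bytes back into a str.

The figure above applies only to Python 2.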

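# -*- coding: utf-8 -*-
# Python 2: a str literal is a byte string in the source file's encoding
import sys
print(sys.getdefaultencoding())

msg = "我爱北京天安门"
msg_gb2312 = msg.decode("utf-8").encode("gb2312")  # bytes -> unicode -> GB2312 bytes
gb2312_to_gbk = msg_gb2312.decode("gbk").encode("gbk")  # GBK is a superset of GB2312

print(msg)
print(msg_gb2312)
print(gb2312_to_gbk)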

# -*- coding: gb2312 -*-   # Python 3 version; this declaration can also be removed

import sys
print(sys.getdefaultencoding())

msg = "我爱北京天安门"
#msg_gb2312 = msg.decode("utf-8").encode("gb2312")
msg_gb2312 = msg.encode("gb2312") # str is already Unicode in Python 3, so no decode is needed first
gb2312_to_unicode = msg_gb2312.decode("gb2312")
gb2312_to_utf8 = msg_gb2312.decode("gb2312").encode("utf-8")

print(msg)
print(msg_gb2312)
print(gb2312_to_unicode)
print(gb2312_to_utf8)
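
A compact Python 3 sketch of the same round trip, showing how many bytes each encoding needs for this particular text and what happens when bytes are decoded with the wrong codec. The variable names are illustrative.

```python
text = "我爱北京天安门"

as_utf8 = text.encode("utf-8")       # str -> bytes
as_gbk = text.encode("gbk")

print(len(text))      # 7 characters
print(len(as_utf8))   # 21 bytes: 3 per character for this text
print(len(as_gbk))    # 14 bytes: 2 per character for this text

# Decoding with the matching codec restores the original text.
utf8_round_trip = as_utf8.decode("utf-8")
gbk_round_trip = as_gbk.decode("gbk")

# Decoding these GBK bytes as UTF-8 fails, because they are not valid UTF-8.
try:
    as_gbk.decode("utf-8")
    wrong_codec_ok = True
except UnicodeDecodeError:
    wrong_codec_ok = False

print(wrong_codec_ok)  # False
```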
