PyTorch community pitfalls roundup
The code in these notes may not run as-is; it only records the idea.
Difference between model.eval() and with torch.no_grad()
- model.eval() will notify all your layers that you are in eval mode, that way, batchnorm or dropout layers will work in eval mode instead of training mode.
- torch.no_grad() affects the autograd engine and deactivates it. It will reduce memory usage and speed up computations, but you won’t be able to backprop (which you don’t want in an eval script anyway).
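The two switches are independent, which a small sketch can make visible. The toy model below is an assumption made up for illustration (any module with dropout or batchnorm would do): eval mode changes layer behaviour, while no_grad only stops autograd from recording.

```python
import torch
import torch.nn as nn

# Toy model with dropout so that train and eval behaviour differ.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

model.eval()                      # dropout becomes a no-op
out_tracked = model(x)            # autograd still records the graph here
with torch.no_grad():             # autograd is switched off inside this block
    out_untracked = model(x)

print(out_tracked.requires_grad)    # True
print(out_untracked.requires_grad)  # False
print(torch.equal(out_tracked, out_untracked))  # True: same values either way
```

In an evaluation script the two are normally used together.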
Converting tensor types
- tensor_one.float() : converts the tensor_one type to torch.float32
- tensor_one.double() : converts the tensor_one type to torch.float64
- tensor_one.int() : converts the tensor_one type to torch.int32
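A quick check of the conversions listed above, as a sketch. The sample values are made up; note that each call returns a new tensor and leaves tensor_one unchanged.

```python
import torch

tensor_one = torch.tensor([1.7, -2.3])   # default dtype is torch.float32
as_double = tensor_one.double()
as_int = tensor_one.int()                # float -> int truncates toward zero
as_long = tensor_one.to(torch.int64)     # .to(dtype) is the general form

print(as_double.dtype, as_int.dtype, as_long.dtype)
print(as_int.tolist())    # [1, -2]
print(tensor_one.dtype)   # torch.float32, the original is untouched
```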
Concatenating tensors along a given dimension
third_tensor = torch.cat((first_tensor, second_tensor), 0)
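A minimal shape check for the call above, with made-up tensor sizes. torch.stack is included only for contrast, since it is easy to confuse with torch.cat.

```python
import torch

first_tensor = torch.zeros(2, 3)
second_tensor = torch.ones(4, 3)

# dim=0: every other dimension (here the size-3 one) must match.
third_tensor = torch.cat((first_tensor, second_tensor), dim=0)
print(third_tensor.shape)  # torch.Size([6, 3])

# torch.stack, by contrast, adds a new dimension and needs identical shapes.
stacked = torch.stack((first_tensor, first_tensor), dim=0)
print(stacked.shape)  # torch.Size([2, 2, 3])
```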
Serializing a model and loading it to resume training
@Bixqu You can check the ImageNet Example line 139
save_checkpoint({
    'epoch': epoch + 1,
    'arch': args.arch,
    'state_dict': model.state_dict(),
    'best_prec1': best_prec1,
    'optimizer': optimizer.state_dict(),
}, is_best)
With
import shutil
import torch

def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, filename)
    if is_best:
        # keep a separate copy of the best-performing checkpoint
        shutil.copyfile(filename, 'model_best.pth.tar')
Loading/Resuming from the dictionary is there
if args.resume:
    if os.path.isfile(args.resume):
        print("=> loading checkpoint '{}'".format(args.resume))
        checkpoint = torch.load(args.resume)
        args.start_epoch = checkpoint['epoch']
        best_prec1 = checkpoint['best_prec1']
        model.load_state_dict(checkpoint['state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer'])
        print("=> loaded checkpoint '{}' (epoch {})"
              .format(args.resume, checkpoint['epoch']))
    else:
        print("=> no checkpoint found at '{}'".format(args.resume))
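The ImageNet snippet depends on args and a training loop, so here is a self-contained round trip as a sketch. The tiny model, the temporary path and the epoch number are assumptions chosen for illustration; the point is that both the model and the optimizer state are restored.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Take one step so the optimizer has momentum buffers worth saving.
model(torch.ones(2, 3)).sum().backward()
optimizer.step()

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth.tar")
torch.save({"epoch": 5,
            "state_dict": model.state_dict(),
            "optimizer": optimizer.state_dict()}, path)

# Rebuild fresh objects, then restore both states before resuming.
resumed_model = nn.Linear(3, 1)
resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.1, momentum=0.9)
checkpoint = torch.load(path, map_location="cpu")
resumed_model.load_state_dict(checkpoint["state_dict"])
resumed_optimizer.load_state_dict(checkpoint["optimizer"])
start_epoch = checkpoint["epoch"]
print(start_epoch)  # 5
```

Saving only the model weights would be enough for inference, but resuming training without the optimizer state resets momentum and similar buffers.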
Releasing GPU resources
Clear the CUDA cache directly: torch.cuda.empty_cache()
Move the optimizer state to the CPU:
https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530/27
I was about to ask a question but I found my issue. Maybe it will help others.
I was on Google Colab and finding that I could train my model several times, but that on the 3rd or 4th time I’d run into the memory error. Using torch.cuda.empty_cache() between runs did not help. All I could do was restart my kernel.
I had a setup of the sort:
class Fitter:
    def __init__(self, model):
        self.model = model
        self.optimizer = ...  # init optimizer here
The point is that I was carrying the model over in between runs but making a new optimizer (in my case I was making new instances of Fitter). And in my case, the (Adam) optimizer state actually took up more memory than my model!
So to fix it I tried some things.
This did not work:
def wipe_memory(self): # DOES NOT WORK
    self.optimizer = None
    torch.cuda.empty_cache()
Neither did this:
def wipe_memory(self): # DOES NOT WORK
    del self.optimizer
    self.optimizer = None
    gc.collect()
    torch.cuda.empty_cache()
This did work!
def wipe_memory(self): # DOES WORK
    self._optimizer_to(torch.device('cpu'))
    del self.optimizer
    gc.collect()
    torch.cuda.empty_cache()

def _optimizer_to(self, device):
    for param in self.optimizer.state.values():
        # Not sure there are any global tensors in the state dict
        if isinstance(param, torch.Tensor):
            param.data = param.data.to(device)
            if param._grad is not None:
                param._grad.data = param._grad.data.to(device)
        elif isinstance(param, dict):
            for subparam in param.values():
                if isinstance(subparam, torch.Tensor):
                    subparam.data = subparam.data.to(device)
                    if subparam._grad is not None:
                        subparam._grad.data = subparam._grad.data.to(device)
I got that optimizer_to function from here
Swapping axes
a = torch.rand(1,2,3,4)
print(a.transpose(0,3).transpose(1,2).size())
print(a.permute(3,2,1,0).size())
Converting Variables to numpy
Variables can’t be transformed to numpy, because they’re wrappers around tensors that save the operation history, and numpy doesn’t have such objects. You can retrieve a tensor held by the Variable, using the .data attribute. Then, this should work: var.data.numpy(). Since PyTorch 0.4, Variable has been merged into Tensor, so the equivalent today is tensor.detach().cpu().numpy().
(Variable(x).data).cpu().numpy()
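On current PyTorch versions the same idea can be sketched without Variable. The tensors here are made up for illustration; the relevant calls are detach() (drop the autograd history) and cpu() (needed when the tensor lives on a GPU).

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 3

# y.numpy() would raise a RuntimeError because y is part of the autograd graph.
arr = y.detach().cpu().numpy()
print(arr.dtype, arr.shape)  # float32 (2, 2)
```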
Why transforms.Normalize()?
Normalize does the following for each channel:
image = (image - mean) / std
The parameters mean, std are passed as 0.5, 0.5 in your case. This will normalize the image to the range [-1,1]. For example, the minimum value 0 will be converted to (0-0.5)/0.5=-1, and the maximum value of 1 will be converted to (1-0.5)/0.5=1.
If you would like to get your image back into the [0,1] range, you could use:
image = ((image * std) + mean)
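The two formulas can be checked as a round trip in plain torch. This is a sketch with a random image and the 0.5 / 0.5 values from the answer; the view(3, 1, 1) reshape is only there so the per-channel values broadcast over height and width.

```python
import torch

image = torch.rand(3, 8, 8)                          # values in [0, 1)
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

normalized = (image - mean) / std                    # now within [-1, 1]
restored = normalized * std + mean                   # back in [0, 1]

print(normalized.min().item() >= -1.0, normalized.max().item() <= 1.0)
print(torch.allclose(restored, image, atol=1e-6))
```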
Denormalize
pip install kornia
(a differentiable image-processing library that works with PyTorch)
pip install DatasetsHelper==0.0.3
(optional; only used to obtain the normalization values)
Recommended usage:
mean, std = [torch.tensor(i) for i in NormalizeValues('cifar10')()]
# Note: denormalize expects a 4-D batch as input
kornia_img = kornia.enhance.denormalize(t(img).unsqueeze_(0), mean, std)
plt.imshow(kornia_img.squeeze_(0).permute(1, 2, 0))
Simple example
from matplotlib import pyplot as plt
from torchvision.utils import make_grid
from torchvision.transforms import transforms as T
import torch
from DatasetsHelperQ import get_dataset_mean_std
from DatasetsHelperQ import tensor_to_rgb_image_without_normalization
import kornia
from torchvision import datasets
from torch.utils.data import Dataset
train_transform = T.Compose([T.ToTensor()])
train_set = datasets.CIFAR10(root="./cifar10", train=True, download=True, transform=train_transform)
train_iter = iter(train_set)
img, _ = next(train_iter)
# torch.manual_seed(1234)
# img = torch.randn(3, 4, 4).abs_()
# print(img)
# print(img.type)
t = T.Compose([
    # T.ToPILImage(),
    # T.ToTensor(),
    T.Normalize(*NormalizeValues('cifar10')()),
])
# batch = torch.cat([torch.unsqueeze(img, 0), torch.unsqueeze(t(img), 0)], 0)
# new_img = make_grid([img, t(img)])
# print(new_img.shape)
pil = T.ToPILImage()
plt.figure(figsize=(16, 16))
plt.subplot(141)
plt.title("ORIG")
plt.xticks([])
plt.yticks([])
# print(img.permute(1, 2, 0))
plt.imshow(img.permute(1, 2, 0).numpy())
# plt.imshow(pil(img))
plt.subplot(143)
plt.title("kornia_denorm")
plt.xticks([])
plt.yticks([])
# temp = T.ToPILImage(img.double().div_(255))()
# print(img.double().div_(255))
mean, std = [torch.tensor(i) for i in NormalizeValues('cifar10')()]
kornia_img = kornia.enhance.denormalize(t(img).unsqueeze_(0), mean, std)
plt.imshow(kornia_img.squeeze_(0).permute(1, 2, 0))
# plt.imshow(pil(kornia_img.squeeze_(0)))
plt.subplot(142)
plt.title("Norm")
plt.xticks([])
plt.yticks([])
# unorm = UnNormalize(*NormalizeValues('cifar10')())  # UnNormalize is not defined in this snippet and the result is unused
# plt.imshow(t(img).permute(1, 2, 0))
plt.imshow(pil(t(img)))
plt.subplot(144)
plt.title("PIL")
plt.xticks([])
plt.yticks([])
plt.imshow(pil(img))
Which dataset are the pretrained models based on?
ImageNet-12
Loading part of a pretrained model
After model_dict.update(pretrained_dict), model_dict may still have keys that the pretrained model doesn’t have, which will cause an error.
Assume following situation:
pretrained_dict: ['A', 'B', 'C', 'D']
model_dict: ['A', 'B', 'C', 'E']
After pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict} and model_dict.update(pretrained_dict), they are:
pretrained_dict: ['A', 'B', 'C']
model_dict: ['A', 'B', 'C', 'E']
So when performing model.load_state_dict(pretrained_dict), model_dict still has key E that pretrained_dict doesn’t have.
So how about using model.load_state_dict(model_dict) instead of model.load_state_dict(pretrained_dict)?
The complete snippet is therefore as follows:
pretrained_dict = ...
model_dict = model.state_dict()
# 1. filter out unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
# 2. overwrite entries in the existing state dict
model_dict.update(pretrained_dict)
# 3. load the new state dict
model.load_state_dict(model_dict)
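To see the three steps run end to end, here is a sketch with two tiny stand-in networks. The layer names a, d and e are hypothetical, mirroring the A/B/C/D/E example above, and the shape check is an extra safeguard that is not part of the original snippet.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: the "pretrained" net has layer d, the new one has e.
pretrained = nn.ModuleDict({"a": nn.Linear(2, 2), "d": nn.Linear(2, 2)})
model = nn.ModuleDict({"a": nn.Linear(2, 2), "e": nn.Linear(2, 3)})

pretrained_dict = pretrained.state_dict()
model_dict = model.state_dict()

# 1. keep only keys the new model knows, with matching shapes
pretrained_dict = {k: v for k, v in pretrained_dict.items()
                   if k in model_dict and v.shape == model_dict[k].shape}
# 2. overwrite those entries in the full state dict
model_dict.update(pretrained_dict)
# 3. load the complete dict, so no key is missing
model.load_state_dict(model_dict)

print(sorted(pretrained_dict))  # ['a.bias', 'a.weight']
```

Calling model.load_state_dict(pretrained_dict, strict=False) is another option for missing or unexpected keys, though it still raises on shape mismatches.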
Best way to convert numpy to tensor
torch.Tensor() will convert your data to Float (the default tensor type). Actually, torch.Tensor and torch.FloatTensor both do the same thing.
But I think the better way is using torch.tensor() (note the lowercase ‘t’). It converts your data to a tensor but retains the data type, which is crucial in some methods. You may know that PyTorch and numpy are interoperable, so if your array is int, your tensor should be int too unless you explicitly change the type.
On top of all this, torch.tensor is the convention because you can specify the following arguments: device, dtype, requires_grad, etc.
Note: using torch.tensor() allocates new memory to copy the data. So if you want to avoid copying, use torch.as_tensor(numpy_ndarray).
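A small sketch comparing the three constructors on a made-up int64 array. It shows which ones keep the dtype and which one shares memory with the numpy array.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3], dtype=np.int64)

copied = torch.tensor(arr)      # copies the data, keeps int64
shared = torch.as_tensor(arr)   # shares memory with arr when possible
legacy = torch.Tensor(arr)      # casts to the default float dtype

arr[0] = 100                    # modify the numpy array in place
print(copied[0].item(), shared[0].item())  # 1 100
print(copied.dtype, legacy.dtype)          # torch.int64 torch.float32
```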
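PIL ↔ Tensor
from PIL import Image
from torchvision import transforms
pil_img = Image.open(img)  # img: path to an image file
print(pil_img.size)
# ToTensor expects the PIL image, not the path; unsqueeze_ adds a batch dimension
pil_to_tensor = transforms.ToTensor()(pil_img).unsqueeze_(0)
print(pil_to_tensor.shape)
tensor_to_pil = transforms.ToPILImage()(pil_to_tensor.squeeze_(0))
print(tensor_to_pil.size)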
Using only specific GPUs
CUDA_VISIBLE_DEVICES=1,2 python myscript.py
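The same restriction can be applied from inside a script, as a sketch. The device ids "1,2" are just the ones from the command above; the variable has to be set before CUDA is initialised, so the safest place is before importing torch.

```python
import os

# Must be set before CUDA is initialised (safest: before importing torch).
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"

import torch

# Inside this process the visible GPUs are renumbered as cuda:0 and cuda:1.
print(torch.cuda.device_count())
```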
How to extract feature maps
Once you have a trained model, if you want to extract the result of an intermediate layer (say fc7 after the relu), you have a couple of possibilities.
You can either reconstruct the classifier once the model was instantiated, as in the following example:
import torch
import torch.nn as nn
from torchvision import models
model = models.alexnet(pretrained=True)
# remove last fully-connected layer
new_classifier = nn.Sequential(*list(model.classifier.children())[:-1])
model.classifier = new_classifier
Or, if you instead want to extract other parts of the model, you might need to recreate the model structure, reusing parts of the pre-trained model in the new model:
import torch
import torch.nn as nn
from torchvision import models
original_model = models.alexnet(pretrained=True)
class AlexNetConv4(nn.Module):
    def __init__(self):
        super(AlexNetConv4, self).__init__()
        self.features = nn.Sequential(
            # stop at conv4
            *list(original_model.features.children())[:-3]
        )

    def forward(self, x):
        x = self.features(x)
        return x

model = AlexNetConv4()
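The slicing trick can be tried without downloading AlexNet weights. The sketch below uses a hypothetical, untrained small network and made-up input size; a torchvision model would be sliced the same way.

```python
import torch
import torch.nn as nn

# Hypothetical small network standing in for a pretrained model.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)

# Keep everything up to and including the second ReLU.
feature_extractor = nn.Sequential(*list(net.children())[:4])
feature_extractor.eval()

with torch.no_grad():
    feature_map = feature_extractor(torch.rand(1, 3, 32, 32))
print(feature_map.shape)  # torch.Size([1, 16, 32, 32])
```

The sliced module shares its layers (and weights) with the original network, so no parameters are copied.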
Training with Half Precision
See the forum thread:
https://discuss.pytorch.org/t/training-with-half-precision/11815/2
Data augmentation in PyTorch
imgaug
log_softmax or softmax?
The log version is recommended; it is more numerically stable.
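The stability difference shows up as soon as the logits are far apart. A minimal sketch with made-up logits: taking the log of softmax underflows to log(0), while log_softmax computes the same quantity directly.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1000.0, 0.0]])

naive = torch.log(F.softmax(logits, dim=1))   # second entry underflows to log(0) = -inf
stable = F.log_softmax(logits, dim=1)         # stays finite

print(naive)
print(stable)
```

This is also why nn.CrossEntropyLoss takes raw logits: it applies log_softmax internally.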
How are optimizer.step() and loss.backward() related?
placeholder
Convert int into one-hot format
https://discuss.pytorch.org/t/convert-int-into-one-hot-format/507/4
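For reference, a short sketch of two ways to do this on recent PyTorch versions, not necessarily what the linked thread suggests. The labels and class count are made up.

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 2, 1])                  # class indices, int64
one_hot = F.one_hot(labels, num_classes=3)        # returns an int64 tensor
print(one_hot.tolist())  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]

# Scatter-based form, handy when a float tensor is wanted directly.
manual = torch.zeros(3, 3).scatter_(1, labels.unsqueeze(1), 1.0)
```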
nn.ModuleList & nn.Sequential()
nn.ModuleList works like a Python list for storing nn.Module objects. Example:
class LinearNet(nn.Module):
    def __init__(self, input_size, num_layers, layers_size, output_size):
        super(LinearNet, self).__init__()
        # first layer, then the hidden layers, then the output layer
        self.linears = nn.ModuleList([nn.Linear(input_size, layers_size)])
        self.linears.extend([nn.Linear(layers_size, layers_size) for i in range(1, num_layers - 1)])
        self.linears.append(nn.Linear(layers_size, output_size))
nn.Sequential builds a neural network by chaining modules in order:
class Flatten(nn.Module):
    def forward(self, x):
        N, C, H, W = x.size()  # read in N, C, H, W
        return x.view(N, -1)

simple_cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=7, stride=2),
    nn.ReLU(inplace=True),
    Flatten(),
    nn.Linear(5408, 10),
)
Not really. Maybe there are some situations where you could use both, but the main idea is the following:
In nn.Sequential, the nn.Modules stored inside are connected in a cascaded way. For instance, in the example that I gave, I define a neural network that receives as input an image with 3 channels and outputs 10 neurons. That network is composed of the following blocks, in the following order: Conv2d -> ReLU -> Flatten -> Linear layer. Moreover, an object of type nn.Sequential has a forward() method, so if I have an input image x I can directly call y = simple_cnn(x) to obtain the scores for x. When you define an nn.Sequential you must be careful to make sure that the output size of a block matches the input size of the following block. Basically, it behaves just like an nn.Module.
On the other hand, nn.ModuleList does not have a forward() method, because it does not define any neural network, that is, there is no connection between each of the nn.Module's that it stores. You may use it to store nn.Module's, just like you use Python lists to store other types of objects (integers, strings, etc). The advantage of using nn.ModuleList's instead of using conventional Python lists to store nn.Module's is that Pytorch is “aware” of the existence of the nn.Module's inside an nn.ModuleList, which is not the case for Python lists. If you want to understand exactly what I mean, just try to redefine my class LinearNet using a Python list instead of a nn.ModuleList and train it. When defining the optimizer() for that net, you’ll get an error saying that your model has no parameters, because PyTorch does not see the parameters of the layers stored in a Python list. If you use a nn.ModuleList instead, you’ll get no error.
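The "PyTorch is aware of the modules" point can be checked by counting registered parameters. The two classes below are hypothetical minimal examples written for this comparison.

```python
import torch.nn as nn

class WithPythonList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 2)]   # not registered

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])

print(len(list(WithPythonList().parameters())))   # 0
print(len(list(WithModuleList().parameters())))   # 4 (two weights, two biases)
```

The same registration is what makes .to(device), state_dict() and the optimizer see the layers.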
optim.zero_grad()
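optimizer.zero_grad() sets the gradients to zero, i.e. it resets the derivative of the loss with respect to each weight to 0. While learning PyTorch I noticed that almost every batch runs the following steps:
optimizer.zero_grad() ## clear the gradients
preds = model(inputs) ## forward pass
loss = criterion(preds, targets) ## compute the loss
loss.backward() ## backpropagate to compute the gradients
optimizer.step() ## update the weights
- Because of PyTorch's dynamic computation graph, gradients are not cleared automatically when we call loss.backward() and optimizer.step() to update the parameters by gradient descent. The two calls are also independent operations.
- backward(): backpropagates to compute the gradients.
- step(): updates the weights.
Taken together, these points illustrate a PyTorch trait: every step is an independent operation, which is why gradients have to be cleared explicitly. If optimizer.zero_grad() is not called, backward() adds the new gradients to the existing ones.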
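The accumulation is easy to observe on a single made-up parameter. A minimal sketch: the loss is 2 * w, so each backward() call contributes a gradient of 2.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(w * 2).sum().backward()
first = w.grad.clone()          # gradient of 2*w is 2
(w * 2).sum().backward()        # no zero_grad in between: gradients add up
accumulated = w.grad.clone()    # 2 + 2 = 4

optimizer.zero_grad()           # clears the stored gradient
(w * 2).sum().backward()
fresh = w.grad.clone()          # back to 2

print(first.item(), accumulated.item(), fresh.item())  # 2.0 4.0 2.0
```

Deliberately skipping zero_grad() for a few batches is how gradient accumulation over several small batches is usually done.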
tensor.cuda()
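This method does not move the original tensor to the GPU in place; it returns a copy on the GPU and leaves the original on the CPU: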
In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: True
In [3]: a = torch.zeros(1,2,3,4)
In [4]: a
Out[4]:
tensor([[[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]]])
In [5]: a.cuda()
Out[5]:
tensor([[[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]]], device='cuda:0')
In [6]: a
Out[6]:
tensor([[[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]]])
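In a script, the usual fix is to rebind the name to the returned tensor. The sketch below uses .to(device) with a CPU fallback (an assumption so that it also runs on machines without a GPU); a.cuda() behaves the same way when CUDA is available.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.zeros(1, 2, 3, 4)
b = a.to(device)      # like a.cuda(): returns a tensor, does not modify a
print(a.device)       # cpu
print(b.device)       # cuda:0 if a GPU is available, otherwise cpu

a = a.to(device)      # rebind the name to actually "move" the tensor
```

nn.Module differs here: model.to(device) and model.cuda() move the module's parameters in place and return the same module.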