PyTorch Community Pitfalls Roundup

The code in this article may not run as-is; it only records the ideas.

The difference between model.eval() and with torch.no_grad()

  • model.eval() notifies all your layers that you are in eval mode; that way, batchnorm or dropout layers will work in eval mode instead of training mode.
  • torch.no_grad() impacts the autograd engine and deactivates it. It will reduce memory usage and speed up computations, but you won’t be able to backprop (which you don’t want in an eval script).
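
In practice the two are combined; a minimal evaluation sketch (model and loader are assumed to already exist):

model.eval()               # switch BatchNorm/Dropout layers to eval behavior
with torch.no_grad():      # stop recording operations: less memory, faster
    for inputs, targets in loader:
        outputs = model(inputs)
        # compute metrics here; no backprop is possible inside this block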

Converting tensor types

  • tensor_one.float() : converts the tensor_one type to torch.float32
  • tensor_one.double() : converts the tensor_one type to torch.float64
  • tensor_one.int() : converts the tensor_one type to torch.int32
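
For example, checking the resulting dtypes:

t = torch.ones(2)          # default dtype is torch.float32
print(t.double().dtype)    # torch.float64
print(t.int().dtype)       # torch.int32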

Concatenating tensors along a given dimension

third_tensor = torch.cat((first_tensor, second_tensor), 0)
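
For example, two (2, 3) tensors concatenated along dim 0 give shape (4, 3); along dim 1, shape (2, 6):

a = torch.zeros(2, 3)
b = torch.ones(2, 3)
print(torch.cat((a, b), 0).shape)  # torch.Size([4, 3])
print(torch.cat((a, b), 1).shape)  # torch.Size([2, 6])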

Serializing a model and loading it to resume training

You can check the ImageNet example, line 139:

        save_checkpoint({
            'epoch': epoch + 1,
            'arch': args.arch,
            'state_dict': model.state_dict(),
            'best_prec1': best_prec1,
            'optimizer' : optimizer.state_dict(),
        }, is_best)

With

import shutil
import torch

def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, filename)
    if is_best:
        shutil.copyfile(filename, 'model_best.pth.tar')

Loading/resuming from the checkpoint dictionary works as follows:

    if args.resume:
        if os.path.isfile(args.resume):
            print("=> loading checkpoint '{}'".format(args.resume))
            checkpoint = torch.load(args.resume)
            args.start_epoch = checkpoint['epoch']
            best_prec1 = checkpoint['best_prec1']
            model.load_state_dict(checkpoint['state_dict'])
            optimizer.load_state_dict(checkpoint['optimizer'])
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(args.resume))

Releasing GPU resources

Clear the CUDA cache directly: torch.cuda.empty_cache()

Move the optimizer to the CPU:

https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530/27
I was about to ask a question but I found my issue. Maybe it will help others.

I was on Google Colab and finding that I could train my model several times, but that on the 3rd or 4th time I’d run into the memory error. Using torch.cuda.empty_cache() between runs did not help. All I could do was restart my kernel.

I had a setup of the sort:

class Fitter:
    def __init__(self, model):
        self.model = model
        self.optimizer = ...  # init optimizer here

The point is that I was carrying the model over in between runs but making a new optimizer (in my case I was making new instances of Fitter). And in my case, the (Adam) optimizer state actually took up more memory than my model!

So to fix it I tried some things.

This did not work:

def wipe_memory(self): # DOES NOT WORK
    self.optimizer = None
    torch.cuda.empty_cache()

Neither did this:

def wipe_memory(self): # DOES NOT WORK
    del self.optimizer
    self.optimizer = None
    gc.collect()
    torch.cuda.empty_cache()

This did work!

def wipe_memory(self): # DOES WORK
    self._optimizer_to(torch.device('cpu'))
    del self.optimizer
    gc.collect()
    torch.cuda.empty_cache()

def _optimizer_to(self, device):
    for param in self.optimizer.state.values():
        # Not sure there are any global tensors in the state dict
        if isinstance(param, torch.Tensor):
            param.data = param.data.to(device)
            if param._grad is not None:
                param._grad.data = param._grad.data.to(device)
        elif isinstance(param, dict):
            for subparam in param.values():
                if isinstance(subparam, torch.Tensor):
                    subparam.data = subparam.data.to(device)
                    if subparam._grad is not None:
                        subparam._grad.data = subparam._grad.data.to(device)

I got that optimizer_to function from here

Swapping axes

a = torch.rand(1, 2, 3, 4)
print(a.transpose(0, 3).transpose(1, 2).size())  # torch.Size([4, 3, 2, 1])
print(a.permute(3, 2, 1, 0).size())              # torch.Size([4, 3, 2, 1])

Converting Variables to numpy

Variables can’t be converted to numpy, because they’re wrappers around tensors that save the operation history, and numpy doesn’t have such objects. You can retrieve the tensor held by a Variable using the .data attribute. Then this should work: var.data.numpy().

(Variable(x).data).cpu().numpy()
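
On modern PyTorch (0.4+), Variable has been merged into Tensor, and the idiomatic equivalent is:

x_np = x.detach().cpu().numpy()  # drop autograd history, move to CPU, then convert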

Why transforms.Normalize()?

Normalize does the following for each channel:

image = (image - mean) / std

The parameters mean, std are passed as 0.5, 0.5 in your case. This will normalize the image to the range [-1, 1]. For example, the minimum value 0 will be converted to (0 - 0.5)/0.5 = -1, and the maximum value 1 will be converted to (1 - 0.5)/0.5 = 1.

If you would like to get your image back into the [0, 1] range, you could use:

image = ((image * std) + mean)

Denormalize

pip install kornia (a differentiable image library that works with PyTorch)

pip install DatasetsHelper==0.0.3 (optional; only used to fetch the normalization values)

Recommended usage:

mean, std = [torch.tensor(i) for i in NormalizeValues('cifar10')()]
# note: denormalize expects a batch, i.e. a 4-D input
kornia_img = kornia.enhance.denormalize(t(img).unsqueeze_(0), mean, std)
plt.imshow(kornia_img.squeeze_(0).permute(1, 2, 0))

A simple example

from matplotlib import pyplot as plt

from torchvision.utils import make_grid
from torchvision.transforms import transforms as T
import torch
from DatasetsHelperQ import get_dataset_mean_std
from DatasetsHelperQ import tensor_to_rgb_image_without_normalization
# NormalizeValues and UnNormalize, used below, presumably come from the same helper package
import kornia
from torchvision import datasets
from torch.utils.data import Dataset

train_transform = T.Compose([T.ToTensor()])
train_set = datasets.CIFAR10(root="./cifar10", train=True, download=True, transform=train_transform)
train_iter = iter(train_set)
img, _ = next(train_iter)
# torch.manual_seed(1234)
# img = torch.randn(3, 4, 4).abs_()
# print(img)
# print(img.type)
t = T.Compose([
    # T.ToPILImage(),
    # T.ToTensor(),
    T.Normalize(*NormalizeValues('cifar10')()),
])
# batch = torch.cat([torch.unsqueeze(img, 0), torch.unsqueeze(t(img), 0)], 0)
# new_img = make_grid([img, t(img)])
# print(new_img.shape)
pil = T.ToPILImage()

plt.figure(figsize=(16, 16))

plt.subplot(141)
plt.title("ORIG")
plt.xticks([])
plt.yticks([])
# print(img.permute(1, 2, 0))
plt.imshow(img.permute(1, 2, 0).numpy())
# plt.imshow(pil(img))


plt.subplot(143)
plt.title("kornia_denorm")
plt.xticks([])
plt.yticks([])
# temp = T.ToPILImage(img.double().div_(255))()
# print(img.double().div_(255))
mean, std = [torch.tensor(i) for i in NormalizeValues('cifar10')()]
kornia_img = kornia.enhance.denormalize(t(img).unsqueeze_(0), mean, std)
plt.imshow(kornia_img.squeeze_(0).permute(1, 2, 0))
# plt.imshow(pil(kornia_img.squeeze_(0)))

plt.subplot(142)
plt.title("Norm")
plt.xticks([])
plt.yticks([])
unorm = UnNormalize(*NormalizeValues('cifar10')())
# plt.imshow(t(img).permute(1, 2, 0))
plt.imshow(pil(t(img)))

plt.subplot(144)
plt.title("PIL")
plt.xticks([])
plt.yticks([])
plt.imshow(pil(img))

Which dataset is the pretrained model based on?

Imagenet-12

Loading part of a pretrained model

After model_dict.update(pretrained_dict), the model_dict may still have keys that pretrained_dict doesn’t have, which will cause an error.

Assume following situation:

pretrained_dict: ['A', 'B', 'C', 'D']
model_dict: ['A', 'B', 'C', 'E']

After pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict} and model_dict.update(pretrained_dict), they are:

pretrained_dict: ['A', 'B', 'C']
model_dict: ['A', 'B', 'C', 'E']

So when performing model.load_state_dict(pretrained_dict), model_dict still has key E that pretrained_dict doesn’t have.

So how about using model.load_state_dict(model_dict) instead of model.load_state_dict(pretrained_dict)?

The complete snippet is therefore as follow:

pretrained_dict = ...
model_dict = model.state_dict()

# 1. filter out unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
# 2. overwrite entries in the existing state dict
model_dict.update(pretrained_dict) 
# 3. load the new state dict
model.load_state_dict(model_dict)
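
Alternatively, load_state_dict accepts strict=False, which skips the mismatched keys in a single call:

model.load_state_dict(pretrained_dict, strict=False)  # ignores missing/unexpected keys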

The best way to convert numpy to a tensor

When you are on GPU, torch.Tensor() will convert your data type to Float.
Actually, torch.Tensor and torch.FloatTensor both do the same thing.
But I think a better way is to use torch.tensor() (note the lowercase ‘t’). It converts your data to a tensor but retains the data type, which is crucial in some methods. You may know that PyTorch and numpy are interchangeable, so if your array is int, your tensor should be int too, unless you explicitly change the type.

On top of all this, torch.tensor is the convention because it lets you pass device, dtype, requires_grad, etc.

Note: torch.tensor() allocates new memory and copies the data. So if you want to avoid the copy, use torch.as_tensor(numpy_ndarray).
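
A quick sketch of the copy-vs-share difference:

import numpy as np
import torch

arr = np.array([1, 2, 3])   # integer array
t1 = torch.tensor(arr)      # copies the data, keeps the integer dtype
t2 = torch.as_tensor(arr)   # shares memory with arr when possible
arr[0] = 100
print(t1[0].item())         # 1   -- t1 owns its own copy
print(t2[0].item())         # 100 -- t2 sees the change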

PIL ↔ Tensor

from PIL import Image
from torchvision import transforms

pil_img = Image.open(img)  # img is the path to an image file
print(pil_img.size)

pil_to_tensor = transforms.ToTensor()(pil_img).unsqueeze_(0)
print(pil_to_tensor.shape)

tensor_to_pil = transforms.ToPILImage()(pil_to_tensor.squeeze_(0))
print(tensor_to_pil.size)

Using only specific GPUs

CUDA_VISIBLE_DEVICES=1,2 python myscript.py
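
The same restriction can be applied from inside the script, as long as it runs before anything initializes CUDA:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"  # must be set before the first CUDA call
import torch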

How to extract feature maps

Once you have a trained model, if you want to extract the result of an intermediate layer (say fc7 after the relu), you have a couple of possibilities.

You can either reconstruct the classifier once the model was instantiated, as in the following example:

import torch
import torch.nn as nn
from torchvision import models

model = models.alexnet(pretrained=True)

# remove last fully-connected layer
new_classifier = nn.Sequential(*list(model.classifier.children())[:-1])
model.classifier = new_classifier

Or, if instead you want to extract other parts of the model, you might need to recreate the model structure, reusing the parts of the pre-trained model in the new model.

import torch
import torch.nn as nn
from torchvision import models

original_model = models.alexnet(pretrained=True)

class AlexNetConv4(nn.Module):
    def __init__(self):
        super(AlexNetConv4, self).__init__()
        self.features = nn.Sequential(
            # stop at conv4
            *list(original_model.features.children())[:-3]
        )

    def forward(self, x):
        x = self.features(x)
        return x

model = AlexNetConv4()

Training with Half Precision

See the forum thread directly:
https://discuss.pytorch.org/t/training-with-half-precision/11815/2
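
For reference, PyTorch 1.6+ ships automatic mixed precision; a minimal sketch, assuming model, optimizer, criterion, and loader already exist:

scaler = torch.cuda.amp.GradScaler()
for inputs, targets in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # run the forward pass in mixed precision
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()          # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)                 # unscale gradients, then step
    scaler.update()                        # adjust the scale factor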

Data augmentation in PyTorch

imgaug
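
A minimal imgaug sketch (augmenter names follow imgaug's standard API; images is assumed to be a batch of HWC uint8 numpy arrays):

import imgaug.augmenters as iaa

seq = iaa.Sequential([
    iaa.Fliplr(0.5),                   # horizontal flip with probability 0.5
    iaa.GaussianBlur(sigma=(0, 1.0)),  # blur with a random sigma
])
images_aug = seq(images=images)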

log_softmax or softmax?


The log version is recommended; it is more numerically stable.
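
For example, F.log_softmax paired with nll_loss is the numerically stable combination; F.cross_entropy fuses the two internally:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)               # (batch, classes)
targets = torch.tensor([1, 0, 3, 7])
log_probs = F.log_softmax(logits, dim=1)  # stable: avoids a separate log(softmax(x))
loss = F.nll_loss(log_probs, targets)     # same value as F.cross_entropy(logits, targets)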


Convert int into one-hot format

https://discuss.pytorch.org/t/convert-int-into-one-hot-format/507/4
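
A sketch of the two usual approaches (not the thread's exact code): F.one_hot on recent PyTorch, or the classic scatter_ trick:

import torch
import torch.nn.functional as F

labels = torch.tensor([0, 2, 1])
one_hot = F.one_hot(labels, num_classes=3)                          # modern one-liner

one_hot2 = torch.zeros(3, 3).scatter_(1, labels.unsqueeze(1), 1.0)  # classic version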

nn.ModuleList & nn.Sequential()

nn.ModuleList is like a Python list for storing nn.Module objects. Example usage:

class LinearNet(nn.Module):
    def __init__(self, input_size, num_layers, layers_size, output_size):
        super(LinearNet, self).__init__()
        self.num_layers = num_layers
        self.linears = nn.ModuleList([nn.Linear(input_size, layers_size)])
        self.linears.extend([nn.Linear(layers_size, layers_size) for i in range(1, self.num_layers - 1)])
        self.linears.append(nn.Linear(layers_size, output_size))

nn.Sequential builds a neural network by chaining modules in order

class Flatten(nn.Module):
  def forward(self, x):
    N, C, H, W = x.size() # read in N, C, H, W
    return x.view(N, -1)

simple_cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=2),
            nn.ReLU(inplace=True),
            Flatten(), 
            nn.Linear(5408, 10),
          )

Not really. Maybe there are some situations where you could use both, but the main idea is the following:

In nn.Sequential, the nn.Module's stored inside are connected in a cascaded way. For instance, in the example that I gave, I define a neural network that receives as input an image with 3 channels and outputs 10 neurons. That network is composed of the following blocks, in the following order: Conv2d -> ReLU -> Linear layer. Moreover, an object of type nn.Sequential has a forward() method, so if I have an input image x I can directly call y = simple_cnn(x) to obtain the scores for x. When you define an nn.Sequential you must be careful to make sure that the output size of a block matches the input size of the following block. Basically, it behaves just like an nn.Module.

On the other hand, nn.ModuleList does not have a forward() method, because it does not define any neural network; that is, there is no connection between the nn.Module's that it stores. You may use it to store nn.Module's, just like you use Python lists to store other types of objects (integers, strings, etc.). The advantage of using an nn.ModuleList instead of a conventional Python list is that PyTorch is “aware” of the nn.Module's inside an nn.ModuleList, which is not the case for plain Python lists. If you want to see exactly what I mean, try to redefine my class LinearNet using a Python list instead of an nn.ModuleList and train it. When defining the optimizer for that net, you’ll get an error saying that your model has no parameters, because PyTorch does not see the parameters of the layers stored in a Python list. If you use an nn.ModuleList instead, you’ll get no error.

optimizer.zero_grad()

optimizer.zero_grad() sets the gradients to zero, i.e., it zeroes the derivative of the loss with respect to the weights. When learning PyTorch, note that for each batch the following steps are usually executed:

optimizer.zero_grad()             ## zero the gradients
preds = model(inputs)             ## inference
loss = criterion(preds, targets)  ## compute the loss
loss.backward()                   ## backpropagate to compute gradients
optimizer.step()                  ## update the weight parameters
  1. Because of PyTorch's dynamic computation graph, gradients are not automatically zeroed when we call loss.backward() and optimizer.step() to update parameters via gradient descent; the two are independent operations.
  2. backward(): backpropagation to compute gradients.
  3. step(): update the weight parameters.

Given the above, each step in PyTorch is an independent operation, which is why gradients need to be zeroed explicitly: if you skip optimizer.zero_grad(), backward() will accumulate gradients instead of overwriting them.
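
A tiny demonstration of the accumulation:

import torch

w = torch.tensor(1.0, requires_grad=True)
(w * 2).backward()
print(w.grad)    # tensor(2.)
(w * 2).backward()
print(w.grad)    # tensor(4.) -- accumulated, not replaced
w.grad.zero_()   # what optimizer.zero_grad() does for every parameter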

tensor.cuda()

This method does not move the original tensor to the GPU in place; it returns a new tensor:

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: True

In [3]: a = torch.zeros(1,2,3,4)

In [4]: a
Out[4]: 
tensor([[[[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]],

         [[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]]]])

In [5]: a.cuda()
Out[5]: 
tensor([[[[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]],

         [[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]]]], device='cuda:0')

In [6]: a
Out[6]: 
tensor([[[[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]],

         [[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]]]])
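
To actually keep the tensor on the GPU, assign the result back; tensor.to() behaves the same way:

a = a.cuda()                                               # rebind the name to the GPU copy
a = a.to('cuda' if torch.cuda.is_available() else 'cpu')   # device-agnostic version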