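Work in progress...
Autumn campus recruiting is approaching, so this post collects the essential knowledge for landing an algorithm engineer position.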
CNN
Receptive field of a CNN
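No formula accompanies this heading, so the following is only a sketch of the commonly used recurrence, assuming a plain stack of convolution or pooling layers without dilation. Each layer grows the receptive field by (kernel - 1) times the product of all earlier strides. The function name `receptive_field` and the `(kernel, stride)` layer description are made up for illustration.

```python
def receptive_field(layers):
    """Receptive field of the last layer's output w.r.t. the input.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    Assumes no dilation (an assumption of this sketch).
    """
    rf = 1      # a single input pixel sees itself
    jump = 1    # distance in input pixels between adjacent outputs of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Three stacked 3x3 stride-1 convs see as much as one 7x7 conv.
three_convs = receptive_field([(3, 1), (3, 1), (3, 1)])
# conv 3x3 -> 2x2 max-pool with stride 2 -> conv 3x3
conv_pool_conv = receptive_field([(3, 1), (2, 2), (3, 1)])
print(three_convs, conv_pool_conv)
```

The second case shows why downsampling matters: after the stride-2 pooling, every extra kernel tap covers two input pixels instead of one.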
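Parameter count of a CNN layer
For a convolutional layer with a $K_h \times K_w$ kernel, $C_{in}$ input channels and $C_{out}$ output channels, the count is $K_h \times K_w \times C_{in} \times C_{out} + C_{out}$, where the last term is the bias.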
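As a quick sanity check, the count for a standard (non-grouped) convolution can be computed directly: one weight per kernel position, input channel and output channel, plus one bias per output channel. The helper name `conv_param_count` is hypothetical.

```python
def conv_param_count(kernel_h, kernel_w, c_in, c_out, bias=True):
    """Number of learnable parameters in a standard 2D convolution layer."""
    weights = kernel_h * kernel_w * c_in * c_out
    biases = c_out if bias else 0   # one bias per output channel
    return weights + biases


# 3x3 convolution from 64 to 128 channels
with_bias = conv_param_count(3, 3, 64, 128)
without_bias = conv_param_count(3, 3, 64, 128, bias=False)
print(with_bias, without_bias)
```

Note that the count does not depend on the spatial size of the input, only on the kernel size and channel counts.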
Hand-written convolution in NumPy
import numpy as np


def filter2d(image, kernel):
    # Dimensions of the input image and the kernel
    image_height, image_width = image.shape
    kernel_height, kernel_width = kernel.shape
    # Output size for stride 1 and no padding
    output_height = image_height - kernel_height + 1
    output_width = image_width - kernel_width + 1
    # Allocate the output array
    filtered_image = np.zeros((output_height, output_width))
    # Slide the kernel over the image. This is the "convolution" used in CNNs,
    # i.e. cross-correlation: the kernel is not flipped.
    for i in range(output_height):
        for j in range(output_width):
            # Extract the window under the kernel
            window = image[i:i+kernel_height, j:j+kernel_width]
            # Element-wise product summed over the window
            filtered_image[i, j] = np.sum(window * kernel)
    return filtered_image


# Define an image and a kernel (filter)
image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
kernel = np.array([[1, 0],
                   [0, -1]])
# Apply the filter
filtered_image = filter2d(image, kernel)
print(filtered_image)
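Interviewers often follow up by asking for stride and padding. The sketch below is one possible extension, not part of the original notes: it uses `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later) to avoid the Python loops, and the name `conv2d` and its `stride` / `padding` parameters are assumptions chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(image, kernel, stride=1, padding=0):
    """2D cross-correlation with zero padding and stride (single channel)."""
    # Zero-pad all four sides by `padding` pixels
    padded = np.pad(image, padding)
    # All stride-1 windows: shape (H_out, W_out, kernel_h, kernel_w)
    windows = sliding_window_view(padded, kernel.shape)
    # Keep every `stride`-th window along both spatial axes
    windows = windows[::stride, ::stride]
    # Multiply each window by the kernel and sum over the kernel axes
    return np.einsum("ijkl,kl->ij", windows, kernel)


image = np.arange(1, 10).reshape(3, 3)
kernel = np.array([[1, 0],
                   [0, -1]])
plain = conv2d(image, kernel)                         # stride 1, no padding
strided = conv2d(image, kernel, stride=2, padding=1)  # 5x5 padded input, stride 2
print(plain)
print(strided)
```

The output size along each axis is floor((size + 2 * padding - kernel) / stride) + 1, which gives 2 in both calls here.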
Object detection
IoU: computation and hand-written implementation
def bbox_iou(box1, box2):
    """
    Calculate the Intersection over Union (IoU) of two bounding boxes.
    :param box1: (x1, y1, x2, y2) - coordinates of the first bounding box
    :param box2: (x1, y1, x2, y2) - coordinates of the second bounding box
    :return: IoU of the two bounding boxes
    """
    # Determine the coordinates of the intersection rectangle
    x1_inter = max(box1[0], box2[0])
    y1_inter = max(box1[1], box2[1])
    x2_inter = min(box1[2], box2[2])
    y2_inter = min(box1[3], box2[3])
    # Compute the area of intersection (zero if the boxes do not overlap)
    width_inter = max(0, x2_inter - x1_inter)
    height_inter = max(0, y2_inter - y1_inter)
    area_inter = width_inter * height_inter
    # Compute the area of both bounding boxes
    area_box1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area_box2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    # Union = sum of both areas minus the intersection area
    area_union = area_box1 + area_box2 - area_inter
    # Guard against two degenerate (zero-area) boxes
    if area_union <= 0:
        return 0.0
    return area_inter / float(area_union)


# Example usage:
box1 = (1, 1, 4, 4)
box2 = (2, 2, 5, 5)
iou = bbox_iou(box1, box2)
print(f"IoU: {iou}")
NMS: description and hand-written implementation
def nms(boxes, scores, iou_threshold):
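Only the signature of `nms` is given here. What follows is a minimal sketch of standard greedy NMS that matches that signature, under the assumption that `boxes` is an (N, 4) array of (x1, y1, x2, y2) corners and `scores` has length N. It repeatedly keeps the highest-scoring box and drops every remaining box whose IoU with it exceeds the threshold. Returning a list of kept indices is a choice made for this sketch.

```python
import numpy as np


def nms(boxes, scores, iou_threshold):
    """Greedy non-maximum suppression. Returns indices of kept boxes, best first."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]   # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection of the best box with every remaining box
        xx1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        union = areas[best] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-12)   # avoid division by zero
        # Keep only boxes that do not overlap the best box too much
        order = rest[iou <= iou_threshold]
    return keep


boxes = [(1, 1, 4, 4), (2, 2, 5, 5), (1, 1, 4.2, 4.2), (10, 10, 12, 12)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = nms(boxes, scores, iou_threshold=0.5)
print(kept)
```

In the example the third box nearly coincides with the first (IoU about 0.88) and is suppressed, while the second box overlaps it only slightly (IoU about 0.29) and survives.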
Focal Loss
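What problem focal loss solves
Class imbalance, and hard examples that are learned poorly.
With plain cross-entropy (CE), well-classified samples still incur a non-negligible loss, and such samples are usually very numerous (background). Summed together they dominate the total loss, so the gradient contribution from hard examples is comparatively small.
Focal loss formula
$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the predicted probability of the ground-truth class.
In practice $\gamma = 2$ usually works well, with $\alpha_t = 0.25$ for positive samples and $0.75$ for negative samples.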
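To see the down-weighting effect numerically, the sketch below compares plain CE with the focal term for one easy and one hard sample. It uses the standard focal loss form with gamma = 2 and, to isolate the modulating factor, sets alpha_t to 1. The function names are made up for this example.

```python
import math


def cross_entropy(p_t):
    """CE for a sample whose ground-truth class gets probability p_t."""
    return -math.log(p_t)


def focal_loss(p_t, gamma=2.0, alpha_t=1.0):
    """Focal loss: CE scaled by alpha_t * (1 - p_t) ** gamma."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy, hard = 0.9, 0.1   # probability assigned to the correct class
easy_ratio = focal_loss(easy) / cross_entropy(easy)   # (1 - 0.9)^2
hard_ratio = focal_loss(hard) / cross_entropy(hard)   # (1 - 0.1)^2
print(f"easy sample keeps {easy_ratio:.2%} of its CE loss")
print(f"hard sample keeps {hard_ratio:.2%} of its CE loss")
```

The easy sample's loss shrinks by a factor of 100, while the hard sample keeps most of its loss, so hard examples carry relatively more weight in the total.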
Hand-written focal loss
import torch
import torch.nn as nn
import torch.nn.functional as F


class FocalLoss(nn.Module):
    def __init__(self, gamma=0, alpha=None, size_average=True):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha
        # A scalar alpha becomes per-class weights [alpha, 1 - alpha] (binary case)
        if isinstance(alpha, (float, int)):
            self.alpha = torch.tensor([alpha, 1 - alpha], dtype=torch.float32)
        # A list gives one weight per class
        if isinstance(alpha, list):
            self.alpha = torch.tensor(alpha, dtype=torch.float32)
        self.size_average = size_average

    def forward(self, input, target):
        if input.dim() > 2:
            input = input.view(input.size(0), input.size(1), -1)  # N,C,H,W => N,C,H*W
            input = input.transpose(1, 2)                         # N,C,H*W => N,H*W,C
            input = input.contiguous().view(-1, input.size(2))    # N,H*W,C => N*H*W,C
        target = target.view(-1, 1)

        logpt = F.log_softmax(input, dim=1)       # log-probabilities over classes
        logpt = logpt.gather(1, target).view(-1)  # log p_t of the target class
        # p_t, kept in the autograd graph so (1 - p_t)^gamma is differentiated too
        pt = logpt.exp()

        if self.alpha is not None:
            alpha = self.alpha.to(device=input.device, dtype=input.dtype)
            at = alpha.gather(0, target.view(-1))  # per-sample alpha_t
            logpt = logpt * at

        loss = -1 * (1 - pt) ** self.gamma * logpt
        if self.size_average:
            return loss.mean()
        return loss.sum()
ViT
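- ViT architecture
- Patchify the image into P×P patches, N = HW / P² patches in total. Flatten each patch and pass it through a linear layer to obtain its embedding.
- As in BERT, prepend a learnable [class] token to the sequence of patch embeddings; its final state serves as the representation of the whole image.
- The position encoding is a learnable 1D position embedding.
- The overall architecture is a Transformer encoder.
- Finally, the [class] token embedding is fed to the classification head.
- Fine-tuning usually uses a higher resolution. The patch size stays the same, so the sequence length grows and the pretrained position embeddings no longer cover it. The paper handles this by 2D interpolation of the pretrained position embeddings.
- Hand-written ViT patchify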
import torch.nn as nn


class PatchEmbed(nn.Module):
    """
    2D Image to Patch Embedding
    """
    def __init__(self, img_size=224, patch_size=16, in_c=3, embed_dim=768, norm_layer=None):
        super().__init__()
        img_size = (img_size, img_size)
        patch_size = (patch_size, patch_size)
        self.img_size = img_size
        self.patch_size = patch_size
        self.grid_size = (img_size[0] // patch_size[0], img_size[1] // patch_size[1])
        self.num_patches = self.grid_size[0] * self.grid_size[1]
        # A conv with kernel_size == stride == patch_size is a per-patch linear projection
        self.proj = nn.Conv2d(in_c, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()

    def forward(self, x):
        B, C, H, W = x.shape
        assert H == self.img_size[0] and W == self.img_size[1], \
            f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."
        # proj:      [B, C, H, W] -> [B, embed_dim, H/P, W/P]
        # flatten:   [B, embed_dim, H/P, W/P] -> [B, embed_dim, N]
        # transpose: [B, embed_dim, N] -> [B, N, embed_dim]
        x = self.proj(x).flatten(2).transpose(1, 2)
        x = self.norm(x)
        return x
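The strided convolution above is a compact way of doing "flatten each patch, then apply a linear layer". The NumPy sketch below makes that explicit and checks that the two views agree. It is an illustration only: the helper `patchify`, the random weights and the small 32×32 image are all assumptions of this example, and the weight layout (D, C, P, P) is chosen to mirror a Conv2d weight.

```python
import numpy as np


def patchify(img, patch_size):
    """Split a (C, H, W) image into (N, C*P*P) flattened patches, row-major over the grid."""
    c, h, w = img.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    gh, gw = h // p, w // p
    # (C, gh, P, gw, P) -> (gh, gw, C, P, P) -> (N, C*P*P)
    x = img.reshape(c, gh, p, gw, p).transpose(1, 3, 0, 2, 4)
    return x.reshape(gh * gw, c * p * p)


rng = np.random.default_rng(0)
C, H, W, P, D = 3, 32, 32, 16, 8
img = rng.standard_normal((C, H, W))
weight = rng.standard_normal((D, C, P, P))   # same layout as a Conv2d weight
bias = rng.standard_normal(D)

# View 1: flatten each patch, then one linear layer -> (N, D)
tokens = patchify(img, P) @ weight.reshape(D, -1).T + bias

# View 2: convolution with kernel_size == stride == P, computed explicitly
grid = H // P
reference = np.empty((grid, grid, D))
for i in range(grid):
    for j in range(grid):
        window = img[:, i * P:(i + 1) * P, j * P:(j + 1) * P]
        reference[i, j] = (weight * window).sum(axis=(1, 2, 3)) + bias

print(tokens.shape, np.allclose(tokens, reference.reshape(-1, D)))
```

Here N = (32 / 16)² = 4 tokens of dimension 8, and both computations produce the same values.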