
Work in progress…

With the autumn recruiting season approaching, this post collects some of the essential knowledge needed when interviewing for algorithm-engineer roles.

CNN

  • Receptive field of a CNN

    The receptive field grows layer by layer as $r_l = r_{l-1} + (k_l - 1)\prod_{i=1}^{l-1} s_i$, where $k_l$ is the kernel size of layer $l$ and $s_i$ are the strides of the earlier layers; for example, two stacked $3\times 3$ convolutions with stride 1 give a $5\times 5$ receptive field.

  • Parameter count of a convolution layer

    $C_{in}\times C_{out}\times K\times K + C_{out}$, where the last term is the bias. For example, a $3\times 3$ convolution from 64 to 128 channels has $64\times 128\times 3\times 3 + 128 = 73856$ parameters (a quick PyTorch sanity check is sketched in the last bullet of this section).
  • Convolution from scratch in NumPy

    import numpy as np

    def filter2d(image, kernel):
        # Get the dimensions of the input image and the kernel
        image_height, image_width = image.shape
        kernel_height, kernel_width = kernel.shape

        # Compute the output (valid, no padding) dimensions
        output_height = image_height - kernel_height + 1
        output_width = image_width - kernel_width + 1

        # Initialize the output array
        filtered_image = np.zeros((output_height, output_width))

        # Slide the kernel over the image (cross-correlation, which is what
        # CNN "convolution" actually computes: no kernel flip)
        for i in range(output_height):
            for j in range(output_width):
                # Extract the current window of pixels
                window = image[i:i+kernel_height, j:j+kernel_width]
                # Sum of the element-wise product at this position
                filtered_image[i, j] = np.sum(window * kernel)

        return filtered_image

    # Define an image and a kernel (filter)
    image = np.array([[1, 2, 3],
                      [4, 5, 6],
                      [7, 8, 9]])
    kernel = np.array([[1, 0],
                       [0, -1]])

    # Apply the filter
    filtered_image = filter2d(image, kernel)

    print(filtered_image)
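
  • Parameter-count sanity check (PyTorch)

    A minimal sketch, not from the original post, that verifies the parameter-count formula above; it assumes PyTorch is installed and uses arbitrary example sizes.

    import torch.nn as nn

    # Conv layer: 64 -> 128 channels, 3x3 kernel, with bias
    conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, bias=True)

    # Expected: C_in * C_out * K * K + C_out = 64 * 128 * 3 * 3 + 128 = 73856
    n_params = sum(p.numel() for p in conv.parameters())
    print(n_params)  # 73856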

Object Detection

  • IoU: definition and implementation

    IoU (Intersection over Union) is the area of overlap between two boxes divided by the area of their union.

    def bbox_iou(box1, box2):
        """
        Calculate the Intersection over Union (IoU) of two bounding boxes.
        :param box1: (x1, y1, x2, y2) - coordinates of the first bounding box
        :param box2: (x1, y1, x2, y2) - coordinates of the second bounding box
        :return: IoU of the two bounding boxes
        """
        # Determine the coordinates of the intersection rectangle
        x1_inter = max(box1[0], box2[0])
        y1_inter = max(box1[1], box2[1])
        x2_inter = min(box1[2], box2[2])
        y2_inter = min(box1[3], box2[3])

        # Compute the area of intersection
        width_inter = max(0, x2_inter - x1_inter)
        height_inter = max(0, y2_inter - y1_inter)
        area_inter = width_inter * height_inter

        # Compute the area of both bounding boxes
        area_box1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
        area_box2 = (box2[2] - box2[0]) * (box2[3] - box2[1])

        # Compute the intersection over union by taking the intersection
        # area and dividing it by the sum of both areas minus the intersection area
        iou = area_inter / float(area_box1 + area_box2 - area_inter)

        return iou

    # Example usage:
    box1 = (1, 1, 4, 4)
    box2 = (2, 2, 5, 5)
    iou = bbox_iou(box1, box2)
    print(f"IoU: {iou}")
  • NMS: description and implementation

    Non-Maximum Suppression sorts boxes by score, keeps the highest-scoring box, discards every remaining box whose IoU with it exceeds a threshold, and repeats on the boxes that are left until none remain.

    def nms(boxes, scores, iou_threshold):
        """
        Perform Non-Maximum Suppression (NMS) on the given bounding boxes.
        :param boxes: a list of bounding boxes (each box is [x1, y1, x2, y2])
        :param scores: a list of scores for each bounding box
        :param iou_threshold: a float representing the IoU threshold for NMS
        :return: a list of indices of the selected bounding boxes
        """
        # Get the indices of the boxes sorted by their scores in descending order
        indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

        # List to hold the indices of the selected boxes
        selected_indices = []

        while indices:
            # Pick the box with the highest score and add its index to the selected list
            current_index = indices[0]
            selected_indices.append(current_index)

            # Compute the IoU of the selected box with the rest of the boxes
            ious = [bbox_iou(boxes[current_index], boxes[i]) for i in indices[1:]]

            # Remove the indices of the boxes that have a high IoU with the selected box
            indices = [i for i, iou in zip(indices[1:], ious) if iou < iou_threshold]

        return selected_indices

    # Example usage:
    boxes = [
        [1, 1, 4, 4],
        [2, 2, 5, 5],
        [5, 5, 6, 6],
        [10, 10, 12, 12]
    ]
    scores = [0.9, 0.75, 0.6, 0.95]
    iou_threshold = 0.5

    selected_boxes = nms(boxes, scores, iou_threshold)
    print(f"Selected boxes: {selected_boxes}")

Focal Loss

  • The problem focal loss addresses

    Class imbalance, and poor learning of hard examples.
    Plain cross-entropy still assigns a non-negligible loss to well-classified samples, and such samples are usually very numerous (e.g., background), so they dominate the total loss and gradient and leave hard examples with relatively little back-propagated gradient.

  • Focal loss formula

    $FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$

    Typically $\gamma = 2$ works well; $\alpha_t$ is set to 0.25 for positive samples and 0.75 for negative samples.

  • Focal loss from scratch

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FocalLoss(nn.Module):
        def __init__(self, gamma=0, alpha=None, size_average=True):
            super(FocalLoss, self).__init__()
            self.gamma = gamma
            self.alpha = alpha
            if isinstance(alpha, (float, int)):
                self.alpha = torch.Tensor([alpha, 1 - alpha])
            if isinstance(alpha, list):
                self.alpha = torch.Tensor(alpha)
            self.size_average = size_average

        def forward(self, input, target):
            if input.dim() > 2:
                input = input.view(input.size(0), input.size(1), -1)  # N,C,H,W => N,C,H*W
                input = input.transpose(1, 2)                         # N,C,H*W => N,H*W,C
                input = input.contiguous().view(-1, input.size(2))    # N,H*W,C => N*H*W,C
            target = target.view(-1, 1)

            # log p_t of the target class for each sample
            logpt = F.log_softmax(input, dim=1)
            logpt = logpt.gather(1, target)
            logpt = logpt.view(-1)
            pt = logpt.detach().exp()

            # class-balancing weight alpha_t
            if self.alpha is not None:
                self.alpha = self.alpha.type_as(input)
                at = self.alpha.gather(0, target.view(-1))
                logpt = logpt * at

            # focal term (1 - p_t)^gamma down-weights easy examples
            loss = -1 * (1 - pt) ** self.gamma * logpt
            if self.size_average:
                return loss.mean()
            else:
                return loss.sum()
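
    A minimal usage sketch for the class above (shapes and values are arbitrary examples, not from the original post):

    # Binary case: 4 samples, 2 classes, raw logits and integer class targets
    logits = torch.randn(4, 2)
    targets = torch.tensor([0, 1, 1, 0])

    # alpha=0.25 becomes per-class weights [0.25, 0.75] inside the module
    criterion = FocalLoss(gamma=2, alpha=0.25)
    loss = criterion(logits, targets)
    print(loss.item())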

ViT

  • ViT architecture overview
  • The image is patchified into P*P patches, N patches in total. Each patch is flattened and passed through a linear layer to obtain its embedding.
  • As in BERT, a learnable [class] token is prepended to the patch-embedding sequence to represent the whole image.
  • The positional encoding is a learnable 1D position embedding.
  • The overall backbone is a Transformer encoder.
  • Finally, the [class] token embedding is fed into the classification head.
  • Fine-tuning is usually done at a higher resolution while keeping the patch size fixed, so the sequence length grows and the pre-trained position embeddings no longer cover it; the paper handles this by 2D-interpolating the pre-trained position embeddings (see the sketch in the last bullet of this section).
  • ViT patchify from scratch
    import torch.nn as nn

    class PatchEmbed(nn.Module):
        """
        2D Image to Patch Embedding
        """
        def __init__(self, img_size=224, patch_size=16, in_c=3, embed_dim=768, norm_layer=None):
            super().__init__()
            img_size = (img_size, img_size)
            patch_size = (patch_size, patch_size)
            self.img_size = img_size
            self.patch_size = patch_size
            self.grid_size = (img_size[0] // patch_size[0], img_size[1] // patch_size[1])
            self.num_patches = self.grid_size[0] * self.grid_size[1]

            # Patchify + linear projection in one step: a conv with kernel = stride = patch size
            self.proj = nn.Conv2d(in_c, embed_dim, kernel_size=patch_size, stride=patch_size)
            self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()

        def forward(self, x):
            B, C, H, W = x.shape
            assert H == self.img_size[0] and W == self.img_size[1], \
                f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."

            # flatten: [B, C, H, W] -> [B, C, HW]
            # transpose: [B, C, HW] -> [B, HW, C]
            x = self.proj(x).flatten(2).transpose(1, 2)
            x = self.norm(x)
            return x
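
  • Position-embedding interpolation (sketch)

    A minimal sketch of the 2D interpolation mentioned above, not from the original post; it assumes a learnable pos_embed of shape [1, 1 + N, C] whose first token is the [class] token.

    import torch
    import torch.nn.functional as F

    def resize_pos_embed(pos_embed, old_grid, new_grid):
        # pos_embed: [1, 1 + old_grid*old_grid, C]; the first token is the [class] token
        cls_pos, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
        C = patch_pos.shape[-1]
        # [1, N, C] -> [1, C, H, W] so 2D interpolation can be applied
        patch_pos = patch_pos.reshape(1, old_grid, old_grid, C).permute(0, 3, 1, 2)
        patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                                  mode='bicubic', align_corners=False)
        # back to [1, new_N, C] and re-attach the [class] token
        patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, C)
        return torch.cat([cls_pos, patch_pos], dim=1)

    # Example: 224/16 = 14x14 grid at pre-training, 384/16 = 24x24 grid at fine-tuning
    pos_embed = torch.randn(1, 1 + 14 * 14, 768)
    print(resize_pos_embed(pos_embed, 14, 24).shape)  # torch.Size([1, 577, 768])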

SwinT

Self-Supervised Pre-training

MAE

BEiT

MoCo

SimCLR

DINO

SAM