Description
Residual Attention: A Simple but Effective Method for Multi-Label Recognition
According to the paper, the base_logit (denoted as g in the paper) should be computed as the global feature vector, obtained by averaging the features over all spatial locations. This is stated in the equation:

$g = \frac{1}{49} \sum_{k=1}^{49} x_k$

Here, $x_k$ represents the feature at location $k$, and we sum over all locations (49 in this case) and then take the average. This operation is class-agnostic: it is not specific to any class and is the same for all classes. The global feature vector g represents the overall content of the image, irrespective of specific classes, and serves as a baseline representation of the image content.
In my implementation, I have this:
```python
def forward(self, x):
    B, _, H, W = x.size()  # batch size, _, height, width

    # Compute class-specific attention scores
    logits = self.classifier(x)          # size: (B, C, H, W)
    logits = logits.view(B, self.C, -1)  # size: (B, C, H*W)

    # Flatten the spatial dimensions of the features
    x_flatten = x.view(B, self.d, -1)    # size: (B, d, H*W)

    # Compute the global feature vector
    g = torch.mean(x_flatten, dim=2)     # size: (B, d)
```

I am computing base_logit (or g) as per the paper's method. The original implementation seems to compute something different for base_logit, which doesn't align with the paper's description: it computes the average class-specific score for each class across all spatial locations, which is not what g represents according to the paper.
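As a side note, the flatten-then-mean used above is just global average pooling; a minimal sketch (with arbitrary sizes `d=8`, `H=W=7`) confirming the two are numerically identical:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, d, H, W = 2, 8, 7, 7
x = torch.randn(B, d, H, W)

# g as the spatial mean of the features, as in the paper's equation
g = x.view(B, d, -1).mean(dim=2)                 # size: (B, d)

# the same quantity via global average pooling
g_pool = F.adaptive_avg_pool2d(x, 1).flatten(1)  # size: (B, d)

print(torch.allclose(g, g_pool))
```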
This is the original implementation:
```python
def forward(self, x):
    # x: (B, d, H, W)
    # normalize the classifier weights, giving score: (B, C, H, W)
    score = self.head(x) / torch.norm(self.head.weight, dim=1, keepdim=True).transpose(0, 1)
    score = score.flatten(2)               # size: (B, C, H*W)
    base_logit = torch.mean(score, dim=2)  # size: (B, C)
```

Is there a reason for this?
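One thing worth noting: if the head is linear with no bias (a 1×1 convolution here), averaging the per-location scores gives the same result as applying the head to the averaged feature g, since the mean commutes with a linear map (and the per-class weight normalization is a constant per class, so it commutes too). A quick sanity check of that equivalence, with arbitrary sizes (`d=4`, `C=3`) and a bias-free 1×1 conv standing in for the head:

```python
import torch

torch.manual_seed(0)
B, d, H, W, C = 2, 4, 7, 7, 3
x = torch.randn(B, d, H, W)
head = torch.nn.Conv2d(d, C, kernel_size=1, bias=False)

# classify then average: per-location scores, mean over spatial positions
score = head(x).flatten(2)          # size: (B, C, H*W)
base_logit = score.mean(dim=2)      # size: (B, C)

# average then classify: global feature g, then the same linear weights
g = x.flatten(2).mean(dim=2)                            # size: (B, d)
g_logit = g @ head.weight.squeeze(-1).squeeze(-1).t()   # size: (B, C)

print(torch.allclose(base_logit, g_logit, atol=1e-6))
```

So the two formulations may differ only in where the averaging happens, not in the logit they produce.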