

MIT License


PolyFootNet: Extracting Polygonal Building Footprints in Off-Nadir Remote Sensing Images

Extract polygonal building footprints and explore multiple solutions to the extraction problem.

0. News

I am currently preparing for my PhD Qualification Exam (QE) at City University of Hong Kong, which runs until 5 August 2025. After the exam, I will fly directly to IGARSS 2025 in Brisbane, Australia, supported by an IGARSS 2025 travel grant. I will give two oral presentations on OBM, the precursor to PolyFootNet, and I warmly invite you all to attend and ask questions. Once IGARSS and a short trip around Australia are over, I will finalize and release the open-source code and model weights for PolyFootNet. See you at IGARSS 2025.

Time 1: 13:15 - 14:30 6th Aug, 2025. Plaza: Room P5, 3MT Final Round. (Come and Vote!)

Time 2: 10:30 - 10:45 8th Aug, 2025. Mezzanine: Room M2.

0.1. PolyFootNet Model Weight

OneDrive BaiduDisk

0.2. Note for inference

For your convenience, we have now open-sourced the PolyFootNet model weights. You can use the inference configuration file and code from OBM directly to run inference. Note that you will need to modify the data configuration as shown below to run the task and evaluate PolyFootNet's offset-correction effect.

Although the model weights include those for keypoint prediction, the OBM code cannot generate the polygons because part of the code is withheld: that algorithm is currently involved in commercial use. If you wish to use features such as keypoint prediction, please contact me via my personal homepage and state your purpose.

data = dict(
    test=dict(
        bbox_type='roof',
        mask_type='roof',
        )
)
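To make the effect of this override concrete, here is a minimal sketch in plain Python (no mmdet dependency) of how such a fragment merges into a base test config. The `base` dict and its `ann_file` value are hypothetical placeholders, not taken from the OBM repository; mmcv-style configs perform an analogous recursive merge.

```python
# Hypothetical base config; only the override keys are replaced, the rest survive.
base = dict(test=dict(bbox_type='building', mask_type='building', ann_file='test.json'))
override = dict(test=dict(bbox_type='roof', mask_type='roof'))

def deep_merge(dst, src):
    """Recursively merge src into dst, mimicking mmcv-style config merging."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            deep_merge(dst[key], value)
        else:
            dst[key] = value
    return dst

cfg = deep_merge(base, override)
# cfg['test'] now uses roof targets while untouched keys (e.g. ann_file) are preserved
```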

If you want to use our SOFA_vector in other models, you can manually register SOFA in OBM, build it in your model, and call it in forward_test using the following code:

### Step 1: Replace and add codes in OBM/mmdet/models/roi_heads/twoway_mask_offset_decoder.py as follows:
@HEADS.register_module()
class MaskDecoder_seg(nn.Module):
    def __init__(
        self,
        *,
        transformer_dim: int,
        transformer: nn.Module,
        num_multimask_outputs: int = 3,
        sofa_head= None, ##### add this line in setting
        loss_masks=dict(
            type='SAMHQLoss',),
        loss_offset=dict(type='SmoothL1Loss', loss_weight=8*2.0),
        offset_coder=dict(
            type='DeltaXYOffsetCoder_Transformer',
            image_size = (200,200),
            target_means=[0.0, 0.0],
            target_stds=[0.5, 0.5]),
        iou_head_depth: int = 3,
        iou_head_hidden_dim: int = 256,
        offset_aug = None,
        hidden_dim = 256,
    ):
########### add following code #######
        if sofa_head is not None:
            self.sofa_head_base = build_head(sofa_head)
        else:
            self.sofa_head_base = None

    def forward_test(
        self,
        image_embeddings,
        image_pe,
        sparse_prompt_embeddings,
        dense_prompt_embeddings,
    ):
        masks, prob, offset, aug_offsets = self.predict_offset_masks(image_embeddings, image_pe, sparse_prompt_embeddings, dense_prompt_embeddings,)
        if self.sofa_head_base is not None:
            aug_offsets.append(offset)
            out_offsets = self.sofa_head_base(torch.cat(aug_offsets,0))
            out_offsets = out_offsets.reshape(self.offset_aug_length+1, -1, 2)
            aug_offsets = [out_offsets[i] for i in range(self.offset_aug_length)]
            offset = out_offsets[-1]
        offset = self.offset_coder.decode(offset)

        if self.offset_aug_length:
            offset_indicator = torch.norm(offset, dim=1) > 10
            aug_offsets = [self.offset_aug_coder[i].decode(aug_offsets[i]) for i in range(self.offset_aug_length)]
            # split coders by scale: large-scale coders refine long offsets,
            # small-scale coders refine short ones
            selector = [coder.image_size[0] > self.offset_coder.image_size[0] for coder in self.offset_aug_coder]
            for idx, sl in enumerate(selector):
                if sl:
                    offset[offset_indicator] += aug_offsets[idx][offset_indicator]
                else:
                    offset[~offset_indicator] += aug_offsets[idx][~offset_indicator]
            offset[offset_indicator] = offset[offset_indicator] / (1 + sum(selector))
            offset[~offset_indicator] = offset[~offset_indicator] / (1 + len(selector) - sum(selector))
            
        return offset, prob, masks
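The per-offset fusion rule at the end of forward_test can be sketched in plain Python, without torch. Long offsets (norm above the threshold) are averaged with augmented predictions decoded at larger image scales, short offsets with those at smaller scales. The numeric values in the usage line are illustrative only, not from the paper.

```python
import math

def fuse(offset, aug_offsets, aug_is_larger_scale, threshold=10.0):
    """Fuse a list of (dx, dy) offsets with per-coder augmented predictions.

    aug_offsets[j][i] is the j-th coder's prediction for the i-th instance;
    aug_is_larger_scale[j] says whether coder j decodes at a larger scale
    than the main coder.
    """
    fused = []
    for i, (dx, dy) in enumerate(offset):
        is_long = math.hypot(dx, dy) > threshold
        # pick augmented predictions from the scale group matching this offset
        picks = [aug[i] for aug, larger in zip(aug_offsets, aug_is_larger_scale)
                 if larger == is_long]
        n = 1 + len(picks)
        fused.append(((dx + sum(p[0] for p in picks)) / n,
                      (dy + sum(p[1] for p in picks)) / n))
    return fused

# one long offset (norm 50) and one short offset (norm ~1.4)
result = fuse([(30.0, 40.0), (1.0, 1.0)],
              [[(32.0, 38.0), (0.0, 0.0)], [(0.0, 0.0), (1.0, 3.0)]],
              [True, False])
```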

### Step 2: Add sofa_head.py at OBM/mmdet/models/dense_heads/sofa_head.py, and export it in OBM/mmdet/models/dense_heads/__init__.py

### Step 3: Turn on the `sofa_vector` in YOUR/CONFIG.py as:
model = dict(
    mask_decoder = dict(
        type = 'MaskDecoder_seg',
        sofa_head=dict(
            type='SOFA_vector',
            trainable=False),)
)
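Steps 1 and 3 rely on the mmdet-style registry pattern: HEADS.register_module() records the class under its name, and build_head(cfg) pops the 'type' key and instantiates it with the remaining kwargs. Here is a simplified, dependency-free sketch of that mechanism (the SOFA_vector body below is a stub, not the real head):

```python
class Registry:
    """Toy version of the mmcv Registry used by HEADS/build_head."""
    def __init__(self):
        self._modules = {}

    def register_module(self):
        def wrap(cls):
            self._modules[cls.__name__] = cls
            return cls
        return wrap

    def build(self, cfg):
        cfg = dict(cfg)                      # don't mutate the caller's config
        cls = self._modules[cfg.pop('type')] # 'type' selects the class
        return cls(**cfg)                    # remaining keys become kwargs

HEADS = Registry()

@HEADS.register_module()
class SOFA_vector:                           # stub standing in for the real head
    def __init__(self, trainable=False):
        self.trainable = trainable

# mirrors what sofa_head=dict(type='SOFA_vector', trainable=False) triggers
head = HEADS.build(dict(type='SOFA_vector', trainable=False))
```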

1. Paper Contributions

  1. Proposed the first polygonal building footprint extraction (BFE) network for off-nadir scenes.
  2. Explored multiple solutions to the BFE problem.
  3. Designed a mathematically grounded, interpretable module, Self Offset Attention (SOFA), to improve offset predictions.
  4. Achieved SOTA performance on three datasets.
Figure: (a) PolyFootNet (b) SOFA

2. Architectural Insights & Design Rationale

The figures illustrate the structure of PolyFootNet and SOFA.

2.1 Design Motivation of the Self‑Offset Attention (SOFA) Module

The conception of SOFA is driven by two complementary observations:

  1. Attention as learnable pooling
    Dr. Mu Li formalises attention layers as kernel‑based pooling—essentially a data‑adaptive weighted average grounded in Nadaraya–Watson regression.
    This interpretation frames attention not as a novel operator, but as a principled pooling mechanism.

  2. Empirical behaviour in BFE tasks
    In Building Footprint Extraction (BFE) models, roof‑to‑footprint offsets for taller buildings (i.e. longer vectors) consistently exhibit lower angular error than those for shorter buildings in the same image.

SOFA leverages these insights to let reliable long offsets refine less-reliable short offsets; in effect, it pools knowledge from the longer offsets and weights them more heavily.
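The "attention as pooling" view above can be made concrete with a one-dimensional Nadaraya-Watson regressor: each query's output is a weighted average of the values, with Gaussian-kernel weights over query-key distances. This is an illustrative sketch of the general idea, not PolyFootNet code.

```python
import math

def nw_attention(query, keys, values, bandwidth=1.0):
    """Nadaraya-Watson kernel regression: a data-adaptive weighted average.

    Softmax over negative squared distances is equivalent to normalized
    Gaussian kernel weights, i.e. scalar attention pooling.
    """
    scores = [-((query - k) ** 2) / (2 * bandwidth ** 2) for k in keys]
    m = max(scores)                               # stabilize the softmax
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / z

# a query equidistant from two keys averages their values
mid = nw_attention(0.0, [-1.0, 1.0], [2.0, 4.0])
# a query sitting on a single key recovers its value
exact = nw_attention(0.0, [0.0], [5.0])
```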


Further Reading

Resource Link
Attention Pooling (English, D2L) https://d2l.ai/chapter_attention-mechanisms-and-transformers/attention-pooling.html
Attention Pooling (Chinese, D2L) https://zh.d2l.ai/chapter_attention-mechanisms/nadaraya-waston.html
Lecture video (YouTube) https://www.youtube.com/watch?v=EUFhCYuD3gk
Lecture video (Bilibili) https://www.bilibili.com/video/BV1264y1i7R1

2.2 Why Explore Multiple Formulations (“Multi‑Solutions”) of the BFE Task?

In this figure, (a) is the commonly used solution for BFE, (b) and (c) are the new cases we studied, and (d) is what PolyFootNet can do. Two complementary considerations motivate our investigation.
  1. Empirical evidence from pilot studies
    Preliminary experiments contrasted masks for entire buildings versus partial‑roof regions. Intersection‑over‑Union (IoU) scores were consistently higher for full‑building masks. Visual inspection suggests that models struggle to distinguish the roof–facade seam, whereas the building–ground boundary is far more salient. This observation encouraged us to search for alternative factorizations—representations that bypass the ambiguous roof edge yet still recover precise footprints.

  2. Heterogeneous labelling conventions across public datasets
    A decade of BFE research has produced multiple benchmarks with dissimilar annotations. For example, BANDON supplies only roof and facade polygons, omitting footprints altogether. By developing mathematically consistent “multi‑solution” formulations, we can project heterogeneous labels into a common space, thereby

    • unlocking legacy datasets that would otherwise be unusable,
    • constructing unified training pipelines, and
    • improving robustness and generalisation through greater data diversity.

In short, studying multi‑solutions provides both a pragmatic route to higher accuracy and a unifying framework for disparate remote‑sensing datasets.
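The simplest of these projections between label spaces can be written down directly: given a roof polygon and a roof-to-footprint offset, translating each roof vertex by the offset recovers the footprint polygon. The coordinates below are illustrative only.

```python
def roof_to_footprint(roof_polygon, offset):
    """Translate each roof vertex by the roof-to-footprint offset (dx, dy)."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in roof_polygon]

# a toy triangular roof shifted by an offset of (2, -3)
footprint = roof_to_footprint([(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)], (2.0, -3.0))
```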

3. Installation

Please follow the installation instructions of OBM and BONAI.

4. To Use

See Section 0.2 for inference with the released model weights.

5. Citing

@article{li2023obm,
  author={Li, Kai and Deng, Yupeng and Kong, Yunlong and Liu, Diyou and Chen, Jingbo and Meng, Yu and Ma, Junxian and Wang, Chenhao},
  journal={IEEE Transactions on Geoscience and Remote Sensing}, 
  title={Prompt-Driven Building Footprint Extraction in Aerial Images With Offset-Building Model}, 
  year={2024},
  volume={62},
  number={},
  pages={1-15},
  keywords={Buildings;Prediction algorithms;Production;Data models;Data mining;Remote sensing;Instance segmentation;Feature extraction;Training;Three-dimensional displays;Building footprint extraction (BFE);nonmaximum suppression (NMS);roof segmentation;roof-to-footprint offset extraction;segment anything model (SAM)},
  doi={10.1109/TGRS.2024.3487652}}
@article{li2024polyfootnet,
  author={Li, Kai and Deng, Yupeng and Chen, Jingbo and Meng, Yu and Xi, Zhihao and Ma, Junxian and Wang, Chenhao and Wang, Maolin and Zhao, Xiangyu},
  journal={IEEE Transactions on Geoscience and Remote Sensing}, 
  title={PolyFootNet: Extracting Polygonal Building Footprints in Off-Nadir Remote Sensing Images}, 
  year={2025},
  volume={},
  number={},
  pages={1-1},
  keywords={Building footprint extraction;Building detection;Segment Anything Model (SAM);Off-nadir aerial image;Nadaraya-Watson regression;Oblique monocular images},
  doi={10.1109/TGRS.2025.3590054}}

About

TGRS paper PolyFootNet.
