A Python Implemented Project of Visualization of Active Learning Sample Querying Algorithms in 3D Point Clouds

TioSisai/ALQuery3D

ALQuery3D

A pure Python implementation of a high-dimensional embedding generator designed for active learning research. It provides a web interface for parameter control and 3D visualization, and supports the FPS (Farthest Point Sampling) algorithm and multiple dimensionality reduction methods. The backend uses CPU-only libraries such as numpy, scipy, and scikit-learn.

🎯 Key Features

  • 🧠 Neural Network Encoder Simulation: Simulates various characteristics of real neural network encoders
  • 🎛️ Precise Parameter Control: 11 parameters for precise control of geometric and statistical properties
  • 🌐 Modern Web Interface: Dark theme, responsive design, professional research tool experience
  • 📊 Multiple Dimensionality Reduction: Supports PCA, t-SNE, UMAP algorithms
  • 🎯 FPS Sampling Algorithm: Complete Farthest Point Sampling implementation with 5 distance metrics
  • 💾 Intelligent Caching System: HDF5 caching improves performance, avoids redundant computation
  • 🔧 Flexible Dimension Support: 3-2048 dimensional embedding generation

📁 Project Structure

ALQuery3D/
├── src/                          # Source code directory
│   ├── data/                     # Data processing modules
│   │   ├── __init__.py
│   │   └── embedding_generator.py  # High-dimensional embeddings generator
│   ├── algorithms/               # Algorithm implementations
│   │   ├── __init__.py
│   │   └── fps.py               # FPS Farthest Point Sampling algorithm
│   ├── web/                      # Web interface
│   │   ├── app.py               # Flask backend
│   │   └── templates/
│   │       └── index.html       # Web frontend interface
│   └── __init__.py
├── data/                         # Data cache directory
│   └── tmp_data.h5              # HDF5 cache file (generated at runtime)
├── examples/                     # Example code
│   └── generate_embeddings_demo.py  # Embeddings generation demo
├── tests/                        # Test directory
│   └── test_embedding_generator.py  # Unit tests
├── run_web.py                    # Web application startup script
├── requirements.txt              # Project dependencies
├── README.md
└── LICENSE

🚀 Quick Start

Install Dependencies

pip install -r requirements.txt

Or manually install core dependencies:

pip install numpy scikit-learn matplotlib scipy flask plotly h5py umap-learn

Launch Web Interface

python run_web.py

Then visit http://localhost:5000 in your browser.

Programming Interface Usage

import numpy as np  # needed for np.unique below

from src.data.embedding_generator import EmbeddingGenerator

# Create generator
generator = EmbeddingGenerator(embedding_dim=128, random_state=42)

# Generate embeddings (all parameters normalized to 0-1 range)
embeddings, labels = generator.generate_clustered_embeddings(
    n_samples_per_class=[100, 150, 120],  # Number of samples per class
    dispersion=0.6,                       # Dispersion (0.0-1.0)
    curvature=0.2,                        # Curvature (0.0-1.0)
    flatness=0.7,                         # Flatness (0.0-1.0)
    inter_class_distance=0.8,             # Inter-class distance (0.0-1.0)
    intra_class_correlation=0.4           # Intra-class correlation (0.0-1.0)
)

print(f"Generated embeddings shape: {embeddings.shape}")
print(f"Number of classes: {len(np.unique(labels))}")

🧠 Core Features

1. High-dimensional Embeddings Generator

Simulates high-dimensional embeddings generated by neural network encoders with the following characteristics:

Basic Geometric Properties

  • Dispersion: Controls the spread of intra-class samples
  • Curvature: Controls nonlinear deformation, forming cone-like distributions
  • Flatness: Controls compression in certain dimensions, approaching hyperplanes
  • Inter-class Distance: Controls distance between different class centers
  • Intra-class Correlation: Controls correlation between intra-class features
  • Inter-hyperplane Parallelism: Controls parallelism between class hyperplanes

Neural Network Encoder Characteristics

  • Manifold Complexity: Simulates nonlinear activation function effects in neural networks
  • Feature Sparsity: Simulates feature sparsity caused by ReLU and other activation functions
  • Noise Level: Simulates information loss during encoding process
  • Boundary Sharpness: Controls clarity of class boundaries
  • Dimensional Anisotropy: Importance differences across different dimensions

2. Multiple Dimensionality Reduction Methods

  • PCA: Principal Component Analysis, preserves maximum variance
  • t-SNE: t-distributed Stochastic Neighbor Embedding, preserves local structure
  • UMAP: Uniform Manifold Approximation and Projection, balances global and local structure

3. FPS Farthest Point Sampling Algorithm

Complete FPS (Farthest Point Sampling) implementation:

  • Multiple Distance Metrics: Euclidean, cosine, Chebyshev, Manhattan, Minkowski distances
  • Interactive Point Selection: Click any point in 3D visualization to set starting position
  • Path Visualization: Cyan gradient display of complete FPS traversal path
  • Range View Function: View any continuous subsequence of FPS path
  • Statistical Analysis: Path distances, class distribution, sampling quality assessment
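
The greedy selection behind farthest point sampling, together with the five distance metrics listed above, can be sketched in plain Python as follows. This is a minimal illustration and not the project's `fps.py`: the function names, the `METRICS` table, the default Minkowski order `p=3`, and the handling of zero vectors in the cosine distance are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p=3):
    # p=3 is an arbitrary order chosen for illustration.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumed convention for zero vectors
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


METRICS = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "chebyshev": chebyshev,
    "minkowski": minkowski,
    "cosine": cosine,
}


def farthest_point_sampling(points, num_samples, start_idx=0, metric="euclidean"):
    """Return indices in the order they are picked, starting at start_idx."""
    if not points:
        return []
    dist = METRICS[metric]
    num_samples = min(num_samples, len(points))
    selected = [start_idx]
    # min_dist[i] is the distance from point i to its nearest selected point;
    # already selected points are marked with -inf so they are never re-picked.
    min_dist = [dist(p, points[start_idx]) for p in points]
    min_dist[start_idx] = -math.inf
    while len(selected) < num_samples:
        next_idx = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_idx)
        for i, p in enumerate(points):
            if min_dist[i] != -math.inf:
                min_dist[i] = min(min_dist[i], dist(p, points[next_idx]))
        min_dist[next_idx] = -math.inf
    return selected


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.0, 5.0)]
print(farthest_point_sampling(points, num_samples=3, start_idx=0))
```

Each step picks the point whose distance to the already selected set is largest, so the order of the returned indices is the traversal path that the visualization draws.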

4. Web Interface Features

  • 🎛️ Parameter Control: Intuitive sliders and input boxes control all parameters
  • 📊 Real-time Visualization: 3D interactive charts with rotation and zoom support
  • 🔄 Dimensionality Reduction Switching: One-click switching between PCA, t-SNE, UMAP
  • 💾 Intelligent Caching: Automatic caching of dimensionality reduction results for improved response speed
  • 📈 Statistical Information: Real-time display of data statistics and dimension information
  • 🎯 FPS Sampling: Complete FPS sampling and visualization functionality
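
The caching of dimensionality reduction results mentioned above can be pictured as memoization keyed by the reduction method and its parameters. The sketch below uses an in-memory dictionary as a stand-in for the project's HDF5 file; `cache_key` and `ReductionCache` are hypothetical names, and the real cache layout is not described in this README.

```python
import hashlib
import json


def cache_key(method, params):
    """Build a stable key from the method name and its parameters."""
    payload = json.dumps({"method": method, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ReductionCache:
    """In-memory stand-in for an on-disk cache (assumption, not the real one)."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, method, params, compute):
        key = cache_key(method, params)
        if key not in self._store:
            # Only run the expensive reduction on a cache miss.
            self._store[key] = compute()
        return self._store[key]

    def clear(self):
        # Mirrors the idea of dropping old results when data is regenerated.
        self._store.clear()
```

Sorting the keys before hashing means that two parameter dictionaries with the same content but a different key order map to the same cache entry.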

📊 Parameter Description

Basic Parameters

| Parameter | Type | Range | Description |
|---|---|---|---|
| n_samples_per_class | List[int] | 10-5000 | Number of samples per class |
| embedding_dim | int | 3-2048 | Embedding dimension |

Geometric Control Parameters (Support per-class independent setting)

| Parameter | Range | Internal Mapping | Description |
|---|---|---|---|
| dispersion | 0.0-1.0 | 0.001-20.0 | Dispersion, controls intra-class sample spread |
| curvature | 0.0-1.0 | 0.0-5.0 | Curvature, controls nonlinear deformation |
| flatness | 0.0-1.0 | 0.001-1.0 | Flatness, controls dimensional compression |
| intra_class_correlation | 0.0-1.0 | 0.0-0.99 | Intra-class correlation, controls feature correlation |

Neural Network Characteristic Parameters

| Parameter | Range | Internal Mapping | Description |
|---|---|---|---|
| manifold_complexity | 0.0-1.0 | 0.0-2.0 | Manifold complexity, simulates nonlinear activation functions |
| feature_sparsity | 0.0-1.0 | 0.0-0.9 | Feature sparsity, simulates ReLU activation |
| noise_level | 0.0-1.0 | 0.0-0.5 | Noise level, simulates information loss |
| boundary_sharpness | 0.0-1.0 | 0.0-5.0 | Boundary sharpness, controls decision boundary clarity |
| dimensional_anisotropy | 0.0-1.0 | 0.0-0.8 | Dimensional anisotropy, simulates feature importance differences |

Global Parameters

| Parameter | Range | Internal Mapping | Description |
|---|---|---|---|
| inter_class_distance | 0.0-1.0 | 0.1-50.0 | Inter-class distance, controls distance between class centers |
| inter_hyperplane_parallelism | 0.0-1.0 | 0.0-0.99 | Inter-hyperplane parallelism |
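
The tables give a normalized range and an internal range for each parameter but do not say how one is mapped to the other. The sketch below assumes plain linear interpolation, which may differ from what the generator actually does; `INTERNAL_RANGES` and `to_internal` are hypothetical names, while the numeric ranges are copied from the tables.

```python
# Internal ranges copied from the parameter tables above.
INTERNAL_RANGES = {
    "dispersion": (0.001, 20.0),
    "curvature": (0.0, 5.0),
    "flatness": (0.001, 1.0),
    "intra_class_correlation": (0.0, 0.99),
    "manifold_complexity": (0.0, 2.0),
    "feature_sparsity": (0.0, 0.9),
    "noise_level": (0.0, 0.5),
    "boundary_sharpness": (0.0, 5.0),
    "dimensional_anisotropy": (0.0, 0.8),
    "inter_class_distance": (0.1, 50.0),
    "inter_hyperplane_parallelism": (0.0, 0.99),
}


def to_internal(name, value):
    """Map a normalized value in [0, 1] to the internal range (linear, assumed)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")
    low, high = INTERNAL_RANGES[name]
    return low + value * (high - low)


print(to_internal("curvature", 0.5))
```

Under this assumption a normalized value of 0.0 yields the lower internal bound and 1.0 yields the upper bound.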

💡 Usage Examples

Per-class Independent Control

# Set different parameters for each class
embeddings, labels = generator.generate_clustered_embeddings(
    n_samples_per_class=[80, 100, 120],
    dispersion=[0.3, 0.6, 0.9],           # Different dispersion per class
    curvature=[0.1, 0.3, 0.5],            # Different curvature per class
    flatness=[0.4, 0.7, 1.0],             # Different flatness per class
    inter_class_distance=0.7,             # Global inter-class distance
    intra_class_correlation=[0.2, 0.5, 0.8]  # Different correlation per class
)
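
Parameters such as `dispersion` accept either a single value for all classes or one value per class. One plausible way to normalize both forms before generation is sketched below; `expand_per_class` is a hypothetical helper and not part of the documented API.

```python
def expand_per_class(value, n_classes, name="parameter"):
    """Return a list with one float per class from a scalar or a sequence."""
    if isinstance(value, (int, float)):
        # A scalar applies to every class.
        return [float(value)] * n_classes
    values = [float(v) for v in value]
    if len(values) != n_classes:
        raise ValueError(
            f"{name} has {len(values)} values but there are {n_classes} classes"
        )
    return values


n_samples_per_class = [80, 100, 120]
n_classes = len(n_samples_per_class)
print(expand_per_class(0.6, n_classes, "dispersion"))
print(expand_per_class([0.3, 0.6, 0.9], n_classes, "dispersion"))
```

Checking the length up front gives a clear error when a per-class list does not match `n_samples_per_class`.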

Neural Network Characteristic Simulation

# Simulate real neural network encoder
embeddings, labels = generator.generate_clustered_embeddings(
    n_samples_per_class=[200, 200, 200],
    dispersion=0.5,
    curvature=0.3,
    flatness=0.6,
    manifold_complexity=0.3,              # Moderate nonlinearity
    feature_sparsity=0.2,                 # Slight sparsity
    noise_level=0.05,                     # Small amount of noise
    boundary_sharpness=0.7,               # Clear boundaries
    dimensional_anisotropy=0.4            # Moderate anisotropy
)

Dimensionality Reduction Visualization

# PCA dimensionality reduction to 3D
reduced_pca = generator.reduce_dimensions(n_components=3, method='pca')

# t-SNE dimensionality reduction to 3D  
reduced_tsne = generator.reduce_dimensions(n_components=3, method='tsne')

# UMAP dimensionality reduction to 3D
reduced_umap = generator.reduce_dimensions(n_components=3, method='umap')

# Get dimensionality reduction information
info = generator.dimensionality_reduction_info
print(f"Dimensionality reduction method: {info['method']}")

FPS Sampling Usage

from src.algorithms.fps import create_fps_sampler

# Create FPS sampler
fps_sampler = create_fps_sampler()

# Execute FPS sampling
selected_indices = fps_sampler.sample(
    embeddings,           # Original high-dimensional data
    start_idx=0,         # Starting point index
    num_samples=50,      # Number of samples
    distance_metric='euclidean'  # Distance metric
)

# Get statistical information
stats = fps_sampler.get_path_statistics(
    embeddings, selected_indices, labels, 'euclidean'
)
print(f"Sampled {stats['total_points']} points")
print(f"Total path length: {stats['total_distance']:.3f}")
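
To make the statistics concrete, the sketch below computes comparable values by hand. The keys `total_points` and `total_distance` come from the example above; treating `total_distance` as the sum of Euclidean distances between consecutive sampled points is an assumption, and `path_statistics`, `mean_step`, and `class_counts` are hypothetical names.

```python
import math
from collections import Counter


def path_statistics(points, selected_indices, labels):
    """Summarize an FPS path given as an ordered list of point indices."""
    path = [points[i] for i in selected_indices]
    # Distance of each hop between consecutive points on the path.
    steps = [math.dist(a, b) for a, b in zip(path, path[1:])]
    return {
        "total_points": len(selected_indices),
        "total_distance": sum(steps),
        "mean_step": sum(steps) / len(steps) if steps else 0.0,
        "class_counts": dict(Counter(labels[i] for i in selected_indices)),
    }


points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
labels = [0, 1, 1]
stats = path_statistics(points, [0, 1, 2], labels)
print(stats["total_points"], stats["total_distance"], stats["class_counts"])
```

A continuous subsequence of the path, as in the range view, can be summarized the same way by slicing the index list first, for example `path_statistics(points, selected[2:10], labels)`.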

🎮 Web Interface Usage Workflow

1. Configure Parameters

  • Select number of classes (1-10)
  • Set embedding dimension (3-2048)
  • Set independent parameters for each class
  • Choose dimensionality reduction method (PCA/t-SNE/UMAP)

2. Generate Data

  • Click "Generate Embeddings" button
  • Wait for backend processing (loading animation displayed)
  • View 3D visualization results on the right

3. FPS Sampling (Optional)

  • Click any point in 3D plot to set starting position
  • Configure sampling parameters (quantity, distance metric)
  • Click "Start FPS Sampling" to execute sampling
  • View FPS path visualization and statistical information

4. Range View (Optional)

  • Set view range (start and end indices)
  • Click "View Range" to view specified range
  • Observe statistical information within the range

🔧 Technical Features

Performance Optimization

  • HDF5 Caching: Intelligent caching of dimensionality reduction results, avoiding redundant computation
  • Data Standardization: Automatic standardization to the [-1, 1] range, preserving relative relationships
  • Memory Management: Efficient data structure design, supports large-scale data
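
One way to bring coordinates into the [-1, 1] range while preserving relative relationships is to divide every value by the single largest absolute coordinate. The sketch below shows that approach; it is an assumption about the technique, since the project may standardize differently (for example per axis), and `scale_to_unit_range` is a hypothetical name.

```python
def scale_to_unit_range(points):
    """Scale all coordinates by one global factor so they fall in [-1, 1]."""
    max_abs = max((abs(x) for p in points for x in p), default=0.0)
    if max_abs == 0.0:
        # All coordinates are zero (or there are no points): nothing to scale.
        return [list(p) for p in points]
    return [[x / max_abs for x in p] for p in points]


print(scale_to_unit_range([[2.0, -4.0], [1.0, 0.0]]))
```

Because every coordinate is divided by the same factor, ratios between pairwise distances are unchanged; a per-axis rescaling would not have that property.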

Visualization Effects

  • Cyan Gradient: FPS paths displayed with cyan gradient
  • Interactive 3D: High-quality interactive charts with Plotly
  • Responsive Design: Adapts to different screen sizes

Extensibility

  • Modular Design: Easy to add new distance metrics and dimensionality reduction methods
  • API Friendly: Provides complete programming interface
  • Test Coverage: Complete unit test suite

📈 Application Scenarios

1. Active Learning Research

  • Generate datasets with specific characteristics
  • Test effectiveness of different sampling strategies
  • Visualize sampling results and data distribution

2. Dimensionality Reduction Algorithm Comparison

  • Compare PCA, t-SNE, UMAP effects on same data
  • Study impact of different parameters on dimensionality reduction results

3. Neural Network Characteristic Analysis

  • Simulate different types of neural network encoder outputs
  • Study geometric properties of high-dimensional features

4. Data Visualization Teaching

  • Intuitively demonstrate high-dimensional data characteristics
  • Understand impact of different parameters on data distribution

🧪 Running Demos

Web Interface Demo

# Launch web application
python run_web.py

Programming Interface Demo

# Run complete demo
python examples/generate_embeddings_demo.py

# Run unit tests
python tests/test_embedding_generator.py

📝 Notes

  1. First Use: t-SNE and UMAP are slow the first time they are computed; please be patient
  2. Large Samples: t-SNE on 5000 samples may take several minutes to compute
  3. Cache Cleanup: Regenerating data automatically clears the old cache
  4. Memory Usage: For large datasets, close other programs to free memory
  5. Parameter Effects: Extreme parameter values may produce unexpected data distributions

🔍 Troubleshooting

Common Issues

  1. Port Occupied: Change the port number in run_web.py
  2. Missing Dependencies: Run pip install -r requirements.txt
  3. Network Access: Ensure the firewall allows port 5000
  4. Browser Compatibility: Use the latest Chrome or Firefox

Performance Optimization

  • For large datasets, reduce the sample count
  • t-SNE and UMAP are slow on large inputs; allow extra time for them to finish
  • Close other browser tabs to free memory
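
When reducing the sample count, one option is to subsample each class by the same fraction so that class proportions are kept. The sketch below is only an illustration of that idea; `stratified_subsample` is a hypothetical helper that is not part of the project.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=42):
    """Return sorted indices that keep roughly `fraction` of every class."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for indices in by_class.values():
        # Keep at least one sample per class so no class disappears.
        k = max(1, round(len(indices) * fraction))
        kept.extend(rng.sample(indices, k))
    return sorted(kept)


labels = [0] * 100 + [1] * 50
kept = stratified_subsample(labels, fraction=0.1)
print(len(kept))
```

The returned indices can then be used to select rows from both the embeddings and the label array before running t-SNE or UMAP.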

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Issues and pull requests that improve this project are welcome!


ALQuery3D - Providing powerful high-dimensional data generation and visualization tools for active learning research! 🚀
