Understanding Pareto Front Selection

When using HpMocd for community detection, you often get multiple equally valid solutions along a Pareto front. Each solution represents a different trade-off between competing objectives (like intra-community cohesion vs. inter-community separation). This guide shows you how to systematically select the best solution for your specific needs.

Quick Start: Generate a Pareto Front

First, let’s generate a Pareto front using the enhanced algorithm:

  import networkx as nx
import numpy as np
from pymocd import HpMocd
from sklearn.metrics import adjusted_rand_score
import matplotlib.pyplot as plt

G = nx.karate_club_graph()

def generate_robust_pareto_front(G, max_attempts=3):
    best_front = None
    best_size = 0

    for attempt in range(max_attempts):
        # Recommendation: Try different parameter combinations
        params = {
            'num_gens': [150, 200, 250][attempt],
            'pop_size': [120, 150, 100][attempt],
            'mut_rate': [0.2, 0.4, 0.8][attempt],
            'cross_rate': [0.9, 1.0, 0.8][attempt]
        }

        alg = HpMocd(G, **params)
        front = alg.generate_pareto_front()

        if len(front) > best_size:
            best_front = front
            best_size = len(front)

    return best_front

# Generate the front
pareto_front = generate_robust_pareto_front(G)
print(f"Generated {len(pareto_front)} Pareto-optimal solutions")
  

(i) Coherence-Based Selection

Use case: When you have an approximation of the ground truth or domain knowledge about the expected community structure.

Implementation

  def select_by_coherence(pareto_front, G, coherence_metric='adjusted_rand'):
    true_labels = []
    for node in sorted(G.nodes()):
        true_labels.append(0 if G.nodes[node]['club'] == 'Mr. Hi' else 1)

    best_solution = None
    best_coherence = -1
    coherence_scores = []

    for assignment, objectives in pareto_front:
        pred_labels = [assignment[node] for node in sorted(G.nodes())]

        if coherence_metric == 'adjusted_rand':
            coherence = adjusted_rand_score(true_labels, pred_labels)
        coherence_scores.append(coherence)

        if coherence > best_coherence:
            best_coherence = coherence
            best_solution = (assignment, objectives)

    return best_solution, best_coherence, coherence_scores

best_solution, coherence, all_coherences = select_by_coherence(pareto_front, G)
print(f"Best coherence: {coherence:.3f}")
print(f"Selected solution has {len(set(best_solution[0].values()))} communities")
  

When to Use?

Problems that you may encounter.

Here, we have a lot of solutions with high coerence, but none 1. Why?

  • Because the best ground truth solution, it is not a non-dominated solution;
  • Your “best” solution probably had less intra or inter values, so, another one dominate it.


(ii) Modularity-Based Selection

Use case: When you want to maximize the classical modularity (or another) metric, balancing internal connections vs. external connections.

Implementation

  def select_by_modularity(pareto_front, G):
    best_solution = None
    best_modularity = -1
    modularity_scores = []

    for assignment, objectives in pareto_front:
        # Convert assignment to community structure
        communities = {}
        for node, comm_id in assignment.items():
            if comm_id not in communities:
                communities[comm_id] = []
            communities[comm_id].append(node)

        modularity = nx.community.modularity(G, communities.values())
        modularity_scores.append(modularity)
        if modularity > best_modularity:
            best_modularity = modularity
            best_solution = (assignment, objectives)

    return best_solution, best_modularity, modularity_scores

mod_solution, modularity, all_modularities = select_by_modularity(pareto_front, G)
print(f"Best modularity: {modularity:.3f}")
print(f"Selected solution has {len(set(mod_solution[0].values()))} communities")
  

When to Use


(iii) Community Count Preference

Use case: When you have prior knowledge about the expected number of communities.

Implementation

  def select_by_community_count(pareto_front, target_communities=None, strategy='closest'):
    community_counts = []
    for assignment, objectives in pareto_front:
        count = len(set(assignment.values()))
        community_counts.append(count)

    if strategy == 'max':
        # Solution with most communities (finest granularity)
        target_idx = np.argmax(community_counts)
    elif strategy == 'min':
        # Solution with fewest communities (coarsest granularity)
        target_idx = np.argmin(community_counts)
    elif strategy == 'closest' and target_communities:
        # Solution closest to target number
        distances = [abs(count - target_communities) for count in community_counts]
        target_idx = np.argmin(distances)
    elif strategy == 'range':
        # Show distribution of community counts
        from collections import Counter
        count_dist = Counter(community_counts)
        print("Community count distribution:", dict(sorted(count_dist.items())))
        return None, None
    else:
        raise ValueError("Invalid strategy or missing target_communities")

    selected_solution = pareto_front[target_idx]
    return selected_solution, community_counts[target_idx]

select_by_community_count(pareto_front, strategy='range')

targets = [2, 3, 4]
for target in targets:
    solution, actual_count = select_by_community_count(pareto_front, target, 'closest')
    if solution:
        print(f"Target: {target}, Actual: {actual_count}, Objectives: {solution[1]}")

for strategy in ['min', 'max']:
    solution, count = select_by_community_count(pareto_front, strategy=strategy)
    print(f"{strategy.capitalize()} communities: {count}, Objectives: {solution[1]}")
  

When to Use?


Recommendations


flowchart TD
    A["Do you have ground truth approximation?"]
    A -->|YES| B["Use Coherence-Based Selection"]
    A -->|NO| C["Do you know expected community count?"]

    C -->|YES| D["Use Community Count Preference"]
    C -->|NO| E["Do you prefer one objective over another?"]

    E -->|YES| F["Use Objective-Based Selection"]
    E -->|NO| G["Want mathematically principled choice?"]

    G -->|YES| H["Use Knee Point Detection"]
    G -->|NO| I["Use TOPSIS Multi-Criteria"]

    G --> J["Default: Use Modularity-Based Selection"]


ScenarioRecommended Approach
Small networks (<100 nodes)Try all strategies, visual inspection
Large networks (>1000 nodes)Focus on modularity or objectives
Multiple runsUse ensemble: select most frequent choice
Uncertain objectivesStart with knee point, then refine

Start with modularity or knee point detection, then refine based on your specific needs. Remember: the “best” solution is the one that works best for your application!


References

  • HpMocd Documentation: Basic Usage Guide
  • NetworkX Community Detection: Community Detection Algorithms
  • TOPSIS Method: Hwang, C.L. and Yoon, K. (1981). Multiple Attribute Decision Making
  • Pareto Optimality: Miettinen, K. (2012). Nonlinear Multiobjective Optimization

Last updated 22 Aug 2025, 20:08 -0300 . history