
Leiden algorithm

From Wikipedia, the free encyclopedia

The Leiden algorithm is a community detection algorithm developed by Traag et al.[1] at Leiden University. It was developed as a modification of the Louvain method. Like the Louvain method, the Leiden algorithm attempts to optimize modularity in extracting communities from networks; however, it addresses key issues present in the Louvain method, namely poorly connected communities and the resolution limit of modularity.

Broadly, the Leiden algorithm uses the same two primary phases as the Louvain algorithm: a local node moving step (although Leiden selects which nodes to consider more efficiently[1]) and a graph aggregation step. However, to address poorly connected communities and the merging of smaller communities into larger ones (the resolution limit of modularity), the Leiden algorithm adds an intermediate refinement phase in which communities may be split to guarantee that all communities are well connected.

Graph components


Before defining the Leiden algorithm, it will be helpful to define some of the components of a graph.

Vertices and edges


A graph is composed of vertices (nodes) and edges. Each edge is connected to two vertices, and each vertex may be connected to zero or more edges. Edges are typically represented by straight lines, while nodes are represented by circles or points. In set notation, let $V$ be the set of vertices and $E$ be the set of edges:

$$V := \{v_1, v_2, \dots, v_n\}$$
$$E := \{e_{ij}, e_{ik}, \dots, e_{kl}\}$$

where $e_{ij}$ is the directed edge from vertex $v_i$ to vertex $v_j$. We can also write this as an ordered pair:

$$e_{ij} := (v_i, v_j)$$


Community


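A community is a unique set of nodes:

$$C_i \subseteq V, \quad C_i \cap C_j = \emptyset \quad \forall\, i \neq j$$

and the union of all communities must be the total set of vertices:

$$V = \bigcup_{i} C_i$$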

Partition


A partition is the set of all communities:

$$\mathcal{P} = \{C_1, C_2, \dots, C_n\}$$
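The requirements on communities and partitions (communities are pairwise disjoint and together cover every vertex) can be checked mechanically. The following is a minimal sketch; the function name `is_partition`, the representation of communities as Python sets, and the extra requirement that communities be non-empty are assumptions made for illustration.

```python
def is_partition(vertices, communities):
    """Return True if `communities` is a valid partition of `vertices`."""
    seen = set()
    for community in communities:
        if not community:
            return False  # assumption: empty communities are not allowed
        if seen & community:
            return False  # some vertex appears in two communities
        seen |= community
    # every vertex must be covered, and nothing outside V may appear
    return seen == set(vertices)


vertices = {1, 2, 3, 4, 5}
print(is_partition(vertices, [{1, 2}, {3}, {4, 5}]))     # disjoint and covering
print(is_partition(vertices, [{1, 2}, {2, 3}, {4, 5}]))  # vertex 2 appears twice
print(is_partition(vertices, [{1, 2}, {4, 5}]))          # vertex 3 is missing
```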


Partition Quality


How communities are partitioned is an integral part of the Leiden algorithm. How partitions are chosen depends on how their quality is measured. Additionally, many of these quality metrics contain parameters of their own that can change the resulting communities.

Modularity


Modularity is a widely used quality metric for assessing how well a set of communities partitions a graph. For an adjacency matrix $A$, it is defined as:[2]

$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$$

where:

  • $A_{ij}$ represents the edge weight between nodes $i$ and $j$; see Adjacency matrix;
  • $k_i$ and $k_j$ are the sum of the weights of the edges attached to nodes $i$ and $j$, respectively;
  • $m$ is the sum of all of the edge weights in the graph;
  • $c_i$ and $c_j$ are the communities to which the nodes $i$ and $j$ belong; and
  • $\delta$ is the Kronecker delta function:

$$\delta(c_i, c_j) = \begin{cases} 1 & \text{if } c_i = c_j \\ 0 & \text{otherwise} \end{cases}$$
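As an illustration, modularity can be computed directly from an adjacency matrix. This is a minimal sketch assuming the standard definition Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j) and an undirected graph with a symmetric matrix; the names `modularity`, `adjacency_matrix` and `membership` are hypothetical, and the double loop is written for clarity rather than speed.

```python
def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1
    return A


def modularity(A, membership):
    """Modularity of the partition given by membership[i] = community of node i."""
    n = len(A)
    k = [sum(row) for row in A]  # weighted degree of every node
    two_m = sum(k)               # equals 2m for a symmetric matrix
    total = 0.0
    for i in range(n):
        for j in range(n):
            if membership[i] == membership[j]:  # Kronecker delta
                total += A[i][j] - k[i] * k[j] / two_m
    return total / two_m


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = adjacency_matrix(6, edges)
print(modularity(A, [0, 0, 0, 1, 1, 1]))  # one community per triangle: 5/14
print(modularity(A, [0, 0, 0, 0, 0, 0]))  # everything in one community: 0
```

Placing each triangle in its own community scores higher than a single all-encompassing community, which matches the intuition that the triangles are the natural groups.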

Reichardt Bornholdt Potts Model (RB)


One of the most widely used metrics for the Leiden algorithm is the Reichardt Bornholdt Potts Model (RB).[3] This model is used by default in most mainstream Leiden algorithm libraries under the name RBConfigurationVertexPartition.[4][5] It introduces a resolution parameter and closely resembles the equation for modularity. For an adjacency matrix $A$, its quality function is defined as:[4]

$$Q = \sum_{ij} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$$

where:

  • $\gamma$ represents a linear resolution parameter
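The relationship to modularity can be made concrete with a small sketch. It assumes the quality function Q = Σ_ij (A_ij − γ k_i k_j / 2m) δ(c_i, c_j) on a symmetric adjacency matrix; the name `rb_quality` is hypothetical and this is not the implementation used by any particular library. With γ = 1 the result is simply 2m times the modularity.

```python
def rb_quality(A, membership, gamma=1.0):
    """RB Potts quality with a configuration null model (illustrative sketch)."""
    n = len(A)
    k = [sum(row) for row in A]  # weighted degrees
    two_m = sum(k)               # 2m for a symmetric matrix
    return sum(
        A[i][j] - gamma * k[i] * k[j] / two_m
        for i in range(n)
        for j in range(n)
        if membership[i] == membership[j]
    )


# Two triangles joined by one edge, as a symmetric adjacency matrix.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
two_triangles = [0, 0, 0, 1, 1, 1]
for gamma in (0.5, 1.0, 2.0):
    print(gamma, rb_quality(A, two_triangles, gamma))
```

Raising γ increases the penalty term, so larger values of γ favor smaller communities.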

Constant Potts Model (CPM)


Another metric, similar to RB, is the Constant Potts Model (CPM). This metric also relies on a resolution parameter $\gamma$.[6] The quality function is defined as:

$$Q = \sum_{ij} \left( A_{ij} - \gamma \right) \delta(c_i, c_j)$$
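A short sketch shows how the CPM resolution parameter acts as a density threshold. It assumes the quality function Q = Σ_ij (A_ij − γ) δ(c_i, c_j) evaluated literally over all ordered pairs, including i = j; libraries may count node pairs differently, and the name `cpm_quality` is hypothetical.

```python
def cpm_quality(A, membership, gamma):
    """Constant Potts Model quality, summed over all ordered pairs (i, j)."""
    n = len(A)
    return sum(
        A[i][j] - gamma
        for i in range(n)
        for j in range(n)
        if membership[i] == membership[j]
    )


# Two triangles joined by one edge, as a symmetric adjacency matrix.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
split = [0, 0, 0, 1, 1, 1]   # one community per triangle
merged = [0, 0, 0, 0, 0, 0]  # a single community

for gamma in (0.05, 0.5):
    print(gamma, cpm_quality(A, split, gamma), cpm_quality(A, merged, gamma))
```

With the small value γ = 0.05 the merged partition scores higher, while with γ = 0.5 the split into two triangles scores higher.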

Understanding Potts Model resolution parameters/Resolution limit

The image depicts two separate community interpretations of the graph shown. The first graph shows two communities (one blue, one red) that attempt to maximize modularity. The second interpretation shows three communities (one blue, one red, one green) that more accurately represent the substructures within the graph.

Potts models such as RB or CPM typically include a resolution parameter in their calculation.[3][6] Potts models were introduced as a response to the resolution limit problem present in community detection based on modularity maximization. The resolution limit problem is that, for some graphs, maximizing modularity may cause substructures of a graph to merge into a single community, so that smaller structures are lost.[7] Resolution parameters allow modularity-adjacent methods to be tuned by the user of the Leiden algorithm so that small substructures are captured at a chosen granularity.

The figure on the right illustrates why resolution can be a helpful parameter when using modularity-based quality metrics. In the first graph, modularity captures only the large-scale structures of the graph; in the second, a more granular quality metric could potentially detect all of its substructures.
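The resolution limit can be reproduced numerically on a ring of cliques, a standard test case in which small, dense cliques are joined into a ring by single edges. The sketch below is illustrative only: the helper names are hypothetical, modularity is computed per community as Σ_c [e_c/m − (d_c/2m)²], and CPM is evaluated as Σ_c (2e_c − γ n_c²), where e_c, d_c and n_c are the internal edge count, total degree and size of community c.

```python
from collections import Counter
from itertools import combinations


def ring_of_cliques(num_cliques, clique_size):
    """Edge list of cliques joined in a ring by one edge between neighbors."""
    edges = []
    for c in range(num_cliques):
        first = c * clique_size
        edges.extend(combinations(range(first, first + clique_size), 2))
        # link the last node of this clique to the first node of the next one
        next_first = ((c + 1) % num_cliques) * clique_size
        edges.append((first + clique_size - 1, next_first))
    return edges


def modularity(edges, membership):
    m = len(edges)
    internal, degree_sum = Counter(), Counter()
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[c] / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


def cpm(edges, membership, gamma):
    internal = Counter(membership[u] for u, v in edges if membership[u] == membership[v])
    sizes = Counter(membership)
    return sum(2 * internal[c] - gamma * n * n for c, n in sizes.items())


edges = ring_of_cliques(30, 5)
single = [v // 5 for v in range(150)]   # one community per clique
pairs = [v // 10 for v in range(150)]   # neighboring cliques merged in pairs

print(modularity(edges, single), modularity(edges, pairs))  # about 0.876 vs 0.888
print(cpm(edges, single, 0.5), cpm(edges, pairs, 0.5))
```

Modularity scores the merged pairs higher even though each clique is an obvious community, whereas CPM with γ = 0.5 prefers one community per clique.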

Algorithm

A graph depicting the steps of the Leiden algorithm.


The Leiden algorithm starts with a graph of disorganized nodes (a) and sorts it by partitioning them to maximize modularity (the difference in quality between the generated partition and a hypothetical randomized partition of communities). The method it uses is similar to the Louvain algorithm, except that after moving each node it also considers that node's neighbors that are not already in the community it was placed in. This process results in our first partition (b), also referred to as $\mathcal{P}$. The algorithm then refines this partition by first placing each node into its own individual community and then moving nodes from one community to another to maximize modularity. It does this iteratively until each node has been visited and each community has been refined; this creates partition (c), which is the initial partition of $\mathcal{P}_{\text{refined}}$. Then an aggregate network (d) is created by turning each community into a node. $\mathcal{P}_{\text{refined}}$ is used as the basis for the aggregate network, while $\mathcal{P}$ is used to create its initial partition. Because we use the original partition in this step, we must retain it so that it can be used in future iterations. These steps together form the first iteration of the algorithm.

In subsequent iterations, the nodes of the aggregate network (each of which represents a community) are once again placed into their own individual communities and then sorted according to modularity to form a new partition, shown as (e) in the above graphic. In the case depicted by the graph, the nodes were already sorted optimally, so no change took place, resulting in partition (f). The nodes of partition (f) would then once again be aggregated using the same method as before, with the original partition still being retained. This portion of the algorithm repeats until each aggregate node is in its own individual community, which means that no further improvements can be made.

The Leiden algorithm consists of three main steps: local moving of nodes, refinement of the partition, and aggregation of the network based on the refined partition. All of the functions in the following steps are called using our main function Leiden, depicted below:

function Leiden(Graph G, Partition P)
  do
    # move nodes between communities to improve the quality function
    P = MoveNodes(G, P)
    # stop when each community contains only a single node
    done = (|P| == |V(G)|)
    if not done then
      # refine the partition P
      P_refined = RefinePartition(G, P)
      # create an aggregate graph where each community of P_refined is a node
      G = AggregateGraph(G, P_refined)
      # but retain the original (non-refined) partition P
      P = {{v | v ⊆ C, v ∈ V(G)} | C ∈ P}
    end if
  while not done
  return flat*(P)
end function
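The line `P = {{v | v ⊆ C, v ∈ V(G)} | C ∈ P}` is the least obvious step of the main loop: after aggregation, every node of the new graph is a refined community, and the non-refined partition P is re-expressed as groups of those aggregate nodes. A minimal sketch of that step follows, assuming aggregate nodes and communities are both represented as frozensets of original nodes; the name `lift_partition` is hypothetical.

```python
def lift_partition(aggregate_nodes, partition):
    """Group aggregate nodes by the community of `partition` that contains them."""
    lifted = []
    for community in partition:
        # an aggregate node belongs here if all its original nodes are in `community`
        group = {node for node in aggregate_nodes if node <= community}
        if group:
            lifted.append(group)
    return lifted


# Non-refined partition P and its refinement, which splits the first community.
P = [frozenset({1, 2, 3, 4}), frozenset({5, 6})]
P_refined = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]

# The nodes of the aggregate graph are the refined communities.
lifted = lift_partition(P_refined, P)
for group in lifted:
    print(sorted(sorted(node) for node in group))
```

The first lifted community contains two aggregate nodes, so the next round of local moving starts from the structure found in P rather than from singletons.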


Step 1 of the Leiden algorithm (local moving of nodes).

Step 1: Local Moving of Nodes

First, we move the nodes from $\mathcal{P}$ into neighboring communities to maximize modularity (the difference in quality between the generated partition and a hypothetical randomized partition of communities).

function MoveNodes(Graph G, Partition P)
  Q = Queue(V(G))
  do
    # determine next node to visit
    v = Q.remove()
    # use the quality function to determine the best community for node v
    # note that C ∈ P ∪ ∅
    C′ = argmax(∆Hp(v → C))
    # only perform movements that strictly improve the quality function
    if ∆Hp(v → C′) > 0 then
      v → C′
      # identify neighboring nodes that aren't in community C′
      N = {u | (u,v) ∈ E(G), u ∉ C′}
      # add these neighbors to the queue to make sure they get visited
      Q.add(N − Q)
    end if
  # end the loop if there are no nodes left to visit
  while Q != ∅
  return P
end function
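
The queue-based local moving step can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the reference implementation: the graph is undirected and unweighted with at least one edge and no self-loops, the quality function is plain modularity, community labels are integers, and nodes are visited in sorted rather than random order so that the result is reproducible. The gain of inserting node v into community C (after taking v out of its current community) is k_vC/m − k_v·d_C/(2m²), where k_vC is the number of edges from v into C and d_C is the total degree of C.

```python
from collections import deque


def move_nodes(adj, community):
    """adj: node -> set of neighbors; community: node -> int label (updated in place)."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    degree = {v: len(adj[v]) for v in adj}
    total = {}  # community label -> sum of member degrees
    for v, c in community.items():
        total[c] = total.get(c, 0) + degree[v]
    fresh = max(community.values()) + 1  # label standing for an empty community
    queue = deque(sorted(adj))
    queued = set(adj)
    while queue:
        v = queue.popleft()
        queued.discard(v)
        old = community[v]
        total[old] -= degree[v]  # temporarily take v out of its community
        links = {}               # community label -> number of edges from v
        for u in adj[v]:
            links[community[u]] = links.get(community[u], 0) + 1

        def gain(c):
            return links.get(c, 0) / m - degree[v] * total.get(c, 0) / (2 * m * m)

        best = max(sorted(set(links) | {old, fresh}), key=gain)
        if gain(best) - gain(old) > 1e-12:  # only strictly improving moves
            community[v] = best
            if best == fresh:
                fresh += 1
            # revisit neighbors that are not in v's new community
            for u in adj[v]:
                if community[u] != best and u not in queued:
                    queue.append(u)
                    queued.add(u)
        else:
            best = old
        total[best] = total.get(best, 0) + degree[v]
    return community


# Two triangles joined by the edge 2-3, starting from singleton communities.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
result = move_nodes(adj, {v: v for v in adj})
print(result)
```

On this small graph the procedure groups each triangle into its own community; only the neighbors of moved nodes are ever re-queued, which is the efficiency gain over revisiting every node.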
Step 2 of the Leiden algorithm (refinement of the partition).

Step 2: Refinement of the Partition

Next, each node in the network is assigned to its own individual community and then moved from one community to another to maximize modularity. This occurs iteratively until each node has been visited and each community has been refined. This forms our initial partition for $\mathcal{P}_{\text{refined}}$.

function RefinePartition(Graph G, Partition P)
  P_refined = SingletonPartition(G)
  for C ∈ P do
    P_refined = MergeSubset(G, P_refined, C)
  end for
  return P_refined
end function

function SingletonPartition(G)
  # assign each node to its own community
  return {{v} | v ∈ V(G)}
end function

function MergeSubset(Graph G, Partition P, Subset S)
  # consider only nodes that are well-connected within subset S
  R = {v | v ∈ S, E(v, S − v) ≥ γ∥v∥·(∥S∥ − ∥v∥)}
  # visit nodes in random order
  for v ∈ R do
    # only consider nodes that haven't been merged yet
    if v in singleton community then
      # only consider well-connected communities
      T = {C | C ∈ P, C ⊆ S, E(C, S − C) ≥ γ∥C∥·(∥S∥ − ∥C∥)}
      # choose random community C'
      Pr(C' = C) ~ {exp( (1/θ)∆Hp(v → C)) if ∆Hp(v → C) ≥ 0, otherwise 0} for C ∈ T
      # move v to community C'
      v → C'
    end if
  end for
  return P
end function
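
Two ingredients of `MergeSubset` can be sketched in isolation: the well-connectedness test and the randomized choice of a target community. The sketch below is illustrative; the function names, the default θ = 0.01 and the representation of candidate moves as a dictionary of precomputed gains are all assumptions. Subtracting the largest gain before exponentiating leaves the probabilities unchanged and only guards against floating-point overflow.

```python
import math
import random


def is_well_connected(edges_to_rest, size, subset_size, gamma):
    """Test E(C, S − C) ≥ γ·‖C‖·(‖S‖ − ‖C‖) for a node or community C inside S."""
    return edges_to_rest >= gamma * size * (subset_size - size)


def choose_community(gains, theta=0.01, rng=random):
    """Pick a community with probability proportional to exp(gain / theta).

    gains: dict mapping candidate community -> quality gain of moving v there.
    Candidates with a negative gain are never chosen; returns None if none qualify.
    """
    candidates = [c for c, g in gains.items() if g >= 0]
    if not candidates:
        return None
    top = max(gains[c] for c in candidates)
    weights = [math.exp((gains[c] - top) / theta) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)
print(is_well_connected(edges_to_rest=2, size=1, subset_size=4, gamma=0.5))
print(choose_community({"A": -0.2, "B": 0.3}, rng=rng))  # "A" can never be chosen
print(choose_community({"A": 0.10, "B": 0.11}, theta=1.0, rng=rng))  # either may be chosen
```

A small θ makes the choice almost greedy, while a larger θ spreads probability more evenly across all non-negative candidates.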
Step 3 of the Leiden algorithm (aggregation of the network).

Step 3: Aggregation of the Network

We then convert each community in $\mathcal{P}_{\text{refined}}$ into a single node.

function AggregateGraph(Graph G, Partition P)
  # communities become nodes in aggregate graph
  V = P
  # E is a multiset
  E = {(C, D) | (u, v) ∈ E(G), u ∈ C ∈ P, v ∈ D ∈ P}
  return Graph(V, E)
end function
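
The aggregation step can be sketched with a counter that plays the role of the edge multiset. This is an illustrative sketch assuming an undirected edge list and a partition given as a list of node sets; the name `aggregate_graph` is hypothetical, aggregate nodes are identified by their index in the partition, and edge multiplicities are stored as counts.

```python
from collections import Counter


def aggregate_graph(edges, partition):
    """Collapse each community into one node; return edge multiplicities."""
    community_of = {v: idx for idx, community in enumerate(partition) for v in community}
    aggregated = Counter()
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        # undirected graph: store each pair with the smaller index first
        aggregated[(min(cu, cv), max(cu, cv))] += 1
    return aggregated


# Two triangles joined by the edge 2-3, one community per triangle.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(aggregate_graph(edges, [{0, 1, 2}, {3, 4, 5}]))
```

Edges inside a community become self-loops on the aggregate node, so the internal edge counts needed by the quality function are preserved.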

We repeat these steps until each community contains only one node, with each of these nodes representing an aggregate of nodes from the original network that are well connected with each other.

References



  1. Traag, Vincent A; Waltman, Ludo; van Eck, Nees Jan (26 March 2019). "From Louvain to Leiden: guaranteeing well-connected communities". Scientific Reports. 9 (1): 5233. arXiv:1810.08473. Bibcode:2019NatSR...9.5233T. doi:10.1038/s41598-019-41695-z. PMC 6435756. PMID 30914743.
  2. Clauset, Aaron; Newman, M. E. J.; Moore, Cristopher (2004). "Finding community structure in very large networks". Physical Review E. 70 (6): 066111. arXiv:cond-mat/0408187. Bibcode:2004PhRvE..70f6111C. doi:10.1103/PhysRevE.70.066111. PMID 15697438. S2CID 8977721.
  3. Reichardt, Jörg; Bornholdt, Stefan (2004-11-15). "Detecting Fuzzy Community Structures in Complex Networks with a Potts Model". Physical Review Letters. 93 (21). doi:10.1103/PhysRevLett.93.218701. ISSN 0031-9007.
  4. "Reference — leidenalg 0.10.3.dev0+gcb0bc63.d20240122 documentation". leidenalg.readthedocs.io. Retrieved 2024-11-23.
  5. https://cran.r-project.org/web/packages/leiden/leiden.pdf
  6. Traag, Vincent A; Van Dooren, Paul; Nesterov, Yurii (29 July 2011). "Narrow scope for resolution-limit-free community detection". Physical Review E. 84 (1): 016114. arXiv:1104.3083. Bibcode:2011PhRvE..84a6114T. doi:10.1103/PhysRevE.84.016114. PMID 21867264.
  7. Fortunato, Santo; Barthélemy, Marc (2007-01-02). "Resolution limit in community detection". Proceedings of the National Academy of Sciences. 104 (1): 36–41. doi:10.1073/pnas.0605965104. ISSN 0027-8424. PMC 1765466. PMID 17190818.