2.12. CALL Clause¶
The CALL clause executes procedures from a catalog of available procedures. Procedures optionally take arguments as inputs and either produce resulting arguments or modify the system through side effects. Procedures that have output arguments produce rows of results where the output arguments define the columns of the rows. The YIELD sub-clause defines which output columns of the procedure are included in the result and their order. The syntax for the CALL clause is
CALL procedure_name([input_param][, input_param...])
YIELD output_param [, output_param...]
The YIELD sub-clause is usually required. The CALL clause must also usually be followed by a RETURN, INTO, or WITH statement. Here are a couple of examples of valid CALL usage.
CALL procedure_name(input_param1, input_param2)
YIELD output_param1
RETURN output_param1
CALL procedure_name(input_param1, input_param2)
YIELD output_param1, output_param2
INTO Results
CALL procedure_name(input_param1)
YIELD output_param1, output_param2
WITH output_param1, output_param2
WHERE output_param1 > 7
RETURN output_param1, output_param2
The one exception to requiring YIELD and RETURN or WITH clauses is a stand-alone CALL clause. In this case, the RETURN clause or both the YIELD and RETURN clauses may be omitted, and all the outputs of the procedure are returned in the default order. Examples are
CALL procedure_name(input_param1, input_param2)
CALL procedure_name(input_param1, input_param2)
YIELD output_param1
A CALL can also be followed by a WHERE clause to filter the results.
CALL procedure_name(input_param1)
YIELD output_param1, output_param2
WHERE output_param1 > 7
RETURN output_param1, output_param2
A CALL clause can follow a WITH statement. This is useful because it allows preceding statements to generate inputs for a CALL statement.
MATCH (v:VertexFrame)
WHERE v.prop1 = 50
WITH v.prop2 AS input_param1
CALL procedure_name(input_param1)
YIELD output_param1, output_param2
RETURN output_param1, output_param2
Warning
When a CALL statement follows a WITH clause, the procedure is executed once for each row generated by the WITH. For whole graph algorithms this means the algorithm runs multiple times, which can be very expensive. Take care to ensure the preceding statements produce only a single row if a single algorithm execution is intended.
2.12.1. Whole Graph Algorithms¶
xGT supports the following whole graph algorithms:
One common concept used in the graph algorithms is the graph on which an algorithm operates. For a whole graph algorithm, a graph is a collection of vertex frames and edge frames that are connected. An edge frame is not included in a graph if one of its source or target vertex frames is not included. A vertex frame without any attached edge frames contains only isolated vertices and is not included. The frames in the graph will often be all connected, but it is possible to have multiple separate components.
Graphs can be given to whole graph algorithms in several ways:
List of edge frames
Map with edge frames as keys and OpenCypher filtering fragments as values
A user might want to execute a graph algorithm on a subset of the edges in a set of frames. xGT supports this by allowing a user to apply OpenCypher filtering fragments to the edge frames defining a graph. Only edges that pass the WHERE conditions from the filtering fragment will be considered by the algorithm. Filtering is supported by giving the graph as a map from edge frames to OpenCypher filtering fragments. Using OpenCypher filtering fragments to run an algorithm on a subset of edges from a frame is described in detail in section Executing Algorithms on a Filtered Graph.
All graph algorithms have one required argument and two optional arguments.
The first argument describes the graph on which the algorithm operates.
The second is an optional map giving any additional options to the algorithm; if omitted, all options use their default values.
The third optional argument determines how edge filtering operates for an algorithm.
Here is an example call for page_rank():
CALL page_rank("graph_namespace",
{ tolerance : .0001 })
YIELD vertex, rank
RETURN vertex, rank
INTO Results
The first argument can be a string literal representing a namespace or graph frame, a list of string literals representing a list of edge frames, or a map between edge frame names and OpenCypher filtering fragments.
When a namespace name is given, all the edge frames the user has access to in the namespace along with their source and target vertex frames the user has access to in the namespace are used to define the graph. If one of the source or target vertex frames of an edge frame is not in the given namespace or the user doesn’t have access to the source or target vertex frame, that edge frame is not included. The example above shows calling a whole graph algorithm giving a namespace to define the graph.
When a graph frame name is given, all the edge frames the user has access to in the graph frame along with their source and target vertex frames are used to define the graph. If the user doesn’t have access to either the source or target vertex frame of an edge frame, that edge frame is not included.
When a list of edge frames or a map of edge frames to filtering fragments is given, those frames along with their source and target vertex frames are used to define a graph. If a user gives an edge frame they don’t have access to or if the user doesn’t have access to one of the source or target vertex frames, a security error is thrown.
This example shows calling a whole graph algorithm giving a list of edge frames to define the graph.
Passing an empty string, list, or map causes all edge frames the user has access to in all namespaces in the system along with their source and target vertex frames to be used to define the graph. As before, if the user doesn’t have access to the source or target vertex frame of one of the edge frames, that edge frame is not included.
CALL breadth_first_search(["Graph__Edge1", "Graph__Edge2"],
{ sources : id(Graph__Vertex1, "key1"),
max_depth : 2 })
YIELD vertex, parent, edge, depth
RETURN vertex, parent, edge, depth
INTO Results
The following shows the equivalent example using a graph frame:
graph = server.create_graph('BFSGraph', { 'Graph__Edge1', 'Graph__Edge2' })
CALL breadth_first_search("BFSGraph",
{ sources : id(Graph__Vertex1, "key1"),
max_depth : 2 })
YIELD vertex, parent, edge, depth
RETURN vertex, parent, edge, depth
INTO Results
2.12.1.1. Vertex and Edge Representation¶
Whole graph algorithms use vertices and edges as both options and output columns. As inputs, vertices and edges can be bound variables or row IDs. The above example shows giving the source of a BFS using the id() function to construct a vertex row ID from a frame and vertex key. See Type Construction Functions for a detailed description of using the id() function.
The following example shows generating a bound variable in a preceding MATCH that is used as the source of a BFS. This example uses the same source vertex as the previous one.
MATCH (v:Graph__Vertex1)
WHERE v.key = "key1"
WITH v AS source
CALL breadth_first_search(["Graph__Edge1", "Graph__Edge2"],
{ sources : [source],
max_depth : 2 })
YIELD vertex, parent, edge, depth
RETURN vertex, parent, edge, depth
INTO Results
Note that this example uses a list for the sources. Because it has a single source, it could be written without the list notation.
MATCH (v:Graph__Vertex1)
WHERE v.key = "key1"
WITH v AS source
CALL breadth_first_search(["Graph__Edge1", "Graph__Edge2"],
{ sources : source,
max_depth : 2 })
YIELD vertex, parent, edge, depth
RETURN vertex, parent, edge, depth
INTO Results
Since the BFS is run for each row generated by the MATCH, the user must take care to give WHERE conditions that generate a single row if they only want to run a single BFS.
Another example uses a preceding MATCH with a collect() to generate a list of vertices used as the sources for a BFS.
MATCH (v:Graph__Vertex1)
WHERE v.prop > 100
WITH collect(v) AS sources
CALL breadth_first_search(["Graph__Edge1", "Graph__Edge2"],
{ sources : sources,
max_depth : 2 })
YIELD vertex, parent, edge, depth
RETURN vertex, parent, edge, depth
INTO Results
Vertices and edges are also output columns of whole graph algorithms. These outputs are bound variables just like the variables generated by a MATCH clause and can be used in the same ways. For example, this query returns properties from the vertex and edge outputs of a BFS:
CALL breadth_first_search(["Graph__Edge1"],
{ sources : [id(Graph__Vertex1, "key1")] })
YIELD vertex AS v1, parent AS v2, edge
RETURN v1.key, v1.prop, v2.key, v2.prop, edge.prop1, edge.prop2
INTO Results
The vertex and edge output columns can come from any vertex or edge frame in the search graph. When there are multiple frames, some rules apply for property access. See Property Extraction from Multiple Candidate Frames for details.
2.12.1.2. All Simple Paths¶
The algorithm all_simple_paths() finds all the simple paths between a set of source vertices and a set of target vertices.
A simple path does not visit a vertex more than once.
Note that finding all the simple paths can be computationally expensive!
We suggest always providing a max depth to limit the cost of the search.
The algorithm supports the following entries in the options map:
Name |
Type |
Default |
Description |
|---|---|---|---|
sources |
List of row IDs or bound variables or single row ID or variable |
N/A |
A list of source vertices or a single source vertex. |
targets |
List of row IDs or bound variables or single row ID or variable |
N/A |
A list of target vertices or a single target vertex. |
max_depth |
Integer |
No limit |
The depth at which to stop the search. |
direction |
String |
forward |
The direction to perform the search. Valid values are “forward”, “reverse”, or “undirected”. |
The “sources” and “targets” options are required. The other options are optional. The “sources” option indicates the vertices from which the search starts. It is either a list of bound variables or row IDs or a single variable or row ID. The “targets” option indicates the vertices at which the search terminates. It is either a list of bound variables or row IDs or a single variable or row ID.
The “max_depth” option indicates the depth at which to stop the search. A depth of 1 indicates one level out from the sources. The default behavior is to search until a target vertex is discovered. When a max depth is provided, the search terminates if either the max depth is reached or a target vertex is discovered.
The “direction” option indicates the direction the search traverses incident edges. The default is the “forward” direction which traverses outgoing edges of vertices. The “reverse” direction traverses incoming edges of vertices. The “undirected” direction traverses both outgoing and incoming edges of vertices essentially treating the graph as if it were undirected. For a forward search, the parent is always the source of the edge. For a reverse search, the parent is always the target of the edge. For an undirected search, the parent could be either the source or target of the edge.
All simple paths has the following output columns:
Name |
Type |
Description |
|---|---|---|
path |
path variable |
An alternating list of vertices and edges describing a simple path from a source to a target. |
Each row has a single column that is a simple path from a source vertex to a target vertex. The path is returned as a list of row IDs representing the alternating vertices and edges on the path from a source to a target.
2.12.1.3. Breadth First Search¶
The algorithm breadth_first_search() performs a breadth first search on a graph.
This algorithm starts from one or more source vertices and visits all the vertices reachable from the source vertices by traversing incident edges, expanding out a level of neighbors at a time.
The algorithm supports the following entries in the options map:
Name |
Type |
Default |
Description |
|---|---|---|---|
sources |
List of row IDs or bound variables or single row ID or variable |
N/A |
A list of source vertices or a single source vertex. |
max_depth |
Integer |
No limit |
The depth at which to stop the search. |
direction |
String |
forward |
The direction to perform the search. Valid values are “forward”, “reverse”, or “undirected”. |
all_edges |
Boolean |
false |
False returns only edges in the BFS tree. True returns all edges traversed by the search. |
The “sources” option is required. All other options are optional. The “sources” option indicates the vertices from which the search starts. It is either a list of bound variables or row IDs or a single variable or row ID.
The “max_depth” option indicates the depth at which to stop the search. A depth of 1 indicates one level out from the sources. The default behavior is to continue searching until no new vertices are discovered.
The “direction” option indicates the direction the search traverses incident edges. The default is the “forward” direction which traverses outgoing edges of vertices. The “reverse” direction traverses incoming edges of vertices. The “undirected” direction traverses both outgoing and incoming edges of vertices essentially treating the graph as if it were undirected. For a forward search, the parent is always the source of the edge. For a reverse search, the parent is always the target of the edge. For an undirected search, the parent could be either the source or target of the edge.
The “all_edges” option determines which edges traversed by the search are included in the results. The default value of “false” returns only the edges in the BFS tree. These are the edges where the child vertex is newly discovered by the search. This behavior returns a row for each unique vertex discovered by the search. A value of “true” returns all the edges traversed by the search. This behavior returns a row for each unique edge visited by the search. Child vertices are not unique with this behavior.
For all combinations of options, the search traverses edges only once. Thus, edge uniqueness is guaranteed in the results. For an all_edges search, the same pair of vertex and parent could appear multiple times in a search as multiple edges can exist between any two vertices. However, the edge output column will always be unique.
Breadth first search has the following output columns:
Name |
Type |
Description |
|---|---|---|
vertex |
Bound vertex variable |
A vertex found by the search. |
parent |
Bound vertex variable |
The parent of a vertex found by the search. |
edge |
Bound edge variable |
The edge connecting the parent to the child vertex. |
depth |
Integer |
Depth at which the vertex was found. |
direction |
String |
Direction the search traversed the edge. Either “forward” or “reverse”. |
The breadth first search algorithm outputs the vertices it finds along with the parent vertex, the edge connecting the parent to the child vertex, the depth the vertex was found at, and the direction the edge between the child and parent vertices was traversed.
When the direction output is “forward” that means the search traversed an edge with the parent as the source and the child as the target. When the direction output is “reverse” that means the search traversed an edge with the child as the source and the parent as the target. For a search in the “forward” direction, the direction output is always “forward”. For a search in the “reverse” direction, the direction output is always “reverse”. The direction output is more useful when the search is in the “undirected” direction as the search can traverse edges in both directions.
2.12.1.4. Depth First Search¶
The algorithm depth_first_search() performs a depth first search on a graph.
This algorithm starts from one or more source vertices and visits all the vertices reachable from the source vertices by traversing incident edges.
The search progresses as deep as possible before backtracking.
The algorithm supports the following entries in the options map:
Name |
Type |
Default |
Description |
|---|---|---|---|
sources |
List of row IDs or bound variables or single row ID or variable |
N/A |
A list of source vertices or a single source vertex. |
max_depth |
Integer |
No limit |
The depth at which to stop the search. |
direction |
String |
forward |
The direction to perform the search. Valid values are “forward”, “reverse”, or “undirected”. |
all_edges |
Boolean |
false |
False returns only edges in the DFS tree. True returns all edges traversed by the search. |
The “sources” option is required. All other options are optional. The “sources” option indicates the vertices from which the search starts. It is either a list of bound variables or row IDs or a single variable or row ID.
The “max_depth” option indicates the depth at which to stop the search. A depth of 1 indicates one level out from the sources. The default behavior is to continue searching until no new vertices are discovered.
The “direction” option indicates the direction the search traverses incident edges. The default is the “forward” direction which traverses outgoing edges of vertices. The “reverse” direction traverses incoming edges of vertices. The “undirected” direction traverses both outgoing and incoming edges of vertices essentially treating the graph as if it were undirected. For a forward search, the parent is always the source of the edge. For a reverse search, the parent is always the target of the edge. For an undirected search, the parent could be either the source or target of the edge.
The “all_edges” option determines which edges traversed by the search are included in the results. The default value of “false” returns only the edges in the DFS tree. These are the edges where the child vertex is newly discovered by the search. This behavior returns a row for each unique vertex discovered by the search. A value of “true” returns all the edges traversed by the search. This behavior returns a row for each unique edge visited by the search. Child vertices are not unique with this behavior.
For all combinations of options, the search traverses edges only once. Thus, edge uniqueness is guaranteed in the results. For an all_edges search, the same pair of vertex and parent could appear multiple times in a search as multiple edges can exist between any two vertices. However, the edge output column will always be unique.
Depth first search has the following output columns:
Name |
Type |
Description |
|---|---|---|
vertex |
Bound vertex variable |
A vertex found by the search. |
parent |
Bound vertex variable |
The parent of a vertex found by the search. |
edge |
Bound edge variable |
The edge connecting the parent to the child vertex. |
depth |
Integer |
Depth at which the vertex was found. |
direction |
String |
Direction the search traversed the edge. Either “forward” or “reverse”. |
The depth first search algorithm outputs the vertices it finds along with the parent vertex, the edge connecting the parent to the child vertex, the depth the vertex was found at, and the direction the edge between the child and parent vertices was traversed.
When the direction output is “forward” that means the search traversed an edge with the parent as the source and the child as the target. When the direction output is “reverse” that means the search traversed an edge with the child as the source and the parent as the target. For a search in the “forward” direction, the direction output is always “forward”. For a search in the “reverse” direction, the direction output is always “reverse”. The direction output is more useful when the search is in the “undirected” direction as the search can traverse edges in both directions.
2.12.1.5. Louvain Clustering¶
The algorithm louvain() performs community detection on a graph using the Louvain method, a multi-level modularity optimization algorithm.
It alternates between a local moving phase, where vertices move to the neighbor community with the best modularity gain, and an aggregation phase, where communities collapse into super-vertices forming a coarsened graph.
The process repeats until modularity converges.
The algorithm supports both directed and undirected modularity formulations. By default, the directed formulation is used, which considers the direction of edges when computing modularity. The undirected formulation treats the directed multi-graph as undirected by summing the weights of all directed edges between each pair of vertices into a single undirected edge.
The algorithm supports the following optional entries in the options map:
Name |
Type |
Default |
Description |
|---|---|---|---|
resolution |
Float |
1.0 |
Resolution value for modularity. Higher values produce more, smaller communities. Must be greater than 0. |
max_iterations |
Integer |
10 |
Maximum number of local moving passes per level. Must be at least 1. |
tolerance |
Float |
1e-7 |
The algorithm stops when the modularity improvement between levels is less than this value. Must be between 0 and 1 (exclusive). |
max_levels |
Integer |
No limit |
Maximum number of aggregation levels. Must be at least 1. |
weight_property |
String |
“” |
Name of a float edge property to use as edge weights. If empty, all edges have weight 1.0. |
undirected |
Boolean |
false |
If true, use undirected modularity (symmetrize the graph). If false, use directed modularity. |
All of these options are optional. The entire options map may be omitted if the defaults are acceptable.
The “resolution” option controls the granularity of the detected communities. Higher values favor more, smaller communities, while lower values favor fewer, larger communities.
The “max_iterations” option limits the number of local moving passes within each level of the algorithm. More iterations allow the algorithm to explore more vertex moves, potentially improving modularity at the cost of runtime.
The “tolerance” option controls when the algorithm stops adding new levels. If the modularity improvement from one level to the next is less than the tolerance, the algorithm terminates.
The “max_levels” option limits the number of aggregation levels. Each level coarsens the graph by collapsing communities into super-vertices.
The “weight_property” option specifies an edge property to use as edge weights. The property must be of type float. When not specified or set to an empty string, all edges are treated as having weight 1.0. When a graph has multiple directed edges between the same pair of vertices, each edge contributes its weight independently.
The “undirected” option selects between directed and undirected modularity. The default is false (directed), which uses the directed modularity formulation where in-degree and out-degree are tracked separately. When set to true, the algorithm treats the graph as undirected by summing the weights of all directed edges between each pair of vertices into a single undirected edge weight.
Louvain clustering has the following output columns:
Name |
Type |
Description |
|---|---|---|
vertex |
Bound vertex variable |
Vertex from the graph. |
community |
Integer |
Integer representing the community. |
modularity |
Float |
Modularity of the final partition. |
The Louvain algorithm outputs all the vertices in the graph along with an integer representing the community each vertex belongs to and the modularity of the final partition. Each community has a unique integer, and the integers are not necessarily contiguous. The modularity value is the same for all vertices and represents the quality of the overall partition, with higher values indicating better community structure. Modularity values typically range from -0.5 to 1.0.
Here is an example of running directed Louvain clustering (the default):
CALL louvain(["Graph__Edge1", "Graph__Edge2"])
YIELD vertex, community, modularity
INTO Results
Here is an example of running undirected Louvain clustering with custom options:
CALL louvain(["Graph__Edge1", "Graph__Edge2"],
{ undirected : true,
resolution : 1.5,
weight_property : "weight" })
YIELD vertex, community, modularity
INTO Results
Here is an example that collects the vertices in each community into lists, producing one row per community:
CALL louvain(["Graph__Edge1", "Graph__Edge2"])
YIELD vertex, community, modularity
WITH community, collect(vertex) AS members, modularity
RETURN community, members, modularity
INTO Communities
References:
Blondel, V. D., et al. “Fast unfolding of communities in large networks.” Journal of Statistical Mechanics (2008).
Leicht, E. A. and Newman, M. E. J. “Community structure in directed weighted networks.” Physical Review Letters, 100(11), 118703 (2008).
2.12.1.6. PageRank¶
The algorithm page_rank() assigns a score to each vertex in the graph based on the number of incoming edges to the vertex and the scores of the sources of those edges.
This is the well-known PageRank algorithm developed by Larry Page.
The algorithm supports the following optional entries in the options map:
Name |
Type |
Default |
Description |
|---|---|---|---|
damping_factor |
Float |
0.85 |
The damping factor used by PageRank. |
max_iterations |
Integer |
20 |
The maximum number of iterations before stopping the algorithm. |
tolerance |
Float |
0.0001 |
The algorithm stops after the norm of the difference between the score vectors of successive iterations is less than this value. |
norm |
String |
infinity |
The norm used to decide when to stop the algorithm. Valid values are “L1”, “L2”, and “infinity”. |
sources |
List of row IDs or bound variables or single row ID or variable |
Empty list |
Personalization vertices to run personalized PageRank. |
All of these options are optional. The entire options map may be omitted if the defaults are acceptable.
The “damping_factor” option is the standard PageRank damping factor that balances between visiting an adjacent vertex or teleporting to a random vertex. PageRank is an iterative algorithm. When the iteration stops is determined by a combination of the “max_iterations”, “tolerance”, and “norm” options. The iteration stops when either the maximum number of iterations has been reached or the norm of the difference between the score vectors of successive iterations is less than the tolerance. Each iteration generates a score vector that contains a score for each vertex in the graph. Taking the norm of the difference between successive score vectors gives a value to how much the last iteration improved the scores, and the algorithm stops if this value is less than a tolerance. The standard “L1”, “L2”, and “infinity” norms are supported. The “sources” option is used to perform personalized PageRank which restricts the vertices visited by a teleportation to the set of vertices given.
PageRank has the following output columns:
Name |
Type |
Description |
|---|---|---|
vertex |
Bound vertex variable |
Vertex from the graph. |
rank |
Float |
PageRank of a vertex. |
The PageRank algorithm outputs all the vertices in the graph along with the PageRank score for each vertex.
2.12.1.7. Strongly Connected Components¶
The algorithm strongly_connected_components() finds the strongly connected components of a directed graph.
A strong component is a subset of the vertices of a graph that are all reachable from each other both by traversing only forward edges and by traversing only reverse edges.
The algorithm has no entries in the options map.
Strongly connected components has the following output columns:
Name |
Type |
Description |
|---|---|---|
vertex |
Bound vertex variable |
Vertex from the graph. |
component |
Integer |
Integer representing the component. |
The strongly connected components algorithm outputs all the vertices in the graph along with an integer representing the component each vertex belongs to. Each component has a unique integer, and the integers are not necessarily contiguous.
CALL strongly_connected_components(["Graph__Edge1", "Graph__Edge2"])
YIELD vertex, component
INTO Results
2.12.1.8. Weakly Connected Components¶
The algorithm weakly_connected_components() finds the weakly connected components of a directed graph.
A weak component is a subset of the vertices of a graph that are all reachable from each other by traversing edges, ignoring direction.
The algorithm has no entries in the options map.
Weakly connected components has the following output columns:
Name |
Type |
Description |
|---|---|---|
vertex |
Bound vertex variable |
Vertex from the graph. |
component |
Integer |
Integer representing the component. |
The weakly connected components algorithm outputs all the vertices in the graph along with an integer representing the component each vertex belongs to. Each component has a unique integer, and the integers are not necessarily contiguous.
CALL weakly_connected_components(["Graph__Edge1", "Graph__Edge2"])
YIELD vertex, component
INTO Results
2.12.2. Executing Algorithms on a Filtered Graph¶
All whole graph algorithms can be run on a subset of the edges in a set of frames by using OpenCypher Fragments. OpenCypher fragments select which vertices and edges from a graph are considered by the algorithm. The fragments in this sense filter the graph provided to the algorithm, which is analogous to how OpenCypher fragments are used for input and output operations (see section OpenCypher Fragments).
OpenCypher fragments are provided to a graph algorithm by using a map of edge frame names to OpenCypher strings instead of a list of frames:
CALL breadth_first_search({ Graph__Edge1 : "MATCH ()-[e]->() WHERE e.prop > 10",
Graph__Edge2 : "",
Graph__Edge3 : "MATCH (s)-[]->() WHERE s.size < 25" },
{ sources : [id(Graph__Vertex1, "key1")],
max_depth : 2 })
YIELD vertex, parent, edge, depth
RETURN vertex, parent, edge, depth
INTO Results
In this example, each one of the edge frames has an OpenCypher filtering fragment mapped to it.
The fragment for Edge1 will only select edges whose property prop has a value greater than 10.
The empty fragment string for Edge2 applies no filtering to that frame.
The fragment for Edge3 selects only edges whose source vertex has a property size with a value less than 25.
A filter string consists of an OpenCypher MATCH clause with a single edge pattern and a WHERE clause. The edge pattern has three components: the source vertex, the edge and the target vertex. As the pattern is used only to apply WHERE conditions to each of the three components, the direction of the edge is ignored. The source and target vertex components will be matched to the source and target vertex frames of the edge frame that the filter is mapped to. Each component can have a bound variable, with a user chosen name, which will be used to access its properties in the WHERE clause. A bound variable is only required if it is used in the WHERE clause.
The WHERE clause can have any valid OpenCypher expression that results in a boolean value. The properties associated to each component of the MATCH clause can be used in the WHERE expression together with OpenCypher constants and operators. If the WHERE clause is not present, then no filtering is applied to the frame. All properties of the edge and the source and target vertices are available for evaluation in the WHERE clause.
Other OpenCypher clauses are not allowed in filtering fragments.
2.12.2.1. Performance Considerations¶
The default filtering behavior is direct filtering, which applies the OpenCypher fragment to each edge as it is processed by the underlying whole graph algorithm. This strategy minimizes the memory required for execution of whole graph algorithm with the filter but can have negative performance impact when edges are traversed multiple times.
The other filtering behavior is to pre-create the filtered subgraph, which will create a copy of the original graph containing only the edges that passed the respective filter.
This avoids evaluating the filters for edges multiple times at the cost of using more memory.
The underlying algorithm is then run on the filtered subgraph.
As shown in the example below, filtered subgraph creation is enabled by passing the boolean constant false as the third argument to the whole graph algorithm.
CALL page_rank({ Graph__Edge1 : "MATCH (a)-[b]->(c) WHERE b.prop > 10",
Graph__Edge2 : "MATCH (v)-[e]->(w)" },
{ sources : [id(Graph__Vertex1, "key1")] }, false)
YIELD vertex, rank
RETURN vertex, rank
INTO Results
The third argument is optional and defaults to true, which uses direct filtering instead of creating the subgraph.
The performance of using direct filtering versus creating the filtered subgraph can vary widely depending on the algorithm, the selectivity of the filters (i.e. how many edges do they exclude), as well as the number of times each edge is visited.
Algorithms that do not visit edges multiple times, such as Breadth First Search, will likely run slower when pre-creating the filtered subgraph.