causalexplain.estimators.pc package

Submodules

power_divergence(X, Y, Z, data, boolean=True, lambda_='cressie-read', **kwargs)[source]#

Computes the Cressie-Read power divergence statistic [1]. The null hypothesis of the test is that X is independent of Y given Z. Many frequency-comparison statistics (e.g. chi-square, G-test) belong to the power divergence family and are special cases of this test.

Parameters:

X (int, string, hashable object) – A variable name contained in the data set
Y (int, string, hashable object) – A variable name contained in the data set, different from X
Z (list, array-like) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []
data (pandas.DataFrame) – The dataset on which to test the independence condition.
lambda_ (float or string) –
The lambda parameter for the power_divergence statistic. Some values of lambda_ result in other well-known tests:

"pearson"	1	Chi-squared test
"log-likelihood"	0	G-test or log-likelihood
"freeman-tuckey"	-1/2	Freeman-Tuckey statistic
"mod-log-likelihood"	-1	Modified log-likelihood
"neyman"	-2	Neyman's statistic
"cressie-read"	2/3	The value recommended in the paper [1]
boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.

If boolean=False, returns the chi-square statistic, the p_value, and the degrees of freedom of the test.

Returns:

If boolean=False, returns 3 values –

chi: float
The chi-square test statistic.

p_value: float
The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.

dof: int
The degrees of freedom of the test.

If boolean=True, returns –

independent: boolean
If the p_value of the test is greater than or equal to significance_level, returns True. Else returns False.

References

[1] Cressie, Noel, and Timothy RC Read. “Multinomial goodness‐of‐fit tests.” Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> power_divergence(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> power_divergence(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> power_divergence(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
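
As an illustration of how the lambda_ parameter selects a member of the power divergence family, the following is a minimal plain-Python sketch of the statistic itself. It is not this package's implementation: the function name power_divergence_statistic and the example counts are assumptions made for illustration, and it works on already-tabulated observed and expected counts rather than on a DataFrame.

```python
import math


def power_divergence_statistic(observed, expected, lambda_=2 / 3):
    """Cressie-Read power divergence between observed and expected counts.

    Hypothetical helper for illustration. Assumes strictly positive expected
    counts, and strictly positive observed counts when lambda_ is negative.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    pairs = list(zip(observed, expected))
    if lambda_ == 0:
        # Limit lambda_ -> 0: the G-test (log-likelihood ratio) statistic.
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    if lambda_ == -1:
        # Limit lambda_ -> -1: the modified log-likelihood statistic.
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)
    total = sum(o * ((o / e) ** lambda_ - 1.0) for o, e in pairs)
    return 2.0 * total / (lambda_ * (lambda_ + 1.0))


observed = [10, 20, 30]
expected = [20, 20, 20]

# With lambda_ = 1 the statistic coincides with Pearson's chi-square.
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
cressie_read_1 = power_divergence_statistic(observed, expected, lambda_=1)
g_test = power_divergence_statistic(observed, expected, lambda_=0)
```

The equality with Pearson's statistic at lambda_ = 1 relies on the observed and expected counts having the same total.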

chi_square(X, Y, Z, data, boolean=True, **kwargs)[source]#

Chi-square conditional independence test. Tests the null hypothesis that X is independent from Y given Zs.

This is done by comparing the observed frequencies with the frequencies expected if X and Y were conditionally independent, using a chi-square deviance statistic. The expected frequencies given independence are \(P(X,Y,Zs) = P(X|Zs)*P(Y|Zs)*P(Zs)\). The right-hand side can be computed as \(P(X,Zs)*P(Y,Zs)/P(Zs)\).

Parameters:

X (int, string, hashable object) – A variable name contained in the data set
Y (int, string, hashable object) – A variable name contained in the data set, different from X
Z (list, array-like) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []
data (pandas.DataFrame) – The dataset on which to test the independence condition.
boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.

If boolean=False, returns the chi-square statistic, the p_value, and the degrees of freedom of the test.

Returns:

If boolean=False, returns 3 values –

chi: float
The chi-square test statistic.

p_value: float
The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.

dof: int
The degrees of freedom of the test.

If boolean=True, returns –

independent: boolean
If the p_value of the test is greater than or equal to significance_level, returns True. Else returns False.

References

[1] https://en.wikipedia.org/wiki/Chi-squared_test

Examples

>>> import pandas as pd
>>> import numpy as np
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> chi_square(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> chi_square(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> chi_square(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
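
The expected-frequency formula above can be sketched in plain Python. This is an illustrative sketch, not the package's implementation: the function name conditional_chi_square and the list-of-dicts input are assumptions, and only the test statistic is computed (no p-value or degrees of freedom). In terms of counts, the expected frequency of a cell is N(X,Zs) * N(Y,Zs) / N(Zs).

```python
from collections import Counter
from itertools import product


def conditional_chi_square(rows, x, y, z):
    """Chi-square statistic for X independent of Y given the columns in z.

    rows is a list of dicts mapping column name -> discrete value
    (a hypothetical stand-in for a DataFrame).
    """
    def z_key(row):
        return tuple(row[c] for c in z)

    n_xyz = Counter((r[x], r[y], z_key(r)) for r in rows)
    n_xz = Counter((r[x], z_key(r)) for r in rows)
    n_yz = Counter((r[y], z_key(r)) for r in rows)
    n_z = Counter(z_key(r) for r in rows)
    x_values = {r[x] for r in rows}
    y_values = {r[y] for r in rows}

    statistic = 0.0
    for z_value, count_z in n_z.items():
        for x_value, y_value in product(x_values, y_values):
            # Expected count under conditional independence.
            expected = n_xz[(x_value, z_value)] * n_yz[(y_value, z_value)] / count_z
            if expected > 0:
                observed = n_xyz[(x_value, y_value, z_value)]
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Every (X, Y) combination appears once per value of Z: no dependence at all.
independent_rows = [
    {"X": a, "Y": b, "Z": c} for a in (0, 1) for b in (0, 1) for c in (0, 1)
]
# Y is an exact copy of X: maximal dependence.
dependent_rows = [{"X": v, "Y": v} for v in (0, 1, 0, 1)]

stat_independent = conditional_chi_square(independent_rows, "X", "Y", ["Z"])
stat_dependent = conditional_chi_square(dependent_rows, "X", "Y", [])
```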

pearsonr(X, Y, Z, data, boolean=True, **kwargs)[source]#

Computes the Pearson correlation coefficient and p-value for testing non-correlation. Should be used only on continuous data. When \(Z \neq \emptyset\), uses linear regression and computes the Pearson coefficient on the residuals.

Parameters:

X (str) – The first variable for testing the independence condition X ⟂ Y | Z
Y (str) – The second variable for testing the independence condition X ⟂ Y | Z
Z (list/array-like) – A list of conditioning variables for testing the condition X ⟂ Y | Z
data (pandas.DataFrame) – The dataset in which to test the independence condition.
boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.

If boolean=False, returns the Pearson correlation coefficient and the p_value of the test.

Returns:

Pearson’s correlation coefficient (float)
p-value (float)

References

[1] https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
[2] https://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression
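
The residual-based approach described for a non-empty Z can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's code: it handles a single conditioning variable with ordinary least squares written out by hand, and the names pearson, residuals and partial_correlation are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def residuals(target, z):
    """Residuals of a simple linear regression of target on z."""
    mean_t, mean_z = mean(target), mean(z)
    slope = sum((t - mean_t) * (v - mean_z) for t, v in zip(target, z)) / sum(
        (v - mean_z) ** 2 for v in z
    )
    intercept = mean_t - slope * mean_z
    return [t - (intercept + slope * v) for t, v in zip(target, z)]


def partial_correlation(x, y, z):
    """Correlation of x and y after regressing z out of both."""
    return pearson(residuals(x, z), residuals(y, z))


# x and y both depend on z; their noise terms are mutually orthogonal.
z = [-3, -1, 1, 3, -3, -1, 1, 3]
noise_x = [1, -1, -1, 1, 1, -1, -1, 1]
noise_y = [1, 1, 1, 1, -1, -1, -1, -1]
x = [2 * v + e for v, e in zip(z, noise_x)]
y = [-v + e for v, e in zip(z, noise_y)]

raw = pearson(x, y)                     # strongly negative, driven by z
partial = partial_correlation(x, y, z)  # vanishes once z is accounted for
```

With several conditioning variables the same idea applies, with a multiple regression in place of the simple one.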

class DAG(ebunch=None, latents={})[source]#

Bases: DiGraph

Base class for all Directed Graphical Models.

Each node in the graph can represent either a random variable, Factor, or a cluster of random variables. Edges in the graph represent the dependencies between these.

Parameters:

data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list or any Networkx graph object.

Attributes:

adj

Graph adjacency object holding the neighbors of each node.

This object is a read-only dict-like structure with node keys and neighbor-dict values. The neighbor-dict is keyed by neighbor to the edge-data-dict. So G.adj[3][2]['color'] = 'blue' sets the color of the edge (3, 2) to "blue".

Iterating over G.adj behaves like a dict. Useful idioms include for nbr, datadict in G.adj[n].items():.

The neighbor information is also provided by subscripting the graph. So for nbr, foovalue in G[node].data('foo', default=1): works.

For directed graphs, G.adj holds outgoing (successor) info.

degree

A DegreeView for the Graph as G.degree or G.degree().

The node degree is the number of edges adjacent to the node. The weighted node degree is the sum of the edge weights for edges incident to that node.

This object provides an iterator for (node, degree) as well as lookup for the degree for a single node.

Parameters:

nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.
weight (string or None, optional (default=None)) – The name of an edge attribute that holds the numerical value used as a weight. If None, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the node.

Returns:

DiDegreeView or int – If multiple nodes are requested (the default), returns a DiDegreeView mapping nodes to their degree. If a single node is requested, returns the degree of the node as an integer.

See also

in_degree, out_degree

Examples

>>> G = nx.DiGraph()  # or MultiDiGraph
>>> nx.add_path(G, [0, 1, 2, 3])
>>> G.degree(0)  # node 0 with degree 1
1
>>> list(G.degree([0, 1, 2]))
[(0, 1), (1, 2), (2, 2)]

edges

An OutEdgeView of the DiGraph as G.edges or G.edges().

edges(self, nbunch=None, data=False, default=None)

The OutEdgeView provides set-like operations on the edge-tuples as well as edge attribute lookup. When called, it also provides an EdgeDataView object which allows control of access to edge attributes (but does not provide set-like operations). Hence, G.edges[u, v]['color'] provides the value of the color attribute for edge (u, v) while for (u, v, c) in G.edges.data('color', default='red'): iterates through all the edges yielding the color attribute with default 'red' if no color attribute exists.

Parameters:

nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges from these nodes.
data (string or bool, optional (default=False)) – The edge attribute returned in 3-tuple (u, v, ddict[data]). If True, return edge attribute dict in 3-tuple (u, v, ddict). If False, return 2-tuple (u, v).
default (value, optional (default=None)) – Value used for edges that don't have the requested attribute. Only relevant if data is not True or False.

Returns:

edges (OutEdgeView) – A view of edge attributes. It usually iterates over (u, v) or (u, v, d) tuples of edges, but can also be used for attribute lookup as edges[u, v]['foo'].

See also

in_edges, out_edges

Nodes in nbunch that are not in the graph will be (quietly) ignored. For directed graphs this returns the out-edges.

Examples

>>> G = nx.DiGraph()  # or MultiDiGraph, etc
>>> nx.add_path(G, [0, 1, 2])
>>> G.add_edge(2, 3, weight=5)
>>> [e for e in G.edges]
[(0, 1), (1, 2), (2, 3)]
>>> G.edges.data()  # default data is {} (empty dict)
OutEdgeDataView([(0, 1, {}), (1, 2, {}), (2, 3, {'weight': 5})])
>>> G.edges.data("weight", default=1)
OutEdgeDataView([(0, 1, 1), (1, 2, 1), (2, 3, 5)])
>>> G.edges([0, 2])  # only edges originating from these nodes
OutEdgeDataView([(0, 1), (2, 3)])
>>> G.edges(0)  # only edges from node 0
OutEdgeDataView([(0, 1)])

in_degree

An InDegreeView for (node, in_degree) or in_degree for single node.

The node in_degree is the number of edges pointing to the node. The weighted node degree is the sum of the edge weights for edges incident to that node.

This object provides an iteration over (node, in_degree) as well as lookup for the degree for a single node.

Parameters:

nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.
weight (string or None, optional (default=None)) – The name of an edge attribute that holds the numerical value used as a weight. If None, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the node.

Returns:

If a single node is requested: deg (int) – In-degree of the node.

If multiple nodes are requested: nd_iter (iterator) – The iterator returns two-tuples of (node, in-degree).

See also

degree, out_degree

Examples

>>> G = nx.DiGraph()
>>> nx.add_path(G, [0, 1, 2, 3])
>>> G.in_degree(0)  # node 0 with degree 0
0
>>> list(G.in_degree([0, 1, 2]))
[(0, 0), (1, 1), (2, 1)]

in_edges

A view of the in edges of the graph as G.in_edges or G.in_edges().

in_edges(self, nbunch=None, data=False, default=None):

Parameters:

nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.
data (string or bool, optional (default=False)) – The edge attribute returned in 3-tuple (u, v, ddict[data]). If True, return edge attribute dict in 3-tuple (u, v, ddict). If False, return 2-tuple (u, v).
default (value, optional (default=None)) – Value used for edges that don't have the requested attribute. Only relevant if data is not True or False.

Returns:

in_edges (InEdgeView or InEdgeDataView) – A view of edge attributes. It usually iterates over (u, v) or (u, v, d) tuples of edges, but can also be used for attribute lookup as edges[u, v]['foo'].

Examples

>>> G = nx.DiGraph()
>>> G.add_edge(1, 2, color="blue")
>>> G.in_edges()
InEdgeView([(1, 2)])
>>> G.in_edges(nbunch=2)
InEdgeDataView([(1, 2)])

See also

edges

name

String identifier of the graph.

nodes

A NodeView of the Graph as G.nodes or G.nodes().

Can be used as G.nodes for data lookup and for set-like operations. Can also be used as G.nodes(data='color', default=None) to return a NodeDataView which reports specific node data but no set operations. It presents a dict-like interface as well with G.nodes.items() iterating over (node, nodedata) 2-tuples and G.nodes[3]['foo'] providing the value of the foo attribute for node 3. In addition, a view G.nodes.data('foo') provides a dict-like interface to the foo attribute of each node. G.nodes.data('foo', default=1) provides a default for nodes that do not have attribute foo.

Parameters:

data (string or bool, optional (default=False)) – The node attribute returned in 2-tuple (n, ddict[data]). If True, return entire node attribute dict as (n, ddict). If False, return just the nodes n.
default (value, optional (default=None)) – Value used for nodes that don't have the requested attribute. Only relevant if data is not True or False.

Returns:

NodeView

Allows set-like operations over the nodes as well as node attribute dict lookup and calling to get a NodeDataView. A NodeDataView iterates over (n, data) and has no set operations. A NodeView iterates over n and includes set operations.

When called, if data is False, an iterator over nodes. Otherwise an iterator of 2-tuples (node, attribute value) where the attribute is specified in data. If data is True then the attribute becomes the entire data dictionary.

If your node data is not needed, it is simpler and equivalent to use the expression for n in G, or list(G).

There are two simple ways of getting a list of all nodes in the graph:

>>> G = nx.path_graph(3)
>>> list(G.nodes)
[0, 1, 2]
>>> list(G)
[0, 1, 2]

To get the node data along with the nodes:

>>> G.add_node(1, time="5pm")
>>> G.nodes[0]["foo"] = "bar"
>>> list(G.nodes(data=True))
[(0, {'foo': 'bar'}), (1, {'time': '5pm'}), (2, {})]
>>> list(G.nodes.data())
[(0, {'foo': 'bar'}), (1, {'time': '5pm'}), (2, {})]

>>> list(G.nodes(data="foo"))
[(0, 'bar'), (1, None), (2, None)]
>>> list(G.nodes.data("foo"))
[(0, 'bar'), (1, None), (2, None)]

>>> list(G.nodes(data="time"))
[(0, None), (1, '5pm'), (2, None)]
>>> list(G.nodes.data("time"))
[(0, None), (1, '5pm'), (2, None)]

>>> list(G.nodes(data="time", default="Not Available"))
[(0, 'Not Available'), (1, '5pm'), (2, 'Not Available')]
>>> list(G.nodes.data("time", default="Not Available"))
[(0, 'Not Available'), (1, '5pm'), (2, 'Not Available')]

If some of your nodes have an attribute and the rest are assumed to have a default attribute value you can create a dictionary from node/attribute pairs using the default keyword argument to guarantee the value is never None:

>>> G = nx.Graph()
>>> G.add_node(0)
>>> G.add_node(1, weight=2)
>>> G.add_node(2, weight=3)
>>> dict(G.nodes(data="weight", default=1))
{0: 1, 1: 2, 2: 3}

out_degree

An OutDegreeView for (node, out_degree)

The node out_degree is the number of edges pointing out of the node. The weighted node degree is the sum of the edge weights for edges incident to that node.

This object provides an iterator over (node, out_degree) as well as lookup for the degree for a single node.

Parameters:

nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.
weight (string or None, optional (default=None)) – The name of an edge attribute that holds the numerical value used as a weight. If None, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the node.

Returns:

If a single node is requested: deg (int) – Out-degree of the node.

If multiple nodes are requested: nd_iter (iterator) – The iterator returns two-tuples of (node, out-degree).

See also

degree, in_degree

Examples

>>> G = nx.DiGraph()
>>> nx.add_path(G, [0, 1, 2, 3])
>>> G.out_degree(0)  # node 0 with degree 1
1
>>> list(G.out_degree([0, 1, 2]))
[(0, 1), (1, 1), (2, 1)]

out_edges

An OutEdgeView of the DiGraph as G.edges or G.edges().

edges(self, nbunch=None, data=False, default=None)

The OutEdgeView provides set-like operations on the edge-tuples as well as edge attribute lookup. When called, it also provides an EdgeDataView object which allows control of access to edge attributes (but does not provide set-like operations). Hence, G.edges[u, v]['color'] provides the value of the color attribute for edge (u, v) while for (u, v, c) in G.edges.data('color', default='red'): iterates through all the edges yielding the color attribute with default 'red' if no color attribute exists.

Parameters:

nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges from these nodes.
data (string or bool, optional (default=False)) – The edge attribute returned in 3-tuple (u, v, ddict[data]). If True, return edge attribute dict in 3-tuple (u, v, ddict). If False, return 2-tuple (u, v).
default (value, optional (default=None)) – Value used for edges that don't have the requested attribute. Only relevant if data is not True or False.

Returns:

edges (OutEdgeView) – A view of edge attributes. It usually iterates over (u, v) or (u, v, d) tuples of edges, but can also be used for attribute lookup as edges[u, v]['foo'].

See also

in_edges, out_edges

Nodes in nbunch that are not in the graph will be (quietly) ignored. For directed graphs this returns the out-edges.

Examples

>>> G = nx.DiGraph()  # or MultiDiGraph, etc
>>> nx.add_path(G, [0, 1, 2])
>>> G.add_edge(2, 3, weight=5)
>>> [e for e in G.edges]
[(0, 1), (1, 2), (2, 3)]
>>> G.edges.data()  # default data is {} (empty dict)
OutEdgeDataView([(0, 1, {}), (1, 2, {}), (2, 3, {'weight': 5})])
>>> G.edges.data("weight", default=1)
OutEdgeDataView([(0, 1, 1), (1, 2, 1), (2, 3, 5)])
>>> G.edges([0, 2])  # only edges originating from these nodes
OutEdgeDataView([(0, 1), (2, 3)])
>>> G.edges(0)  # only edges from node 0
OutEdgeDataView([(0, 1)])

pred

Graph adjacency object holding the predecessors of each node.

This object is a read-only dict-like structure with node keys and neighbor-dict values. The neighbor-dict is keyed by neighbor to the edge-data-dict. So G.pred[2][3]['color'] = 'blue' sets the color of the edge (3, 2) to "blue".

Iterating over G.pred behaves like a dict. Useful idioms include for nbr, datadict in G.pred[n].items():. A data-view not provided by dicts also exists: for nbr, foovalue in G.pred[node].data('foo'): A default can be set via a default argument to the data method.

succ

Graph adjacency object holding the successors of each node.

This object is a read-only dict-like structure with node keys and neighbor-dict values. The neighbor-dict is keyed by neighbor to the edge-data-dict. So G.succ[3][2]['color'] = 'blue' sets the color of the edge (3, 2) to "blue".

Iterating over G.succ behaves like a dict. Useful idioms include for nbr, datadict in G.succ[n].items():. A data-view not provided by dicts also exists: for nbr, foovalue in G.succ[node].data('foo'): and a default can be set via a default argument to the data method.

The neighbor information is also provided by subscripting the graph. So for nbr, foovalue in G[node].data('foo', default=1): works.

For directed graphs, G.adj is identical to G.succ.

Methods

`active_trail_nodes`(variables[, observed, ...])	Returns a dictionary with the given variables as keys and all the nodes reachable from that respective variable as values.
`add_edge`(u, v[, weight])	Add an edge between u and v.
`add_edges_from`(ebunch[, weights])	Add all the edges in ebunch.
`add_node`(node[, weight, latent])	Adds a single node to the Graph.
`add_nodes_from`(nodes[, weights, latent])	Add multiple nodes to the Graph.
`add_weighted_edges_from`(ebunch_to_add[, weight])	Add weighted edges in ebunch_to_add with specified weight attr
`adjacency`()	Returns an iterator over (node, adjacency dict) tuples for all nodes.
`adjlist_inner_dict_factory`	alias of `dict`
`adjlist_outer_dict_factory`	alias of `dict`
`clear`()	Remove all nodes and edges from the graph.
`clear_edges`()	Remove all edges from the graph without altering nodes.
`copy`([as_view])	Returns a copy of the graph.
`do`(nodes[, inplace])	Applies the do operator to the graph and returns a new DAG with the transformed graph.
`edge_attr_dict_factory`	alias of `dict`
`edge_subgraph`(edges)	Returns the subgraph induced by the specified edges.
`get_ancestral_graph`(nodes)	Returns the ancestral graph of the given nodes.
`get_children`(node)	Returns a list of children of node.
`get_edge_data`(u, v[, default])	Returns the attribute dictionary associated with edge (u, v).
`get_immoralities`()	Finds all the immoralities in the model. A v-structure X -> Z <- Y is an immorality if there is no direct edge between X and Y.
`get_independencies`([latex, include_latents])	Computes independencies in the DAG by checking d-separation.
`get_leaves`()	Returns a list of leaves of the graph.
`get_markov_blanket`(node)	Returns a markov blanket for a random variable.
`get_parents`(node)	Returns a list of parents of node.
`get_random`([n_nodes, edge_prob, latents])	Returns a randomly generated DAG with n_nodes number of nodes with edge probability being edge_prob.
`get_roots`()	Returns a list of roots of the graph.
`graph_attr_dict_factory`	alias of `dict`
`has_edge`(u, v)	Returns True if the edge (u, v) is in the graph.
`has_node`(n)	Returns True if the graph contains the node n.
`has_predecessor`(u, v)	Returns True if node u has predecessor v.
`has_successor`(u, v)	Returns True if node u has successor v.
`is_dconnected`(start, end[, observed])	Returns True if there is an active trail (i.e. d-connection) between start and end node given that observed is observed.
`is_directed`()	Returns True if graph is directed, False otherwise.
`is_iequivalent`(model)	Checks whether the given model is I-equivalent
`is_multigraph`()	Returns True if graph is a multigraph, False otherwise.
`local_independencies`(variables)	Returns an instance of Independencies containing the local independencies of each of the variables.
`minimal_dseparator`(start, end)	Finds the minimal d-separating set for start and end.
`moralize`()	Removes all the immoralities in the DAG and creates a moral graph (UndirectedGraph).
`nbunch_iter`([nbunch])	Returns an iterator over nodes contained in nbunch that are also in the graph.
`neighbors`(n)	Returns an iterator over successor nodes of n.
`node_attr_dict_factory`	alias of `dict`
`node_dict_factory`	alias of `dict`
`number_of_edges`([u, v])	Returns the number of edges between two nodes.
`number_of_nodes`()	Returns the number of nodes in the graph.
`order`()	Returns the number of nodes in the graph.
`predecessors`(n)	Returns an iterator over predecessor nodes of n.
`remove_edge`(u, v)	Remove the edge between u and v.
`remove_edges_from`(ebunch)	Remove all edges specified in ebunch.
`remove_node`(n)	Remove node n.
`remove_nodes_from`(nodes)	Remove multiple nodes.
`reverse`([copy])	Returns the reverse of the graph.
`size`([weight])	Returns the number of edges or total of all edge weights.
`subgraph`(nodes)	Returns a SubGraph view of the subgraph induced on nodes.
`successors`(n)	Returns an iterator over successor nodes of n.
`to_daft`([node_pos, latex, pgm_params, ...])	Returns a daft (https://docs.daft-pgm.org/en/latest/) object which can be rendered for publication quality plots.
`to_directed`([as_view])	Returns a directed representation of the graph.
`to_directed_class`()	Returns the class to use for empty directed copies.
`to_pdag`()	Returns the PDAG (the equivalence class of DAG; also known as CPDAG) of the DAG.
`to_undirected`([reciprocal, as_view])	Returns an undirected representation of the digraph.
`to_undirected_class`()	Returns the class to use for empty undirected copies.
`update`([edges, nodes])	Update the graph using nodes/edges/graphs as input.

in_degree_iter
out_degree_iter

__init__(ebunch=None, latents={})[source]#

Initialize a graph with edges, name, or graph attributes.

Parameters:

incoming_graph_data (input graph (optional, default: None)) – Data to initialize graph. If None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object. If the corresponding optional Python packages are installed the data can also be a 2D NumPy array, a SciPy sparse array, or a PyGraphviz graph.
attr (keyword arguments, optional (default= no attributes)) – Attributes to add to graph as key=value pairs.

See also

convert

Examples

>>> G = nx.Graph()  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G = nx.Graph(name="my graph")
>>> e = [(1, 2), (2, 3), (3, 4)]  # list of edges
>>> G = nx.Graph(e)

Arbitrary graph attribute pairs (key=value) may be assigned

>>> G = nx.Graph(e, day="Friday")
>>> G.graph
{'day': 'Friday'}

add_node(node, weight=None, latent=False)[source]#

Adds a single node to the Graph.

Parameters:

node (str, int, or any hashable python object.) – The node to add to the graph.
weight (int, float) – The weight of the node.
latent (boolean (default: False)) – Specifies whether the variable is latent or not.

add_nodes_from(nodes, weights=None, latent=False)[source]#

Add multiple nodes to the Graph.

Note: The behavior of adding weights is different from that in networkx.

Parameters:

nodes (iterable container) – A container of nodes (list, dict, set, or any hashable python object).
weights (list, tuple (default=None)) – A container of weights (int, float). The weight value at index i is associated with the variable at index i.
latent (list, tuple (default=False)) – A container of boolean. The value at index i tells whether the node at index i is latent or not.

add_edge(u, v, weight=None)[source]#

Add an edge between u and v.

The nodes u and v will be automatically added if they are not already in the graph.

Parameters:

u (nodes) – Nodes can be any hashable Python object.
v (nodes) – Nodes can be any hashable Python object.
weight (int, float (default=None)) – The weight of the edge

add_edges_from(ebunch, weights=None)[source]#

Add all the edges in ebunch.

If nodes referred in the ebunch are not already present, they will be automatically added. Node names can be any hashable python object.

Note: The behavior of adding weights is different from that in networkx.

Parameters:

ebunch (container of edges) – Each edge given in the container will be added to the graph. The edges must be given as 2-tuples (u, v).
weights (list, tuple (default=None)) – A container of weights (int, float). The weight value at index i is associated with the edge at index i.

get_parents(node)[source]#

Returns a list of parents of node.

Throws an error if the node is not present in the graph.

Parameters: node (string, int or any hashable python object) – The node whose parents will be returned.

moralize()[source]#

Removes all the immoralities in the DAG and creates a moral graph (UndirectedGraph).

A v-structure X->Z<-Y is an immorality if there is no directed edge between X and Y.
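
A minimal sketch of moralization on a plain edge list, assuming the standard definition (connect every pair of parents that share a child, then drop edge directions). The function name moralize_edges and the frozenset representation of undirected edges are illustrative choices, not the library's API, which returns an UndirectedGraph.

```python
from itertools import combinations


def moralize_edges(edges):
    """Return the undirected edges of the moral graph of a DAG.

    edges is an iterable of directed (parent, child) pairs; each undirected
    edge in the result is represented as a frozenset of its two endpoints.
    """
    edges = list(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)

    # Drop the direction of every existing edge.
    moral = {frozenset(edge) for edge in edges}
    # "Marry" every pair of parents that share a child.
    for node_parents in parents.values():
        for a, b in combinations(node_parents, 2):
            moral.add(frozenset((a, b)))
    return moral


# The v-structure X -> Z <- Y gains the undirected edge X - Y.
moral_graph = moralize_edges([("X", "Z"), ("Y", "Z")])
```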

get_leaves()[source]#

Returns a list of leaves of the graph.

out_degree_iter(nbunch=None, weight=None)[source]#

in_degree_iter(nbunch=None, weight=None)[source]#

get_roots()[source]#

Returns a list of roots of the graph.
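
For get_leaves and get_roots, the usual reading is that roots are nodes with no incoming edges and leaves are nodes with no outgoing edges. The sketch below illustrates that reading on a plain edge list; the helper name roots_and_leaves is hypothetical and the code is independent of the DAG class.

```python
def roots_and_leaves(nodes, edges):
    """Return (roots, leaves) for a directed graph.

    Roots have in-degree 0 and leaves have out-degree 0. An isolated node
    is reported as both.
    """
    has_parent = {child for _, child in edges}
    has_child = {parent for parent, _ in edges}
    roots = [n for n in nodes if n not in has_parent]
    leaves = [n for n in nodes if n not in has_child]
    return roots, leaves


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
roots, leaves = roots_and_leaves(nodes, edges)
```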

get_children(node)[source]#

Returns a list of children of node. Throws an error if the node is not present in the graph.

Parameters: node (string, int or any hashable python object) – The node whose children will be returned.

get_independencies(latex=False, include_latents=False)[source]#

Computes independencies in the DAG by checking d-separation.

Parameters:

latex (boolean) – If latex=True, a LaTeX string of each independence assertion is created.
include_latents (boolean) – If True, includes latent variables in the independencies. Otherwise, only generates independencies on observed variables.

local_independencies(variables)[source]#

Returns an instance of Independencies containing the local independencies of each of the variables.

Parameters: variables (str or array-like) – Variables whose local independencies are to be found.

is_iequivalent(model)[source]#

Checks whether the given model is I-equivalent.

Two graphs G1 and G2 are said to be I-equivalent if they have the same skeleton and the same set of immoralities.

Parameters: model (DAG) – A DAG object for which you want to check I-equivalence.
Returns: True if both are I-equivalent, False otherwise.
Return type: boolean
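
The I-equivalence criterion stated above (same skeleton, same immoralities) can be sketched on plain edge lists. The function names below are hypothetical and the sketch ignores isolated nodes; it is meant to illustrate the criterion, not to reproduce the method's implementation.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the edge set."""
    return {frozenset(edge) for edge in edges}


def immoralities(edges):
    """All v-structures a -> child <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, node_parents in parents.items():
        for a, b in combinations(node_parents, 2):
            if frozenset((a, b)) not in skel:
                found.add((frozenset((a, b)), child))
    return found


def i_equivalent(edges_1, edges_2):
    """True if both DAGs share the same skeleton and the same immoralities."""
    return skeleton(edges_1) == skeleton(edges_2) and immoralities(
        edges_1
    ) == immoralities(edges_2)


chain = [("A", "B"), ("B", "C")]
reversed_chain = [("C", "B"), ("B", "A")]
collider = [("A", "B"), ("C", "B")]

same_class = i_equivalent(chain, reversed_chain)   # same skeleton, no immoralities
different_class = i_equivalent(chain, collider)    # the collider adds an immorality
```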

get_immoralities()[source]#

Finds all the immoralities in the model. A v-structure X -> Z <- Y is an immorality if there is no direct edge between X and Y.

Returns: A set of all the immoralities in the model.
Return type: set

is_dconnected(start, end, observed=None)[source]#

Returns True if there is an active trail (i.e. d-connection) between start and end node given that observed is observed.

Parameters:

start (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.
end (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.
observed (list, array-like (optional)) – If given the active trail would be computed assuming these nodes to be observed.

minimal_dseparator(start, end)[source]#

Finds the minimal d-separating set for start and end.

Parameters:

start (node) – The first node.
end (node) – The second node.

References

[1] Algorithm 4, Page 10: Tian, Jin, Azaria Paz, and Judea Pearl. Finding minimal d-separators. Computer Science Department, University of California, 1998.

get_markov_blanket(node)[source]#

Returns a Markov blanket for a random variable. In the case of Bayesian networks, the Markov blanket is the set of the node's parents, its children, and its children's other parents.

Parameters: node (string, int or any hashable python object) – The node whose Markov blanket will be returned.
Returns: List of nodes contained in the Markov blanket.
Return type: list(blanket_nodes)
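
The definition above (parents, children, and the children's other parents) translates directly into a few set operations. The sketch below works on a plain edge list; the function name markov_blanket and the example graph are assumptions for illustration.

```python
def markov_blanket(edges, node):
    """Parents, children, and co-parents of node in a DAG given as edges."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    # Other parents of the node's children.
    co_parents = {u for u, v in edges if v in children}
    return (parents | children | co_parents) - {node}


edges = [("A", "T"), ("T", "C"), ("B", "C"), ("C", "D")]
blanket = markov_blanket(edges, "T")  # D is excluded: it is only a grandchild
```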

active_trail_nodes(variables, observed=None, include_latents=False)[source]#

Returns a dictionary with the given variables as keys and all the nodes reachable from that respective variable as values.

Parameters:

variables (str or array like) – variables whose active trails are to be found.
observed (List of nodes (optional)) – If given the active trails would be computed assuming these nodes to be observed.
include_latents (boolean (default: False)) – Whether to include the latent variables in the returned active trail nodes.

References

Details of the algorithm can be found in Koller and Friedman, "Probabilistic Graphical Models: Principles and Techniques", page 75, Algorithm 3.1.
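
A compact sketch of the reachability procedure referenced above, written for a single start variable on a plain edge list. It follows the two-phase structure of that algorithm (collect the ancestors of the observed nodes, then traverse trails while tracking the direction of arrival), but the function name reachable_nodes and the data representation are assumptions; latent variables are not handled.

```python
def reachable_nodes(edges, start, observed=()):
    """Nodes reachable from start via an active trail, given observed nodes."""
    observed = set(observed)
    parents, children = {}, {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        children.setdefault(u, set()).add(v)

    # Phase 1: the observed nodes together with all of their ancestors.
    ancestors, stack = set(), list(observed)
    while stack:
        node = stack.pop()
        if node not in ancestors:
            ancestors.add(node)
            stack.extend(parents.get(node, ()))

    # Phase 2: "up" means the trail arrives from a child, "down" from a parent.
    to_visit, visited, reachable = [(start, "up")], set(), set()
    while to_visit:
        node, direction = to_visit.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed:
            reachable.add(node)
        if direction == "up" and node not in observed:
            to_visit.extend((p, "up") for p in parents.get(node, ()))
            to_visit.extend((c, "down") for c in children.get(node, ()))
        elif direction == "down":
            if node not in observed:
                # Chain: continue downwards through an unobserved node.
                to_visit.extend((c, "down") for c in children.get(node, ()))
            if node in ancestors:
                # Collider that is observed or has an observed descendant.
                to_visit.extend((p, "up") for p in parents.get(node, ()))
    return reachable


chain = [("A", "B"), ("B", "C")]
collider = [("A", "C"), ("B", "C")]
```

In the chain, observing B blocks the trail from A to C; in the collider, observing C opens the trail from A to B.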

to_pdag()[source]#

Returns the PDAG (the equivalence class of DAG; also known as CPDAG) of the DAG.

Returns: An instance of pgmpy.base.PDAG.
Return type: PDAG

do(nodes, inplace=False)[source]#

Applies the do operator to the graph and returns a new DAG with the transformed graph.

The do-operator, do(X = x), has the effect of removing all edges from the parents of X into X and setting X to the given value x.

Parameters:

nodes (list, array-like) – The names of the nodes to apply the do-operator for.
inplace (boolean (default: False)) – If inplace=True, makes the changes to the current object, otherwise returns a new instance.

Returns:

A new instance of DAG modified by the do-operator

Return type:

pgmpy.base.DAG

References

Causality: Models, Reasoning, and Inference, Judea Pearl (2000). p.70.
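
On the graph level, the do-operator described above only removes the edges pointing into the intervened nodes. A minimal sketch on a plain edge list follows; the function name do_edges is hypothetical, and fixing the value of X is outside the scope of a purely structural sketch.

```python
def do_edges(edges, nodes):
    """Edge list after intervening on nodes: incoming edges are removed."""
    intervened = set(nodes)
    return [(u, v) for u, v in edges if v not in intervened]


# Z confounds X and Y; intervening on X cuts the edge Z -> X.
edges = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
after_do_x = do_edges(edges, ["X"])
```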

get_ancestral_graph(nodes)[source]#

Returns the ancestral graph of the given nodes. The ancestral graph only contains the nodes which are ancestors of at least one of the variables in nodes.

Parameters: nodes (iterable) – List of nodes whose ancestral graph needs to be computed.
Returns: The ancestral graph.
Return type: pgmpy.base.DAG instance
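
A sketch of the ancestral graph computation on a plain edge list. It assumes that the given nodes themselves are kept alongside their ancestors, which is an assumption about the intended behavior rather than something stated above; the function name ancestral_subgraph is hypothetical.

```python
def ancestral_subgraph(edges, nodes):
    """Return (kept_nodes, kept_edges) for the ancestral graph of nodes."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)

    # Walk upwards from the requested nodes, collecting every ancestor.
    keep, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # Keep only the edges whose endpoints both survive.
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, kept_edges


edges = [("A", "B"), ("B", "C"), ("D", "C"), ("C", "E")]
kept_nodes, kept_edges = ancestral_subgraph(edges, ["B"])
```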

to_daft(node_pos='circular', latex=True, pgm_params={}, edge_params={}, node_params={})[source]#

Returns a daft (https://docs.daft-pgm.org/en/latest/) object which can be rendered for publication quality plots. The returned object’s render method can be called to see the plots.

Parameters:

node_pos (str or dict (default: circular)) –

If str: Must be one of the following: circular, kamada_kawai, planar, random, shell, spring, spectral, spiral. Please refer to https://networkx.org/documentation/stable//reference/drawing.html#module-networkx.drawing.layout for details on these layouts.

If dict: Must be of the form {node: (x coordinate, y coordinate)}, describing the x and y coordinates of each node.

If no argument is provided, uses the circular layout.
latex (boolean) – Whether to use latex for rendering the node names.
pgm_params (dict (optional)) – Any additional parameters that need to be passed to daft.PGM initializer. Should be of the form: {param_name: param_value}
edge_params (dict (optional)) – Any additional edge parameters that need to be passed to daft.add_edge method. Should be of the form: {(u1, v1): {param_name: param_value}, (u2, v2): {…} }
node_params (dict (optional)) – Any additional node parameters that need to be passed to daft.add_node method. Should be of the form: {node1: {param_name: param_value}, node2: {…} }

Returns:

daft.PGM object

Return type:

A plot of the DAG.

static get_random(n_nodes=5, edge_prob=0.5, latents=False)[source]#

Returns a randomly generated DAG with n_nodes number of nodes with edge probability being edge_prob.

Parameters:

n_nodes (int) – The number of nodes in the randomly generated DAG.
edge_prob (float) – The probability of edge between any two nodes in the topologically sorted DAG.
latents (bool (default: False)) – If True, includes latent variables in the generated DAG.

Returns:

pgmpy.base.DAG instance

Return type:

The randomly generated DAG.
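One common way to generate a random DAG with a given edge probability is to fix a node ordering and only allow edges that point forward in that ordering. The sketch below follows that approach with the standard library; `random_dag_edges` and its `seed` argument are assumptions for illustration and may differ from how get_random works internally.

```python
import random


def random_dag_edges(n_nodes=5, edge_prob=0.5, seed=None):
    """Return (order, edges) for a random DAG over nodes 0..n_nodes-1."""
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)  # this ordering acts as the topological order

    edges = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            # Edges only go from earlier to later nodes, so no cycle can form.
            if rng.random() < edge_prob:
                edges.append((order[i], order[j]))
    return order, edges


order, edges = random_dag_edges(n_nodes=5, edge_prob=0.5, seed=0)
print(order, edges)
```

Because every edge respects the shuffled order, the result is acyclic by construction.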

StructureEstimator base class for PC and PC-stable algorithms.

convert_args_tuple(func)[source]#

Convert the arguments of a function to tuples.

Parameters:

obj – The object on which the function is called.
variable – The variable of interest.
parents – The parents of the variable.
complete_samples_only – Flag indicating whether to use only complete samples.
weighted – Flag indicating whether to use weighted samples.

Returns:

The result of the function with converted arguments.
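A decorator of this kind is typically useful when the wrapped method is cached, because list arguments are not hashable. The sketch below is an assumption about that use case, not the library's code: `parents_as_tuple` and the `Demo` class are hypothetical names, and only the `parents` argument is converted.

```python
from functools import lru_cache, wraps


def parents_as_tuple(func):
    """Convert the `parents` argument to a tuple before calling `func`."""
    @wraps(func)
    def wrapper(obj, variable, parents=(), complete_samples_only=None,
                weighted=False):
        return func(obj, variable, tuple(parents), complete_samples_only,
                    weighted)
    return wrapper


class Demo:
    @parents_as_tuple
    @lru_cache(maxsize=None)  # requires every argument to be hashable
    def describe(self, variable, parents, complete_samples_only, weighted):
        return (variable, parents)


demo = Demo()
result = demo.describe("A", ["B", "C"])  # a list is accepted by the wrapper
print(result)
```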

class StructureEstimator(independencies=None)[source]#

Bases: object

Base class for estimators in pgmpy; ParameterEstimator, StructureEstimator and StructureScore derive from this class.

Attributes:

data
state_names
variables

Methods

state_counts

data = None#

complete_samples_only = True#

state_names = None#

__init__(independencies=None)[source]#

variables = None#

state_counts(variable, parents=(), complete_samples_only=None, weighted=False)[source]#

class Independencies(*assertions)[source]#

Bases: object

Base class for independencies. The Independencies class represents a set of conditional independence assertions (e.g. “X is independent of Y given Z”, where X, Y and Z are random variables) or independence assertions (e.g. “X is independent of Y”, where X and Y are random variables). Initialize the Independencies class with conditional independence assertions or independence assertions.

Parameters:: assertions (Lists or Tuples) – Each assertion is a list or tuple of the form [event1, event2, event3], e.g. the assertion [‘X’, ‘Y’, ‘Z’] means X is independent of Y given Z.

Examples

Creating an Independencies object with one independence assertion: random variable X is independent of Y.

>>> independencies = Independencies(['X', 'Y'])

Creating an Independencies object with three conditional independence assertions, the first of which is: random variable X is independent of Y given Z.

>>> independencies = Independencies(['X', 'Y', 'Z'],
...             ['a', ['b', 'c'], 'd'],
...             ['l', ['m', 'n'], 'o'])

Methods

`add_assertions`(*assertions)	Adds assertions to independencies.
`closure`()	Returns a new Independencies()-object that additionally contains those IndependenceAssertions that are implied by the current independencies (using the semi-graphoid axioms; see (Pearl, 1989, Conditional Independence and its representations)).
`contains`(assertion)	Returns True if assertion is contained in this Independencies-object, otherwise False.
`entails`(entailed_independencies)	Returns True if the entailed_independencies are implied by this Independencies-object, otherwise False.
`get_all_variables`()	Returns a set of all the variables in all the independence assertions.
`get_assertions`()	Returns the independencies object which is a set of IndependenceAssertion objects.
`is_equivalent`(other)	Returns True if the two Independencies-objects are equivalent, otherwise False.
`latex_string`()	Returns a list of strings.
`reduce`()	Add function to remove duplicate Independence Assertions

get_factorized_product

__init__(*assertions)[source]#

contains(assertion)[source]#

Returns True if assertion is contained in this Independencies-object, otherwise False.

Parameters:: assertion (IndependenceAssertion()-object)

Examples

>>> from pgmpy.independencies import Independencies, IndependenceAssertion
>>> ind = Independencies(['A', 'B', ['C', 'D']])
>>> IndependenceAssertion('A', 'B', ['C', 'D']) in ind
True
>>> # does not depend on variable order:
>>> IndependenceAssertion('B', 'A', ['D', 'C']) in ind
True
>>> # but does not check entailment:
>>> IndependenceAssertion('X', 'Y', 'Z') in Independencies(['X', 'Y'])
False
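The order-independent membership shown in the example can be sketched by normalising each assertion into sets before comparing. The helper `normalize` is hypothetical and only illustrates the idea; it is not how the class stores assertions.

```python
def normalize(event1, event2, event3=()):
    """Return a hashable, order-independent key for an assertion."""
    def as_set(event):
        # A single variable name is treated as a one-element set.
        return frozenset([event] if isinstance(event, str) else event)

    # event1 and event2 are interchangeable (symmetry), so store them unordered.
    return (frozenset({as_set(event1), as_set(event2)}), as_set(event3))


stored = {normalize("A", "B", ["C", "D"])}
print(normalize("B", "A", ["D", "C"]) in stored)    # same assertion, reordered
print(normalize("X", "Y", "Z") in {normalize("X", "Y")})  # entailment is not checked
```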

__contains__(assertion)#

Returns True if assertion is contained in this Independencies-object, otherwise False.

Parameters:: assertion (IndependenceAssertion()-object)

Examples

>>> from pgmpy.independencies import Independencies, IndependenceAssertion
>>> ind = Independencies(['A', 'B', ['C', 'D']])
>>> IndependenceAssertion('A', 'B', ['C', 'D']) in ind
True
>>> # does not depend on variable order:
>>> IndependenceAssertion('B', 'A', ['D', 'C']) in ind
True
>>> # but does not check entailment:
>>> IndependenceAssertion('X', 'Y', 'Z') in Independencies(['X', 'Y'])
False

get_all_variables()[source]#: Returns a set of all the variables in all the independence assertions.

get_assertions()[source]#

Returns the independencies object which is a set of IndependenceAssertion objects.

Examples

>>> from pgmpy.independencies import Independencies
>>> independencies = Independencies(['X', 'Y', 'Z'])
>>> independencies.get_assertions()

add_assertions(*assertions)[source]#

Adds assertions to independencies.

Parameters:: assertions (Lists or Tuples) – Each assertion is a list or tuple of variable, independent_of and given.

Examples

>>> from pgmpy.independencies import Independencies
>>> independencies = Independencies()
>>> independencies.add_assertions(['X', 'Y', 'Z'])
>>> independencies.add_assertions(['a', ['b', 'c'], 'd'])

closure()[source]#

Returns a new Independencies()-object that additionally contains those IndependenceAssertions that are implied by the current independencies (using the semi-graphoid axioms; see (Pearl, 1989, Conditional Independence and its representations)).

Might be very slow if more than six variables are involved.

Examples

>>> from pgmpy.independencies import Independencies
>>> ind1 = Independencies(('A', ['B', 'C'], 'D'))
>>> ind1.closure()
(A ⟂ B | D, C)
(A ⟂ B, C | D)
(A ⟂ B | D)
(A ⟂ C | D, B)
(A ⟂ C | D)

>>> ind2 = Independencies(('W', ['X', 'Y', 'Z']))
>>> ind2.closure()
(W ⟂ Y)
(W ⟂ Y | X)
(W ⟂ Z | Y)
(W ⟂ Z, X, Y)
(W ⟂ Z)
(W ⟂ Z, X)
(W ⟂ X, Y)
(W ⟂ Z | X)
(W ⟂ Z, Y | X)
[..]
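Two of the semi-graphoid axioms, decomposition and weak union, can be applied mechanically to a single assertion. The sketch below does one such step; the function names are hypothetical, and symmetry and contraction are deliberately left out, so this is not a full closure computation.

```python
from itertools import combinations


def proper_nonempty_subsets(items):
    """Yield every non-empty proper subset of `items` as a frozenset."""
    items = sorted(items)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            yield frozenset(combo)


def one_step_implications(x, y, z):
    """Assertions implied by (x ⟂ y | z) via decomposition and weak union."""
    x, y, z = frozenset(x), frozenset(y), frozenset(z)
    implied = set()
    for part in proper_nonempty_subsets(y):
        rest = y - part
        implied.add((x, part, z))         # decomposition: x ⟂ part | z
        implied.add((x, part, z | rest))  # weak union: x ⟂ part | z ∪ rest
    return implied


implied = one_step_implications(["A"], ["B", "C"], ["D"])
for assertion in sorted(implied, key=lambda a: (sorted(a[1]), sorted(a[2]))):
    print(assertion)
```

For the first example above, (A ⟂ B, C | D), this single step already yields the four derived assertions listed next to the original one.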

entails(entailed_independencies)[source]#

Returns True if the entailed_independencies are implied by this Independencies-object, otherwise False. Entailment is checked using the semi-graphoid axioms.

Might be very slow if more than six variables are involved.

Parameters:: entailed_independencies (Independencies()-object)

Examples

>>> from pgmpy.independencies import Independencies
>>> ind1 = Independencies([['A', 'B'], ['C', 'D'], 'E'])
>>> ind2 = Independencies(['A', 'C', 'E'])
>>> ind1.entails(ind2)
True
>>> ind2.entails(ind1)
False

is_equivalent(other)[source]#

Returns True if the two Independencies-objects are equivalent, otherwise False. (i.e. any Bayesian Network that satisfies the one set of conditional independencies also satisfies the other).

Might be very slow if more than six variables are involved.

Parameters:: other (Independencies()-object)

Examples

>>> from pgmpy.independencies import Independencies
>>> ind1 = Independencies(['X', ['Y', 'W'], 'Z'])
>>> ind2 = Independencies(['X', 'Y', 'Z'], ['X', 'W', 'Z'])
>>> ind3 = Independencies(['X', 'Y', 'Z'], ['X', 'W', 'Z'], ['X', 'Y', ['W','Z']])
>>> ind1.is_equivalent(ind2)
False
>>> ind1.is_equivalent(ind3)
True

reduce()[source]#: Add function to remove duplicate Independence Assertions

latex_string()[source]#: Returns a list of strings. Each string represents an IndependenceAssertion in LaTeX.

get_factorized_product(random_variables=None, latex=False)[source]#

class IndependenceAssertion(event1=[], event2=[], event3=[])[source]#

Bases: object

Represents Conditional Independence or Independence assertion.

Each assertion has 3 attributes: event1, event2 and event3. The assertion

\[U \perp X, Y | Z\]

is read as “random variable U is independent of X and Y given Z”, and its attributes are:

event1 = {U}

event2 = {X, Y}

event3 = {Z}

Parameters:

event1 (String or List of strings) – Random Variable which is independent.
event2 (String or list of strings.) – Random Variables from which event1 is independent
event3 (String or list of strings.) – Random Variables given which event1 is independent of event2.

Examples

>>> from pgmpy.independencies import IndependenceAssertion
>>> assertion = IndependenceAssertion('U', 'X')
>>> assertion = IndependenceAssertion('U', ['X', 'Y'])
>>> assertion = IndependenceAssertion('U', ['X', 'Y'], 'Z')
>>> assertion = IndependenceAssertion(['U', 'V'], ['X', 'Y'], ['Z', 'A'])

Methods

get_assertion()

Returns a tuple of the attributes: variable, independent_of, given.

latex_string

__init__(event1=[], event2=[], event3=[])[source]#

Initialize an IndependenceAssertion object with event1, event2 and event3 attributes.

In the assertion (U ⟂ X, Y | Z), read as “random variable U is independent of X and Y given Z”, event1 is U, event2 is X, Y and event3 is Z.

get_assertion()[source]#

Returns a tuple of the attributes: variable, independent_of, given.

Examples

>>> from pgmpy.independencies import IndependenceAssertion
>>> asser = IndependenceAssertion('X', 'Y', 'Z')
>>> asser.get_assertion()

latex_string()[source]#

class PC(name, independencies=None, **kwargs)[source]#

Bases: StructureEstimator

Class for constraint-based estimation of DAGs from a given data set using the PC algorithm. Identifies (conditional) dependencies in the data set using the chi_square dependency test and uses the PC algorithm to estimate a DAG pattern that satisfies the identified dependencies. The DAG pattern can then be completed to a faithful DAG, if possible.

References

[1] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.2
[2] Neapolitan, Learning Bayesian Networks, Section 10.1.2 for the PC algorithm (page 550), http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf

Attributes:

dag
data
metrics
pdag
state_names
variables

Methods

`build_skeleton`(**kwargs)	Estimates a graph skeleton (UndirectedGraph) from a set of independencies using (the first part of) the PC algorithm.
`fit`(X, **kwargs)	Estimates a DAG/PDAG from the given dataset using the PC algorithm which is a constraint-based structure learning algorithm[1].
`skeleton_to_pdag`(skeleton, separating_sets)	Orients the edges of a graph skeleton based on information from separating_sets to form a DAG pattern (DAG).

fit_predict
state_counts

dag = None#

pdag = None#

is_fitted_ = False#

metrics = None#

__init__(name, independencies=None, **kwargs)[source]#

Class initialization.

Parameters:

data (pandas DataFrame object) – DataFrame object where each column represents one variable. (If some values in the data are missing, the data cells should be set to numpy.NaN. Note that pandas converts each column containing numpy.NaN values to dtype float.)
independencies (Independencies object) – Independencies object containing a set of conditional independence assertions that will be used to estimate the DAG skeleton. If independencies is None, all conditional independence assertions will be tested.
kwargs (key-value arguments) – Additional arguments passed to the StructureEstimator base class. - variant: str (one of “orig”, “stable”, “parallel”)

fit(X, **kwargs)[source]#

Estimates a DAG/PDAG from the given dataset using the PC algorithm, which is a constraint-based structure learning algorithm[1]. The independencies in the dataset are identified using statistical independence tests. This method returns a DAG/PDAG structure which is faithful to the independencies implied by the dataset.

Parameters:

variant (str (one of "orig", "stable", "parallel")) –
The variant of the PC algorithm to run.

"orig": The original PC algorithm. Might not give the same results in different runs, but performs fewer independence tests than "stable".

"stable": Gives the same result in every run, but needs to perform more statistical independence tests.

"parallel": Parallel version of PC Stable. Can run on multiple cores with the same result on each run.
ci_test (str or fun) –
The statistical test to use for testing conditional independence in the dataset. If str, the value should be one of:

"independence_match": If using this option, an additional parameter independencies must be specified.

"chi_square": Uses the Chi-Square independence test. This works only for discrete datasets.

"pearsonr": Uses the partial correlation based on the Pearson correlation coefficient to test independence. This works only for continuous datasets.
max_cond_vars (int) – The maximum number of conditional variables allowed to do the statistical test with.
return_type (str (one of "dag", "cpdag", "pdag", "skeleton")) –
The type of structure to return.

If return_type="pdag" or return_type="cpdag", a partially directed structure is returned.

If return_type="dag", a fully directed structure is returned if it is possible to orient all the edges.

If return_type="skeleton", an undirected graph is returned along with the separating sets.
significance_level (float (default: 0.01)) –
The statistical tests compare this value with the p-value of the test to decide whether the tested variables are independent. Different tests can treat this parameter differently:
1. Chi-Square: If p-value > significance_level, it assumes that the independence condition is satisfied in the data.
2. pearsonr: If p-value > significance_level, it assumes that the independence condition is satisfied in the data.

Returns:

model – The estimated model structure, can be a partially directed graph (PDAG) or a fully directed graph (DAG), or (Undirected Graph, separating sets) depending on the value of return_type argument.

Return type:

DAG-instance, PDAG-instance, or (networkx.UndirectedGraph, dict)

References

[1] Original PC: P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search, 2nd ed. Cambridge, MA: MIT Press, 2000.
[2] Stable PC: D. Colombo and M. H. Maathuis, “A modification of the PC algorithm yielding order-independent skeletons,” ArXiv e-prints, Nov. 2012.
[3] Parallel PC: Le, Thuc, et al. “A fast PC algorithm for high dimensional causal discovery with multi-core PCs.” IEEE/ACM transactions on computational biology and bioinformatics (2016).

Examples

>>> import pandas as pd
>>> import numpy as np

>>> data = pd.DataFrame(np.random.randint(0, 5, size=(2500, 3)), columns=list('XYZ'))
>>> data['sum'] = data.sum(axis=1)

>>> pc = PC("rex_generated_linear_1")
>>> pc = pc.fit(data)
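How significance_level enters the decision can be sketched with an unconditional chi-square test on a 2x2 contingency table, using only the standard library. This is a simplified stand-in under stated assumptions: the function name `chi_square_2x2` is hypothetical, all row and column totals are assumed to be non-zero, and the tests used by fit additionally condition on a set of variables.

```python
import math


def chi_square_2x2(table, significance_level=0.01):
    """Return (statistic, p_value, independent) for a 2x2 count table."""
    (a, b), (c, d) = table
    total = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            statistic += (observed - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    # Independence is accepted when the p-value exceeds the threshold.
    return statistic, p_value, p_value > significance_level


balanced = chi_square_2x2([[10, 10], [10, 10]])
skewed = chi_square_2x2([[30, 10], [10, 30]])
print(balanced)
print(skewed)
```

A lower significance_level makes the test accept independence more readily, which in turn removes more edges from the skeleton.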

fit_predict(train, test, ref_graph=None, **kwargs)[source]#

build_skeleton(**kwargs)[source]#

Estimates a graph skeleton (UndirectedGraph) from a set of independencies using (the first part of) the PC algorithm. The independencies can either be provided as an instance of the Independencies-class or by passing a decision function that decides any conditional independency assertion. Returns a tuple (skeleton, separating_sets).

If an Independencies-instance is passed, the contained IndependenceAssertions have to admit a faithful BN representation. This is the case if they are obtained as a set of d-separations of some Bayesian network or if the independence assertions are closed under the semi-graphoid axioms. Otherwise the procedure may fail to identify the correct structure.

Returns:

skeleton (UndirectedGraph) – An estimate for the undirected graph skeleton of the BN underlying the data.
separating_sets (dict) – A dict containing, for each pair of not directly connected nodes, a separating set (“witnessing set”) of variables that makes them conditionally independent. (Needed for edge orientation procedures.)

References

[1] Neapolitan, Learning Bayesian Networks, Section 10.1.2, Algorithm 10.2 (page 550), http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf
[2] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 3.4.2.1 (page 85), Algorithm 3.3
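The skeleton phase described above can be sketched with an independence oracle passed in as a decision function. This is a simplified illustration in the spirit of the original (order-dependent) variant, not the method's implementation; `build_skeleton_sketch` and `chain_oracle` are hypothetical names.

```python
from itertools import combinations


def build_skeleton_sketch(nodes, is_independent, max_cond_vars=None):
    """Return (skeleton, separating_sets) using an independence oracle."""
    nodes = list(nodes)
    neighbours = {n: set(nodes) - {n} for n in nodes}  # start fully connected
    separating_sets = {}
    limit = len(nodes) - 2 if max_cond_vars is None else max_cond_vars

    for size in range(limit + 1):
        for x, y in combinations(nodes, 2):
            if y not in neighbours[x]:
                continue  # edge already removed
            # Try conditioning sets drawn from the neighbours of x, then of y.
            for base, other in ((x, y), (y, x)):
                candidates = sorted(neighbours[base] - {other})
                witness = next(
                    (set(c) for c in combinations(candidates, size)
                     if is_independent(x, y, set(c))),
                    None,
                )
                if witness is not None:
                    neighbours[x].discard(y)
                    neighbours[y].discard(x)
                    separating_sets[frozenset((x, y))] = witness
                    break

    skeleton = {frozenset((x, y)) for x in nodes for y in neighbours[x]}
    return skeleton, separating_sets


def chain_oracle(x, y, given):
    """Oracle for the hypothetical chain A -> B -> C: only A ⟂ C | B holds."""
    return {x, y} == {"A", "C"} and "B" in given


skeleton, separating_sets = build_skeleton_sketch("ABC", chain_oracle)
print(skeleton, separating_sets)
```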

static skeleton_to_pdag(skeleton, separating_sets)[source]#

Orients the edges of a graph skeleton based on information from separating_sets to form a DAG pattern (DAG).

Parameters:

skeleton (UndirectedGraph) – An undirected graph skeleton as e.g. produced by the build_skeleton method.
separating_sets (dict) – A dict containing, for each pair of not directly connected nodes, a separating set (“witnessing set”) of variables that makes them conditionally independent. (Needed for edge orientation.)

Returns:

pdag – An estimate for the DAG pattern of the BN underlying the data. The graph might contain some nodes with both-way edges (X->Y and Y->X). Any completion obtained by removing one of the both-way edges for each such pair results in an I-equivalent Bayesian network DAG.

Return type:

DAG

References

Neapolitan, Learning Bayesian Networks, Section 10.1.2, Algorithm 10.2 (page 550), http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf
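The first orientation step, detecting colliders from the separating sets, can be sketched as follows. The function `orient_v_structures` is a hypothetical illustration that covers only this step; further orientation rules and the handling of conflicting colliders are left out.

```python
from itertools import combinations


def orient_v_structures(skeleton_edges, separating_sets):
    """Return directed edges; an undirected edge appears in both directions."""
    neighbours = {}
    for edge in skeleton_edges:
        u, v = tuple(edge)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Start with every skeleton edge present in both directions.
    directed = {(u, v) for u in neighbours for v in neighbours[u]}

    for z in neighbours:
        for x, y in combinations(sorted(neighbours[z]), 2):
            if y in neighbours[x]:
                continue  # x and y are adjacent, so the triple is shielded
            if z not in separating_sets.get(frozenset((x, y)), set()):
                # z does not separate x and y: orient x -> z <- y.
                directed.discard((z, x))
                directed.discard((z, y))
    return directed


skeleton_edges = [("X", "Z"), ("Y", "Z")]
collider = orient_v_structures(skeleton_edges, {frozenset(("X", "Y")): set()})
unoriented = orient_v_structures(skeleton_edges, {frozenset(("X", "Y")): {"Z"}})
print(sorted(collider))
print(sorted(unoriented))
```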

main(dataset_name, input_path=None, output_path=None, save=False)[source]#

class PDAG(directed_ebunch=[], undirected_ebunch=[], latents=[])[source]#

Bases: DiGraph

Class for representing PDAGs (also known as CPDAGs). PDAGs are the equivalence classes of DAGs and contain both directed and undirected edges.

Note: In this class, undirected edges are represented using two edges in both directions, i.e. an undirected edge X - Y is represented using X -> Y and X <- Y.
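Given this two-edge representation, directed and undirected edges can be recovered from a plain edge set by checking whether the reverse edge is present. The helper `split_edges` below is a hypothetical sketch and not a method of the class.

```python
def split_edges(edges):
    """Split (u, v) pairs into directed edges and undirected pairs."""
    edges = set(edges)
    # An edge whose reverse is also present stands for one undirected edge.
    undirected = {frozenset((u, v)) for (u, v) in edges if (v, u) in edges}
    directed = {(u, v) for (u, v) in edges if (v, u) not in edges}
    return directed, undirected


# Hypothetical PDAG: A -> B and B - C
pdag_edges = [("A", "B"), ("B", "C"), ("C", "B")]
directed, undirected = split_edges(pdag_edges)
print(directed, undirected)
```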

Attributes:

adj

Graph adjacency object holding the neighbors of each node.

This object is a read-only dict-like structure with node keys and neighbor-dict values. The neighbor-dict is keyed by neighbor to the edge-data-dict. So G.adj[3][2][‘color’] = ‘blue’ sets the color of the edge (3, 2) to “blue”.

Iterating over G.adj behaves like a dict. Useful idioms include for nbr, datadict in G.adj[n].items():.

The neighbor information is also provided by subscripting the graph. So for nbr, foovalue in G[node].data(‘foo’, default=1): works.

For directed graphs, G.adj holds outgoing (successor) info.

degree

A DegreeView for the Graph as G.degree or G.degree().

The node degree is the number of edges adjacent to the node. The weighted node degree is the sum of the edge weights for edges incident to that node.

This object provides an iterator for (node, degree) as well as lookup for the degree for a single node.

Parameters:

nbunch (single node, container, or all nodes (default: all nodes)) – The view will only report edges incident to these nodes.
weight (string or None, optional (default: None)) – The name of an edge attribute that holds the numerical value used as a weight. If None, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the node.

Returns:

DiDegreeView or int – If multiple nodes are requested (the default), returns a DiDegreeView mapping nodes to their degree. If a single node is requested, returns the degree of the node as an integer.

See also: in_degree, out_degree

Examples

>>> G = nx.DiGraph()  # or MultiDiGraph
>>> nx.add_path(G, [0, 1, 2, 3])
>>> G.degree(0)  # node 0 with degree 1
1
>>> list(G.degree([0, 1, 2]))
[(0, 1), (1, 2), (2, 2)]

edges

An OutEdgeView of the DiGraph as G.edges or G.edges().

edges(self, nbunch=None, data=False, default=None)

The OutEdgeView provides set-like operations on the edge-tuples as well as edge attribute lookup. When called, it also provides an EdgeDataView object which allows control of access to edge attributes (but does not provide set-like operations). Hence, G.edges[u, v][‘color’] provides the value of the color attribute for edge (u, v) while for (u, v, c) in G.edges.data(‘color’, default=’red’): iterates through all the edges yielding the color attribute with default ‘red’ if no color attribute exists.

Parameters:

nbunch (single node, container, or all nodes (default: all nodes)) – The view will only report edges from these nodes.
data (string or bool, optional (default: False)) – The edge attribute returned in 3-tuple (u, v, ddict[data]). If True, return edge attribute dict in 3-tuple (u, v, ddict). If False, return 2-tuple (u, v).
default (value, optional (default: None)) – Value used for edges that don’t have the requested attribute. Only relevant if data is not True or False.

Returns:

edges (OutEdgeView) – A view of edge attributes, usually it iterates over (u, v) or (u, v, d) tuples of edges, but can also be used for attribute lookup as edges[u, v][‘foo’].

See also: in_edges, out_edges

Notes

Nodes in nbunch that are not in the graph will be (quietly) ignored. For directed graphs this returns the out-edges.

Examples

>>> G = nx.DiGraph()  # or MultiDiGraph, etc
>>> nx.add_path(G, [0, 1, 2])
>>> G.add_edge(2, 3, weight=5)
>>> [e for e in G.edges]
[(0, 1), (1, 2), (2, 3)]
>>> G.edges.data()  # default data is {} (empty dict)
OutEdgeDataView([(0, 1, {}), (1, 2, {}), (2, 3, {'weight': 5})])
>>> G.edges.data("weight", default=1)
OutEdgeDataView([(0, 1, 1), (1, 2, 1), (2, 3, 5)])
>>> G.edges([0, 2])  # only edges originating from these nodes
OutEdgeDataView([(0, 1), (2, 3)])
>>> G.edges(0)  # only edges from node 0
OutEdgeDataView([(0, 1)])

in_degree

An InDegreeView for (node, in_degree) or in_degree for single node.

The node in_degree is the number of edges pointing to the node. The weighted node degree is the sum of the edge weights for edges incident to that node.

This object provides an iteration over (node, in_degree) as well as lookup for the degree for a single node.

Parameters:

nbunch (single node, container, or all nodes (default: all nodes)) – The view will only report edges incident to these nodes.
weight (string or None, optional (default: None)) – The name of an edge attribute that holds the numerical value used as a weight. If None, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the node.

Returns:

If a single node is requested: deg (int) – In-degree of the node.

OR if multiple nodes are requested: nd_iter (iterator) – The iterator returns two-tuples of (node, in-degree).

See also: degree, out_degree

Examples

>>> G = nx.DiGraph()
>>> nx.add_path(G, [0, 1, 2, 3])
>>> G.in_degree(0)  # node 0 with degree 0
0
>>> list(G.in_degree([0, 1, 2]))
[(0, 0), (1, 1), (2, 1)]

in_edges

A view of the in edges of the graph as G.in_edges or G.in_edges().

in_edges(self, nbunch=None, data=False, default=None):

Parameters:

nbunch (single node, container, or all nodes (default: all nodes)) – The view will only report edges incident to these nodes.
data (string or bool, optional (default: False)) – The edge attribute returned in 3-tuple (u, v, ddict[data]). If True, return edge attribute dict in 3-tuple (u, v, ddict). If False, return 2-tuple (u, v).
default (value, optional (default: None)) – Value used for edges that don’t have the requested attribute. Only relevant if data is not True or False.

Returns:

in_edges (InEdgeView or InEdgeDataView) – A view of edge attributes, usually it iterates over (u, v) or (u, v, d) tuples of edges, but can also be used for attribute lookup as edges[u, v][‘foo’].

Examples

>>> G = nx.DiGraph()
>>> G.add_edge(1, 2, color="blue")
>>> G.in_edges()
InEdgeView([(1, 2)])
>>> G.in_edges(nbunch=2)
InEdgeDataView([(1, 2)])

See also: edges

name

String identifier of the graph.

nodes

A NodeView of the Graph as G.nodes or G.nodes().

Can be used as G.nodes for data lookup and for set-like operations. Can also be used as G.nodes(data=’color’, default=None) to return a NodeDataView which reports specific node data but no set operations. It presents a dict-like interface as well with G.nodes.items() iterating over (node, nodedata) 2-tuples and G.nodes[3][‘foo’] providing the value of the foo attribute for node 3. In addition, a view G.nodes.data(‘foo’) provides a dict-like interface to the foo attribute of each node. G.nodes.data(‘foo’, default=1) provides a default for nodes that do not have attribute foo.

Parameters:

data (string or bool, optional (default: False)) – The node attribute returned in 2-tuple (n, ddict[data]). If True, return entire node attribute dict as (n, ddict). If False, return just the nodes n.
default (value, optional (default: None)) – Value used for nodes that don’t have the requested attribute. Only relevant if data is not True or False.

Returns:

NodeView – Allows set-like operations over the nodes as well as node attribute dict lookup and calling to get a NodeDataView. A NodeDataView iterates over (n, data) and has no set operations. A NodeView iterates over n and includes set operations.

When called, if data is False, returns an iterator over nodes. Otherwise returns an iterator of 2-tuples (node, attribute value), where the attribute is specified in data. If data is True, the attribute becomes the entire data dictionary.

Notes

If your node data is not needed, it is simpler and equivalent to use the expression for n in G, or list(G).

Examples

There are two simple ways of getting a list of all nodes in the graph:

>>> G = nx.path_graph(3)
>>> list(G.nodes)
[0, 1, 2]
>>> list(G)
[0, 1, 2]

To get the node data along with the nodes:

>>> G.add_node(1, time="5pm")
>>> G.nodes[0]["foo"] = "bar"
>>> list(G.nodes(data=True))
[(0, {'foo': 'bar'}), (1, {'time': '5pm'}), (2, {})]
>>> list(G.nodes.data())
[(0, {'foo': 'bar'}), (1, {'time': '5pm'}), (2, {})]

>>> list(G.nodes(data="foo"))
[(0, 'bar'), (1, None), (2, None)]
>>> list(G.nodes.data("foo"))
[(0, 'bar'), (1, None), (2, None)]

>>> list(G.nodes(data="time"))
[(0, None), (1, '5pm'), (2, None)]
>>> list(G.nodes.data("time"))
[(0, None), (1, '5pm'), (2, None)]

>>> list(G.nodes(data="time", default="Not Available"))
[(0, 'Not Available'), (1, '5pm'), (2, 'Not Available')]
>>> list(G.nodes.data("time", default="Not Available"))
[(0, 'Not Available'), (1, '5pm'), (2, 'Not Available')]

If some of your nodes have an attribute and the rest are assumed to have a default attribute value you can create a dictionary from node/attribute pairs using the default keyword argument to guarantee the value is never None:

>>> G = nx.Graph()
>>> G.add_node(0)
>>> G.add_node(1, weight=2)
>>> G.add_node(2, weight=3)
>>> dict(G.nodes(data="weight", default=1))
{0: 1, 1: 2, 2: 3}

out_degree

An OutDegreeView for (node, out_degree)

The node out_degree is the number of edges pointing out of the node. The weighted node degree is the sum of the edge weights for edges incident to that node.

This object provides an iterator over (node, out_degree) as well as lookup for the degree for a single node.

Parameters:

nbunch (single node, container, or all nodes (default: all nodes)) – The view will only report edges incident to these nodes.
weight (string or None, optional (default: None)) – The name of an edge attribute that holds the numerical value used as a weight. If None, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the node.

Returns:

If a single node is requested: deg (int) – Out-degree of the node.

OR if multiple nodes are requested: nd_iter (iterator) – The iterator returns two-tuples of (node, out-degree).

See also: degree, in_degree

Examples

>>> G = nx.DiGraph()
>>> nx.add_path(G, [0, 1, 2, 3])
>>> G.out_degree(0)  # node 0 with degree 1
1
>>> list(G.out_degree([0, 1, 2]))
[(0, 1), (1, 1), (2, 1)]

out_edges

An OutEdgeView of the DiGraph as G.edges or G.edges().

edges(self, nbunch=None, data=False, default=None)

The OutEdgeView provides set-like operations on the edge-tuples as well as edge attribute lookup. When called, it also provides an EdgeDataView object which allows control of access to edge attributes (but does not provide set-like operations). Hence, G.edges[u, v][‘color’] provides the value of the color attribute for edge (u, v) while for (u, v, c) in G.edges.data(‘color’, default=’red’): iterates through all the edges yielding the color attribute with default ‘red’ if no color attribute exists.

Parameters:

nbunch (single node, container, or all nodes (default: all nodes)) – The view will only report edges from these nodes.
data (string or bool, optional (default: False)) – The edge attribute returned in 3-tuple (u, v, ddict[data]). If True, return edge attribute dict in 3-tuple (u, v, ddict). If False, return 2-tuple (u, v).
default (value, optional (default: None)) – Value used for edges that don’t have the requested attribute. Only relevant if data is not True or False.

Returns:

edges (OutEdgeView) – A view of edge attributes, usually it iterates over (u, v) or (u, v, d) tuples of edges, but can also be used for attribute lookup as edges[u, v][‘foo’].

See also: in_edges, out_edges

Notes

Nodes in nbunch that are not in the graph will be (quietly) ignored. For directed graphs this returns the out-edges.

Examples

>>> G = nx.DiGraph()  # or MultiDiGraph, etc
>>> nx.add_path(G, [0, 1, 2])
>>> G.add_edge(2, 3, weight=5)
>>> [e for e in G.edges]
[(0, 1), (1, 2), (2, 3)]
>>> G.edges.data()  # default data is {} (empty dict)
OutEdgeDataView([(0, 1, {}), (1, 2, {}), (2, 3, {'weight': 5})])
>>> G.edges.data("weight", default=1)
OutEdgeDataView([(0, 1, 1), (1, 2, 1), (2, 3, 5)])
>>> G.edges([0, 2])  # only edges originating from these nodes
OutEdgeDataView([(0, 1), (2, 3)])
>>> G.edges(0)  # only edges from node 0
OutEdgeDataView([(0, 1)])

pred

Graph adjacency object holding the predecessors of each node.

This object is a read-only dict-like structure with node keys and neighbor-dict values. The neighbor-dict is keyed by neighbor to the edge-data-dict. So G.pred[2][3][‘color’] = ‘blue’ sets the color of the edge (3, 2) to “blue”.

Iterating over G.pred behaves like a dict. Useful idioms include for nbr, datadict in G.pred[n].items():. A data-view not provided by dicts also exists: for nbr, foovalue in G.pred[node].data(‘foo’): A default can be set via a default argument to the data method.

succ

Graph adjacency object holding the successors of each node.

This object is a read-only dict-like structure with node keys and neighbor-dict values. The neighbor-dict is keyed by neighbor to the edge-data-dict. So G.succ[3][2][‘color’] = ‘blue’ sets the color of the edge (3, 2) to “blue”.

Iterating over G.succ behaves like a dict. Useful idioms include for nbr, datadict in G.succ[n].items():. A data-view not provided by dicts also exists: for nbr, foovalue in G.succ[node].data(‘foo’): and a default can be set via a default argument to the data method.

The neighbor information is also provided by subscripting the graph. So for nbr, foovalue in G[node].data(‘foo’, default=1): works.

For directed graphs, G.adj is identical to G.succ.

Methods

`add_edge`(u_of_edge, v_of_edge, **attr)	Add an edge between u and v.
`add_edges_from`(ebunch_to_add, **attr)	Add all the edges in ebunch_to_add.
`add_node`(node_for_adding, **attr)	Add a single node node_for_adding and update node attributes.
`add_nodes_from`(nodes_for_adding, **attr)	Add multiple nodes.
`add_weighted_edges_from`(ebunch_to_add[, weight])	Add weighted edges in ebunch_to_add with specified weight attr
`adjacency`()	Returns an iterator over (node, adjacency dict) tuples for all nodes.
`adjlist_inner_dict_factory`	alias of `dict`
`adjlist_outer_dict_factory`	alias of `dict`
`clear`()	Remove all nodes and edges from the graph.
`clear_edges`()	Remove all edges from the graph without altering nodes.
`copy`()	Returns a copy of the object instance.
`edge_attr_dict_factory`	alias of `dict`
`edge_subgraph`(edges)	Returns the subgraph induced by the specified edges.
`get_edge_data`(u, v[, default])	Returns the attribute dictionary associated with edge (u, v).
`graph_attr_dict_factory`	alias of `dict`
`has_edge`(u, v)	Returns True if the edge (u, v) is in the graph.
`has_node`(n)	Returns True if the graph contains the node n.
`has_predecessor`(u, v)	Returns True if node u has predecessor v.
`has_successor`(u, v)	Returns True if node u has successor v.
`is_directed`()	Returns True if graph is directed, False otherwise.
`is_multigraph`()	Returns True if graph is a multigraph, False otherwise.
`nbunch_iter`([nbunch])	Returns an iterator over nodes contained in nbunch that are also in the graph.
`neighbors`(n)	Returns an iterator over successor nodes of n.
`node_attr_dict_factory`	alias of `dict`
`node_dict_factory`	alias of `dict`
`number_of_edges`([u, v])	Returns the number of edges between two nodes.
`number_of_nodes`()	Returns the number of nodes in the graph.
`order`()	Returns the number of nodes in the graph.
`predecessors`(n)	Returns an iterator over predecessor nodes of n.
`remove_edge`(u, v)	Remove the edge between u and v.
`remove_edges_from`(ebunch)	Remove all edges specified in ebunch.
`remove_node`(n)	Remove node n.
`remove_nodes_from`(nodes)	Remove multiple nodes.
`reverse`([copy])	Returns the reverse of the graph.
`size`([weight])	Returns the number of edges or total of all edge weights.
`subgraph`(nodes)	Returns a SubGraph view of the subgraph induced on nodes.
`successors`(n)	Returns an iterator over successor nodes of n.
`to_dag`([required_edges])	Returns one possible DAG which is represented using the PDAG.
`to_directed`([as_view])	Returns a directed representation of the graph.
`to_directed_class`()	Returns the class to use for empty directed copies.
`to_undirected`([reciprocal, as_view])	Returns an undirected representation of the digraph.
`to_undirected_class`()	Returns the class to use for empty undirected copies.
`update`([edges, nodes])	Update the graph using nodes/edges/graphs as input.

__init__(directed_ebunch=[], undirected_ebunch=[], latents=[])[source]#

Initializes a PDAG class.

Parameters:

directed_ebunch (list, array-like of 2-tuples) – List of directed edges in the PDAG.
undirected_ebunch (list, array-like of 2-tuples) – List of undirected edges in the PDAG.
latents (list, array-like) – List of nodes which are latent variables.

Return type:

An instance of the PDAG object.

Examples
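A minimal sketch of the three constructor inputs, built on a plain networkx.DiGraph rather than on the PDAG class. It assumes, without confirmation from the text above, that an undirected edge is stored as a pair of opposite directed edges (the convention used by pgmpy's PDAG). The toy edge lists and the helper is_undirected are hypothetical.

```python
import networkx as nx

directed_ebunch = [("A", "B"), ("C", "B")]
undirected_ebunch = [("C", "D"), ("D", "A")]
latents = []  # no latent variables in this toy graph

# Assumption: an undirected edge u - v is kept as both u -> v and v -> u.
G = nx.DiGraph()
G.add_edges_from(directed_ebunch)
G.add_edges_from(undirected_ebunch)
G.add_edges_from((v, u) for u, v in undirected_ebunch)


def is_undirected(graph, u, v):
    """Hypothetical helper: True when both orientations of the edge exist."""
    return graph.has_edge(u, v) and graph.has_edge(v, u)
```

Under this assumption, the inherited DiGraph methods (predecessors, successors, has_edge) see each undirected edge in both directions.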

copy()[source]#

Returns a copy of the object instance.

Returns:

A copy of self.

Return type:

PDAG instance

to_dag(required_edges=[])[source]#

Returns one possible DAG which is represented using the PDAG.

Parameters:

required_edges (list, array-like of 2-tuples) – The list of edges that should be included in the DAG.

Return type:

An instance of DAG.

Examples
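One standard way to extract a DAG from a PDAG is the Dor and Tarsi procedure: repeatedly pick a node that has no outgoing directed edges and whose undirected neighbors are adjacent to all of its other neighbors, orient its undirected edges toward it, then remove it. Whether to_dag uses exactly this procedure is an assumption. The function pdag_to_dag below is a hypothetical pure-Python sketch and ignores the required_edges parameter.

```python
def pdag_to_dag(nodes, directed, undirected):
    """Orient the undirected edges of a PDAG (Dor-Tarsi style sketch).

    Returns a sorted list of directed edges, or raises ValueError when
    no orientation without new v-structures or cycles is found.
    """
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}
    remaining = set(nodes)
    result = set(directed)

    def adjacent(a, b):
        return (
            (a, b) in directed
            or (b, a) in directed
            or frozenset((a, b)) in undirected
        )

    while remaining:
        for x in sorted(remaining):
            # (a) x must be a sink among the remaining directed edges.
            if any(u == x and v in remaining for u, v in directed):
                continue
            und_nbrs = {
                y for y in remaining
                if y != x and frozenset((x, y)) in undirected
            }
            all_nbrs = {y for y in remaining if y != x and adjacent(x, y)}
            # (b) each undirected neighbor must touch every other neighbor.
            if all(adjacent(y, z) for y in und_nbrs for z in all_nbrs - {y}):
                result.update((y, x) for y in und_nbrs)
                remaining.discard(x)
                break
        else:
            raise ValueError("no consistent DAG extension found")
    return sorted(result)


dag_edges = pdag_to_dag(
    nodes=["A", "B", "C", "D"],
    directed=[("A", "B"), ("C", "B")],
    undirected=[("C", "D"), ("D", "A")],
)
```

For this input the sketch keeps the existing v-structure A -> B <- C and orients both undirected edges away from D, so no new v-structure or cycle appears.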

Module contents#
