causalexplain.estimators.pc package#

Submodules#

power_divergence(X, Y, Z, data, boolean=True, lambda_='cressie-read', **kwargs)[source]#: Compute the power divergence test for conditional independence.

chi_square(X, Y, Z, data, boolean=True, **kwargs)[source]#: Compute the chi-square test for conditional independence.

pearsonr(X, Y, Z, data, boolean=True, **kwargs)[source]#

Computes the Pearson correlation coefficient and the p-value for testing non-correlation. Should be used only on continuous data. When \(Z \neq \emptyset\), it fits a linear regression of X and of Y on Z and computes the Pearson coefficient on the residuals.

Parameters:

X (str) – The first variable for testing the independence condition X ⟂ Y | Z
Y (str) – The second variable for testing the independence condition X ⟂ Y | Z
Z (list/array-like) – A list of conditioning variables for testing the condition X ⟂ Y | Z
data (pandas.DataFrame) – The dataset in which to test the independence condition.
boolean (bool) –

If boolean=True, an additional argument significance_level must
be specified. If the p-value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.

If boolean=False, returns the pearson correlation coefficient and p_value
of the test.

Returns:

Pearson’s correlation coefficient (float)
p-value (float)

References

[1] https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
[2] https://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression
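
The residual-based computation described above can be sketched in plain Python. This is an illustrative sketch only, not the package's implementation: the helper names (`pearson`, `residuals`, `partial_correlation`) are hypothetical, only a single conditioning variable is handled, and the p-value is not computed.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def residuals(target, z):
    """Residuals of a least-squares fit of target on z (with intercept)."""
    n = len(z)
    mean_t, mean_z = sum(target) / n, sum(z) / n
    slope = (sum((v - mean_z) * (t - mean_t) for v, t in zip(z, target))
             / sum((v - mean_z) ** 2 for v in z))
    intercept = mean_t - slope * mean_z
    return [t - (intercept + slope * v) for t, v in zip(target, z)]


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    return pearson(residuals(x, z), residuals(y, z))


# Toy data: x and y both depend on z, but their noise terms are unrelated.
z = [1, 1, 1, 1, -1, -1, -1, -1]
noise_x = [1, 1, -1, -1, 1, 1, -1, -1]
noise_y = [1, -1, 1, -1, 1, -1, 1, -1]
x = [2 * c + e for c, e in zip(z, noise_x)]
y = [3 * c + e for c, e in zip(z, noise_y)]

marginal = pearson(x, y)                # strongly correlated through z
partial = partial_correlation(x, y, z)  # correlation vanishes given z
```

In this toy data set X and Y are correlated only through Z, so the marginal coefficient is large while the coefficient on the residuals is zero.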

class DAG(ebunch=None, latents={})[source]#

Bases: DiGraph

Base class for all Directed Graphical Models.

Each node in the graph can represent either a random variable, Factor, or a cluster of random variables. Edges in the graph represent the dependencies between these.

Parameters:: data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list or any Networkx graph object.

__init__(ebunch=None, latents={})[source]#

Initialize a graph with edges, name, or graph attributes.

Parameters:

incoming_graph_data (input graph (optional, default: None)) – Data to initialize graph. If None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object. If the corresponding optional Python packages are installed the data can also be a 2D NumPy array, a SciPy sparse array, or a PyGraphviz graph.
attr (keyword arguments, optional (default= no attributes)) – Attributes to add to graph as key=value pairs.

See also

convert

Examples

>>> import networkx as nx
>>> G = nx.Graph()  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G = nx.Graph(name="my graph")
>>> e = [(1, 2), (2, 3), (3, 4)]  # list of edges
>>> G = nx.Graph(e)

Arbitrary graph attribute pairs (key=value) may be assigned

>>> G = nx.Graph(e, day="Friday")
>>> G.graph
{'day': 'Friday'}

add_node(node, weight=None, latent=False)[source]#

Adds a single node to the Graph.

Parameters:

node (str, int, or any hashable python object.) – The node to add to the graph.
weight (int, float) – The weight of the node.
latent (boolean (default: False)) – Specifies whether the variable is latent or not.

add_nodes_from(nodes, weights=None, latent=False)[source]#: Add multiple nodes to the graph.

add_edge(u, v, weight=None)[source]#

Add an edge between u and v.

The nodes u and v will be automatically added if they are not already in the graph.

Parameters:

u (nodes) – Nodes can be any hashable Python object.
v (nodes) – Nodes can be any hashable Python object.
weight (int, float (default=None)) – The weight of the edge

add_edges_from(ebunch, weights=None)[source]#: Add multiple edges to the graph.

get_parents(node)[source]#

Returns a list of parents of node.

Throws an error if the node is not present in the graph.

Parameters:: node (string, int or any hashable python object.) – The node whose parents would be returned.

moralize()[source]#

Removes all the immoralities in the DAG and creates a moral graph (UndirectedGraph).

A v-structure X->Z<-Y is an immorality if there is no directed edge between X and Y.
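
Moralization can be sketched without any graph library. The sketch below is an assumption-laden illustration (the function name `moralize` and the edge-list representation are hypothetical, not this package's API): it "marries" every pair of parents that share a child and then drops edge directions.

```python
from itertools import combinations


def moralize(edges):
    """Return the undirected edges of the moral graph of a DAG.

    edges is an iterable of (parent, child) pairs.
    """
    parents = {}
    moral = set()
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        moral.add(frozenset((u, v)))      # drop the direction
    for pars in parents.values():
        for a, b in combinations(sorted(pars), 2):
            moral.add(frozenset((a, b)))  # connect co-parents
    return moral


# The immorality X -> Z <- Y gains the extra undirected edge X - Y.
moral_graph = moralize([("X", "Z"), ("Y", "Z")])
```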

get_leaves()[source]#: Returns a list of leaves of the graph.

out_degree_iter(nbunch=None, weight=None)[source]#

in_degree_iter(nbunch=None, weight=None)[source]#

get_roots()[source]#: Returns a list of roots of the graph.

get_children(node)[source]#

Returns a list of children of node. Throws an error if the node is not present in the graph.

Parameters:: node (string, int or any hashable python object.) – The node whose children would be returned.

get_independencies(latex=False, include_latents=False)[source]#

Computes independencies in the DAG by checking d-separation.

Parameters:

latex (boolean) – If latex=True then latex string of the independence assertion would be created.
include_latents (boolean) – If True, includes latent variables in the independencies. Otherwise, only generates independencies on observed variables.

local_independencies(variables)[source]#

Returns an instance of Independencies containing the local independencies of each of the variables.

Parameters:: variables (str or array like) – variables whose local independencies are to be found.

is_iequivalent(model)[source]#

Checks whether the given model is I-equivalent to this DAG.

Two graphs G1 and G2 are said to be I-equivalent if they have the same skeleton and the same set of immoralities.

Parameters:: model (A DAG object, for which you want to check I-equivalence)
Returns:: boolean
Return type:: True if both are I-equivalent, False otherwise
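
The I-equivalence criterion above (same skeleton, same immoralities) can be sketched as follows. The function names and the edge-list representation are hypothetical, and the sketch assumes both graphs have no isolated nodes.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a list of (parent, child) edges."""
    return {frozenset(e) for e in edges}


def immoralities(edges):
    """All triples (x, z, y) with x -> z <- y and no edge between x and y."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def is_iequivalent(edges1, edges2):
    """True if both DAGs share the same skeleton and immoralities."""
    return (skeleton(edges1) == skeleton(edges2)
            and immoralities(edges1) == immoralities(edges2))


chain = [("A", "B"), ("B", "C")]           # A -> B -> C
reversed_chain = [("C", "B"), ("B", "A")]  # A <- B <- C
collider = [("A", "B"), ("C", "B")]        # A -> B <- C
```

The two chains encode the same independencies, while the collider shares their skeleton but introduces an immorality, so it is not I-equivalent to either.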

get_immoralities()[source]#

Finds all the immoralities in the model. A v-structure X -> Z <- Y is an immorality if there is no direct edge between X and Y.

Returns:: set
Return type:: A set of all the immoralities in the model

is_dconnected(start, end, observed=None)[source]#

Returns True if there is an active trail (i.e. d-connection) between start and end node given that observed is observed.

Parameters:

start (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.
end (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.
observed (list, array-like (optional)) – If given the active trail would be computed assuming these nodes to be observed.
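
One way to check d-connection, sketched below, uses the classical moralized-ancestral-graph criterion rather than a trail search: two nodes are d-separated by the observed set exactly when they are disconnected in the moral graph of the ancestral set once the observed nodes are removed. This is a substitute technique for illustration, not necessarily what this method does internally; the function names are hypothetical, and `start` and `end` are assumed not to be in `observed`.

```python
from itertools import combinations


def ancestors_of(edges, nodes):
    """The given nodes together with all of their ancestors."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def is_dconnected(edges, start, end, observed=()):
    observed = set(observed)
    keep = ancestors_of(edges, {start, end} | observed)
    adj = {n: set() for n in keep}
    parents = {n: set() for n in keep}
    for u, v in edges:
        if u in keep and v in keep:
            adj[u].add(v)
            adj[v].add(u)
            parents[v].add(u)
    for pars in parents.values():          # moralize: marry co-parents
        for a, b in combinations(pars, 2):
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = {start}, [start]
    while stack:                           # search that avoids observed nodes
        node = stack.pop()
        if node == end:
            return True
        for nxt in adj[node]:
            if nxt not in seen and nxt not in observed:
                seen.add(nxt)
                stack.append(nxt)
    return False


chain_edges = [("A", "B"), ("B", "C")]
collider_edges = [("A", "B"), ("C", "B"), ("B", "D")]
```

Observing the middle node of a chain blocks the trail, whereas observing a collider (or one of its descendants) opens it.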

minimal_dseparator(start, end)[source]#

Finds the minimal d-separating set for start and end.

Parameters:

start (node) – The first node.
end (node) – The second node.

References

[1] Algorithm 4, Page 10: Tian, Jin, Azaria Paz, and Judea Pearl. Finding minimal d-separators. Computer Science Department, University of California, 1998.

get_markov_blanket(node)[source]#

Returns a Markov blanket for a random variable. In the case of Bayesian Networks, the Markov blanket is the set of the node’s parents, its children and its children’s other parents.

Returns:: list(blanket_nodes)
Return type:: List of nodes contained in Markov Blanket
Parameters:: node (string, int or any hashable python object.) – The node whose markov blanket would be returned.
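
The definition above translates directly into a few set operations. A minimal sketch, assuming the graph is given as a list of (parent, child) pairs; the function name `markov_blanket` is hypothetical:

```python
def markov_blanket(edges, node):
    """Parents, children and the children's other parents of node."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    co_parents = {u for u, v in edges if v in children}
    return (parents | children | co_parents) - {node}


# A -> T -> C <- S, and C -> D
example_edges = [("A", "T"), ("T", "C"), ("S", "C"), ("C", "D")]
blanket = markov_blanket(example_edges, "T")
```

Here D is a grandchild of T and therefore stays outside the blanket.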

active_trail_nodes(variables, observed=None, include_latents=False)[source]#

Returns a dictionary with the given variables as keys and all the nodes reachable from that respective variable as values.

Parameters:

variables (str or array like) – variables whose active trails are to be found.
observed (List of nodes (optional)) – If given the active trails would be computed assuming these nodes to be observed.
include_latents (boolean (default: False)) – Whether to include the latent variables in the returned active trail nodes.

References

Details of the algorithm can be found in Koller and Friedman, ‘Probabilistic Graphical Models: Principles and Techniques’, page 75, Algorithm 3.1.

to_pdag()[source]#

Returns the PDAG (the equivalence class of DAG; also known as CPDAG) of the DAG.

Returns:: PDAG
Return type:: An instance of pgmpy.base.PDAG.

do(nodes, inplace=False)[source]#

Applies the do operator to the graph and returns a new DAG with the transformed graph.

The do-operator, do(X = x), has the effect of removing all edges from the parents of X into X and setting X to the given value x.

Parameters:

nodes (list, array-like) – The names of the nodes to apply the do-operator for.
inplace (boolean (default: False)) – If inplace=True, makes the changes to the current object, otherwise returns a new instance.

Returns:

pgmpy.base.DAG

Return type:

A new instance of DAG modified by the do-operator

References

Causality: Models, Reasoning, and Inference, Judea Pearl (2000). p.70.
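
The graph surgery performed by the do-operator can be sketched on a plain edge list. This is an illustration under assumptions: the function below is hypothetical, it only removes incoming edges, and it does not represent the assignment of a value to the intervened nodes.

```python
def do(edges, nodes):
    """Return a new edge list with every edge into an intervened node removed."""
    targets = set(nodes)
    return [(u, v) for u, v in edges if v not in targets]


# Z confounds X and Y; intervening on X cuts the edge Z -> X.
confounded = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
after_do = do(confounded, ["X"])
```

The original list is left untouched, which mirrors the default inplace=False behaviour described above.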

get_ancestral_graph(nodes)[source]#

Returns the ancestral graph of the given nodes. The ancestral graph contains only the nodes that are ancestors of at least one of the variables in nodes.

Parameters:: nodes (iterable) – List of nodes whose ancestral graph needs to be computed.
Returns:: pgmpy.base.DAG instance
Return type:: The ancestral graph.
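
A minimal sketch of the ancestral-graph computation, assuming an edge-list representation and assuming that the given nodes themselves are kept alongside their ancestors (the function name is hypothetical):

```python
def ancestral_graph(edges, nodes):
    """Return (kept_nodes, kept_edges) for the ancestral graph of nodes."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    keep, stack = set(nodes), list(nodes)
    while stack:                       # walk upwards through the parents
        for p in parents.get(stack.pop(), ()):
            if p not in keep:
                keep.add(p)
                stack.append(p)
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, kept_edges


dag_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "C")]
kept_nodes, kept_edges = ancestral_graph(dag_edges, ["B"])
```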

to_daft(node_pos='circular', latex=True, pgm_params={}, edge_params={}, node_params={})[source]#

Returns a daft (https://docs.daft-pgm.org/en/latest/) object which can be rendered for publication quality plots. The returned object’s render method can be called to see the plots.

Parameters:

node_pos (str or dict (default: circular)) –

If str: Must be one of the following: circular, kamada_kawai, planar, random, shell, spring, spectral, spiral. Please refer: https://networkx.org/documentation/stable//reference/drawing.html#module-networkx.drawing.layout for details on these layouts.

If dict: Must be of the form {node: (x coordinate, y coordinate)} describing the x and y coordinate of each node.

If no argument is provided uses circular layout.
latex (boolean) – Whether to use latex for rendering the node names.
pgm_params (dict (optional)) – Any additional parameters that need to be passed to daft.PGM initializer. Should be of the form: {param_name: param_value}
edge_params (dict (optional)) – Any additional edge parameters that need to be passed to daft.add_edge method. Should be of the form: {(u1, v1): {param_name: param_value}, (u2, v2): {…} }
node_params (dict (optional)) – Any additional node parameters that need to be passed to daft.add_node method. Should be of the form: {node1: {param_name: param_value}, node2: {…} }

Returns:

daft.PGM object

Return type:

A plot of the DAG.

static get_random(n_nodes=5, edge_prob=0.5, latents=False)[source]#

Returns a randomly generated DAG with n_nodes number of nodes with edge probability being edge_prob.

Parameters:

n_nodes (int) – The number of nodes in the randomly generated DAG.
edge_prob (float) – The probability of edge between any two nodes in the topologically sorted DAG.
latents (bool (default: False)) – If True, includes latent variables in the generated DAG.

Returns:

pgmpy.base.DAG instance

Return type:

The randomly generated DAG.
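
The phrase "edge probability in the topologically sorted DAG" can be illustrated as follows: fix a random node order and include each forward edge independently with probability edge_prob, which guarantees acyclicity. This is a sketch of that idea only; the function name and the `seed` parameter are assumptions and are not part of the documented signature.

```python
import random


def random_dag(n_nodes=5, edge_prob=0.5, seed=None):
    """Return (order, edges) where every edge points forward in order."""
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)                 # random topological order
    edges = [
        (order[i], order[j])
        for i in range(n_nodes)
        for j in range(i + 1, n_nodes)
        if rng.random() < edge_prob    # keep each forward edge with prob. edge_prob
    ]
    return order, edges


order, edges = random_dag(n_nodes=6, edge_prob=0.5, seed=0)
```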

StructureEstimator base class for PC and PC-stable algorithms.

convert_args_tuple(func)[source]#

Convert the arguments of a function to tuples.

Parameters:

obj – The object on which the function is called.
variable – The variable of interest.
parents – The parents of the variable.
complete_samples_only – Flag indicating whether to use only complete samples.
weighted – Flag indicating whether to use weighted samples.

Returns:

The result of the function with converted arguments.
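
A decorator of this kind is typically useful when the wrapped method is cached, because cache keys must be hashable and a list of parents is not. The sketch below is an assumption about the intent, not the package's source: `DemoEstimator` is a hypothetical class, and pairing the decorator with `functools.lru_cache` is an illustrative choice.

```python
from functools import lru_cache, wraps


def convert_args_tuple(func):
    """Convert the parents argument to a tuple before calling func."""
    @wraps(func)
    def wrapper(obj, variable, parents=(), complete_samples_only=None,
                weighted=False):
        return func(obj, variable, tuple(parents), complete_samples_only,
                    weighted)
    return wrapper


class DemoEstimator:
    """Hypothetical class used only to demonstrate the decorator."""

    def __init__(self):
        self.calls = 0

    @convert_args_tuple
    @lru_cache(maxsize=None)
    def state_counts(self, variable, parents=(), complete_samples_only=None,
                     weighted=False):
        self.calls += 1                # counts real (uncached) evaluations
        return (variable, parents)


est = DemoEstimator()
first = est.state_counts("A", ["B", "C"])   # list is accepted
second = est.state_counts("A", ["B", "C"])  # served from the cache
```

Without the conversion, passing a list for parents to the cached method would raise a TypeError because lists are unhashable.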

class StructureEstimator(independencies=None)[source]#

Bases: object

Base class for estimators in pgmpy; ParameterEstimator, StructureEstimator and StructureScore derive from this class.

data = None#

complete_samples_only = True#

state_names = None#

__init__(independencies=None)[source]#

variables = None#

state_counts(variable, parents=(), complete_samples_only=None, weighted=False)[source]#

class Independencies(*assertions)[source]#

Bases: object

Base class for independencies. The Independencies class represents a set of conditional independence assertions (e.g. “X is independent of Y given Z”, where X, Y and Z are random variables) or independence assertions (e.g. “X is independent of Y”, where X and Y are random variables). Initialize the Independencies class with conditional independence assertions or independence assertions.

Parameters:: assertions (Lists or Tuples) – Each assertion is a list or tuple of the form [event1, event2, event3]; e.g. the assertion [‘X’, ‘Y’, ‘Z’] means X is independent of Y given Z.

Examples

Creating an Independencies object with one independence assertion: random variable X is independent of Y.

>>> from pgmpy.independencies import Independencies
>>> independencies = Independencies(['X', 'Y'])

Creating an Independencies object with three conditional independence assertions; the first assertion is that random variable X is independent of Y given Z.

>>> independencies = Independencies(['X', 'Y', 'Z'],
...             ['a', ['b', 'c'], 'd'],
...             ['l', ['m', 'n'], 'o'])

__init__(*assertions)[source]#

contains(assertion)[source]#

Returns True if assertion is contained in this Independencies-object, otherwise False.

Parameters:: assertion (IndependenceAssertion()-object)

Examples

>>> from pgmpy.independencies import Independencies, IndependenceAssertion
>>> ind = Independencies(['A', 'B', ['C', 'D']])
>>> IndependenceAssertion('A', 'B', ['C', 'D']) in ind
True
>>> # does not depend on variable order:
>>> IndependenceAssertion('B', 'A', ['D', 'C']) in ind
True
>>> # but does not check entailment:
>>> IndependenceAssertion('X', 'Y', 'Z') in Independencies(['X', 'Y'])
False

__contains__(assertion)#

Returns True if assertion is contained in this Independencies-object, otherwise False.

Parameters:: assertion (IndependenceAssertion()-object)

Examples

>>> from pgmpy.independencies import Independencies, IndependenceAssertion
>>> ind = Independencies(['A', 'B', ['C', 'D']])
>>> IndependenceAssertion('A', 'B', ['C', 'D']) in ind
True
>>> # does not depend on variable order:
>>> IndependenceAssertion('B', 'A', ['D', 'C']) in ind
True
>>> # but does not check entailment:
>>> IndependenceAssertion('X', 'Y', 'Z') in Independencies(['X', 'Y'])
False

get_all_variables()[source]#: Returns a set of all the variables in all the independence assertions.

get_assertions()[source]#

Returns the independencies object which is a set of IndependenceAssertion objects.

Examples

>>> from pgmpy.independencies import Independencies
>>> independencies = Independencies(['X', 'Y', 'Z'])
>>> independencies.get_assertions()

add_assertions(*assertions)[source]#

Adds assertions to independencies.

Parameters:: assertions (Lists or Tuples) – Each assertion is a list or tuple of variable, independent_of and given.

Examples

>>> from pgmpy.independencies import Independencies
>>> independencies = Independencies()
>>> independencies.add_assertions(['X', 'Y', 'Z'])
>>> independencies.add_assertions(['a', ['b', 'c'], 'd'])

closure()[source]#

Returns a new Independencies()-object that additionally contains those IndependenceAssertions that are implied by the current independencies (using the semi-graphoid axioms; see Pearl, 1989, Conditional Independence and its representations).

Might be very slow if more than six variables are involved.

Examples

>>> from pgmpy.independencies import Independencies
>>> ind1 = Independencies(('A', ['B', 'C'], 'D'))
>>> ind1.closure()
(A ⟂ B | D, C)
(A ⟂ B, C | D)
(A ⟂ B | D)
(A ⟂ C | D, B)
(A ⟂ C | D)

>>> ind2 = Independencies(('W', ['X', 'Y', 'Z']))
>>> ind2.closure()
(W ⟂ Y)
(W ⟂ Y | X)
(W ⟂ Z | Y)
(W ⟂ Z, X, Y)
(W ⟂ Z)
(W ⟂ Z, X)
(W ⟂ X, Y)
(W ⟂ Z | X)
(W ⟂ Z, Y | X)
[..]

entails(entailed_independencies)[source]#

Returns True if the entailed_independencies are implied by this Independencies-object, otherwise False. Entailment is checked using the semi-graphoid axioms.

Might be very slow if more than six variables are involved.

Parameters:: entailed_independencies (Independencies()-object)

Examples

>>> from pgmpy.independencies import Independencies
>>> ind1 = Independencies([['A', 'B'], ['C', 'D'], 'E'])
>>> ind2 = Independencies(['A', 'C', 'E'])
>>> ind1.entails(ind2)
True
>>> ind2.entails(ind1)
False

is_equivalent(other)[source]#

Returns True if the two Independencies-objects are equivalent, otherwise False (i.e. any Bayesian Network that satisfies one set of conditional independencies also satisfies the other).

Might be very slow if more than six variables are involved.

Parameters:: other (Independencies()-object)

Examples

>>> from pgmpy.independencies import Independencies
>>> ind1 = Independencies(['X', ['Y', 'W'], 'Z'])
>>> ind2 = Independencies(['X', 'Y', 'Z'], ['X', 'W', 'Z'])
>>> ind3 = Independencies(['X', 'Y', 'Z'], ['X', 'W', 'Z'], ['X', 'Y', ['W','Z']])
>>> ind1.is_equivalent(ind2)
False
>>> ind1.is_equivalent(ind3)
True

reduce()[source]#: Add function to remove duplicate Independence Assertions

latex_string()[source]#: Returns a list of strings. Each string represents an IndependenceAssertion in LaTeX.

get_factorized_product(random_variables=None, latex=False)[source]#

class IndependenceAssertion(event1=[], event2=[], event3=[])[source]#

Bases: object

Represents Conditional Independence or Independence assertion.

Each assertion has three attributes: event1, event2 and event3. For the assertion

\[U \perp X, Y \mid Z\]

which is read as “random variable U is independent of X and Y given Z”, the attributes would be:

event1 = {U}

event2 = {X, Y}

event3 = {Z}

Parameters:

event1 (String or List of strings) – Random Variable which is independent.
event2 (String or list of strings.) – Random Variables from which event1 is independent
event3 (String or list of strings.) – Random Variables given which event1 is independent of event2.

Examples

>>> from pgmpy.independencies import IndependenceAssertion
>>> assertion = IndependenceAssertion('U', 'X')
>>> assertion = IndependenceAssertion('U', ['X', 'Y'])
>>> assertion = IndependenceAssertion('U', ['X', 'Y'], 'Z')
>>> assertion = IndependenceAssertion(['U', 'V'], ['X', 'Y'], ['Z', 'A'])

__init__(event1=[], event2=[], event3=[])[source]#: Initialize an independence assertion with up to three events.

get_assertion()[source]#

Returns a tuple of the attributes: variable, independent_of, given.

Examples

>>> from pgmpy.independencies import IndependenceAssertion
>>> asser = IndependenceAssertion('X', 'Y', 'Z')
>>> asser.get_assertion()

latex_string()[source]#

class PC(name, independencies=None, **kwargs)[source]#

Bases: StructureEstimator

Class for constraint-based estimation of DAGs using the PC algorithm from a given data set. Identifies (conditional) dependencies in data set using chi_square dependency test and uses the PC algorithm to estimate a DAG pattern that satisfies the identified dependencies. The DAG pattern can then be completed to a faithful DAG, if possible.

References

[1] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.2
[2] Neapolitan, Learning Bayesian Networks, Section 10.1.2 for the PC algorithm (page 550), http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf

dag = None#

pdag = None#

is_fitted_ = False#

metrics = None#

__init__(name, independencies=None, **kwargs)[source]#

Class initialization.

Parameters:

data (pandas DataFrame object) – DataFrame object where each column represents one variable. (If some values in the data are missing, the data cells should be set to numpy.NaN. Note that pandas converts each column containing numpy.NaN values to dtype float.)
independencies (Independencies object) – Independencies object containing a set of conditional independence assertions that will be used to estimate the DAG skeleton. If independencies is None, all conditional independence assertions will be tested.
kwargs (key-value arguments) – Additional arguments passed to the StructureEstimator base class. - variant: str (one of “orig”, “stable”, “parallel”)

fit(X, **kwargs)[source]#: Fit the PC algorithm to the input data.

fit_predict(train, test, ref_graph=None, **kwargs)[source]#

build_skeleton(**kwargs)[source]#

Estimates a graph skeleton (UndirectedGraph) from a set of independencies using (the first part of) the PC algorithm. The independencies can either be provided as an instance of the Independencies-class or by passing a decision function that decides any conditional independency assertion. Returns a tuple (skeleton, separating_sets).

If an Independencies-instance is passed, the contained IndependenceAssertions have to admit a faithful BN representation. This is the case if they are obtained as a set of d-separations of some Bayesian network or if the independence assertions are closed under the semi-graphoid axioms. Otherwise the procedure may fail to identify the correct structure.

Returns:

skeleton (UndirectedGraph) – An estimate for the undirected graph skeleton of the BN underlying the data.
separating_sets (dict) – A dict containing for each pair of not directly connected nodes a separating set (“witnessing set”) of variables that makes them conditionally independent. (needed for edge orientation procedures)

References

[1] Neapolitan, Learning Bayesian Networks, Section 10.1.2, Algorithm 10.2 (page 550), http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf
[2] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 3.4.2.1 (page 85), Algorithm 3.3
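
The skeleton phase described above can be sketched with a caller-supplied decision function. This is a simplified illustration, not the method's implementation: `pc_skeleton`, its parameters and the oracle below are hypothetical, and no statistical test or variant handling is included.

```python
from itertools import combinations


def pc_skeleton(nodes, is_independent):
    """Return (undirected_edges, separating_sets).

    is_independent(x, y, cond) decides whether x and y are independent
    given the set cond.
    """
    adj = {n: set(nodes) - {n} for n in nodes}   # start fully connected
    separating_sets = {}
    for size in range(len(nodes) - 1):           # conditioning-set sizes 0..n-2
        for x, y in combinations(nodes, 2):
            if y not in adj[x]:
                continue
            # Candidate conditioning sets come from the neighbours of x or y.
            candidates = [
                set(cond)
                for a, b in ((x, y), (y, x))
                for cond in combinations(sorted(adj[a] - {b}), size)
            ]
            for cond in candidates:
                if is_independent(x, y, cond):
                    adj[x].discard(y)
                    adj[y].discard(x)
                    separating_sets[frozenset((x, y))] = cond
                    break
    edges = {frozenset((x, y)) for x in nodes for y in adj[x]}
    return edges, separating_sets


def chain_oracle(x, y, cond):
    """Independence oracle for the chain A -> B -> C."""
    return {x, y} == {"A", "C"} and "B" in cond


skel, sepsets = pc_skeleton(["A", "B", "C"], chain_oracle)
```

For the chain, the only edge removed is A - C, and B is recorded as its separating set.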

static skeleton_to_pdag(skeleton, separating_sets)[source]#

Orients the edges of a graph skeleton based on information from separating_sets to form a DAG pattern (DAG).

Parameters:

skeleton (UndirectedGraph) – An undirected graph skeleton as e.g. produced by the build_skeleton method.
separating_sets (dict) – A dict containing for each pair of not directly connected nodes a separating set (“witnessing set”) of variables that makes them conditionally independent. (needed for edge orientation)

Returns:

pdag – An estimate for the DAG pattern of the BN underlying the data. The graph might contain some nodes with both-way edges (X->Y and Y->X). Any completion obtained by removing one of the both-way edges for each such pair results in an I-equivalent Bayesian network DAG.

Return type:

DAG

References

Neapolitan, Learning Bayesian Networks, Section 10.1.2, Algorithm 10.2 (page 550) http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf
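
The first orientation step, which uses the separating sets to detect colliders, can be sketched as below. This covers only that step under stated assumptions: the function name and data layout are hypothetical, and the subsequent propagation rules that orient further edges are omitted.

```python
from itertools import combinations


def orient_colliders(nodes, skeleton, separating_sets):
    """Return directed edges (parent, child) for all detected colliders.

    For non-adjacent x and y with a common neighbour z, orient
    x -> z <- y whenever z is not in the separating set of x and y.
    """
    adj = {n: set() for n in nodes}
    for edge in skeleton:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    directed = set()
    for x, y in combinations(sorted(nodes), 2):
        if y in adj[x]:
            continue                   # only non-adjacent pairs qualify
        sep = separating_sets.get(frozenset((x, y)), set())
        for z in adj[x] & adj[y]:
            if z not in sep:
                directed.add((x, z))
                directed.add((y, z))
    return directed


skeleton_edges = {frozenset(("A", "B")), frozenset(("B", "C"))}
as_collider = orient_colliders("ABC", skeleton_edges, {frozenset(("A", "C")): set()})
as_chain = orient_colliders("ABC", skeleton_edges, {frozenset(("A", "C")): {"B"}})
```

The same skeleton yields a collider at B when A and C were separated by the empty set, and no orientation at all when B was needed to separate them.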

main(dataset_name, input_path=None, output_path=None, save=False)[source]#

class PDAG(directed_ebunch=[], undirected_ebunch=[], latents=[])[source]#

Bases: DiGraph

Partially directed acyclic graph (PDAG) representation.

__init__(directed_ebunch=[], undirected_ebunch=[], latents=[])[source]#: Initialize a PDAG from directed and undirected edge lists.

copy()[source]#

Returns a copy of the object instance.

Returns:: PDAG instance
Return type:: Returns a copy of self.

to_dag(required_edges=[])[source]#

Returns one possible DAG which is represented using the PDAG.

Parameters:: required_edges (list, array-like of 2-tuples) – The list of edges that should be included in the DAG.
Return type:: Returns an instance of DAG.
