causalexplain.estimators.pc package#
Submodules#
- power_divergence(X, Y, Z, data, boolean=True, lambda_='cressie-read', **kwargs)[source]#
Compute the power divergence test for conditional independence.
- chi_square(X, Y, Z, data, boolean=True, **kwargs)[source]#
Compute the chi-square test for conditional independence.
- pearsonr(X, Y, Z, data, boolean=True, **kwargs)[source]#
Computes the Pearson correlation coefficient and p-value for testing non-correlation. Should be used only on continuous data. When \(Z \neq \emptyset\), fits a linear regression of X and of Y on Z and computes the Pearson coefficient on the residuals.
- Parameters:
X (str) – The first variable for testing the independence condition X ⟂ Y | Z
Y (str) – The second variable for testing the independence condition X ⟂ Y | Z
Z (list/array-like) – A list of conditional variables for testing the condition X ⟂ Y | Z
data (pandas.DataFrame) – The dataset in which to test the independence condition.
boolean (bool) –
- If boolean=True, an additional argument significance_level must be specified. If the p-value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.
- If boolean=False, returns the Pearson correlation coefficient and p-value of the test.
- Returns:
Pearson’s correlation coefficient (float)
p-value (float)
References
[1] https://en.wikipedia.org/wiki/Pearson_correlation_coefficient [2] https://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression
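The residual-based computation described for pearsonr can be illustrated with a minimal pure-Python sketch. It assumes a single conditioning variable and omits the p-value; the helper names `pearson` and `residuals` and the toy data are hypothetical and are not part of this package.

```python
import math

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def residuals(target, z):
    """Residuals of a simple least-squares fit target ~ intercept + slope * z."""
    n = len(z)
    mean_t, mean_z = sum(target) / n, sum(z) / n
    slope = (sum((zi - mean_z) * (ti - mean_t) for zi, ti in zip(z, target))
             / sum((zi - mean_z) ** 2 for zi in z))
    intercept = mean_t - slope * mean_z
    return [ti - (intercept + slope * zi) for ti, zi in zip(target, z)]

# Toy data (made up for illustration): x and y are both driven by z plus small noise.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [2 * zi + e for zi, e in zip(z, [0.1, -0.2, 0.05, 0.15, -0.1, 0.0])]
y = [-3 * zi + e for zi, e in zip(z, [-0.05, 0.1, 0.2, -0.15, 0.0, -0.1])]

marginal = pearson(x, y)                                 # correlation ignoring z
partial = pearson(residuals(x, z), residuals(y, z))      # correlation given z
print(marginal, partial)
```

In this toy data the marginal correlation is close to -1 because both variables follow z, while the correlation of the residuals is noticeably weaker once the linear effect of z is removed.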
- class DAG(ebunch=None, latents={})[source]#
Bases:
DiGraph
Base class for all Directed Graphical Models.
Each node in the graph can represent either a random variable, Factor, or a cluster of random variables. Edges in the graph represent the dependencies between these.
- Parameters:
data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list or any Networkx graph object.
- __init__(ebunch=None, latents={})[source]#
Initialize a graph with edges, name, or graph attributes.
- Parameters:
incoming_graph_data (input graph (optional, default: None)) – Data to initialize graph. If None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object. If the corresponding optional Python packages are installed the data can also be a 2D NumPy array, a SciPy sparse array, or a PyGraphviz graph.
attr (keyword arguments, optional (default= no attributes)) – Attributes to add to graph as key=value pairs.
See also
convert
Examples
>>> import networkx as nx
>>> G = nx.Graph()  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G = nx.Graph(name="my graph")
>>> e = [(1, 2), (2, 3), (3, 4)]  # list of edges
>>> G = nx.Graph(e)
Arbitrary graph attribute pairs (key=value) may be assigned.
>>> G = nx.Graph(e, day="Friday")
>>> G.graph
{'day': 'Friday'}
- add_edge(u, v, weight=None)[source]#
Add an edge between u and v.
The nodes u and v will be automatically added if they are not already in the graph.
- get_parents(node)[source]#
Returns a list of parents of node.
Throws an error if the node is not present in the graph.
- Parameters:
node (string, int or any hashable python object.) – The node whose parents would be returned.
- moralize()[source]#
Removes all the immoralities in the DAG and creates a moral graph (UndirectedGraph).
A v-structure X->Z<-Y is an immorality if there is no directed edge between X and Y.
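As a rough illustration of moralization, the sketch below works on a plain list of (parent, child) pairs rather than on a DAG object. The function name `moral_edges_of` is hypothetical, and the code is not the library's implementation.

```python
from itertools import combinations

def moral_edges_of(edges):
    """Return the undirected edges of the moral graph of a DAG.

    `edges` is an iterable of (parent, child) pairs.
    """
    edges = list(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # Start from the skeleton: every directed edge becomes undirected.
    moral = {frozenset((u, v)) for u, v in edges}
    # "Marry" every pair of parents that share a child.
    for pars in parents.values():
        for a, b in combinations(sorted(pars), 2):
            moral.add(frozenset((a, b)))
    return moral

# X -> Z <- Y is an immorality, so the moral graph gains the edge X - Y.
moral_edges = moral_edges_of([("X", "Z"), ("Y", "Z"), ("Z", "W")])
print(sorted(tuple(sorted(e)) for e in moral_edges))
```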
- get_children(node)[source]#
Returns a list of children of node. Throws an error if the node is not present in the graph.
- Parameters:
node (string, int or any hashable python object.) – The node whose children would be returned.
- get_independencies(latex=False, include_latents=False)[source]#
Computes independencies in the DAG by checking d-separation.
- Parameters:
latex (boolean) – If latex=True then latex string of the independence assertion would be created.
include_latents (boolean) – If True, includes latent variables in the independencies. Otherwise, only generates independencies on observed variables.
- local_independencies(variables)[source]#
Returns an instance of Independencies containing the local independencies of each of the variables.
- Parameters:
variables (str or array like) – variables whose local independencies are to be found.
- is_iequivalent(model)[source]#
Checks whether the given model is I-equivalent to this DAG.
Two graphs G1 and G2 are said to be I-equivalent if they have the same skeleton and the same set of immoralities.
- Parameters:
model (A DAG object, for which you want to check I-equivalence)
- Returns:
True if both are I-equivalent, False otherwise
- Return type:
boolean
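The I-equivalence criterion (same skeleton, same immoralities) can be sketched on plain edge lists as follows. The helper names are hypothetical, the graphs are assumed to share the same node set, and this is an illustration rather than the method's actual code.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a list of (parent, child) edges."""
    return {frozenset(e) for e in edges}

def immoralities(edges):
    """All triples (x, z, y) with x -> z <- y and no edge between x and y."""
    edges = list(edges)
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def same_equivalence_class(edges1, edges2):
    """True if both edge lists share a skeleton and a set of immoralities."""
    edges1, edges2 = list(edges1), list(edges2)
    return (skeleton(edges1) == skeleton(edges2)
            and immoralities(edges1) == immoralities(edges2))

chain = [("X", "Z"), ("Z", "Y")]       # X -> Z -> Y
fork = [("Z", "X"), ("Z", "Y")]        # X <- Z -> Y
collider = [("X", "Z"), ("Y", "Z")]    # X -> Z <- Y

print(same_equivalence_class(chain, fork))      # same skeleton, no immoralities
print(same_equivalence_class(chain, collider))  # the collider adds an immorality
```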
- get_immoralities()[source]#
Finds all the immoralities in the model. A v-structure X -> Z <- Y is an immorality if there is no direct edge between X and Y.
- Returns:
A set of all the immoralities in the model
- Return type:
set
- is_dconnected(start, end, observed=None)[source]#
Returns True if there is an active trail (i.e. d-connection) between start and end node given that observed is observed.
- Parameters:
start (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.
end (int, str, any hashable python object.) – The nodes in the DAG between which to check the d-connection/active trail.
observed (list, array-like (optional)) – If given the active trail would be computed assuming these nodes to be observed.
- minimal_dseparator(start, end)[source]#
Finds the minimal d-separating set for start and end.
- Parameters:
start (node) – The first node.
end (node) – The second node.
References
[1] Algorithm 4, Page 10: Tian, Jin, Azaria Paz, and Judea Pearl. Finding minimal d-separators. Computer Science Department, University of California, 1998.
- get_markov_blanket(node)[source]#
Returns a markov blanket for a random variable. In the case of Bayesian Networks, the markov blanket is the set of node’s parents, its children and its children’s other parents.
- Returns:
List of nodes contained in the Markov blanket
- Return type:
list(blanket_nodes)
- Parameters:
node (string, int or any hashable python object.) – The node whose markov blanket would be returned.
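The definition above (parents, children, and the children's other parents) can be written out on a plain edge list. This is a minimal sketch with a hypothetical function name and a made-up graph, not the library's implementation.

```python
def markov_blanket(edges, node):
    """Parents, children and co-parents of `node` in a list of (parent, child) edges."""
    edges = list(edges)
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    # Other parents of the node's children ("spouses").
    spouses = {u for u, v in edges if v in children and u != node}
    return parents | children | spouses

# B -> A -> T -> C <- S, C -> D
edges = [("B", "A"), ("A", "T"), ("T", "C"), ("S", "C"), ("C", "D")]
blanket = markov_blanket(edges, "T")
print(sorted(blanket))
```

Here the blanket of T consists of its parent A, its child C, and C's other parent S; B and D are shielded from T by those three nodes.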
- active_trail_nodes(variables, observed=None, include_latents=False)[source]#
Returns a dictionary with the given variables as keys and all the nodes reachable from that respective variable as values.
- Parameters:
variables (str or array like) – variables whose active trails are to be found.
observed (List of nodes (optional)) – If given the active trails would be computed assuming these nodes to be observed.
include_latents (boolean (default: False)) – Whether to include the latent variables in the returned active trail nodes.
References
Details of the algorithm can be found in ‘Probabilistic Graphical Model Principles and Techniques’ - Koller and Friedman Page 75 Algorithm 3.1
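The following is a compact sketch in the spirit of the reachability procedure cited above, written for a plain edge list and a single start node. The function name `reachable` and the "up"/"down" labels are choices made for this illustration; the method's handling of multiple variables and latent nodes is not reproduced.

```python
def reachable(edges, source, observed=()):
    """Nodes reachable from `source` via an active trail given `observed`."""
    edges = list(edges)
    observed = set(observed)
    parents, children = {}, {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
        parents.setdefault(v, set()).add(u)

    # Phase 1: collect the observed nodes together with all their ancestors.
    ancestors, stack = set(), list(observed)
    while stack:
        node = stack.pop()
        if node not in ancestors:
            ancestors.add(node)
            stack.extend(parents.get(node, ()))

    # Phase 2: traverse (node, direction) pairs.
    # "up" means the trail entered the node from a child,
    # "down" means it entered from a parent.
    to_visit = [(source, "up")]
    visited, result = set(), set()
    while to_visit:
        node, direction = to_visit.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed:
            result.add(node)
        if direction == "up" and node not in observed:
            to_visit.extend((p, "up") for p in parents.get(node, ()))
            to_visit.extend((c, "down") for c in children.get(node, ()))
        elif direction == "down":
            if node not in observed:
                # Causal chain continues through an unobserved node.
                to_visit.extend((c, "down") for c in children.get(node, ()))
            if node in ancestors:
                # Collider that is observed or has an observed descendant.
                to_visit.extend((p, "up") for p in parents.get(node, ()))
    return result

# X -> Z <- Y, Z -> W
edges = [("X", "Z"), ("Y", "Z"), ("Z", "W")]
print(sorted(reachable(edges, "X")))
print(sorted(reachable(edges, "X", observed={"Z"})))
```

With nothing observed the collider Z blocks the trail to Y; observing Z (or its descendant W) opens it.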
- to_pdag()[source]#
Returns the PDAG (the equivalence class of DAG; also known as CPDAG) of the DAG.
- Returns:
An instance of pgmpy.base.PDAG.
- Return type:
PDAG
- do(nodes, inplace=False)[source]#
Applies the do operator to the graph and returns a new DAG with the transformed graph.
The do-operator, do(X = x) has the effect of removing all edges from the parents of X and setting X to the given value x.
- Parameters:
nodes (list, array-like) – The names of the nodes to apply the do-operator for.
inplace (boolean (default: False)) – If inplace=True, makes the changes to the current object, otherwise returns a new instance.
- Returns:
A new instance of DAG modified by the do-operator
- Return type:
pgmpy.base.DAG
References
Causality: Models, Reasoning, and Inference, Judea Pearl (2000). p.70.
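The graph surgery performed by the do-operator amounts to dropping every edge that points into an intervened node. A minimal sketch on a plain edge list, with a hypothetical function name, might look like this; it covers only the structural change, not the assignment of a value to X.

```python
def apply_do(edges, nodes):
    """Remove all edges pointing into any node in `nodes`; other edges are kept."""
    targets = set(nodes)
    return [(u, v) for u, v in edges if v not in targets]

# Z confounds X and Y: Z -> X, X -> Y, Z -> Y
edges = [("Z", "X"), ("X", "Y"), ("Z", "Y")]
mutilated = apply_do(edges, ["X"])
print(mutilated)
```

After intervening on X, the edge Z -> X is gone, so Z no longer influences X, while the edges into Y are untouched.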
- get_ancestral_graph(nodes)[source]#
Returns the ancestral graph of the given nodes. The ancestral graph only contains the nodes which are ancestors of at least one of the variables in nodes.
- Parameters:
nodes (iterable) – List of nodes whose ancestral graph needs to be computed.
- Returns:
The ancestral graph.
- Return type:
pgmpy.base.DAG instance
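One way to picture the ancestral graph is as the subgraph induced by the given nodes and everything upstream of them. The sketch below assumes that each node counts as its own ancestor and uses a hypothetical function name and a made-up graph.

```python
def ancestral_edges(edges, nodes):
    """Return (kept_nodes, kept_edges) for the ancestral graph of `nodes`."""
    edges = list(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # Walk upwards from the requested nodes, collecting every ancestor.
    keep, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))
    # Keep only the edges whose endpoints both survive.
    return keep, [(u, v) for u, v in edges if u in keep and v in keep]

# A -> B -> C -> D, E -> C
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "C")]
kept_nodes, kept_edges = ancestral_edges(edges, ["C"])
print(sorted(kept_nodes), kept_edges)
```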
- to_daft(node_pos='circular', latex=True, pgm_params={}, edge_params={}, node_params={})[source]#
Returns a daft (https://docs.daft-pgm.org/en/latest/) object which can be rendered for publication quality plots. The returned object’s render method can be called to see the plots.
- Parameters:
node_pos (str or dict (default: circular)) –
- If str: must be one of the following: circular, kamada_kawai, planar, random, shell, spring, spectral, spiral. Please refer to https://networkx.org/documentation/stable//reference/drawing.html#module-networkx.drawing.layout for details on these layouts.
- If dict: should be of the form {node: (x coordinate, y coordinate)}, describing the x and y coordinates of each node.
- If no argument is provided, uses the circular layout.
latex (boolean) – Whether to use latex for rendering the node names.
pgm_params (dict (optional)) – Any additional parameters that need to be passed to daft.PGM initializer. Should be of the form: {param_name: param_value}
edge_params (dict (optional)) – Any additional edge parameters that need to be passed to daft.add_edge method. Should be of the form: {(u1, v1): {param_name: param_value}, (u2, v2): {…} }
node_params (dict (optional)) – Any additional node parameters that need to be passed to daft.add_node method. Should be of the form: {node1: {param_name: param_value}, node2: {…} }
- Returns:
A plot of the DAG.
- Return type:
daft.PGM object
StructureEstimator base class for PC and PC-stable algorithms.
- convert_args_tuple(func)[source]#
Convert the arguments of a function to tuples.
- Parameters:
obj – The object on which the function is called.
variable – The variable of interest.
parents – The parents of the variable.
complete_samples_only – Flag indicating whether to use only complete samples.
weighted – Flag indicating whether to use weighted samples.
- Returns:
The result of the function with converted arguments.
- class StructureEstimator(independencies=None)[source]#
Bases:
object
Base class for estimators in pgmpy; ParameterEstimator, StructureEstimator and StructureScore derive from this class.
- data = None#
- complete_samples_only = True#
- state_names = None#
- variables = None#
- class Independencies(*assertions)[source]#
Bases:
object
Base class for independencies. The Independencies class represents a set of conditional independence assertions (e.g. “X is independent of Y given Z”, where X, Y and Z are random variables) or independence assertions (e.g. “X is independent of Y”, where X and Y are random variables). Initialize the Independencies class with conditional independence assertions or independence assertions.
- Parameters:
assertions (Lists or Tuples) – Each assertion is a list or tuple of the form [event1, event2, event3]; e.g. the assertion [‘X’, ‘Y’, ‘Z’] states that X is independent of Y given Z.
Examples
Creating an Independencies object with one independence assertion: random variable X is independent of Y.
>>> from pgmpy.independencies import Independencies
>>> independencies = Independencies(['X', 'Y'])
Creating an Independencies object with three conditional independence assertions. The first assertion is: random variable X is independent of Y given Z.
>>> independencies = Independencies(['X', 'Y', 'Z'],
...                                 ['a', ['b', 'c'], 'd'],
...                                 ['l', ['m', 'n'], 'o'])
- contains(assertion)[source]#
Returns True if assertion is contained in this Independencies-object, otherwise False.
- Parameters:
assertion (IndependenceAssertion()-object)
Examples
>>> from pgmpy.independencies import Independencies, IndependenceAssertion
>>> ind = Independencies(['A', 'B', ['C', 'D']])
>>> IndependenceAssertion('A', 'B', ['C', 'D']) in ind
True
>>> # does not depend on variable order:
>>> IndependenceAssertion('B', 'A', ['D', 'C']) in ind
True
>>> # but does not check entailment:
>>> IndependenceAssertion('X', 'Y', 'Z') in Independencies(['X', 'Y'])
False
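The order-insensitive membership test shown above can be mimicked by normalizing each assertion into sets before comparing. This is only a sketch of the idea with hypothetical helper names; it does not reflect how the class stores assertions internally.

```python
def as_frozenset(event):
    """Wrap a single variable name, or a collection of names, in a frozenset."""
    if isinstance(event, str):
        return frozenset([event])
    return frozenset(event)

def normalize(event1, event2, event3=()):
    """Order-free key: the two sides are interchangeable, the given set is unordered."""
    sides = frozenset({as_frozenset(event1), as_frozenset(event2)})
    return sides, as_frozenset(event3)

stored = {normalize("A", "B", ["C", "D"])}
print(normalize("B", "A", ["D", "C"]) in stored)          # same assertion, reordered
print(normalize("X", "Y", "Z") in {normalize("X", "Y")})  # membership, not entailment
```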
- __contains__(assertion)#
Returns True if assertion is contained in this Independencies-object, otherwise False.
- Parameters:
assertion (IndependenceAssertion()-object)
Examples
>>> from pgmpy.independencies import Independencies, IndependenceAssertion
>>> ind = Independencies(['A', 'B', ['C', 'D']])
>>> IndependenceAssertion('A', 'B', ['C', 'D']) in ind
True
>>> # does not depend on variable order:
>>> IndependenceAssertion('B', 'A', ['D', 'C']) in ind
True
>>> # but does not check entailment:
>>> IndependenceAssertion('X', 'Y', 'Z') in Independencies(['X', 'Y'])
False
- get_assertions()[source]#
Returns the independencies object which is a set of IndependenceAssertion objects.
Examples
>>> from pgmpy.independencies import Independencies
>>> independencies = Independencies(['X', 'Y', 'Z'])
>>> independencies.get_assertions()
- add_assertions(*assertions)[source]#
Adds assertions to independencies.
- Parameters:
assertions (Lists or Tuples) – Each assertion is a list or tuple of variable, independent_of and given.
Examples
>>> from pgmpy.independencies import Independencies
>>> independencies = Independencies()
>>> independencies.add_assertions(['X', 'Y', 'Z'])
>>> independencies.add_assertions(['a', ['b', 'c'], 'd'])
- closure()[source]#
Returns a new Independencies()-object that additionally contains those IndependenceAssertions that are implied by the current independencies (using the semi-graphoid axioms; see (Pearl, 1989, Conditional Independence and its representations)).
Might be very slow if more than six variables are involved.
Examples
>>> from pgmpy.independencies import Independencies
>>> ind1 = Independencies(('A', ['B', 'C'], 'D'))
>>> ind1.closure()
(A ⟂ B | D, C)
(A ⟂ B, C | D)
(A ⟂ B | D)
(A ⟂ C | D, B)
(A ⟂ C | D)
>>> ind2 = Independencies(('W', ['X', 'Y', 'Z']))
>>> ind2.closure()
(W ⟂ Y)
(W ⟂ Y | X)
(W ⟂ Z | Y)
(W ⟂ Z, X, Y)
(W ⟂ Z)
(W ⟂ Z, X)
(W ⟂ X, Y)
(W ⟂ Z | X)
(W ⟂ Z, Y | X)
[..]
- entails(entailed_independencies)[source]#
Returns True if the entailed_independencies are implied by this Independencies-object, otherwise False. Entailment is checked using the semi-graphoid axioms.
Might be very slow if more than six variables are involved.
- Parameters:
entailed_independencies (Independencies()-object)
Examples
>>> from pgmpy.independencies import Independencies
>>> ind1 = Independencies([['A', 'B'], ['C', 'D'], 'E'])
>>> ind2 = Independencies(['A', 'C', 'E'])
>>> ind1.entails(ind2)
True
>>> ind2.entails(ind1)
False
- is_equivalent(other)[source]#
Returns True if the two Independencies-objects are equivalent, otherwise False (i.e. any Bayesian Network that satisfies one set of conditional independencies also satisfies the other).
Might be very slow if more than six variables are involved.
- Parameters:
other (Independencies()-object)
Examples
>>> from pgmpy.independencies import Independencies
>>> ind1 = Independencies(['X', ['Y', 'W'], 'Z'])
>>> ind2 = Independencies(['X', 'Y', 'Z'], ['X', 'W', 'Z'])
>>> ind3 = Independencies(['X', 'Y', 'Z'], ['X', 'W', 'Z'], ['X', 'Y', ['W','Z']])
>>> ind1.is_equivalent(ind2)
False
>>> ind1.is_equivalent(ind3)
True
- class IndependenceAssertion(event1=[], event2=[], event3=[])[source]#
Bases:
object
Represents a conditional independence or independence assertion.
Each assertion has 3 attributes: event1, event2, event3. For the assertion
\[U \perp X, Y \mid Z\]
which is read as “random variable U is independent of X and Y given Z”, the attributes would be:
event1 = {U}
event2 = {X, Y}
event3 = {Z}
- Parameters:
Examples
>>> from pgmpy.independencies import IndependenceAssertion
>>> assertion = IndependenceAssertion('U', 'X')
>>> assertion = IndependenceAssertion('U', ['X', 'Y'])
>>> assertion = IndependenceAssertion('U', ['X', 'Y'], 'Z')
>>> assertion = IndependenceAssertion(['U', 'V'], ['X', 'Y'], ['Z', 'A'])
- __init__(event1=[], event2=[], event3=[])[source]#
Initialize an independence assertion with up to three events.
- class PC(name, independencies=None, **kwargs)[source]#
Bases:
StructureEstimator
Class for constraint-based estimation of DAGs using the PC algorithm from a given data set. Identifies (conditional) dependencies in the data set using the chi_square dependency test and uses the PC algorithm to estimate a DAG pattern that satisfies the identified dependencies. The DAG pattern can then be completed to a faithful DAG, if possible.
References
- [1] Koller & Friedman, Probabilistic Graphical Models - Principles and
Techniques, 2009, Section 18.2
- [2] Neapolitan, Learning Bayesian Networks, Section 10.1.2 for the PC algorithm
(page 550), http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf
- dag = None#
- pdag = None#
- is_fitted_ = False#
- metrics = None#
- __init__(name, independencies=None, **kwargs)[source]#
Class initialization.
- Parameters:
data (pandas DataFrame object) – DataFrame object where each column represents one variable. (If some values in the data are missing, the data cells should be set to numpy.NaN. Note that pandas converts each column containing numpy.NaN values to dtype float.)
independencies (Independencies object) – Independencies object containing a set of conditional independence assertions that will be used to estimate the DAG skeleton. If independencies is None, all conditional independence assertions will be tested.
kwargs (key-value arguments) – Additional arguments passed to the StructureEstimator base class.
- variant: str (one of “orig”, “stable”, “parallel”)
- build_skeleton(**kwargs)[source]#
Estimates a graph skeleton (UndirectedGraph) from a set of independencies using (the first part of) the PC algorithm. The independencies can either be provided as an instance of the Independencies-class or by passing a decision function that decides any conditional independency assertion. Returns a tuple (skeleton, separating_sets).
If an Independencies-instance is passed, the contained IndependenceAssertions have to admit a faithful BN representation. This is the case if they are obtained as a set of d-separations of some Bayesian network or if the independence assertions are closed under the semi-graphoid axioms. Otherwise the procedure may fail to identify the correct structure.
- Returns:
skeleton (UndirectedGraph) – An estimate for the undirected graph skeleton of the BN underlying the data.
separating_sets (dict) – A dict containing, for each pair of not directly connected nodes, a separating set (“witnessing set”) of variables that makes them conditionally independent (needed for edge orientation procedures).
References
- [1] Neapolitan, Learning Bayesian Networks, Section 10.1.2, Algorithm 10.2 (page 550)
http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf
- [2] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009
Section 3.4.2.1 (page 85), Algorithm 3.3
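The skeleton phase can be sketched with an independence oracle in place of statistical tests. In the sketch below the function name `pc_skeleton_sketch`, the `max_cond_vars` parameter, and the hand-written oracle are all assumptions made for illustration; edges are removed as soon as a separating set is found, which roughly corresponds to the original (order-dependent) variant rather than PC-stable.

```python
from itertools import combinations

def pc_skeleton_sketch(nodes, is_independent, max_cond_vars=None):
    """Return (undirected_edges, separating_sets) using an independence oracle."""
    nodes = list(nodes)
    # Start from the complete undirected graph.
    adjacency = {n: set(nodes) - {n} for n in nodes}
    separating_sets = {}
    limit = len(nodes) - 2 if max_cond_vars is None else max_cond_vars
    for size in range(limit + 1):
        for x, y in combinations(nodes, 2):
            if y not in adjacency[x]:
                continue
            # Conditioning sets are drawn from the current neighbours of x or of y.
            candidates = set()
            for a, b in ((x, y), (y, x)):
                candidates.update(combinations(sorted(adjacency[a] - {b}), size))
            for subset in sorted(candidates):
                if is_independent(x, y, set(subset)):
                    adjacency[x].discard(y)
                    adjacency[y].discard(x)
                    separating_sets[frozenset((x, y))] = set(subset)
                    break
    edges = {frozenset((x, y)) for x in nodes for y in adjacency[x]}
    return edges, separating_sets

def oracle(x, y, given):
    """Hand-coded independencies of the DAG X -> Z <- Y, Z -> W."""
    pair = frozenset((x, y))
    if pair == frozenset(("X", "Y")):
        return "Z" not in given and "W" not in given
    if pair in (frozenset(("X", "W")), frozenset(("Y", "W"))):
        return "Z" in given
    return False

skeleton_edges, sepsets = pc_skeleton_sketch(["X", "Y", "Z", "W"], oracle)
print(sorted(tuple(sorted(e)) for e in skeleton_edges))
```

For this oracle the sketch keeps the edges X - Z, Y - Z and Z - W, and records the empty set as the separating set of X and Y.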
- static skeleton_to_pdag(skeleton, separating_sets)[source]#
Orients the edges of a graph skeleton based on information from separating_sets to form a DAG pattern (DAG).
- Parameters:
skeleton (UndirectedGraph) – An undirected graph skeleton as e.g. produced by the build_skeleton method.
separating_sets (dict) – A dict containing, for each pair of not directly connected nodes, a separating set (“witnessing set”) of variables that makes them conditionally independent (needed for edge orientation).
- Returns:
pdag – An estimate for the DAG pattern of the BN underlying the data. The graph might contain some nodes with both-way edges (X->Y and Y->X). Any completion obtained by removing one of the both-way edges for each such pair results in an I-equivalent Bayesian network DAG.
- Return type:
References
Neapolitan, Learning Bayesian Networks, Section 10.1.2, Algorithm 10.2 (page 550) http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf
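The first orientation step, turning unshielded triples into colliders based on the separating sets, can be sketched as below. The function name `orient_colliders` and the data layout (frozenset edges, a dict of separating sets keyed by frozenset pairs) are assumptions for this illustration; the further propagation rules that orient remaining edges are deliberately left out.

```python
from itertools import combinations

def orient_colliders(skeleton, separating_sets):
    """Return directed pairs; an edge present in both directions is still undirected."""
    neighbours = {}
    for edge in skeleton:
        a, b = tuple(edge)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # Begin with every skeleton edge present in both directions.
    directed = {(a, b) for a in neighbours for b in neighbours[a]}
    for z, adj in neighbours.items():
        for x, y in combinations(sorted(adj), 2):
            if y in neighbours[x]:
                continue  # x and y are adjacent: shielded triple, not a collider
            if z not in separating_sets.get(frozenset((x, y)), set()):
                # z was not needed to separate x and y, so orient x -> z <- y.
                directed.discard((z, x))
                directed.discard((z, y))
    return directed

skeleton = {frozenset(("X", "Z")), frozenset(("Y", "Z")), frozenset(("Z", "W"))}
separating_sets = {
    frozenset(("X", "Y")): set(),
    frozenset(("X", "W")): {"Z"},
    frozenset(("Y", "W")): {"Z"},
}
pdag_edges = orient_colliders(skeleton, separating_sets)
print(sorted(pdag_edges))
```

In this example X -> Z and Y -> Z become directed, while Z and W remain connected in both directions because this sketch stops after the collider step.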
- class PDAG(directed_ebunch=[], undirected_ebunch=[], latents=[])[source]#
Bases:
DiGraph
Partially directed acyclic graph (PDAG) representation.
- __init__(directed_ebunch=[], undirected_ebunch=[], latents=[])[source]#
Initialize a PDAG from directed and undirected edge lists.