causalexplain.independence package
Submodules
- class ConditionalIndependencies
Bases: object
A class to store conditional independencies in a graph.
Methods
add(var1, var2[, conditioning_set]) – Adds a new conditional independence to the cache.
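As a rough illustration of what such a cache might hold, the sketch below stores each statement "var1 is independent of var2 given conditioning_set" as a hashable key. The class name CICache and its contains helper are hypothetical, not the package's API; normalizing the pair and the conditioning set to frozensets is an assumption based on independence being symmetric.

```python
from typing import FrozenSet, Hashable, Iterable, Optional, Set, Tuple

# Hypothetical key type: (unordered pair of variables, conditioning set)
CIKey = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]


class CICache:
    """Hypothetical store of statements 'var1 _||_ var2 | conditioning_set'."""

    def __init__(self) -> None:
        self._store: Set[CIKey] = set()

    @staticmethod
    def _key(var1: Hashable, var2: Hashable,
             conditioning_set: Optional[Iterable[Hashable]]) -> CIKey:
        # Independence is symmetric, so (x, y) and (y, x) share one key;
        # the conditioning set is unordered, so it becomes a frozenset.
        return frozenset((var1, var2)), frozenset(conditioning_set or ())

    def add(self, var1: Hashable, var2: Hashable,
            conditioning_set: Optional[Iterable[Hashable]] = None) -> None:
        self._store.add(self._key(var1, var2, conditioning_set))

    def contains(self, var1: Hashable, var2: Hashable,
                 conditioning_set: Optional[Iterable[Hashable]] = None) -> bool:
        return self._key(var1, var2, conditioning_set) in self._store

    def __len__(self) -> int:
        return len(self._store)


cache = CICache()
cache.add("x", "y", ["z"])
cache.add("y", "x", ("z",))   # same statement, different argument order
print(len(cache))             # 1
```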
- class SufficientSets
Bases: object
A class to represent the sufficient sets of a conditional independence test.
- add(suff_set)
Adds a new sufficient set to the cache.
- Parameters:
suff_set (list) – A list of tuples representing the new sufficient set to be added.
- get_backdoor_paths(dag, x, y)
Returns all backdoor paths between two nodes in a graph. A backdoor path is a path between ‘x’ and ‘y’ that starts with an edge pointing into ‘x’ and ends with an edge pointing into ‘y’.
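A minimal sketch of backdoor-path enumeration, assuming the DAG is given as a plain dict mapping each node to its list of children (a stand-in for an nx.DiGraph). It checks only the first edge, following the usual definition: simple paths between x and y, with edge direction ignored, whose first edge points into x. The function name backdoor_paths is hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]   # node -> list of children (assumed format)


def backdoor_paths(dag: Graph, x: str, y: str) -> List[List[str]]:
    """Simple x-y paths (edge direction ignored) whose first edge points into x."""
    edges = {(u, v) for u, children in dag.items() for v in children}
    nbrs: Dict[str, Set[str]] = {}
    for u, v in edges:                       # undirected view of the DAG
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        if path[-1] == y:
            paths.append(list(path))
            return
        for nxt in sorted(nbrs.get(path[-1], ())):
            if nxt not in path:              # keep the path simple
                extend(path + [nxt])

    # Start only from parents p of x, i.e. edges p -> x pointing into x
    for p in sorted(nbrs.get(x, ())):
        if (p, x) in edges:
            extend([x, p])
    return paths


dag = {"z": ["x", "y"], "x": ["y"]}
print(backdoor_paths(dag, "x", "y"))   # [['x', 'z', 'y']]
```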
- get_paths(graph, x, y)
Returns all simple paths between two nodes in a directed graph.
- Parameters:
graph (nx.DiGraph) – The directed graph.
x (str) – The source node.
y (str) – The target node.
- Returns:
A list of all simple paths between x and y.
- Return type:
list
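With NetworkX, nx.all_simple_paths(G, source, target) yields such paths directly. To show the idea without that dependency, the sketch below runs an iterative depth-first search over a plain dict of children (an assumed stand-in for nx.DiGraph); the function name simple_paths is hypothetical.

```python
from typing import Dict, List


def simple_paths(graph: Dict[str, List[str]], x: str, y: str) -> List[List[str]]:
    """All directed paths from x to y that never revisit a node."""
    paths: List[List[str]] = []
    stack = [[x]]                      # each stack entry is a partial path
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == y:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:        # 'simple' means no repeated nodes
                stack.append(path + [nxt])
    return paths


g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(sorted(simple_paths(g, "a", "d")))   # [['a', 'b', 'd'], ['a', 'c', 'd']]
```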
- get_sufficient_sets_for_pair(dag, x, y, verbose=False)
Compute the sufficient sets for a pair of nodes in a graph. A sufficient set is a set of nodes that blocks all backdoor paths between x and y.
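The sketch below enumerates, by brute force, every subset Z of candidate nodes that satisfies the backdoor criterion for (x, y): Z contains no descendant of x and blocks every backdoor path. Blocking follows the usual d-separation rules: a non-collider blocks the path when it is in Z, and a collider blocks it unless the collider or one of its descendants is in Z. The dict-of-children graph format and all function names are assumptions, and the exhaustive search is exponential, so it only suits small graphs.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]   # node -> list of children (assumed format)


def _edges(dag: Graph) -> Set[Tuple[str, str]]:
    return {(u, v) for u, children in dag.items() for v in children}


def _descendants(dag: Graph, node: str) -> Set[str]:
    seen: Set[str] = set()
    stack = [node]
    while stack:
        for child in dag.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def _backdoor_paths(dag: Graph, x: str, y: str) -> List[List[str]]:
    edges = _edges(dag)
    nbrs: Dict[str, Set[str]] = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    paths, stack = [], [[x, p] for p in nbrs.get(x, ()) if (p, x) in edges]
    while stack:
        path = stack.pop()
        if path[-1] == y:
            paths.append(path)
            continue
        stack.extend(path + [n] for n in nbrs.get(path[-1], ()) if n not in path)
    return paths


def _blocked(dag: Graph, path: List[str], z: Set[str]) -> bool:
    edges = _edges(dag)
    for a, b, c in zip(path, path[1:], path[2:]):
        if (a, b) in edges and (c, b) in edges:          # b is a collider
            if b not in z and not (_descendants(dag, b) & z):
                return True
        elif b in z:                                      # chain or fork through b
            return True
    return False


def sufficient_sets_for_pair(dag: Graph, x: str, y: str) -> List[Set[str]]:
    nodes = set(dag) | {v for children in dag.values() for v in children}
    candidates = sorted(nodes - {x, y} - _descendants(dag, x))
    paths = _backdoor_paths(dag, x, y)
    found = []
    for k in range(len(candidates) + 1):
        for z in combinations(candidates, k):
            if all(_blocked(dag, p, set(z)) for p in paths):
                found.append(set(z))
    return found


dag = {"z": ["x", "y"], "x": ["y"]}
print(sufficient_sets_for_pair(dag, "x", "y"))   # [{'z'}]
```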
- get_sufficient_sets(dag, verbose=False)
Get the sufficient sets (admissible sets) for all pairs of nodes in a graph.
- get_conditional_independencies(dag, verbose=False)
Computes the set of conditional independencies implied by the graph passed as dag.
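One common way to list independencies implied by a DAG is the local Markov property: each node is independent of its non-descendants given its parents. Whether the package uses this rule is not stated here, so treat it as an assumption; the sketch below applies it to a dict-of-children graph, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[str, List[str]]   # node -> list of children (assumed format)


def local_markov_independencies(dag: Graph) -> List[Tuple[str, FrozenSet[str], FrozenSet[str]]]:
    """Triples (v, others, parents) meaning: v _||_ others | parents."""
    nodes = sorted(set(dag) | {c for cs in dag.values() for c in cs})
    parents: Dict[str, Set[str]] = {n: set() for n in nodes}
    for u, children in dag.items():
        for c in children:
            parents[c].add(u)

    def descendants(node: str) -> Set[str]:
        seen: Set[str] = set()
        stack = [node]
        while stack:
            for child in dag.get(stack.pop(), []):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    result = []
    for v in nodes:
        # Non-descendants of v, excluding v itself and its parents
        others = set(nodes) - {v} - descendants(v) - parents[v]
        if others:
            result.append((v, frozenset(others), frozenset(parents[v])))
    return result


print(local_markov_independencies({"a": ["b"], "b": ["c"]}))
# [('c', frozenset({'a'}), frozenset({'b'}))]
```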
Module for the edge orientation algorithm.
- get_edge_orientation(data, x, y, iters=20, method='gpr', verbose=False)
This is an additive noise model (ANM) test of independence for pairs of features between which a strong correlation has been observed. If the test is repeated a sufficient number of times (100), the correct causal direction almost always emerges; in the cases where it does not, repeating the test an odd number of times (5) is enough to confirm that the result is correct.
- Parameters:
data (pandas.DataFrame) – The dataset with the samples for the features
x (str) – the name of the source feature
y (str) – the name of the target feature
iters (int, optional) – Number of repetitions of the test. Defaults to 20.
method (str, optional) – Can be ‘gpr’ or ‘gam’. Defaults to ‘gpr’.
verbose (bool, optional) – Verbosity. Defaults to False.
- Returns:
+1 if the direction is x->y, -1 if the direction is x<-y, or 0 if no direction can be set.
- Return type:
int
- estimate(digraph, data, in_place=True, verbose=False)
Takes the original digraph passed as argument and computes the effect of each treatment on its outcome, given the graph. Treatment and outcome are the pairs formed by traversing the edges of the oriented DAG. The resulting ATE and RefuteResult are computed by calling the method estimate_edge and incorporated into each edge as attributes.
- Parameters:
- Returns:
The estimated causal graph.
- Return type:
nx.DiGraph
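The estimator behind estimate_edge is not described in this section. As a much simpler stand-in, the sketch below assumes a linear model without hidden confounders, adjusts for the parents of the treatment (a valid backdoor adjustment set under that assumption), and reads the effect off an ordinary least squares coefficient. Effects are returned in a dict keyed by edge instead of being stored as graph edge attributes; the function name is hypothetical.

```python
import numpy as np
from typing import Dict, List, Tuple


def estimate_linear_effects(dag: Dict[str, List[str]],
                            data: Dict[str, np.ndarray]) -> Dict[Tuple[str, str], float]:
    """OLS estimate of the effect of each treatment on its outcome, per edge."""
    parents: Dict[str, List[str]] = {}
    for u, children in dag.items():
        for v in children:
            parents.setdefault(v, []).append(u)

    effects = {}
    for treatment, outcomes in dag.items():
        for outcome in outcomes:
            # Adjusting for the treatment's parents blocks every backdoor path
            adjust = sorted(parents.get(treatment, []))
            X = np.column_stack([np.ones_like(data[treatment]), data[treatment]]
                                + [data[p] for p in adjust])
            beta, *_ = np.linalg.lstsq(X, data[outcome], rcond=None)
            effects[(treatment, outcome)] = float(beta[1])   # coefficient on treatment
    return effects


rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 2.0 * x + 1.5 * z + rng.normal(size=n)
dag = {"z": ["x", "y"], "x": ["y"]}
effects = estimate_linear_effects(dag, {"z": z, "x": x, "y": y})
print(effects)
```

On this simulated data the (x, y) estimate should lie close to the true 2.0, and the (z, y) estimate close to the total effect 1.5 + 0.8 × 2.0 = 3.1, since z has no parents to adjust for.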
- estimate_edge(digraph, treatment, outcome, data, verbose=False)
Estimate the effect of a treatment on an outcome, given a graph.
- Parameters:
- Returns:
The estimated effect.
- Return type:
CausalEstimate
- main(exp_name, data_path='/Users/renero/phd/data/RC3/', output_path='/Users/renero/phd/output/RC4/', scale=False)
Runs a custom main function for the given experiment name.
- select_features(values, feature_names, return_shaps=False, min_impact=1e-06, exhaustive=False, threshold=None, verbose=False)
Sorts the values and selects those strictly before the point of maximum curvature, according to the selected algorithm. If strict is False, the point of maximum curvature is also selected. When the method is ‘abrupt’, the selection keeps only those features up to (or down to) a certain percentage of change in their values.
- Parameters:
values – The values for each of the features. This can be anything that should be used to determine which features are more important than others.
feature_names – Names of the variables corresponding to the SHAP values.
return_shaps – Whether to return the mean SHAP values together with the order of the features.
min_impact – Default 1e-06. The minimum impact of a feature to be selected. If all features are below this value, none are selected.
exhaustive – Default False. Whether to use the exhaustive method. If True, the threshold is used to find all possible clusters above the given threshold, not only the first one.
threshold – Default None. The threshold to use when exhaustive is True. If None, an exception is raised.
verbose – Default False. Whether to print verbose output.
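One way to locate the point of maximum curvature is sketched below, assuming a Kneedle-style rule: normalize the values sorted in decreasing order and take the point lying farthest below the straight line joining the first and last values. The package's actual knee detector is not specified here, and select_before_knee with its strict flag is hypothetical.

```python
import numpy as np
from typing import List, Sequence


def select_before_knee(values: Sequence[float], names: Sequence[str],
                       strict: bool = True) -> List[str]:
    """Names of the features ranked before the knee of the sorted values."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)[::-1]              # largest value first
    v = values[order]
    if len(v) < 3:
        return [names[i] for i in order]
    xs = np.linspace(0.0, 1.0, len(v))
    span = v[0] - v[-1]
    ys = (v - v[-1]) / span if span > 0 else np.zeros(len(v))
    # Vertical gap below the chord from (0, 1) to (1, 0); its maximum is the knee
    knee = int(np.argmax((1.0 - xs) - ys))
    stop = knee if strict else knee + 1           # strict excludes the knee itself
    return [names[i] for i in order[:stop]]


names = ["a", "b", "c", "d", "e", "f"]
values = [5.0, 0.2, 4.8, 0.05, 0.3, 0.1]
print(select_before_knee(values, names))                # ['a', 'c']
print(select_before_knee(values, names, strict=False))  # ['a', 'c', 'e']
```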
- find_cluster_change_point(X, verbose=False)
Given an array of values in increasing or decreasing order, detects which elements belong to the same cluster. The clustering is done using DBSCAN, with a distance computed as the maximum difference between consecutive elements.
- Parameters:
X – The series of values in which to detect the abrupt change.
verbose – Verbose output.
- Returns:
The position in the array where an abrupt change occurs. If no change between consecutive values exceeds the tolerance, the last element of the array is returned.
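The function above clusters with DBSCAN. A simpler swapped-in technique that captures the same "abrupt change" idea is to split at the largest gap between consecutive values. The function name and its tol parameter are hypothetical, and returning the last index when no gap exceeds tol is one reading of the "last element" rule described above.

```python
import numpy as np


def largest_gap_change_point(X, tol: float = 0.0) -> int:
    """Index where the largest jump between consecutive sorted values starts a new group."""
    X = np.asarray(X, dtype=float)
    if X.size < 2:
        return max(X.size - 1, 0)
    gaps = np.abs(np.diff(X))                 # works for increasing or decreasing input
    i = int(np.argmax(gaps))
    if gaps[i] <= tol:
        return X.size - 1                     # no abrupt change: last position
    return i + 1                              # first element after the jump


print(largest_gap_change_point([0.9, 0.85, 0.8, 0.2, 0.15]))   # 3
print(largest_gap_change_point([1.0, 1.0, 1.0]))               # 2
```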
Applies early-stage FCI rules to set up the skeleton of a causal graph on an existing graph, in order to prune potentially spurious edges.
Renero, 2022
- class GraphIndependence(base_graph, condlen=1, condsize=0, max_rows=500, prog_bar=True, verbose=False, silent=False)
Bases: object
Class for removing independent edges from a causal graph.
Methods
Performs _test_cond_independence on all pairs of nodes in the graph and stores the p-value (the third element of the returned tuple) in the class attribute cond_indep_pvals.
fit(X[, y]) – Removes edges from the graph that are conditionally independent.
fit_predict(X[, y]) – Fits the model to the data and returns predictions.
predict() – Predicts the causal graph using the current independence tests and returns the resulting graph.
- __init__(base_graph, condlen=1, condsize=0, max_rows=500, prog_bar=True, verbose=False, silent=False)
- predict()
Predicts the causal graph using the current independence tests and returns the resulting graph.
- Returns:
The predicted causal graph.
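GraphIndependence removes edges whose endpoints appear conditionally independent. The sketch below assumes roughly Gaussian data and uses a Fisher-z test on partial correlations (the classic test in PC/FCI skeleton searches), conditioning on subsets of all other variables up to size max_cond. The test actually used by _test_cond_independence, and how condlen and condsize map onto this, are not specified here; all names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Set, Tuple

import numpy as np


def partial_corr(data: np.ndarray, i: int, j: int, cond: List[int]) -> float:
    """Partial correlation of columns i and j given the columns in cond."""
    C = np.corrcoef(data[:, [i, j] + cond], rowvar=False)
    P = np.linalg.pinv(C)                       # precision matrix
    return float(-P[0, 1] / math.sqrt(P[0, 0] * P[1, 1]))


def fisher_z_pvalue(r: float, n: int, k: int) -> float:
    r = max(min(r, 0.999999), -0.999999)        # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value


def prune_edges(edges: Set[Tuple[str, str]], data: np.ndarray, names: List[str],
                max_cond: int = 1, alpha: float = 0.01) -> Set[Tuple[str, str]]:
    col = {name: k for k, name in enumerate(names)}
    n = data.shape[0]
    kept = set(edges)
    for u, v in sorted(edges):
        others = [col[w] for w in names if w not in (u, v)]
        # Try conditioning sets of growing size; drop the edge at the first
        # set that makes u and v look independent.
        for size in range(max_cond + 1):
            if any(fisher_z_pvalue(partial_corr(data, col[u], col[v], list(s)), n, size) > alpha
                   for s in combinations(others, size)):
                kept.discard((u, v))
                break
    return kept


rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
z = x + rng.normal(size=n)
y = z + rng.normal(size=n)                      # chain x -> z -> y
data = np.column_stack([x, z, y])
kept = prune_edges({("x", "z"), ("z", "y"), ("x", "y")}, data, ["x", "z", "y"])
print(sorted(kept))
```

In this chain, x and y are independent given z, so the spurious (x, y) edge is the one the test is designed to remove, while the two true edges stay.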
Methods in this module perform the HSIC independence test and compute the HSIC value and statistic.
After checking that most implementations fail to provide consistent results, I decided to take the implementation from the DoWhy package.
- class HSIC
Bases: object
Provides a scikit-learn-style class interface to the HSIC methods taken from DoWhy.
- Attributes:
- hsic
- independence
- p_value
- stat
Methods
fit
- property p_value
- property hsic
- property stat
- property independence
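For reference, the quantities HSIC exposes (a statistic and a p-value) can be sketched in NumPy as follows. This is not DoWhy's code: it assumes Gaussian kernels with a median-distance bandwidth, the biased estimator trace(KHLH)/n², and a permutation test for the p-value; hsic_test and its parameters are hypothetical.

```python
import numpy as np
from typing import Tuple


def _gram(v) -> np.ndarray:
    v = np.asarray(v, dtype=float).reshape(-1)
    d2 = (v[:, None] - v[None, :]) ** 2
    med = np.median(d2[d2 > 0]) if np.any(d2 > 0) else 1.0
    return np.exp(-d2 / med)                    # Gaussian kernel, median bandwidth


def hsic_test(x, y, n_perm: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Return (HSIC statistic, permutation p-value) for H0: x independent of y."""
    n = len(x)
    H = np.eye(n) - 1.0 / n                     # centering matrix
    Kc = H @ _gram(x) @ H
    L = _gram(y)
    stat = float(np.sum(Kc * L) / n ** 2)       # equals trace(K H L H) / n^2
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)                  # shuffling y breaks any dependence
        exceed += int(np.sum(Kc * L[np.ix_(p, p)]) / n ** 2 >= stat)
    p_value = (exceed + 1) / (n_perm + 1)
    return stat, float(p_value)


rng = np.random.default_rng(3)
a = rng.normal(size=150)
stat_dep, p_dep = hsic_test(a, a ** 2 + 0.1 * rng.normal(size=150))
stat_ind, p_ind = hsic_test(a, rng.normal(size=150))
print(p_dep, p_ind)
```

The dependence a → a² is non-linear and uncorrelated, which is exactly the case where a kernel test like HSIC is useful.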
Created on 2013/01/26
@author: myamada
This module contains functions to compute the Maximal Information Coefficient (MIC) between pairs of features in a dataframe. The MIC measures the strength of the linear or non-linear association between two variables. It is computed using the MINE (Maximal Information-based Nonparametric Exploration) statistics, a non-parametric approach that computes the MIC between two variables by estimating the mutual information between them.
- pairwise_mic(data, alpha=0.6, c=15, to_return='mic', est='mic_approx', prog_bar=True, silent=False)
From a dataframe, compute the MIC for each pair of features. See minepy/minepy and minepy/mictools for more details.
- Parameters:
data (DataFrame) – A DataFrame with continuous numerical values.
alpha (float) – MINE parameter alpha.
c (int) – MINE parameter c.
to_return (str) – Either ‘mic’ or ‘tic’.
est (str) – MINE estimator. The default, est="mic_approx", computes the original MINE statistics; with est="mic_e", the equicharacteristic matrix is evaluated and MIC_e and TIC_e are returned.
prog_bar (bool) – Whether to display a progress bar.
- Returns:
A dataframe with the MIC values between pairs.
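MIC itself needs the MINE estimators from minepy. To show only the pairwise loop that pairwise_mic performs, the sketch below swaps in a plain histogram estimate of mutual information (not MIC, and not normalized to [0, 1]) and fills a symmetric matrix over the columns of a NumPy array. The function names and the bins value are assumptions.

```python
import numpy as np


def mutual_info_hist(a: np.ndarray, b: np.ndarray, bins: int = 10) -> float:
    """Plug-in mutual information (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)         # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)         # marginal of b, shape (1, bins)
    nz = pxy > 0                                # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def pairwise_dependence(data: np.ndarray, measure=mutual_info_hist) -> np.ndarray:
    """Symmetric matrix of measure(column i, column j); the diagonal is left at 0."""
    p = data.shape[1]
    M = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            M[i, j] = M[j, i] = measure(data[:, i], data[:, j])
    return M


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
data = np.column_stack([x, x ** 2, rng.uniform(-1, 1, 1000)])
print(np.round(pairwise_dependence(data), 2))
```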
- fit_and_get_residuals(X, Y, X_test=None, Y_test=None, method='gpr')
Fits a model Y ~ f(X), where X is the independent variable and Y the dependent one. The model is chosen through the method argument, and the training and test sets are passed as arguments.
- Parameters:
X (ndarray) – The feature used as input to predict Y.
Y (ndarray) – The feature to be predicted.
X_test (ndarray | None) – The feature used as input to predict Y_test.
Y_test (ndarray | None) – The test values of the feature to be predicted.
method (str) – Either "gpr" or "gam".
- Returns:
The residuals and the RMS error.
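A minimal sketch of the same fit-then-residuals pattern, swapping the GPR/GAM models for a NumPy polynomial fit to keep the example dependency-free. The function name and the degree parameter are hypothetical, and computing residuals on the training data when no test set is given is also an assumption.

```python
import numpy as np
from typing import Optional, Tuple


def fit_and_get_residuals_poly(X: np.ndarray, Y: np.ndarray,
                               X_test: Optional[np.ndarray] = None,
                               Y_test: Optional[np.ndarray] = None,
                               degree: int = 3) -> Tuple[np.ndarray, float]:
    """Fit Y ~ f(X) with a polynomial and return (residuals, RMS error)."""
    coefs = np.polyfit(X, Y, degree)
    if X_test is None or Y_test is None:       # assumption: fall back to training data
        X_test, Y_test = X, Y
    residuals = Y_test - np.polyval(coefs, X_test)
    rms = float(np.sqrt(np.mean(residuals ** 2)))
    return residuals, rms


X = np.linspace(-1.0, 1.0, 50)
res, rms = fit_and_get_residuals_poly(X, 2 * X ** 2 - X + 0.5)
print(res.shape, rms < 1e-8)   # (50,) True
```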