causalexplain.independence package#

Submodules#

class ConditionalIndependencies[source]#

Bases: object

A class to store conditional independencies in a graph.

_cache#

A dictionary representing the conditional independencies.

Type:

dict

add(var1, var2, conditioning_set)[source]#

Adds a new conditional independence to the cache.

__str__()[source]#

Returns a string representation of the conditional independencies.

__repr__()[source]#

Returns a string representation of the conditional independencies.

Methods

add(var1, var2[, conditioning_set])

Adds a new conditional independence to the cache.

__init__()[source]#
__repr__()[source]#

Returns a string representation of the ConditionalIndependencies object.

add(var1, var2, conditioning_set=None)[source]#

Adds a new conditional independence to the cache.

Parameters:
  • var1 (str) – A node in the graph.

  • var2 (str) – A node in the graph.

  • conditioning_set (list of str or None) – A set of nodes in the graph.
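A minimal sketch of how such a cache can be used. The `CICache` class below is illustrative only, not the package's implementation; the storage layout (a dict keyed by an unordered node pair) is an assumption:

```python
# Illustrative sketch only: a minimal conditional-independence cache with the
# add()/__str__ interface described above. Storage layout is an assumption.
class CICache:
    def __init__(self):
        self._cache = {}

    def add(self, var1, var2, conditioning_set=None):
        # Store under an unordered pair so (x, y) and (y, x) share one entry.
        key = frozenset((var1, var2))
        self._cache.setdefault(key, []).append(
            tuple(conditioning_set) if conditioning_set else ())

    def __str__(self):
        lines = []
        for key, zsets in self._cache.items():
            a, b = sorted(key)
            for z in zsets:
                cond = f" | {{{', '.join(z)}}}" if z else ""
                lines.append(f"{a} _||_ {b}{cond}")
        return "\n".join(lines)

cache = CICache()
cache.add("x", "y", ["z"])
print(cache)  # prints: x _||_ y | {z}
```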

class SufficientSets[source]#

Bases: object

A class to represent the sufficient sets of a conditional independence test.

_cache#

A list of tuples representing the sufficient sets.

Type:

list

add(suff_set)[source]#

Adds a new sufficient set to the cache.

Parameters:

suff_set (list) – A list of tuples representing the new sufficient set to be added.

__str__()[source]#

Returns a string representation of the sufficient sets.

Methods

add(suff_set)

Adds a new sufficient set to the cache.

__init__()[source]#
add(suff_set)[source]#

Adds a new sufficient set to the cache.

Parameters:

suff_set (list) – A list of tuples representing the new sufficient set to be added.

__str__()[source]#

Returns a string representation of the sufficient sets.

Returns:

A string representation of the sufficient sets.

Return type:

str

get_backdoor_paths(dag, x, y)[source]#

Returns all backdoor paths between two nodes in a graph. A backdoor path is a path that starts with an edge towards ‘x’ and ends with an edge towards ‘y’.
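The notion can be sketched with networkx (illustrative only, not the package's implementation):

```python
import networkx as nx

# Illustrative sketch: enumerate simple paths in the undirected skeleton and
# keep those whose first edge points INTO x, i.e. backdoor paths x <- ... y.
def backdoor_paths(dag, x, y):
    paths = []
    for path in nx.all_simple_paths(dag.to_undirected(), x, y):
        # A backdoor path starts with an edge oriented towards x.
        if dag.has_edge(path[1], x):
            paths.append(path)
    return paths

# Classic confounder: z -> x, z -> y, plus the direct edge x -> y.
g = nx.DiGraph([("z", "x"), ("z", "y"), ("x", "y")])
print(backdoor_paths(g, "x", "y"))  # [['x', 'z', 'y']]
```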

get_paths(graph, x, y)[source]#

Returns all simple paths between two nodes in a directed graph.

Parameters:
  • graph (nx.DiGraph) – The graph to search.

  • x (str) – The source node.

  • y (str) – The target node.

Returns:

A list of all simple paths between x and y.

Return type:

list

find_colliders_in_path(dag, path)[source]#

Returns all colliders in a path.
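A collider is a node on a path into which both of its path neighbours point. A sketch (not the package's code):

```python
import networkx as nx

# Illustrative sketch: a node m on a path is a collider when both neighbours
# on the path point into it (a -> m <- b).
def colliders_in_path(dag, path):
    return [m for a, m, b in zip(path, path[1:], path[2:])
            if dag.has_edge(a, m) and dag.has_edge(b, m)]

g = nx.DiGraph([("a", "m"), ("b", "m"), ("b", "c")])
print(colliders_in_path(g, ["a", "m", "b", "c"]))  # ['m']
```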

get_sufficient_sets_for_pair(dag, x, y, verbose=False)[source]#

Compute the sufficient sets for a pair of nodes in a graph. A sufficient set is a set of nodes that blocks all backdoor paths between x and y.
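For graphs whose backdoor paths contain no colliders, the blocking criterion can be sketched as follows. This is a simplification, not the package's computation: the real criterion must also handle colliders, which are blocked when *not* conditioned on.

```python
import networkx as nx

# Simplified sketch: in a graph whose backdoor paths contain no colliders,
# a candidate set Z is sufficient when every backdoor path from x to y has
# an interior node in Z.
def is_sufficient(dag, x, y, z):
    for path in nx.all_simple_paths(dag.to_undirected(), x, y):
        if not dag.has_edge(path[1], x):      # not a backdoor path
            continue
        if not set(path[1:-1]) & set(z):      # unblocked backdoor path
            return False
    return True

g = nx.DiGraph([("z", "x"), ("z", "y"), ("x", "y")])
print(is_sufficient(g, "x", "y", {"z"}))  # True
print(is_sufficient(g, "x", "y", set()))  # False
```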

get_sufficient_sets(dag, verbose=False)[source]#

Get the sufficient sets (admissible sets) for all pairs of nodes in a graph.

get_conditional_independencies(dag, verbose=False)[source]#

Computes the set of conditional independencies implied by the graph G.

custom_main()[source]#
main()[source]#
dag_main()[source]#

Module for the edge orientation algorithm.

get_edge_orientation(data, x, y, iters=20, method='gpr', verbose=False)[source]#

This is an ANM (additive noise model) test of independence, applied to pairs between which strong correlation has been observed. If the test is repeated a sufficient number of times (100), the correct causal direction almost always emerges; in the cases where it does not, it is enough to repeat the test an odd number of times (5) to confirm the result.

Parameters:
  • data (pandas.DataFrame) – The dataset with the samples for the features

  • x (str) – the name of the source feature

  • y (str) – the name of the target feature

  • iters (int, optional) – Number of repetitions of the test. Defaults to 20.

  • method (str, optional) – Can be ‘gpr’ or ‘gam’. Defaults to ‘gpr’.

  • verbose (bool, optional) – Verbosity. Defaults to False.

Returns:

Returns +1 if the direction is x->y, -1 if the direction is x<-y, or 0 if no direction can be determined.

Return type:

int
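The ANM idea can be illustrated with a numpy-only stand-in. Polynomial regression and a crude residual-dependence score replace the package's GPR/GAM fit and proper independence test here, so this is a sketch of the principle, not the actual method:

```python
import numpy as np

# Crude ANM sketch: fit effect ~ f(cause) and measure how dependent the
# residuals still are on the cause (0 would mean a perfect additive-noise
# fit). The polynomial fit and this score are stand-ins for GPR/GAM + HSIC.
def anm_score(cause, effect, deg=3):
    coeffs = np.polyfit(cause, effect, deg)
    resid = effect - np.polyval(coeffs, cause)
    c = np.corrcoef(cause, resid)[0, 1]
    c2 = np.corrcoef(cause, resid ** 2)[0, 1]
    return abs(c) + abs(c2)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = x ** 2 + 0.05 * rng.normal(size=500)

# The true direction x -> y leaves residuals far less dependent on the input.
direction = +1 if anm_score(x, y) < anm_score(y, x) else -1
print(direction)  # prints 1, i.e. x -> y
```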

estimate(digraph, data, in_place=True, verbose=False)[source]#

Takes the digraph passed as argument and computes the effect of each treatment on the outcome, given the graph. Treatment and outcome are the pairs formed by traversing the edges of the oriented DAG. The resulting ATE and RefuteResult are computed by calling the method estimate_edge and incorporated into each edge as attributes.

Parameters:
  • digraph (nx.DiGraph) – The causal graph.

  • data (pd.DataFrame) – The data.

  • in_place (bool, optional) – Whether to modify the graph in place. Defaults to True.

  • verbose (bool, optional) – If True, print the results. Defaults to False.

Returns:

The estimated causal graph.

Return type:

nx.DiGraph

estimate_edge(digraph, treatment, outcome, data, verbose=False)[source]#

Estimate the effect of a treatment on an outcome, given a graph.

Parameters:
  • digraph (nx.DiGraph) – The causal graph.

  • treatment (str) – The name of the treatment variable.

  • outcome (str) – The name of the outcome variable.

  • data (pd.DataFrame) – The data.

  • verbose (bool, optional) – If True, print the results. Defaults to False.

Returns:

The estimated effect.

Return type:

CausalEstimate
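As an illustration of what estimating an edge effect means (without DoWhy, which the package relies on for the actual CausalEstimate), the ATE on a simulated confounded edge can be recovered by backdoor adjustment:

```python
import numpy as np

# Hedged illustration (not DoWhy): estimate the effect of treatment t on
# outcome o under the graph z -> t, z -> o, t -> o by regressing o on both
# t and the backdoor adjustment variable z. True effect of t is 2.0.
rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
t = z + rng.normal(size=n)
o = 2.0 * t + 3.0 * z + rng.normal(size=n)

# Least-squares fit o ~ a*t + b*z + c; the coefficient on t is the ATE.
A = np.column_stack([t, z, np.ones(n)])
ate = np.linalg.lstsq(A, o, rcond=None)[0][0]
print(ate)  # close to the true effect 2.0
```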

main(exp_name, data_path='/Users/renero/phd/data/RC3/', output_path='/Users/renero/phd/output/RC4/', scale=False)[source]#

Runs a custom main function for the given experiment name.

Parameters:
  • exp_name (str) – The name of the experiment to run.

  • data_path (str) – The path to the data files.

  • output_path (str) – The path to the output files.

Returns:

None

select_features(values, feature_names, return_shaps=False, min_impact=1e-06, exhaustive=False, threshold=None, verbose=False)[source]#

Sorts the values and selects those before (strictly) the point of maximum curvature, according to the selected algorithm. If strict is False, the point of maximum curvature is also selected. When the method is ‘abrupt’, the selection is based on taking only those features up to (or down to) a certain percentage of change in their values.

Parameters:
  • values (array-like) – The values for each of the features. This can be anything that should be used to determine which features are more important than others.

  • feature_names (list of str) – Names of the variables corresponding to the SHAP values.

  • return_shaps (bool) – Whether to return the mean SHAP values together with the order of the features.

  • min_impact (float) – Default 1e-06. The minimum impact a feature must have to be selected. If all features fall below this value, none are selected.

  • exhaustive (bool) – Default False. Whether to use the exhaustive method. If True, the threshold is used to find all clusters above the given threshold, not only the first one.

  • threshold (float or None) – Default None. The threshold to use when exhaustive is True. If None, an exception is raised.

  • verbose (bool) – Verbose output. Default False.
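The maximum-curvature cut can be sketched as follows. This reproduces only the default elbow rule, not the ‘abrupt’ or exhaustive modes:

```python
import numpy as np

# Hedged sketch of "select features before the point of max curvature":
# sort importances in decreasing order and cut at the point of largest
# second difference (the elbow).
def select_by_curvature(values, feature_names):
    order = np.argsort(values)[::-1]
    sorted_vals = np.asarray(values)[order]
    curv = np.diff(sorted_vals, n=2)          # discrete curvature
    elbow = int(np.argmax(curv)) + 1          # position of max curvature
    return [feature_names[i] for i in order[:elbow]]  # strictly before it

vals = [0.9, 0.85, 0.1, 0.05]
print(select_by_curvature(vals, ["a", "b", "c", "d"]))  # ['a', 'b']
```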

find_cluster_change_point(X, verbose=False)[source]#

Given an array of values in increasing or decreasing order, detect which elements belong to the same cluster. The clustering is done with DBSCAN, using a distance computed as the maximum difference between consecutive elements.

Parameters:
  • X (array-like) – The series of values in which to detect the abrupt change.

  • verbose (bool) – Verbose output.

Returns:

The position in the array where an abrupt change occurs. If no change between consecutive values exceeds the tolerance passed, the index of the last element of the array is returned.

Return type:

int
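A sketch of the change-point idea using the largest consecutive gap directly; the package derives a DBSCAN eps from such gaps, so this simplification is not its actual implementation:

```python
import numpy as np

# Hedged sketch: the cut is placed after the largest gap between consecutive
# values; if no gap exceeds the tolerance, the last index is returned.
def change_point(X, tol=0.5):
    gaps = np.abs(np.diff(X))
    if gaps.max() <= tol:
        return len(X) - 1
    return int(np.argmax(gaps)) + 1

print(change_point([1.0, 1.1, 1.2, 5.0, 5.1]))  # 3
print(change_point([1.0, 1.1, 1.2]))            # 2
```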

main()[source]#
test()[source]#

Application of early-stage FCI rules to set up the skeleton of a causal graph, pruning potentially spurious edges from an existing graph.

    1. Renero, 2022

class GraphIndependence(base_graph, condlen=1, condsize=0, max_rows=500, prog_bar=True, verbose=False, silent=False)[source]#

Bases: object

Class for removing independent edges from a causal graph.

Methods

compute_cond_indep_pvals()

Performs the _test_cond_independence test on all pairs of nodes in the graph, and stores the p-value (the third element of the returned tuple) in the class attribute cond_indep_pvals.

fit(X[, y])

Remove edges from the graph that are conditionally independent.

fit_predict(X[, y])

Fits the model to the data and returns predictions.

predict()

Predicts the causal graph using the current independence tests and returns the resulting graph.

__init__(base_graph, condlen=1, condsize=0, max_rows=500, prog_bar=True, verbose=False, silent=False)[source]#
fit(X, y=None)[source]#

Remove edges from the graph that are conditionally independent.

predict()[source]#

Predicts the causal graph using the current independence tests and returns the resulting graph.

Returns:

The predicted causal graph.

fit_predict(X, y=None)[source]#

Fits the model to the data and returns predictions.

Parameters:
  • X (DataFrame) – The input data to fit the model on.

  • y (optional) – The target variable to fit the model on.

Returns:

The predictions made by the model.

compute_cond_indep_pvals()[source]#

Performs the _test_cond_independence test on all pairs of nodes in the graph, and stores the p-value (the third element of the returned tuple) in the class attribute cond_indep_pvals.
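The conditional-independence test itself is internal to the class; as a hedged stand-in for what such a test does, here is a linear partial-correlation check with numpy (this is not the package's _test_cond_independence):

```python
import numpy as np

# Hedged stand-in for a conditional-independence test: the partial
# correlation of x and y given z, computed from the residuals of linear
# regressions on z. Near-zero partial correlation suggests x _||_ y | z.
def partial_corr(x, y, z):
    Z = np.column_stack([z, np.ones(len(z))])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(2)
z = rng.normal(size=1000)
x = z + rng.normal(size=1000)
y = z + rng.normal(size=1000)          # x _||_ y | z holds
w = x + z + rng.normal(size=1000)      # but x and w stay dependent given z

print(abs(partial_corr(x, y, z)) < 0.1)   # True: edge x - y can be removed
print(abs(partial_corr(x, w, z)) > 0.3)   # True: edge x - w survives
```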

Methods in this module perform HSIC independence test, and compute the HSIC value and statistic.

After checking that most implementations fail to provide consistent results, I decided to take the implementation from the DoWhy package.

class HSIC_Values(hsic, p_value, stat, independence)[source]#

Bases: object

__init__(hsic, p_value, stat, independence)[source]#
hsic: float = 0.0#
p_value: float = 0.0#
stat: float = 0.0#
independence: bool = False#
class HSIC[source]#

Bases: object

Provides an sklearn-style class interface to the HSIC methods taken from DoWhy.

Attributes:
hsic
independence
p_value
stat

Methods

fit

__init__()[source]#
fit(X, Y)[source]#
property p_value#
property hsic#
property stat#
property independence#
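The (biased) HSIC statistic the class exposes can be sketched in numpy. The kernel width and the plain trace formula here are illustrative assumptions, not DoWhy's exact computation:

```python
import numpy as np

# Hedged numpy sketch of the (biased) HSIC statistic with RBF kernels:
# HSIC = trace(K H L H) / n^2, where H centres the Gram matrices.
def rbf_gram(x, sigma=1.0):
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_gram(x, sigma), rbf_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(3)
x = rng.normal(size=200)
noise = rng.normal(size=200)
# A dependent pair yields a larger HSIC value than an independent one.
print(hsic(x, x ** 2) > hsic(x, noise))  # True
```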

Created on 2013/01/26

@author: myamada

rbf_dot(X, deg)[source]#
kernel_Delta_norm(xin1, xin2)[source]#
kernel_Delta(xin1, xin2)[source]#
kernel_Gaussian(xin1, xin2, sigma)[source]#

This module contains functions to compute the Maximal Information Coefficient between pairs of features in a dataframe. The MIC is a measure of the strength of the linear or non-linear association between two variables. The MIC is computed using the MINE statistics, which is a non-parametric method that computes the MIC between two variables by estimating the mutual information between them.

pairwise_mic(data, alpha=0.6, c=15, to_return='mic', est='mic_approx', prog_bar=True, silent=False)[source]#

From a dataframe, compute the MIC for each pair of features. See minepy/minepy and minepy/mictools for more details.

  • [Reshef2016] Yakir A. Reshef, David N. Reshef, Hilary K. Finucane, Pardis C. Sabeti and Michael Mitzenmacher. Measuring Dependence Powerfully and Equitably. Journal of Machine Learning Research, 2016.

  • [Matejka2017] J. Matejka and G. Fitzmaurice. Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing. ACM SIGCHI Conference on Human Factors in Computing Systems, 2017.

Parameters:
  • data (DataFrame) – A DF with continuous numerical values

  • alpha (float) – MINE MIC value for alpha

  • c (int) – MINE MIC value for c

  • to_return (str) – Either ‘mic’ or ‘tic’.

  • est (str) – MINE MIC value for est. The default, est=”mic_approx”, computes the original MINE statistics; with est=”mic_e” the equicharacteristic matrix is evaluated and MIC_e and TIC_e are returned.

  • prog_bar (bool) – whether to print the prog_bar or not.

Returns:

A dataframe with the MIC values between pairs.
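The pairwise loop can be sketched as below. The MIC score itself comes from minepy's MINE estimator; it is stood in for here by |Pearson r| so that the sketch runs without minepy:

```python
import numpy as np
import pandas as pd
from itertools import combinations

# Hedged sketch of the pairwise loop only; |Pearson r| stands in for the
# MINE MIC score so the example runs without minepy.
def pairwise_scores(data, score=lambda a, b: abs(np.corrcoef(a, b)[0, 1])):
    cols = data.columns
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for a, b in combinations(cols, 2):
        out.loc[a, b] = out.loc[b, a] = score(data[a].values, data[b].values)
    return out

df = pd.DataFrame({"u": [1, 2, 3, 4], "v": [2, 4, 6, 8], "w": [4, 1, 3, 2]})
print(pairwise_scores(df).round(2))
```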

fit_and_get_residuals(X, Y, X_test=None, Y_test=None, method='gpr')[source]#

Fit a model y ~ f(X), where X is the independent variable and Y the dependent one. The fitting method is passed as an argument, together with the training and test sets.

Parameters:
  • X (np.ndarray) – The feature to be used as input to predict Y.

  • Y (np.ndarray) – The feature to be predicted.

  • X_test (np.ndarray or None) – The feature to be used as input to predict Y_test.

  • Y_test (np.ndarray or None) – The feature to be predicted on the test set.

  • method (str) – Either “gpr” or “gam”.

Returns:

The method returns the residuals and the RMS error.

run_feature_selection(X, y)[source]#

Extracts ‘y’ from the list of features of “X” and calls the prediction method passed to assess the predictive influence of each variable in X on “y”.

Parameters:
  • X (DataFrame) – Dataframe with ALL continuous variables

  • y (str) – the name of the variable in X to be used as target.

  • predict_method – the method used to predict “y” from “X”. “hsiclasso” or “block_hsic_lasso”

Returns:

A list with the predictive score for each variable.

Return type:

List

Module contents#