Quickstart#

This guide will help you get started with CausalExplain.

Basic Usage#

Here’s a simple example of how to get CausalExplain help from the command line:

python -m causalexplain --help

In order to run CausalExplain from the command line, you need to have Python 3.10 or later installed on your system. To install CausalExplain, run the following command:

pip install causalexplain

Once CausalExplain is installed, you can run it from the command line by typing python -m causalexplain.

To run a simple case with a toy_dataset.csv file using ReX model, you can use the following command, assuming default parameters:

python -m causalexplain -d /path/to/toy_dataset.csv

That will generate the ReX model and run the model on the dataset, and print the results to the terminal, like this:

Resulting Graph:
---------------
X1 -> X2
  X2 -> X4
  X2 -> X3
X1 -> X4

which is the true graph expected.

Input Arguments Information#

The basic arguments are:

  • -d or --dataset: The path to the dataset file in CSV format.

  • -t or --true_dag: The path to the true DAG file in DOT format.

  • -m or --method: The method to use to infer the causal graph.

These options allow you to specify the dataset, true DAG, and method to be used. In case you don’t have a true DAG, the result is the plausible causal graph, which is the causal graph that is inferred by the method without taking into account the true DAG.

Regarding the output of the causalexplain command, the following information is provided:

  • The plausible causal graph, which is the causal graph that is inferred by

the method without taking into account the true DAG. - The metrics obtained from the evaluation of the causal graph against the true DAG.

In those cases where training or running a method takes a long time, causalexplain allows you to save the model (-s or `--save_model`) trained in a file and load it later. To load the model, use the -l or --load_model option.

The option -b or --bootstrap allows you to specify the number of iterations for bootstrap in the ReX method. The default value is 20, but you can change it to a different value, to test the effect of the number of iterations on the performance of the method. This option is linked to the next one, -T.

The option -T or --threshold allows you to specify a threshold for the bootstrapped adjacency matrix computed for the ReX method. The default value is 0.3, but you can change it to a different value, to test the effect of the threshold on the performance of the method. Lower values in the adjacency matrix represent edges that appear less frequently in the bootstrap samples, while higher values represent edges that appear more frequently. So, a higher threshold represents a more conservative approach to the inference of the causal graph.

The option -r or --regressor allows you to specify a list of comma-separated names of the regressors to be used. The default value is dnn,gbt, but you can change it to a different list of regressors. Current implementation only supports DNN and GBT regressors, but they can be extended in the future.

The option -u or --union allows you to specify a list of comma-separated names of the DAGs to be unioned. This option is only valid for the ReX method, and it is used to combine the causal graphs inferred by the method with different hyperparameters. By default, ReX combines the DAGs inferred with the DNN and GBT regressors, but you can extend ReX with more regressors and combine them with different hyperparameters.

The option -i or --iterations allows you to specify the number of iterations that the hyper-parameter optimization will perform in the ReX method. The default value is 100, but you can change it to a different value, to test the effect of the number of iterations on the performance of the method.

The option -S or --seed allows you to specify a seed for the random number generator. The default value is 1234, but you can change it to a different value, to test the effect of the seed on the performance of the method.

The option -o or --output allows you to specify the path to the output file where the resulting DAG will be saved in DOT format. The default value is ./output.dot, but you can change it to a different value, to save the DAG in a different file.