algorithms

ANMNonlinear

class castle.algorithms.ANMNonlinear(alpha=0.05)[source]

Nonlinear causal discovery with additive noise models

Use GPML with Gaussian kernel and independent Gaussian noise, optimizing the hyper-parameters for each regression individually. For the independence test, we implemented the HSIC with a Gaussian kernel, where we used the gamma distribution as an approximation for the distribution of the HSIC under the null hypothesis of independence in order to calculate the p-value of the test result.
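
As a rough illustration of the independence test described above, the following sketch computes the biased empirical HSIC statistic with Gaussian kernels using NumPy only. The helper names and the fixed bandwidth `sigma` are assumptions made for this example and are not part of castle. The gamma approximation of the null distribution, which the text uses to obtain the p-value, is omitted here.

```python
import numpy as np


def gaussian_gram(v, sigma=1.0):
    """Gram matrix of a Gaussian kernel for a 1-D sample (hypothetical helper)."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    sq_dist = (v - v.T) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def hsic_statistic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2, with H the centering matrix."""
    n = len(x)
    K = gaussian_gram(x, sigma)
    L = gaussian_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2


rng = np.random.default_rng(0)
x = rng.normal(size=300)
y_dep = x ** 2 + 0.1 * rng.normal(size=300)   # depends on x, yet nearly uncorrelated with it
y_ind = rng.normal(size=300)                  # independent of x

hsic_dep = hsic_statistic(x, y_dep)
hsic_ind = hsic_statistic(x, y_ind)
print(hsic_dep, hsic_ind)
```

A larger statistic indicates stronger dependence; turning it into an accept/reject decision at level alpha still requires a null distribution such as the gamma approximation mentioned above.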

References

Hoyer, Patrik O and Janzing, Dominik and Mooij, Joris M and Peters, Jonas and Schölkopf, Bernhard, “Nonlinear causal discovery with additive noise models”, NIPS 2009

Parameters

alpha: float, default 0.05: Significance level used to compute the threshold.

Attributes

causal_matrix: array-like, shape (n_features, n_features): Learned causal structure matrix.

Examples

>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import DAG, IIDSimulation
>>> from castle.algorithms.anm import ANMNonlinear

>>> weighted_random_dag = DAG.erdos_renyi(n_nodes=6, n_edges=10,
...                                       weight_range=(0.5, 2.0), seed=1)
>>> dataset = IIDSimulation(W=weighted_random_dag, n=1000,
...                         method='nonlinear', sem_type='gp-add')
>>> true_dag, X = dataset.B, dataset.X

>>> anm = ANMNonlinear(alpha=0.05)
>>> anm.learn(data=X)

>>> # plot predict_dag and true_dag
>>> GraphDAG(anm.causal_matrix, true_dag, show=False, save_name='result')

You can also pass additional parameters, for example a regressor with a custom kernel:

>>> from sklearn.gaussian_process.kernels import Matern, RBF
>>> from castle.algorithms.anm._anm import GPR
>>> kernel = Matern(nu=1.5)
>>> # kernel = 1.0 * RBF(1.0)
>>> anm = ANMNonlinear(alpha=0.05)
>>> anm.learn(data=X, regressor=GPR(kernel=kernel))

>>> # plot predict_dag and true_dag
>>> GraphDAG(anm.causal_matrix, true_dag, show=False, save_name='result')

anm_estimate(x, y, regressor=<castle.algorithms.anm._anm.GPR object>, test_method=<function hsic_test>)[source]

Compute the fitness score of the ANM model in the x->y direction.

Parameters

x: array: Variable seen as cause
y: array: Variable seen as effect
regressor: object, default GPR: Nonlinear regression estimator. A user-defined regressor must implement an estimate method, such as:

regressor.estimate(x, y)
test_method: callable, default hsic_test: Independence test method. A user-defined test must accept two positional arguments, x and y, and the keyword argument alpha, such as:

test_method(x, y, alpha=0.05)

Returns

out: int, 0 or 1: If 1, the residuals n are independent of x, so x -> y is accepted. If 0, the residuals n are not independent of x, so x -> y is rejected.

Examples

>>> import numpy as np
>>> from castle.algorithms.anm import ANMNonlinear
>>> np.random.seed(1)
>>> x = np.random.rand(500, 2)
>>> anm = ANMNonlinear(alpha=0.05)
>>> print(anm.anm_estimate(x[:, [0]], x[:, [1]]))
1
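
The interface required of a user-defined regressor and test method can be sketched as follows. `PolyRegressor` and `permutation_test` are hypothetical names invented for this example. It is an assumption here that `estimate` returns fitted values for y and that the test returns 1 when independence is not rejected and 0 otherwise. The permutation test is a crude stand-in for HSIC, not the library's implementation.

```python
import numpy as np


class PolyRegressor:
    """Hypothetical user-defined regressor exposing the required estimate method."""

    def __init__(self, degree=3):
        self.degree = degree

    def estimate(self, x, y):
        # Fit y as a polynomial in x and return the fitted values (assumed contract).
        x = np.ravel(x)
        y = np.ravel(y)
        coefs = np.polyfit(x, y, self.degree)
        return np.polyval(coefs, x).reshape(-1, 1)


def permutation_test(x, y, alpha=0.05, n_perm=200, seed=0):
    """Crude independence test: permutation test on |corr(|x - mean|, |y - mean|)|."""
    rng = np.random.default_rng(seed)
    ax = np.abs(np.ravel(x) - np.mean(x))
    ay = np.abs(np.ravel(y) - np.mean(y))
    observed = abs(np.corrcoef(ax, ay)[0, 1])
    exceed = sum(
        abs(np.corrcoef(ax, rng.permutation(ay))[0, 1]) >= observed
        for _ in range(n_perm)
    )
    p_value = (exceed + 1) / (n_perm + 1)
    # Assumed convention: 1 = independence not rejected, 0 = rejected.
    return int(p_value >= alpha)


rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(300, 1))
y = np.sin(x) + 0.2 * rng.normal(size=(300, 1))

reg = PolyRegressor(degree=5)
fitted = reg.estimate(x, y)
decision = permutation_test(x, y - fitted, alpha=0.05)
print(decision)
```

Objects like these would presumably be passed through the documented keyword arguments, for example `anm.anm_estimate(x, y, regressor=PolyRegressor(), test_method=permutation_test)`.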

learn(data, columns=None, regressor=<castle.algorithms.anm._anm.GPR object>, test_method=<function hsic_test>, **kwargs)[source]

Set up and run the ANM_Nonlinear algorithm.

Parameters

data: numpy.ndarray or Tensor: Training data.
columns: Index or array-like: Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
regressor: object, default GPR: Nonlinear regression estimator. A user-defined regressor must implement an estimate method, such as:

regressor.estimate(x, y)
test_method: callable, default hsic_test: Independence test method. A user-defined test must accept two positional arguments, x and y, and the keyword argument alpha, such as:

test_method(x, y, alpha=0.05)

GES

class castle.algorithms.GES(criterion='bic', method='scatter', k=0.001, N=10)[source]

Greedy equivalence search for causal discovery

References

[1] https://www.sciencedirect.com/science/article/pii/S0888613X12001636
[2] https://www.jmlr.org/papers/volume3/chickering02b/chickering02b.pdf

Parameters

criterion: str or DecomposableScore object

Scoring criterion, one of [‘bic’, ‘bdeu’].

Notes:

1. ‘bdeu’ is only for discrete variables.

2. To customize the criterion, create a class that inherits from the base class DecomposableScore in module ges.score.local_scores.

method: str

Effective when criterion=‘bic’, one of [‘r2’, ‘scatter’].

k: float, default: 0.001

Structure prior, effective when criterion=‘bdeu’.

N: int, default: 10

Prior equivalent sample size, effective when criterion=‘bdeu’.

Examples

>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import load_dataset
>>> from castle.algorithms import GES

>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> algo = GES()
>>> algo.learn(X)
>>> GraphDAG(algo.causal_matrix, true_dag, save_name='result_pc')
>>> met = MetricsDAG(algo.causal_matrix, true_dag)
>>> print(met.metrics)

DirectLiNGAM

class castle.algorithms.DirectLiNGAM(prior_knowledge=None, measure='pwling', thresh=0.3)[source]

DirectLiNGAM Algorithm. A direct learning algorithm for the linear non-Gaussian acyclic model (LiNGAM). Implementation of the DirectLiNGAM algorithm [1] [2]; constructs a DirectLiNGAM model.

Parameters

prior_knowledge: array-like, shape (n_features, n_features), optional (default=None)

Prior knowledge used for causal discovery, where n_features is the number of features.

The elements of the prior knowledge matrix are defined as follows [1]:

0 : \(x_i\) does not have a directed path to \(x_j\)
1 : \(x_i\) has a directed path to \(x_j\)
-1 : No prior knowledge is available to know if either of the two cases above (0 or 1) is true.

measure: {‘pwling’, ‘kernel’}, default=‘pwling’

Measure to evaluate independence: ‘pwling’ [2] or ‘kernel’ [1].

thresh: float, default=0.3

Drop edge if |weight| < thresh.
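
A prior knowledge matrix following the 0 / 1 / -1 encoding above could be built as in this minimal sketch. It assumes that entry [i, j] refers to a directed path from \(x_i\) to \(x_j\); the number of features and the specific constraints are made up for illustration.

```python
import numpy as np

n_features = 4

# Start from "no prior knowledge" (-1) for every ordered pair.
prior_knowledge = -np.ones((n_features, n_features), dtype=int)

# Assumed indexing: entry [i, j] describes a directed path from x_i to x_j.
prior_knowledge[0, 1] = 1   # x_0 has a directed path to x_1
prior_knowledge[2, 0] = 0   # x_2 does not have a directed path to x_0

print(prior_knowledge)
```

The resulting array would then be supplied through the documented constructor argument, as in `DirectLiNGAM(prior_knowledge=prior_knowledge)`.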

Attributes

causal_matrix: numpy.ndarray: Learned causal structure matrix.
weight_causal_matrix: numpy.ndarray: Learned weighted causal structure matrix.

References

Examples

>>> from castle.algorithms import DirectLiNGAM
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> n = DirectLiNGAM()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

fit(X)[source]

Fit the model to X.

Parameters

X: array-like, shape (n_samples, n_features): Training data, where n_samples is the number of samples and n_features is the number of features.

Returns

self: object: Returns the instance itself.

learn(data, columns=None, **kwargs)[source]

Set up and run the DirectLiNGAM algorithm.

Parameters

data: castle.Tensor or numpy.ndarray: The castle.Tensor or numpy.ndarray format data you want to learn.
columns: Index or array-like: Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

ICALiNGAM

class castle.algorithms.ICALiNGAM(random_state=None, max_iter=1000, thresh=0.3)[source]

ICALiNGAM Algorithm. An ICA-based learning algorithm for the linear non-Gaussian acyclic model (LiNGAM). Implementation of the ICA-based LiNGAM algorithm [1]; constructs an ICA-based LiNGAM model.

Parameters

random_state: int, optional (default=None): Seed used by the random number generator.
max_iter: int, optional (default=1000): The maximum number of iterations of FastICA.
thresh: float, default=0.3: Drop edge if |weight| < thresh.

Attributes

causal_matrix: numpy.ndarray: Learned causal structure matrix.
weight_causal_matrix: numpy.ndarray: Learned weighted causal structure matrix.
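
The rule "drop edge if |weight| < thresh" suggests how a binary structure matrix could be derived from a weighted one. The sketch below is only an illustration with made-up weights; the library's internal post-processing may differ.

```python
import numpy as np

thresh = 0.3
weights = np.array([[0.0, 1.2, -0.1],
                    [0.0, 0.0, -0.8],
                    [0.25, 0.0, 0.0]])

# Keep an edge only when |weight| >= thresh; everything else is dropped.
binary = (np.abs(weights) >= thresh).astype(int)
print(binary)
# [[0 1 0]
#  [0 0 1]
#  [0 0 0]]
```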

References

Examples

>>> from castle.algorithms import ICALiNGAM
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> n = ICALiNGAM()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

fit(X)[source]

Fit the model to X.

Parameters

X: array-like, shape (n_samples, n_features): Training data, where n_samples is the number of samples and n_features is the number of features.

Returns

self: object: Returns the instance itself.

learn(data, columns=None)[source]

Set up and run the ICALiNGAM algorithm.

columns: Index or array-like: Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

PC

class castle.algorithms.PC(variant='original', alpha=0.05, ci_test='fisherz', priori_knowledge=None)[source]

PC algorithm

A classic causal discovery algorithm based on conditional independence tests.

References

[1] original-PC: https://www.jmlr.org/papers/volume8/kalisch07a/kalisch07a.pdf
[2] stable-PC: https://arxiv.org/pdf/1211.3295.pdf
[3] parallel-PC: https://arxiv.org/pdf/1502.02454.pdf

Parameters

variant: str: A variant of the PC algorithm, one of [original, stable, parallel].
alpha: float, default 0.05: Significance level.
ci_test: str or callable: Conditional independence test method; if str, must be one of [fisherz, g2, chi2]. See more: castle.common.independence_tests.CITest
priori_knowledge: PrioriKnowledge: A PrioriKnowledge class object.
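
To give a feel for what a conditional independence test such as ‘fisherz’ computes, here is a minimal NumPy sketch of the standard Fisher z-test on partial correlations. The function name `fisher_z_pvalue` and its signature are hypothetical and do not reproduce the castle.common.independence_tests.CITest API.

```python
import math

import numpy as np


def fisher_z_pvalue(data, x, y, cond=()):
    """p-value for H0: column x is independent of column y given columns cond."""
    n = data.shape[0]
    idx = [x, y, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    precision = np.linalg.inv(corr)
    # Partial correlation of x and y given cond, read off the precision matrix.
    r = -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])
    r = min(max(r, -0.999999), 0.999999)        # guard the logarithm below
    z = 0.5 * math.log((1 + r) / (1 - r))       # Fisher z-transform
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    # Two-sided p-value under a standard normal null distribution.
    return math.erfc(stat / math.sqrt(2))


# Chain a -> b -> c: a and c are dependent, but independent given b.
rng = np.random.default_rng(0)
a = rng.normal(size=2000)
b = a + rng.normal(size=2000)
c = b + rng.normal(size=2000)
data = np.column_stack([a, b, c])

p_marginal = fisher_z_pvalue(data, 0, 2)
p_conditional = fisher_z_pvalue(data, 0, 2, cond=(1,))
print(p_marginal, p_conditional)
```

In a PC-style search, an edge between two variables is removed when some conditioning set yields a p-value above the significance level alpha.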

Attributes

causal_matrix: array: Learned causal structure matrix.

Examples

>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import load_dataset
>>> from castle.algorithms import PC

>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> pc = PC(variant='stable')
>>> pc.learn(X)
>>> GraphDAG(pc.causal_matrix, true_dag, save_name='result_pc')
>>> met = MetricsDAG(pc.causal_matrix, true_dag)
>>> print(met.metrics)

>>> pc = PC(variant='parallel')
>>> pc.learn(X, p_cores=2)
>>> GraphDAG(pc.causal_matrix, true_dag, save_name='result_pc')
>>> met = MetricsDAG(pc.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, columns=None, **kwargs)[source]

Set up and run the PC algorithm.

Parameters

data: array or Tensor

Training data

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

kwargs: [optional]

p_cores: int: Number of CPU cores to be used.
s: boolean: Memory-efficient indicator.
batch: int: Number of edges per batch.

If s is None or False, or batch is not given, batch=|J|, where |J| denotes the number of all pairs of adjacent vertices (X, Y) in G.

TTPM

class castle.algorithms.TTPM(topology_matrix, delta=0.1, epsilon=1, max_hop=0, penalty='BIC', max_iter=20, priori_knowledge=None)[source]

TTPM Algorithm.

A causal structure learning algorithm based on the topological Hawkes process for spatio-temporal event sequences.

Parameters

topology_matrix: np.matrix: Interpreted as an adjacency matrix to generate the graph. It should have two dimensions, and should be square.
delta: float, default=0.1: Time decaying coefficient for the exponential kernel.
epsilon: int, default=1: BIC penalty coefficient.
max_hop: non-negative int, default=0: The maximum number of hops considered in the topology; when max_hop=0, it is divided by nodes, regardless of topology.
penalty: str, default=BIC: Two optional values: ‘BIC’ or ‘AIC’.
max_iter: int: Maximum number of iterations.
priori_knowledge: PrioriKnowledge, default=None: a class object PrioriKnowledge

Examples

>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import load_dataset
>>> from castle.algorithms import TTPM
>>> # Data Simulation for TTPM
>>> X, true_causal_matrix, topology_matrix = load_dataset('THP_Test')
>>> ttpm = TTPM(topology_matrix, max_hop=2)
>>> ttpm.learn(X)
>>> causal_matrix = ttpm.causal_matrix
>>> # plot est_dag and true_dag
>>> GraphDAG(ttpm.causal_matrix, true_causal_matrix)
>>> # calculate accuracy
>>> ret_metrix = MetricsDAG(ttpm.causal_matrix, true_causal_matrix)
>>> ret_metrix.metrics

learn(tensor, *args, **kwargs)[source]

Set up and run the TTPM algorithm.

Parameters

tensor: pandas.DataFrame

(V 1.0.0, we’ll eliminate this constraint in the next version) The tensor is supposed to contain three columns:

[‘event’, ‘timestamp’, ‘node’]

Description of the three columns:

event: event name (type).
timestamp: occurrence timestamp of the event, e.g., ‘1615962101.0’.
node: topological node where the event happened.
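
An input in the three-column layout described above might look like the following sketch. All values are invented for illustration, and the integer encoding of events and nodes is an assumption; the chain-shaped topology is likewise hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical event log: three event types observed on three topological nodes.
X = pd.DataFrame({
    'event': [0, 1, 0, 2],
    'timestamp': [1615962101.0, 1615962105.0, 1615962110.0, 1615962112.0],
    'node': [0, 1, 1, 2],
})

# Square adjacency matrix of the node topology (assumed chain 0 - 1 - 2).
topology_matrix = np.array([[0, 1, 0],
                            [1, 0, 1],
                            [0, 1, 0]])

print(list(X.columns), topology_matrix.shape)
```

With real data, these two objects correspond to the `tensor` argument of `learn` and the `topology_matrix` argument of the constructor.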

CORL

class castle.algorithms.CORL(batch_size=64, input_dim=100, embed_dim=256, normalize=False, encoder_name='transformer', encoder_heads=8, encoder_blocks=3, encoder_dropout_rate=0.1, decoder_name='lstm', reward_mode='episodic', reward_score_type='BIC', reward_regression_type='LR', reward_gpr_alpha=1.0, iteration=10000, lambda_iter_num=500, actor_lr=0.0001, critic_lr=0.001, alpha=0.99, init_baseline=-1.0, random_seed=0, device_type='cpu', device_ids=0)[source]

Causal discovery with Ordering-based Reinforcement Learning

An RL- and order-based algorithm that improves the efficiency and scalability of previous RL-based approaches. It contains CORL1 with the episodic reward type and CORL2 with the dense reward type.

References

https://arxiv.org/abs/2105.06631

Parameters

batch_size: int, default: 64: training batch size
input_dim: int, default: 100: dimension of input data
embed_dim: int, default: 256: dimension of embedding layer output
normalize: bool, default: False: whether to normalize the input data
encoder_name: str, default: ‘transformer’: Encoder name, must be one of [‘transformer’, ‘lstm’, ‘mlp’]
encoder_heads: int, default: 8: number of heads in the multi-head attention of the transformer encoder.
encoder_blocks: int, default: 3: number of blocks in the encoder
encoder_dropout_rate: float, default: 0.1: dropout rate for encoder
decoder_name: str, default: ‘lstm’: Decoder name, must be one of [‘lstm’, ‘mlp’]
reward_mode: str, default: ‘episodic’: reward mode, ‘episodic’ or ‘dense’, ‘episodic’ denotes episodic-reward, ‘dense’ denotes dense-reward.
reward_score_type: str, default: ‘BIC’: type of score function
reward_regression_type: str, default: ‘LR’: type of regression function, must be one of [‘LR’, ‘QR’]
reward_gpr_alpha: float, default: 1.0: alpha of GPR
iteration: int, default: 10000: number of training iterations
actor_lr: float, default: 1e-4: learning rate of Actor network, includes encoder and decoder.
critic_lr: float, default: 1e-3: learning rate of Critic network
alpha: float, default: 0.99: alpha for score function, includes dense_actor_loss and dense_critic_loss.
init_baseline: float, default: -1.0: initialization baseline for score function, includes dense_actor_loss and dense_critic_loss.
random_seed: int, default: 0: random seed for all random process
device_type: str, default: cpu: cpu or gpu
device_ids: int or str, default: 0: CUDA devices; effective when device_type is ‘gpu’. For single-device modules, device_ids can be int or str, e.g. 0 or ‘0’; for multi-device modules, device_ids must be str, in a format like ‘0, 1’.

Examples

>>> from castle.algorithms.gradient.corl.torch import CORL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = CORL()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, columns=None, **kwargs) → None[source]

Set up and run the Causal discovery with Ordering-based Reinforcement Learning algorithm.

Parameters

data: castle.Tensor or numpy.ndarray

The castle.Tensor or numpy.ndarray format data you want to learn.

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

Other Parameters:

dag_mask: ndarray: Two-dimensional array with elements in {0, 1}, shape = [n_nodes, n_nodes]. An element 0 at (i, j) denotes that there must be no edge between nodes i and j; an element 1 indicates that there may or may not be an edge.
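
A mask in the format described above could be constructed as in this small sketch. The number of nodes, the forbidden position, and the exclusion of self-loops are assumptions chosen for illustration.

```python
import numpy as np

n_nodes = 4

# 1 = an edge may or may not exist, 0 = an edge is ruled out.
dag_mask = np.ones((n_nodes, n_nodes), dtype=int)
np.fill_diagonal(dag_mask, 0)   # no self-loops (an assumption of this sketch)
dag_mask[3, 0] = 0              # rule out an edge at position (3, 0)

print(dag_mask)
```

Since dag_mask is listed under the other parameters of `learn`, it would presumably be supplied as a keyword argument alongside the data.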

DAG_GNN

class castle.algorithms.DAG_GNN(encoder_type='mlp', decoder_type='mlp', encoder_hidden=64, latent_dim=None, decoder_hidden=64, encoder_dropout=0.0, decoder_dropout=0.0, epochs=300, k_max_iter=100.0, tau_a=0.0, batch_size=100, lr=0.003, lr_decay=200, gamma=1.0, init_lambda_a=0.0, init_c_a=1.0, c_a_thresh=1e+20, eta=10, multiply_h=0.25, h_tolerance=1e-08, use_a_connect_loss=False, use_a_positiver_loss=False, graph_threshold=0.3, optimizer='adam', seed=42, device_type='cpu', device_ids='0')[source]

DAG Structure Learning with Graph Neural Networks

References

https://arxiv.org/pdf/1904.10098.pdf

Parameters

encoder_type: str, default: ‘mlp’: choose an encoder, ‘mlp’ or ‘sem’.
decoder_type: str, default: ‘mlp’: choose a decoder, ‘mlp’ or ‘sem’.
encoder_hidden: int, default: 64: MLP encoder hidden layer dimension, just one hidden layer.
latent_dim: int, default equal to input dimension: encoder output dimension
decoder_hidden: int, default: 64: MLP decoder hidden layer dimension, just one hidden layer.
encoder_dropout: float, default: 0.0: Dropout rate (1 - keep probability).
decoder_dropout: float, default: 0.0: Dropout rate (1 - keep probability).
epochs: int, default: 300: train epochs
k_max_iter: int, default: 1e2: the maximum number of iterations for searching lambda and c.
batch_size: int, default: 100: Sample size of each training batch
lr: float, default: 3e-3: learning rate
lr_decay: int, default: 200: Period of learning rate decay.
gamma: float, default: 1.0: Multiplicative factor of learning rate decay.
init_lambda_a: float, default: 0.0: initial coefficient for the DAG constraint h(A).
init_c_a: float, default: 1.0: initial coefficient for the absolute value of h(A).
c_a_thresh: float, default: 1e20: controls the loop by c_a.
eta: int, default: 10: used to update c_a; must be greater than or equal to 1.
multiply_h: float, default: 0.25: used to judge whether to update c_a.
tau_a: float, default: 0.0: coefficient for L-1 norm of A.
h_tolerance: float, default: 1e-8: the tolerance of error of h(A) to zero.
use_a_connect_loss: bool, default: False: flag to use A connect loss
use_a_positiver_loss: bool, default: False: flag to enforce A must have positive values
graph_threshold: float, default: 0.3: threshold for binarizing the learned adjacency matrix; a value greater than or equal to graph_threshold denotes a causal relationship.
optimizer: str, default: ‘Adam’: choose optimizer, ‘Adam’ or ‘SGD’
seed: int, default: 42: random seed
device_type: str, default: cpu: cpu or gpu
device_ids: int or str, default: ‘0’: CUDA devices; effective when device_type is ‘gpu’. For single-device modules, device_ids can be int or str, e.g. 0 or ‘0’; for multi-device modules, device_ids must be str, in a format like ‘0, 1’.

Examples

>>> from castle.algorithms.gradient.dag_gnn.torch import DAG_GNN
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> m = DAG_GNN()
>>> m.learn(X)
>>> GraphDAG(m.causal_matrix, true_dag)
>>> met = MetricsDAG(m.causal_matrix, true_dag)
>>> print(met.metrics)

GAE

class castle.algorithms.GAE(input_dim=1, hidden_layers=1, hidden_dim=4, activation=LeakyReLU(negative_slope=0.05), epochs=10, update_freq=3000, init_iter=3, lr=0.001, alpha=0.0, beta=2.0, init_rho=1.0, rho_thresh=1e+30, gamma=0.25, penalty_lambda=0.0, h_thresh=1e-08, graph_thresh=0.3, early_stopping=False, early_stopping_thresh=1.0, seed=1230, device_type='cpu', device_ids='0')[source]

GAE Algorithm. A gradient-based algorithm using graph autoencoder to model non-linear causal relationships.

Parameters

input_dim: int, default: 1: dimension of vector for x
hidden_layers: int, default: 1: number of hidden layers for encoder and decoder
hidden_dim: int, default: 4: hidden size for mlp layer
activation: callable, default: nn.LeakyReLU(0.05): nonlinear activation function
epochs: int, default: 10: Number of iterations for optimization problem
update_freq: int, default: 3000: Number of steps for each iteration
init_iter: int, default: 3: Initial iteration to disallow early stopping
lr: float, default: 1e-3: learning rate
alpha: float, default: 0.0: Lagrange multiplier
beta: float, default: 2.0: Multiplication to amplify rho each time
init_rho: float, default: 1.0: Initial value for rho
rho_thresh: float, default: 1e30: Threshold for rho
gamma: float, default: 0.25: Threshold for h
penalty_lambda: float, default: 0.0: L1 penalty for sparse graph. Set to 0.0 to disable
h_thresh: float, default: 1e-8: Tolerance of optimization problem
graph_thresh: float, default: 0.3: Threshold to filter out small values in the graph
early_stopping: bool, default: False: Whether to use early stopping
early_stopping_thresh: float, default: 1.0: Threshold ratio for early stopping
seed: int, default: 1230: Reproducibility, must be int
device_type: str, default: ‘cpu’: ‘cpu’ or ‘gpu’
device_ids: int or str, default ‘0’: CUDA devices; effective when device_type is ‘gpu’. For single-device modules, device_ids can be int or str, e.g. 0 or ‘0’; for multi-device modules, device_ids must be str, in a format like ‘0, 1’.

GraNDAG

class castle.algorithms.GraNDAG(input_dim, hidden_num=2, hidden_dim=10, batch_size=64, lr=0.001, iterations=10000, model_name='NonLinGaussANM', nonlinear='leaky-relu', optimizer='rmsprop', h_threshold=1e-08, device_type='cpu', device_ids='0', use_pns=False, pns_thresh=0.75, num_neighbors=None, normalize=False, precision=False, random_seed=42, jac_thresh=True, lambda_init=0.0, mu_init=0.001, omega_lambda=0.0001, omega_mu=0.9, stop_crit_win=100, edge_clamp_range=0.0001, norm_prod='paths', square_prod=False)[source]

Gradient Based Neural DAG Learner

A gradient-based algorithm using neural network modeling for non-linear additive noise data

References: https://arxiv.org/pdf/1906.02226.pdf

Parameters

input_dim: int: dimension of the input layer, must be int
hidden_num: int, default 2: number of hidden layers
hidden_dim: int, default 10: dimension of each hidden layer
batch_size: int, default 64: batch size for each training step of the neural network
lr: float, default 0.001: learning rate
iterations: int, default 10000: number of iterations
model_name: str, default ‘NonLinGaussANM’: name of the model, ‘NonLinGauss’ or ‘NonLinGaussANM’
nonlinear: str, default ‘leaky-relu’: name of the nonlinear activation function, ‘sigmoid’ or ‘leaky-relu’
optimizer: str, default ‘rmsprop’: optimization method, ‘rmsprop’ or ‘sgd’
h_threshold: float, default 1e-8: constraint threshold
device_type: str, default ‘cpu’: use ‘gpu’ or ‘cpu’
use_pns: bool, default False: whether to use PNS before training; if nodes > 50, use it.
pns_thresh: float, default 0.75: threshold for the feature importance score in PNS
num_neighbors: int, default None: number of potential parents for each variable
normalize: bool, default False: whether to normalize the data
precision: bool, default False: tensor precision; if True, use torch.FloatTensor; if False, use torch.DoubleTensor
random_seed: int, default 42: random seed
norm_prod: str, default ‘paths’: whether to use the norm of the product of paths, ‘none’ or ‘paths’. ‘paths’: use the norm; ‘none’: no norm
square_prod: bool, default False: use the squared product of paths
jac_thresh: bool, default True: get the average Jacobian with the trained model
lambda_init: float, default 0.0: initial value of the Lagrangian coefficient in the augmented Lagrangian optimization
mu_init: float, default 0.001: initial value of the penalty coefficient in the augmented Lagrangian optimization
omega_lambda: float, default 0.0001: tolerance on the delta lambda, to find saddle points
omega_mu: float, default 0.9: the constraint is considered to decrease sufficiently if it decreases by at least (1-omega_mu) * h_prev
stop_crit_win: int, default 100: number of iterations for updating values
edge_clamp_range: float, default 0.0001: threshold for keeping an edge (during training)

Examples

Load data

>>> from castle.algorithms import GraNDAG
>>> from castle.datasets import load_dataset
>>> data, true_dag, _ = load_dataset('IID_Test')

>>> gnd = GraNDAG(input_dim=data.shape[1])
>>> gnd.learn(data=data)

You can also print gnd.model.adjacency (torch.Tensor) or gnd.causal_matrix (numpy.ndarray).

>>> print(gnd.causal_matrix)
>>> print(gnd.model.adjacency)

learn(data, columns=None, **kwargs)[source]

Set up and run the GraN-DAG algorithm.

Parameters

data: numpy.ndarray or Tensor: include Tensor.data
columns: Index or array-like: Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

MCSL

class castle.algorithms.MCSL(model_type='nn', num_hidden_layers=4, hidden_dim=16, graph_thresh=0.5, l1_graph_penalty=0.002, learning_rate=0.03, max_iter=25, iter_step=1000, init_iter=2, h_tol=1e-10, init_rho=1e-05, rho_thresh=100000000000000.0, h_thresh=0.25, rho_multiply=10, temperature=0.2, device_type='cpu', device_ids='0', random_seed=1230)[source]

Masked Gradient-Based Causal Structure Learning

A gradient-based algorithm for non-linear additive noise data by learning the binary adjacency matrix.

Parameters

model_type: str, default: ‘nn’: ‘nn’ denotes neural network, ‘qr’ denotes quadratic regression.
num_hidden_layers: int, default: 4: Number of hidden layers in the neural network when model_type is ‘nn’.
hidden_dim: int, default: 16: Number of hidden dimensions in each hidden layer when model_type is ‘nn’.
graph_thresh: float, default: 0.5: Threshold used to determine whether an edge exists in the graph; an element greater than graph_thresh denotes a directed edge, otherwise no edge.
l1_graph_penalty: float, default: 2e-3: Penalty weight for L1 regularization
learning_rate: float, default: 3e-2: Learning rate for the optimizer
max_iter: int, default: 25: Number of iterations for optimization problem
iter_step: int, default: 1000: Number of steps for each iteration
init_iter: int, default: 2: Initial iteration to disallow early stopping
h_tol: float, default: 1e-10: Tolerance of optimization problem
init_rho: float, default: 1e-5: Initial value for penalty parameter.
rho_thresh: float, default: 1e14: Threshold for penalty parameter.
h_thresh: float, default: 0.25: Threshold for h
rho_multiply: float, default: 10.0: Multiplication to amplify rho each time
temperature: float, default: 0.2: Temperature for gumbel sigmoid
device_type: str, default: ‘cpu’: ‘cpu’ or ‘gpu’
device_ids: int or str, default ‘0’: CUDA devices; effective when device_type is ‘gpu’. For single-device modules, device_ids can be int or str, e.g. 0 or ‘0’; for multi-device modules, device_ids must be str, in a format like ‘0, 1’.
random_seed: int, default: 1230: random seed for every random value
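
The roles of init_rho, rho_thresh, h_thresh, rho_multiply and the tolerance on h can be illustrated with a generic augmented Lagrangian loop on a toy problem. This is a sketch under stated assumptions, not the MCSL implementation: the one-dimensional objective, the constraint h(w) = w^2, the bisection solver, and the looser tolerance h_tol=1e-8 are all chosen here for illustration.

```python
def solve_subproblem(alpha, rho, n_steps=100):
    """Minimize (w - 1)^2 + alpha * h(w) + 0.5 * rho * h(w)^2 with h(w) = w^2.

    The objective is convex, so its minimizer on [0, 1] is found by bisection
    on the derivative, which has the sign of w * (1 + alpha + rho * w^2) - 1.
    """
    lo, hi = 0.0, 1.0
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if mid * (1.0 + alpha + rho * mid * mid) - 1.0 < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def augmented_lagrangian(max_iter=25, h_tol=1e-8, init_rho=1e-5,
                         rho_thresh=1e14, h_thresh=0.25, rho_multiply=10.0):
    rho, alpha, h = init_rho, 0.0, float('inf')
    w_new, h_new = 1.0, 1.0
    for _ in range(max_iter):
        while rho < rho_thresh:
            w_new = solve_subproblem(alpha, rho)
            h_new = w_new ** 2
            if h_new > h_thresh * h:
                rho *= rho_multiply      # constraint shrank too little: amplify rho
            else:
                break
        w, h = w_new, h_new
        alpha += rho * h                 # dual ascent step on the multiplier
        if h <= h_tol or rho >= rho_thresh:
            break
    return w, h, rho


w, h, rho = augmented_lagrangian()
print(w, h, rho)
```

The penalty rho is multiplied only when the constraint value fails to drop below h_thresh times its previous value, which keeps rho as small as the problem allows.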

References

https://arxiv.org/abs/1910.08527

Examples

>>> from castle.algorithms import MCSL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> n = MCSL(iter_step=1000, rho_thresh=1e14, init_rho=1e-5,
...          rho_multiply=10, graph_thresh=0.5, l1_graph_penalty=2e-3)
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, columns=None, pns_mask=None, **kwargs) → None[source]

Set up and run the MCSL algorithm.

Parameters

data: castle.Tensor or numpy.ndarray: The castle.Tensor or numpy.ndarray format data you want to learn.
columns: Index or array-like: Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
pns_mask: array_like or None: The mask matrix, an array with elements in {0, 1}. 0 denotes no edge i -> j; 1 denotes that there may or may not be an edge i -> j.

Notears

class castle.algorithms.Notears(lambda1=0.1, loss_type='l2', max_iter=100, h_tol=1e-08, rho_max=1e+16, w_threshold=0.3)[source]

Notears Algorithm. A gradient-based algorithm for linear data models (typically with least-squares loss).

Parameters

lambda1: float: l1 penalty parameter
loss_type: str: one of l2, logistic, poisson
max_iter: int: max num of dual ascent steps
h_tol: float: exit if |h(w_est)| <= h_tol
rho_max: float: exit if rho >= rho_max
w_threshold: float: drop edge if |weight| < threshold

Attributes

causal_matrix: numpy.ndarray: Learned causal structure matrix

References

https://arxiv.org/abs/1803.01422

Examples

>>> from castle.algorithms import Notears
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = Notears()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, columns=None, **kwargs)[source]

Set up and run the Notears algorithm.

Parameters

data: castle.Tensor or numpy.ndarray: The castle.Tensor or numpy.ndarray format data you want to learn.
columns: Index or array-like: Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

notears_linear(X, lambda1, loss_type, max_iter, h_tol, rho_max)[source]

Solve min_W L(W; X) + lambda1 ‖W‖_1 s.t. h(W) = 0 using augmented Lagrangian.

Parameters

X: np.ndarray: n*d sample matrix

Returns

W_est: np.ndarray: d*d estimated DAG
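
The constraint h(W) = 0 encodes acyclicity. In the NOTEARS paper referenced above, h(W) = tr(exp(W ∘ W)) - d, where ∘ is the elementwise product and d the number of variables. The sketch below evaluates it with a truncated power series in plain NumPy, which is a simplification chosen for this example; the function name and the example matrices are hypothetical.

```python
import numpy as np


def acyclicity(W, n_terms=30):
    """h(W) = trace(exp(W * W)) - d, via a truncated power series of the exponential."""
    d = W.shape[0]
    M = W * W                      # elementwise square keeps all entries non-negative
    term = np.eye(d)
    total = np.trace(term)
    for k in range(1, n_terms + 1):
        term = term @ M / k        # equals M^k / k!
        total += np.trace(term)
    return total - d


dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, -0.7],
                [0.0, 0.0, 0.0]])     # 0 -> 1 -> 2, acyclic
cyclic = dag.copy()
cyclic[2, 0] = 0.9                    # adds 2 -> 0, closing a cycle

h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
print(h_dag, h_cyclic)
```

The value is zero for the acyclic matrix and strictly positive once a cycle is present, which is what lets the augmented Lagrangian treat acyclicity as a smooth equality constraint.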

NotearsLowRank

class castle.algorithms.NotearsLowRank(w_init=None, max_iter=15, h_tol=1e-06, rho_max=1e+20, w_threshold=0.3)[source]

NotearsLowRank Algorithm. Adapting NOTEARS for large problems with low-rank causal graphs.

Parameters

w_init: None or numpy.ndarray: Initialized weight matrix
max_iter: int: Maximum number of iterations
h_tol: float: exit if |h(w)| <= h_tol
rho_max: float: maximum for rho
w_threshold: float, default=0.3: Drop edge if |weight| < threshold

Attributes

causal_matrix: numpy.ndarray: Learned causal structure matrix

References

https://arxiv.org/abs/2006.05691

Examples

>>> import numpy as np
>>> from castle.algorithms import NotearsLowRank
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> rank = np.linalg.matrix_rank(true_dag)
>>> n = NotearsLowRank()
>>> n.learn(X, rank=rank)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, rank, columns=None, **kwargs)[source]

Set up and run the NotearsLowRank algorithm.

Parameters

data: castle.Tensor or numpy.ndarray: The castle.Tensor or numpy.ndarray format data you want to learn.
columns: Index or array-like: Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
rank: int: The algebraic rank of the weighted adjacency matrix of a graph.
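
To illustrate what a low-rank weighted adjacency matrix means, the following sketch builds a d x d matrix as a product of two thin factors and checks its algebraic rank. The factorization W = U V^T is assumed here for illustration, and the random matrix is not constrained to be a DAG.

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank = 10, 3

# Low-rank factorization W = U V^T with U, V of shape (d, rank) (assumed form).
U = rng.normal(size=(d, rank))
V = rng.normal(size=(d, rank))
W = U @ V.T

estimated_rank = np.linalg.matrix_rank(W)
print(W.shape, estimated_rank)
```

Such a factorization has 2 * d * rank free parameters instead of d * d, which is where the savings for large graphs come from.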

notears_low_rank(X, rank, w_init=None)[source]

Solve min_W L(W; X) s.t. h(W) = 0 using augmented Lagrangian.

Parameters

X: [n,d] sample matrix
max_iter: max number of dual ascent steps.
rank: int: The rank of data.
w_init: None or numpy.ndarray: Initialized weight matrix

Returns

W_est: np.ndarray: estimate [d,d] dag matrix

GOLEM

class castle.algorithms.GOLEM(B_init=None, lambda_1=0.02, lambda_2=5.0, equal_variances=True, learning_rate=0.001, num_iter=100000.0, checkpoint_iter=5000, seed=1, graph_thres=0.3, device_type='cpu', device_ids=0)[source]

GOLEM Algorithm. A more efficient version of NOTEARS that can reduce number of optimization iterations.

Parameters

B_init: None: File of weighted matrix for initialization. Set to None to disable.
lambda_1: float: Coefficient of L1 penalty.
lambda_2: float: Coefficient of DAG penalty.
equal_variances: bool: Assume equal noise variances for likelihood objective.
learning_rate: float: Learning rate of Adam optimizer.
num_iter: float: Number of iterations for training.
checkpoint_iter: int: Number of iterations between each checkpoint. Set to None to disable.
seed: int: Random seed.
graph_thres: float: Threshold for weighted matrix.
device_type: str: whether to use GPU or not, ‘cpu’ or ‘gpu’
device_ids: int: choose which gpu to use

Attributes

causal_matrix: numpy.ndarray: Learned causal structure matrix (binary)

weight_causal_matrix: numpy.ndarray: Learned causal structure matrix (weighted)

References

https://arxiv.org/abs/2006.10201

Examples

>>> from castle.algorithms import GOLEM
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, topology_matrix = load_dataset(name='IID_Test')
>>> n = GOLEM()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, columns=None, **kwargs)[source]

Set up and run the GOLEM algorithm.

Parameters

data: castle.Tensor or numpy.ndarray: The castle.Tensor or numpy.ndarray format data you want to learn.
X: numpy.ndarray: [n, d] data matrix.
columns: Index or array-like: Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

NotearsNonlinear

class castle.algorithms.NotearsNonlinear(lambda1: float = 0.01, lambda2: float = 0.01, max_iter: int = 100, h_tol: float = 1e-08, rho_max: float = 1e+16, w_threshold: float = 0.3, hidden_layers: tuple = (10, 1), expansions: int = 10, bias: bool = True, model_type: str = 'mlp', device_type: str = 'cpu', device_ids=None)[source]

Notears Nonlinear. Includes notears-mlp and notears-sob. A gradient-based algorithm that models nonlinear causal relationships using either a neural network or a Sobolev-space basis expansion.

Parameters

lambda1: float: l1 penalty parameter
lambda2: float: l2 penalty parameter
max_iter: int: max num of dual ascent steps
h_tol: float: exit if |h(w_est)| <= h_tol
rho_max: float: exit if rho >= rho_max
w_threshold: float: drop edge if |weight| < threshold
hidden_layers: Iterable: Dimension of each hidden layer. Must contain at least 2 elements, and the last element must be 1 (the output dimension). For example, hidden_layers=(5, 10, 1) denotes two hidden layers with 5 and 10 units and an output layer with 1 unit. Effective when model_type='mlp'.
expansions: int: Number of expansions of each variable. Effective when model_type='sob'.
bias: bool: Whether to use a bias term.
model_type: str: Which of the two nonlinear models to use within the NOTEARS framework: 'mlp' for multilayer perceptrons, 'sob' for basis expansions.
device_type: str, default 'cpu': 'cpu' or 'gpu'.
device_ids: int or str, default None: CUDA devices. Effective when device_type is 'gpu'. For single-device modules, device_ids can be int or str, e.g. 0 or '0'. For multi-device modules, device_ids must be str, in a format like '0, 1'.
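
The hidden_layers rules above can be checked mechanically. The helper below is hypothetical (layer_shapes is not a castle function) and only illustrates the stated constraints: at least two elements, the last one equal to 1. How the library itself wires the input dimension into the first layer is an assumption here.

```python
def layer_shapes(input_dim, hidden_layers=(10, 1)):
    """Return (in_features, out_features) pairs for an MLP described by hidden_layers."""
    dims = list(hidden_layers)
    if len(dims) < 2:
        raise ValueError("hidden_layers must contain at least 2 elements")
    if dims[-1] != 1:
        raise ValueError("the last element of hidden_layers must be 1")
    sizes = [input_dim] + dims
    # Pair consecutive sizes: each pair describes one fully connected layer.
    return list(zip(sizes[:-1], sizes[1:]))

print(layer_shapes(input_dim=4, hidden_layers=(5, 10, 1)))
# [(4, 5), (5, 10), (10, 1)]
```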

Attributes

causal_matrix: numpy.ndarray: Learned causal structure matrix
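
The w_threshold rule listed under Parameters (drop an edge if |weight| < threshold) amounts to an element-wise comparison. A minimal NumPy sketch, assuming the estimated weights are available as a dense array; the function name to_causal_matrix and the example weights are made up for illustration:

```python
import numpy as np

def to_causal_matrix(W_est, w_threshold=0.3):
    """Keep an edge where |weight| >= w_threshold, drop it otherwise."""
    return (np.abs(W_est) >= w_threshold).astype(int)

W_est = np.array([[0.0, 0.9, 0.1],
                  [0.0, 0.0, -0.45],
                  [0.05, 0.0, 0.0]])
print(to_causal_matrix(W_est))
# [[0 1 0]
#  [0 0 1]
#  [0 0 0]]
```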

References

https://arxiv.org/abs/1909.13189

Examples

>>> from castle.algorithms import NotearsNonlinear
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = NotearsNonlinear()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

dual_ascent_step(model, X)[source]

Perform one step of dual ascent in augmented Lagrangian.

Parameters

model: nn.Module: network model
X: torch.Tensor: sample data

Returns

tuple: loop control parameters
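
For orientation, here is a toy version of a dual ascent loop on a scalar problem: minimize (x - 2)^2 subject to h(x) = x - 1 = 0. It is a sketch of the generic augmented Lagrangian scheme, not the library's dual_ascent_step. The factor-of-10 growth of rho and the 0.25 progress test are assumptions borrowed from common NOTEARS implementations, and the inner minimization is solved in closed form instead of with an optimizer.

```python
def solve_toy(max_iter=100, h_tol=1e-8, rho_max=1e16):
    """Augmented Lagrangian for: min (x - 2)^2  s.t.  h(x) = x - 1 = 0."""
    rho, alpha, h = 1.0, 0.0, float("inf")
    x = 0.0
    for _ in range(max_iter):
        x_new, h_new = x, h
        while rho < rho_max:
            # Closed-form minimizer of (x - 2)^2 + alpha * h(x) + 0.5 * rho * h(x)^2.
            x_new = (4.0 - alpha + rho) / (2.0 + rho)
            h_new = abs(x_new - 1.0)
            if h_new > 0.25 * h:
                rho *= 10.0       # too little progress: tighten the penalty
            else:
                break
        x, h = x_new, h_new
        alpha += rho * (x - 1.0)  # dual ascent step on the multiplier
        if h <= h_tol or rho >= rho_max:
            break
    return x, alpha, rho

x, alpha, rho = solve_toy()
print(x, alpha, rho)
```

The constrained optimum is x = 1 with multiplier 2, so the returned x and alpha should sit close to those values.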

get_model(input_dim)[source]

Select the network model that matches model_type.

Parameters

input_dim: int: Number of data dimensions.

Returns

learn(data, columns=None, **kwargs)[source]

Set up and run the NotearsNonlinear algorithm.

Parameters

data: castle.Tensor or numpy.ndarray: The castle.Tensor or numpy.ndarray format data you want to learn.
columns: Index or array-like: Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

notears_nonlinear(model: Module, X: ndarray)[source]

Entry point of the NOTEARS framework.

Parameters

model: nn.Module: network model
X: castle.Tensor or numpy.ndarray: sample data

Returns

tuple: Predicted graph matrix coefficients.

PNL

class castle.algorithms.PNL(hidden_layers=1, hidden_units=10, batch_size=64, epochs=100, lr=0.0001, alpha=0.01, bias=True, activation=LeakyReLU(negative_slope=0.01), device_type='cpu', device_ids=None)[source]

On the Identifiability of the Post-Nonlinear Causal Model

References

https://arxiv.org/ftp/arxiv/papers/1205/1205.2599.pdf

Parameters

hidden_layers: int: number of hidden layers of the MLP
hidden_units: int: number of units per hidden layer
batch_size: int: size of training batch
epochs: int: number of training passes over all samples
lr: float: learning rate
alpha: float: significance level
bias: bool: whether to use bias
activation: callable: nonlinear activation function
device_type: str: 'cpu' or 'gpu', default: 'cpu'
device_ids: int or str: e.g. 0 or '0,1', denotes which GPU(s) to use.
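
The post-nonlinear (PNL) model from the cited paper assumes effect = f2(f1(cause) + noise), with f2 invertible. The toy NumPy sketch below generates such a pair and recovers the noise using the true functions. The choices of f1, f2, noise scale and sample size are arbitrary and purely illustrative. The PNL class works from data alone, so it has to estimate these functions; the parameters above configure the MLP used for that.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

x = rng.normal(size=n)              # cause
noise = 0.5 * rng.normal(size=n)    # drawn independently of x

def f1(u):
    return np.tanh(2.0 * u)         # inner nonlinearity (need not be invertible)

def f2(z):
    return np.exp(z)                # outer distortion, strictly increasing

def f2_inv(y):
    return np.log(y)                # exact inverse of f2

y = f2(f1(x) + noise)               # effect under the post-nonlinear model

# In the causal direction, undoing f2 and subtracting f1(x) returns the noise.
recovered = f2_inv(y) - f1(x)
print(np.allclose(recovered, noise))
```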

Examples

>>> from castle.algorithms.gradient.pnl.torch import PNL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = PNL()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

RL

class castle.algorithms.RL(encoder_type='TransformerEncoder', hidden_dim=64, num_heads=16, num_stacks=6, residual=False, decoder_type='SingleLayerDecoder', decoder_activation='tanh', decoder_hidden_dim=16, use_bias=False, use_bias_constant=False, bias_initial_value=False, batch_size=64, input_dimension=64, normalize=False, transpose=False, score_type='BIC', reg_type='LR', lambda_iter_num=1000, lambda_flag_default=True, score_bd_tight=False, lambda2_update=10, score_lower=0.0, score_upper=0.0, seed=8, nb_epoch=20000, lr1_start=0.001, lr1_decay_step=5000, lr1_decay_rate=0.96, alpha=0.99, init_baseline=-1.0, l1_graph_reg=0.0, verbose=False, device_type='cpu', device_ids=0)[source]

RL Algorithm. An RL-based algorithm that can work with flexible score functions (including non-smooth ones).

Parameters

encoder_type: str: type of encoder used
hidden_dim: int: actor LSTM num_neurons
num_heads: int: actor input embedding
num_stacks: int: actor LSTM num_neurons
residual: bool: whether to use residual for gat encoder
decoder_type: str: type of decoder used
decoder_activation: str: activation for decoder
decoder_hidden_dim: int: hidden dimension for decoder
use_bias: bool: Whether to add bias term when calculating decoder logits
use_bias_constant: bool: Whether to add bias term as CONSTANT when calculating decoder logits
bias_initial_value: float: Initial value for bias term when calculating decoder logits
batch_size: int: batch size for training
input_dimension: int: dimension of reshaped vector
normalize: bool: whether the input data shall be normalized
transpose: bool: whether the true graph needs to be transposed
score_type: str: score functions
reg_type: str: regressor type (in combination with score_type)
lambda_iter_num: int: how often to update lambdas
lambda_flag_default: bool: how the lambda parameters are set; if True, use the default strategy and ignore the input bounds
score_bd_tight: bool: if bound is tight, then simply use a fixed value, rather than the adaptive one
lambda1_update: float: increasing additive lambda1
lambda2_update: float: increasing multiplying lambda2
score_lower: float: lower bound on lambda1
score_upper: float: upper bound on lambda1
lambda2_lower: float: lower bound on lambda2
lambda2_upper: float: upper bound on lambda2
seed: int: seed
nb_epoch: int: number of epochs
lr1_start: float: actor learning rate
lr1_decay_step: int: lr1 decay step
lr1_decay_rate: float: lr1 decay rate
alpha: float: update factor moving average baseline
init_baseline: float: initial baseline - REINFORCE
temperature: float: pointer_net initial temperature
C: float: pointer_net tan clipping
l1_graph_reg: float: L1 graph regularization to encourage sparsity
inference_mode: bool: switch to inference mode when model is trained
verbose: bool: print detailed logging or not
device_type: str: 'cpu' or 'gpu'
device_ids: int: choose which gpu to use
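
Two of the schedules configured above are easy to state in code. The sketch below assumes an exponential decay for the actor learning rate (lr1_start, lr1_decay_step, lr1_decay_rate) and an exponential moving average for the REINFORCE baseline (alpha, init_baseline). Both formulas are common choices and are assumptions here, not a transcription of the library; the function names and reward values are hypothetical.

```python
def decayed_lr(step, lr1_start=0.001, lr1_decay_step=5000, lr1_decay_rate=0.96):
    """Assumed schedule: lr = lr1_start * lr1_decay_rate ** (step / lr1_decay_step)."""
    return lr1_start * lr1_decay_rate ** (step / lr1_decay_step)

def update_baseline(baseline, batch_rewards, alpha=0.99):
    """Assumed moving average: keep a share alpha of the old baseline, blend in the batch mean."""
    mean_reward = sum(batch_rewards) / len(batch_rewards)
    return alpha * baseline + (1.0 - alpha) * mean_reward

# Learning rate at the start and after one full decay period.
print(decayed_lr(0), decayed_lr(5000))

baseline = -1.0                      # plays the role of init_baseline
for rewards in ([-3.0, -2.0], [-2.5, -1.5], [-2.0, -2.0]):
    baseline = update_baseline(baseline, rewards)
print(baseline)
```

With alpha close to 1 the baseline moves slowly, which smooths the reward signal at the cost of reacting late to changes.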

Attributes

causal_matrix: numpy.ndarray: Learned causal structure matrix

References

https://arxiv.org/abs/1906.04477

Examples

>>> from castle.algorithms import RL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = RL()
>>> n.learn(X, dag=true_dag)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, columns=None, dag=None, **kwargs)[source]

Set up and run the RL algorithm.

Parameters

data: castle.Tensor or numpy.ndarray: The castle.Tensor or numpy.ndarray format data you want to learn.
columns: Index or array-like: Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
dag: ndarray: two-dimensional prior matrix