algorithms

ANMNonlinear

class castle.algorithms.ANMNonlinear(alpha=0.05)[source]

Nonlinear causal discovery with additive noise models

Regression uses Gaussian process regression (GPML) with a Gaussian kernel and independent Gaussian noise, optimizing the hyperparameters for each regression individually. The independence test is HSIC with a Gaussian kernel; the p-value is computed by approximating the distribution of the HSIC statistic under the null hypothesis of independence with a gamma distribution.

References

Hoyer, Patrik O and Janzing, Dominik and Mooij, Joris M and Peters, Jonas and Schölkopf, Bernhard, “Nonlinear causal discovery with additive noise models”, NIPS 2009

Parameters

alpha: float, default 0.05

Significance level used to compute the threshold.

Attributes

causal_matrix: array-like, shape (n_features, n_features)

Learned causal structure matrix.

Examples

>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import DAG, IIDSimulation
>>> from castle.algorithms.anm import ANMNonlinear
>>> weighted_random_dag = DAG.erdos_renyi(n_nodes=6, n_edges=10,
...                                       weight_range=(0.5, 2.0), seed=1)
>>> dataset = IIDSimulation(W=weighted_random_dag, n=1000,
...                         method='nonlinear', sem_type='gp-add')
>>> true_dag, X = dataset.B, dataset.X
>>> anm = ANMNonlinear(alpha=0.05)
>>> anm.learn(data=X)
>>> # plot predict_dag and true_dag
>>> GraphDAG(anm.causal_matrix, true_dag, show=False, save_name='result')

You can also pass further arguments to learn, for example a custom regressor:

>>> from sklearn.gaussian_process.kernels import Matern, RBF
>>> from castle.algorithms.anm._anm import GPR  # module path as shown in the method signatures
>>> kernel = Matern(nu=1.5)
>>> # kernel = 1.0 * RBF(1.0)
>>> anm = ANMNonlinear(alpha=0.05)
>>> anm.learn(data=X, regressor=GPR(kernel=kernel))
>>> # plot predict_dag and true_dag
>>> GraphDAG(anm.causal_matrix, true_dag, show=False, save_name='result')

anm_estimate(x, y, regressor=<castle.algorithms.anm._anm.GPR object>, test_method=<function hsic_test>)[source]

Compute the fitness score of the ANM model in the x->y direction.

Parameters

x: array

Variable seen as cause

y: array

Variable seen as effect

regressor: object

Nonlinear regression estimator; defaults to GPR. A user-defined regressor must implement an estimate method, for example:

regressor.estimate(x, y)

test_method: callable, default hsic_test

Independence test; defaults to HSIC. A user-defined test must accept the arguments x and y and the keyword argument alpha, for example:

test_method(x, y, alpha=0.05)

Returns

out: int, 0 or 1

If 1, the residuals n are independent of x, so x -> y is accepted.

If 0, the residuals n are not independent of x, so x -> y is rejected.

Examples

>>> import numpy as np
>>> from castle.algorithms.anm import ANMNonlinear
>>> np.random.seed(1)
>>> x = np.random.rand(500, 2)
>>> anm = ANMNonlinear(alpha=0.05)
>>> print(anm.anm_estimate(x[:, [0]], x[:, [1]]))
1

learn(data, columns=None, regressor=<castle.algorithms.anm._anm.GPR object>, test_method=<function hsic_test>, **kwargs)[source]

Set up and run the ANM_Nonlinear algorithm.

Parameters

data: numpy.ndarray or Tensor

Training data.

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

regressor: object

Nonlinear regression estimator; defaults to GPR. A user-defined regressor must implement an estimate method, for example:

regressor.estimate(x, y)

test_method: callable, default hsic_test

Independence test; defaults to HSIC. A user-defined test must accept the arguments x and y and the keyword argument alpha, for example:

test_method(x, y, alpha=0.05)
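
The two plug-in points described above can be illustrated with a small, dependency-free sketch. Everything here is hypothetical: LinearFitRegressor, MeanRegressor, corr_test and fit_and_test are made-up names, and the assumptions that estimate returns fitted values of y and that the test returns 1 for "independent" and 0 otherwise are inferred from the Returns description of anm_estimate rather than confirmed by the library. The correlation test is only a toy stand-in for HSIC and detects linear dependence only.

```python
import math


class LinearFitRegressor:
    """Hypothetical regressor: least-squares line, exposes estimate(x, y)."""

    def estimate(self, x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((a - mx) ** 2 for a in x)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        slope = sxy / sxx
        return [my + slope * (a - mx) for a in x]  # fitted values (assumed contract)


class MeanRegressor:
    """Hypothetical regressor that ignores x and predicts the mean of y."""

    def estimate(self, x, y):
        my = sum(y) / len(y)
        return [my for _ in x]


def corr_test(x, y, alpha=0.05):
    """Toy test_method: 1 if no linear correlation is detected, else 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1  # a constant carries no information about the other variable
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = max(-0.999999, min(0.999999, sxy / math.sqrt(sxx * syy)))
    z = abs(math.atanh(r)) * math.sqrt(n - 3)  # Fisher z statistic
    p_value = math.erfc(z / math.sqrt(2))      # two-sided normal p-value
    return 1 if p_value > alpha else 0


def fit_and_test(x, y, regressor, test_method, alpha=0.05):
    """Regress y on x, then test the residuals against x."""
    fitted = regressor.estimate(x, y)
    residuals = [b - f for b, f in zip(y, fitted)]
    return test_method(residuals, x, alpha=alpha)


x = [i / 10 for i in range(100)]
y = [2 * a + 1 + 0.1 * math.sin(7 * i) for i, a in enumerate(x)]
good = fit_and_test(x, y, LinearFitRegressor(), corr_test)
bad = fit_and_test(x, y, MeanRegressor(), corr_test)
print(good, bad)
```

A regressor that captures the dependence leaves residuals that pass the test, while one that ignores x does not. The library itself operates on numpy arrays, so real plug-ins would accept and return arrays rather than lists.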

GES

class castle.algorithms.GES(criterion='bic', method='scatter', k=0.001, N=10)[source]

Greedy equivalence search for causal discovery

References

[1] https://www.sciencedirect.com/science/article/pii/S0888613X12001636

[2] https://www.jmlr.org/papers/volume3/chickering02b/chickering02b.pdf

Parameters

criterion: str or DecomposableScore object

Scoring criterion, one of [‘bic’, ‘bdeu’].

Notes:
  1. ‘bdeu’ applies only to discrete variables.
  2. To use a custom criterion, create a class that inherits from the base class DecomposableScore in module ges.score.local_scores.

method: str, default: ‘scatter’

Effective when criterion=‘bic’; one of [‘r2’, ‘scatter’].

k: float, default: 0.001

Structure prior, effective when criterion=‘bdeu’.

N: int, default: 10

Prior equivalent sample size, effective when criterion=‘bdeu’.

Examples

>>> from castle.algorithms import GES
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import load_dataset
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> algo = GES()
>>> algo.learn(X)
>>> GraphDAG(algo.causal_matrix, true_dag, save_name='result_pc')
>>> met = MetricsDAG(algo.causal_matrix, true_dag)
>>> print(met.metrics)

DirectLiNGAM

class castle.algorithms.DirectLiNGAM(prior_knowledge=None, measure='pwling', thresh=0.3)[source]

DirectLiNGAM Algorithm. A direct learning algorithm for the linear non-Gaussian acyclic model (LiNGAM), implementing the DirectLiNGAM algorithm [1] [2]. Constructs a DirectLiNGAM model.

Parameters

prior_knowledge: array-like, shape (n_features, n_features), optional (default=None)

Prior knowledge used for causal discovery, where n_features is the number of features.

The elements of the prior knowledge matrix are defined as follows [1]:

  • 0 : \(x_i\) does not have a directed path to \(x_j\)

  • 1 : \(x_i\) has a directed path to \(x_j\)

  • -1 : No prior knowledge is available to know if either of the two cases above (0 or 1) is true.

measure: {‘pwling’, ‘kernel’}, default=‘pwling’

Measure to evaluate independence: ‘pwling’ [2] or ‘kernel’ [1].

thresh: float, default=0.3

Drop edge if |weight| < threshold.
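
To make the 0 / 1 / -1 coding of prior_knowledge concrete, here is a minimal sketch that builds such a matrix with plain Python lists. The helper build_prior_knowledge is hypothetical and not part of the library; it assumes that entry (i, j) describes a path from x_i to x_j, as the wording above suggests, so the orientation should be checked against the library before use.

```python
def build_prior_knowledge(n_features, no_path=(), has_path=()):
    """Return an n_features x n_features matrix using the 0 / 1 / -1 coding."""
    # Start from "no prior knowledge" everywhere.
    pk = [[-1] * n_features for _ in range(n_features)]
    for i, j in no_path:
        pk[i][j] = 0   # x_i does not have a directed path to x_j
    for i, j in has_path:
        pk[i][j] = 1   # x_i has a directed path to x_j
    return pk


# Hypothetical example with three features:
# x_0 reaches x_1, and x_2 reaches neither x_0 nor x_1.
pk = build_prior_knowledge(3, no_path=[(2, 0), (2, 1)], has_path=[(0, 1)])
for row in pk:
    print(row)
```

The resulting nested list is array-like and has the required (n_features, n_features) shape; converting it with numpy.array before passing it as prior_knowledge is a natural next step.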

Attributes

causal_matrix: numpy.ndarray

Learned causal structure matrix.

weight_causal_matrix: numpy.ndarray

Learned weighted causal structure matrix.

References

Examples

>>> from castle.algorithms import DirectLiNGAM
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> n = DirectLiNGAM()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

fit(X)[source]

Fit the model to X.

Parameters

X: array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

Returns

self: object

Returns the instance itself.

learn(data, columns=None, **kwargs)[source]

Set up and run the DirectLiNGAM algorithm.

Parameters

data: castle.Tensor or numpy.ndarray

The castle.Tensor or numpy.ndarray format data you want to learn.

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

ICALiNGAM

class castle.algorithms.ICALiNGAM(random_state=None, max_iter=1000, thresh=0.3)[source]

ICALiNGAM Algorithm. An ICA-based learning algorithm for the linear non-Gaussian acyclic model (LiNGAM), implementing the ICA-based LiNGAM algorithm [1]. Constructs an ICA-based LiNGAM model.

Parameters

random_state: int, optional (default=None)

Seed used by the random number generator.

max_iter: int, optional (default=1000)

The maximum number of iterations of FastICA.

thresh: float, default=0.3

Drop edge if |weight| < threshold

Attributes

causal_matrix: numpy.ndarray

Learned causal structure matrix.

weight_causal_matrix: numpy.ndarray

Learned weighted causal structure matrix.
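
The relationship between the weighted and the binary matrix implied by thresh can be sketched as follows. This is an illustration of the stated rule "drop edge if |weight| < threshold" only; binarize is a hypothetical helper and the weights are made up, not output of the algorithm.

```python
def binarize(weights, thresh=0.3):
    """Keep an edge (1) where |weight| >= thresh, drop it (0) otherwise."""
    return [[1 if abs(w) >= thresh else 0 for w in row] for row in weights]


# Made-up weighted matrix for three variables.
weighted = [[0.0, 0.82, -0.05],
            [0.0, 0.0, -1.4],
            [0.1, 0.0, 0.0]]
binary = binarize(weighted)
for row in binary:
    print(row)
```

Small weights such as -0.05 and 0.1 fall below the default threshold and disappear from the binary structure, while the sign of a retained weight does not matter.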

References

Examples

>>> from castle.algorithms import ICALiNGAM
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> n = ICALiNGAM()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

fit(X)[source]

Fit the model to X.

Parameters

X: array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

Returns

self: object

Returns the instance of self.

learn(data, columns=None)[source]

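Set up and run the ICALiNGAM algorithm.

Parameters

data: castle.Tensor or numpy.ndarray

The castle.Tensor or numpy.ndarray format data you want to learn.

columns: Index or array-like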

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

PC

class castle.algorithms.PC(variant='original', alpha=0.05, ci_test='fisherz', priori_knowledge=None)[source]

PC algorithm

A classic causal discovery algorithm based on conditional independence tests.

References

[1] original-PC

https://www.jmlr.org/papers/volume8/kalisch07a/kalisch07a.pdf

[2] stable-PC

https://arxiv.org/pdf/1211.3295.pdf

[3] parallel-PC

https://arxiv.org/pdf/1502.02454.pdf

Parameters

variant: str, default ‘original’

A variant of the PC algorithm, one of [original, stable, parallel].

alpha: float, default 0.05

Significance level.

ci_test: str or callable, default ‘fisherz’

Conditional independence test. If str, must be one of [fisherz, g2, chi2]. See castle.common.independence_tests.CITest for details.

priori_knowledge: PrioriKnowledge, default None

A PrioriKnowledge object.
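
For readers unfamiliar with the ‘fisherz’ option, the following is a minimal standard-library sketch of a Fisher z test with at most one conditioning variable. It illustrates the textbook test only and is not the library's CITest implementation; pearson and fisher_z_pvalue are hypothetical helper names, and the data are simulated from a chain x -> z -> y.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def fisher_z_pvalue(x, y, z=None):
    """p-value for H0: x is independent of y (given z, if supplied)."""
    n = len(x)
    r = pearson(x, y)
    k = 0  # size of the conditioning set
    if z is not None:
        # First-order partial correlation of x and y given z.
        r_xz, r_yz = pearson(x, z), pearson(y, z)
        r = (r - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
        k = 1
    r = max(-0.999999, min(0.999999, r))
    stat = math.sqrt(n - k - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2))  # two-sided p-value under N(0, 1)


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
z = [a + 0.5 * rng.gauss(0, 1) for a in x]   # x -> z
y = [c + 0.5 * rng.gauss(0, 1) for c in z]   # z -> y
p_marginal = fisher_z_pvalue(x, y)
p_given_z = fisher_z_pvalue(x, y, z)
print(p_marginal, p_given_z)
```

In a chain, x and y are strongly dependent marginally but independent given z, so the marginal p-value is tiny while the conditional one is not. PC compares such p-values with alpha to decide whether to remove an edge.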

Attributes

causal_matrix: array

Learned causal structure matrix.

Examples

>>> from castle.algorithms import PC
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import load_dataset
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> pc = PC(variant='stable')
>>> pc.learn(X)
>>> GraphDAG(pc.causal_matrix, true_dag, save_name='result_pc')
>>> met = MetricsDAG(pc.causal_matrix, true_dag)
>>> print(met.metrics)
>>> pc = PC(variant='parallel')
>>> pc.learn(X, p_cores=2)
>>> GraphDAG(pc.causal_matrix, true_dag, save_name='result_pc')
>>> met = MetricsDAG(pc.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, columns=None, **kwargs)[source]

Set up and run the PC algorithm.

Parameters

data: array or Tensor

Training data

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

kwargs: [optional]

p_cores: int

Number of CPU cores to be used.

s: boolean

Memory-efficient indicator.

batch: int

Number of edges per batch.

If s is None or False, or if batch is not given, batch=|J|, where |J| denotes the number of all pairs of adjacent vertices (X, Y) in G.

TTPM

class castle.algorithms.TTPM(topology_matrix, delta=0.1, epsilon=1, max_hop=0, penalty='BIC', max_iter=20, priori_knowledge=None)[source]

TTPM Algorithm.

A causal structure learning algorithm based on the Topological Hawkes process for spatio-temporal event sequences.

Parameters

topology_matrix: np.matrix

Interpreted as an adjacency matrix to generate the graph. It should have two dimensions, and should be square.

delta: float, default=0.1

Time decaying coefficient for the exponential kernel.

epsilon: int, default=1

BIC penalty coefficient.

max_hop: non-negative int, default=0

The maximum number of hops considered in the topology. When max_hop=0, the data is divided by node, regardless of topology.

penalty: str, default=BIC

Two optional values: ‘BIC’ or ‘AIC’.

max_iter: int, default=20

Maximum number of iterations.

priori_knowledge: PrioriKnowledge, default=None

a class object PrioriKnowledge

Examples

>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import load_dataset
>>> from castle.algorithms import TTPM
>>> # Data Simulation for TTPM
>>> X, true_causal_matrix, topology_matrix = load_dataset('THP_Test')
>>> ttpm = TTPM(topology_matrix, max_hop=2)
>>> ttpm.learn(X)
>>> causal_matrix = ttpm.causal_matrix
>>> # plot est_dag and true_dag
>>> GraphDAG(ttpm.causal_matrix, true_causal_matrix)
>>> # calculate accuracy
>>> ret_metrics = MetricsDAG(ttpm.causal_matrix, true_causal_matrix)
>>> ret_metrics.metrics

learn(tensor, *args, **kwargs)[source]

Set up and run the TTPM algorithm.

Parameters

tensor: pandas.DataFrame

(V 1.0.0; this constraint will be removed in the next version.) The tensor is expected to contain three columns:

[‘event’, ‘timestamp’, ‘node’]

Description of the three columns:

  • event: event name (type).

  • timestamp: occurrence timestamp of the event, e.g., ‘1615962101.0’.

  • node: topological node where the event happened.
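
As a rough illustration of the expected three-column layout, the sketch below assembles a few event records and checks them before they would be turned into a DataFrame. The helper check_event_rows, the event names and the node ids are all hypothetical; sorting by timestamp is a convenience of this sketch, not a documented requirement.

```python
REQUIRED_COLUMNS = ("event", "timestamp", "node")


def check_event_rows(rows):
    """Raise ValueError unless every row has the three required keys.

    Returns the rows sorted by timestamp.
    """
    for idx, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {idx} is missing {missing}")
    return sorted(rows, key=lambda r: float(r["timestamp"]))


# Made-up events: two alarm types observed on two topological nodes.
rows = [
    {"event": "alarm_B", "timestamp": 1615962105.0, "node": 1},
    {"event": "alarm_A", "timestamp": 1615962101.0, "node": 0},
    {"event": "alarm_A", "timestamp": 1615962110.0, "node": 1},
]
ordered = check_event_rows(rows)
for row in ordered:
    print(row["event"], row["timestamp"], row["node"])
```

With pandas installed, pandas.DataFrame(ordered) yields a table with exactly the columns ‘event’, ‘timestamp’ and ‘node’, which is the form the tensor argument is described as expecting.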

CORL

class castle.algorithms.CORL(batch_size=64, input_dim=100, embed_dim=256, normalize=False, encoder_name='transformer', encoder_heads=8, encoder_blocks=3, encoder_dropout_rate=0.1, decoder_name='lstm', reward_mode='episodic', reward_score_type='BIC', reward_regression_type='LR', reward_gpr_alpha=1.0, iteration=10000, lambda_iter_num=500, actor_lr=0.0001, critic_lr=0.001, alpha=0.99, init_baseline=-1.0, random_seed=0, device_type='cpu', device_ids=0)[source]

Causal discovery with Ordering-based Reinforcement Learning

An RL- and order-based algorithm that improves the efficiency and scalability of previous RL-based approaches. It contains CORL1 with the episodic reward type and CORL2 with the dense reward type.

References

https://arxiv.org/abs/2105.06631

Parameters

batch_size: int, default: 64

training batch size

input_dim: int, default: 100

dimension of input data

embed_dim: int, default: 256

dimension of embedding layer output

normalize: bool, default: False

Whether to normalize the input data.

encoder_name: str, default: ‘transformer’

Encoder name, must be one of [‘transformer’, ‘lstm’, ‘mlp’]

encoder_heads: int, default: 8

Number of attention heads in the transformer encoder.

encoder_blocks: int, default: 3

Number of encoder blocks.

encoder_dropout_rate: float, default: 0.1

dropout rate for encoder

decoder_name: str, default: ‘lstm’

Decoder name, must be one of [‘lstm’, ‘mlp’]

reward_mode: str, default: ‘episodic’

reward mode, ‘episodic’ or ‘dense’, ‘episodic’ denotes episodic-reward, ‘dense’ denotes dense-reward.

reward_score_type: str, default: ‘BIC’

type of score function

reward_regression_type: str, default: ‘LR’

type of regression function, must be one of [‘LR’, ‘QR’]

reward_gpr_alpha: float, default: 1.0

alpha of GPR

iteration: int, default: 10000

Number of training iterations.

actor_lr: float, default: 1e-4

learning rate of Actor network, includes encoder and decoder.

critic_lr: float, default: 1e-3

learning rate of Critic network

alpha: float, default: 0.99

alpha for score function, includes dense_actor_loss and dense_critic_loss.

init_baseline: float, default: -1.0

Initialization baseline for the score function, including dense_actor_loss and dense_critic_loss.

random_seed: int, default: 0

random seed for all random process

device_type: str, default: cpu

cpu or gpu

device_ids: int or str, default 0

CUDA devices; effective only when device_type is ‘gpu’. For single-device modules, device_ids can be an int or str, e.g. 0 or ‘0’. For multi-device modules, device_ids must be a str in a format like ‘0, 1’.

Examples

>>> from castle.algorithms.gradient.corl.torch import CORL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = CORL()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, columns=None, **kwargs) -> None[source]

Set up and run the Causal discovery with Ordering-based Reinforcement Learning algorithm.

Parameters

data: castle.Tensor or numpy.ndarray

The castle.Tensor or numpy.ndarray format data you want to learn.

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

Other Parameters

dag_mask: ndarray

Two-dimensional array with entries in {0, 1}, shape = [n_nodes, n_nodes]. An element 0 at (i, j) means there must be no edge between nodes i and j; an element 1 means there may or may not be an edge.
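
A dag_mask of the kind described above could be assembled as in the following sketch. The helper make_dag_mask is hypothetical; zeroing both (i, j) and (j, i) for a forbidden pair and zeroing the diagonal are assumptions of this sketch, based on the phrase "no edge between nodes i and j".

```python
def make_dag_mask(n_nodes, forbidden=()):
    """Build an n_nodes x n_nodes mask of 0/1 entries.

    1 means an edge may or may not exist; 0 means it must not exist.
    """
    mask = [[1] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        mask[i][i] = 0          # no self-loops (assumption of this sketch)
    for i, j in forbidden:
        mask[i][j] = 0          # forbid the pair in both directions
        mask[j][i] = 0
    return mask


# Hypothetical example: rule out any edge between nodes 0 and 2.
mask = make_dag_mask(3, forbidden=[(0, 2)])
for row in mask:
    print(row)
```

Since the parameter is documented as an ndarray, the nested list would be converted with numpy.array before being passed, presumably as the keyword argument dag_mask of learn.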

DAG_GNN

class castle.algorithms.DAG_GNN(encoder_type='mlp', decoder_type='mlp', encoder_hidden=64, latent_dim=None, decoder_hidden=64, encoder_dropout=0.0, decoder_dropout=0.0, epochs=300, k_max_iter=100.0, tau_a=0.0, batch_size=100, lr=0.003, lr_decay=200, gamma=1.0, init_lambda_a=0.0, init_c_a=1.0, c_a_thresh=1e+20, eta=10, multiply_h=0.25, h_tolerance=1e-08, use_a_connect_loss=False, use_a_positiver_loss=False, graph_threshold=0.3, optimizer='adam', seed=42, device_type='cpu', device_ids='0')[source]

DAG Structure Learning with Graph Neural Networks

References

https://arxiv.org/pdf/1904.10098.pdf

Parameters

encoder_type: str, default: ‘mlp’

choose an encoder, ‘mlp’ or ‘sem’.

decoder_type: str, default: ‘mlp’

choose a decoder, ‘mlp’ or ‘sem’.

encoder_hidden: int, default: 64

MLP encoder hidden layer dimension, just one hidden layer.

latent_dim: int, default equal to input dimension

encoder output dimension

decoder_hidden: int, default: 64

MLP decoder hidden layer dimension, just one hidden layer.

encoder_dropout: float, default: 0.0

Dropout rate (1 - keep probability).

decoder_dropout: float, default: 0.0

Dropout rate (1 - keep probability).

epochs: int, default: 300

train epochs

k_max_iter: int, default: 1e2

the max iteration number for searching lambda and c.

batch_size: int, default: 100

Sample size of each training batch

lr: float, default: 3e-3

learning rate

lr_decay: int, default: 200

Period of learning rate decay.

gamma: float, default: 1.0

Multiplicative factor of learning rate decay.

init_lambda_a: float, default: 0.0

Initial coefficient for the DAG constraint h(A).

init_c_a: float, default: 1.0

Initial coefficient for the absolute value of h(A).

c_a_thresh: float, default: 1e20

Threshold on c_a that controls the optimization loop.

eta: int, default: 10

Factor used to update c_a; must be greater than or equal to 1.

multiply_h: float, default: 0.25

Factor used to decide whether to update c_a.

tau_a: float, default: 0.0

coefficient for L-1 norm of A.

h_tolerance: float, default: 1e-8

the tolerance of error of h(A) to zero.

use_a_connect_loss: bool, default: False

flag to use A connect loss

use_a_positiver_loss: bool, default: False

flag to enforce A must have positive values

graph_threshold: float, default: 0.3

Threshold for binarizing the learned adjacency matrix. Values greater than or equal to graph_threshold denote a causal relationship.

optimizer: str, default: ‘Adam’

choose optimizer, ‘Adam’ or ‘SGD’

seed: int, default: 42

random seed

device_type: str, default: cpu

cpu or gpu

device_ids: int or str, default ‘0’

CUDA devices; effective only when device_type is ‘gpu’. For single-device modules, device_ids can be an int or str, e.g. 0 or ‘0’. For multi-device modules, device_ids must be a str in a format like ‘0, 1’.
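
The interplay of lr, lr_decay and gamma can be sketched as a step-wise schedule. This assumes the learning rate is multiplied by gamma once every lr_decay epochs, in the style of a step scheduler; the function step_lr is hypothetical and the actual scheduling inside the library is not confirmed here.

```python
def step_lr(initial_lr, epoch, lr_decay=200, gamma=1.0):
    """Learning rate after `epoch` epochs under a step-wise decay schedule."""
    # One multiplication by gamma for every completed period of lr_decay epochs.
    return initial_lr * gamma ** (epoch // lr_decay)


# With the default gamma=1.0 the learning rate never changes.
constant = step_lr(0.003, epoch=450)
# With a hypothetical gamma=0.5 it halves at epochs 200 and 400.
halved_twice = step_lr(0.003, epoch=450, lr_decay=200, gamma=0.5)
print(constant, halved_twice)
```

Under this reading, the default gamma of 1.0 disables the decay, so lr_decay only matters once gamma is set below 1.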

Examples

>>> from castle.algorithms.gradient.dag_gnn.torch import DAG_GNN
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> m = DAG_GNN()
>>> m.learn(X)
>>> GraphDAG(m.causal_matrix, true_dag)
>>> met = MetricsDAG(m.causal_matrix, true_dag)
>>> print(met.metrics)

GAE

class castle.algorithms.GAE(input_dim=1, hidden_layers=1, hidden_dim=4, activation=LeakyReLU(negative_slope=0.05), epochs=10, update_freq=3000, init_iter=3, lr=0.001, alpha=0.0, beta=2.0, init_rho=1.0, rho_thresh=1e+30, gamma=0.25, penalty_lambda=0.0, h_thresh=1e-08, graph_thresh=0.3, early_stopping=False, early_stopping_thresh=1.0, seed=1230, device_type='cpu', device_ids='0')[source]

GAE Algorithm. A gradient-based algorithm using graph autoencoder to model non-linear causal relationships.

Parameters

input_dim: int, default: 1

dimension of vector for x

hidden_layers: int, default: 1

number of hidden layers for encoder and decoder

hidden_dim: int, default: 4

hidden size for mlp layer

activation: callable, default: nn.LeakyReLU(0.05)

Nonlinear activation function.

epochs: int, default: 10

Number of iterations for optimization problem

update_freq: int, default: 3000

Number of steps for each iteration

init_iter: int, default: 3

Initial iteration to disallow early stopping

lr: float, default: 1e-3

learning rate

alpha: float, default: 0.0

Lagrange multiplier

beta: float, default: 2.0

Multiplication to amplify rho each time

init_rho: float, default: 1.0

Initial value for rho

rho_thresh: float, default: 1e30

Threshold for rho

gamma: float, default: 0.25

Threshold for h

penalty_lambda: float, default: 0.0

L1 penalty for sparse graph. Set to 0.0 to disable

h_thresh: float, default: 1e-8

Tolerance of optimization problem

graph_thresh: float, default: 0.3

Threshold to filter out small values in the graph

early_stopping: bool, default: False

Whether to use early stopping

early_stopping_thresh: float, default: 1.0

Threshold ratio for early stopping

seed: int, default: 1230

Reproducibility, must be int

device_type: str, default: ‘cpu’

‘cpu’ or ‘gpu’

device_ids: int or str, default ‘0’

CUDA devices; effective only when device_type is ‘gpu’. For single-device modules, device_ids can be an int or str, e.g. 0 or ‘0’. For multi-device modules, device_ids must be a str in a format like ‘0, 1’.
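
The roles of alpha, beta, gamma, init_rho and rho_thresh can be illustrated with one outer-loop update of an augmented Lagrangian scheme. This is a simplified sketch of the general technique under stated assumptions, not the library's training loop: update_multipliers is a hypothetical function, and the h values in the example are made up.

```python
def update_multipliers(alpha, rho, h_new, h_prev,
                       beta=2.0, gamma=0.25, rho_thresh=1e30):
    """One outer-loop update of (alpha, rho).

    If the constraint violation h did not shrink to at most gamma * h_prev,
    the penalty rho is amplified by beta (capped at rho_thresh). Otherwise
    the Lagrange multiplier alpha takes a dual-ascent step.
    """
    if h_new > gamma * h_prev and rho < rho_thresh:
        return alpha, min(rho * beta, rho_thresh)
    return alpha + rho * h_new, rho


# Insufficient progress on h: only the penalty grows.
alpha, rho = update_multipliers(alpha=0.0, rho=1.0, h_new=0.9, h_prev=1.0)
print(alpha, rho)
# Sufficient progress on h: the multiplier is updated instead.
alpha, rho = update_multipliers(alpha, rho, h_new=0.2, h_prev=1.0)
print(alpha, rho)
```

Optimization would stop once h falls below h_thresh or rho reaches rho_thresh, which is how the two thresholds listed above bound the number of outer iterations.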

GraNDAG

class castle.algorithms.GraNDAG(input_dim, hidden_num=2, hidden_dim=10, batch_size=64, lr=0.001, iterations=10000, model_name='NonLinGaussANM', nonlinear='leaky-relu', optimizer='rmsprop', h_threshold=1e-08, device_type='cpu', device_ids='0', use_pns=False, pns_thresh=0.75, num_neighbors=None, normalize=False, precision=False, random_seed=42, jac_thresh=True, lambda_init=0.0, mu_init=0.001, omega_lambda=0.0001, omega_mu=0.9, stop_crit_win=100, edge_clamp_range=0.0001, norm_prod='paths', square_prod=False)[source]

Gradient Based Neural DAG Learner

A gradient-based algorithm using neural network modeling for non-linear additive noise data

References: https://arxiv.org/pdf/1906.02226.pdf

Parameters

input_dim: int

Dimension of the input layer; must be an int.

hidden_num: int, default 2

Number of hidden layers.

hidden_dim: int, default 10

Number of units per hidden layer.

batch_size: int, default 64

Batch size for each training step of the neural network.

lr: float, default 0.001

Learning rate.

iterations: int, default 10000

Number of training iterations.

model_name: str, default ‘NonLinGaussANM’

Name of the model, ‘NonLinGauss’ or ‘NonLinGaussANM’.

nonlinear: str, default ‘leaky-relu’

Name of the nonlinear activation function, ‘sigmoid’ or ‘leaky-relu’.

optimizer: str, default ‘rmsprop’

Optimization method, ‘rmsprop’ or ‘sgd’.

h_threshold: float, default 1e-8

Threshold for the constraint.

device_type: str, default ‘cpu’

Use ‘gpu’ or ‘cpu’.

use_pns: bool, default False

Whether to use PNS before training; use it if there are more than 50 nodes.

pns_thresh: float, default 0.75

Threshold for the feature importance score in PNS.

num_neighbors: int, default None

Number of potential parents for each variable.

normalize: bool, default False

Whether to normalize the data.

precision: bool, default False

Whether to use double precision. If True, use torch.FloatTensor; if False, use torch.DoubleTensor.

random_seed: int, default 42

Random seed.

norm_prod: str, default ‘paths’

Whether to use the norm of the product of paths, ‘none’ or ‘paths’. ‘paths’: use the norm; ‘none’: no norm.

square_prod: bool, default False

Whether to use the squared product of paths.

jac_thresh: bool, default True

Whether to get the average Jacobian with the trained model.

lambda_init: float, default 0.0

Initial value of the Lagrangian coefficient in the augmented Lagrangian optimization.

mu_init: float, default 0.001

Initial value of the penalty coefficient in the augmented Lagrangian optimization.

omega_lambda: float, default 0.0001

Tolerance on the change in lambda, used to find saddle points.

omega_mu: float, default 0.9

The constraint is considered to decrease sufficiently if it decreases by at least (1 - omega_mu) * h_prev.

stop_crit_win: int, default 100

Number of iterations for updating values.

edge_clamp_range: float, default 0.0001

Threshold for keeping an edge during training.

Examples

Load data

>>> from castle.algorithms import GraNDAG
>>> from castle.datasets import load_dataset
>>> data, true_dag, _ = load_dataset('IID_Test')
>>> gnd = GraNDAG(input_dim=data.shape[1])
>>> gnd.learn(data=data)

You can also print gnd.model.adjacency (a torch.Tensor) or gnd.causal_matrix (a numpy.ndarray).

>>> print(gnd.causal_matrix)
>>> print(gnd.model.adjacency)

learn(data, columns=None, **kwargs)[source]

Set up and run the GraN-DAG algorithm.

Parameters

data: numpy.ndarray or Tensor

include Tensor.data

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

MCSL

class castle.algorithms.MCSL(model_type='nn', num_hidden_layers=4, hidden_dim=16, graph_thresh=0.5, l1_graph_penalty=0.002, learning_rate=0.03, max_iter=25, iter_step=1000, init_iter=2, h_tol=1e-10, init_rho=1e-05, rho_thresh=100000000000000.0, h_thresh=0.25, rho_multiply=10, temperature=0.2, device_type='cpu', device_ids='0', random_seed=1230)[source]

Masked Gradient-Based Causal Structure Learning

A gradient-based algorithm for non-linear additive noise data by learning the binary adjacency matrix.

Parameters

model_type: str, default: ‘nn’

‘nn’ denotes neural network; ‘qr’ denotes quadratic regression.

num_hidden_layers: int, default: 4

Number of hidden layer in neural network when model_type is ‘nn’.

hidden_dim: int, default: 16

Number of hidden dimension in hidden layer, when model_type is ‘nn’.

graph_thresh: float, default: 0.5

Threshold used to decide whether the graph has an edge: an element greater than graph_thresh denotes a directed edge; otherwise there is no edge.

l1_graph_penalty: float, default: 2e-3

Penalty weight for L1 regularization.

learning_rate: float, default: 3e-2

Learning rate for the optimizer.

max_iter: int, default: 25

Number of iterations for optimization problem

iter_step: int, default: 1000

Number of steps for each iteration

init_iter: int, default: 2

Initial iteration to disallow early stopping

h_tol: float, default: 1e-10

Tolerance of optimization problem

init_rho: float, default: 1e-5

Initial value for penalty parameter.

rho_thresh: float, default: 1e14

Threshold for penalty parameter.

h_thresh: float, default: 0.25

Threshold for h

rho_multiply: float, default: 10.0

Multiplication to amplify rho each time

temperature: float, default: 0.2

Temperature for gumbel sigmoid

device_type: str, default: ‘cpu’

‘cpu’ or ‘gpu’

device_ids: int or str, default ‘0’

CUDA devices; effective only when device_type is ‘gpu’. For single-device modules, device_ids can be an int or str, e.g. 0 or ‘0’. For multi-device modules, device_ids must be a str in a format like ‘0, 1’.

random_seed: int, default: 1230

random seed for every random value
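
The temperature parameter refers to a Gumbel-sigmoid relaxation of binary edge indicators. The sketch below shows the generic technique for a single edge logit, using the fact that the difference of two Gumbel variables is a logistic variable. The function gumbel_sigmoid and the logit values are hypothetical, and the exact sampling code used by the library may differ.

```python
import math
import random


def gumbel_sigmoid(logit, temperature=0.2, rng=random):
    """Draw a relaxed Bernoulli sample in [0, 1] for one edge logit."""
    u = rng.random()
    u = min(max(u, 1e-12), 1 - 1e-12)               # keep log() finite
    logistic_noise = math.log(u) - math.log(1 - u)  # difference of two Gumbels
    return 1 / (1 + math.exp(-(logit + logistic_noise) / temperature))


rng = random.Random(0)
likely_edge = [gumbel_sigmoid(3.0, rng=rng) for _ in range(200)]
unlikely_edge = [gumbel_sigmoid(-3.0, rng=rng) for _ in range(200)]
print(sum(likely_edge) / 200, sum(unlikely_edge) / 200)
```

A small temperature such as the default 0.2 pushes samples close to 0 or 1, so the relaxed adjacency matrix behaves almost like a binary mask while remaining differentiable with respect to the logits.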

References

https://arxiv.org/abs/1910.08527

Examples

>>> from castle.algorithms import MCSL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> n = MCSL(iter_step=1000, rho_thresh=1e14, init_rho=1e-5,
...          rho_multiply=10, graph_thresh=0.5, l1_graph_penalty=2e-3)
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, columns=None, pns_mask=None, **kwargs) -> None[source]

Set up and run the MCSL algorithm.

Parameters

data: castle.Tensor or numpy.ndarray

The castle.Tensor or numpy.ndarray format data you want to learn.

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

pns_mask: array_like or None

The mask matrix, an array with elements in {0, 1}: 0 denotes that there is no edge i -> j; 1 denotes that an edge i -> j may or may not exist.

Notears

class castle.algorithms.Notears(lambda1=0.1, loss_type='l2', max_iter=100, h_tol=1e-08, rho_max=1e+16, w_threshold=0.3)[source]

Notears Algorithm. A gradient-based algorithm for linear data models (typically with least-squares loss).

Parameters

lambda1: float

L1 penalty parameter.

loss_type: str

Loss type, one of ‘l2’, ‘logistic’, ‘poisson’.

max_iter: int

Maximum number of dual ascent steps.

h_tol: float

Exit if |h(w_est)| <= h_tol.

rho_max: float

Exit if rho >= rho_max.

w_threshold: float

Drop edge if |weight| < threshold.

Attributes

causal_matrix: numpy.ndarray

Learned causal structure matrix.

References

https://arxiv.org/abs/1803.01422

Examples

>>> from castle.algorithms import Notears
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = Notears()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, columns=None, **kwargs)[source]

Set up and run the Notears algorithm.

Parameters

data: castle.Tensor or numpy.ndarray

The castle.Tensor or numpy.ndarray format data you want to learn.

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

notears_linear(X, lambda1, loss_type, max_iter, h_tol, rho_max)[source]

Solve min_W L(W; X) + lambda1 ‖W‖_1 s.t. h(W) = 0 using augmented Lagrangian.

Parameters

X: np.ndarray

n*d sample matrix

Returns

W_est: np.ndarray

d*d estimated DAG
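
The constraint h(W) = 0 in the problem above is the acyclicity condition of the referenced NOTEARS paper, h(W) = tr(exp(W ∘ W)) - d, where ∘ is the element-wise product. The following dependency-free sketch evaluates it with a truncated Taylor series; acyclicity and matmul are hypothetical helper names, and a production implementation would use a library matrix exponential instead.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]


def acyclicity(w, terms=30):
    """h(W) = tr(exp(W * W)) - d, with * the element-wise product."""
    d = len(w)
    m = [[v * v for v in row] for row in w]       # W ∘ W, non-negative entries
    power = [[float(i == j) for j in range(d)] for i in range(d)]  # M^0 = I
    trace_sum, factorial = float(d), 1.0          # k = 0 term: tr(I) = d
    for k in range(1, terms):
        power = matmul(power, m)                  # M^k
        factorial *= k
        trace_sum += sum(power[i][i] for i in range(d)) / factorial
    return trace_sum - d


dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -0.7],
       [0.0, 0.0, 0.0]]       # 0 -> 1 -> 2, no cycle
cyclic = [[0.0, 1.0],
          [1.0, 0.0]]         # 0 -> 1 -> 0
print(acyclicity(dag), acyclicity(cyclic))
```

For an acyclic graph every power of W ∘ W has zero trace, so h is exactly zero, while any cycle contributes a positive amount. The augmented Lagrangian iterations stop once |h| falls below h_tol or rho exceeds rho_max.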

NotearsLowRank

class castle.algorithms.NotearsLowRank(w_init=None, max_iter=15, h_tol=1e-06, rho_max=1e+20, w_threshold=0.3)[source]

NotearsLowRank Algorithm. Adapting NOTEARS for large problems with low-rank causal graphs.

Parameters

w_init: None or numpy.ndarray

Initialized weight matrix

max_iter: int

Maximum number of iterations

h_tol: float

exit if |h(w)| <= h_tol

rho_max: float

maximum for rho

w_threshold: float, default=0.3

Drop edge if |weight| < threshold

Attributes

causal_matrix: numpy.ndarray

Learned causal structure matrix.

References

https://arxiv.org/abs/2006.05691

Examples

>>> import numpy as np
>>> from castle.algorithms import NotearsLowRank
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> rank = np.linalg.matrix_rank(true_dag)
>>> n = NotearsLowRank()
>>> n.learn(X, rank=rank)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

learn(data, rank, columns=None, **kwargs)[source]

Set up and run the NotearsLowRank algorithm.

Parameters

data: castle.Tensor or numpy.ndarray

The castle.Tensor or numpy.ndarray format data you want to learn.

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

rank: int

The algebraic rank of the weighted adjacency matrix of a graph.

notears_low_rank(X, rank, w_init=None)[source]

Solve min_W ℓ(W; X) s.t. h(W) = 0 using the augmented Lagrangian.

Parameters

X: np.ndarray

[n, d] sample matrix.

max_iter: int

Maximum number of dual ascent steps.

rank: int

The rank of data.

w_init: None or numpy.ndarray

Initialized weight matrix

Returns

W_est: np.ndarray

estimate [d,d] dag matrix
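
The role of rank can be pictured through a low-rank factorization of the weighted adjacency matrix, W = U Vᵀ with U and V of shape (d, rank). This is a sketch of the general idea from the referenced paper; whether notears_low_rank uses exactly this parameterization is an assumption, and low_rank_weights and the factor values are hypothetical.

```python
def low_rank_weights(u, v):
    """W = U V^T for U, V of shape (d, rank); rank(W) <= rank."""
    d, r = len(u), len(u[0])
    return [[sum(u[i][k] * v[j][k] for k in range(r)) for j in range(d)]
            for i in range(d)]


# Made-up rank-1 factors for d = 3 variables.
u = [[1.0], [0.0], [2.0]]
v = [[0.0], [3.0], [1.0]]
w = low_rank_weights(u, v)
for row in w:
    print(row)
```

The factors hold 2 * d * rank numbers instead of d * d, which is the source of the savings on large graphs. A factorized W is not automatically acyclic, so the constraint h(W) = 0 is still needed.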

GOLEM

class castle.algorithms.GOLEM(B_init=None, lambda_1=0.02, lambda_2=5.0, equal_variances=True, learning_rate=0.001, num_iter=100000.0, checkpoint_iter=5000, seed=1, graph_thres=0.3, device_type='cpu', device_ids=0)[source]

GOLEM Algorithm. A more efficient version of NOTEARS that can reduce number of optimization iterations.

Parameters

B_init: None

File of weighted matrix for initialization. Set to None to disable.

lambda_1: float

Coefficient of L1 penalty.

lambda_2: float

Coefficient of DAG penalty.

equal_variances: bool

Assume equal noise variances for the likelihood objective.

learning_rate: float

Learning rate of Adam optimizer.

num_iter: float

Number of iterations for training.

checkpoint_iter: int

Number of iterations between each checkpoint. Set to None to disable.

seed: int

Random seed.

graph_thres: float

Threshold for weighted matrix.

device_type: str, default: ‘cpu’

‘cpu’ or ‘gpu’.

device_ids: int, default: 0

Which GPU to use.

Attributes

causal_matrix: numpy.ndarray

Learned causal structure matrix (binary)

weight_causal_matrix: numpy.ndarray

Learned causal structure matrix (weighted)
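
The relationship between the weighted and the binary matrix can be illustrated with a short sketch. It assumes that graph_thres is applied as a simple magnitude cut-off (edges with |weight| below the threshold are dropped); the library may apply further post-processing, and `threshold_weights` is a hypothetical helper, not a castle function.

```python
def threshold_weights(weights, thres=0.3):
    """Return a 0/1 adjacency matrix keeping entries with |weight| >= thres."""
    return [[1 if abs(w) >= thres else 0 for w in row] for row in weights]


# Toy weighted matrix standing in for weight_causal_matrix.
weighted = [[0.00, 0.85, -0.12],
            [0.05, 0.00, -1.40],
            [0.00, 0.29, 0.00]]

binary = threshold_weights(weighted, thres=0.3)
print(binary)  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```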

References

https://arxiv.org/abs/2006.10201

Examples

>>> from castle.algorithms import GOLEM
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, topology_matrix = load_dataset(name='IID_Test')
>>> n = GOLEM()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
learn(data, columns=None, **kwargs)[source]

Set up and run the GOLEM algorithm.

Parameters

data: castle.Tensor or numpy.ndarray

The castle.Tensor or numpy.ndarray format data you want to learn.

X: numpy.ndarray

[n, d] data matrix.

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

NotearsNonlinear

class castle.algorithms.NotearsNonlinear(lambda1: float = 0.01, lambda2: float = 0.01, max_iter: int = 100, h_tol: float = 1e-08, rho_max: float = 1e+16, w_threshold: float = 0.3, hidden_layers: tuple = (10, 1), expansions: int = 10, bias: bool = True, model_type: str = 'mlp', device_type: str = 'cpu', device_ids=None)[source]

NOTEARS Nonlinear, which includes the notears-mlp and notears-sob variants. A gradient-based algorithm that models nonlinear causal relationships with either neural networks or Sobolev-space basis expansions.

Parameters

lambda1: float

l1 penalty parameter

lambda2: float

l2 penalty parameter

max_iter: int

max num of dual ascent steps

h_tol: float

exit if |h(w_est)| <= h_tol

rho_max: float

exit if rho >= rho_max

w_threshold: float

drop edge if |weight| < threshold

hidden_layers: Iterable

Dimensions of the layers. The last element is the output dimension and must be 1, and at least 2 elements are required. For example, hidden_layers=(5, 10, 1) denotes two hidden layers with 5 and 10 units and an output layer with 1 unit. Only effective when model_type='mlp'.

expansions: int

Number of basis expansions per variable. Only effective when model_type='sob'.

bias: bool

Whether to use a bias term.

model_type: str

Which nonlinear model to use within the NOTEARS framework: 'mlp' for multilayer perceptrons or 'sob' for basis expansions.

device_type: str, default: cpu

cpu or gpu

device_ids: int or str, default None

CUDA devices. Only effective when device_type is 'gpu'. For single-device modules, device_ids can be an int or a str, e.g. 0 or '0'. For multi-device modules, device_ids must be a str, formatted like '0, 1'.
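
The constraints on hidden_layers described above can be checked before constructing the estimator. The following is a sketch only: `layer_shapes` is a hypothetical helper that mirrors the documented rules (at least two elements, last element equal to 1) and lists the resulting layer sizes for one variable. The actual network in castle may wire its first layer differently.

```python
def layer_shapes(input_dim, hidden_layers=(10, 1)):
    """Return (in_features, out_features) pairs implied by hidden_layers.

    Hypothetical helper: it only mirrors the documented constraints on
    hidden_layers and is not part of castle.
    """
    if len(hidden_layers) < 2:
        raise ValueError("hidden_layers needs at least 2 elements")
    if hidden_layers[-1] != 1:
        raise ValueError("the last element of hidden_layers must be 1")
    dims = [input_dim, *hidden_layers]
    return list(zip(dims[:-1], dims[1:]))


shapes = layer_shapes(input_dim=4, hidden_layers=(5, 10, 1))
print(shapes)  # [(4, 5), (5, 10), (10, 1)]
```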

Attributes

causal_matrix: numpy.ndarray

Learned causal structure matrix

References

https://arxiv.org/abs/1909.13189

Examples

>>> from castle.algorithms import NotearsNonlinear
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = NotearsNonlinear()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
dual_ascent_step(model, X)[source]

Perform one step of dual ascent in augmented Lagrangian.

Parameters

model: nn.Module

network model

X: torch.Tensor

sample data

Returns

:tuple

cycle control parameter
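
How max_iter, h_tol and rho_max interact in an augmented Lagrangian loop can be seen on a toy problem. The sketch below is not the library's implementation: it minimizes (x − 2)² subject to h(x) = x − 1 = 0, solves each subproblem in closed form, and follows the commonly used NOTEARS-style schedule (multiply rho by 10 until the violation shrinks to a quarter of its previous value), which is an assumption here. All function names are illustrative.

```python
import math


def h(x):
    """Toy equality constraint h(x) = x - 1."""
    return x - 1.0


def solve_subproblem(alpha, rho):
    """Minimise (x - 2)^2 + alpha * h(x) + 0.5 * rho * h(x)^2 in closed form."""
    return (4.0 - alpha + rho) / (2.0 + rho)


def augmented_lagrangian(max_iter=100, h_tol=1e-8, rho_max=1e16):
    """Run dual ascent steps until the constraint is met or rho saturates."""
    rho, alpha = 1.0, 0.0
    h_old, h_new = math.inf, math.inf
    x = 0.0
    for _ in range(max_iter):
        # One dual ascent step: raise rho until the violation shrinks enough.
        while rho < rho_max:
            x = solve_subproblem(alpha, rho)
            h_new = abs(h(x))
            if h_new > 0.25 * h_old:
                rho *= 10.0
            else:
                break
        alpha += rho * h(x)  # multiplier (dual variable) update
        h_old = h_new
        if h_old <= h_tol or rho >= rho_max:
            break
    return x, alpha, rho


x_opt, alpha_opt, rho_final = augmented_lagrangian()
```

On this toy problem the iterate converges to the constrained optimum x = 1 with multiplier 2, and rho stops growing once each step reduces the violation by more than the required factor.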

get_model(input_dim)[source]

Choose a different model.

Parameters

input_dim: int

Number of input data dimensions.

Returns

learn(data, columns=None, **kwargs)[source]

Set up and run the NotearsNonlinear algorithm.

Parameters

data: castle.Tensor or numpy.ndarray

The castle.Tensor or numpy.ndarray format data you want to learn.

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

notears_nonlinear(model: Module, X: ndarray)[source]

Entry point of the NOTEARS framework.

Parameters

model: nn.Module

network model

X: castle.Tensor or numpy.ndarray

sample data

Returns

:tuple

Coefficients of the predicted graph matrix.

PNL

class castle.algorithms.PNL(hidden_layers=1, hidden_units=10, batch_size=64, epochs=100, lr=0.0001, alpha=0.01, bias=True, activation=LeakyReLU(negative_slope=0.01), device_type='cpu', device_ids=None)[source]

On the Identifiability of the Post-Nonlinear Causal Model

References

https://arxiv.org/ftp/arxiv/papers/1205/1205.2599.pdf

Parameters

hidden_layers: int

Number of hidden layers of the MLP.

hidden_units: int

Number of units per hidden layer.

batch_size: int

size of training batch

epochs: int

Number of training passes over all samples.

lr: float

learning rate

alpha: float

significance level

bias: bool

Whether to use a bias term.

activation: callable

nonlinear activation function

device_type: str

‘cpu’ or ‘gpu’, default: ‘cpu’

device_ids: int or str

Which GPU(s) to use, e.g. 0 or '0,1'.

Examples

>>> from castle.algorithms.gradient.pnl.torch import PNL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = PNL()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)

RL

class castle.algorithms.RL(encoder_type='TransformerEncoder', hidden_dim=64, num_heads=16, num_stacks=6, residual=False, decoder_type='SingleLayerDecoder', decoder_activation='tanh', decoder_hidden_dim=16, use_bias=False, use_bias_constant=False, bias_initial_value=False, batch_size=64, input_dimension=64, normalize=False, transpose=False, score_type='BIC', reg_type='LR', lambda_iter_num=1000, lambda_flag_default=True, score_bd_tight=False, lambda2_update=10, score_lower=0.0, score_upper=0.0, seed=8, nb_epoch=20000, lr1_start=0.001, lr1_decay_step=5000, lr1_decay_rate=0.96, alpha=0.99, init_baseline=-1.0, l1_graph_reg=0.0, verbose=False, device_type='cpu', device_ids=0)[source]

RL Algorithm. An RL-based algorithm that can work with flexible score functions (including non-smooth ones).

Parameters

encoder_type: str

type of encoder used

hidden_dim: int

actor LSTM num_neurons

num_heads: int

actor input embedding

num_stacks: int

actor LSTM num_neurons

residual: bool

whether to use residual connections for the GAT encoder

decoder_type: str

type of decoder used

decoder_activation: str

activation for decoder

decoder_hidden_dim: int

hidden dimension for decoder

use_bias: bool

Whether to add bias term when calculating decoder logits

use_bias_constant: bool

Whether to add bias term as CONSTANT when calculating decoder logits

bias_initial_value: float

Initial value for bias term when calculating decoder logits

batch_size: int

batch size for training

input_dimension: int

dimension of reshaped vector

normalize: bool

whether the input data shall be normalized

transpose: bool

whether the true graph needs to be transposed

score_type: str

score functions

reg_type: str

regressor type (in combination with score_type)

lambda_iter_num: int

how often to update lambdas

lambda_flag_default: bool

how to set the lambda parameters; if True, use the default strategy and ignore the input bounds

score_bd_tight: bool

if bound is tight, then simply use a fixed value, rather than the adaptive one

lambda1_update: float

increasing additive lambda1

lambda2_update: float

increasing multiplying lambda2

score_lower: float

lower bound on lambda1

score_upper: float

upper bound on lambda1

lambda2_lower: float

lower bound on lambda2

lambda2_upper: float

upper bound on lambda2

seed: int

random seed

nb_epoch: int

number of training epochs

lr1_start: float

actor learning rate

lr1_decay_step: int

lr1 decay step

lr1_decay_rate: float

lr1 decay rate

alpha: float

update factor moving average baseline

init_baseline: float

initial baseline - REINFORCE

temperature: float

pointer_net initial temperature

C: float

pointer_net tan clipping

l1_graph_reg: float

L1 graph regularization to encourage sparsity

inference_mode: bool

switch to inference mode when model is trained

verbose: bool

print detailed logging or not

device_type: str

device to run on, 'cpu' or 'gpu'

device_ids: int

choose which gpu to use
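
The alpha and init_baseline parameters describe a moving-average baseline for REINFORCE. As a minimal sketch, assuming the usual exponential moving average of the mean batch reward (the exact update used by the library is not shown in this document), the baseline could evolve as follows. `update_baseline` and the reward values are illustrative only.

```python
def update_baseline(baseline, rewards, alpha=0.99):
    """Exponential moving average of the mean batch reward."""
    reward_mean = sum(rewards) / len(rewards)
    return alpha * baseline + (1.0 - alpha) * reward_mean


baseline = -1.0  # plays the role of init_baseline
history = []
for batch_rewards in ([-2.0, -4.0], [-3.0, -3.0]):
    # The advantage of each sample is its reward minus the current baseline.
    advantages = [r - baseline for r in batch_rewards]
    baseline = update_baseline(baseline, batch_rewards, alpha=0.99)
    history.append(baseline)
```

With alpha close to 1 the baseline moves slowly, which reduces the variance of the policy-gradient estimate at the cost of reacting more slowly to changes in the reward scale.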

Attributes

causal_matrix: numpy.ndarray

Learned causal structure matrix

References

https://arxiv.org/abs/1906.04477

Examples

>>> from castle.algorithms import RL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = RL()
>>> n.learn(X, dag=true_dag)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
learn(data, columns=None, dag=None, **kwargs)[source]

Set up and run the RL algorithm.

Parameters

data: castle.Tensor or numpy.ndarray

The castle.Tensor or numpy.ndarray format data you want to learn.

columns: Index or array-like

Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

dag: ndarray

two-dimensional, prior matrix