algorithms
ANMNonlinear
- class castle.algorithms.ANMNonlinear(alpha=0.05)[source]
Nonlinear causal discovery with additive noise models
Use GPML with Gaussian kernel and independent Gaussian noise, optimizing the hyper-parameters for each regression individually. For the independence test, we implemented the HSIC with a Gaussian kernel, where we used the gamma distribution as an approximation for the distribution of the HSIC under the null hypothesis of independence in order to calculate the p-value of the test result.
References
Hoyer, Patrik O and Janzing, Dominik and Mooij, Joris M and Peters, Jonas and Schölkopf, Bernhard, “Nonlinear causal discovery with additive noise models”, NIPS 2009
Parameters
- alpha: float, default 0.05
Significance level used to compute the threshold.
Attributes
- causal_matrix: array-like, shape (n_features, n_features)
Learned causal structure matrix.
Examples
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import DAG, IIDSimulation
>>> from castle.algorithms.anm import ANMNonlinear
>>> weighted_random_dag = DAG.erdos_renyi(n_nodes=6, n_edges=10,
...                                       weight_range=(0.5, 2.0), seed=1)
>>> dataset = IIDSimulation(W=weighted_random_dag, n=1000,
...                         method='nonlinear', sem_type='gp-add')
>>> true_dag, X = dataset.B, dataset.X
>>> anm = ANMNonlinear(alpha=0.05)
>>> anm.learn(data=X)
>>> # plot predict_dag and true_dag
>>> GraphDAG(anm.causal_matrix, true_dag, show=False, save_name='result')
You can also pass more parameters to learn, as in the following:
>>> from sklearn.gaussian_process.kernels import Matern, RBF
>>> from castle.algorithms.anm._anm import GPR
>>> kernel = Matern(nu=1.5)
>>> # kernel = 1.0 * RBF(1.0)
>>> anm = ANMNonlinear(alpha=0.05)
>>> anm.learn(data=X, regressor=GPR(kernel=kernel))
>>> # plot predict_dag and true_dag
>>> GraphDAG(anm.causal_matrix, true_dag, show=False, save_name='result')
- anm_estimate(x, y, regressor=<castle.algorithms.anm._anm.GPR object>, test_method=<function hsic_test>)[source]
Compute the fitness score of the ANM model in the x->y direction.
Parameters
- x: array
Variable seen as cause
- y: array
Variable seen as effect
- regressor: Class
Nonlinear regression estimator. Defaults to GPR. A user-defined regressor must implement an estimate method, such as:
regressor.estimate(x, y)
- test_method: callable, default hsic_test
Independence test method. Defaults to HSIC. A user-defined test must accept the positional arguments x and y and the keyword argument alpha, such as:
test_method(x, y, alpha=0.05)
Returns
- out: int, 0 or 1
If 1, the residuals n are independent of x, so x -> y is accepted.
If 0, the residuals n are not independent of x, so x -> y is rejected.
Examples
>>> import numpy as np
>>> from castle.algorithms.anm import ANMNonlinear
>>> np.random.seed(1)
>>> x = np.random.rand(500, 2)
>>> anm = ANMNonlinear(alpha=0.05)
>>> print(anm.anm_estimate(x[:, [0]], x[:, [1]]))
1
- learn(data, columns=None, regressor=<castle.algorithms.anm._anm.GPR object>, test_method=<function hsic_test>, **kwargs)[source]
Set up and run the ANM_Nonlinear algorithm.
Parameters
- data: numpy.ndarray or Tensor
Training data.
- columns: Index or array-like
Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
- regressor: Class
Nonlinear regression estimator. Defaults to GPR. A user-defined regressor must implement an estimate method, such as:
regressor.estimate(x, y)
- test_method: callable, default hsic_test
Independence test method. Defaults to HSIC. A user-defined test must accept the positional arguments x and y and the keyword argument alpha, such as:
test_method(x, y, alpha=0.05)
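The regressor and test_method hooks described above can be sketched with the standard library alone. This is a toy illustration of the call shapes, not castle code: MeanRegressor, linear_test and toy_anm_estimate are hypothetical names, plain Python lists stand in for arrays, and it is an assumption that estimate returns fitted values and that the test returns 1 for "independent" and 0 otherwise.

```python
import math


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences of floats."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


class MeanRegressor:
    """Hypothetical regressor exposing the required estimate(x, y) method."""

    def estimate(self, x, y):
        # Assumption: estimate returns one fitted value per sample.
        # This toy model ignores x and always predicts the mean of y.
        mean_y = sum(y) / len(y)
        return [mean_y] * len(y)


def linear_test(x, y, alpha=0.05):
    """Hypothetical test_method: 1 if x and y look independent, else 0.

    Uses a Fisher z-test on the Pearson correlation, so it only detects
    linear dependence (HSIC, the documented default, is more general).
    """
    r = max(min(pearson_r(x, y), 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return 1 if p_value > alpha else 0


def toy_anm_estimate(x, y, regressor, test_method, alpha=0.05):
    """Fit y from x, then test whether the residuals are independent of x."""
    fitted = regressor.estimate(x, y)
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return test_method(residuals, x, alpha=alpha)


x = [float(i) for i in range(1, 21)]
y_dependent = [2.0 * v for v in x]  # residuals of the mean model still track x
y_unrelated = [1.0 if i % 2 == 0 else -1.0 for i in range(1, 21)]

flag_dependent = toy_anm_estimate(x, y_dependent, MeanRegressor(), linear_test)
flag_unrelated = toy_anm_estimate(x, y_unrelated, MeanRegressor(), linear_test)
```

A real replacement would operate on numpy arrays and use a regressor flexible enough to capture the nonlinear relationship; the sketch only shows where each hook is called.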
GES
- class castle.algorithms.GES(criterion='bic', method='scatter', k=0.001, N=10)[source]
Greedy equivalence search for causal discovery
References
[1]: https://www.sciencedirect.com/science/article/pii/S0888613X12001636 [2]: https://www.jmlr.org/papers/volume3/chickering02b/chickering02b.pdf
Parameters
- criterion: str or DecomposableScore object
Scoring criterion, one of [‘bic’, ‘bdeu’].
- Notes:
1. ‘bdeu’ applies to discrete variables only.
2. To customize the criterion, create a class that inherits from the base class DecomposableScore in module ges.score.local_scores.
- method: str
effective when criterion=’bic’, one of [‘r2’, ‘scatter’].
- k: float, default: 0.001
structure prior, effective when criterion=’bdeu’.
- N: int, default: 10
prior equivalent sample size, effective when criterion=’bdeu’
Examples
>>> from castle.algorithms import GES
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import load_dataset
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> algo = GES()
>>> algo.learn(X)
>>> GraphDAG(algo.causal_matrix, true_dag, save_name='result_pc')
>>> met = MetricsDAG(algo.causal_matrix, true_dag)
>>> print(met.metrics)
DirectLiNGAM
- class castle.algorithms.DirectLiNGAM(prior_knowledge=None, measure='pwling', thresh=0.3)[source]
DirectLiNGAM Algorithm. A direct learning algorithm for the linear non-Gaussian acyclic model (LiNGAM). Implementation of the DirectLiNGAM algorithm [1] [2]. Constructs a DirectLiNGAM model.
Parameters
- prior_knowledge: array-like, shape (n_features, n_features), optional (default=None)
Prior knowledge used for causal discovery, where n_features is the number of features. The elements of the prior knowledge matrix are defined as follows [1]:
0: \(x_i\) does not have a directed path to \(x_j\)
1: \(x_i\) has a directed path to \(x_j\)
-1: No prior knowledge is available to know if either of the two cases above (0 or 1) is true.
- measure: {‘pwling’, ‘kernel’}, default=’pwling’
Measure to evaluate independence: ‘pwling’ [2] or ‘kernel’ [1].
- thresh: float, default=0.3
Drop edge if |weight| < thresh
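The 0 / 1 / -1 convention for the prior knowledge matrix can be illustrated with a small helper. The function make_prior_knowledge is hypothetical and not part of castle; it builds a nested list, and the diagonal is left at -1 because the text above does not say how diagonal entries are treated.

```python
def make_prior_knowledge(n_features, known_paths=(), forbidden_paths=()):
    """Build an n_features x n_features prior-knowledge matrix.

    Entry [i][j] follows the documented convention:
    1 = x_i has a directed path to x_j, 0 = it does not, -1 = unknown.
    """
    # Start from "no prior knowledge" everywhere.
    matrix = [[-1] * n_features for _ in range(n_features)]
    for i, j in known_paths:
        matrix[i][j] = 1
    for i, j in forbidden_paths:
        matrix[i][j] = 0
    return matrix


# Three variables: x_0 is known to reach x_1, and x_2 is known not to reach x_0.
prior = make_prior_knowledge(3, known_paths=[(0, 1)], forbidden_paths=[(2, 0)])
```

The resulting nested list has the documented shape (n_features, n_features) and could be passed as the prior_knowledge argument.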
Attributes
- causal_matrix: numpy.ndarray
Learned causal structure matrix.
- weight_causal_matrix: numpy.ndarray
Learned weighted causal structure matrix.
References
Examples
>>> from castle.algorithms import DirectLiNGAM
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> n = DirectLiNGAM()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
- fit(X)[source]
Fit the model to X.
Parameters
- X: array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
Returns
- self: object
Returns the instance itself.
- learn(data, columns=None, **kwargs)[source]
Set up and run the DirectLiNGAM algorithm.
Parameters
- data: castle.Tensor or numpy.ndarray
The castle.Tensor or numpy.ndarray format data you want to learn.
- columns: Index or array-like
Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
ICALiNGAM
- class castle.algorithms.ICALiNGAM(random_state=None, max_iter=1000, thresh=0.3)[source]
ICALiNGAM Algorithm. An ICA-based learning algorithm for the linear non-Gaussian acyclic model (LiNGAM). Implementation of the ICA-based LiNGAM algorithm [1]. Constructs an ICA-based LiNGAM model.
Parameters
- random_state: int, optional (default=None)
random_state is the seed used by the random number generator.
- max_iter: int, optional (default=1000)
The maximum number of iterations of FastICA.
- thresh: float, default=0.3
Drop edge if |weight| < thresh
Attributes
- causal_matrix: numpy.ndarray
Learned causal structure matrix
- weight_causal_matrix: numpy.ndarray
Learned weighted causal structure matrix.
References
[1] S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. J. Kerminen. A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003-2030, 2006.
Examples
>>> from castle.algorithms import ICALiNGAM
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> n = ICALiNGAM()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
PC
- class castle.algorithms.PC(variant='original', alpha=0.05, ci_test='fisherz', priori_knowledge=None)[source]
PC algorithm
A classic causal discovery algorithm based on conditional independence tests.
References
- [1] original-PC
https://www.jmlr.org/papers/volume8/kalisch07a/kalisch07a.pdf
- [2] stable-PC
- [3] parallel-PC
Parameters
- variant: str
A variant of PC-algorithm, one of [original, stable, parallel].
- alpha: float, default 0.05
Significance level.
- ci_test: str or callable
Conditional independence test method. If a str, it must be one of [fisherz, g2, chi2]. See castle.common.independence_tests.CITest for details.
- priori_knowledge: PrioriKnowledge
A PrioriKnowledge object.
Attributes
- causal_matrix: array
Learned causal structure matrix.
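The option ci_test='fisherz' names the Fisher z-test. As a rough standard-library sketch of that statistic for a single conditioning variable (via partial correlation), consider the following. It illustrates the test only and is not the code in castle.common.independence_tests; the function names and the toy data are assumptions made for this example.

```python
import math


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences of floats."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def fisher_z_pvalue(x, y, z):
    """p-value for H0: x is independent of y given one conditioning variable z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    # Partial correlation of x and y after removing the linear effect of z.
    partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    partial = max(min(partial, 0.999999), -0.999999)  # keep atanh finite
    n, cond_size = len(x), 1
    stat = math.sqrt(n - cond_size - 3) * abs(math.atanh(partial))
    return math.erfc(stat / math.sqrt(2))  # two-sided normal p-value


# Toy data: z is a common cause of x and y; e1 and e2 are deterministic "noise".
z = [float(i) for i in range(40)]
e1 = [1.0 if i % 2 == 0 else -1.0 for i in range(40)]
e2 = [1.0 if (i // 2) % 2 == 0 else -1.0 for i in range(40)]
x = [a + b for a, b in zip(z, e1)]
y = [a + b for a, b in zip(z, e2)]

p_given_z = fisher_z_pvalue(x, y, z)    # conditioning on the common cause
p_given_e1 = fisher_z_pvalue(x, y, e1)  # conditioning on an irrelevant variable
```

With alpha=0.05, a p-value above alpha means the independence hypothesis is not rejected, so the edge between x and y would be a candidate for removal once z is in the conditioning set.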
Examples
>>> from castle.algorithms import PC
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import load_dataset
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> pc = PC(variant='stable')
>>> pc.learn(X)
>>> GraphDAG(pc.causal_matrix, true_dag, save_name='result_pc')
>>> met = MetricsDAG(pc.causal_matrix, true_dag)
>>> print(met.metrics)
>>> pc = PC(variant='parallel')
>>> pc.learn(X, p_cores=2)
>>> GraphDAG(pc.causal_matrix, true_dag, save_name='result_pc')
>>> met = MetricsDAG(pc.causal_matrix, true_dag)
>>> print(met.metrics)
- learn(data, columns=None, **kwargs)[source]
Set up and run the PC algorithm.
Parameters
- data: array or Tensor
Training data
- columns: Index or array-like
Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
- kwargs: [optional]
- p_cores: int
Number of CPU cores to use.
- s: boolean
Memory-efficient indicator.
- batch: int
Number of edges per batch.
If s is None or False, or if batch is not given, batch=|J|, where |J| denotes the number of all pairs of adjacent vertices (X, Y) in G.
TTPM
- class castle.algorithms.TTPM(topology_matrix, delta=0.1, epsilon=1, max_hop=0, penalty='BIC', max_iter=20, priori_knowledge=None)[source]
TTPM Algorithm.
A causal structure learning algorithm based on the topological Hawkes process for spatio-temporal event sequences.
Parameters
- topology_matrix: np.matrix
Interpreted as an adjacency matrix to generate the graph. It should have two dimensions, and should be square.
- delta: float, default=0.1
Time decaying coefficient for the exponential kernel.
- epsilon: int, default=1
BIC penalty coefficient.
- max_hop: int, default=0
The maximum number of hops considered in the topology. When max_hop=0, it is divided by nodes, regardless of topology.
- penalty: str, default=’BIC’
Two optional values: ‘BIC’ or ‘AIC’.
- max_iter: int, default=20
Maximum number of iterations.
- priori_knowledge: PrioriKnowledge, default=None
a class object PrioriKnowledge
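To make the idea of a hop limit on a topology matrix concrete, the sketch below lists the nodes reachable from a starting node within max_hop hops, using a breadth-first search over a square adjacency matrix. It is an illustration only: nodes_within_hops is a hypothetical helper, and nothing here is taken from TTPM's internals.

```python
from collections import deque


def nodes_within_hops(topology, start, max_hop):
    """Return the set of nodes reachable from `start` in at most `max_hop` hops."""
    distance = {start: 0}  # node -> hop count from start
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hop:
            continue  # do not expand beyond the hop limit
        for neighbor, connected in enumerate(topology[node]):
            if connected and neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


# A 4-node path topology: 0 - 1 - 2 - 3 (square, symmetric adjacency matrix).
topology = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

hop0 = nodes_within_hops(topology, start=0, max_hop=0)  # only the node itself
hop2 = nodes_within_hops(topology, start=0, max_hop=2)  # nodes 0, 1 and 2
```

With max_hop=0 every node only sees itself, which matches the description above that the topology is then disregarded.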
Examples
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> from castle.datasets import load_dataset
>>> from castle.algorithms import TTPM
>>> # Data Simulation for TTPM
>>> X, true_causal_matrix, topology_matrix = load_dataset('THP_Test')
>>> ttpm = TTPM(topology_matrix, max_hop=2)
>>> ttpm.learn(X)
>>> causal_matrix = ttpm.causal_matrix
>>> # plot est_dag and true_dag
>>> GraphDAG(ttpm.causal_matrix, true_causal_matrix)
>>> # calculate accuracy
>>> ret_metrix = MetricsDAG(ttpm.causal_matrix, true_causal_matrix)
>>> ret_metrix.metrics
- learn(tensor, *args, **kwargs)[source]
Set up and run the TTPM algorithm.
Parameters
- tensor: pandas.DataFrame
(V 1.0.0; this constraint will be removed in the next version.) The tensor is expected to contain three columns:
[‘event’, ‘timestamp’, ‘node’]
- Description of the three columns:
event: event name (type).
timestamp: occurrence timestamp of the event, e.g. ‘1615962101.0’.
node: topological node where the event happened.
CORL
- class castle.algorithms.CORL(batch_size=64, input_dim=100, embed_dim=256, normalize=False, encoder_name='transformer', encoder_heads=8, encoder_blocks=3, encoder_dropout_rate=0.1, decoder_name='lstm', reward_mode='episodic', reward_score_type='BIC', reward_regression_type='LR', reward_gpr_alpha=1.0, iteration=10000, lambda_iter_num=500, actor_lr=0.0001, critic_lr=0.001, alpha=0.99, init_baseline=-1.0, random_seed=0, device_type='cpu', device_ids=0)[source]
Causal discovery with Ordering-based Reinforcement Learning
An RL- and order-based algorithm that improves the efficiency and scalability of the previous RL-based approach. It contains CORL1 with the episodic reward type and CORL2 with the dense reward type.
References
Parameters
- batch_size: int, default: 64
training batch size
- input_dim: int, default: 100
dimension of input data
- embed_dim: int, default: 256
dimension of embedding layer output
- normalize: bool, default: False
whether to normalize the input data
- encoder_name: str, default: ‘transformer’
Encoder name, must be one of [‘transformer’, ‘lstm’, ‘mlp’]
- encoder_heads: int, default: 8
number of multi-head of transformer Encoder.
- encoder_blocks: int, default: 3
blocks number of Encoder
- encoder_dropout_rate: float, default: 0.1
dropout rate for encoder
- decoder_name: str, default: ‘lstm’
Decoder name, must be one of [‘lstm’, ‘mlp’]
- reward_mode: str, default: ‘episodic’
reward mode, ‘episodic’ or ‘dense’; ‘episodic’ denotes episodic-reward, ‘dense’ denotes dense-reward.
- reward_score_type: str, default: ‘BIC’
type of score function
- reward_regression_type: str, default: ‘LR’
type of regression function, must be one of [‘LR’, ‘QR’]
- reward_gpr_alpha: float, default: 1.0
alpha of GPR
- iteration: int, default: 10000
number of training iterations
- actor_lr: float, default: 1e-4
learning rate of Actor network, includes encoder and decoder.
- critic_lr: float, default: 1e-3
learning rate of Critic network
- alpha: float, default: 0.99
alpha for score function, includes dense_actor_loss and dense_critic_loss.
- init_baseline: float, default: -1.0
initialization baseline for score function, includes dense_actor_loss and dense_critic_loss.
- random_seed: int, default: 0
random seed for all random processes
- device_type: str, default: cpu
cpu or gpu
- device_ids: int or str, default: 0
CUDA devices, effective when use_gpu is True. For single-device modules, device_ids can be int or str, e.g. 0 or ‘0’. For multi-device modules, device_ids must be str, formatted like ‘0, 1’.
Examples
>>> from castle.algorithms.gradient.corl.torch import CORL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = CORL()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
- learn(data, columns=None, **kwargs) -> None[source]
Set up and run the Causal discovery with Ordering-based Reinforcement Learning algorithm.
Parameters
- data: castle.Tensor or numpy.ndarray
The castle.Tensor or numpy.ndarray format data you want to learn.
- columns: Index or array-like
Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
- Other Parameters:
- dag_mask: ndarray
Two-dimensional array with elements in {0, 1}, shape = [n_nodes, n_nodes]. An element 0 at (i, j) denotes that there must be no edge between nodes i and j; an element 1 indicates that there may or may not be an edge.
DAG_GNN
- class castle.algorithms.DAG_GNN(encoder_type='mlp', decoder_type='mlp', encoder_hidden=64, latent_dim=None, decoder_hidden=64, encoder_dropout=0.0, decoder_dropout=0.0, epochs=300, k_max_iter=100.0, tau_a=0.0, batch_size=100, lr=0.003, lr_decay=200, gamma=1.0, init_lambda_a=0.0, init_c_a=1.0, c_a_thresh=1e+20, eta=10, multiply_h=0.25, h_tolerance=1e-08, use_a_connect_loss=False, use_a_positiver_loss=False, graph_threshold=0.3, optimizer='adam', seed=42, device_type='cpu', device_ids='0')[source]
DAG Structure Learning with Graph Neural Networks
References
Parameters
- encoder_type: str, default: ‘mlp’
choose an encoder, ‘mlp’ or ‘sem’.
- decoder_type: str, default: ‘mlp’
choose a decoder, ‘mlp’ or ‘sem’.
- encoder_hidden: int, default: 64
MLP encoder hidden layer dimension, just one hidden layer.
- latent_dim: int, default equal to input dimension
encoder output dimension
- decoder_hidden: int, default: 64
MLP decoder hidden layer dimension, just one hidden layer.
- encoder_dropout: float, default: 0.0
Dropout rate (1 - keep probability).
- decoder_dropout: float, default: 0.0
Dropout rate (1 - keep probability).
- epochs: int, default: 300
train epochs
- k_max_iter: int, default: 1e2
the max iteration number for searching lambda and c.
- batch_size: int, default: 100
Sample size of each training batch
- lr: float, default: 3e-3
learning rate
- lr_decay: int, default: 200
Period of learning rate decay.
- gamma: float, default: 1.0
Multiplicative factor of learning rate decay.
- init_lambda_a: float, default: 0.0
initial coefficient for DAG constraint h(A).
- init_c_a: float, default: 1.0
initial coefficient for absolute value h(A).
- c_a_thresh: float, default: 1e20
control loop by c_a
- eta: int, default: 10
used to update c_a; must be greater than or equal to 1.
- multiply_h: float, default: 0.25
used to judge whether to update c_a.
- tau_a: float, default: 0.0
coefficient for L-1 norm of A.
- h_tolerance: float, default: 1e-8
the tolerance of error of h(A) to zero.
- use_a_connect_loss: bool, default: False
flag to use A connect loss
- use_a_positiver_loss: bool, default: False
flag to enforce A must have positive values
- graph_threshold: float, default: 0.3
Threshold for binarizing the learned adjacency matrix. Values greater than or equal to graph_threshold denote a causal relationship.
- optimizer: str, default: ‘Adam’
choose optimizer, ‘Adam’ or ‘SGD’
- seed: int, default: 42
random seed
- device_type: str, default: cpu
cpu or gpu
- device_ids: int or str, default: ‘0’
CUDA devices, effective when use_gpu is True. For single-device modules, device_ids can be int or str, e.g. 0 or ‘0’. For multi-device modules, device_ids must be str, formatted like ‘0, 1’.
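The parameter descriptions above refer to a DAG constraint h(A) without defining it. In the DAG-GNN paper the constraint is h(A) = tr[(I + αA∘A)^d] − d, which is zero exactly when A is acyclic. The sketch below evaluates it on small nested-list matrices, assuming α = 1/d; the function names are hypothetical and this is not castle's implementation.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dag_constraint(adj):
    """Evaluate h(A) = tr[(I + A∘A / d)^d] - d for a d x d weighted matrix."""
    d = len(adj)
    # base = I + (elementwise square of A) / d
    base = [[(1.0 if i == j else 0.0) + adj[i][j] ** 2 / d for j in range(d)]
            for i in range(d)]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(d):
        power = matmul(power, base)
    return sum(power[i][i] for i in range(d)) - d


acyclic = [[0.0, 0.8], [0.0, 0.0]]  # single edge 0 -> 1
cyclic = [[0.0, 1.0], [1.0, 0.0]]   # edges 0 -> 1 and 1 -> 0

h_acyclic = dag_constraint(acyclic)
h_cyclic = dag_constraint(cyclic)
```

The acyclic matrix gives h = 0 and the two-node cycle gives h = 0.5, which is the quantity that h_tolerance compares against zero.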
Examples
>>> from castle.algorithms.gradient.dag_gnn.torch import DAG_GNN
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> m = DAG_GNN()
>>> m.learn(X)
>>> GraphDAG(m.causal_matrix, true_dag)
>>> met = MetricsDAG(m.causal_matrix, true_dag)
>>> print(met.metrics)
GAE
- class castle.algorithms.GAE(input_dim=1, hidden_layers=1, hidden_dim=4, activation=LeakyReLU(negative_slope=0.05), epochs=10, update_freq=3000, init_iter=3, lr=0.001, alpha=0.0, beta=2.0, init_rho=1.0, rho_thresh=1e+30, gamma=0.25, penalty_lambda=0.0, h_thresh=1e-08, graph_thresh=0.3, early_stopping=False, early_stopping_thresh=1.0, seed=1230, device_type='cpu', device_ids='0')[source]
GAE Algorithm. A gradient-based algorithm using graph autoencoder to model non-linear causal relationships.
Parameters
- input_dim: int, default: 1
dimension of vector for x
- hidden_layers: int, default: 1
number of hidden layers for encoder and decoder
- hidden_dim: int, default: 4
hidden size for mlp layer
- activation: callable, default: nn.LeakyReLU(0.05)
nonlinear functional
- epochs: int, default: 10
Number of iterations for optimization problem
- update_freq: int, default: 3000
Number of steps for each iteration
- init_iter: int, default: 3
Initial iteration to disallow early stopping
- lr: float, default: 1e-3
learning rate
- alpha: float, default: 0.0
Lagrange multiplier
- beta: float, default: 2.0
Multiplication to amplify rho each time
- init_rho: float, default: 1.0
Initial value for rho
- rho_thresh: float, default: 1e30
Threshold for rho
- gamma: float, default: 0.25
Threshold for h
- penalty_lambda: float, default: 0.0
L1 penalty for sparse graph. Set to 0.0 to disable
- h_thresh: float, default: 1e-8
Tolerance of optimization problem
- graph_thresh: float, default: 0.3
Threshold to filter out small values in the graph
- early_stopping: bool, default: False
Whether to use early stopping
- early_stopping_thresh: float, default: 1.0
Threshold ratio for early stopping
- seed: int, default: 1230
Reproducibility, must be int
- device_type: str, default: ‘cpu’
‘cpu’ or ‘gpu’
- device_ids: int or str, default ‘0’
CUDA devices, effective when use_gpu is True. For single-device modules, device_ids can be int or str, e.g. 0 or ‘0’. For multi-device modules, device_ids must be str, formatted like ‘0, 1’.
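The parameters alpha, beta, gamma, init_rho and rho_thresh read like the pieces of a standard augmented Lagrangian schedule. The sketch below shows one common form of that outer-loop update; it is an assumption about how the parameters interact, not code taken from GAE, and update_multipliers is a hypothetical name.

```python
def update_multipliers(h_new, h_old, alpha, rho,
                       beta=2.0, gamma=0.25, rho_thresh=1e30):
    """One outer-loop update of the Lagrange multiplier alpha and penalty rho.

    h_new and h_old are the acyclicity constraint values after and before
    the latest inner optimization.
    """
    # Dual ascent step on the Lagrange multiplier.
    alpha = alpha + rho * h_new
    # Amplify the penalty when h did not shrink to at most gamma * h_old.
    if h_new > gamma * h_old and rho < rho_thresh:
        rho = rho * beta
    return alpha, rho


# h only halved (0.5 > 0.25 * 1.0), so the penalty rho is doubled.
alpha_slow, rho_slow = update_multipliers(h_new=0.5, h_old=1.0, alpha=0.0, rho=1.0)

# h shrank tenfold (0.1 <= 0.25 * 1.0), so rho is left unchanged.
alpha_fast, rho_fast = update_multipliers(h_new=0.1, h_old=1.0, alpha=0.0, rho=1.0)
```

Under this reading, gamma decides when progress on h counts as sufficient, beta controls how fast rho grows, and rho_thresh caps that growth.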
GraNDAG
- class castle.algorithms.GraNDAG(input_dim, hidden_num=2, hidden_dim=10, batch_size=64, lr=0.001, iterations=10000, model_name='NonLinGaussANM', nonlinear='leaky-relu', optimizer='rmsprop', h_threshold=1e-08, device_type='cpu', device_ids='0', use_pns=False, pns_thresh=0.75, num_neighbors=None, normalize=False, precision=False, random_seed=42, jac_thresh=True, lambda_init=0.0, mu_init=0.001, omega_lambda=0.0001, omega_mu=0.9, stop_crit_win=100, edge_clamp_range=0.0001, norm_prod='paths', square_prod=False)[source]
Gradient Based Neural DAG Learner
A gradient-based algorithm using neural network modeling for non-linear additive noise data
References: https://arxiv.org/pdf/1906.02226.pdf
Parameters
- input_dim: int
Dimension of the input layer; must be int.
- hidden_num: int, default 2
Number of hidden layers.
- hidden_dim: int, default 10
Number of dimensions per hidden layer.
- batch_size: int, default 64
Batch size for each training step of the neural network.
- lr: float, default 0.001
Learning rate.
- iterations: int, default 10000
Number of training iterations.
- model_name: str, default ‘NonLinGaussANM’
Name of the model, ‘NonLinGauss’ or ‘NonLinGaussANM’.
- nonlinear: str, default ‘leaky-relu’
Name of the nonlinear activation function, ‘sigmoid’ or ‘leaky-relu’.
- optimizer: str, default ‘rmsprop’
Optimization method, ‘rmsprop’ or ‘sgd’.
- h_threshold: float, default 1e-8
Constraint threshold.
- device_type: str, default ‘cpu’
Whether to use gpu or cpu.
- use_pns: bool, default False
Whether to use PNS before training; use it if there are more than 50 nodes.
- pns_thresh: float, default 0.75
Threshold for the feature importance score in PNS.
- num_neighbors: int, default None
Number of potential parents for each variable.
- normalize: bool, default False
Whether to normalize the data.
- precision: bool, default False
Tensor precision: if True, use torch.FloatTensor; if False, use torch.DoubleTensor.
- random_seed: int, default 42
Random seed.
- norm_prod: str, default ‘paths’
Whether to use the norm of the product of paths, ‘none’ or ‘paths’. ‘paths’: use the norm; ‘none’: no norm.
- square_prod: bool, default False
Whether to use the squared product of paths.
- jac_thresh: bool, default True
Whether to get the average Jacobian with the trained model.
- lambda_init: float, default 0.0
Initial value of the Lagrangian coefficient in the augmented Lagrangian optimization.
- mu_init: float, default 0.001
Initial value of the penalty coefficient in the augmented Lagrangian optimization.
- omega_lambda: float, default 0.0001
Tolerance on the delta lambda, to find saddle points.
- omega_mu: float, default 0.9
The constraint is considered to decrease sufficiently if it decreases by at least (1-omega_mu) * h_prev.
- stop_crit_win: int, default 100
Number of iterations for updating values.
- edge_clamp_range: float, default 0.0001
Threshold for keeping the edge (if during training).
Examples
Load data
>>> from castle.algorithms import GraNDAG
>>> from castle.datasets import load_dataset
>>> data, true_dag, _ = load_dataset('IID_Test')
>>> gnd = GraNDAG(input_dim=data.shape[1])
>>> gnd.learn(data=data)
You can also print gnd.model.adjacency (a torch.Tensor) or gnd.causal_matrix (a numpy.ndarray).
>>> print(gnd.causal_matrix)
>>> print(gnd.model.adjacency)
MCSL
- class castle.algorithms.MCSL(model_type='nn', num_hidden_layers=4, hidden_dim=16, graph_thresh=0.5, l1_graph_penalty=0.002, learning_rate=0.03, max_iter=25, iter_step=1000, init_iter=2, h_tol=1e-10, init_rho=1e-05, rho_thresh=100000000000000.0, h_thresh=0.25, rho_multiply=10, temperature=0.2, device_type='cpu', device_ids='0', random_seed=1230)[source]
Masked Gradient-Based Causal Structure Learning
A gradient-based algorithm for non-linear additive noise data by learning the binary adjacency matrix.
Parameters
- model_type: str, default: ‘nn’
nn denotes neural network, qr denotes quadratic regression.
- num_hidden_layers: int, default: 4
Number of hidden layer in neural network when model_type is ‘nn’.
- hidden_dim: int, default: 16
Number of hidden dimension in hidden layer, when model_type is ‘nn’.
- graph_thresh: float, default: 0.5
Threshold used to determine whether the graph has an edge: an element greater than graph_thresh means there is a directed edge, otherwise there is none.
- l1_graph_penalty: float, default: 2e-3
Penalty weight for L1 normalization
- learning_rate: float, default: 3e-2
learning rate for the optimizer
- max_iter: int, default: 25
Number of iterations for optimization problem
- iter_step: int, default: 1000
Number of steps for each iteration
- init_iter: int, default: 2
Initial iteration to disallow early stopping
- h_tol: float, default: 1e-10
Tolerance of optimization problem
- init_rho: float, default: 1e-5
Initial value for penalty parameter.
- rho_thresh: float, default: 1e14
Threshold for penalty parameter.
- h_thresh: float, default: 0.25
Threshold for h
- rho_multiply: float, default: 10.0
Multiplication to amplify rho each time
- temperature: float, default: 0.2
Temperature for gumbel sigmoid
- device_type: str, default: ‘cpu’
‘cpu’ or ‘gpu’
- device_ids: int or str, default ‘0’
CUDA devices, effective when use_gpu is True. For single-device modules, device_ids can be int or str, e.g. 0 or ‘0’. For multi-device modules, device_ids must be str, formatted like ‘0, 1’.
- random_seed: int, default: 1230
random seed for every random value
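The temperature parameter above controls a Gumbel-sigmoid relaxation. A minimal sketch of that sampling step for a single edge logit follows, written with the standard library only. The function names are hypothetical and MCSL's actual sampling code may differ in detail.

```python
import math
import random


def stable_sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gumbel_sigmoid(logit, temperature, rng):
    """Sample a relaxed Bernoulli value in [0, 1] for one edge logit."""
    u = rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    # The difference of two Gumbel(0, 1) samples is Logistic(0, 1) noise.
    noise = math.log(u) - math.log(1.0 - u)
    return stable_sigmoid((logit + noise) / temperature)


def mean_distance_from_half(samples):
    """Average distance of the samples from 0.5 (larger means closer to binary)."""
    return sum(abs(s - 0.5) for s in samples) / len(samples)


# Same noise for both runs, so only the temperature differs.
soft = [gumbel_sigmoid(0.0, 5.0, rng) for rng in [random.Random(0)] for _ in range(200)]
sharp = [gumbel_sigmoid(0.0, 0.2, rng) for rng in [random.Random(0)] for _ in range(200)]

soft_spread = mean_distance_from_half(soft)
sharp_spread = mean_distance_from_half(sharp)
```

Lower temperatures push samples towards 0 or 1, so the relaxed adjacency matrix behaves more like the binary matrix that MCSL is described as learning.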
References
Examples
>>> from castle.algorithms import MCSL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset(name='IID_Test')
>>> n = MCSL(iter_step=1000, rho_thresh=1e14, init_rho=1e-5,
...          rho_multiply=10, graph_thresh=0.5, l1_graph_penalty=2e-3)
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
- learn(data, columns=None, pns_mask=None, **kwargs) -> None[source]
Set up and run the MCSL algorithm.
Parameters
- data: castle.Tensor or numpy.ndarray
The castle.Tensor or numpy.ndarray format data you want to learn.
- columns: Index or array-like
Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
- pns_mask: array_like or None
The mask matrix, an array with elements in {0, 1}. 0 denotes no edge i -> j; 1 denotes that there may or may not be an edge i -> j.
Notears
- class castle.algorithms.Notears(lambda1=0.1, loss_type='l2', max_iter=100, h_tol=1e-08, rho_max=1e+16, w_threshold=0.3)[source]
Notears Algorithm. A gradient-based algorithm for linear data models (typically with least-squares loss).
Parameters
- lambda1: float
l1 penalty parameter
- loss_type: str
l2, logistic, poisson
- max_iter: int
max num of dual ascent steps
- h_tol: float
exit if |h(w_est)| <= h_tol
- rho_max: float
exit if rho >= rho_max
- w_threshold: float
drop edge if |weight| < threshold
Attributes
- causal_matrix: numpy.ndarray
Learned causal structure matrix
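The rule "drop edge if |weight| < threshold" turns an estimated weight matrix into a binary causal matrix. A minimal sketch of that step on nested lists is shown below; threshold_weights and the example weights are hypothetical, used here for illustration only.

```python
def threshold_weights(weights, w_threshold=0.3):
    """Binarize a weighted adjacency matrix.

    An edge is kept (1) when |weight| >= w_threshold and dropped (0) otherwise.
    """
    return [[1 if abs(w) >= w_threshold else 0 for w in row] for row in weights]


# Hypothetical estimated weights for three variables.
w_est = [
    [0.0, 1.2, 0.05],
    [0.0, 0.0, -0.7],
    [0.1, 0.0, 0.0],
]

causal_matrix = threshold_weights(w_est, w_threshold=0.3)
```

Here the weak entries 0.05 and 0.1 are dropped, leaving only the edges 0 -> 1 and 1 -> 2.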
References
Examples
>>> from castle.algorithms import Notears
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = Notears()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
- learn(data, columns=None, **kwargs)[source]
Set up and run the Notears algorithm.
Parameters
- data: castle.Tensor or numpy.ndarray
The castle.Tensor or numpy.ndarray format data you want to learn.
- columns: Index or array-like
Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
NotearsLowRank
- class castle.algorithms.NotearsLowRank(w_init=None, max_iter=15, h_tol=1e-06, rho_max=1e+20, w_threshold=0.3)[source]
NotearsLowRank Algorithm. Adapting NOTEARS for large problems with low-rank causal graphs.
Parameters
Attributes
- causal_matrix: numpy.ndarray
Learned causal structure matrix
References
Examples
>>> import numpy as np
>>> from castle.algorithms import NotearsLowRank
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> rank = np.linalg.matrix_rank(true_dag)
>>> n = NotearsLowRank()
>>> n.learn(X, rank=rank)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
- learn(data, rank, columns=None, **kwargs)[source]
Set up and run the NotearsLowRank algorithm.
Parameters
- data: castle.Tensor or numpy.ndarray
The castle.Tensor or numpy.ndarray format data you want to learn.
- columns: Index or array-like
Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
- rank: int
The algebraic rank of the weighted adjacency matrix of a graph.
- notears_low_rank(X, rank, w_init=None)[source]
Solve min_W ell(W; X) s.t. h(W) = 0 using augmented Lagrangian.
Parameters
- X: [n,d] sample matrix
max_iter: max number of dual ascent steps.
- rank: int
The rank of data.
- w_init: None or numpy.ndarray
Initialized weight matrix
Returns
- W_est: np.ndarray
estimate [d,d] dag matrix
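To illustrate what the rank argument buys, the sketch below assumes the usual low-rank parameterization W = U Vᵀ, with U and V of shape [d, rank]. The text above does not spell out this factorization, so treat it as an assumption; low_rank_product is a hypothetical helper.

```python
def low_rank_product(u, v):
    """Return W = U V^T for U and V given as d x r nested lists."""
    d, r = len(u), len(u[0])
    return [[sum(u[i][k] * v[j][k] for k in range(r)) for j in range(d)]
            for i in range(d)]


# Rank-1 factors for d = 3 variables.
u = [[1.0], [2.0], [0.0]]
v = [[0.0], [0.0], [3.0]]

w = low_rank_product(u, v)               # 3 x 3 weighted adjacency matrix
free_parameters = 2 * len(u) * len(u[0])  # 2 * d * rank entries in U and V
full_parameters = len(u) ** 2             # d * d entries in an unconstrained W
```

In this example W encodes the edges 0 -> 2 and 1 -> 2 using 6 free parameters instead of 9; the saving grows with d when rank stays small.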
GOLEM
- class castle.algorithms.GOLEM(B_init=None, lambda_1=0.02, lambda_2=5.0, equal_variances=True, learning_rate=0.001, num_iter=100000.0, checkpoint_iter=5000, seed=1, graph_thres=0.3, device_type='cpu', device_ids=0)[source]
GOLEM Algorithm. A more efficient version of NOTEARS that can reduce number of optimization iterations.
Parameters
- B_init: None
File of weighted matrix for initialization. Set to None to disable.
- lambda_1: float
Coefficient of L1 penalty.
- lambda_2: float
Coefficient of DAG penalty.
- equal_variances: bool
Assume equal noise variances for likelihood objective.
- learning_rate: float
Learning rate of Adam optimizer.
- num_iter: float
Number of iterations for training.
- checkpoint_iter: int
Number of iterations between each checkpoint. Set to None to disable.
- seed: int
Random seed.
- graph_thres: float
Threshold for weighted matrix.
- device_type: str, default: ‘cpu’
Whether to run on CPU or GPU.
- device_ids: int
choose which gpu to use
Attributes
- causal_matrix: numpy.ndarray
Learned causal structure matrix (binary)
- weight_causal_matrix: numpy.ndarray
Learned causal structure matrix (weighted)
References
Examples
>>> from castle.algorithms import GOLEM
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, topology_matrix = load_dataset(name='IID_Test')
>>> n = GOLEM()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
- learn(data, columns=None, **kwargs)[source]
Set up and run the GOLEM algorithm.
Parameters
- data: castle.Tensor or numpy.ndarray
The castle.Tensor or numpy.ndarray format data you want to learn.
- X: numpy.ndarray
[n, d] data matrix.
- columns: Index or array-like
Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
NotearsNonlinear
- class castle.algorithms.NotearsNonlinear(lambda1: float = 0.01, lambda2: float = 0.01, max_iter: int = 100, h_tol: float = 1e-08, rho_max: float = 1e+16, w_threshold: float = 0.3, hidden_layers: tuple = (10, 1), expansions: int = 10, bias: bool = True, model_type: str = 'mlp', device_type: str = 'cpu', device_ids=None)[source]
Notears Nonlinear. include notears-mlp and notears-sob. A gradient-based algorithm using neural network or Sobolev space modeling for non-linear causal relationships.
Parameters
- lambda1: float
l1 penalty parameter
- lambda2: float
l2 penalty parameter
- max_iter: int
max num of dual ascent steps
- h_tol: float
exit if |h(w_est)| <= h_tol
- rho_max: float
exit if rho >= rho_max
- w_threshold: float
drop edge if |weight| < threshold
- hidden_layers: Iterable
Dimension of each hidden layer. The last element must be 1, as it is the output dimension, and there must be at least 2 elements. For example, hidden_layers=(5, 10, 1) denotes two hidden layers with 5 and 10 dimensions and an output layer with 1 dimension. Only takes effect when model_type='mlp'.
- expansions: int
Number of expansions for each variable. Only takes effect when model_type='sob'.
- bias: bool
Indicates whether to use a bias term.
- model_type: str
Which of the two nonlinear models to use within the Notears framework: 'mlp' for multilayer perceptrons, 'sob' for basis expansions.
- device_type: str, default: cpu
cpu or gpu
- device_ids: int or str, default None
CUDA devices, it's effective when use_gpu is True. For single-device modules, device_ids can be int or str, e.g. 0 or '0'. For multi-device modules, device_ids must be str, format like '0, 1'.
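The constraints stated for hidden_layers (at least two elements, last element equal to 1) can be checked before constructing the estimator. The following is a minimal sketch; `check_hidden_layers` is a hypothetical helper written for illustration, not a castle function, and the library may validate the argument differently or not at all.

```python
def check_hidden_layers(hidden_layers):
    """Validate hidden_layers against the documented constraints.

    Returns the layer sizes as a tuple, or raises ValueError.
    """
    layers = tuple(hidden_layers)
    if len(layers) < 2:
        raise ValueError("hidden_layers must contain at least 2 elements")
    if any(not isinstance(size, int) or size <= 0 for size in layers):
        raise ValueError("every layer size must be a positive integer")
    if layers[-1] != 1:
        raise ValueError("the last element of hidden_layers must be 1")
    return layers


print(check_hidden_layers((10, 1)))     # the default from the signature
print(check_hidden_layers([5, 10, 1]))  # two hidden layers, one output unit
```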
Attributes
- causal_matrix: numpy.ndarray
Learned causal structure matrix
References
Examples
>>> from castle.algorithms import NotearsNonlinear
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = NotearsNonlinear()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
- dual_ascent_step(model, X)[source]
Perform one step of dual ascent in augmented Lagrangian.
Parameters
- model: nn.Module
network model
- X: torch.Tensor
sample data
Returns
- tuple
cycle control parameter
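The stopping rules documented for max_iter, h_tol and rho_max suggest an outer loop of the following shape. This is a minimal sketch under stated assumptions, not the library's code: `run_dual_ascent` and `toy_step` are hypothetical, the step is assumed to return the triple (rho, alpha, h), and the toy step only imitates training by shrinking the constraint violation h by a fixed factor. The real dual_ascent_step takes (model, X) instead.

```python
def run_dual_ascent(step, max_iter=100, h_tol=1e-8, rho_max=1e16):
    """Repeat `step` until the constraint is met, rho saturates, or max_iter is hit."""
    rho, alpha, h = 1.0, 0.0, float("inf")
    iteration = 0
    for iteration in range(1, max_iter + 1):
        rho, alpha, h = step(rho, alpha, h)
        # Exit conditions as documented: |h| <= h_tol or rho >= rho_max.
        if abs(h) <= h_tol or rho >= rho_max:
            break
    return iteration, rho, alpha, h


def toy_step(rho, alpha, h):
    """Stand-in for one training step: pretend the violation drops 1000x each time."""
    h_new = 1.0 if h == float("inf") else h * 0.001
    alpha += rho * h_new  # dual variable update
    rho *= 10.0           # increase the penalty coefficient
    return rho, alpha, h_new


iterations, rho, alpha, h = run_dual_ascent(toy_step)
print(iterations, rho, h)
```

With a step that never reduces h, the same loop would instead terminate once rho reaches rho_max, which is why both conditions are checked.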
- get_model(input_dim)[source]
Choose a different model.
Parameters
- input_dim: int
Enter the number of data dimensions.
Returns
- learn(data, columns=None, **kwargs)[source]
Set up and run the NotearsNonlinear algorithm.
Parameters
- data: castle.Tensor or numpy.ndarray
The castle.Tensor or numpy.ndarray format data you want to learn.
- columns: Index or array-like
Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
PNL
- class castle.algorithms.PNL(hidden_layers=1, hidden_units=10, batch_size=64, epochs=100, lr=0.0001, alpha=0.01, bias=True, activation=LeakyReLU(negative_slope=0.01), device_type='cpu', device_ids=None)[source]
On the Identifiability of the Post-Nonlinear Causal Model
References
Parameters
- hidden_layers: int
number of hidden layers of the MLP
- hidden_units: int
number of units per hidden layer
- batch_size: int
size of training batch
- epochs: int
number of training passes over all samples
- lr: float
learning rate
- alpha: float
significance level
- bias: bool
whether to use bias
- activation: callable
nonlinear activation function
- device_type: str
‘cpu’ or ‘gpu’, default: ‘cpu’
- device_ids: int or str
e.g. 0 or ‘0,1’, denotes which gpu that you want to use.
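As a rough guide to training cost, batch_size and epochs together determine how many gradient updates one fitted model receives. The sketch below is only arithmetic under an assumption: that every epoch iterates over all samples in mini-batches and keeps the final partial batch. `gradient_steps` is a hypothetical helper, and the sample count is illustrative; PNL's actual data loading may differ.

```python
import math


def gradient_steps(n_samples, batch_size=64, epochs=100):
    """Number of mini-batch updates, assuming the last partial batch is kept."""
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return batches_per_epoch * epochs


# 2000 samples with the default batch_size=64 and epochs=100.
steps = gradient_steps(n_samples=2000)
print(steps)
```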
Examples
>>> from castle.algorithms.gradient.pnl.torch import PNL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = PNL()
>>> n.learn(X)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
RL
- class castle.algorithms.RL(encoder_type='TransformerEncoder', hidden_dim=64, num_heads=16, num_stacks=6, residual=False, decoder_type='SingleLayerDecoder', decoder_activation='tanh', decoder_hidden_dim=16, use_bias=False, use_bias_constant=False, bias_initial_value=False, batch_size=64, input_dimension=64, normalize=False, transpose=False, score_type='BIC', reg_type='LR', lambda_iter_num=1000, lambda_flag_default=True, score_bd_tight=False, lambda2_update=10, score_lower=0.0, score_upper=0.0, seed=8, nb_epoch=20000, lr1_start=0.001, lr1_decay_step=5000, lr1_decay_rate=0.96, alpha=0.99, init_baseline=-1.0, l1_graph_reg=0.0, verbose=False, device_type='cpu', device_ids=0)[source]
RL Algorithm. An RL-based algorithm that can work with flexible score functions (including non-smooth ones).
Parameters
- encoder_type: str
type of encoder used
- hidden_dim: int
actor LSTM num_neurons
- num_heads: int
actor input embedding
- num_stacks: int
actor LSTM num_neurons
- residual: bool
whether to use residual for gat encoder
- decoder_type: str
type of decoder used
- decoder_activation: str
activation for decoder
- decoder_hidden_dim: int
hidden dimension for decoder
- use_bias: bool
Whether to add bias term when calculating decoder logits
- use_bias_constant: bool
Whether to add bias term as CONSTANT when calculating decoder logits
- bias_initial_value: float
Initial value for bias term when calculating decoder logits
- batch_size: int
batch size for training
- input_dimension: int
dimension of reshaped vector
- normalize: bool
whether the input data shall be normalized
- transpose: bool
whether the true graph needs to be transposed
- score_type: str
score functions
- reg_type: str
regressor type (in combination with score_type)
- lambda_iter_num: int
how often to update lambdas
- lambda_flag_default: bool
how the lambda parameters are set; if True, the default strategy is used and the input bounds are ignored
- score_bd_tight: bool
if bound is tight, then simply use a fixed value, rather than the adaptive one
- lambda1_update: float
increasing additive lambda1
- lambda2_update: float
increasing multiplying lambda2
- score_lower: float
lower bound on lambda1
- score_upper: float
upper bound on lambda1
- lambda2_lower: float
lower bound on lambda2
- lambda2_upper: float
upper bound on lambda2
- seed: int
seed
- nb_epoch: int
number of training epochs
- lr1_start: float
actor learning rate
- lr1_decay_step: int
lr1 decay step
- lr1_decay_rate: float
lr1 decay rate
- alpha: float
update factor moving average baseline
- init_baseline: float
initial baseline - REINFORCE
- temperature: float
pointer_net initial temperature
- C: float
pointer_net tan clipping
- l1_graph_reg: float
L1 graph regularization to encourage sparsity
- inference_mode: bool
switch to inference mode when model is trained
- verbose: bool
print detailed logging or not
- device_type: str
whether to use GPU or not
- device_ids: int
choose which gpu to use
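Two groups of parameters above describe simple update rules: lr1_start, lr1_decay_step and lr1_decay_rate describe a decaying actor learning rate, and alpha with init_baseline describes a moving-average baseline. The following is a minimal sketch under stated assumptions: a continuous exponential decay and an exponential moving average of the mean batch reward. `decayed_lr` and `update_baseline` are hypothetical helpers, not castle functions, and the library's exact schedule may differ.

```python
def decayed_lr(step, lr1_start=0.001, lr1_decay_step=5000, lr1_decay_rate=0.96):
    """Learning rate after `step` updates, assuming continuous exponential decay."""
    return lr1_start * lr1_decay_rate ** (step / lr1_decay_step)


def update_baseline(baseline, batch_rewards, alpha=0.99):
    """Exponential moving average of the mean reward (assumed baseline update)."""
    mean_reward = sum(batch_rewards) / len(batch_rewards)
    return alpha * baseline + (1.0 - alpha) * mean_reward


# The rate is multiplied by lr1_decay_rate once every lr1_decay_step updates.
for step in (0, 5000, 10000):
    print(step, decayed_lr(step))

# Starting from init_baseline=-1.0, one update with illustrative rewards.
baseline = update_baseline(-1.0, [0.0, 2.0])
print(baseline)
```

With alpha close to 1, as in the default of 0.99, the baseline moves slowly and a single batch changes it very little.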
Attributes
- causal_matrix: numpy.ndarray
Learned causal structure matrix
References
Examples
>>> from castle.algorithms import RL
>>> from castle.datasets import load_dataset
>>> from castle.common import GraphDAG
>>> from castle.metrics import MetricsDAG
>>> X, true_dag, _ = load_dataset('IID_Test')
>>> n = RL()
>>> n.learn(X, dag=true_dag)
>>> GraphDAG(n.causal_matrix, true_dag)
>>> met = MetricsDAG(n.causal_matrix, true_dag)
>>> print(met.metrics)
- learn(data, columns=None, dag=None, **kwargs)[source]
Set up and run the RL algorithm.
Parameters
- data: castle.Tensor or numpy.ndarray
The castle.Tensor or numpy.ndarray format data you want to learn.
- columns: Index or array-like
Column labels to use for resulting tensor. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
- dag: ndarray
two-dimensional, prior matrix