API Reference

This section provides detailed documentation for ProRCA’s modules, classes, and functions, automatically generated from the source code docstrings.

prorca.pathway

class prorca.pathway.CausalResultsVisualizer(analysis_results)

Bases: object

Visualizes the results produced by CausalRootCauseAnalyzer. The analysis_results dictionary is expected to contain:

‘paths’: a list of tuples (path, significance), where each path is a list of tuples (node, combined_score)
‘node_scores’: a dict mapping node -> array of structural scores
‘noise_contributions’: a dict mapping node -> array of noise contributions

It offers several plotting methods:

plot_root_cause_paths: a network diagram of the root cause pathways.
plot_node_scores: a bar chart of average structural scores per node.
plot_noise_contributions_distribution: a boxplot for the distribution of noise contributions.
plot_consistency_heatmap: a heatmap of the correlation between nodes’ noise contributions.
plot_timeline: a timeline plot, for use when separate analyses are run per anomaly date.
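A minimal sketch of an analysis_results dictionary with the layout described above follows. The node names and numbers are hypothetical, plain lists stand in for the score arrays, and check_results_layout is an illustrative helper rather than part of ProRCA.

```python
# Hypothetical analysis_results layout for CausalResultsVisualizer.
# Node names and numbers are illustrative only; lists stand in for arrays.
analysis_results = {
    # 'paths': list of (path, significance); each path is a list of
    # (node, combined_score) tuples. plot_root_cause_paths reverses this
    # order for display so that the root cause comes first.
    "paths": [
        ([("PROFIT_MARGIN", 0.95), ("COST", 0.90), ("FULFILLMENT_COST", 0.85)], 0.90),
        ([("PROFIT_MARGIN", 0.95), ("DISCOUNT", 0.82)], 0.88),
    ],
    # 'node_scores': node -> sequence of structural scores
    "node_scores": {"COST": [0.88, 0.92], "DISCOUNT": [0.80, 0.84]},
    # 'noise_contributions': node -> sequence of noise contributions
    "noise_contributions": {"COST": [1.2, 0.9], "DISCOUNT": [0.4, 0.6]},
}


def check_results_layout(results):
    """Hypothetical helper: return True if `results` has the documented shape."""
    required = {"paths", "node_scores", "noise_contributions"}
    if not required.issubset(results):
        return False
    for path, significance in results["paths"]:
        if not isinstance(significance, (int, float)):
            return False
        for node, score in path:
            if not isinstance(node, str) or not isinstance(score, (int, float)):
                return False
    # Both per-node mappings must map to sequences of values.
    return all(
        isinstance(values, (list, tuple))
        for key in ("node_scores", "noise_contributions")
        for values in results[key].values()
    )
```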

plot_root_cause_paths(): Visualize the discovered root cause pathways using Graphviz. Each path is drawn as a separate cluster, and the cluster background colors follow a gradient from light green to yellow. Nodes are shown in reverse order, so the root cause appears first and the final outcome (e.g., ‘PROFIT_MARGIN’) appears last. An edge that appears in more than one path is drawn only once. The chart is rendered inline in a Jupyter Notebook.
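The reversal and edge de-duplication described above can be sketched in plain Python. This is not ProRCA's implementation: the function names, the hex endpoints chosen for the light-green-to-yellow gradient (#CCFF99 and #FFFF99), and the emitted DOT text are assumptions, whereas the real method renders through Graphviz inline in Jupyter.

```python
def reversed_unique_edges(paths):
    """For each path, return the edges to draw after reversal and de-duplication."""
    seen = set()
    per_path = []
    for path, _significance in paths:
        # Reverse so the root cause comes first and the outcome node last.
        nodes = [node for node, _score in reversed(path)]
        edges = []
        for edge in zip(nodes, nodes[1:]):
            if edge not in seen:  # skip an edge already drawn for an earlier path
                seen.add(edge)
                edges.append(edge)
        per_path.append(edges)
    return per_path


def gradient_colors(n):
    """Hypothetical palette: n hex colours from light green (#CCFF99) to yellow (#FFFF99)."""
    colors = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0
        red = round(0xCC + t * (0xFF - 0xCC))  # only the red channel changes
        colors.append(f"#{red:02X}FF99")
    return colors


def to_dot(per_path_edges):
    """Emit DOT source with one filled cluster per path."""
    lines = ["digraph root_causes {", "  rankdir=LR;"]
    colors = gradient_colors(len(per_path_edges))
    for i, (edges, color) in enumerate(zip(per_path_edges, colors)):
        lines.append(f"  subgraph cluster_{i} {{")
        lines.append(f'    style=filled; color="{color}";')
        for src, dst in edges:
            lines.append(f'    "{src}" -> "{dst}";')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


# Usage with hypothetical paths stored outcome-first.
example_paths = [
    ([("PROFIT_MARGIN", 0.9), ("COST", 0.85), ("FULFILLMENT_COST", 0.8)], 0.9),
    ([("PROFIT_MARGIN", 0.9), ("COST", 0.85), ("QUANTITY", 0.82)], 0.85),
]
print(to_dot(reversed_unique_edges(example_paths)))
```

In this example the edge COST -> PROFIT_MARGIN belongs to both paths but is emitted only for the first one.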

class prorca.pathway.CausalRootCauseAnalyzer(scm, min_score_threshold: float = 0.8)

Bases: object

Root cause analyzer that combines structural and noise-based approaches.

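analyze(df_agg: DataFrame, anomaly_dates, start_node: str = 'PROFIT_MARGIN') → Dict: Main analysis method, combining the structural and noise-based approaches for the given anomaly dates. To obtain a separate result for each date, use analyze_by_date.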

analyze_by_date(df_agg: DataFrame, anomaly_dates, start_node: str = 'PROFIT_MARGIN') → Dict

Run the analysis separately for each anomaly date so that date-specific root causes are captured.

Parameters:

df_agg (pd.DataFrame) – The aggregated data containing an ‘ORDERDATE’ column.
anomaly_dates (iterable) – An iterable of anomaly dates (e.g., a list or DatetimeIndex).
start_node (str, default 'PROFIT_MARGIN') – The starting node for the root cause analysis.

Returns:

results – A dictionary where each key is an anomaly date and the value is the analysis result for that date.

Return type:

dict
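The per-date behaviour can be pictured as a loop that calls the single-run analysis once per date and keys each result by that date. In the sketch below, analyze is a stand-in for CausalRootCauseAnalyzer.analyze, passing a one-element list of dates is an assumption about how each run is scoped, and fake_analyze exists only to make the example runnable.

```python
from datetime import date


def analyze_by_date(analyze, df_agg, anomaly_dates, start_node="PROFIT_MARGIN"):
    """Sketch: run `analyze` once per anomaly date and key the results by date.

    `analyze` stands in for CausalRootCauseAnalyzer.analyze; wrapping each
    date in a one-element list is an assumption, not documented behaviour.
    """
    return {d: analyze(df_agg, [d], start_node=start_node) for d in anomaly_dates}


def fake_analyze(df_agg, anomaly_dates, start_node):
    """Hypothetical stub that echoes its inputs instead of analysing data."""
    return {"dates": list(anomaly_dates), "start_node": start_node}


results = analyze_by_date(fake_analyze, None, [date(2024, 1, 5), date(2024, 2, 9)])
```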

class prorca.pathway.ScmBuilder(edges, nodes=None, visualize=False, viz_filename='dag_relationships', random_seed=0)

Bases: object

A builder class that constructs a Structural Causal Model (SCM) from a given set of edges (and, optionally, nodes). It can also visualize the causal graph.

Parameters:

edges (list of tuple) – List of edges in the format (source, target) representing causal relationships.
nodes (list of str, optional) – List of nodes to include in the graph. If not provided, nodes are automatically inferred from edges.
visualize (bool, default False) – Whether to visualize the constructed causal graph using Graphviz.
viz_filename (str, default "dag_relationships") – The base filename (without extension) to use for saving the graph visualization.
random_seed (int, default 0) – Random seed for reproducibility when building and fitting the SCM.
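When nodes is omitted, it is inferred from edges. The sketch below shows one way to do that with the standard library and to confirm that the edges form a directed acyclic graph, which a structural causal model requires. It uses graphlib (Python 3.9+) rather than networkx, the helper names are hypothetical, and the first-seen node ordering is a choice of this sketch; ScmBuilder itself builds an nx.DiGraph and a gcm.StructuralCausalModel.

```python
from graphlib import CycleError, TopologicalSorter


def infer_nodes(edges, nodes=None):
    """Return `nodes` if given, else every endpoint of `edges` in first-seen order."""
    if nodes is not None:
        return list(nodes)
    seen = {}  # dicts preserve insertion order, giving a stable node list
    for source, target in edges:
        seen.setdefault(source, None)
        seen.setdefault(target, None)
    return list(seen)


def causal_order(edges, nodes=None):
    """Topologically sort the graph; raises graphlib.CycleError if it is not a DAG."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    predecessors = {node: set() for node in infer_nodes(edges, nodes)}
    for source, target in edges:
        predecessors.setdefault(source, set())
        predecessors.setdefault(target, set()).add(source)
    return list(TopologicalSorter(predecessors).static_order())


example_edges = [("DISCOUNT", "SALES"), ("SALES", "PROFIT_MARGIN"), ("COST", "PROFIT_MARGIN")]
print(causal_order(example_edges))
```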

build(df=None)

Convenience method that builds the causal graph, visualizes it if visualize is True, and then constructs the SCM. If df is provided, causal mechanisms are assigned automatically and the SCM is fitted.

Parameters: df (pd.DataFrame, optional) – The data to use for automatically assigning causal mechanisms and fitting the SCM.
Returns: scm – The final Structural Causal Model.
Return type: gcm.StructuralCausalModel

build_graph()

Build a networkx directed graph (DiGraph) from the provided nodes and edges.

Returns: causal_graph – The constructed causal graph.
Return type: nx.DiGraph

build_scm(df=None)

Build the Structural Causal Model (SCM) from the causal graph. If a DataFrame is provided, the method will automatically assign causal mechanisms and fit the model.

Parameters: df (pd.DataFrame, optional) – The data to use for automatically assigning generative models to each node and fitting the SCM.
Returns: scm – The constructed (and possibly fitted) SCM.
Return type: gcm.StructuralCausalModel

visualize_graph()

Visualize the causal graph using Graphviz.

Returns: image – The rendered image of the graph (useful in notebook environments).
Return type: IPython.display.Image
Raises: ValueError – If the causal graph has not yet been built.
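visualize_graph depends on the causal graph having been built first. The hypothetical class below reproduces that contract with standard-library types only: it stores an adjacency dict instead of an nx.DiGraph and returns DOT text instead of an IPython.display.Image, but raises ValueError in the same situation.

```python
class GraphHolder:
    """Hypothetical stand-in for ScmBuilder's graph state (not ProRCA code)."""

    def __init__(self, edges):
        self.edges = list(edges)
        self.causal_graph = None  # populated by build_graph()

    def build_graph(self):
        # Adjacency dict as a lightweight substitute for nx.DiGraph.
        graph = {}
        for source, target in self.edges:
            graph.setdefault(source, []).append(target)
            graph.setdefault(target, [])
        self.causal_graph = graph
        return graph

    def visualize_graph(self):
        # Same failure mode as documented: refuse to render an unbuilt graph.
        if self.causal_graph is None:
            raise ValueError("Causal graph has not been built yet; call build_graph() first.")
        lines = ["digraph G {"]
        for source, targets in self.causal_graph.items():
            for target in targets:
                lines.append(f'  "{source}" -> "{target}";')
        lines.append("}")
        return "\n".join(lines)
```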

prorca.dag_builder

prorca.dag_builder.DagBuilder(cols)

anomaly.adtk

class anomaly.adtk.AnomalyDetector(df, date_col='ORDERDATE', value_col='PROFIT_MARGIN')

Bases: object

A class for detecting anomalies in time series data using ADTK’s InterQuartileRangeAD. It also provides a visualization of the detected anomalies.

Parameters:

df (pd.DataFrame) – The input DataFrame containing the time series data.
date_col (str, default "ORDERDATE") – The column name containing date/time information. This column will be set as the DataFrame’s index.
value_col (str, default "PROFIT_MARGIN") – The column on which anomaly detection is performed.

detect()

Detect anomalies in the specified value column using InterQuartileRangeAD.

Returns:

anomalies – A DataFrame that includes:

‘is_anamoly’: a boolean flag indicating whether an anomaly was detected.
‘value’: the original value from the input DataFrame.

Return type:

pd.DataFrame

get_anomaly_dates()

Retrieve the dates where anomalies were detected.

Returns: anomaly_dates – The dates (from the DataFrame’s index) where an anomaly was detected.
Return type: pd.DatetimeIndex
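InterQuartileRangeAD flags points that lie more than c times the inter-quartile range below the first quartile or above the third. A standard-library sketch of what detect and get_anomaly_dates compute follows; it approximates the idea and is not ADTK's code. The default c=3.0 reflects our understanding of ADTK's default and should be checked against the installed version, and statistics.quantiles with method="inclusive" is assumed to be close to the quartile interpolation ADTK uses.

```python
import statistics
from datetime import date


def iqr_flags(values, c=3.0):
    """Flag values outside [Q1 - c*IQR, Q3 + c*IQR] (quartiles of the same series)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - c * iqr, q3 + c * iqr
    return [v < low or v > high for v in values]


def anomaly_dates(dates, values, c=3.0):
    """Return the dates whose value is flagged as anomalous."""
    return [d for d, flagged in zip(dates, iqr_flags(values, c)) if flagged]


# Hypothetical daily profit-margin values with one obvious spike.
example_dates = [date(2024, 1, day) for day in range(1, 9)]
example_values = [50, 51, 49, 50, 52, 48, 50, 90]
print(anomaly_dates(example_dates, example_values))  # [datetime.date(2024, 1, 8)]
```

Here Q1 = 49.75 and Q3 = 51.25, so with c=3.0 only values outside [45.25, 55.75] are flagged.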

visualize(figsize=(12, 6), title='Daily Profit Margin Over Time with Anomalies Highlighted', xlabel='Date', ylabel='Profit Margin Value', ylim=(40, 60))

Visualize the time series data with anomalies highlighted.

Parameters:

figsize (tuple, default (12, 6)) – Size of the figure.
title (str, default "Daily Profit Margin Over Time with Anomalies Highlighted") – Title for the plot.
xlabel (str, default "Date") – Label for the x-axis.
ylabel (str, default "Profit Margin Value") – Label for the y-axis.
ylim (tuple, optional, default (40, 60)) – Y-axis limits.

data_generators.synthetic_sales_data

data_generators.synthetic_sales_data.calculate_discount(sales, promo_code, category, sales_channel, customer_loyalty): Calculate the discount from the promo code, adjusted by category, sales channel, and customer loyalty. The result is capped so that the discount never exceeds sales.
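The capping rule can be sketched as follows. Every rate and factor table here is a hypothetical placeholder (the values used by calculate_discount are not documented in this reference), and calculate_discount_sketch is an illustrative name; only the final clamp reflects the documented guarantee that the discount never exceeds sales.

```python
# All rates and factors below are hypothetical placeholders, not ProRCA's values.
PROMO_RATES = {"NONE": 0.0, "SAVE10": 0.10, "SAVE20": 0.20, "FREE": 1.0}
CATEGORY_FACTOR = {"Apparel": 1.0, "Accessories": 0.8}
CHANNEL_FACTOR = {"Online": 1.1, "Store": 1.0}
LOYALTY_FACTOR = {"New": 1.0, "Gold": 1.2}


def calculate_discount_sketch(sales, promo_code, category, sales_channel, customer_loyalty):
    """Base promo rate, scaled by category, channel and loyalty, capped at sales."""
    rate = PROMO_RATES.get(promo_code, 0.0)  # unknown codes give no discount
    rate *= CATEGORY_FACTOR.get(category, 1.0)
    rate *= CHANNEL_FACTOR.get(sales_channel, 1.0)
    rate *= LOYALTY_FACTOR.get(customer_loyalty, 1.0)
    discount = sales * rate
    return min(discount, sales)  # the discount never exceeds the sale amount
```

With these placeholder tables, a "FREE" code for a Gold online customer yields a 132% rate, which the clamp reduces to the full sale amount.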

data_generators.synthetic_sales_data.calculate_fulfillment_cost(quantity, sales_channel, territory, product_weight): Calculate fulfillment cost based on quantity, channel, territory, and weight.

data_generators.synthetic_sales_data.generate_fashion_data_with_brand(start_date, end_date): Generate realistic retail transaction data between start_date and end_date, based on the relationships in the flowchart.

data_generators.synthetic_sales_data.inject_anomalies_by_date(df, anomaly_schedule): Inject anomalies into the data on the dates given in anomaly_schedule, each with a specific scope, then recalculate all downstream metrics so that each anomaly propagates along a clear pathway for anomaly detection and root cause analysis.
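The key idea, recalculating downstream metrics after an upstream value changes so that the anomaly propagates along the causal path, can be sketched as below. The SALES, DISCOUNT, COST and PROFIT columns, the profit and margin formulas, and the (column, multiplier) schedule format are all assumptions made for illustration; only ORDERDATE and PROFIT_MARGIN appear elsewhere in this reference.

```python
from datetime import date


def recompute_metrics(row):
    """Recalculate derived columns from their inputs (hypothetical formulas)."""
    row["PROFIT"] = row["SALES"] - row["DISCOUNT"] - row["COST"]
    row["PROFIT_MARGIN"] = 100 * row["PROFIT"] / row["SALES"] if row["SALES"] else 0.0
    return row


def inject_anomalies_sketch(rows, anomaly_schedule):
    """Scale one upstream column on scheduled dates, then refresh downstream metrics.

    anomaly_schedule maps a date to (column, multiplier); this format is an
    assumption, not the documented shape of ProRCA's schedule.
    """
    out = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        change = anomaly_schedule.get(row["ORDERDATE"])
        if change is not None:
            column, multiplier = change
            row[column] *= multiplier
        out.append(recompute_metrics(row))  # downstream values follow the change
    return out


example_rows = [
    {"ORDERDATE": date(2024, 1, 1), "SALES": 100.0, "DISCOUNT": 10.0, "COST": 40.0},
    {"ORDERDATE": date(2024, 1, 2), "SALES": 100.0, "DISCOUNT": 10.0, "COST": 40.0},
]
example_schedule = {date(2024, 1, 2): ("COST", 1.5)}  # 50% cost spike on Jan 2
injected = inject_anomalies_sketch(example_rows, example_schedule)
```

On the anomalous day the cost rises from 40 to 60, and the recomputed margin drops from 50% to 30%, which is the downstream footprint a root cause analysis would then trace back to COST.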
