API Reference

This section provides detailed documentation for ProRCA’s modules, classes, and functions, automatically generated from the source code docstrings.

prorca.pathway

class prorca.pathway.CausalResultsVisualizer(analysis_results)

Bases: object

Visualizes the results from CausalRootCauseAnalyzer. The class expects the analysis results to contain:
  • ‘paths’: a list of tuples (path, significance), where each path is a list of tuples (node, combined_score)

  • ‘node_scores’: a dict mapping node -> array of structural scores

  • ‘noise_contributions’: a dict mapping node -> array of noise contributions
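The expected input can be pictured with a small hand-built dictionary. This is only a sketch of the shape described above: the node name DISCOUNT and all numeric values are made up for illustration, plain lists stand in for arrays, and check_results_shape is a hypothetical helper that is not part of ProRCA.

```python
# Hypothetical values; only the three keys and their nesting follow the description above.
analysis_results = {
    "paths": [
        (
            [("PROFIT_MARGIN", 0.95), ("DISCOUNT", 0.90)],  # path: list of (node, combined_score)
            0.85,  # significance of the whole path
        ),
    ],
    "node_scores": {"PROFIT_MARGIN": [0.9, 1.0], "DISCOUNT": [0.8, 1.0]},
    "noise_contributions": {"PROFIT_MARGIN": [0.1, 0.2], "DISCOUNT": [0.4, 0.6]},
}


def check_results_shape(results):
    """Return True if `results` has the three documented keys with the documented nesting."""
    for key in ("paths", "node_scores", "noise_contributions"):
        if key not in results:
            return False
    for path, _significance in results["paths"]:
        # Every path entry must be a (node, combined_score) pair.
        if not all(len(step) == 2 for step in path):
            return False
    return True
```

A quick check such as check_results_shape(analysis_results) before constructing the visualizer can catch a missing key early.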

It offers several plotting methods:
  • plot_root_cause_paths: a network diagram of the root cause pathways.

  • plot_node_scores: a bar chart of average structural scores per node.

  • plot_noise_contributions_distribution: a boxplot for the distribution of noise contributions.

  • plot_consistency_heatmap: a heatmap of the correlation between nodes’ noise contributions.

  • plot_timeline: a timeline plot if you run separate analyses per anomaly date.

plot_root_cause_paths()

Visualize the discovered root cause pathways using Graphviz for clarity. Each path is displayed as a separate cluster with a background color in a gradient that starts from light green and moves to yellow. The order of nodes is reversed such that the root cause appears first and the final outcome (e.g., ‘PROFIT_MARGIN’) appears last. Duplicate arrows for identical edges across different paths are omitted. The chart is rendered inline in a Jupyter Notebook.
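The node reversal and the omission of duplicate arrows can be sketched in plain Python. This is an illustration of the idea only, not the library's rendering code: unique_edges_root_first is a hypothetical helper, and it assumes each stored path starts at the outcome node and ends at the root cause.

```python
def unique_edges_root_first(paths):
    """Reverse each path so the root cause comes first, then collect each edge once."""
    seen = set()
    edges = []
    for path, _significance in paths:
        # Stored order is assumed to be outcome -> root cause; flip it for display.
        names = [node for node, _score in reversed(path)]
        for src, dst in zip(names, names[1:]):
            if (src, dst) not in seen:  # skip arrows already drawn for another path
                seen.add((src, dst))
                edges.append((src, dst))
    return edges
```

With two paths that share the final hop into the outcome node, the shared edge appears only once in the returned list.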

class prorca.pathway.CausalRootCauseAnalyzer(scm, min_score_threshold: float = 0.8)

Bases: object

Advanced root cause analyzer combining structural and noise-based approaches.

analyze(df_agg: DataFrame, anomaly_dates, start_node: str = 'PROFIT_MARGIN') → Dict

Main analysis method, combining the structural and noise-based approaches.

analyze_by_date(df_agg: DataFrame, anomaly_dates, start_node: str = 'PROFIT_MARGIN') → Dict

Run the analysis separately for each anomaly date so that date-specific root causes are captured.

Parameters:
  • df_agg (pd.DataFrame) – The aggregated data containing an ‘ORDERDATE’ column.

  • anomaly_dates (iterable) – An iterable of anomaly dates (e.g., a list or DatetimeIndex).

  • start_node (str, default 'PROFIT_MARGIN') – The starting node for the root cause analysis.

Returns:

results – A dictionary where each key is an anomaly date and the value is the analysis result for that date.

Return type:

dict
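The return shape can be pictured as one analysis result per anomaly date, keyed by that date. The sketch below shows only that pattern: analyze_by_date_sketch and the analyze_one callable are hypothetical stand-ins, and passing the full data with a single-date list is an assumption about how the per-date runs are made.

```python
def analyze_by_date_sketch(data, anomaly_dates, analyze_one):
    """Run `analyze_one` once per anomaly date and key the results by date."""
    results = {}
    for day in anomaly_dates:
        # Assumption: each run sees the full data but only one anomaly date.
        results[day] = analyze_one(data, [day])
    return results
```

Iterating over results.items() then gives the date-specific findings one anomaly date at a time.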

class prorca.pathway.ScmBuilder(edges, nodes=None, visualize=False, viz_filename='dag_relationships', random_seed=0)

Bases: object

A builder class to construct a Structural Causal Model (SCM) from a given set of edges (and optionally nodes). It also provides a visualization of the causal graph if desired.

Parameters:
  • edges (list of tuple) – List of edges in the format (source, target) representing causal relationships.

  • nodes (list of str, optional) – List of nodes to include in the graph. If not provided, nodes are automatically inferred from edges.

  • visualize (bool, default False) – Whether to visualize the constructed causal graph using Graphviz.

  • viz_filename (str, default "dag_relationships") – The base filename (without extension) to use for saving the graph visualization.

  • random_seed (int, default 0) – Random seed for reproducibility when building and fitting the SCM.
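Inferring nodes from edges, as described for the nodes parameter, can be sketched as follows. infer_nodes is a hypothetical helper written for illustration, and the first-seen ordering is an assumption; the library may order inferred nodes differently.

```python
def infer_nodes(edges, nodes=None):
    """Return `nodes` if given, otherwise every endpoint of `edges` in first-seen order."""
    if nodes is not None:
        return list(nodes)
    inferred = []
    for source, target in edges:
        for name in (source, target):
            if name not in inferred:  # keep each node once
                inferred.append(name)
    return inferred
```

Passing nodes explicitly is mainly useful when the graph should contain a node that has no edges.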

build(df=None)

Convenience method to build the causal graph, optionally visualize it, and then construct the SCM (with optional auto-assignment and fitting if data is provided).

Parameters:

df (pd.DataFrame, optional) – The data to use for automatically assigning causal mechanisms and fitting the SCM.

Returns:

scm – The final Structural Causal Model.

Return type:

gcm.StructuralCausalModel

build_graph()

Build a networkx directed graph (DiGraph) from the provided nodes and edges.

Returns:

causal_graph – The constructed causal graph.

Return type:

nx.DiGraph

build_scm(df=None)

Build the Structural Causal Model (SCM) from the causal graph. If a DataFrame is provided, the method will automatically assign causal mechanisms and fit the model.

Parameters:

df (pd.DataFrame, optional) – The data to use for automatically assigning generative models to each node and fitting the SCM.

Returns:

scm – The constructed (and possibly fitted) SCM.

Return type:

gcm.StructuralCausalModel

visualize_graph()

Visualize the causal graph using Graphviz.

Returns:

image – The rendered image of the graph (useful in notebook environments).

Return type:

IPython.display.Image

Raises:

ValueError – If the causal graph has not yet been built.
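The classes in prorca.pathway can be chained into one workflow. The sketch below uses only the signatures documented in this section; run_root_cause_analysis is a hypothetical wrapper, and the edge list, the aggregated DataFrame, and the anomaly dates are assumed to be supplied by the caller.

```python
def run_root_cause_analysis(df_agg, edges, anomaly_dates, start_node="PROFIT_MARGIN"):
    """Build and fit an SCM, analyze the anomalies, and plot the root cause pathways."""
    # Imported inside the function so this sketch can be loaded without ProRCA installed.
    from prorca.pathway import (
        CausalResultsVisualizer,
        CausalRootCauseAnalyzer,
        ScmBuilder,
    )

    # Passing the data to build() assigns causal mechanisms and fits the SCM.
    scm = ScmBuilder(edges=edges).build(df=df_agg)

    analyzer = CausalRootCauseAnalyzer(scm, min_score_threshold=0.8)
    results = analyzer.analyze(df_agg, anomaly_dates, start_node=start_node)

    # Renders inline when run in a Jupyter Notebook.
    CausalResultsVisualizer(results).plot_root_cause_paths()
    return results
```

Swapping analyze for analyze_by_date would return one result per anomaly date instead of a single combined result, so the visualizer call would then need to be made per date.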

prorca.dag_builder

prorca.dag_builder.DagBuilder(cols)

anomaly.adtk

class anomaly.adtk.AnomalyDetector(df, date_col='ORDERDATE', value_col='PROFIT_MARGIN')

Bases: object

A class for detecting anomalies in time series data using ADTK’s InterQuartileRangeAD. It also provides a visualization of the detected anomalies.

Parameters:
  • df (pd.DataFrame) – The input DataFrame containing the time series data.

  • date_col (str, default "ORDERDATE") – The column name containing date/time information. This column will be set as the DataFrame’s index.

  • value_col (str, default "PROFIT_MARGIN") – The column on which anomaly detection is performed.

detect()

Detect anomalies in the specified value column using InterQuartileRangeAD.

Returns:

anomalies – A DataFrame that includes:
  • ‘is_anamoly’: a boolean flag indicating whether an anomaly was detected.

  • ‘value’: the original value from the input DataFrame.

Return type:

pd.DataFrame
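The interquartile-range rule behind this kind of detector can be sketched with the standard library alone. This is not ADTK's implementation: iqr_flags is a hypothetical helper, the multiplier c=3.0 is an assumed setting, and the quartiles use linear interpolation via statistics.quantiles.

```python
from statistics import quantiles


def iqr_flags(values, c=3.0):
    """Flag values outside [Q1 - c*IQR, Q3 + c*IQR]."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - c * iqr, q3 + c * iqr
    return [not (low <= v <= high) for v in values]
```

A larger c widens the accepted band and flags fewer points; a smaller c flags more.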

get_anomaly_dates()

Retrieve the dates where anomalies were detected.

Returns:

anomaly_dates – The dates (from the DataFrame’s index) where an anomaly was detected.

Return type:

pd.DatetimeIndex

visualize(figsize=(12, 6), title='Daily Profit Margin Over Time with Anomalies Highlighted', xlabel='Date', ylabel='Profit Margin Value', ylim=(40, 60))

Visualize the time series data with anomalies highlighted.

Parameters:
  • figsize (tuple, default (12, 6)) – Size of the figure.

  • title (str, default "Daily Profit Margin Over Time with Anomalies Highlighted") – Title for the plot.

  • xlabel (str, default "Date") – Label for the x-axis.

  • ylabel (str, default "Profit Margin Value") – Label for the y-axis.

  • ylim (tuple, optional, default (40, 60)) – Y-axis limits.

data_generators.synthetic_sales_data

data_generators.synthetic_sales_data.calculate_discount(sales, promo_code, category, sales_channel, customer_loyalty)

Calculate discount based on promo code, adjusted by category, channel, and loyalty. Ensures discount doesn’t exceed sales.

data_generators.synthetic_sales_data.calculate_fulfillment_cost(quantity, sales_channel, territory, product_weight)

Calculate fulfillment cost based on quantity, channel, territory, and weight.

data_generators.synthetic_sales_data.generate_fashion_data_with_brand(start_date, end_date)

Generate realistic retail transactional data based on the flowchart relationships.

data_generators.synthetic_sales_data.inject_anomalies_by_date(df, anomaly_schedule)

Inject anomalies into the data with specific scopes and recalculate all downstream metrics, ensuring clear pathways for anomaly detection and root cause analysis.