API Reference
This section provides detailed documentation for ProRCA’s modules, classes, and functions, automatically generated from the source code docstrings.
prorca.pathway
- class prorca.pathway.CausalResultsVisualizer(analysis_results)
Bases: object
Visualizes the results from CausalRootCauseAnalyzer. The class expects the analysis results to contain:
‘paths’: a list of tuples (path, significance), where each path is a list of tuples (node, combined_score)
‘node_scores’: a dict mapping node -> array of structural scores
‘noise_contributions’: a dict mapping node -> array of noise contributions
It offers several plotting methods:
plot_root_cause_paths: a network diagram of the root cause pathways.
plot_node_scores: a bar chart of average structural scores per node.
plot_noise_contributions_distribution: a boxplot for the distribution of noise contributions.
plot_consistency_heatmap: a heatmap of the correlation between nodes’ noise contributions.
plot_timeline: a timeline plot if you run separate analyses per anomaly date.
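As a rough illustration of the input described above, the sketch below builds a results dictionary with the three expected keys and computes the per-node average that plot_node_scores is described as charting. It does not call ProRCA. The node names other than PROFIT_MARGIN and all numbers are invented, plain lists stand in for arrays, and the path ordering (outcome first) is an assumption inferred from the note that plotting reverses the node order.

```python
from statistics import mean

# Hypothetical analysis results in the documented shape.
# Node names other than PROFIT_MARGIN and all values are illustrative only.
analysis_results = {
    # list of (path, significance); each path is a list of (node, combined_score)
    "paths": [
        ([("PROFIT_MARGIN", 0.9), ("DISCOUNT", 0.7)], 0.8),
    ],
    # node -> structural scores (plain lists stand in for arrays)
    "node_scores": {
        "PROFIT_MARGIN": [0.8, 1.0],
        "DISCOUNT": [0.5, 0.7],
    },
    # node -> noise contributions
    "noise_contributions": {
        "PROFIT_MARGIN": [0.1, 0.3],
        "DISCOUNT": [0.4, 0.2],
    },
}

# Average structural score per node, the quantity a bar chart of node scores would show.
average_scores = {
    node: mean(scores) for node, scores in analysis_results["node_scores"].items()
}
```

With the real library, a dictionary of this shape would presumably come from CausalRootCauseAnalyzer rather than being written by hand.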
- plot_root_cause_paths()
Visualize the discovered root cause pathways using Graphviz for clarity. Each path is displayed as a separate cluster with a background color in a gradient that starts from light green and moves to yellow. The order of nodes is reversed such that the root cause appears first and the final outcome (e.g., ‘PROFIT_MARGIN’) appears last. Duplicate arrows for identical edges across different paths are omitted. The chart is rendered inline in a Jupyter Notebook.
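The node reversal and duplicate-edge handling described above can be sketched without any plotting library. This is a minimal illustration, not ProRCA's rendering code: the function name ordered_unique_edges and every node name except PROFIT_MARGIN are hypothetical, and paths are assumed to be stored outcome-first.

```python
def ordered_unique_edges(paths):
    """Reverse each path so the root cause comes first, then keep each edge once."""
    seen = set()
    edges = []
    for path, _significance in paths:
        # Stored order is assumed to be outcome -> root cause; reverse it for display.
        nodes = [node for node, _score in reversed(path)]
        for source, target in zip(nodes, nodes[1:]):
            if (source, target) not in seen:
                seen.add((source, target))
                edges.append((source, target))
    return edges


# Two hypothetical paths that share the DISCOUNT -> PROFIT_MARGIN edge.
paths = [
    ([("PROFIT_MARGIN", 0.9), ("DISCOUNT", 0.7), ("PROMO_CODE", 0.6)], 0.8),
    ([("PROFIT_MARGIN", 0.9), ("DISCOUNT", 0.7), ("SALES_CHANNEL", 0.5)], 0.6),
]
edges = ordered_unique_edges(paths)
```

The shared edge appears once in the result, which mirrors the statement that duplicate arrows across paths are omitted.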
- class prorca.pathway.CausalRootCauseAnalyzer(scm, min_score_threshold: float = 0.8)
Bases: object
Advanced root cause analyzer combining structural and noise-based approaches.
- analyze(df_agg: DataFrame, anomaly_dates, start_node: str = 'PROFIT_MARGIN') → Dict
Main analysis method combining all approaches.
- analyze_by_date(df_agg: DataFrame, anomaly_dates, start_node: str = 'PROFIT_MARGIN') → Dict
Run the analysis separately for each anomaly date so that date-specific root causes are captured.
- Parameters:
df_agg (pd.DataFrame) – The aggregated data containing an ‘ORDERDATE’ column.
anomaly_dates (iterable) – An iterable of anomaly dates (e.g., a list or DatetimeIndex).
start_node (str, default 'PROFIT_MARGIN') – The starting node for the root cause analysis.
- Returns:
results – A dictionary where each key is an anomaly date and the value is the analysis result for that date.
- Return type:
dict
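The return shape of analyze_by_date, one result per anomaly date, can be pictured with the following sketch. It is only a pattern illustration under stated assumptions: analyze_each_date and count_rows are hypothetical names, a list of dictionaries stands in for the DataFrame, and the source does not say how the library itself performs the per-date analysis.

```python
from datetime import date


def analyze_each_date(rows, anomaly_dates, analyze_one):
    """Call analyze_one once per anomaly date and key the results by that date."""
    results = {}
    for anomaly_date in anomaly_dates:
        # Passing a single-date list keeps each result specific to one date.
        results[anomaly_date] = analyze_one(rows, [anomaly_date])
    return results


def count_rows(rows, dates):
    """Stand-in for a real analysis: counts the rows that fall on the given dates."""
    return {"n_rows": sum(row["ORDERDATE"] in dates for row in rows)}


# Hypothetical aggregated rows with an ORDERDATE field.
rows = [
    {"ORDERDATE": date(2024, 1, 1), "PROFIT_MARGIN": 50.0},
    {"ORDERDATE": date(2024, 1, 2), "PROFIT_MARGIN": 20.0},
]
results = analyze_each_date(rows, [date(2024, 1, 2)], count_rows)
```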
- class prorca.pathway.ScmBuilder(edges, nodes=None, visualize=False, viz_filename='dag_relationships', random_seed=0)
Bases: object
A builder class to construct a Structural Causal Model (SCM) from a given set of edges (and optionally nodes). It also provides a visualization of the causal graph if desired.
- Parameters:
edges (list of tuple) – List of edges in the format (source, target) representing causal relationships.
nodes (list of str, optional) – List of nodes to include in the graph. If not provided, nodes are automatically inferred from edges.
visualize (bool, default False) – Whether to visualize the constructed causal graph using Graphviz.
viz_filename (str, default "dag_relationships") – The base filename (without extension) to use for saving the graph visualization.
random_seed (int, default 0) – Random seed for reproducibility when building and fitting the SCM.
- build(df=None)
Convenience method to build the causal graph, optionally visualize it, and then construct the SCM (with optional auto-assignment and fitting if data is provided).
- Parameters:
df (pd.DataFrame, optional) – The data to use for automatically assigning causal mechanisms and fitting the SCM.
- Returns:
scm – The final Structural Causal Model.
- Return type:
gcm.StructuralCausalModel
- build_graph()
Build a networkx directed graph (DiGraph) from the provided nodes and edges.
- Returns:
causal_graph – The constructed causal graph.
- Return type:
nx.DiGraph
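To make the (source, target) edge format and the automatic node inference concrete, here is a standard-library sketch. It is not the library's implementation, which returns an nx.DiGraph; the helper names and all node names except PROFIT_MARGIN are hypothetical, and the acyclicity check is an extra sanity step the source does not claim build_graph performs.

```python
from graphlib import CycleError, TopologicalSorter


def infer_nodes(edges):
    """Collect nodes from (source, target) edges in first-seen order."""
    nodes = []
    for source, target in edges:
        for node in (source, target):
            if node not in nodes:
                nodes.append(node)
    return nodes


def causal_order(edges):
    """Return the nodes ordered so that every cause precedes its effects."""
    predecessors = {node: set() for node in infer_nodes(edges)}
    for source, target in edges:
        predecessors[target].add(source)
    return list(TopologicalSorter(predecessors).static_order())


def is_acyclic(edges):
    """True if the edges form a DAG, which a causal graph needs to be."""
    try:
        causal_order(edges)
    except CycleError:
        return False
    return True


# Hypothetical causal edges ending at the documented PROFIT_MARGIN node.
edges = [
    ("PROMO_CODE", "DISCOUNT"),
    ("DISCOUNT", "PROFIT_MARGIN"),
    ("FULFILLMENT_COST", "PROFIT_MARGIN"),
]
nodes = infer_nodes(edges)
order = causal_order(edges)
```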
- build_scm(df=None)
Build the Structural Causal Model (SCM) from the causal graph. If a DataFrame is provided, the method will automatically assign causal mechanisms and fit the model.
- Parameters:
df (pd.DataFrame, optional) – The data to use for automatically assigning generative models to each node and fitting the SCM.
- Returns:
scm – The constructed (and possibly fitted) SCM.
- Return type:
gcm.StructuralCausalModel
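Putting the signatures documented above together, a typical call sequence might look like the sketch below. It uses only the constructors and methods listed in this reference, but the edge list is hypothetical (only PROFIT_MARGIN is named in the source), the wrapper function name is invented, and it assumes df_agg has one column per node plus the ORDERDATE column. The import is done lazily so the snippet loads even where ProRCA is not installed.

```python
# Hypothetical edges; replace them with the causal relationships in your own data.
EDGES = [
    ("DISCOUNT", "PROFIT_MARGIN"),
    ("FULFILLMENT_COST", "PROFIT_MARGIN"),
]


def run_root_cause_analysis(df_agg, anomaly_dates, edges=EDGES):
    """Build and fit an SCM from df_agg, then analyze the given anomaly dates."""
    # Imported here so that defining this function does not require ProRCA.
    from prorca.pathway import CausalRootCauseAnalyzer, ScmBuilder

    builder = ScmBuilder(edges=edges, visualize=False, random_seed=0)
    # Passing data lets build() assign causal mechanisms and fit the model.
    scm = builder.build(df=df_agg)

    analyzer = CausalRootCauseAnalyzer(scm, min_score_threshold=0.8)
    return analyzer.analyze(df_agg, anomaly_dates, start_node="PROFIT_MARGIN")
```

The returned dictionary can then be handed to CausalResultsVisualizer for plotting.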
prorca.dag_builder
anomaly.adtk
- class anomaly.adtk.AnomalyDetector(df, date_col='ORDERDATE', value_col='PROFIT_MARGIN')
Bases: object
A class for detecting anomalies in time series data using ADTK’s InterQuartileRangeAD. It also provides a visualization of the detected anomalies.
- Parameters:
df (pd.DataFrame) – The input DataFrame containing the time series data.
date_col (str, default "ORDERDATE") – The column name containing date/time information. This column will be set as the DataFrame’s index.
value_col (str, default "PROFIT_MARGIN") – The column on which anomaly detection is performed.
- detect()
Detect anomalies in the specified value column using InterQuartileRangeAD.
- Returns:
anomalies – A DataFrame that includes:
’is_anamoly’: a boolean flag indicating whether an anomaly was detected.
’value’: the original value from the input DataFrame.
- Return type:
pd.DataFrame
- get_anomaly_dates()
Retrieve the dates where anomalies were detected.
- Returns:
anomaly_dates – The dates (from the DataFrame’s index) where an anomaly was detected.
- Return type:
pd.DatetimeIndex
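For intuition about what detect and get_anomaly_dates compute, the following standard-library sketch applies an interquartile-range rule to a short series and collects the flagged dates. It is not ADTK's implementation: the function name iqr_flags, the factor c=3.0, the quartile method, and the sample data are all assumptions chosen for illustration.

```python
from statistics import quantiles


def iqr_flags(values, c=3.0):
    """Flag values that fall outside [Q1 - c * IQR, Q3 + c * IQR]."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - c * iqr, q3 + c * iqr
    return [value < low or value > high for value in values]


# Hypothetical daily profit margins with one obvious drop.
dates = [
    "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04",
    "2024-01-05", "2024-01-06", "2024-01-07", "2024-01-08",
]
values = [50.0, 51.0, 49.0, 50.5, 49.5, 50.0, 20.0, 50.2]

flags = iqr_flags(values)
# Keep only the dates whose value was flagged.
anomaly_dates = [day for day, flagged in zip(dates, flags) if flagged]
```

A larger c widens the accepted band and flags fewer points; a smaller c does the opposite.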
- visualize(figsize=(12, 6), title='Daily Profit Margin Over Time with Anomalies Highlighted', xlabel='Date', ylabel='Profit Margin Value', ylim=(40, 60))
Visualize the time series data with anomalies highlighted.
- Parameters:
figsize (tuple, default (12, 6)) – Size of the figure.
title (str, default "Daily Profit Margin Over Time with Anomalies Highlighted") – Title for the plot.
xlabel (str, default "Date") – Label for the x-axis.
ylabel (str, default "Profit Margin Value") – Label for the y-axis.
ylim (tuple, optional, default (40, 60)) – Y-axis limits.
data_generators.synthetic_sales_data
- data_generators.synthetic_sales_data.calculate_discount(sales, promo_code, category, sales_channel, customer_loyalty)
Calculate discount based on promo code, adjusted by category, channel, and loyalty. Ensures discount doesn’t exceed sales.
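The cap mentioned above (a discount never exceeds the sale amount) can be shown with a small sketch. The actual rates and adjustments used by calculate_discount are not given in this reference, so every rate, multiplier, and name below is invented, and the category and channel adjustments are left out for brevity.

```python
# All rates and multipliers are invented for illustration only.
PROMO_RATES = {"NONE": 0.0, "SAVE10": 0.10, "HALFOFF": 0.50}
LOYALTY_MULTIPLIER = {"standard": 1.0, "gold": 2.5}


def sketch_discount(sales, promo_code, customer_loyalty):
    """Promo-based discount, scaled by loyalty and capped at the sale amount."""
    rate = PROMO_RATES.get(promo_code, 0.0)
    multiplier = LOYALTY_MULTIPLIER.get(customer_loyalty, 1.0)
    raw_discount = sales * rate * multiplier
    # The cap: a discount can never be larger than the sale itself.
    return min(raw_discount, sales)


small = sketch_discount(100.0, "SAVE10", "standard")
capped = sketch_discount(100.0, "HALFOFF", "gold")
```

In the second call the uncapped value would be larger than the sale, so the function returns the sale amount instead.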
- data_generators.synthetic_sales_data.calculate_fulfillment_cost(quantity, sales_channel, territory, product_weight)
Calculate fulfillment cost based on quantity, channel, territory, and weight.