PMLB Regression Datasets
Loading regression datasets
First, load a trained agent and get the list of PMLB regression dataset names. Although PMLB offers hundreds of datasets, let's sample 10% of the regression list to demonstrate the agent's capabilities.
import random
import pmlb
from IPython.display import Markdown
from ostatslib.agents import PPOAgent
SAMPLE_FRACTION = 0.1
sample_size = int(len(pmlb.regression_dataset_names) * SAMPLE_FRACTION)
sampled_dataset_names = random.sample(pmlb.regression_dataset_names, sample_size)
AGENT_FILE = '../trained_ppo_model.zip'
agent = PPOAgent(AGENT_FILE)
Markdown(f'Sampled {sample_size} regression datasets: {", ".join(sampled_dataset_names)}.')
Sampled 12 regression datasets: 599_fri_c2_1000_5, 654_fri_c0_500_10, 1029_LEV, 485_analcatdata_vehicle, 225_puma8NH, 564_fried, 201_pol, 294_satellite_image, 663_rabe_266, 1096_FacultySalaries, 227_cpu_small, 547_no2.
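Because `random.sample` draws from the global generator, the sampled datasets listed above change from run to run. The following is a minimal sketch of a reproducible draw, assuming a fixed seed is acceptable for this demonstration. `SEED`, `sample_names` and the stand-in name list are illustrative and are not part of PMLB or ostatslib.

```python
import random

SEED = 42  # hypothetical seed; any fixed integer gives a repeatable draw
SAMPLE_FRACTION = 0.1


def sample_names(names, fraction, seed):
    """Return a reproducible random subset of `names`."""
    rng = random.Random(seed)  # local generator, global random state is untouched
    sample_size = int(len(names) * fraction)
    return rng.sample(list(names), sample_size)


# Stand-in list; in the notebook this would be pmlb.regression_dataset_names
example_names = [f"dataset_{i}" for i in range(50)]
first_draw = sample_names(example_names, SAMPLE_FRACTION, SEED)
second_draw = sample_names(example_names, SAMPLE_FRACTION, SEED)
```

With the same seed, both draws contain the same names in the same order, so the results further down can be compared across runs.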
Analyses
The next step is to fetch the data and analyze each sampled dataset. PMLB provides a function that fetches data from its repository. The initial state must also record which variable is the target.
%%capture
from ostatslib.states import State
results = []
for name in sampled_dataset_names:
    # Download the dataset as a DataFrame, or read it from the local cache
    data = pmlb.fetch_data(name, local_cache_dir='.pmlb_cache/')
    # PMLB stores the response in a column named 'target'
    initial_state = State()
    initial_state.set('response_variable_label', 'target')
    analysis = agent.analyze(data, initial_state)
    results.append({"name": name, "analysis": analysis})
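A download or an analysis can fail for a single dataset, and in the loop above that would abort the whole cell. The sketch below shows one way to keep going and record failures separately. `analyze_all` and `fake_analyze` are hypothetical helpers written for this illustration; in the notebook, the callable passed in would wrap the fetch, the `State` setup and `agent.analyze`.

```python
def analyze_all(names, analyze_one):
    """Run `analyze_one` on each name, keeping failures separate from results."""
    results, failures = [], []
    for name in names:
        try:
            results.append({"name": name, "analysis": analyze_one(name)})
        except Exception as error:  # deliberately broad: any failure is recorded
            failures.append({"name": name, "error": repr(error)})
    return results, failures


# Stand-in for "fetch the data, build the State, call agent.analyze"
def fake_analyze(name):
    if name == "broken_dataset":
        raise RuntimeError("download failed")
    return f"analysis of {name}"


demo_results, demo_failures = analyze_all(
    ["dataset_a", "broken_dataset", "dataset_b"], fake_analyze
)
```

Here `demo_results` holds the two successful entries and `demo_failures` holds the one that raised, so the later reporting step can work with whatever succeeded.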
Results
from IPython.display import display
for result in results:
    display(Markdown(f"### {result['name']}"))
    print(result['analysis'].summary())
599_fri_c2_1000_5
Analysis executed at 2024-12-23 23:54:14.515775
Final status is Complete
Initial State known features:
response_variable_label                       target
score                                         0.8598579258922238
response_unique_values_ratio                  1.0
is_response_positive_values_only              -1
adaboost_square_loss_regression_score_reward  0.8598579258922238
Steps:
  Order  Step                                  Reward    State Change
-------  ------------------------------------  --------  ------------------------------------------------------
      1  Is Response Positive Values Only      0.1
      2  Response Unique Values Ratio          0.1       response_unique_values_ratio  1
      3  AdaBoost Regression with Square Loss  0.859858  score                                         0.859858
                                                         adaboost_square_loss_regression_score_reward  0.859858
654_fri_c0_500_10
Analysis executed at 2024-12-23 23:54:18.778478
Final status is Complete
Initial State known features:
response_variable_label                       target
score                                         0.7779667879151972
response_unique_values_ratio                  1.0
is_response_positive_values_only              -1
adaboost_square_loss_regression_score_reward  0.7779667879151972
Steps:
  Order  Step                                  Reward    State Change
-------  ------------------------------------  --------  ------------------------------------------------------
      1  Is Response Positive Values Only      0.1
      2  Response Unique Values Ratio          0.1       response_unique_values_ratio  1
      3  AdaBoost Regression with Square Loss  0.777967  score                                         0.777967
                                                         adaboost_square_loss_regression_score_reward  0.777967
1029_LEV
Analysis executed at 2024-12-23 23:54:22.443075
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.005
response_inferred_dtype           floating
is_response_positive_values_only  1
standardized_variables_ratio      -1
Steps:
  Order  Step                              Reward    State Change
-------  --------------------------------  --------  -----------------------------------
      1  Is Response Positive Values Only  0.1
      2  Time Convertible Variable Search  0.1       time_convertible_variable
      3  Infer Response DType              0.1       response_inferred_dtype  floating
      4  Standardized Variables Ratio      0.1       standardized_variables_ratio  -1
      5  Response Unique Values Ratio      0.1       response_unique_values_ratio  0.005
      6  Response Unique Values Ratio      -1
      7  Response Unique Values Ratio      -1
      8  Response Unique Values Ratio      -1
      9  Response Unique Values Ratio      -1
     10  Response Unique Values Ratio      -1
     11  Response Unique Values Ratio      -1
     12  Response Unique Values Ratio      -1
     13  Response Unique Values Ratio      -1
     14  Response Unique Values Ratio      -1
     15  Response Unique Values Ratio      -1
485_analcatdata_vehicle
Analysis executed at 2024-12-23 23:54:24.217007
Final status is Not Complete
Initial State known features:
response_variable_label                                     target
score                                                       0.721723428852402
time_convertible_variable
response_unique_values_ratio                                0.9791666666666666
response_inferred_dtype                                     floating
is_response_positive_values_only                            1
standardized_variables_ratio                                -1
are_linear_model_regression_residuals_correlated            -1
are_linear_model_regression_residuals_homoscedastic         -1
are_linear_model_regression_residuals_normally_distributed  -0.5
n_100_estimators_random_forest_regression_score_reward      -0.4246151756805713
linear_regression_score_reward                              0.721723428852402
Steps:
  Order  Step                                     Reward     State Change
-------  ---------------------------------------  ---------  ---------------------------------------------------------------------
      1  Is Response Positive Values Only         0.1
      2  Time Convertible Variable Search         0.1        time_convertible_variable
      3  Infer Response DType                     0.1        response_inferred_dtype  floating
      4  Standardized Variables Ratio             0.1        standardized_variables_ratio  -1
      5  Response Unique Values Ratio             0.1        response_unique_values_ratio  0.979167
      6  Random Forest Regression 100 Estimators  -0.424615  score                                                   0.575385
                                                             n_100_estimators_random_forest_regression_score_reward  -0.424615
      7  Linear Regression                        0.121723   score                                                       0.721723
                                                             are_linear_model_regression_residuals_correlated            -1
                                                             are_linear_model_regression_residuals_homoscedastic         -1
                                                             are_linear_model_regression_residuals_normally_distributed  -0.5
                                                             linear_regression_score_reward                              0.721723
      8  Linear Regression                        0.121723
      9  Linear Regression                        0.121723
     10  Linear Regression                        0.121723
     11  Linear Regression                        0.121723
     12  Linear Regression                        0.121723
     13  Linear Regression                        0.121723
     14  Linear Regression                        0.121723
     15  Linear Regression                        0.121723
225_puma8NH
Analysis executed at 2024-12-23 23:55:00.870837
Final status is Not Complete
Initial State known features:
response_variable_label                       target
score                                         0.5216692082454134
response_unique_values_ratio                  1.0
is_response_positive_values_only              -1
adaboost_square_loss_regression_score_reward  -0.4783307917545866
Steps:
  Order  Step                                  Reward     State Change
-------  ------------------------------------  ---------  -------------------------------------------------------
      1  Is Response Positive Values Only      0.1
      2  Response Unique Values Ratio          0.1        response_unique_values_ratio  1
      3  AdaBoost Regression with Square Loss  -0.489275  score                                         0.510725
                                                          adaboost_square_loss_regression_score_reward  -0.489275
      4  AdaBoost Regression with Square Loss  -0.476162  score                                         0.523838
                                                          adaboost_square_loss_regression_score_reward  -0.476162
      5  AdaBoost Regression with Square Loss  -0.48204   score                                         0.51796
                                                          adaboost_square_loss_regression_score_reward  -0.48204
      6  AdaBoost Regression with Square Loss  -0.486956  score                                         0.513044
                                                          adaboost_square_loss_regression_score_reward  -0.486956
      7  AdaBoost Regression with Square Loss  -0.479421  score                                         0.520579
                                                          adaboost_square_loss_regression_score_reward  -0.479421
      8  AdaBoost Regression with Square Loss  -0.472902  score                                         0.527098
                                                          adaboost_square_loss_regression_score_reward  -0.472902
      9  AdaBoost Regression with Square Loss  -0.479261  score                                         0.520739
                                                          adaboost_square_loss_regression_score_reward  -0.479261
     10  AdaBoost Regression with Square Loss  -0.4819    score                                         0.5181
                                                          adaboost_square_loss_regression_score_reward  -0.4819
     11  AdaBoost Regression with Square Loss  -0.4852    score                                         0.5148
                                                          adaboost_square_loss_regression_score_reward  -0.4852
     12  AdaBoost Regression with Square Loss  -0.483541  score                                         0.516459
                                                          adaboost_square_loss_regression_score_reward  -0.483541
     13  AdaBoost Regression with Square Loss  -0.483397  score                                         0.516603
                                                          adaboost_square_loss_regression_score_reward  -0.483397
     14  AdaBoost Regression with Square Loss  -0.484468  score                                         0.515532
                                                          adaboost_square_loss_regression_score_reward  -0.484468
     15  AdaBoost Regression with Square Loss  -0.478331  score                                         0.521669
                                                          adaboost_square_loss_regression_score_reward  -0.478331
564_fried
Analysis executed at 2024-12-23 23:55:04.178782
Final status is Not Complete
Initial State known features:
response_variable_label           target
response_unique_values_ratio      0.4267072213500785
is_response_positive_values_only  -1
Steps:
  Order  Step                              Reward    State Change
-------  --------------------------------  --------  --------------------------------------
      1  Is Response Positive Values Only  0.1
      2  Response Unique Values Ratio      0.1       response_unique_values_ratio  0.426707
      3  Response Unique Values Ratio      -1
      4  Response Unique Values Ratio      -1
      5  Response Unique Values Ratio      -1
      6  Response Unique Values Ratio      -1
      7  Response Unique Values Ratio      -1
      8  Response Unique Values Ratio      -1
      9  Response Unique Values Ratio      -1
     10  Response Unique Values Ratio      -1
     11  Response Unique Values Ratio      -1
     12  Response Unique Values Ratio      -1
     13  Response Unique Values Ratio      -1
     14  Response Unique Values Ratio      -1
     15  Response Unique Values Ratio      -1
201_pol
Analysis executed at 2024-12-23 23:55:06.564421
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.0007333333333333333
response_inferred_dtype           floating
is_response_positive_values_only  1
standardized_variables_ratio      -1
Steps:
  Order  Step                              Reward    State Change
-------  --------------------------------  --------  -----------------------------------------
      1  Is Response Positive Values Only  0.1
      2  Time Convertible Variable Search  0.1       time_convertible_variable
      3  Infer Response DType              0.1       response_inferred_dtype  floating
      4  Standardized Variables Ratio      0.1       standardized_variables_ratio  -1
      5  Response Unique Values Ratio      0.1       response_unique_values_ratio  0.000733333
      6  Response Unique Values Ratio      -1
      7  Response Unique Values Ratio      -1
      8  Response Unique Values Ratio      -1
      9  Response Unique Values Ratio      -1
     10  Response Unique Values Ratio      -1
     11  Response Unique Values Ratio      -1
     12  Response Unique Values Ratio      -1
     13  Response Unique Values Ratio      -1
     14  Response Unique Values Ratio      -1
     15  Response Unique Values Ratio      -1
294_satellite_image
Analysis executed at 2024-12-23 23:55:08.135082
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.0009324009324009324
response_inferred_dtype           floating
is_response_positive_values_only  1
standardized_variables_ratio      -1
Steps:
  Order  Step                              Reward    State Change
-------  --------------------------------  --------  -----------------------------------------
      1  Is Response Positive Values Only  0.1
      2  Time Convertible Variable Search  0.1       time_convertible_variable
      3  Infer Response DType              0.1       response_inferred_dtype  floating
      4  Standardized Variables Ratio      0.1       standardized_variables_ratio  -1
      5  Response Unique Values Ratio      0.1       response_unique_values_ratio  0.000932401
      6  Response Unique Values Ratio      -1
      7  Response Unique Values Ratio      -1
      8  Response Unique Values Ratio      -1
      9  Response Unique Values Ratio      -1
     10  Response Unique Values Ratio      -1
     11  Response Unique Values Ratio      -1
     12  Response Unique Values Ratio      -1
     13  Response Unique Values Ratio      -1
     14  Response Unique Values Ratio      -1
     15  Response Unique Values Ratio      -1
663_rabe_266
Analysis executed at 2024-12-23 23:55:10.997639
Final status is Complete
Initial State known features:
response_variable_label                                     target
score                                                       0.9016138592769071
response_unique_values_ratio                                0.8
is_response_positive_values_only                            -1
n_100_estimators_gradient_boosting_regression_score_reward  0.9016138592769071
Steps:
  Order  Step                                         Reward    State Change
-------  -------------------------------------------  --------  --------------------------------------------------------------------
      1  Is Response Positive Values Only             0.1
      2  Response Unique Values Ratio                 0.1       response_unique_values_ratio  0.8
      3  Gradient Boosting Regression 100 Estimators  0.901614  score                                                       0.901614
                                                                n_100_estimators_gradient_boosting_regression_score_reward  0.901614
1096_FacultySalaries
Analysis executed at 2024-12-23 23:55:12.069061
Final status is Complete
Initial State known features:
response_variable_label                                     target
score                                                       0.7627897299290225
time_convertible_variable
response_unique_values_ratio                                0.78
response_inferred_dtype                                     floating
is_response_positive_values_only                            1
standardized_variables_ratio                                -1
n_100_estimators_gradient_boosting_regression_score_reward  0.7627897299290225
Steps:
  Order  Step                                         Reward    State Change
-------  -------------------------------------------  --------  -------------------------------------------------------------------
      1  Is Response Positive Values Only             0.1
      2  Time Convertible Variable Search             0.1       time_convertible_variable
      3  Infer Response DType                         0.1       response_inferred_dtype  floating
      4  Standardized Variables Ratio                 0.1       standardized_variables_ratio  -1
      5  Response Unique Values Ratio                 0.1       response_unique_values_ratio  0.78
      6  Gradient Boosting Regression 100 Estimators  0.76279   score                                                       0.76279
                                                                n_100_estimators_gradient_boosting_regression_score_reward  0.76279
227_cpu_small
Analysis executed at 2024-12-23 23:55:13.776720
Final status is Not Complete
Initial State known features:
response_variable_label           target
time_convertible_variable
response_unique_values_ratio      0.0068359375
response_inferred_dtype           floating
is_response_positive_values_only  1
standardized_variables_ratio      -1
Steps:
  Order  Step                              Reward    State Change
-------  --------------------------------  --------  ----------------------------------------
      1  Is Response Positive Values Only  0.1
      2  Time Convertible Variable Search  0.1       time_convertible_variable
      3  Infer Response DType              0.1       response_inferred_dtype  floating
      4  Standardized Variables Ratio      0.1       standardized_variables_ratio  -1
      5  Response Unique Values Ratio      0.1       response_unique_values_ratio  0.00683594
      6  Response Unique Values Ratio      -1
      7  Response Unique Values Ratio      -1
      8  Response Unique Values Ratio      -1
      9  Response Unique Values Ratio      -1
     10  Response Unique Values Ratio      -1
     11  Response Unique Values Ratio      -1
     12  Response Unique Values Ratio      -1
     13  Response Unique Values Ratio      -1
     14  Response Unique Values Ratio      -1
     15  Response Unique Values Ratio      -1
547_no2
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[4], line 5
3 for result in results:
4 display(Markdown(f"### {result['name']}"))
----> 5 print(result['analysis'].summary())
File ~/work/ostatslib/ostatslib/ostatslib/agents/analysis_result.py:49, in AnalysisResult.summary(self)
38 def summary(self) -> str:
39 """
40 Returns analysis summary
41
42 Returns:
43 str: analysis summary
44 """
45 return (
46 f'\nAnalysis executed at {self.timestamp}\n'
47 f'Final status is {"Complete" if self.done else "Not Complete"}\n'
48 f'Initial State known features:\n{self.__fill_initial_state_row()}\n'
---> 49 f'Steps:\n{self.__fill_summary_table_steps_rows()}'
50 )
File ~/work/ostatslib/ostatslib/ostatslib/agents/analysis_result.py:65, in AnalysisResult.__fill_summary_table_steps_rows(self)
58 table_rows: StepsRows = []
60 for i, (reward, info) in enumerate(self.steps):
61 table_rows.append((
62 i+1,
63 str(info.action_name),
64 reward,
---> 65 tabulate(self.__get_state_delta(info, i).list_known_features(),
66 tablefmt="plain")
67 ))
69 return tabulate(table_rows, steps_headers)
File ~/work/ostatslib/ostatslib/ostatslib/agents/analysis_result.py:84, in AnalysisResult.__get_state_delta(self, info, i)
79 return info.next_state - previous_state
81 raise ValueError(
82 f'Cannot write State delta, step {i-1} State is None')
---> 84 raise ValueError(f'Cannot write State delta, step {i} State is None')
ValueError: Cannot write State delta, step 6 State is None
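The traceback above shows `summary()` raising a `ValueError` for `547_no2`, which ends the reporting loop at that dataset. The sketch below shows a more tolerant loop that also tallies final statuses. It assumes only what the traceback reveals about `AnalysisResult`: a `done` attribute and a `summary()` method that may raise `ValueError`. `summarize_all` and `FakeAnalysis` are hypothetical names used for illustration, not ostatslib APIs.

```python
from collections import Counter


def summarize_all(results):
    """Print each summary, tolerating ValueError, and tally final statuses."""
    tally = Counter()
    for result in results:
        analysis = result["analysis"]
        try:
            text = analysis.summary()
            status = "Complete" if analysis.done else "Not Complete"
        except ValueError as error:
            # Keep going; report the problem in place of the summary
            text = f"Summary unavailable: {error}"
            status = "Summary failed"
        tally[status] += 1
        print(f"{result['name']}\n{text}\n")
    return tally


class FakeAnalysis:
    """Hypothetical stand-in exposing only `done` and `summary()`."""

    def __init__(self, done, broken=False):
        self.done = done
        self.broken = broken

    def summary(self):
        if self.broken:
            raise ValueError("Cannot write State delta, step 6 State is None")
        return f"Final status is {'Complete' if self.done else 'Not Complete'}"


demo_tally = summarize_all([
    {"name": "a", "analysis": FakeAnalysis(done=True)},
    {"name": "b", "analysis": FakeAnalysis(done=False)},
    {"name": "c", "analysis": FakeAnalysis(done=False, broken=True)},
])
```

Applied to the run shown in this section, a tally of this kind would count 4 Complete analyses, 7 Not Complete, and 1 summary failure.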