PMLB Classification Datasets
Loading classification datasets
First, load a trained agent and get the list of PMLB classification dataset names. Since the list is long, a random 10% sample is enough to demonstrate the agent’s capabilities.
import random
import pmlb
from IPython.display import Markdown
from ostatslib.agents import PPOAgent
SAMPLE_FRACTION = 0.1
sample_size = int(len(pmlb.classification_dataset_names) * SAMPLE_FRACTION)
sampled_dataset_names = random.sample(pmlb.classification_dataset_names, sample_size)
AGENT_FILE = '../trained_ppo_model.zip'
agent = PPOAgent(AGENT_FILE)
Markdown(f'Sampled {sample_size} classification datasets: {", ".join(sampled_dataset_names)}.')
Sampled 16 classification datasets: mnist, clean2, agaricus_lepiota, satimage, parity5, corral, analcatdata_creditscore, lupus, schizo, australian, analcatdata_boxing2, balance_scale, biomed, GAMETES_Epistasis_3_Way_20atts_0.2H_EDM_1_1, mux6, hypothyroid.
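Because random.sample draws a different subset on every run, the list above will change each time the notebook is executed. A minimal sketch of a repeatable draw follows, assuming a fixed seed is acceptable for the demonstration. The helper sample_names, the seed value, and the placeholder dataset names are all hypothetical and are not part of pmlb or ostatslib.

```python
import random


def sample_names(names, fraction, seed):
    """Return a reproducible random sample covering `fraction` of `names`."""
    rng = random.Random(seed)  # private generator, leaves the global state alone
    sample_size = int(len(names) * fraction)
    return rng.sample(list(names), sample_size)


# Placeholder names standing in for pmlb.classification_dataset_names
names = [f"dataset_{i}" for i in range(50)]

first = sample_names(names, 0.1, seed=42)
second = sample_names(names, 0.1, seed=42)

print(len(first))       # 5
print(first == second)  # True
```

Using a dedicated random.Random instance keeps the draw independent of any other code that touches the global random state.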
Analyses
The next step is to fetch the data and analyze each sampled dataset. PMLB provides a function, pmlb.fetch_data, that downloads a dataset from its repository. The initial state must also record which variable is the target, which in these datasets is the column named target.
%%capture
from ostatslib.states import State

results = []
for name in sampled_dataset_names:
    # Download the dataset, or reuse the copy in the local cache
    data = pmlb.fetch_data(name, local_cache_dir='.pmlb_cache/')
    # Tell the agent which column holds the response variable
    initial_state = State()
    initial_state.set('response_variable_label', 'target')
    analysis = agent.analyze(data, initial_state)
    results.append({"name": name, "analysis": analysis})
Results
from IPython.display import display

for result in results:
    display(Markdown(f"### {result['name']}"))
    print(result['analysis'].summary())
mnist
Analysis executed at 2024-12-23 23:53:16.283468
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.00014285714285714287
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  -----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.000142857
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
clean2
Analysis executed at 2024-12-23 23:53:24.989608
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.0003031221582297666
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  -----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.000303122
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
agaricus_lepiota
Analysis executed at 2024-12-23 23:53:26.109863
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.00024554941682013506
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  -----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.000245549
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
satimage
Analysis executed at 2024-12-23 23:53:28.633382
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.0009324009324009324
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  -----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.000932401
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
parity5
Analysis executed at 2024-12-23 23:53:29.586091
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.0625
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.0625
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
corral
Analysis executed at 2024-12-23 23:53:30.617077
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.0125
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.0125
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
analcatdata_creditscore
Analysis executed at 2024-12-23 23:53:33.525891
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.02
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ----------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.02
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
lupus
Analysis executed at 2024-12-23 23:53:36.827557
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.022988505747126436
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ---------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.0229885
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
schizo
Analysis executed at 2024-12-23 23:53:38.080557
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.008823529411764706
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.00882353
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
australian
Analysis executed at 2024-12-23 23:53:42.062402
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.002898550724637681
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.00289855
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
analcatdata_boxing2
Analysis executed at 2024-12-23 23:53:44.237954
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.015151515151515152
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ---------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.0151515
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
balance_scale
Analysis executed at 2024-12-23 23:53:45.504031
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.0048
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.0048
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
biomed
Analysis executed at 2024-12-23 23:53:49.798742
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.009569377990430622
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  ----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.00956938
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
GAMETES_Epistasis_3_Way_20atts_0.2H_EDM_1_1
Analysis executed at 2024-12-23 23:53:51.130657
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.00125
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  -------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.00125
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
mux6
Analysis executed at 2024-12-23 23:53:51.651679
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.015625
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  --------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.015625
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
hypothyroid
Analysis executed at 2024-12-23 23:53:54.824063
Final status is Not Complete
Initial State known features:
response_variable_label target
time_convertible_variable
response_unique_values_ratio 0.0006323110970597534
response_inferred_dtype integer
is_response_discrete 1
is_response_positive_values_only 1
Steps:
  Order  Step                                Reward  State Change
-------  --------------------------------  --------  -----------------------------------------
      1  Is Response Positive Values Only       0.1
      2  Time Convertible Variable Search       0.1  time_convertible_variable
      3  Infer Response DType                   0.1  response_inferred_dtype integer
      4  Is Response Discrete                   0.1  is_response_discrete 1
      5  Response Unique Values Ratio           0.1  response_unique_values_ratio 0.000632311
      6  Response Unique Values Ratio            -1
      7  Response Unique Values Ratio            -1
      8  Response Unique Values Ratio            -1
      9  Response Unique Values Ratio            -1
     10  Response Unique Values Ratio            -1
     11  Response Unique Values Ratio            -1
     12  Response Unique Values Ratio            -1
     13  Response Unique Values Ratio            -1
     14  Response Unique Values Ratio            -1
     15  Response Unique Values Ratio            -1
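The response_unique_values_ratio values reported in the summaries appear consistent with dividing the number of distinct target values by the number of rows. For example, ten classes over 70,000 rows gives the 0.000142857 shown for mnist. The sketch below illustrates that reading in plain Python. The helper unique_values_ratio is hypothetical and not part of ostatslib, whose actual computation may differ, and the row counts used are assumptions about the datasets.

```python
def unique_values_ratio(values):
    """Return the number of distinct values divided by the number of values."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return len(set(values)) / len(values)


# Assumed shape of the parity5 target: 32 rows with a binary label
parity5_like = [0, 1] * 16
print(unique_values_ratio(parity5_like))  # 0.0625

# Assumed shape of the mnist target: 70,000 rows with ten digit classes
mnist_like = list(range(10)) * 7000
print(f"{unique_values_ratio(mnist_like):.9f}")  # 0.000142857
```

Under this reading, a small ratio means a few labels repeated across many rows, which is what a classification target typically looks like.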