Modern Benchmark Report
This notebook summarizes the modern replica of the TFG (bachelor's thesis) and gives a quick view of the benchmark results: a comparison table, plots, and visual examples from the evaluation set.
Context
The modern replica compares current anomalib models on data/mandarins_pynq_cropped using fixed seeds. The primary metric is image_AUROC; ties are broken by image_AUPR and then by latency (lower is better).
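The ranking rule is a lexicographic sort over three keys. Here is a minimal pandas sketch, assuming the rule is applied to the per-model means (the mean_* columns of leaderboard.csv) and that lower latency wins; rank_models is a hypothetical helper, not part of the project's code. The numbers are the ones reported in this notebook's leaderboard.

```python
import pandas as pd

# Per-model means as reported in the leaderboard of this notebook.
leaderboard = pd.DataFrame(
    {
        "model": ["anomalydino", "patchcore"],
        "mean_image_AUROC": [0.786667, 0.933333],
        "mean_image_AUPR": [0.811111, 0.939841],
        "mean_latency_ms": [106.279883, 173.664240],
    }
)


def rank_models(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: AUROC desc, then AUPR desc, then latency asc."""
    return df.sort_values(
        by=["mean_image_AUROC", "mean_image_AUPR", "mean_latency_ms"],
        ascending=[False, False, True],
    ).reset_index(drop=True)


ranked = rank_models(leaderboard)
print(ranked["model"].tolist())  # ['patchcore', 'anomalydino']
```

Because sort_values compares keys in order, AUPR only matters when two models have exactly the same mean AUROC, and latency only when both metrics tie.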
from pathlib import Path
import json
import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import Image, display
# Walk up from the working directory until we find the project root
# (the first folder that contains both artifacts/ and README.md).
PROJECT_ROOT = None
for candidate in [Path.cwd().resolve(), *Path.cwd().resolve().parents]:
    if (candidate / 'artifacts').exists() and (candidate / 'README.md').exists():
        PROJECT_ROOT = candidate
        break
if PROJECT_ROOT is None:
    raise RuntimeError('Could not locate the project root from the current working directory.')
# Benchmark outputs (leaderboard, per-seed runs, summary) and final-model outputs.
benchmark_root = PROJECT_ROOT / 'artifacts' / 'modern' / 'benchmark'
final_root = PROJECT_ROOT / 'artifacts' / 'modern' / 'final_model'
leaderboard = pd.read_csv(benchmark_root / 'leaderboard.csv')
runs = pd.read_csv(benchmark_root / 'benchmark_runs.csv')
summary = json.loads((benchmark_root / 'metrics_summary.json').read_text(encoding='utf-8'))
winner_name = summary['winner']['model']
leaderboard
| | model | mean_image_AUROC | std_image_AUROC | mean_image_AUPR | std_image_AUPR | mean_image_F1 | std_image_F1 | mean_latency_ms | std_latency_ms | completed_runs |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | patchcore | 0.933333 | 0.049889 | 0.939841 | 0.050396 | 0.840741 | 0.036665 | 173.664240 | 2.374339 | 3 |
| 1 | anomalydino | 0.786667 | 0.018856 | 0.811111 | 0.015713 | 0.551587 | 0.170681 | 106.279883 | 1.855926 | 3 |
Benchmark winner
This cell shows the winning model together with its metrics averaged over the three seeds that were run.
pd.DataFrame([summary['winner']])
| | completed_runs | mean_image_AUPR | mean_image_AUROC | mean_image_F1 | mean_latency_ms | model | std_image_AUPR | std_image_AUROC | std_image_F1 | std_latency_ms |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0.939841 | 0.933333 | 0.840741 | 173.66424 | patchcore | 0.050396 | 0.049889 | 0.036665 | 2.374339 |
Quick plots
These two charts show at a glance the trade-off between predictive quality (mean image AUROC) and inference latency per image.
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].bar(leaderboard['model'], leaderboard['mean_image_AUROC'], color=['#d97706', '#2563eb'])
axes[0].set_ylim(0, 1.05)
axes[0].set_title('Mean image AUROC')
axes[0].set_ylabel('score')
axes[1].bar(leaderboard['model'], leaderboard['mean_latency_ms'], color=['#b45309', '#1d4ed8'])
axes[1].set_title('Latency per image (ms)')
axes[1].set_ylabel('ms')
fig.tight_layout()
fig
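The same trade-off can also be read from a single scatter plot, with AUROC against latency and the across-seed standard deviations as error bars. This is a sketch with the leaderboard values hard-coded so it runs on its own; in the notebook you would pass the leaderboard DataFrame loaded from leaderboard.csv instead. plot_tradeoff is a hypothetical helper, not part of the project.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Means and std over seeds, copied from the leaderboard in this notebook.
leaderboard = pd.DataFrame(
    {
        "model": ["patchcore", "anomalydino"],
        "mean_image_AUROC": [0.933333, 0.786667],
        "std_image_AUROC": [0.049889, 0.018856],
        "mean_latency_ms": [173.664240, 106.279883],
        "std_latency_ms": [2.374339, 1.855926],
    }
)


def plot_tradeoff(df: pd.DataFrame):
    """Hypothetical helper: one point per model, error bars = std over seeds."""
    fig, ax = plt.subplots(figsize=(6, 4))
    for row in df.itertuples(index=False):
        ax.errorbar(
            row.mean_latency_ms,
            row.mean_image_AUROC,
            xerr=row.std_latency_ms,
            yerr=row.std_image_AUROC,
            fmt="o",
            capsize=4,
            label=row.model,
        )
    ax.set_xlabel("Mean latency per image (ms)")
    ax.set_ylabel("Mean image AUROC")
    ax.set_title("Quality vs. latency")
    ax.legend()
    fig.tight_layout()
    return fig, ax


fig, ax = plot_tradeoff(leaderboard)
```

Points further up and to the left are better on both axes; with these numbers PatchCore sits higher (better AUROC) but further right (slower) than AnomalyDINO.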
Per-seed results
The full table lets you check how stable each model's behaviour is across the data splits (one split per seed).
runs
| | seed | model | status | error_message | image_AUROC | image_F1Score | image_AUPR | latency_ms |
|---|---|---|---|---|---|---|---|---|
| 0 | 13 | patchcore | ok | NaN | 1.00 | 0.888889 | 1.000000 | 176.98990 |
| 1 | 13 | anomalydino | ok | NaN | 0.80 | 0.571429 | 0.800000 | 107.14405 |
| 2 | 23 | patchcore | ok | NaN | 0.88 | 0.800000 | 0.876667 | 171.59989 |
| 3 | 23 | anomalydino | ok | NaN | 0.80 | 0.333333 | 0.800000 | 107.99410 |
| 4 | 42 | patchcore | ok | NaN | 0.92 | 0.833333 | 0.942857 | 172.40293 |
| 5 | 42 | anomalydino | ok | NaN | 0.76 | 0.750000 | 0.833333 | 103.70150 |
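The leaderboard can be rebuilt from these per-seed runs with a groupby. The reported std_* values match the population standard deviation (ddof=0, what numpy.std computes) rather than pandas' default sample standard deviation (ddof=1): PatchCore's AUROC values 1.00, 0.88 and 0.92 give about 0.0499 with ddof=0, as in the leaderboard, but about 0.0611 with ddof=1. The sketch below assumes that is how leaderboard.csv was produced. It also applies the same ranking rule seed by seed, which is one way to check stability; the names runs_table, rebuilt and per_seed_winner are illustrative.

```python
import pandas as pd

# Per-seed results copied from the runs table (every run has status "ok").
runs_table = pd.DataFrame(
    {
        "seed": [13, 13, 23, 23, 42, 42],
        "model": ["patchcore", "anomalydino"] * 3,
        "image_AUROC": [1.00, 0.80, 0.88, 0.80, 0.92, 0.76],
        "image_F1Score": [0.888889, 0.571429, 0.800000, 0.333333, 0.833333, 0.750000],
        "image_AUPR": [1.000000, 0.800000, 0.876667, 0.800000, 0.942857, 0.833333],
        "latency_ms": [176.98990, 107.14405, 171.59989, 107.99410, 172.40293, 103.70150],
    }
)

# Use the leaderboard's naming (mean_image_F1, not mean_image_F1Score).
metrics = runs_table.rename(columns={"image_F1Score": "image_F1"})
cols = ["image_AUROC", "image_AUPR", "image_F1", "latency_ms"]
grouped = metrics.groupby("model")[cols]

# ddof=0 (population std) reproduces the std_* columns of the leaderboard.
rebuilt = grouped.mean().add_prefix("mean_").join(grouped.std(ddof=0).add_prefix("std_"))
rebuilt["completed_runs"] = grouped.size()

# Winner of each seed under the same rule: AUROC desc, AUPR desc, latency asc.
per_seed_winner = (
    runs_table.sort_values(
        ["seed", "image_AUROC", "image_AUPR", "latency_ms"],
        ascending=[True, False, False, True],
    )
    .groupby("seed")
    .head(1)
    .set_index("seed")["model"]
)
print(per_seed_winner.tolist())  # ['patchcore', 'patchcore', 'patchcore']
```

PatchCore wins on every seed, so the aggregated ranking is not driven by a single favourable split. Its AUROC still varies more across seeds (std ≈ 0.050 vs ≈ 0.019 for AnomalyDINO).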
Visual examples from the final split
The modern pipeline already produces the result images exported by anomalib. Normal ("good") and anomalous ("bad") examples from the final split are shown here for immediate visual inspection.
gallery_root = final_root / 'Patchcore' / 'mandarine_cropped_modern' / 'v0' / 'images'
good_examples = sorted((gallery_root / 'good').glob('*'))[:3]
bad_examples = sorted((gallery_root / 'bad').glob('*'))[:3]
print('Good examples:')
for path in good_examples:
    print(path.name)
    display(Image(filename=str(path), width=420))
print('Bad examples:')
for path in bad_examples:
    print(path.name)
    display(Image(filename=str(path), width=420))
Good examples:
cropped_normal_0002.jpg
cropped_normal_0008.jpg
cropped_normal_0009.jpg
Bad examples:
cropped_abnormal_0001.jpg
cropped_abnormal_0003.jpg
cropped_abnormal_0004.jpg
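glob('*') returns every entry in the folder, so a stray non-image file or subfolder would end up in the gallery, and a wrong gallery_root silently produces no examples at all. Below is a small hypothetical helper, first_images, that keeps only files with common image extensions and fails loudly when the folder is missing. The extension set is an assumption about what anomalib writes here; the listed outputs are all .jpg.

```python
from __future__ import annotations

from pathlib import Path

# Assumed image extensions; extend if the exporter writes other formats.
IMAGE_SUFFIXES = frozenset({".jpg", ".jpeg", ".png", ".bmp"})


def first_images(folder: Path, n: int = 3, suffixes: frozenset[str] = IMAGE_SUFFIXES) -> list[Path]:
    """Hypothetical helper: up to n image files from folder, sorted by file name."""
    if not folder.is_dir():
        # Surface a wrong path instead of silently showing an empty gallery.
        raise FileNotFoundError(f"Gallery folder not found: {folder}")
    images = sorted(
        (p for p in folder.iterdir() if p.is_file() and p.suffix.lower() in suffixes),
        key=lambda p: p.name,
    )
    return images[:n]
```

In the cell above it would replace the glob calls, e.g. good_examples = first_images(gallery_root / 'good').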