Independent, reproducible benchmarks

We run SynthForge against the open-source CTGAN baseline (a GAN-based tabular synthesizer from the SDV project) on two canonical public datasets (UCI Adult and Credit Card Default), across statistical fidelity, ML utility, privacy, and constraint conformance. The raw JSON powering this page is published alongside it.

Last run: 2026-05-05T18:57:11.295000+00:00 · synthforge e76aa646c836 · raw results.json

adult

Statistical fidelity

| Synthesizer | Overall | Column shapes | Column pair trends |
|---|---|---|---|
| synthforge | 0.517 | 0.573 | 0.461 |
| ctgan | 0.881 | 0.896 | 0.866 |

ML utility (TSTR vs TRTR, AUC)

| Synthesizer | LogReg TRTR | LogReg TSTR | GBM TRTR | GBM TSTR |
|---|---|---|---|---|
| synthforge | 0.902 | 0.685 | 0.916 | 0.668 |
| ctgan | 0.902 | 0.889 | 0.916 | 0.891 |

Privacy

| Synthesizer | DCR p5 (normalised) | NNDR (median) |
|---|---|---|
| synthforge | 18.788 | 0.998 |
| ctgan | 0.278 | 0.951 |

Constraint conformance

| Synthesizer | Validity rate |
|---|---|
| synthforge | 1.000 |
| ctgan | 0.729 |

credit

Statistical fidelity

| Synthesizer | Overall | Column shapes | Column pair trends |
|---|---|---|---|
| synthforge | 0.645 | 0.682 | 0.609 |
| ctgan | 0.940 | 0.919 | 0.960 |

ML utility (TSTR vs TRTR, AUC)

| Synthesizer | LogReg TRTR | LogReg TSTR | GBM TRTR | GBM TSTR |
|---|---|---|---|---|
| synthforge | 0.716 | 0.513 | 0.780 | 0.485 |
| ctgan | 0.716 | 0.705 | 0.780 | 0.756 |

Privacy

| Synthesizer | DCR p5 (normalised) | NNDR (median) |
|---|---|---|
| synthforge | 7.245 | 0.971 |
| ctgan | 0.380 | 0.915 |

Constraint conformance

| Synthesizer | Validity rate |
|---|---|
| synthforge | 1.000 |
| ctgan | 1.000 |

Run history

Every benchmark run is appended here so you can see how the numbers change over time. We rerun whenever the generator changes meaningfully, or on a schedule. One run so far; future runs append below.

adult

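| Date | Fidelity (overall) SF | Fidelity (overall) CTGAN | TSTR (LogReg AUC) SF | TSTR (LogReg AUC) CTGAN | DCR (privacy) SF | DCR (privacy) CTGAN | Integrity SF | Integrity CTGAN |
|---|---|---|---|---|---|---|---|---|
| 2026-05-05 | 0.517 | 0.881 | 0.685 | 0.889 | 18.788 | 0.278 | 1.000 | 0.729 |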

credit

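| Date | Fidelity (overall) SF | Fidelity (overall) CTGAN | TSTR (LogReg AUC) SF | TSTR (LogReg AUC) CTGAN | DCR (privacy) SF | DCR (privacy) CTGAN | Integrity SF | Integrity CTGAN |
|---|---|---|---|---|---|---|---|---|
| 2026-05-05 | 0.645 | 0.940 | 0.513 | 0.705 | 7.245 | 0.380 | 1.000 | 1.000 |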

Full per-run reports: results-history.json.

Methodology

What we measure

| Family | Metric | Direction |
|---|---|---|
| Statistical fidelity | SDMetrics QualityReport: overall, column shapes, column pair trends | higher = better |
| ML utility | TSTR (Train-Synthetic-Test-Real) and TRTR AUC, for logistic regression and gradient boosting | higher = better; closer TSTR-to-TRTR is the meaningful signal |
| Privacy | DCR (5th-percentile distance to closest record, normalised by intra-real median) and NNDR (median nearest-neighbour distance ratio) | DCR higher = better; NNDR closer to 1 = better |
| Constraint conformance | Fraction of synthetic rows that satisfy all schema constraints (range, enum membership) | higher = better |
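
The constraint conformance metric in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark harness (which is not public): the helper names `in_range`, `in_enum`, and `validity_rate` are hypothetical, and the `age` and `income` constraints are illustrative values, not the schema used for the published numbers.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Constraint = Callable[[Row], bool]


def in_range(column: str, low: float, high: float) -> Constraint:
    """Range constraint: the column value must lie in [low, high]."""
    return lambda row: low <= row[column] <= high


def in_enum(column: str, allowed: Iterable[str]) -> Constraint:
    """Enum constraint: the column value must be one of the allowed values."""
    allowed_set = frozenset(allowed)
    return lambda row: row[column] in allowed_set


def validity_rate(rows: List[Row], constraints: List[Constraint]) -> float:
    """Fraction of rows that satisfy every constraint."""
    if not rows:
        raise ValueError("validity rate is undefined for an empty dataset")
    valid = sum(1 for row in rows if all(check(row) for check in constraints))
    return valid / len(rows)


# Illustrative constraints and rows (assumptions, not the benchmark schema).
example_constraints = [
    in_range("age", 17, 90),
    in_enum("income", ["<=50K", ">50K"]),
]
example_rows = [
    {"age": 39, "income": "<=50K"},  # valid
    {"age": 12, "income": ">50K"},   # fails the range constraint
    {"age": 52, "income": "50K+"},   # fails the enum constraint
    {"age": 28, "income": ">50K"},   # valid
]
example_rate = validity_rate(example_rows, example_constraints)
print(f"validity rate: {example_rate:.3f}")  # prints "validity rate: 0.500"
```

A row counts as valid only if it passes every constraint, so a single out-of-range or out-of-enum value makes the whole row invalid.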

What we run

Both synthesizers produce a synthetic dataset of the same size as the real dataset, then every metric is run on the (real, synthetic) pair.
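
Because both synthesizers are scored against the same real data, the ML utility comparison comes down to how far each TSTR AUC falls below the shared TRTR AUC. The sketch below works that arithmetic through using the AUC values published in the tables on this page; the dictionary layout and the `tstr_gap` helper are assumptions made for illustration, not part of the harness.

```python
from typing import Dict, Tuple

# (TRTR AUC, TSTR AUC) pairs copied from the results tables on this page.
# Keys are (dataset, synthesizer, model).
AUC: Dict[Tuple[str, str, str], Tuple[float, float]] = {
    ("adult", "synthforge", "logreg"): (0.902, 0.685),
    ("adult", "synthforge", "gbm"): (0.916, 0.668),
    ("adult", "ctgan", "logreg"): (0.902, 0.889),
    ("adult", "ctgan", "gbm"): (0.916, 0.891),
    ("credit", "synthforge", "logreg"): (0.716, 0.513),
    ("credit", "synthforge", "gbm"): (0.780, 0.485),
    ("credit", "ctgan", "logreg"): (0.716, 0.705),
    ("credit", "ctgan", "gbm"): (0.780, 0.756),
}


def tstr_gap(trtr_auc: float, tstr_auc: float) -> float:
    """AUC lost by training on synthetic instead of real data (smaller = better)."""
    return round(trtr_auc - tstr_auc, 3)


gaps = {key: tstr_gap(*pair) for key, pair in AUC.items()}

for (dataset, synthesizer, model), gap in sorted(gaps.items()):
    print(f"{dataset:<7}{synthesizer:<11}{model:<7}gap = {gap:.3f}")
```

On this run the CTGAN gaps range from 0.011 to 0.025 and the SynthForge gaps from 0.203 to 0.295. An AUC of 0.5 corresponds to chance-level ranking, which puts the SynthForge TSTR values on the credit dataset (0.513 and 0.485) close to chance.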

Datasets

Schema authoring: the important caveat

SynthForge is schema-driven, not data-fitted. For each dataset we hand-author a SynthForge schema from the public UCI data dictionary (documented column types, ranges, categorical sets, and standard demographic priors). We do not fit the SynthForge schema to the real CSV.

The question the benchmark asks is therefore: given only the public data dictionary, what does SynthForge produce? CTGAN, in contrast, is trained on the real data. This asymmetry is deliberate, and it is what the benchmark exists to measure: a tool that needs no data access has different operational properties from one that does.

How to read the privacy numbers

A higher DCR means synthetic rows are not close copies of real rows. An NNDR closer to 1 means a synthetic row is not anomalously close to one specific real row. SynthForge's high DCR is structural: it cannot memorise data it never saw.
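
To make the two privacy metrics concrete, here is a minimal pure-Python sketch. It rests on several assumptions that the page does not confirm: rows are already numeric and scaled, distance is Euclidean, the 5th percentile uses linear interpolation, and "intra-real median" is read as the median distance from each real row to its nearest other real row. The function names are hypothetical, and the harness may differ on any of these points.

```python
import math
import statistics
from typing import List, Sequence

Point = Sequence[float]


def dcr_p5_normalised(real: List[Point], synthetic: List[Point]) -> float:
    """5th-percentile distance to closest real record, divided by the
    median nearest-neighbour distance within the real data (assumed reading)."""
    dcr = [min(math.dist(s, r) for r in real) for s in synthetic]
    p5 = statistics.quantiles(dcr, n=20, method="inclusive")[0]
    intra_real = [
        min(math.dist(r, other) for j, other in enumerate(real) if j != i)
        for i, r in enumerate(real)
    ]
    return p5 / statistics.median(intra_real)


def nndr_median(real: List[Point], synthetic: List[Point]) -> float:
    """Median ratio of nearest to second-nearest real-record distance."""
    ratios = []
    for s in synthetic:
        nearest, second = sorted(math.dist(s, r) for r in real)[:2]
        # Convention chosen for this sketch: if both distances are zero,
        # treat the ratio as 1.0 rather than dividing by zero.
        ratios.append(nearest / second if second > 0 else 1.0)
    return statistics.median(ratios)


# Toy data (illustrative only): four real rows on a unit square.
real_rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
copied_rows = list(real_rows)  # a synthesizer that memorised the data
distant_rows = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]

copied_dcr = dcr_p5_normalised(real_rows, copied_rows)
copied_nndr = nndr_median(real_rows, copied_rows)
distant_dcr = dcr_p5_normalised(real_rows, distant_rows)
distant_nndr = nndr_median(real_rows, distant_rows)

print(f"copied:  DCR = {copied_dcr:.3f}, NNDR = {copied_nndr:.3f}")
print(f"distant: DCR = {distant_dcr:.3f}, NNDR = {distant_nndr:.3f}")
```

Exact copies of real rows drive both metrics to 0, while rows far from every real record give a large DCR and an NNDR near 1. A large DCR on its own says nothing about fidelity: the distant rows in this toy example are private precisely because they do not resemble the real data.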

Reproducibility

What we explicitly do not measure (and why)

Want to inspect the raw numbers? Download results.json. The harness source is not currently public; reach out at hello@synthforge.io if you want to review or replicate the methodology.