# Comparison with other Python packages

This page positions eunoia among the other Python packages for set-membership diagrams. It has two halves: a **qualitative map** of the landscape, and a **quantitative benchmark** of the packages that actually solve the same problem eunoia does: fitting *area-proportional* Euler diagrams.

We deliberately leave out [eulerr](https://github.com/jolars/eulerr): it is eunoia's sibling for R, built on the same Rust core, so comparing the two would measure nothing about the Python ecosystem.

## The landscape

"Draw a Venn diagram" covers several genuinely different problems, and the Python packages split along those lines. Only the first group is directly comparable to eunoia.

```{list-table}
:header-rows: 1
:widths: 22 16 10 16 18 18

* - Package
  - Area-proportional
  - Max sets
  - Shapes
  - Method
  - License
* - **eunoia**
  - yes
  - arbitrary
  - circle, ellipse, square, rectangle
  - numerical optimization (Rust core); reports residuals + goodness-of-fit
  - MIT
* - [matplotlib-set-diagrams](https://github.com/paulbrodersen/matplotlib_set_diagrams)
  - yes
  - arbitrary
  - circles
  - optimization-based layout
  - GPL-3
* - [matplotlib-venn](https://github.com/konstantint/matplotlib-venn)
  - yes
  - 3
  - circles
  - closed-form (2 sets); cost-based layout (3 sets)
  - MIT
* - [BioVenn](https://pypi.org/project/BioVenn/)
  - yes
  - 3
  - circles
  - area-proportional, with biological ID mapping
  - MIT
* - [vennplot](https://pypi.org/project/vennplot/)
  - yes
  - 3
  - circles / balls (2D + 3D)
  - area-proportional
  - MIT
* - [supervenn](https://github.com/gecko984/supervenn)
  - yes (exact)
  - many
  - bars / chunks
  - splits sets into parts; not an Euler diagram
  - MIT
* - [pyvenn / venn](https://github.com/tctianchi/pyvenn)
  - no
  - 6
  - fixed templates
  - static shapes; only labels move
  - MIT
* - [matplotlib-subsets](https://pypi.org/project/matplotlib-subsets/)
  - no
  - n/a
  - nested rectangles
  - set hierarchy, not an Euler layout
  - MIT
* - [eule](https://github.com/quivero/eule)
  - n/a
  - arbitrary
  - none
  - set algebra only: computes region sizes, draws nothing
  - MIT
```

Reading the table:

- **Genuine area-proportional fitters**: `matplotlib-set-diagrams` and `matplotlib-venn` are the only packages that solve eunoia's problem, so they are the ones we benchmark below. `BioVenn` and `vennplot` are also area-proportional but capped at three circles, so they add little signal beyond `matplotlib-venn`.
- **A different representation**: `supervenn` is exactly proportional but draws bar/chunk strips rather than overlapping shapes. It is a great tool, just not an Euler diagram, so it can't be scored on the same geometry metric.
- **Not area-proportional**: `pyvenn`/`venn` and `matplotlib-subsets` use fixed templates or nested rectangles; the picture does not encode the set sizes.
- **Complementary, not a competitor**: `eule` computes the disjoint region sizes from membership and draws nothing. It is the kind of preprocessing that *feeds* a fitter; eunoia does the same thing internally when you pass it membership lists (`eu.euler({"A": [...], "B": [...]})`) or a pandas/polars DataFrame used as a membership matrix.

## Benchmark

### Why the comparison is grouped by objective

Comparing area-proportional fitters is subtle because **they do not all minimize the same thing**, and scoring a fitter on an objective it never targeted is unfair.
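
To make the point concrete, here is a minimal sketch showing that two candidate fits can be ranked differently depending on the loss. The region areas and the names `target`, `fit_a`, and `fit_b` are made up for illustration, and the three functions are plain-Python readings of the squared, absolute, and logarithmic formulas quoted on this page, not any package's actual implementation.

```python
import math

# Hypothetical disjoint-region areas: A only, B only, A and B.
target = [100.0, 100.0, 1.0]
# Candidate A: large regions exact, small intersection lost entirely.
fit_a = [100.0, 100.0, 0.0]
# Candidate B: small intersection exact, large regions slightly off.
fit_b = [98.0, 98.0, 1.0]


def squared(fitted, target):
    """Sum of squared differences between fitted and target areas."""
    return sum((f - t) ** 2 for f, t in zip(fitted, target))


def absolute(fitted, target):
    """Sum of absolute differences between fitted and target areas."""
    return sum(abs(f - t) for f, t in zip(fitted, target))


def log_absolute(fitted, target):
    """Sum of |log(1 + fitted) - log(1 + target)| over regions."""
    return sum(abs(math.log1p(f) - math.log1p(t)) for f, t in zip(fitted, target))


for loss in (squared, absolute, log_absolute):
    a, b = loss(fit_a, target), loss(fit_b, target)
    winner = "fit_a" if a < b else "fit_b"
    print(f"{loss.__name__}: fit_a={a:.4f} fit_b={b:.4f} -> prefers {winner}")
```

In this toy case the squared and absolute losses prefer the candidate that drops the small intersection, while the logarithmic loss prefers the one that keeps it, which is why a fitter should only be scored on the loss it was asked to minimize.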
Each package minimizes a different loss:

| Package / config | Minimizes |
|---|---|
| `matplotlib-venn` `venn2` | closed-form exact |
| `matplotlib-venn` `venn3` (default) | Σ\|log(1+fitted) − log(1+target)\| (logarithmic L1) |
| `matplotlib-set-diagrams` | a selectable cost: `"squared"` (Σ(f−t)²), `"simple"` (Σ\|f−t\|), `"logarithmic"`, `"relative"`, `"inverse"` |
| **eunoia** | a selectable `loss=`: `"sum_squared"`, `"sum_absolute"`, `"log_sum_absolute"`, `"stress"`, `"diag_error"`, … |

So the only fair comparison is **within an objective**: pick a loss family, run the packages that can minimize it (configuring each to do so), and score them on that same loss. eunoia and `matplotlib-set-diagrams` are configurable, so they appear in several groups; `matplotlib-venn` is fixed, so it appears only in the logarithmic group its `venn3` default defines.

Each group is scored on a **scale-invariant** version of its loss: a single multiplicative scale on the fitted areas is absorbed, because each package draws its diagram at an arbitrary size. For the squared family this scale-invariant score is exactly venneuler/eulerr **`stress`**; the absolute and logarithmic families use the analogous scale-invariant L1 and log-L1. Of these packages, only eunoia reports any goodness-of-fit number itself; the harness re-measures every fitter identically (rasterizing the returned shapes) and validates that eunoia's rasterized `stress` matches the value eunoia computes analytically (they agree to grid resolution).

The specifications are a curated subset of the eunoia Rust corpus (`crates/eunoia/src/test_utils/corpus.rs`), itself ported from [eulerr's reproducibility tests](https://github.com/jolars/eulerr) and real datasets from the eulerr issue tracker. They span 2 to 6 sets and include layouts circles provably cannot fit exactly, plus real biology and kinase data.
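
As a rough illustration of what "absorbing a multiplicative scale" means for the squared family, the sketch below computes a stress-style score in plain Python. The formula is our assumption of the usual venneuler/eulerr formulation (residual sum of squares after regressing fitted areas on target areas through the origin, divided by the sum of squared fitted areas); the function name `stress` and the sample numbers are hypothetical, and this is not the benchmark harness's code.

```python
def stress(fitted, target):
    """Scale-invariant squared-error score (assumed venneuler-style formula).

    beta is the least-squares slope of fitted on target through the origin,
    so any uniform rescaling of the fitted areas is absorbed by beta.
    """
    beta = sum(f * t for f, t in zip(fitted, target)) / sum(t * t for t in target)
    residual = sum((f - beta * t) ** 2 for f, t in zip(fitted, target))
    return residual / sum(f * f for f in fitted)


# Hypothetical disjoint-region areas.
target = [3.0, 2.0, 1.0]
# A perfect fit drawn at twice the size.
perfect_but_scaled = [6.0, 4.0, 2.0]
# An imperfect fit, and the same fit drawn ten times larger.
imperfect = [3.0, 2.0, 2.0]
imperfect_scaled = [10.0 * f for f in imperfect]

print(stress(perfect_but_scaled, target))
print(stress(imperfect, target), stress(imperfect_scaled, target))
```

A diagram that is correct up to its drawing size scores zero, and rescaling an imperfect diagram leaves its score unchanged, so packages that draw at different sizes can be compared on the same number.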
The full harness lives in [`benchmarks/`](https://github.com/jolars/eunoia-py/tree/main/benchmarks); reproduce with `task benchmark`.

### Accuracy, grouped by objective

```{include} _generated/benchmark_table.md
```

```{figure} _static/benchmarks/objective_groups.png
:alt: Grouped bar charts, one panel per objective, log scale.
:width: 100%

Each panel is one objective; bars are the scale-invariant score for that objective (lower is better, log scale). Within a panel every fitter minimized the same loss, so the comparison is apples-to-apples. Bars are absent where a package cannot represent that set count.
```

Three things stand out:

1. **Matched on the same objective, eunoia's circles beat the other circle fitters in every group.** In the squared-error group eunoia's circles reach a lower `stress` than `matplotlib-set-diagrams("squared")` on every case, often by an order of magnitude on the harder specs; the absolute-error group tells the same story with `matplotlib-set-diagrams("simple")`. Most tellingly, in the **logarithmic** group (`matplotlib-venn`'s *own* default objective), eunoia's circles beat both `matplotlib-venn` and `matplotlib-set-diagrams("logarithmic")` on every case; the two competitors are neck-and-neck with each other, as expected since they minimize the same thing. Given the same loss, eunoia's optimizer simply lands closer.
2. **Ellipses then win outright.** eunoia's ellipses are the best fit in every group, on every case, reaching essentially zero error under the squared loss and the lowest error by a wide margin under the absolute and logarithmic ones. This is geometry no circle-only package can match.
3. **`matplotlib-venn` is fixed and capped.** It offers no choice of objective (its `venn3` is a fixed logarithmic layout) and cannot draw four or more sets. `matplotlib-set-diagrams` scales to any set count and is configurable, but loses to eunoia within every shared objective.
   eunoia only joined the logarithmic group because the core gained a `"log_sum_absolute"` loss in 1.1 (closing [jolars/eunoia#96](https://github.com/jolars/eunoia/issues/96)); choosing the objective to match the data is itself part of what eunoia offers here.

### Wall-clock fit time

Accuracy is not the whole story; here is end-to-end `fit` time (one representative configuration per package).

```{include} _generated/benchmark_timing.md
```

```{figure} _static/benchmarks/timing.png
:alt: Grouped bar chart of median fit time per case, log scale.
:width: 100%

Median fit time per case (log scale), each package under the same configuration as the gallery. `matplotlib-venn` is fastest but capped at three sets; eunoia and `matplotlib-set-diagrams` are broadly comparable, both taking up to a few seconds on the hardest high-set specs. (Separately, eunoia's non-smooth losses, `"sum_absolute"` and `"log_sum_absolute"`, are markedly slower to optimize than the smooth `"sum_squared"` default shown here.)
```

### Fitted layouts

```{figure} _static/benchmarks/gallery.png
:alt: Grid of fitted layouts, one column per fitter, one row per case.
:width: 100%

Fitted layouts on representative corpus cases (eunoia under its default squared loss; set-diagrams under `"squared"`). The four-, five-, and six-set rows show `matplotlib-venn` dropping out, and eunoia's ellipse column staying faithful where the circle columns visibly distort.
```

## When to reach for what

- **eunoia** when you want the most faithful diagram: especially with ellipses, with four or more sets, when you need to choose the objective (`loss=`) to suit your data, or when you need the residuals and goodness-of-fit numbers to judge whether the diagram can be trusted at all. MIT-licensed.
- **matplotlib-venn** for a quick, dependency-light two- or three-circle Venn where exactness is not critical. Also MIT.
- **matplotlib-set-diagrams** if you specifically want its word-cloud subset labels and are comfortable with GPL-3. Be prepared to try its `cost_function_objective` options, since the default (`"inverse"`) fits area-dominated diagrams poorly.
- **supervenn** when exact proportionality matters more than the Euler-diagram shape, e.g. many sets with complex overlaps.

## Reproducing

```bash
task benchmark
# or
uv sync --group benchmark
uv run --group benchmark python -m benchmarks.run
```

The competitor packages are confined to an isolated `benchmark` dependency group and are never part of the published `eunoia` wheel. We only run them to measure fit quality; `matplotlib-set-diagrams` is GPL-3, so no competitor source is vendored into or redistributed by eunoia.

```{note}
The numbers and figures on this page are committed to the repo and refreshed by running the benchmark; the documentation build itself does **not** install or execute the competitor packages.
```