Authority page

CalcFi Open Data.

A single landing page for every open distribution of CalcFi data: Kaggle, Hugging Face, datahub.io, data.world, DoltHub, MotherDuck, GitHub, PyPI, Anaconda, Read the Docs, Streamlit, plus the BigQuery Analytics Hub listing. Everything ships under Creative Commons Attribution 4.0 International.

Last reviewed: · Canonical DOI: 10.6084/m9.figshare.32332290 · License: CC BY 4.0

What is CalcFi Open Data?

CalcFi Open Data is a curated, free, license-clean bundle of 34 financial and macro time series mirrored from US federal primary sources (FRED, BLS, Treasury, Freddie Mac, SSA) into a single consistent schema. The bundle ships as a CSV pack plus a Parquet pack, with companion clients in Python, JavaScript, and Julia for typed access. It is the data layer behind a large fraction of CalcFi calculators on the live site, exposed publicly so that journalists, researchers, students, and downstream developers can use the same numbers without rebuilding the ingest pipeline.
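As a rough illustration of what reading a single-schema CSV pack can look like, the sketch below groups rows by series code using only the Python standard library. The column names (series_id, date, value), the two series codes, and every number in the sample are assumptions made up for this example; this page does not document the actual column layout, so check the pack itself before relying on any of them.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Made-up sample in an ASSUMED long format (series_id, date, value).
# Column names, codes, and values are placeholders, not real CalcFi data.
SAMPLE_CSV = """series_id,date,value
CPIAUCSL,2024-02-01,100.5
CPIAUCSL,2024-01-01,100.0
MORTGAGE30US,2024-01-04,6.5
"""


def load_series(fileobj):
    """Group rows by series code into sorted lists of (date, value) tuples."""
    series = defaultdict(list)
    for row in csv.DictReader(fileobj):
        obs_date = date.fromisoformat(row["date"])
        series[row["series_id"]].append((obs_date, float(row["value"])))
    # Sort each series chronologically so callers can rely on the ordering.
    return {code: sorted(obs) for code, obs in series.items()}


series = load_series(io.StringIO(SAMPLE_CSV))
print(sorted(series))          # series codes found in the sample
print(series["CPIAUCSL"][-1])  # latest observation for one series
```

Swapping io.StringIO(SAMPLE_CSV) for an open file handle on a downloaded CSV should work the same way, provided the real columns are mapped to the names assumed here.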

Why publish it as a separate dataset instead of leaving it locked inside the calculators? Because a site is a closed surface and a dataset is an open one. A reader can verify a CalcFi calculator result against the underlying series in one click, an academic can reuse the data under CC BY 4.0 without scraping, and a future product can build on the same source without re-implementing the ingest. The cost is small (publish once, mirror everywhere). The payoff is a credible, citable, long-lived data surface that anchors CalcFi as a source rather than just another tool.

The dataset is intentionally narrow. CalcFi ingests far more than 34 series for the live calculators (state-level salary data, city rent-vs-home, county-level cost of living, IRS reference tables, SSA bend points, the full Treasury yield curve). The 34 series in the open bundle are the highest-reuse macro layer; the rest stays inside the application boundary. Future releases will widen the bundle as bandwidth allows.

Distribution mirrors

14 active mirrors across the major open-data registries. Each entry links to the live distribution page where you can download or install.

Per-series datasets on Kaggle

24 individually citable Kaggle datasets, each covering a single series. Useful when you need only one time series and want to cite the per-series Kaggle DOI rather than the bundle.

Interactive apps (Hugging Face Spaces)

10 Gradio apps backed by the dataset. Each is open source and reproducible.

Source code repositories

Pipeline code is open source. Two repositories cover the canonical data bundle and the dbt warehouse models that downstream pipelines can subscribe to.

  • calcfi-open-data (GitHub)

    Canonical source repository with CSV + Parquet under data/, methodology under docs/, ingest under scripts/. CI auto-mints a Zenodo Software DOI on each version tag.

  • dbt-calcfi-open-data (GitHub)

    dbt project that materializes the CalcFi Open Data series as warehouse tables. Targets BigQuery, Snowflake, Redshift, and DuckDB out of the box.

BigQuery Analytics Hub listing

The dataset is listed on Google Cloud BigQuery Analytics Hub for enterprise subscribers who prefer an in-warehouse subscription model. Search for “CalcFi Open Data” inside Analytics Hub or contact hello@calcfi.app for the listing URL.

FAQ

What license covers CalcFi Open Data?

Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, adapt, redistribute, and reuse the dataset for any purpose including commercial use, provided you credit the original author, link to the license, and indicate any changes you made. Recommended attribution: "Salmisto, J. (2026). CalcFi Open Data. Figshare. DOI 10.6084/m9.figshare.32332290".
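One way to keep the three attribution requirements (credit, license link, note of changes) together is a small helper that builds the credit line. The function name and the "Changes:" wording below are illustrative choices, not an official CalcFi format; the citation text and DOI are taken from the recommended attribution above.

```python
LICENSE_URL = "https://creativecommons.org/licenses/by/4.0/"


def attribution(changes=None):
    """Return a credit line covering author, source DOI, license, and changes.

    `changes` is a short free-text description of any modifications,
    or None if the data is redistributed unmodified.
    """
    base = (
        "Salmisto, J. (2026). CalcFi Open Data. Figshare. "
        "DOI 10.6084/m9.figshare.32332290. "
        f"Licensed under CC BY 4.0 ({LICENSE_URL})."
    )
    if changes:
        return f"{base} Changes: {changes}."
    return base


print(attribution())
print(attribution("resampled to monthly averages"))
```

Placing the returned string in a chart footer, README, or dataset description is usually enough for downstream readers to trace the data back to the canonical record.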

How do I install the Python package?

Run pip install calcfidata. The package exposes pandas DataFrames keyed by FRED-style series codes. The same package is published to the Anaconda channel under jeresalmisto/calcfidata for conda-managed environments. Documentation is on Read the Docs at calcfidata.readthedocs.io.
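To confirm that the install landed in the environment you expect, a check like the following can help. It relies only on the standard library's importlib.metadata and deliberately avoids calling anything inside calcfidata, since the package's own API is documented on Read the Docs rather than here; the helper name installed_version is an assumption of this sketch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("calcfidata")
if found is None:
    print("calcfidata is not installed in this environment")
else:
    print(f"calcfidata {found} is installed")
```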

Which mirror should I use?

For research papers, cite the Figshare canonical DOI 10.6084/m9.figshare.32332290 and download from whichever mirror is most convenient. For data-science workflows, Kaggle and Hugging Face are the most integrated. For SQL analytics, MotherDuck (cloud DuckDB) gives sub-second queries; data.world has a SQL workbook. For git-style versioned data, use DoltHub. For static CSV downloads with datapackage.json, use datahub.io. All mirrors carry identical content under the same license.
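For the datahub.io route, the datapackage.json descriptor can be read to discover which CSV files a package contains. The sketch below follows the general Data Package layout (a top-level "resources" list whose entries carry "name" and "path"); the descriptor contents shown are a made-up placeholder, not the actual CalcFi descriptor.

```python
import json

# Placeholder descriptor in the general Data Package shape.
# Names and paths are hypothetical, not copied from the real package.
DESCRIPTOR = json.loads("""
{
  "name": "example-pack",
  "licenses": [{"name": "CC-BY-4.0"}],
  "resources": [
    {"name": "series-a", "path": "data/series-a.csv", "format": "csv"},
    {"name": "series-b", "path": "data/series-b.csv", "format": "csv"},
    {"name": "notes", "path": "docs/notes.md", "format": "md"}
  ]
}
""")


def csv_resources(descriptor):
    """Map resource name to path for every CSV resource in the descriptor."""
    found = {}
    for resource in descriptor.get("resources", []):
        path = str(resource.get("path", ""))
        if resource.get("format") == "csv" or path.endswith(".csv"):
            found[resource["name"]] = path
    return found


print(csv_resources(DESCRIPTOR))
```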

How often does the dataset refresh?

The pipeline pulls from primary sources on each source agency's native cadence (nightly for Treasury and FRED, weekly for Freddie Mac PMMS, monthly for BLS CPI and CES, quarterly for BEA national accounts, annual for IRS and SSA reference tables). Refreshes propagate to the canonical Figshare DOI on each release cycle; downstream mirrors are re-synced shortly after. See the data sources page for the refresh schedule per agency.
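Consumers who cache a mirror locally may want a simple staleness check based on these cadences. The sketch below encodes the cadences as approximate day counts; the exact numbers, the grace period, and the function name are assumptions of this example, not part of the CalcFi pipeline.

```python
from datetime import date, timedelta

# Approximate refresh intervals in days, derived from the cadences above.
# The day counts are rough assumptions (e.g. "monthly" taken as 31 days).
CADENCE_DAYS = {
    "Treasury": 1,
    "FRED": 1,
    "Freddie Mac PMMS": 7,
    "BLS CPI and CES": 31,
    "BEA national accounts": 92,
    "IRS and SSA reference tables": 366,
}


def is_stale(source, last_observation, today, grace_days=3):
    """Return True if the newest observation is older than cadence + grace."""
    allowed = timedelta(days=CADENCE_DAYS[source] + grace_days)
    return today - last_observation > allowed


# A weekly series last observed 6 days ago is within its window.
print(is_stale("Freddie Mac PMMS", date(2024, 1, 4), date(2024, 1, 10)))
# The same series 16 days later has missed at least one refresh.
print(is_stale("Freddie Mac PMMS", date(2024, 1, 4), date(2024, 1, 20)))
```

The grace period absorbs weekends, holidays, and mirror re-sync delay; tune it per source rather than treating the default as authoritative.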

Can I embed CalcFi data in a commercial product?

Yes. CC BY 4.0 explicitly permits commercial reuse. The only requirements are attribution back to the canonical source and an indication of any modifications. For embedded use cases (dashboards, internal reports, customer-facing widgets), the API or the PyPI package is the lowest-friction integration; the bulk CSV download is best for offline analysis.

Related authority pages