Data & API Resources

A curated collection of datasets and APIs to power your research and data science workflows. Each resource includes Python integration details and access information.

| Name | URL | Type | Access | Documentation URL | Licensing Information | Category | Notes | Python 🐍 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Kaggle Datasets Platform | https://www.kaggle.com/datasets | Dataset | Free (with account) | https://www.kaggle.com/docs/api | Varies by dataset (CC0, CC-BY, etc.) | Machine Learning | Large repository of user-contributed datasets across domains; Kaggle API available for Python to download data. | Yes 🐍 |
| UCI Machine Learning Repository | https://archive.ics.uci.edu/ | Dataset | Free | https://archive.ics.uci.edu/ | CC BY 4.0 | Machine Learning | Collection of 600+ datasets for ML algorithms; widely used for benchmarking and teaching. | Yes 🐍 |
| OpenML | https://www.openml.org/ | Dataset/API | Free | https://docs.openml.org/ | Open data platform (licenses vary) | Machine Learning | Open platform to share datasets, models, experiments. Provides Python APIs (OpenML-Python). | Yes 🐍 |
| Hugging Face Datasets Hub | https://huggingface.co/datasets | Dataset | Free | https://huggingface.co/docs/datasets | Apache 2.0 (library); dataset licenses vary | NLP/CV | Large collection of ready-to-use ML datasets accessible via Python datasets library. | Yes 🐍 |
| TensorFlow Datasets (TFDS) | https://www.tensorflow.org/datasets | Dataset | Free | https://www.tensorflow.org/datasets/overview | Apache 2.0; original dataset licenses vary | Machine Learning | Catalog of ML datasets (images, text, audio, etc.) accessible via Python. Integrated with TensorFlow & others. | Yes 🐍 |
| OpenAI API (GPT models) | https://openai.com/api/ | API | Paid (pay-as-you-go) | https://platform.openai.com/docs/ | Proprietary (OpenAI terms) | NLP | Provides AI models (GPT, DALL·E, etc.) via REST API. Python client for integration in notebooks. | Yes 🐍 |
| Wikidata SPARQL API | https://query.wikidata.org/ | API/Dataset | Free | https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service | CC0 1.0 Public Domain | Knowledge Graphs | Collaborative knowledge base of 100+ million items. SPARQL endpoint. Public domain content for data science usage. | Yes 🐍 |
| Data.gov (US Gov Open Data) | https://data.gov | Dataset | Free | https://data.gov | Public domain or open licenses | Government Data | Portal aggregating hundreds of thousands of U.S. government datasets. Often analyzed with Python. | Yes 🐍 |
| Nasdaq Data Link (Quandl) | https://data.nasdaq.com/ | API/Dataset | Free & Paid | https://docs.data.nasdaq.com/ | Free (some); premium subscriptions | Quant Finance | Aggregator of financial, economic, alternative datasets. Many free time-series, plus premium. REST API & Python SDK. | Yes 🐍 |
| Alpha Vantage | https://www.alphavantage.co/ | API | Free (limited) & Paid | https://www.alphavantage.co/documentation/ | Free for personal/academic; paid for more | Financial Data | Stock/forex/crypto market data API. Free tier allows limited calls; higher rates with paid plans. Widely used in Python. | Yes 🐍 |
| FRED (Federal Reserve Data) | https://fred.stlouisfed.org/ | API/Dataset | Free (API key) | https://fred.stlouisfed.org/docs/api/fred/ | Public domain (U.S. gov) | Macro-Economics | Macro-economic time series (interest rates, GDP, etc.). Free with registration. Often accessed via Python fredapi. | Yes 🐍 |
| Bloomberg API (BLPAPI) | https://www.bloomberg.com/professional/support/api-library/ | API | Paid (Terminal subscription) | https://bloomberg.github.io/blpapi-docs/ | Proprietary (Bloomberg T&Cs) | Market Data | Enterprise financial data API for Bloomberg Terminal users (~$24k/yr). Python access via blpapi. Strict usage terms. | Yes 🐍 |
| SEC EDGAR Filings API | https://data.sec.gov/ | API | Free | https://www.sec.gov/edgar/sec-api-documentation | Public domain (SEC data) | Regulatory Filings | REST API for corporate filings (10-K, 10-Q, etc.). No key required but must set user agent. Commonly used with Python for financial/NLP analysis. | Yes 🐍 |
| NASA Open APIs | https://api.nasa.gov | API | Free (API key) | https://api.nasa.gov | Public domain (NASA) | Astronomy/Earth Sci | Catalog of NASA APIs for space/earth data. Free API key for higher rate limits. Used in Python for education, astrophysics, etc. | Yes 🐍 |
| CERN Open Data Portal | http://opendata.cern.ch | Dataset | Free | http://opendata.cern.ch/docs/guide | CC0 or CC BY (varies) | High Energy Physics | Real LHC experimental data for particle physics. Often analyzed in Python (ROOT, Scikit-HEP). | Yes 🐍 |
| Sloan Digital Sky Survey (SDSS) | https://www.sdss.org | Dataset/API | Free (optional reg) | http://skyserver.sdss.org | Public domain (astronomy) | Astrophysics Survey | Astronomical survey data (spectra, images). SQL query via SkyServer/CasJobs. Widely used in Python (AstroPy, Astroquery). | Yes 🐍 |
| CDS VizieR Catalog Service | http://vizier.cds.unistra.fr | Dataset/API | Free | https://vizier.cds.unistra.fr/vizier-doc/ | Open access catalogs (citation needed) | Astrophysical Catalogs | Thousands of published astronomical catalogs. Query by object or coordinates. Often used with Python (Astroquery). Must cite original sources. | Yes 🐍 |
| Materials Project | https://materialsproject.org | API/Dataset | Free (login) | https://docs.materialsproject.org/ | CC BY 4.0 (data) | Materials Science | Computed properties for inorganic materials (band gaps, etc.). Free REST API w/ key. Often used with Python (pymatgen). | Yes 🐍 |
| PhysioNet | https://physionet.org | Dataset | Free (some credentialed) | https://physionet.org/about/ | Often open-access; some restricted | Biomedical Signals | Physiological/clinical datasets (ECG, EEG, etc.). Many open; some need credentialing. Widely used in Python (WFDB, Pandas). | Yes 🐍 |
| NCBI Entrez (E-utilities) | https://www.ncbi.nlm.nih.gov | API | Free | https://www.ncbi.nlm.nih.gov/home/develop/api/ | Public domain (NLM/NCBI) | Bioinformatics | Public APIs for searching biological databases (PubMed, GenBank, etc.). Often used with Biopython or requests. | Yes 🐍 |
| UniProt Knowledgebase API | https://www.uniprot.org | API/Dataset | Free | https://www.uniprot.org/help/api_queries | CC BY 4.0 (UniProt data) | Proteomics | Protein sequence & annotation DB. REST & SPARQL APIs. Commonly used with Python (Biopython, Bioservices). | Yes 🐍 |
| RCSB Protein Data Bank (PDB) | https://www.rcsb.org | API/Dataset | Free | https://data.rcsb.org/#documentation | Public domain (citation suggested) | Structural Biology | 3D biomolecular structures. No restrictions, but citing PDB is encouraged. Used with Python (Biopython, PyMOL, etc.). | Yes 🐍 |
| Cancer Genomics Data Commons (GDC) | https://gdc.cancer.gov | API/Dataset | Free & controlled | https://docs.gdc.cancer.gov/API/Users_Guide/ | NIH/NCI public domain; some restricted | Cancer Genomics | Portal for cancer genomics (TCGA, etc.). Open-access data via REST & Python SDK; raw genomic data is controlled-access. | Yes 🐍 |
| IEEE DataPort | https://ieee-dataport.org | Dataset | Free/Paid (subscription) | https://ieee-dataport.org | Varies by dataset | Engineering Data | Repository for EE/related fields. Up to 2TB free upload, but downloads often need subscription. Common in signal/image processing. | Yes 🐍 |
| LibriSpeech ASR Corpus | http://www.openslr.org/12 | Dataset | Free | http://www.openslr.org/12 | CC BY 4.0 (source audio: public-domain LibriVox) | Speech Recognition | ~1000 hours of English speech audio w/ transcripts for ASR. From LibriVox audiobooks. Widely used in Python (TensorFlow, PyTorch). | Yes 🐍 |
| Mozilla Common Voice | https://commonvoice.mozilla.org | Dataset | Free | https://commonvoice.mozilla.org/datasets | CC0 1.0 (public domain) | Speech Recognition | Crowdsourced multilingual speech dataset. Released under CC0. Commonly used in Python (Torchaudio, etc.). | Yes 🐍 |
| ImageNet | http://www.image-net.org | Dataset | Free (research only) | http://www.image-net.org/download | Non-commercial research only | Computer Vision | 14+ million labeled images across 20k categories. Free for non-commercial research; acceptance of terms needed. Fueled DL breakthroughs. | Yes 🐍 |
| COCO (Common Objects in Context) | https://cocodataset.org | Dataset | Free | https://cocodataset.org/#home | Images (Flickr), labels CC BY 4.0 | Computer Vision | ~330K images with dense annotations. Widely used in CV tasks. Python tools (pycocotools) for loading/eval. | Yes 🐍 |
| OR-Library (Beasley's OR Library) | http://people.brunel.ac.uk/~mastjjb/jeb/orlib.html | Dataset | Free | http://people.brunel.ac.uk/~mastjjb/jeb/info.html | Free for research use | Operations Research | Collection of benchmark datasets for TSP, set covering, bin packing, etc. Plain text files. Commonly used with Python OR-Tools, PuLP, etc. | Yes 🐍 |
| TSPLIB | http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/ | Dataset | Free | http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/ | Public domain (benchmark instances) | Combinatorial Opt | Standard library of TSP & related problems. Widely used to test optimization algorithms. Data parseable in Python (NetworkX, TSPLIB parsers). | Yes 🐍 |
| NEOS Optimization Server | https://neos-server.org/neos/ | API | Free | https://neos-guide.org/neos/ | Free service (academic/public) | Optimization | Cloud-based solver service. Users submit LP/IP/NLP models; NEOS runs them on hosted solvers. Often used with Python Pyomo or similar. | Yes 🐍 |
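
Two of the keyless APIs listed above (SEC EDGAR and the Wikidata SPARQL endpoint) can be reached with nothing beyond the Python standard library. The sketch below is illustrative rather than an official client: the helper names `build_edgar_request` and `build_wikidata_request` are hypothetical, and the endpoint paths (`/submissions/CIK##########.json` with a zero-padded 10-digit CIK, and `/sparql` with `query` and `format` parameters) are assumptions drawn from the services' public documentation, not from this page. It only builds the requests, so it runs without network access.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint patterns (check each service's documentation before relying on them).
EDGAR_SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"
WIKIDATA_SPARQL_URL = "https://query.wikidata.org/sparql"


def build_edgar_request(cik: int, user_agent: str) -> Request:
    """Build (but do not send) a request for one company's filing index.

    EDGAR needs no API key, but the table notes that a user agent must
    be set, so an empty value is rejected up front.
    """
    if not user_agent.strip():
        raise ValueError("SEC EDGAR requires a descriptive User-Agent header")
    url = EDGAR_SUBMISSIONS_URL.format(cik=cik)  # CIK is zero-padded to 10 digits
    return Request(url, headers={"User-Agent": user_agent})


def build_wikidata_request(sparql: str, user_agent: str) -> Request:
    """Build (but do not send) a GET request that runs a SPARQL query."""
    params = urlencode({"query": sparql, "format": "json"})
    return Request(
        f"{WIKIDATA_SPARQL_URL}?{params}",
        headers={"User-Agent": user_agent},
    )


if __name__ == "__main__":
    # Placeholder contact string; replace with your own name and email.
    agent = "Example Research contact@example.com"
    print(build_edgar_request(320193, agent).full_url)
    print(build_wikidata_request("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1", agent).full_url)
```

To fetch data, pass either request object to `urllib.request.urlopen` and decode the JSON response. Keeping request construction separate from sending makes it easy to check URLs and headers before spending any of a service's rate limit.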