Data analyst portfolio: what to include
Why analysts need a portfolio
A portfolio is proof of work, not a list of claims. A resume says "I know SQL"; a portfolio shows the exact join you wrote, the cohort you built, and the recommendation you drew from the chart. For junior analysts without commercial experience, the portfolio is often the single signal that decides whether the recruiter at Stripe, DoorDash, or Notion clicks "schedule phone screen" or "reject."
The dirty secret of entry-level hiring is that almost every applicant lists the same coursework — Coursera, DataCamp, Google Data Analytics Certificate — and almost none of them carry that work into a repo a hiring manager can scan in ninety seconds. A portfolio with three solid projects, each with a written conclusion, beats ten certificates every time.
Load-bearing rule: every chart in your portfolio needs a sentence underneath it explaining what a business should do with that information. Charts without conclusions are noise. Conclusions are the entire point.
Who actually needs one:
| Career stage | Portfolio status | Project count |
|---|---|---|
| Junior, no commercial experience | Required | 2–3 deep projects |
| Career switcher (QA, marketing, ops) | Required | 2–3 with domain angle |
| Mid-level with 2+ years | Strongly recommended | 1–2 showing depth |
| Senior with 5+ years | Optional | Replace with case studies in interview |
Five project archetypes
Pick projects that cover different skills. A portfolio with three EDA notebooks signals a one-trick pony. Mix the archetypes below.
1. Exploratory data analysis (EDA)
Load a public dataset, clean it, find patterns, visualize. The classic Jupyter notebook with Pandas, Matplotlib, and Seaborn. Good source data: Airbnb listings for a major city, Spotify track features, NYC TLC taxi trips, Uber pickup geodata.
Minimum bar: data load, null and duplicate handling, 5–7 charts, 3–5 written insights with business interpretation. A "business interpretation" is one sentence longer than "users in Brooklyn prefer entire apartments": it ends with "and therefore the host should X" or "the platform should Y."
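The null and duplicate handling step can be sketched as follows. This is a minimal illustration, assuming pandas is installed; the toy rows and the column names (`listing_id`, `borough`, `room_type`, `price`) are hypothetical stand-ins for a real listings CSV.

```python
import pandas as pd

# Hypothetical toy data standing in for a listings CSV; column names are assumptions.
raw = pd.DataFrame(
    {
        "listing_id": [1, 2, 2, 3, 4, 5],
        "borough": ["Brooklyn", "Brooklyn", "Brooklyn", "Queens", "Queens", None],
        "room_type": [
            "Entire home", "Private room", "Private room",
            "Entire home", "Private room", "Entire home",
        ],
        "price": [180.0, 75.0, 75.0, None, 60.0, 140.0],
    }
)


def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then rows missing the fields the analysis needs."""
    deduped = df.drop_duplicates()
    return deduped.dropna(subset=["borough", "price"]).reset_index(drop=True)


listings = clean_listings(raw)

# Report what the cleaning removed so the notebook documents its own decisions.
print(f"rows: {len(raw)} raw -> {len(listings)} clean")

# One number that can back a written insight: median price by borough.
median_price = listings.groupby("borough")["price"].median()
print(median_price)
```

Printing the before and after row counts is a deliberate choice: a reviewer can see how much data the cleaning discarded without rerunning the notebook.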
2. Pure SQL project
A series of SQL queries solving a business problem end to end. Cohort retention, funnel conversion, RFM segmentation, time-to-second-purchase. Use a public Postgres-flavored dataset or load Kaggle CSVs into a local Postgres instance.
Minimum bar: 5–10 queries with comments, a README explaining the business question and the answer, and at least one query that uses a CTE plus a window function (the combination recruiters scan for).
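As one possible shape for the CTE plus window function query, here is a sketch of time-to-second-purchase. It runs the SQL through Python's built-in `sqlite3` module instead of Postgres so it works without a server; the `orders` table, its columns, and the rows are hypothetical. Window functions need SQLite 3.25 or newer, and `julianday` is SQLite-specific (in Postgres, subtracting two `DATE` values gives the day count directly).

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (user_id INTEGER, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-05', 20.0),
        (1, '2024-02-10', 35.0),
        (2, '2024-01-20', 15.0),
        (3, '2024-02-03', 50.0),
        (3, '2024-02-25', 10.0),
        (3, '2024-03-14', 40.0);
    """
)

QUERY = """
-- CTE: rank each user's orders and carry the previous order date forward.
WITH ranked AS (
    SELECT
        user_id,
        order_date,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS order_rank,
        LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_order_date
    FROM orders
)
-- Keep only the second order per user and measure the gap in days.
SELECT
    user_id,
    CAST(julianday(order_date) - julianday(prev_order_date) AS INTEGER)
        AS days_to_second_purchase
FROM ranked
WHERE order_rank = 2
ORDER BY user_id;
"""

rows = conn.execute(QUERY).fetchall()
for user_id, days in rows:
    print(f"user {user_id}: {days} days to second purchase")
```

Users with a single order drop out of the result, which is itself worth a sentence in the README: the share of users who never make a second purchase is often the more interesting number.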
3. Interactive dashboard
A live dashboard in Tableau Public, Looker Studio, or Metabase Cloud. The hiring manager should be able to click the link in your README and play with it inside thirty seconds — no login wall, no broken embed.
Minimum bar: 3–5 visualizations, at least two interactive filters, and a KPI block at the top. Pro move: write each chart title as the conclusion ("Brooklyn revenue grew 34% YoY") instead of the topic ("Revenue by borough"). Conclusion-titled charts make non-analysts read your dashboard correctly without you in the room.
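If the dashboard data comes out of a script, the conclusion-style title can be generated from the numbers rather than typed by hand. A minimal sketch, assuming a hypothetical helper named `conclusion_title` and made-up revenue figures:

```python
def conclusion_title(segment: str, metric: str, current: float, previous: float) -> str:
    """Build a chart title that states the year-over-year change, not just the topic."""
    if previous == 0:
        raise ValueError("previous-period value must be non-zero to compute growth")
    change = (current - previous) / previous
    verb = "grew" if change >= 0 else "fell"
    return f"{segment} {metric} {verb} {abs(change):.0%} YoY"


# Illustrative figures only: 1.34M this year against 1.00M last year.
print(conclusion_title("Brooklyn", "revenue", 1_340_000, 1_000_000))
# prints: Brooklyn revenue grew 34% YoY
```

Deriving the title from the data keeps the headline and the chart from drifting apart when the dataset is refreshed.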
4. A/B test simulation
Take a dataset with two cohorts, formulate a hypothesis, run the statistical test, write the recommendation. This is the project that separates analysts who talk about experimentation from analysts who can run one.
Minimum bar: clearly stated null and alternative hypotheses, power calculation up front (not after looking at the result), a Z-test or t-test in Python with scipy.stats (ttest_ind covers the t-test; a z-test can be built from scipy.stats.norm), a p-value, a confidence interval, and a one-paragraph business recommendation. Include at least one section on what would invalidate the result: survivorship bias, peeking, sample-ratio mismatch.
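The power calculation, the test, and the confidence interval can be sketched in a few lines. To stay dependency-free, this version uses the standard library's `statistics.NormalDist` in place of scipy.stats and relies on the normal approximation for a two-proportion z-test. The function names and the example numbers (10% baseline conversion, 12% target, 4,000 users per arm) are illustrative assumptions, not results from a real experiment.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def required_sample_size(p_base: float, p_target: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_target - p_base) ** 2)


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int,
                         alpha: float = 0.05):
    """Return (z, two-sided p-value, confidence interval for p_b - p_a)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled standard error under the null hypothesis of equal rates.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pooled
    p_value = 2 * (1 - _STD_NORMAL.cdf(abs(z)))
    # Unpooled standard error for the interval around the observed lift.
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = _STD_NORMAL.inv_cdf(1 - alpha / 2) * se_unpooled
    diff = p_b - p_a
    return z, p_value, (diff - margin, diff + margin)


# Step 1 (before collecting data): how many users does each arm need?
n_per_group = required_sample_size(p_base=0.10, p_target=0.12)
print(f"need about {n_per_group} users per group")

# Step 2 (after the test ends): run the test on the observed counts.
z, p_value, ci = two_proportion_ztest(conv_a=400, n_a=4000, conv_b=480, n_b=4000)
print(f"z = {z:.2f}, p = {p_value:.4f}, 95% CI for lift = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Keeping the sample-size function separate from the test function mirrors the order the write-up should follow: the required sample size is fixed first, and the test runs once on the final counts.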
5. End-to-end project
The full loop: ingest → clean → analyze → conclude → recommend. The most common version is a churn analysis: pull the IBM Telco Customer Churn dataset (7,043 customers, 21 features), explore drivers, build a simple logistic regression or decision tree, and write a retention playbook.
Minimum bar: the deliverable looks like something a PM could forward to their VP. That means an executive summary at the top, methodology in the middle, code linked at the bottom — not a 40-cell notebook that buries the conclusion in cell 38.
How to structure the GitHub repo
A single repository called data-analyst-portfolio with one folder per project beats five separate repos. Recruiters spend roughly forty seconds on the top README before deciding to click deeper.
data-analyst-portfolio/
├── README.md                  ← landing page with project links + 1-line outcomes
├── 01-eda-airbnb-nyc/
│   ├── README.md
│   ├── notebook.ipynb
│   └── data/                  ← raw CSV or a download script
├── 02-sql-cohort-retention/
│   ├── README.md
│   ├── schema.sql
│   └── queries/
│       ├── 01_cohort_table.sql
│       ├── 02_retention_curve.sql
│       └── 03_revenue_by_cohort.sql
├── 03-dashboard-ecommerce/
│   └── README.md              ← link + screenshot of Tableau Public
└── 04-ab-test-pricing/
    ├── README.md
    └── analysis.ipynb

The top-level README is your homepage. Lead with a two-sentence positioning line (who you are, what you analyze) and then a table of projects with one-sentence outcomes. Not "I did EDA on Airbnb." Try: "Found that hosts with response rate above 90% earn 28% more per night, controlling for location and listing type." That sentence is the difference between a click and a skip.
Project README template
Every project folder needs a README that a hiring manager can scan without opening a single notebook. Use this skeleton:
# Customer Churn: Drivers and Retention Playbook
## Question
What predicts churn for a SaaS telecom product, and which interventions
should the retention team prioritize for Q3?
## Data
IBM Telco Customer Churn (Kaggle): 7,043 customers, 21 features.
## Tools
Python (pandas, scikit-learn, matplotlib), SQL (Postgres).
## Headline findings
1. Month-to-month contracts churn 3.1x more than two-year contracts.
2. Customers without tech support churn 2.4x more than those with it.
3. The first 6 months hold 62% of all churn events.
## Recommendations
- Offer a discount on annual contracts during onboarding month one.
- Proactive outreach to month-to-month users without tech support.
- Build a churn risk score in production using the top 5 features.
## Reproduce
`pip install -r requirements.txt` → run `notebook.ipynb` end to end.

Note what is missing from the template: any sentence that starts with "I learned" or "I practiced." Hiring managers do not care what you learned; they care what the business should do. Phrase everything as a recommendation, not a reflection.
Common pitfalls
The single most common portfolio failure is shipping unmodified course projects. Recruiters at Airbnb, Snowflake, and Linear have seen the same Coursera Capstone fifty times this quarter. If you must use a course project as the starting point, do not ship it as is — add your own hypothesis, swap the visualization library, expand the dataset, or rewrite the conclusion section around a different business question. The course version is a baseline, not a deliverable.
The second is charts without conclusions. A notebook with twenty Seaborn plots and zero interpretation tells the reviewer that you can call sns.barplot but cannot think. Every chart needs a one-sentence caption answering "so what?", and that caption should mention a metric, a magnitude, and an action. If the caption could be deleted without losing meaning, the chart should be too.
The third is portfolio bloat. Two strong projects with airtight READMEs beat ten thin notebooks. Reviewers do not finish projects four through ten; they form an opinion in the first two and skim the rest. Pick your strongest work and cut the rest. A portfolio is a highlight reel, not a homework folder.
The fourth is missing or stub READMEs. A repository without a README is a closed door. The reviewer is not going to scroll through .ipynb files to figure out what the project does. If only one piece of writing in your portfolio gets read end to end, it will be the top-level README — invest accordingly.
The fifth is dirty code on display: hardcoded absolute paths like C:/Users/me/Desktop/data.csv, leftover debug prints, commented-out cells, credentials in plaintext, datasets that exceed GitHub's 100 MB per-file limit. Before publishing, clone the repo fresh into a clean environment and verify that the notebook executes top to bottom. If it does not, neither will the reviewer's patience.
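Two of these problems are easy to check with a short script before you push. The sketch below is one way to do it, using only the standard library; the helper names `oversized_files` and `data_path` are hypothetical, and the 100 MiB threshold assumes GitHub's documented per-file block size.

```python
from pathlib import Path

# Assumption: GitHub blocks individual files larger than 100 MiB.
MAX_FILE_BYTES = 100 * 1024 * 1024


def oversized_files(repo_root: Path, limit: int = MAX_FILE_BYTES) -> list[Path]:
    """Return files under repo_root larger than limit, skipping the .git directory."""
    return sorted(
        path
        for path in repo_root.rglob("*")
        if path.is_file() and ".git" not in path.parts and path.stat().st_size > limit
    )


def data_path(project_dir: Path, filename: str) -> Path:
    """Resolve a dataset relative to the project folder, not a hardcoded absolute path."""
    return project_dir / "data" / filename


# Example: scan the current directory and list anything too large to push.
for path in oversized_files(Path(".")):
    print(f"too large for GitHub: {path}")
```

Loading data through something like `data_path(Path.cwd(), "listings.csv")` means the notebook runs on the reviewer's machine as long as it is started from the project folder, which a path under your own Desktop never will.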
Where to source data
Public datasets are fine — recruiters do not care whether you used proprietary data, they care whether your analysis was sharp. The strongest sources:
| Source | What's there | Best for |
|---|---|---|
| Kaggle Datasets | Thousands of cleaned CSVs with descriptions | EDA, ML, A/B sims |
| Google Dataset Search | Index across public datasets globally | Niche domains |
| Our World in Data | Socioeconomic, health, climate panels | Long-horizon trends |
| BigQuery Public Datasets | Live SQL-queryable warehouses | Pure SQL projects |
| data.gov | US government open data | Policy, transport, health |
| Your own app exports | Spotify Wrapped, fitness apps, bank CSVs | Storytelling angle |
The unfair-advantage move is the last row: a portfolio project built on data only you have access to — your own Strava history, your podcast listening logs, your Notion task exports — gives you a story no other applicant can copy. The dataset can be tiny. The angle is what sells.
FAQ
How many projects do I need for a junior role?
Two or three deep projects beat ten shallow ones. The recommended mix is one EDA notebook, one pure SQL project, and either a dashboard or an A/B test simulation. Recruiters at Notion, Airbnb, and Stripe optimize their first scan for breadth across tooling (Python, SQL, viz) and depth in one place (a project with a real recommendation). Anything beyond three projects starts diluting the average rather than raising the ceiling.
Do I have to use GitHub?
Yes, for code projects. GitHub is the universal default — recruiters use it to gauge how you handle version control, file structure, and documentation. For dashboards, link to Tableau Public, Looker Studio, or a deployed Streamlit app, but still mention the dashboard in your top-level GitHub README so everything lives behind a single URL on your resume.
Can I use Kaggle datasets, or does it have to be original data?
Kaggle datasets are fine. The analysis is what's evaluated, not the source of the rows. The one rule: do not fork a top-voted notebook and reskin it — reviewers recognize the popular Telco Churn analyses on sight. Use the dataset, ignore the existing notebooks, and bring your own questions.
How do I make my portfolio stand out?
Conclusions and recommendations. Roughly 90% of analyst portfolios stop at charts and call it done. The 10% that include a written recommendation at the end of every project move to the top of the stack instantly. Read your project README out loud: if it ends with a chart instead of a sentence starting with "The retention team should...", you're still in the 90%.
Should I include a machine learning project?
For an analyst role, optional but useful. A simple logistic regression or decision tree on a churn or conversion problem signals comfort with predictive thinking without overcommitting to a data-scientist title. Skip deep learning unless you're targeting roles that explicitly require it — a CNN on cat photos is not a data-analyst signal.
How long should I spend on the portfolio before applying?
Two to four weekends per project, capped at three projects, then ship and start applying. Iterating on a published portfolio while you interview is more valuable than polishing in private. Recruiter feedback from rejected applications is the fastest signal on what to fix next — feedback you cannot get until the portfolio is live.