Build a CSV Sanity-Check Agent in Python with LangChain

A practical guide for data scientists to automate EDA checks with schema, nulls and describe

Data Science Espresso by Sarah

Sep 14, 2025

When you get a new CSV, what do you check first?

Typical questions are:

Which columns does the dataset contain?
Are there columns with missing values (NaNs)?
What do the descriptive statistics of the numerical columns look like?

These are standard steps of any Exploratory Data Analysis (EDA). Necessary but repetitive.

I wanted to test whether an agent could take over these EDA tasks.

My approach: Agent = LLM + Tools + Control Logic

With LangChain I built a CSV Sanity-Check Agent that can apply three tools:

schema: Returns column names and data types.
nulls: Counts missing values per column.
describe: Displays statistical key figures (mean, minimum, maximum) for numerical columns.
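To make the three tools concrete, here is a minimal sketch of what each one computes. It is not the code from the repo: it uses only Python's standard library so it runs without any installs, and the sample data and helper names (SAMPLE_CSV, load_rows, _missing, _is_number) are made up for illustration. With pandas, the same checks are df.dtypes, df.isna().sum() and df.describe().

```python
import csv
import io
import statistics

# Hypothetical sample data; the empty fields are the missing values.
SAMPLE_CSV = """name,age,city
Ana,34,Lisbon
Ben,,Berlin
Cleo,28,
Dev,40,Delhi
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def _missing(value):
    """Treat absent or blank fields as missing."""
    return value is None or value.strip() == ""

def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

def schema(rows):
    """Return {column: 'numeric' or 'text'}, inferred from the non-missing values."""
    result = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows if not _missing(r[col])]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        result[col] = "numeric" if is_numeric else "text"
    return result

def nulls(rows):
    """Count missing values per column."""
    return {col: sum(1 for r in rows if _missing(r[col])) for col in rows[0].keys()}

def describe(rows):
    """Return mean, min and max for every numeric column, skipping missing values."""
    stats = {}
    for col, kind in schema(rows).items():
        if kind == "numeric":
            nums = [float(r[col]) for r in rows if not _missing(r[col])]
            stats[col] = {"mean": statistics.mean(nums), "min": min(nums), "max": max(nums)}
    return stats

rows = load_rows(SAMPLE_CSV)
print(schema(rows))
print(nulls(rows))
print(describe(rows))
```

Each function takes the parsed rows and returns a small dictionary, which is roughly the shape of result an agent needs: short, structured and easy to turn into a sentence.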

And yes, it works. For example, for “What’s the average age?” the agent calls df.describe() and returns the mean as a clear value.

Terminal output after running the CSV agent.

Sure, you could run these checks yourself in pandas, and probably faster. It is not revolutionary, but it is a good entry point into LangChain and agent logic.

👉 Practical step-by-step LangChain guide for data scientists to build a CSV Sanity-Check Agent (incl. code, GitHub repo and mini-evaluation)

What is LangChain?

LangChain is a framework that lets us do more with LLMs than text generation.

It provides building blocks so we can connect models with tools, memory and control logic instead of coding everything from scratch.

In practice, we compose a system from a few key pieces: LLM wrappers for a uniform API, small tools or chains to perform concrete tasks, optional memory to preserve short-term context and an agent executor that runs the policy loop.
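Two of these pieces, the uniform model wrapper and the short-term memory, can be sketched in plain Python. This is an illustration of the idea only, not LangChain's own classes: ChatModel, EchoModel, ShoutModel, ShortTermMemory and ask are hypothetical names, and the two "models" are stubs that never call a real LLM.

```python
from collections import deque
from typing import Protocol

class ChatModel(Protocol):
    """Uniform interface: every model wrapper exposes the same method."""
    def invoke(self, prompt: str) -> str: ...

class EchoModel:
    """Hypothetical stand-in for one LLM provider."""
    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"

class ShoutModel:
    """Hypothetical stand-in for a second provider with the same interface."""
    def invoke(self, prompt: str) -> str:
        return prompt.upper()

class ShortTermMemory:
    """Keeps only the most recent turns; older ones are dropped automatically."""
    def __init__(self, max_turns: int = 3):
        self.turns = deque(maxlen=max_turns)

    def add(self, user: str, reply: str) -> None:
        self.turns.append((user, reply))

    def as_context(self) -> str:
        return "\n".join(f"user: {u}\nassistant: {r}" for u, r in self.turns)

def ask(model: ChatModel, memory: ShortTermMemory, question: str) -> str:
    """Prepend the remembered turns to the question, call the model, store the turn."""
    context = memory.as_context()
    prompt = f"{context}\nuser: {question}" if context else f"user: {question}"
    reply = model.invoke(prompt)
    memory.add(question, reply)
    return reply

memory = ShortTermMemory(max_turns=2)
print(ask(ShoutModel(), memory, "which columns exist?"))
print(ask(EchoModel(), memory, "any missing values?"))  # same call, different model
```

Because both stubs satisfy the same interface, the calling code does not change when the model is swapped, which is the point of a wrapper layer.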

What are Agents?

An agent is a system that combines an LLM with a small set of allowed tools and control logic.

Instead of answering immediately, it plans a step, selects a tool, executes it, interprets the result and decides on the next step until it can respond.

Think of an intern who is given a task and a list of permitted actions. More formally, an agent is a policy-driven loop that joins LLM reasoning with restricted tool access.

This is different from a fixed if-else script and from simple prompting because the system can combine multiple steps and act through the tools you define.
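The loop described above can be sketched in a few lines of plain Python. To keep it runnable without an API key, the LLM is replaced by a scripted policy function, so this shows the control flow only and is not how LangChain implements its executor. All names here (DATA, TOOLS, scripted_policy, run_agent) are hypothetical.

```python
# Hypothetical stand-in for a loaded CSV.
DATA = {"age": [34, 28, 40]}

def schema_tool(_arg: str) -> str:
    """List the available column names."""
    return ", ".join(DATA)

def describe_tool(column: str) -> str:
    """Return the mean of one column."""
    values = DATA[column]
    return f"mean={sum(values) / len(values)}"

# Restricted tool access: the agent can call nothing outside this mapping.
TOOLS = {"schema": schema_tool, "describe": describe_tool}

def scripted_policy(question, history):
    """Stand-in for the LLM: pick the next action from what was observed so far.

    Returns (kind, payload, arg), where kind is "call" or "answer".
    """
    if not history:
        return ("call", "schema", "")              # step 1: look at the columns
    if len(history) == 1:
        columns = history[0][1].split(", ")
        for col in columns:
            if col in question.lower():
                return ("call", "describe", col)   # step 2: inspect the matching column
        return ("answer", "No matching column found.", "")
    return ("answer", f"Result for '{question}': {history[-1][1]}", "")

def run_agent(question, policy, tools, max_steps=5):
    """Policy loop: decide, call a tool, record the observation, repeat."""
    history = []  # list of (tool_name, observation)
    for _ in range(max_steps):
        kind, payload, arg = policy(question, history)
        if kind == "answer":
            return payload, history
        if payload not in tools:
            # Refuse anything that is not on the allowed list.
            history.append((payload, f"error: tool '{payload}' is not allowed"))
            continue
        history.append((payload, tools[payload](arg)))
    return "Stopped: step limit reached.", history

answer, steps = run_agent("What's the average age?", scripted_policy, TOOLS)
print(answer)
print([name for name, _ in steps])
```

The step limit and the allowed-tools check are what separate this from free-form prompting: the policy can chain several steps, but only through the tools you hand it, and it cannot loop forever.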

See an example in the image:

How an agent uses multi-step reasoning with tools.

Thanks for reading. Enjoy your weekend,
Sarah 💕


Data Science Espresso by Sarah Lea
