Build a CSV Sanity-Check Agent in Python with LangChain
A practical guide for data scientists to automate EDA checks with schema, nulls and describe
When you get a new CSV, what do you check first?
Typical questions are:
Which columns does the dataset contain?
Are there columns with missing values (NaNs)?
What do the descriptive statistics of the numerical columns look like?
These are standard steps of any Exploratory Data Analysis (EDA). Necessary but repetitive.
I wanted to test whether an agent could take over these EDA tasks.
My approach: Agent = LLM + Tools + Control Logic
With LangChain I built a CSV Sanity-Check Agent that can apply three tools:
schema: Returns column names and data types.
nulls: Counts missing values per column.
describe: Returns summary statistics (such as mean, minimum and maximum) for numerical columns.
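As a minimal sketch, the three tools can be written as plain pandas functions before any agent is involved. The sample data, the function bodies and the return types are assumptions for illustration, not the exact code behind the agent:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
SAMPLE_CSV = """name,age,city
Anna,34,Zurich
Ben,,Basel
Cara,28,
"""


def schema(df: pd.DataFrame) -> dict:
    """Return each column name mapped to its pandas dtype (as a string)."""
    return {col: str(dtype) for col, dtype in df.dtypes.items()}


def nulls(df: pd.DataFrame) -> dict:
    """Count missing values (NaN) per column."""
    return {col: int(count) for col, count in df.isna().sum().items()}


def describe(df: pd.DataFrame) -> pd.DataFrame:
    """Return summary statistics (count, mean, std, min, quartiles, max)
    for the numerical columns."""
    return df.describe()


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(schema(df))
print(nulls(df))
print(describe(df))
```

Each function takes a DataFrame and returns something small and easy to print, which keeps the tool output short enough to hand back to an LLM.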
And yes, it works: For example, for “What’s the average age?” the agent calls the describe tool (df.describe()) and returns the mean of the age column.
Sure, you could run these checks yourself with a few lines of pandas, and probably faster. It's not revolutionary, but it is a good entry point into LangChain and agent logic.
What is LangChain?
LangChain is a framework that lets us do more with LLMs than text generation.
It provides building blocks so we can connect models with tools, memory and control logic instead of coding everything from scratch.
In practice, we compose a system from a few key pieces: LLM wrappers for a uniform API, small tools or chains to perform concrete tasks, optional memory to preserve short-term context and an agent executor that runs the policy loop.
What are Agents?
An agent is a system that combines an LLM with a small set of allowed tools and control logic.
Instead of answering immediately, it plans a step, selects a tool, executes it, interprets the result and decides on the next step until it can respond.
Think of an intern who is given a task and a list of permitted actions. More formally, an agent is a policy-driven loop that combines LLM reasoning with restricted tool access.
This is different from a fixed if-else script and from simple prompting because the system can combine multiple steps and act through the tools you define.
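To make the loop concrete, here is a dependency-free sketch of "plan, select a tool, execute, interpret, repeat". It does not use LangChain or a real model: fake_policy is a hypothetical stand-in for the LLM (simple keyword rules), and the tools are stubs that return fixed strings. All names and values are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Stub tools: in a real agent these would inspect a DataFrame.
TOOLS: Dict[str, Callable[[], str]] = {
    "schema": lambda: "columns: name (object), age (float64)",
    "nulls": lambda: "missing values: name=0, age=1",
    "describe": lambda: "age: mean=31.0, min=28.0, max=34.0",
}


def fake_policy(question: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: returns ("call", tool_name) or ("answer", text).

    A real agent would ask a model to make this decision. Keyword rules
    are used here only so the loop can run without one.
    """
    if observations:
        # A tool has already run, so interpret its output and respond.
        return ("answer", f"Based on the tool output: {observations[-1]}")
    q = question.lower()
    if "average" in q or "mean" in q:
        return ("call", "describe")
    if "missing" in q or "null" in q:
        return ("call", "nulls")
    return ("call", "schema")


def run_agent(question: str, max_steps: int = 3) -> str:
    """Control logic: loop until the policy answers or the step limit hits."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, value = fake_policy(question, observations)
        if action == "answer":
            return value
        if value not in TOOLS:
            # Restricted tool access: anything outside the allow-list is rejected.
            observations.append(f"error: unknown tool {value!r}")
            continue
        observations.append(TOOLS[value]())
    return "Stopped: step limit reached without a final answer."


print(run_agent("What's the average age?"))
```

The allow-list (TOOLS) and the step limit (max_steps) are the control logic here: the policy can only act through the tools you define, and the loop cannot run forever.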
See an example in the image:
Thanks for reading. Enjoy your weekend,
Sarah 💕