Claude Code Plugins That Actually Matter
Six tools that turn Claude Code from a coding assistant into a full operating system. Adversarial reviews, autonomous experiments, persistent memory, and 200+ app integrations.
Claude Code by itself is already the most capable coding agent available. But the plugin and skill ecosystem around it is where things get interesting. Most people are still using Claude Code the way they used ChatGPT in 2024: paste a problem in, get an answer out. That's fine for quick fixes. It's not how you build systems.
I've been going through the plugins, skills, and CLI tools that have shipped in the last few months. Some are genuinely useful. Some are overhyped. Here are six that change how you work.
Codex Plugin: a second opinion from a different brain
OpenAI shipped a Codex plugin that runs inside Claude Code. You run /codex:adversarial-review and it sends your code to OpenAI's models for a pressure test. It looks for race conditions, bad tradeoffs, edge cases Claude might have missed, and architectural problems you didn't ask about.
The idea of running one AI's code through another AI's review sounds like a gimmick. It's not. Different model families have different blind spots. Claude tends to be thorough but sometimes overengineers. Codex tends to be pragmatic but sometimes cuts corners. Having both look at the same code catches things neither would catch alone.
It consumes OpenAI tokens on top of your Anthropic usage, so it's not free. But for anything going to production, spending a few cents on an adversarial review is cheaper than debugging a race condition at 2 a.m.
Obsidian Skills: persistent memory that actually works
Steph Ango, the CEO of Obsidian, published an official skills repo that teaches Claude Code how to work natively with Obsidian. It covers wikilinks, frontmatter, Bases, JSON Canvas, and the other Obsidian-specific formats that Claude wouldn't otherwise understand.
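To make two of those formats concrete, here is a minimal Python sketch of what reading a note involves: splitting off the frontmatter and pulling out wikilink targets. It is an illustration only, not how the skills repo works. The function names and the sample note are made up for this example, and the frontmatter parser handles only flat `key: value` lines rather than full YAML.

```python
import re

# Matches [[Target]], [[Target#Heading]], and [[Target|Alias]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#([^\]|]+))?(?:\|([^\]]+))?\]\]")


def split_frontmatter(text):
    """Return (metadata dict, body). Assumes flat `key: value` lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the note body.
            return meta, "\n".join(lines[i + 1:])
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    # No closing delimiter: treat the whole text as body.
    return {}, text


def extract_wikilinks(body):
    """Return the target note name of every wikilink, ignoring headings and aliases."""
    return [m.group(1).strip() for m in WIKILINK.finditer(body)]


# Hypothetical note used only for this example.
NOTE = """---
title: Auth service decisions
status: accepted
---
We chose JWTs over sessions. See [[Token rotation]] and
[[API gateway#Rate limits|the gateway notes]].
"""

meta, body = split_frontmatter(NOTE)
links = extract_wikilinks(body)
```

Here `meta` holds the note's properties and `links` holds the notes it points to, which is the kind of structure an agent needs before it can follow references through a vault.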
Why this matters: Obsidian is the best second brain tool for technical people, and Claude Code is the best coding agent. Connecting them means your agent can pull from your knowledge base while it works. Project notes, architecture decisions, meeting summaries, technical references. Instead of re-explaining context every session, Claude reads your Obsidian vault and understands what you've already decided.
If you're not using Obsidian, this particular plugin isn't relevant. But the pattern it establishes is important. Your agent should have access to your knowledge base, whatever form that takes. The days of starting every conversation from zero context are ending.
Auto Research: the Karpathy experiment loop
Andrej Karpathy released Auto Research in March 2026. It's a 630-line Python script that does something deceptively simple: it points Claude Code at a training repo, lets it modify the code, trains for about five minutes, keeps improvements, discards regressions, and loops.
Karpathy left it running overnight on code that was already well-tuned. It ran 83 experiments autonomously. Fifteen of them were genuine improvements, including catching an attention bug that humans had missed. The result was an 11% speedup on already-optimized code. The repo has over 21,000 stars on GitHub.
This is not a coding assistant. This is autonomous scientific experimentation. The agent forms a hypothesis about what might improve performance, tests it, measures the result, and decides whether to keep the change. It does this in a loop without human intervention.
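The shape of that loop is easy to sketch. The Python below is not Karpathy's script. It is a toy version of the same keep-improvements, discard-regressions pattern, with a made-up objective standing in for "edit the code, train for five minutes, measure". Every name in it (`score`, `run_experiment`, `experiment_loop`, the `lr` and `momentum` keys) is hypothetical.

```python
import random


def score(config):
    """Toy objective, lower is better. Stands in for a measured training metric."""
    return (config["lr"] - 0.3) ** 2 + (config["momentum"] - 0.9) ** 2


def run_experiment(config, rng):
    """Hypothetical stand-in for one agent edit plus a short training run.

    Perturbs one setting at random and returns the candidate and its score.
    """
    candidate = dict(config)  # copy so the current best is never mutated
    key = rng.choice(sorted(candidate))
    candidate[key] = candidate[key] * rng.uniform(0.8, 1.2)
    return candidate, score(candidate)


def experiment_loop(config, n_experiments, seed=0):
    rng = random.Random(seed)
    best, best_score = config, score(config)
    kept = []  # (experiment index, score) for every change that was kept
    for i in range(n_experiments):
        candidate, candidate_score = run_experiment(best, rng)
        if candidate_score < best_score:
            # Improvement: keep it and build the next experiment on top of it.
            best, best_score = candidate, candidate_score
            kept.append((i, candidate_score))
        # Regression or no change: discard the candidate and try again.
    return best, best_score, kept


start = {"lr": 0.1, "momentum": 0.5}
best, best_score, kept = experiment_loop(start, n_experiments=83)
```

The important property is that the best score can only go down, so an overnight run can never leave you worse off than where you started. The `kept` list is the "list of improvements with data backing each one".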
For anyone working in machine learning, this changes the workflow fundamentally. You don't sit there trying variations manually anymore. You describe what you're optimizing for, point Auto Research at the codebase, and come back to a list of improvements with data backing each one. Some will be obvious things you would have tried eventually. Some will be things you never would have thought of.
Skill Creator: building and testing your own skills
This is an official Anthropic tool and it's probably the most underrated one in the ecosystem. Skill Creator doesn't just help you write Claude Code skills. It tests them.
You create a skill, and Skill Creator runs automated evaluations on it. Blind A/B comparisons where two versions are judged without knowing which is which. Trigger tuning to make sure the skill activates at the right times. Multi-agent parallel evaluation to test the skill under different conditions. It measures token usage, execution time, and output quality, then gives you objective data on whether the skill is actually worth adding to your stack.
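To show what "blind" means in practice, here is a small Python sketch of a blind A/B comparison. It is not Skill Creator's implementation. The `blind_ab` function and the length-based judge are assumptions made for illustration, and a real judge would be a model or a person rather than a string-length check.

```python
import random


def blind_ab(outputs_a, outputs_b, judge, seed=0):
    """Compare two skill versions pairwise without revealing which is which."""
    rng = random.Random(seed)
    wins = {"A": 0, "B": 0, "tie": 0}
    for out_a, out_b in zip(outputs_a, outputs_b):
        pair = [("A", out_a), ("B", out_b)]
        rng.shuffle(pair)  # random presentation order hides the labels
        (label_1, text_1), (label_2, text_2) = pair
        # The judge sees only the two texts and returns 1, 2, or 0 for a tie.
        verdict = judge(text_1, text_2)
        if verdict == 1:
            wins[label_1] += 1
        elif verdict == 2:
            wins[label_2] += 1
        else:
            wins["tie"] += 1
    return wins


def shorter_is_better(first, second):
    """Hypothetical judge that prefers the shorter output."""
    if len(first) == len(second):
        return 0
    return 1 if len(first) < len(second) else 2


# Made-up outputs from two versions of the same skill on three prompts.
version_a = ["ok", "a long rambling answer", "same"]
version_b = ["a verbose reply", "short", "same"]
result = blind_ab(version_a, version_b, shorter_is_better)
```

Because the labels are shuffled before the judge sees anything and mapped back only afterwards, the judge cannot favor a version by position or name.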
Most people writing Claude Code skills are guessing whether they work well. Skill Creator removes the guessing. You get actual numbers on performance, cost, and reliability before you commit to using a skill in production.
If you're building skills for your team or for clients, this is the tool that turns skill development from art into engineering.
notebooklm-py: querying your research from the terminal
This one is a community project, not an official Google product. It's a Python wrapper around NotebookLM's internal APIs that lets you query your NotebookLM sources from the command line and from Claude Code.
The use case is straightforward. You've uploaded research papers, documentation, or reference material to NotebookLM. Now you can ask questions about that material directly from your coding environment without switching tabs or copy-pasting. Claude Code calls the NotebookLM skill, gets the answer, and keeps working.
The caveat is real: it uses undocumented Google endpoints that can break at any time. Google hasn't officially blessed this integration. Use it knowing that it might stop working if Google changes its internal APIs. For now, it works, and it's genuinely useful if NotebookLM is part of your research workflow.
Composio: 200+ app integrations without building auth
Composio is the infrastructure layer that connects Claude Code to everything else: Gmail, Slack, Google Drive, Notion, Salesforce, and more than 200 services in total. It handles all the OAuth flows, authentication, and API formatting, then exposes each service as an MCP tool that Claude Code can call natively.
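For a sense of what "an MCP tool that Claude Code can call" looks like on the wire, here is a Python sketch that builds a Model Context Protocol `tools/call` request, which is a JSON-RPC 2.0 message. The tool name `gmail_send_email` and its arguments are hypothetical and are not taken from Composio's catalog. Only the envelope follows the protocol.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument names, for illustration only.
request = build_tool_call(
    1,
    "gmail_send_email",
    {
        "to": "team@example.com",
        "subject": "Latest commit summary",
        "body": "Refactored the auth middleware.",
    },
)
wire = json.dumps(request)
```

The agent only ever produces a tool name and a set of arguments. Everything about credentials and the underlying service API sits behind the server that receives this message.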
The practical impact is that Claude Code stops being just a coding tool and becomes an operating system. You tell it to email a summary of the latest commit, and it actually sends the message through Gmail. You tell it to update a client's project status, and it writes to Notion. You tell it to pull the latest sales data, and it reads from Salesforce.
Without Composio, building each of these integrations means writing OAuth flows, handling token refresh, parsing API responses, and dealing with rate limits. For each service. Individually. Composio handles all of that so you can focus on what the agent should do, not how it connects to things.
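As a taste of that plumbing, here is a Python sketch of just one piece, token refresh. It caches an access token and fetches a new one shortly before expiry. This is not Composio's code. The `TokenStore` class, the `refresh_fn` callback, and the 60-second safety margin are all assumptions chosen for illustration, and a real `refresh_fn` would make an HTTP call to the provider's token endpoint.

```python
import time


class TokenStore:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(self, refresh_fn, skew_seconds=60, clock=time.time):
        self._refresh_fn = refresh_fn  # returns (token, lifetime in seconds)
        self._skew = skew_seconds      # refresh this long before actual expiry
        self._clock = clock            # injectable so the demo needs no sleeping
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            token, expires_in = self._refresh_fn()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token


# Demo with a fake clock and a fake refresh call.
now = [1000.0]
calls = []


def fake_refresh():
    calls.append(now[0])
    return f"token-{len(calls)}", 3600


store = TokenStore(fake_refresh, clock=lambda: now[0])
first = store.get()    # no token yet, so this triggers a refresh
second = store.get()   # still valid, served from the cache
now[0] += 3590         # jump to 10 seconds before expiry
third = store.get()    # inside the safety margin, so this refreshes again
```

Multiply this by error handling, revoked tokens, and per-provider quirks, then by every service you connect, and the appeal of outsourcing it becomes clear.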
There's some vendor lock-in risk since you're routing through Composio's platform. But the time it saves on auth plumbing is significant, especially if you're building agent workflows that need to touch multiple services.
What this means for how we work
These six tools represent a shift in what Claude Code actually is. It started as a coding assistant. With the right plugins, it's becoming something closer to a personal operating system that can code, research, review, remember, experiment, and interact with the other tools you use.
The important thing is that none of these are theoretical. They're all shipping and usable right now. The Codex plugin is production-ready. Obsidian Skills is maintained by the CEO of Obsidian. Auto Research has been validated by Karpathy himself with published results. Skill Creator is first-party Anthropic. Composio has a real company behind it with active development.
If you're building agent workflows, the combination of these tools with Claude Managed Agents gives you a stack that would have been science fiction eighteen months ago: persistent agents with memory, browser automation through Open CLI, app integrations through Composio, code quality through Codex adversarial reviews, and autonomous experimentation through Auto Research.
The question isn't whether AI agents can do real work anymore. The question is whether you've set up the right tools to let them.