# TKML Research Suite — AI/ML

> **Enterprise-grade Claude plugin for AI/ML research, MLOps automation, and autonomous research workflows.**
> Built by [TechKnowmad AI](https://techknowmad.ai) for serious AI/ML practitioners.

[![Version](https://img.shields.io/badge/version-1.0.0-blue.svg)](CHANGELOG.md)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Claude](https://img.shields.io/badge/Claude-Cowork%20%2F%20Claude%20Code-orange.svg)](https://claude.ai)
[![Frameworks](https://img.shields.io/badge/frameworks-PyTorch%20%7C%20JAX%20%7C%20TF-red.svg)]()

---

## What It Does

Three intelligent layers added to your Claude sessions:

**1. Domain-Aware Prompt Architecture** *(automatic)*: Fires on every message. Classifies your request by domain (ML research, MLOps, agents, DevOps, or general) and scores its clarity from 1 to 10. General requests and clear prompts (7 or higher by default) pass through silently. Vague requests in the ML, MLOps, agent, and DevOps domains get 1–3 targeted, domain-specific clarifying questions, so complex tasks start with the context they need instead of costing an extra iteration.

**2. MLOps Quality Gate** *(automatic)*: Fires before any Python file is written or edited. Audits ML training scripts for four commonly missed production patterns: reproducibility seeds, experiment tracking, no hardcoded paths, and dynamic device detection. When it finds gaps, it offers to fix them before writing.

**3. Research & Development Commands** *(on demand)*: Four slash commands for deep research synthesis, full experiment scaffolding, MLOps infrastructure generation, and ML code review.

---

## Quickstart

### Install
1. Download [`tkml-research-suite.plugin`](../../releases/latest)
2. Open Claude Desktop → Plugins → Install → select the file
3. Restart Claude. Hooks activate automatically on the next message.

No configuration required. No external API keys needed.

### First Use in 60 Seconds

**The hook saves you an iteration:**
```
train a model on my NER dataset
```
> Quick context check:
> 1. Dataset size and label schema? (e.g., 8k sentences, CoNLL-style BIO tags)
> 2. HuggingFace Trainer, PyTorch custom loop, or spaCy?
> 3. Target metric and compute budget?

**Scaffold a complete experiment:**
```
/experiment fine-tune DistilBERT on 15k customer intent dataset, 8 classes, W&B tracking, RTX 3090
```
Generates a full directory structure, training loop, Hydra configs, Dockerfile, CI/CD workflows, and Makefile.

**Deep research with saved brief:**
```
/research speculative decoding — production-ready approaches 2024
```
Searches arXiv, engineering blogs, and GitHub. Synthesizes findings. Saves `research-speculative-decoding-YYYYMMDD.md`.

**MLOps quality gate in action:**

When you write a training script without seeds or tracking, you'll see:
> MLOps gaps:
> • SEED → add `torch.manual_seed(42); np.random.seed(42)` before model init
> • TRACKING → add `wandb.init(project="...", config=cfg)`
>
> → Proceed as-is, or I'll add these automatically?
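
The SEED fix above covers PyTorch and NumPy. As a rough sketch of what a fuller seeding helper could look like, the block below also seeds Python's built-in `random` module and CUDA. The function name `seed_everything` is a hypothetical illustration, not something the plugin generates verbatim, and NumPy and PyTorch are treated as optional so the helper also runs where they are not installed.

```python
import importlib.util
import random


def seed_everything(seed: int = 42) -> list[str]:
    """Seed every available RNG and return the names of those seeded.

    Hypothetical helper for illustration; call it once before model init.
    """
    random.seed(seed)
    seeded = ["random"]

    # NumPy is optional in this sketch: seed it only if it is installed.
    if importlib.util.find_spec("numpy") is not None:
        import numpy as np

        np.random.seed(seed)
        seeded.append("numpy")

    # Same for PyTorch, including all CUDA devices when a GPU is present.
    if importlib.util.find_spec("torch") is not None:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
        seeded.append("torch")

    return seeded
```

Calling `seed_everything(42)` at the top of a training script, before any model or DataLoader is constructed, is the usual placement.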

---

## Commands Reference

| Command | Description | Example |
|---------|-------------|---------|
| `/research <topic>` | Multi-source AI/ML research → synthesized brief | `/research mixture of experts routing strategies` |
| `/experiment <desc>` | Complete ML experiment scaffold | `/experiment train GPT-2 small on code, JAX/Flax, MLflow` |
| `/mlops <type>` | MLOps infrastructure generation | `/mlops docker ci-cd serving` or `/mlops all` |
| `/review-ml <path>` | ML code review: 6 dimensions, line-level findings | `/review-ml src/training/trainer.py` |

### `/mlops` component types

| Type | What Gets Generated |
|------|---------------------|
| `docker` | Multi-stage Dockerfile + docker-compose + .dockerignore optimized for ML |
| `ci-cd` | 4 GitHub Actions workflows (test, train, evaluate, deploy) |
| `tracking` | Unified W&B + MLflow tracker wrapper + run naming conventions |
| `serving` | FastAPI inference server with async batching + Prometheus metrics |
| `monitoring` | Drift detection + Grafana dashboards + Prometheus alert rules |
| `k8s` | K8s deployment, HPA, service, configmap, secret template manifests |
| `all` | Everything above |

### `/review-ml` scoring dimensions

| Dimension | What Gets Checked |
|-----------|-------------------|
| ML Correctness | Data leakage, loss function, DataLoader shuffle, eval mode, gradient handling |
| Reproducibility | Seeds (torch/numpy/random/CUDA), config externalization, env pinning |
| Experiment Tracking | W&B/MLflow init, hyperparameter logging, metric granularity, artifact versioning |
| Performance | num_workers, mixed precision, CPU syncs in training loop, memory leaks |
| Code Quality | Hardcoded paths, type hints, structured logging, error handling |
| Production Readiness | Dynamic device detection, OOM handling, checkpoint resume logic |

---

## Hooks: Detailed Behavior

### Prompt Architect

```
Every user message
    ├── Bypass (starts * / # | follow-up | conversational) → silent pass-through
    ├── Domain: GENERAL                                    → silent pass-through
    ├── Domain: ML/MLOps/Agents/DevOps + clarity ≥ 7     → silent pass-through
    └── Domain: ML/MLOps/Agents/DevOps + clarity < 7
            └── ask_user: 1–3 targeted, consequential questions
                    ├── User answers → executes with full context
                    └── User skips  → executes with available context
```

**Bypass prefixes** (skip Prompt Architect entirely):

| Prefix | When to use |
|--------|-------------|
| `*` | Force-proceed on any prompt: `* just do it exactly as described` |
| `/` | All slash commands bypass automatically |
| `#` | Memory updates, instruction overrides |

**Token overhead:** ~250 tokens when fired; zero for bypassed messages.
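
The decision tree and bypass prefixes above can be sketched as a small routing function. This is an illustration under stated assumptions, not the plugin's actual hook code: the `Assessment` class, the `route` function, and the idea that follow-up and conversational messages arrive as boolean flags are all hypothetical, as is tolerating leading whitespace before a bypass prefix.

```python
from dataclasses import dataclass

BYPASS_PREFIXES = ("*", "/", "#")
TARGET_DOMAINS = {"ML", "MLOps", "Agents", "DevOps"}


@dataclass
class Assessment:
    """Hypothetical classifier output: a domain label and a 1-10 clarity score."""

    domain: str
    clarity: int
    is_follow_up: bool = False
    is_conversational: bool = False


def route(message: str, a: Assessment, threshold: int = 7) -> str:
    """Return 'pass' for silent pass-through or 'ask' for clarifying questions."""
    # Bypass prefixes skip the hook entirely.
    if message.lstrip().startswith(BYPASS_PREFIXES):
        return "pass"
    # Follow-ups and conversational messages are also bypassed.
    if a.is_follow_up or a.is_conversational:
        return "pass"
    # Anything outside the four target domains (e.g. GENERAL) passes through.
    if a.domain not in TARGET_DOMAINS:
        return "pass"
    # In-domain prompts are questioned only when clarity is below the threshold.
    return "pass" if a.clarity >= threshold else "ask"
```

For example, `route("train a model on my NER dataset", Assessment("ML", 4))` returns `"ask"`, while the same message prefixed with `*` returns `"pass"`.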

### MLOps Guard

```
Before every Write or Edit tool call
    ├── Not a .py file                        → approve silently
    ├── Test/config/utility filename pattern  → approve silently
    ├── No ML imports in content              → approve silently
    └── ML training script detected
            ├── 2–4 checks pass → approve silently
            └── 0–1 checks pass
                    └── ask_user: specific gaps + one-line fixes
                            ├── "proceed"   → write file as-is
                            └── "add them"  → Claude inserts patterns, then writes
```

**Four checks:** seed setup · experiment tracking · no hardcoded paths · dynamic device detection

**Token overhead:** ~175 tokens per Python write when fired; zero for non-ML files.
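
To make the guard's flow concrete, here is a minimal sketch of the same decision tree. The regular expressions are assumed heuristics chosen for illustration; this README does not document the patterns the plugin actually uses, and the `guard` function name is hypothetical.

```python
import re
from pathlib import PurePath

# Assumed heuristics; the plugin's real detection patterns may differ.
ML_IMPORT = re.compile(
    r"^\s*(?:import|from)\s+(torch|tensorflow|jax|flax|sklearn|transformers)\b",
    re.M,
)
SKIP_NAME = re.compile(r"^(test_.*|.*_test|conftest|config|utils?)$")
CHECKS = {
    "seed": re.compile(r"manual_seed|random\.seed|set_seed|PRNGKey"),
    "tracking": re.compile(r"wandb\.init|mlflow\.start_run"),
    "device": re.compile(r"cuda\.is_available|jax\.devices"),
}
HARDCODED_PATH = re.compile(r"""["'](?:/home/|/Users/|[A-Za-z]:\\)""")


def guard(filename: str, content: str) -> tuple[str, list[str]]:
    """Return ('approve' | 'ask', list of failed checks) for a pending write."""
    path = PurePath(filename)
    if path.suffix != ".py":
        return "approve", []  # not a Python file
    if SKIP_NAME.match(path.stem):
        return "approve", []  # test/config/utility filename
    if not ML_IMPORT.search(content):
        return "approve", []  # no ML imports
    # A pattern check fails when its marker is absent from the script.
    gaps = [name for name, pat in CHECKS.items() if not pat.search(content)]
    # The path check fails when an absolute, user-specific path is present.
    if HARDCODED_PATH.search(content):
        gaps.append("paths")
    passed = 4 - len(gaps)
    return ("approve" if passed >= 2 else "ask"), gaps
```

With these heuristics, a script that imports `torch` but has no seed, tracking, or device detection and opens `"/home/me/data.csv"` fails all four checks and returns `"ask"`.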

---

## Skills (on-demand, loaded when triggered)

| Skill | Trigger phrases |
|-------|----------------|
| `prompt-architect` | "improve this prompt", "prompt engineering", "why did this prompt fail", "craft a prompt for [task]" |
| `research-workflow` | "design an experiment", "ablation study", "literature review", "evaluation framework", "research roadmap" |
| `mlops-standards` | "MLOps best practices", "reproducibility standards", "deployment checklist", "production ML", "MLOps maturity" |

Each skill uses progressive disclosure: core knowledge lives in `SKILL.md` (~1,500 tokens), and detailed references are loaded on demand.

---

## Research Synthesizer Agent

Autonomous multi-source research. No hand-holding required.

**Triggers on:** "do a deep dive on...", "research all approaches to...", "find best current methods for..."

**Protocol:** formulates search strategy → systematic multi-source search → critical evaluation by actionability → synthesized brief → saves to file

**Stop condition:** ≥5 high-quality primary sources, or 20 tool calls (whichever comes first)

**Output structure:**
```
Executive Summary
Landscape Map (organized by technique, not chronologically)
Top Methods (with trade-offs: compute, ease, production-readiness)
Comparison Table
Open Problems
Recommended Starting Point (fastest to run + highest potential)
Key References (arXiv IDs, GitHub repos, star counts)
```
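
The stop condition described above (at least 5 sources or 20 tool calls, whichever comes first) can be sketched as a bounded collection loop. The `collect_sources` function is hypothetical, and the sketch assumes each tool call is a zero-argument callable that returns only sources already judged high quality. How the agent actually scores quality is not specified here.

```python
from typing import Callable, Iterable


def collect_sources(
    tool_calls: Iterable[Callable[[], list[str]]],
    min_sources: int = 5,
    max_calls: int = 20,
) -> tuple[list[str], int]:
    """Run tool calls until enough sources are found or the call budget is spent."""
    sources: list[str] = []
    calls_made = 0
    for call in tool_calls:
        # Stop on whichever limit is reached first.
        if len(sources) >= min_sources or calls_made >= max_calls:
            break
        calls_made += 1
        for source in call():
            if source not in sources:  # de-duplicate across searches
                sources.append(source)
    return sources, calls_made
```

Checking both limits before each call means the loop never issues a 21st call and never keeps searching once the source quota is met.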

---

## Customization

See [`configs/README.md`](configs/README.md) for tuning:
- Hook clarity threshold (default: 7/10)
- CI/CD bypass via `CLAUDE_SKIP_HOOKS=1` environment variable
- MLOps guard strictness (warn vs. block mode)
- Domain keyword extension
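
As a small illustration of the CI/CD bypass, a hook script could check the environment variable before doing any work. The `hooks_enabled` helper is hypothetical, and treating any value other than `1` as "hooks stay on" is an assumption; see `configs/README.md` for the authoritative behavior.

```python
import os
from typing import Mapping, Optional


def hooks_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when CLAUDE_SKIP_HOOKS=1 is set, True otherwise."""
    # Accept an explicit mapping so the check is easy to test.
    env = os.environ if env is None else env
    return env.get("CLAUDE_SKIP_HOOKS") != "1"
```

In a CI job, exporting `CLAUDE_SKIP_HOOKS=1` before invoking Claude would then make `hooks_enabled()` return `False`.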

---

## Examples

- [`examples/prompt-transformations.md`](examples/prompt-transformations.md) — 5 before/after prompts with domain classification and clarity scores
- [`examples/mlops-guard-scenarios.md`](examples/mlops-guard-scenarios.md) — 6 real intercept/bypass scenarios with exact hook output

---

## Project Structure

```
tkml-research-suite/
├── .claude-plugin/plugin.json    # Plugin manifest
├── hooks/hooks.json              # UserPromptSubmit + PreToolUse hooks
├── commands/
│   ├── research.md               # /research command
│   ├── experiment.md             # /experiment command
│   ├── mlops.md                  # /mlops command
│   └── review-ml.md              # /review-ml command
├── skills/
│   ├── prompt-architect/         # Prompt evaluation + improvement framework
│   ├── research-workflow/        # Experiment design + literature review
│   └── mlops-standards/          # MLOps maturity + reproducibility standards
├── agents/
│   └── research-synthesizer.md   # Autonomous research agent
├── configs/README.md             # Customization guide
├── examples/                     # Concrete usage examples
├── .github/                      # Issue templates, PR template
├── CHANGELOG.md
└── CONTRIBUTING.md
```

---

## Contributing

See [`CONTRIBUTING.md`](CONTRIBUTING.md) — covers local development setup, testing hooks, and the PR process.

---

## License

MIT — see [`LICENSE`](LICENSE).

---

*Built and maintained by [TechKnowmad AI](https://techknowmad.ai)*