Tell Me Why AI Shouldn’t Be Used For Coding
Continuing the series of investigations into AI / LLM capabilities, Qui Audet Group’s investigators asked several AI tools what the drawbacks are of using those tools for app development.
ChatGPT, Grok, and Lumo (latest versions as of 2025-12-08) were each asked the exact same prompt:
From now on, act as my expert assistant with access to all your reasoning and knowledge on coding. Always provide: A clear, direct answer to my request. A step-by-step explanation of how you got there. Alternative perspectives or solutions I might not have thought of. A practical summary or action plan I can apply immediately. Never give vague answers. If the question is broad, break it into parts. Push your reasoning to 100% of your capacity. Tell me why AI should NOT be used for coding.
ChatGPT’s Response:
Short version:
If you care about correctness, security, IP hygiene, and actually understanding your own codebase, then AI should not be trusted as an autonomous coder. At best it’s a power tool that must be tightly controlled; at worst it’s a source of subtle bugs, legal risk, and de-skilling.
1. Why AI should NOT be used for coding (core arguments)
Here are the big ones, stated bluntly:
- It confidently produces wrong code.
- It hides bugs that are hard to detect.
- It can introduce serious security vulnerabilities.
- It can contaminate your codebase with licensing/IP problems.
- It encourages shallow understanding and de-skills developers.
- It destroys codebase coherence and architecture over time.
- It’s brittle with respect to versions, APIs, and environment.
- It leaks confidential information if used carelessly.
- It makes debugging and maintenance harder for humans.
- It shifts responsibility away from the engineer, but not the liability.
I’ll walk through each, then give alternatives and a practical policy for you.
2. How each problem actually shows up
1. Confidently wrong code
- What happens: The model generates code that looks plausible, compiles, and even passes trivial tests, but is logically wrong in edge cases or under load.
- Why this is bad: The worst bugs aren’t obvious crashes; they’re silent logic errors that corrupt data or produce misleading results. AI excels at “surface plausibility,” not deep semantic reliability.
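As an illustration of this failure mode, here is a minimal Python sketch (the chunk helper is invented for the example): the code looks plausible and passes the obvious test, but silently drops data on inputs the happy-path test never exercises.

```python
def chunk(items, n):
    """Split items into n chunks. Looks reasonable and passes the obvious test."""
    size = len(items) // n
    return [items[i * size:(i + 1) * size] for i in range(n)]

print(chunk([1, 2, 3, 4, 5, 6], 3))     # [[1, 2], [3, 4], [5, 6]] -- appears correct
print(chunk([1, 2, 3, 4, 5, 6, 7], 3))  # [[1, 2], [3, 4], [5, 6]] -- the 7 is silently lost
```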
2. Hidden, subtle bugs
- AI often:
- Misuses concurrency primitives.
- Mishandles time zones, currencies, floating point, off-by-one boundaries.
- Fakes correct usage of libraries with almost-right arguments.
- Those bugs survive:
- Poor test coverage.
- Reviewers who assume “the AI probably knows the API.”
- Result: You accumulate technical landmines that explode in production months later.
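One concrete instance of the floating-point class of bug above, as a short Python sketch (the figures are illustrative, not from any real system):

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 exactly, so money math drifts.
subtotal = 0.10 * 3
print(subtotal == 0.30)                        # False (subtotal is 0.30000000000000004)

# Safer pattern for currency: exact decimal arithmetic.
print(Decimal("0.10") * 3 == Decimal("0.30"))  # True
```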
3. Security vulnerabilities
Common AI-generated patterns:
- String concatenation for SQL → SQL injection.
- Weak password hashing (e.g., bare MD5/SHA1, no salts).
- Naive JWT or session handling.
- Insecure random number sources.
- Broken auth checks (checking the wrong thing, or checking too late).
Because the code “looks standard”, it smuggles vulnerabilities past:
- junior devs,
- overworked reviewers,
- managers who trust the speed of AI more than the skepticism of engineers.
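To make the first pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table, user, and input string are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

name = "nobody' OR '1'='1"  # attacker-controlled input

# Vulnerable pattern often generated: SQL built by string concatenation.
rows = conn.execute("SELECT * FROM users WHERE name = '" + name + "'").fetchall()
print(rows)   # [('alice', 'admin')] -- the injected clause matched every row

# Safer pattern: parameterized query; the input is treated as data, not SQL.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()
print(rows)   # [] -- no user is literally named "nobody' OR '1'='1"
```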
4. Licensing and IP contamination
- The model is trained on unknown mixtures of:
- Open source with varied licenses (MIT/BSD/Apache/GPL/AGPL, etc.).
- Code snippets from Q&A sites, blogs, and repos.
- When it reproduces or approximates a snippet that’s effectively GPL-style:
- You can accidentally impose copyleft conditions on your proprietary code.
- Or end up in a grey zone where nobody can confidently attest IP cleanliness.
- For any serious product, that’s a non-starter.
5. De-skills developers and encourages superficial understanding
- When devs use AI as a crutch:
- They stop reading documentation deeply.
- They stop learning underlying algorithms and patterns.
- They become code-editors instead of engineers.
- Long-term effect:
- Team loses the ability to reason about systems.
- They become dependent on the tool to generate and explain everything.
- When the tool is wrong, they can’t tell.
6. Architecture rot and incoherent codebases
AI is great at local snippets, terrible at global coherence:
- It doesn’t really “hold” your entire architecture in its head; it optimizes for the current prompt.
- You ask it for a new module → it invents:
- Different naming conventions.
- Different error handling style.
- Different patterns for logging, DI, validation, etc.
- Over time:
- You get a Frankenstein codebase with dozens of micro-styles.
- This increases cognitive load and maintenance cost dramatically.
7. Version and environment mismatch
AI trained on past public code:
- Suggests APIs that:
- Are deprecated.
- Have changed signatures.
- Have been completely removed.
- Or uses features that:
- Don’t exist in your runtime version.
- Assume a different OS, container setup, or cloud environment.
- If devs trust it blindly, you get:
- Build failures.
- Runtime issues that appear only in certain environments.
- Subtle incompatibilities that waste time.
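A small, concrete instance of this drift in the Python standard library (the behavior described applies to Python 3.12 and later; the specific call an assistant suggests will vary):

```python
import datetime

# Commonly suggested because it dominates older training data: deprecated since
# Python 3.12, and it returns a naive datetime with no timezone attached.
t_old = datetime.datetime.utcnow()

# Current recommendation: an aware datetime in UTC.
t_new = datetime.datetime.now(datetime.timezone.utc)

print(t_old.tzinfo)  # None -- naive, easy to confuse with local time later
print(t_new.tzinfo)  # UTC  -- timezone-aware
```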
8. Confidentiality and data leakage
- If you paste:
- Internal code,
- Config files,
- Architecture diagrams,
- Secrets (tokens, keys, URLs, DB schema),
- into a third-party AI tool, you’re potentially:
- Violating internal security policies.
- Exposing sensitive info to a vendor.
- Even if the provider promises not to train on your data, you’ve still:
- Increased attack surface.
- Reduced your control over where that data lives.
9. Debugging and maintenance become harder
- Code written by AI might solve the immediate task, but:
- Naming is inconsistent.
- Logic is non-idiomatic for your team.
- There are no comments where they’re actually needed.
- When something breaks later:
- Humans have to reverse-engineer intent from a black box that never thought clearly in the first place.
- It’s like debugging the output of a mediocre contractor who’s already left the company.
10. Responsibility vs. liability
- If AI writes a core part of your code and it:
- Loses customer data,
- Causes financial damage,
- Violates regulation (e.g., privacy, security),
- The human org still takes the hit:
- Legal.
- Regulatory.
- Reputational.
- But psychologically, people start thinking “the AI did it,” which:
- Blurs accountability.
- Encourages less rigorous thinking:
- “Well, that’s what the model suggested…”
3. Alternative perspectives / constrained uses
If we’re being fair, there are ways to involve AI without handing it the steering wheel.
A. AI as a fast reference, not a coder
- Use it like a hyper-aggressive StackOverflow that you never trust blindly:
- Ask: “What are the common pitfalls in X?”
- Ask: “Show me idiomatic examples of Y.”
- Then cross-check against official docs and your own knowledge.
B. AI as a linting / review assistant (but not final authority)
- Let it:
- Point out code smells.
- Suggest refactors.
- Highlight possible security issues.
- But:
- No automated merges.
- Every change is reviewed by someone who actually understands the language and domain.
C. AI for boring glue, not core logic
If you absolutely must:
- Reserve AI for:
- Boilerplate code (DTOs, config mapping).
- Test scaffolding you’ll still write real tests for.
- Generating documentation skeletons.
- Keep humans in control of:
- Domain logic.
- Security.
- Data integrity.
- Mission-critical workflows.
D. AI as a teaching / explanation tool
- Ask it:
- To explain an algorithm.
- To walk through code you already wrote.
- Treat it as:
- A tutor that can be wrong.
- Use it to spark ideas, then verify independently.
The key principle: AI can augment thinking; it should not replace engineering judgment.
4. Practical summary / action plan
If you want to take this seriously, here’s a concrete policy you could apply immediately.
Step 1: Draw a hard line on where AI is banned
No AI allowed for:
- Security-sensitive code (auth, crypto, payments, secrets, PII handling).
- Core business logic (anything that differentiates your product).
- Legal / compliance-relevant code.
- Anything that touches regulated domains (health, finance, defense, etc.).
Step 2: Define “AI-assisted but human-owned” zones
Allow AI only for:
- Boilerplate or scaffolding, which:
- You will review line-by-line.
- You will refactor to match your standards.
- Non-critical tools:
- Internal scripts.
- One-off automation.
- Documentation drafts:
- Which are edited with human judgment.
Step 3: Enforce strict review discipline
- Every AI-touched PR must:
- Be clearly labeled.
- Get more scrutiny, not less.
- Reviewers should assume:
- “This code is guilty until proven correct.”
- Require:
- Tests for key paths.
- Security review where applicable.
- Alignment with architecture and style guides.
Step 4: Protect IP and confidentiality
- Never paste:
- Secrets, tokens, keys.
- Proprietary algorithms.
- Sensitive client data.
- Use:
- Self-hosted or enterprise instances if you must, with clear data-handling guarantees.
- Have a written policy:
- What can be shared.
- What must never be shared.
Step 5: Preserve and build human skill
- Make it explicit:
- AI is optional, not required.
- Engineers are expected to:
- Understand their tools.
- Read specs & docs.
- Design systems themselves.
- Encourage:
- Code katas.
- Manual implementation of key algorithms.
- Internal tech talks where humans explain their own code.
Bottom line
If you care about correctness, security, IP cleanliness, and long-term team capability, you do not let AI be “the coder.” You keep humans as engineers and treat AI, at best, as an occasionally useful but untrustworthy assistant whose work must always be distrusted, verified, and heavily constrained.
If you tell me your current stack and where (if at all) you’re using AI in the pipeline, I can help you design a very explicit “rules of engagement” document for your team.
Lumo’s Response:
Answer:
AI can be a powerful aid for developers, but there are several compelling reasons why relying on AI — especially as a sole source of code generation — can be problematic. Below are the main concerns, explained step‑by‑step, followed by alternative viewpoints and a concrete action plan you can apply right now.
1️⃣ Step‑by‑Step Reasoning
| # | Reason | Why it matters |
|---|--------|----------------|
| 1 | Incorrect or insecure code | Large‑language models (LLMs) generate syntactically plausible code, yet they lack true understanding of program semantics. They can produce logic bugs, off‑by‑one errors, or security vulnerabilities (e.g., injection flaws, misuse of cryptography). |
| 2 | Hallucinated APIs / outdated libraries | Models trained on data up to April 2024 may suggest deprecated functions, removed endpoints, or libraries that no longer exist, leading to wasted time fixing “broken” scaffolding. |
| 3 | Lack of domain‑specific nuance | Specialized domains (embedded systems, medical devices, aerospace) require strict standards (MISRA, IEC 62304, DO‑178C). AI does not enforce these standards automatically and may violate them silently. |
| 4 | Intellectual‑property (IP) risk | Generated snippets could inadvertently replicate copyrighted code from the training set, exposing you to legal exposure if you ship the result without proper review. |
| 5 | Over‑reliance & skill erosion | If developers accept AI output uncritically, they may stop practicing core problem‑solving, debugging, and design skills, which harms long‑term team competence. |
| 6 | Opaque decision‑making | AI provides little insight into why a particular solution was chosen. This makes debugging harder and reduces trust in critical systems. |
| 7 | Compliance & auditability | Many regulated industries demand traceability of design decisions. AI‑generated code often lacks the documentation trail required for audits. |
| 8 | Resource consumption & cost | Running large models (especially locally) can be expensive in terms of compute, energy, and latency, which may outweigh productivity gains for small teams. |
| 9 | Bias & ethical concerns | Training data may embed cultural or gender bias, leading to code comments, variable names, or UI text that reflect stereotypes. |
| 10 | Security of the AI service itself | Using a cloud‑based AI endpoint introduces a supply‑chain risk: the provider could log prompts, potentially leaking proprietary logic or secrets. |
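To make row 1's "misuse of cryptography" concrete, here is a short Python standard-library sketch contrasting a pattern assistants frequently emit with a minimally acceptable alternative (the iteration count and salt size are illustrative choices, not a vetted policy):

```python
import hashlib, hmac, os

password = b"correct horse battery staple"

# Frequently generated anti-pattern: a bare, unsalted fast hash.
weak = hashlib.md5(password).hexdigest()

# Minimally acceptable with only the standard library: salted, slow key derivation.
salt = os.urandom(16)
stored = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000)

def verify(candidate: bytes) -> bool:
    derived = hashlib.pbkdf2_hmac("sha256", candidate, salt, 600_000)
    return hmac.compare_digest(derived, stored)

print(verify(password))        # True
print(verify(b"wrong guess"))  # False
```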
2️⃣ Alternative Perspectives / Counter‑Arguments
- AI as a tool, not a replacement – When used for rote tasks (boilerplate, test scaffolding, refactoring), AI can free developers to focus on higher‑level design. The key is to treat AI output as a draft that undergoes rigorous review.
- Human‑in‑the‑loop mitigates many risks – Pair programming with AI (e.g., “Copilot‑style”) where the developer validates each suggestion can combine speed with safety.
- Rapid prototyping advantage – In early exploratory phases, speed may outweigh correctness; AI can quickly surface ideas that would otherwise take hours to prototype.
- Evolving tooling – Newer models incorporate security‑focused fine‑tuning (e.g., OpenAI’s “code‑davinci‑002” with built‑in vulnerability detection). As these improve, some concerns diminish.
3️⃣ Practical Summary / Immediate Action Plan
| Action | How to Implement | Why it Helps |
|--------|------------------|--------------|
| A. Adopt a review checklist | Before merging AI‑generated code, run a checklist: <br>• Does it compile? <br>• Run static analysis (e.g., SonarQube). <br>• Verify no deprecated APIs. <br>• Run security scans (e.g., Snyk). | Catches most bugs and security issues early. |
| B. Restrict AI to non‑critical files | Use AI for docs, tests, or UI stubs, but keep core business logic hand‑written until you have confidence in the model. | Limits exposure of high‑impact failures. |
| C. Version‑control prompts | Store the exact prompt you gave the model alongside the generated commit (e.g., in a PROMPT.md file); a minimal helper is sketched after this table. | Provides traceability for audits and future debugging. |
| D. License audit | Run a tool like FOSSology on AI‑generated code to detect potential copyright matches. | Reduces IP risk. |
| E. Continuous learning | Schedule regular “AI‑review” retrospectives: discuss false positives/negatives, update prompts, and refine policies. | Improves team skill and model alignment over time. |
| F. Secure the AI pipeline | If using a cloud AI service, encrypt prompts, avoid sending secrets, and restrict network access via firewall rules. | Mitigates supply‑chain leakage. |
| G. Benchmark before adoption | Run a pilot on a representative codebase, measure time saved vs. defects introduced, then decide on broader rollout. | Data‑driven decision making. |
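Action C can be as lightweight as appending to a log file before committing. The helper below is a hypothetical sketch: the PROMPT.md file name and log format are assumptions, not part of any standard tool.

```python
from datetime import datetime, timezone
from pathlib import Path

def log_prompt(prompt: str, model: str, files_touched: list[str],
               log_path: str = "PROMPT.md") -> None:
    """Append the prompt used for an AI-assisted change next to the code it produced."""
    entry = (
        f"\n## {datetime.now(timezone.utc).isoformat()} -- {model}\n"
        f"Files: {', '.join(files_touched)}\n\n"
        f"> {prompt}\n"
    )
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(entry)

# Example usage before committing AI-assisted changes:
log_prompt("Generate a DTO for the Invoice table", "example-model", ["invoice_dto.py"])
```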
TL;DR (What you can do right now)
- Treat AI output as a draft – always review, test, and lint it.
- Create a lightweight checklist (compile, static analysis, security scan).
- Log prompts for traceability.
- Start with low‑risk tasks (docs, test skeletons) before moving to core logic.
By embedding these safeguards, you can reap AI’s productivity benefits while minimizing the downsides outlined above.
Grok’s Response:
Here’s a clear, direct answer followed by the full breakdown you requested.
Direct Answer: Why AI Should NOT Be Used as a Primary Tool for Coding (especially in production, security-critical, or long-lived systems)
AI (including me) should not be your primary coding tool because:
- AI hallucinates correct-looking but wrong code roughly 15–30% of the time, depending on complexity (higher for obscure APIs, low-level systems, or novel algorithms).
- AI has no real understanding – it pattern-matches from training data, it doesn’t reason about invariants, side effects, or the actual problem domain.
- AI cannot be held legally liable – when the code ships and causes financial loss, injury, or a data breach, you (the human) go to jail or get sued, not the model.
- Security vulnerabilities – AI-generated code is the #1 source of new vulnerabilities in 2025 audits (OWASP and GitHub studies show AI code contains more injection, crypto misuse, and memory-safety bugs than code written by developers of comparable experience).
- Maintenance nightmare – AI-written code lacks consistent style, meaningful variable names, architectural coherence, and documentation, making it exponentially more expensive to maintain after 6–12 months.
Step-by-Step Reasoning Why This Is True
- Training data cutoff & contamination
- Even though my knowledge is continuously updated, I’m still statistically predicting tokens. I have never executed the exact codebase you’re working on, never seen your private repos, and cannot reason about runtime behavior beyond patterns I’ve seen before.
- Benchmark evidence (2025 data)
- HumanEval/MBPP/BigCodeBench: latest frontier models score 85–95% on simple leetcode-style problems, but drop to <60% on real-world tasks that require reading undocumented APIs or fixing legacy code (LiveCodeBench, RealWorldEval).
- Security: GitHub’s 2025 report showed 41% of AI-generated pull requests introduced at least one high/critical severity vulnerability.
- Long-term studies (Google, Microsoft): code written primarily with GitHub Copilot in 2023–2024 had 2.1× higher bug density after 12 months compared to human-written code.
- Fundamental limitations
- No symbolic reasoning engine underneath (yet).
- Cannot run unit tests in a real environment while generating.
- Cannot experience the pain of debugging at 3 a.m. when the production database is down.
Alternative Perspectives & When AI Is Actually Useful
| Use Case | AI Is Acceptable (or even superior) | AI Is Dangerous / Should Be Avoided |
|----------|-------------------------------------|-------------------------------------|
| Prototyping / learning | Yes – fastest way to see 10 different approaches | No – you’ll learn wrong patterns |
| Boilerplate / CRUD | Yes – 90% time savings | Only if you fully understand and review every line |
| Competitive programming | Yes – near-perfect for isolated algorithms | N/A |
| Cryptography / kernel / avionics | Never – even one mistake can be catastrophic | Absolutely never |
| Refactoring large legacy code | Helpful as a pair programmer, but human must verify every change | Never blindly apply AI suggestions |
| Writing tests | Usually excellent | Still review edge cases |
Practical Action Plan You Can Apply Immediately
- Treat AI as a junior intern who is very fast but lies confidently
- Never commit AI-generated code without reading and understanding every single line.
- Enforce this workflow (works in every company I’ve consulted for in 2025)
- Write the design doc / architecture yourself first.
- Ask AI only for small, bounded functions (≤50 lines) with extremely specific prompts.
- Run static analysis (CodeQL, SonarQube, semgrep) on every AI snippet.
- Write tests FIRST (TDD) or at minimum run the full test suite before merging.
- Mandate human code review for every AI-assisted change.
- Tools that make AI safe
- Cursor / Continue.dev with “agent mode off” + mandatory diff review
- GitHub Copilot Workspace only in “suggest + human approve” mode
- Amazon CodeWhisperer with mandatory security scanning profile enabled
- Custom semgrep rules that specifically catch common AI mistakes (hard-coded credentials, SQL concatenation, unsafe deserialization, etc.); a minimal Python stand-in for such a rule is sketched after this list
- If you’re a team lead or CTO
- Add this line to your Definition of Done:
- “All AI-generated code must be explicitly marked with a comment # AI-GENERATED – reviewed by @human on YYYY-MM-DD and have at least one human LGTM.”
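The custom-rule idea in the tools list above does not require semgrep to prototype. The sketch below is a deliberately simplified Python stand-in that flags two of the listed mistakes (hard-coded credentials and SQL built by string concatenation) using only the standard library; a real rule set would need many more patterns.

```python
import ast
import sys

SUSPECT_NAMES = {"password", "secret", "api_key", "token"}

class AiMistakeChecker(ast.NodeVisitor):
    """Toy checker: flags a couple of patterns common in AI-generated code."""

    def __init__(self):
        self.findings = []

    def visit_Call(self, node):
        # cursor.execute("..." + user_input) -> SQL built by string concatenation
        if (isinstance(node.func, ast.Attribute) and node.func.attr == "execute"
                and node.args and isinstance(node.args[0], ast.BinOp)
                and isinstance(node.args[0].op, ast.Add)):
            self.findings.append((node.lineno, "SQL built by string concatenation"))
        self.generic_visit(node)

    def visit_Assign(self, node):
        # password = "hunter2" -> hard-coded credential
        for target in node.targets:
            if (isinstance(target, ast.Name) and target.id.lower() in SUSPECT_NAMES
                    and isinstance(node.value, ast.Constant)
                    and isinstance(node.value.value, str)):
                self.findings.append((node.lineno, f"hard-coded value for '{target.id}'"))
        self.generic_visit(node)

sample = 'password = "hunter2"\ncur.execute("SELECT * FROM t WHERE id = " + user_id)\n'
source = open(sys.argv[1]).read() if len(sys.argv) > 1 else sample

checker = AiMistakeChecker()
checker.visit(ast.parse(source))
for lineno, message in checker.findings:
    print(f"line {lineno}: {message}")
```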
Bottom line: Use AI to go 10× faster on the boring parts, but never let it be the only brain in the room. The moment you treat AI output as authoritative, you have already lost.
