EdGo Writeup Pt 1.

First Published: 2026-06-05

Last Updated: 2026-06-15

Agentic Coding with Non-Frontier Models

About

This is a writeup about EdGo, an unofficial client app for the EdDiscussion platform.
This app was an excuse to try building apps using React Native and/or Lynx. And since I was trying out new stuff, this was also my first attempt to adopt Effect-ts.
This was a bit of a mistake: I tried too many new things in a single project, a bunch of the initial architectural decisions were shite, and there is still some less-than-ideal usage of useEffect.
Finally, as so much of this project was new to me, I leaned heavily on LLMs for help.

LLM (AI) Usage

Let's get the elephant in the room out of the way first; I will cover the technical details of EdGo in part 2.
Although LLMs were heavily used in this project, this was not vibecoded, as most of the raw output needed heavy alterations to truly integrate into the codebase.
The rest of this blogpost documents my workflow as it stands right now, as I thought it would be an interesting reference for the future, when models inevitably get better.

Firstly, my workflow is structured around a couple of things:

1. Skill Level

Currently, I would say that I can generally understand the generated code, and can determine when it is bad and can fix/debug the code. But I don't quite understand the systems enough to write the features myself without significant effort, hence the use of LLMs as a POC reference implementation and a learning scaffold.

2. The Models

Four models were used throughout development (April - June 2026), with the lineup changing as new models were released.

GLM Family (GLM 4.7 + GLM 5.1)
The workhorses of this project. They worked well, but had quite a few issues with complex technical tasks and required quite a bit of steering for Effect-ts code.
Why GLM and not any other model? (GPT Series, Kimi, Minimax etc.)
I got it dirt cheap and I never hit any limits with it despite being on the Lite plan.
GPT 5.3 Codex
Wrote much better functional code, but since I had limited requests, I generally rationed its use for more difficult technical tasks.
About halfway through the project in April, I had to switch over to GLM 5.1 entirely, as the GitHub Student plan removed access to GPT 5.3 Codex.
GPT 5.4
Used to refine architecture decisions and to critique functions given to it.
I had to use other models for general programming, as I only had access to this one through my chat subscription (t3.chat).

Harnesses

I used Zed Agents and OpenCode as my harnesses of choice.

Zed Agents was perfect for minor fixes and tweaks, but I did not use it for more complex features, as it wrote notably worse code than OpenCode with the same model (GLM family).
OpenCode was used for more complicated features, as its harness was more willing to use tools and to iterate on its own.

I suspect Zed Agents underperformed with the GLM models because its system prompt made it use tools much less, leading to a lack of "understanding" of the general codebase in its context, which caused it to write worse code.

The Workflow (Complex Features)

If I didn't have a good grasp of the technical underpinnings of a feature, I would create the initial "spec" with GPT 5.4. In practice this was a high-level concept with some implementation specifics (e.g. I want ___ to be written to the DB, also create a Drizzle and Effect schema).
This was necessary because, although GLM 5.1 has no problems with simple feature requests, it generally has bad "taste" and struggles with complex technical features. GLM 5.1 would then be tasked with creating a plan to implement that feature, with references to the important files.
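
The shape of that hand-off can be sketched in TypeScript. This is only an illustration of the structure described above (concept, implementation specifics, reference files, then a request for a plan rather than code). The `FeatureSpec` type, the `buildPlanPrompt` function, the prompt wording, and the example file paths are all hypothetical and are not taken from EdGo.

```typescript
// Hypothetical shape for a feature "spec": a high-level concept, a few
// implementation specifics, and the files the model should read first.
interface FeatureSpec {
  concept: string;
  specifics: string[];
  referenceFiles: string[];
}

// Renders the spec as a plan-only prompt. The wording is illustrative.
function buildPlanPrompt(spec: FeatureSpec): string {
  const specifics = spec.specifics.map((s) => `- ${s}`).join("\n");
  const files = spec.referenceFiles.map((f) => `- ${f}`).join("\n");
  return [
    `Feature: ${spec.concept}`,
    "",
    "Implementation specifics:",
    specifics,
    "",
    "Read these files before planning:",
    files,
    "",
    "Write an implementation plan only. Do not edit any files yet.",
  ].join("\n");
}

// Example values are made up for illustration.
const examplePrompt = buildPlanPrompt({
  concept: "Persist drafts locally",
  specifics: [
    "Write drafts to the DB",
    "Create a Drizzle schema and a matching Effect schema",
  ],
  referenceFiles: ["src/db/schema.ts", "src/services/drafts.ts"],
});

console.log(examplePrompt);
```

Keeping the specifics short and listing the reference files explicitly leaves the model room to make its own decisions while still pointing it at the parts of the codebase it should imitate.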

I found that it was important not to over- or understeer GLM 5.1 when prompting it. It needed a balance: enough freedom that it would not follow my instructions to the letter when they led to a worse implementation, but enough direction that it would not make stupid decisions on its own.

Then there might be a couple of back-and-forths to refine the plan before it would be set to build and told to implement. This generally worked quite well, though it usually needed a couple of tweaks to fix edge cases.

Occasionally, GLM would fall flat on its face. This was often because it would find a bad piece of code in the codebase, which would contaminate its context as it proceeded to use that code as a reference.
The only solution then was to revert all changes, create a new thread with fresh context, and be more thorough in the initial prompt to steer it away from that pattern.
What is notable is that, in my workflow, GPT 5.3 Codex did not have most of these issues, and in my experience gave equal or better results with less steering.

Conclusion

Open models like GLM 5.1 are exceptionally capable, especially compared to the early GPT-3 days; however, they are still lacking in certain ways that proprietary models from frontier labs aren't.
Although the experience is much worse, I would say that it is better for developing my skills and understanding, as each solution requires a thorough code review to find mistakes or edge cases the model forgot to handle.
Overall, I would say that open models more heavily embody "garbage in, garbage out": the quality of their output depends more on the skill of the user, whereas frontier models can generate better code with less skill from the user.
Now that the AI writeup is out of the way, the next blogpost will be a bit more fun as it takes on the technical stuff.

This work is licensed under CC BY 4.0.