Let's discuss four approaches to producing code – manual coding, visual development, classical automation, and generative language models. We explore key dimensions of code production, including thinking, typing effort, code quality, product quality, and development cost, and outline our approach to producing code. First up, some definitions:
Manual coding
Writing and editing code such as HTML, CSS, JavaScript, and Python directly.
Visual development
Building software by using visual tools such as drag-and-drop interfaces.
Classical automation
Software that automatically performs tasks using predefined rules, such as templates and code autocomplete.
Generative language models
Probabilistic systems that generate plausible text based on patterns learned from data.
Code production
The process of creating code, including thinking, design, implementation, and refinement.
Code generation (CG)
The process of automatically generating source code from a given input.
Workflow
The sequence of actions used to complete a task.
Manual coding
This is our 10-stage manual coding workflow diagram. Explanations for each stage are provided below.
Design input
Sketch, wireframe, mockup, and/or prototype, plus optional functional requirements and/or specifications. Or no explicit design input.
Think/type
Think. Think as you type. Code and comments. Deep thought, stay in flow, fast, creative, iterative process. Build the functionality step-by-step.
Experiment, stay sharp, and strengthen thinking skills. Understand every line of code. Classical automation can assist. Generative models can be used selectively, if needed.
Validate
Ensure the code meets functional requirements.
Manual test
Run and interact with the code in the browser to see how it behaves.
Debug
Find and fix problems in the code. Loop back to stage 2 if required.
Iterate
Once a function, section, or module is completed, you can loop back to stage 2. Refactor, simplify and improve whilst the code is still fresh in your mind. Repeat until the code reaches the desired quality.
Write tests
Once the code has stabilised, write tests (unit, integration, end-to-end) for code that is updated, focussing on key functionality and risk areas.
Run tests
Run automated tests. A safety net before audit. If tests fail, loop back to stage 5 (Debug) or stage 2 (Think/type).
Audit
Browser, accessibility, and QA tests. We can audit after a module has finished and also after the full feature is complete. If issues are found, you can loop back to previous stages or move to stage 10.
Audit can be performed at multiple phases – development, staging, and post-launch – and can include a mixture of manual and automated tests.
Enhancement
Add new functionality or refine existing code. Go to stage 1 or 2.
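The loop-backs between stages can be written down as a small transition table. The following is a minimal sketch rather than part of our tooling: the `Stage` enum, the `NEXT_STAGES` mapping, and `is_valid_path` are hypothetical names, and the allowed transitions are one reading of the stage descriptions above.

```python
from enum import IntEnum


class Stage(IntEnum):
    DESIGN_INPUT = 1
    THINK_TYPE = 2
    VALIDATE = 3
    MANUAL_TEST = 4
    DEBUG = 5
    ITERATE = 6
    WRITE_TESTS = 7
    RUN_TESTS = 8
    AUDIT = 9
    ENHANCEMENT = 10


# Allowed moves, read from the stage descriptions (an interpretation, not a spec).
NEXT_STAGES = {
    Stage.DESIGN_INPUT: {Stage.THINK_TYPE},
    Stage.THINK_TYPE: {Stage.VALIDATE},
    Stage.VALIDATE: {Stage.MANUAL_TEST},
    Stage.MANUAL_TEST: {Stage.DEBUG},
    Stage.DEBUG: {Stage.ITERATE, Stage.THINK_TYPE},
    Stage.ITERATE: {Stage.WRITE_TESTS, Stage.THINK_TYPE},
    Stage.WRITE_TESTS: {Stage.RUN_TESTS},
    Stage.RUN_TESTS: {Stage.AUDIT, Stage.DEBUG, Stage.THINK_TYPE},
    # Audit may loop back to any earlier stage or move on to Enhancement.
    Stage.AUDIT: {s for s in Stage if s < Stage.AUDIT} | {Stage.ENHANCEMENT},
    Stage.ENHANCEMENT: {Stage.DESIGN_INPUT, Stage.THINK_TYPE},
}


def is_valid_path(path: list[Stage]) -> bool:
    """Return True if every consecutive pair of stages is an allowed move."""
    return all(nxt in NEXT_STAGES[cur] for cur, nxt in zip(path, path[1:]))


# Failing tests send us back to Debug, then on to Iterate.
print(is_valid_path([Stage.RUN_TESTS, Stage.DEBUG, Stage.ITERATE]))
# Skipping straight from Design input to Audit is not a described move.
print(is_valid_path([Stage.DESIGN_INPUT, Stage.AUDIT]))
```

A table like this could also drive a workflow diagram or a checklist, but that is a design choice and not something the workflow itself requires.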
Visual development
Functionality is built through a visual interface by clicking, dragging, and configuring elements – rather than writing code. Visual tools can improve consistency, reduce errors, and accelerate development for numerous tasks.
CSS
Layout and structure, visual effects, shapes and clipping, backgrounds, motion and interaction, and various visual techniques.
JavaScript
Interactions, behaviour, animations, and motion control.
Classical automation
Deterministic, reliable, approved, accurate, repeatable, consistent, controllable, trusted, transparent, low-cost, scalable, efficient, reviewed, customisable, and tested.
Templates
We use templates – reusable blocks of code that help speed up development – in HTML, CSS, JavaScript, Python, SQL, Rust, C++, and other languages. We can browse a template library or get suggestions in the code editor or chat interface.
Code doesn't need to be repeatedly regenerated, as it would with generative language model approaches. We also receive all the classical benefits listed above: reliable, consistent, tested, approved...
Templates can be static or dynamic, and natural language (NL) can be used to adapt or transform them. We can also include versioning, comments, and notes for each template.
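To make the static/dynamic distinction concrete, here is a minimal sketch using Python's standard `string.Template`. The template contents, the `render` helper, and the names `STATIC_CARD_CSS` and `FETCH_JSON_JS` are illustrative assumptions, not entries from a real template library.

```python
from string import Template

# Static template: inserted verbatim, with no parameters.
STATIC_CARD_CSS = ".card { padding: 1rem; border-radius: 4px; }"

# Dynamic template: placeholders are filled in at insertion time.
FETCH_JSON_JS = Template(
    "async function $name() {\n"
    "  const response = await fetch('$url');\n"
    "  return response.json();\n"
    "}\n"
)


def render(template: Template, **values: str) -> str:
    """Fill in a dynamic template; raises KeyError if a placeholder is missing."""
    return template.substitute(**values)


snippet = render(FETCH_JSON_JS, name="loadUsers", url="/api/users")
print(snippet)
```

Because substitution is deterministic, the same template and values always produce the same code, which is the property that lets a template be reviewed and tested once and then reused.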
Generative language models
Probabilistic models for code generation that produce plausible outputs, ranging in size and capability. Here's an 11-stage generative language model workflow for code generation. Explanations for each stage are provided below.
Design input
Sketch, wireframe, mockup, and/or prototype, plus optional functional requirements and/or specifications. Or no explicit design input.
Think/prompt
Think and prompt. Translate design input into a generation instruction. You can prompt at varying levels of granularity: full specification or individual components.
Validate
Ensure the code meets functional requirements. You can loop back to stage 2 multiple times until you get the desired output – you can re-prompt at different granularities: full or component-level. Refine. Iterate.
Integrate
Incorporate generated output into the codebase, either as a whole or incrementally. Copy/paste (external) or accept/merge (integrated). If needed, adapt output (manual, tool-assisted, or automatic) to the codebase (architecture, conventions, dependencies).
Manual test
Run and interact with the code in the browser to see how it behaves. Test as a whole or incrementally.
Debug
Find and fix problems in the code. Debug as a whole or incrementally. Loop back to stage 2 if required.
Iterate
Once a function, section, or module is completed, you can loop back to stage 2, or transition to a manual coding workflow for all or selected parts of the generated code.
Refactor, simplify and improve whilst the code is still fresh in your mind. Repeat until the code reaches the desired quality.
Write tests
Once the code has stabilised, write tests (unit, integration, end-to-end) for code that is updated, focussing on key functionality and risk areas.
Run tests
Run automated tests. A safety net before audit. If tests fail, loop back to stage 6 (Debug), stage 2 (Think/prompt), or stage 2 (Think/type) in the manual coding workflow.
Audit
Browser, accessibility, and QA tests. We can audit after a module has finished and also after the full feature is complete. If issues are found, you can loop back to previous stages or move to stage 11.
Audit can be performed at multiple phases – development, staging, and post-launch – and can include a mixture of manual and automated tests.
Enhancement
Add new functionality or refine existing code. Go to stage 1 or 2.
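The prompt/validate loop in stages 2 and 3 can be sketched as a bounded retry. This is an illustration under stated assumptions: `generate_until_valid`, `fake_model`, and `adds_two_numbers` are hypothetical names, and `fake_model` is a stand-in stub, not a call to any real model API.

```python
from typing import Callable, Optional


def generate_until_valid(
    generate: Callable[[str], str],
    validate: Callable[[str], bool],
    refine: Callable[[str, str], str],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Re-prompt until validation passes or the attempt budget runs out."""
    for _ in range(max_attempts):
        output = generate(prompt)          # stage 2: think/prompt
        if validate(output):               # stage 3: validate
            return output                  # ready for stage 4: integrate
        prompt = refine(prompt, output)    # loop back to stage 2
    return None  # budget spent: consider the manual coding workflow


def fake_model(prompt: str) -> str:
    # Stub: only produces correct code once the prompt is specific enough.
    if "two arguments" in prompt:
        return "def add(a, b): return a + b"
    return "def add(): pass"


def adds_two_numbers(code: str) -> bool:
    # exec is acceptable here only because the stub's output is known and trusted.
    namespace: dict = {}
    exec(code, namespace)
    try:
        return namespace["add"](2, 3) == 5
    except TypeError:
        return False


result = generate_until_valid(
    generate=fake_model,
    validate=adds_two_numbers,
    refine=lambda prompt, output: prompt + " It must take two arguments.",
    prompt="Write an add function.",
)
print(result)
```

The `max_attempts` cap makes the iteration cost explicit: when the budget is spent without a valid output, the function returns `None` and the work moves to another approach.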
Language model costs
Language model use imposes four kinds of "tax": prompt, cognitive, system, and organisational. The broader, messier, and harder-to-verify the problem, the more you pay across all four dimensions.
Total tax rises with scope, ambiguity, coupling, verification difficulty, and output variability – the degree to which identical inputs can produce different results. Value can outweigh cost, but only with selective and appropriate use.
Prompt tax
The effort required to get a correct, usable output from a language model, spanning four activities: translation, validation, iteration, and integration.
Translation
Expressing intent as a precise natural-language specification for a language model, under conditions of complexity, ambiguity, multiple valid interpretations, and often verbose or imprecise language.
Validation
Checking that the code meets functional requirements and behaves as expected, which requires careful inspection, multi-level reasoning, and analytical judgement.
Iteration
The cost of repeated prompting and refinement required to get a correct output.
Integration
The cost incurred to incorporate generated output into the codebase.
Cognitive tax
The cognitive overhead in engaging with the language model workflow. This includes prompting, processing, understanding, debugging, refining, iterating, reasoning, making decisions, validating, switching contexts, managing unfamiliar code, ensuring code quality, multitasking, and integrating outputs.
There is also the potential for skill atrophy due to reduced manual engagement, as less manual practice can cause knowledge to fade and weaken cognitive abilities.
System tax
The broader costs imposed on the user by the language model workflow, such as subscription fees, privacy, security, and dependency.
Organisational tax
The costs an organisation incurs when using language model workflows, including impacts on staff attraction and retention, privacy management, culture management, training, monitoring, infrastructure, developer experience, and subscription fees.
Dimensions
Let's now discuss key dimensions of code production that influence our approach – thinking, typing effort, code quality, product quality, and development cost.
Thinking
Coding involves making many decisions, with the quantity and complexity varying across files. To get the precise functionality and UX we need, we break the problem down step-by-step, working through each decision explicitly.
Development speed
The quantity and complexity of decisions have a major influence on development speed, alongside factors such as tooling, developer skill, codebase familiarity, and cognitive aspects.
Decision reuse
Decisions made in one file can often be reused across other files, reducing the need to re-think solutions and speeding up development.
Thinking time
When coding, we think about the code and the user experience. I call this 'Design Thinking Time (DTT)' – the time spent evaluating and making user interface and code design decisions throughout a project.
DTT can be split into two categories – 'Code Thinking Time (CTT)' and 'UX Thinking Time (UXTT)'. The balance between them depends on the person, the task, and various cognitive factors.
Insights
New insights often emerge from thinking through each decision in depth, leading to changes in both the design and the code that needs to be written.
Typing effort
The physical cost of typing code, typically approximated by the number of characters or keystrokes.
Code typing speed
How fast someone physically enters code, measured in characters per minute (CPM). A person's CPM can differ across programming languages due to differences in syntax, verbosity, and symbol usage. You can test your CPM on code typing speed test websites.
Code assistance
Autocomplete, templates, and copy-paste can increase CPM by reducing the number of keystrokes needed to produce code.
Time split
Typing typically accounts for a small fraction of time spent coding. Most time is spent thinking – problem solving, decision-making, designing, iterating, and debugging.
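As a worked example of CPM and the time split, the sketch below computes a typing speed and the share of a task spent physically typing. The function names and all of the numbers are invented for illustration; they are not measurements.

```python
def chars_per_minute(characters: int, minutes: float) -> float:
    """Code typing speed in characters per minute (CPM)."""
    return characters / minutes


def typing_share(characters: int, cpm: float, total_minutes: float) -> float:
    """Fraction of a task's total time spent physically typing."""
    typing_minutes = characters / cpm
    return typing_minutes / total_minutes


# Illustrative numbers only: 1,000 characters typed in 5 minutes.
cpm = chars_per_minute(characters=1000, minutes=5)

# A 3,000-character file produced during a 4-hour (240-minute) task.
share = typing_share(characters=3000, cpm=cpm, total_minutes=240)

print(f"{cpm:.0f} CPM, typing is {share:.1%} of the task")
```

With these assumed figures, typing takes 15 of the 240 minutes. Doubling typing speed would save about 7.5 minutes, whereas the remaining 225 minutes are thinking time.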
Think as you type
Coding involves thinking and pausing to varying degrees whilst typing.
Code quality
Code quality (CQ) is a measure of how "good" and effective the code is across multiple dimensions – usability, reliability, and performance – and of whether it provides the most appropriate solution.
Thinking time
Code quality is strongly correlated with the amount of deliberate thinking, design, and planning invested during development.
Typically, the more time a developer spends refining and designing a piece of code, the higher the code quality becomes, provided there is still meaningful room for improvement, up to a point of diminishing returns.
CQ score
The code editor can automatically produce a code quality (CQ) score (out of 100) for each file.
Curve shape
The shape of the curve depends on a number of factors: initial quality of implementation, clarity of requirements, developer skill, complexity, codebase familiarity, domain knowledge, constraints and tradeoffs, cognitive factors, point of diminishing returns, and CQ aim.
Initial quality
The CQ score of a file after the first iteration. It determines the remaining room for improvement – the distance between the current CQ and the maximum achievable CQ (100).
CQ aim
The target CQ level a developer intends to reach for a file before stopping further refinement. This is determined by evaluating the return on investment (ROI) of additional thinking time in improving code quality.
The developer estimates the lifetime cost of the code and the user-facing cost, and balances these against the expected CQ gain from additional effort.
If a file is not being modified and already meets requirements, there is little value in spending time improving it. The aim is not to maximise CQ across the codebase, but to prioritise resources where improvements deliver the greatest impact.
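One way to picture the curve and the CQ aim is a simple saturating model in which each hour of refinement closes a fixed fraction of the remaining gap to 100. This is an assumed model chosen for illustration, not how a code editor computes CQ; `cq_after`, `hours_to_reach`, and the `rate` parameter are hypothetical.

```python
import math

CQ_MAX = 100.0


def cq_after(hours: float, initial_cq: float, rate: float) -> float:
    """Assumed curve: the gap to CQ_MAX shrinks exponentially with thinking time."""
    return CQ_MAX - (CQ_MAX - initial_cq) * math.exp(-rate * hours)


def hours_to_reach(cq_aim: float, initial_cq: float, rate: float) -> float:
    """Thinking time needed to move from initial_cq to cq_aim under this model."""
    if cq_aim <= initial_cq:
        return 0.0        # already at or above the aim
    if cq_aim >= CQ_MAX:
        return math.inf   # the maximum is only approached, never reached
    return math.log((CQ_MAX - initial_cq) / (CQ_MAX - cq_aim)) / rate


# Illustrative values: initial quality 60, rate 0.5 per hour.
for aim in (80, 90, 95):
    print(aim, round(hours_to_reach(aim, initial_cq=60, rate=0.5), 2))
```

Under this model every halving of the remaining gap costs the same amount of time, so the gain from 90 to 95 costs as much as the gain from 60 to 80. That is the diminishing return a developer weighs when setting a CQ aim.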
Implementation quality
A measure of how well the code satisfies usability, reliability, and performance requirements. All code production techniques can provide high implementation quality. With language models, you might need multiple iterations to achieve that standard.
Solution quality
A measure of how well the code represents the simplest, most appropriate, and maintainable approach to the problem. Solution quality can require deeper thought and iteration to identify the right approach. Effective prompts for language models require a solid understanding of the problem.
Language model context
Context has a large effect on output quality in language models. Relevant context improves accuracy. Irrelevant, conflicting, or overly long context can reduce quality. For best results, provide only relevant context.
Product quality
Product quality (PQ) is how well a product works for users – functionality, ease of use, performance, and reliability. Our PQ goals influence our code production approach. The higher the PQ targets, the more thinking time and iteration are required.
The quality of a product reflects the quality of its decisions and the craftsmanship of its implementation.
Development cost
The total effort required to build, maintain, and evolve a system over its lifetime. Total effort can be converted into monetary cost by mapping work (time/resources) to money.
Code quality (CQ) strongly influences lifetime cost, as it affects both maintenance effort and the ease of future changes.
High initial CQ increases upfront cost but reduces long-term cost, while lower initial CQ reduces upfront cost but increases long-term cost.
Long-term goals
High CQ from the start makes sense when you expect the codebase to evolve, scale, and be maintained for a long time. CQ requirements are also influenced by developer experience (DX), number of contributors, and code criticality.
Short-term goals
Lower initial CQ can be favourable where there is minimal budget or in areas of high uncertainty, where code may not be used long term – proof of concepts, idea validation, prototypes, and experimental code.
Low initial CQ products can be refactored later, potentially funded by revenue generated by the product.
Hybrid
CQ can be adjusted at the file, module, or feature level. Critical components are assigned higher CQ requirements and greater development resources.
Maintenance cost
The ongoing effort required to keep a system working, reliable, and up to date after it has been built. Maintenance cost can far exceed initial development cost.
Technical debt
The future cost incurred by choosing a quicker or simpler solution now instead of a more robust one.
Tipping point
The moment in time where the cumulative cost of maintaining low-quality code exceeds the cumulative cost of investing in high-quality code.
The tipping point arrives later for simple, stable, lightly used apps with minimal updates, and sooner for complex, heavily used, rapidly evolving, and/or poorly structured code.
Time to tipping point
How long it takes to reach the tipping point varies with complexity, user base, initial CQ, update frequency, team size and experience, DX, staff retention, onboarding costs, and external dependencies.
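The tipping point can be illustrated with a deliberately simple linear cost model: an upfront cost plus a constant monthly maintenance cost. The model, the function names, and the person-day figures are assumptions for illustration only. Real maintenance costs are rarely constant.

```python
from typing import Optional


def cumulative_cost(upfront: float, monthly: float, months: int) -> float:
    """Total cost after a number of months under a linear model."""
    return upfront + monthly * months


def tipping_point_month(
    low_upfront: float,
    low_monthly: float,
    high_upfront: float,
    high_monthly: float,
    horizon_months: int = 120,
) -> Optional[int]:
    """First month in which low-quality code has cost more in total."""
    for month in range(1, horizon_months + 1):
        low = cumulative_cost(low_upfront, low_monthly, month)
        high = cumulative_cost(high_upfront, high_monthly, month)
        if low > high:
            return month
    return None  # no tipping point within the horizon


# Costs in person-days (invented). Rapidly evolving app: low CQ is costly to maintain.
evolving = tipping_point_month(10, 4.0, 30, 1.0)

# Stable, lightly used app: maintenance costs are close, so the tipping point is late.
stable = tipping_point_month(10, 1.5, 30, 1.0)

print(evolving, stable)
```

With these figures the rapidly evolving app tips in month 7 and the stable app in month 41, matching the pattern described above: the larger the gap in ongoing cost, the sooner the upfront investment pays for itself.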
Our approach
We follow a progressive production coding strategy, defaulting to manual coding, introducing classical automation where it adds value, and using language models only when they provide value other methods cannot – such as knowledge or speed.
Manual
Build the functionality step-by-step by writing code by hand or using visual tools. Most of our code is developed this way. Stay in flow and experiment, test, and debug as you go.
Classical automation
Static or dynamic templates, and autocomplete (classical or machine learning-based).
Language models
Probabilistic generators of plausible code. Used occasionally and selectively, mainly for well-defined tasks or self-contained code snippets. Peripheral tools.
We optimise for
Code quality, developer experience (DX), skill development, staff retention and attraction, cost efficiency, insight generation, and user experience (UX).
We focus on
Internal knowledge sharing, classical automation, tooling, and reusing decisions and patterns.
Non-production code
We can use language models to produce non-production code, including prototypes, experimentation, and tooling, alongside our own prototyping software.
Language model use
The extent and manner in which an individual uses a language model for production code depend on various factors: core, operational, and environmental.
Core
Operational
Environmental
Task characteristics
The characteristics of a task, such as ambiguity, complexity, novelty, precision, and specificity, influence how an individual interacts with the language model.
Interaction types include directive, verification, explanatory, transformative, and exploratory. As understanding improves through iteration, task characteristics evolve, as do the interaction modes.
High specificity
Tasks such as CSS require high specificity and precision. Designers must explicitly define selectors, properties, and values, which is why most CSS is written by hand – assisted by autocomplete – or generated through visual tools. JavaScript UI update code and other areas also require explicit definitions.
Knowledge
What an individual understands – domain, language-specific, task-specific, codebase, and language model knowledge.
Skill
How well an individual can use what they understand. The ability to apply knowledge effectively.
Cognitive
Cognitive factors are the mental processes that determine how effectively an individual can perform a task. These include cognitive load, attention, focus, fatigue, time pressure, memory quality, and memory recency.
Tooling
The set of tools and features that assist developers in writing, editing, and optimising code. This includes code editor functionality such as local documentation, classical automation, knowledge sharing, code diagrams, and scripts.
Model performance
The degree to which a language model produces useful, correct, and reliable outputs. Data quality, fine-tuning, and hybrid approaches can improve quality and consistency.
Codebase familiarity
The degree to which an individual knows and can navigate the codebase.
Codebase maturity
The development stage of a codebase – its stability, scalability, maintainability, code quality, documentation, amount of functionality, and the range of problems solved.
As a codebase matures, development shifts towards reusing and adapting existing code rather than writing code from scratch. Make use of utility files, templates, APIs, existing patterns, and internal knowledge – documents, comments, notes...
Personal preference
Individuals differ in how they prefer to interact with language models.
Interaction time
The amount of time an individual spends interacting with a language model. As knowledge, skill, tooling, and codebase familiarity increase, language model usage will tend to decrease. I expect usage among team members to range from zero to peripheral.
Time saved
The amount of time saved per task when using language models is highly variable, ranging from time lost to meaningful gains. The overall workflow impact is likely marginal due to their peripheral usage.