Code production

Let's discuss four approaches to producing code – manual coding, visual development, classical automation, and generative language models. We explore key dimensions of code production, including thinking, typing effort, code quality, product quality, and development cost, and outline our approach to producing code. First up, some definitions:

Manual coding

Writing and editing code such as HTML, CSS, JavaScript, and Python directly.

Visual development

Building software by using visual tools such as drag-and-drop interfaces.

Classical automation

Software that automatically performs tasks using predefined rules, such as templates and code autocomplete.

Generative language models

Probabilistic systems that generate plausible text based on patterns learned from data.

Code production

The process of creating code, including thinking, design, implementation, and refinement.

Code generation (CG)

The process of automatically generating source code from a given input.

Workflow

The sequence of actions used to complete a task.

Manual coding

This is our 10-stage manual coding workflow diagram. Explanations for each stage are provided below.

1. Design input

2. Think/type

3. Validate

4. Manual test

5. Debug (↩ 2)

6. Iterate (↩ 2)

7. Write tests

8. Run tests (↩ 2, 5)

9. Audit (↩ 2–6, 10)

10. Enhancement (↩ 1, 2)

Manual coding workflow

1. Design input: Sketch, wireframe, mockup, and/or prototype, plus optional functional requirements and/or specifications. Or no explicit design input.

2. Think/type: Think. Think as you type. Code and comments. Deep thought, stay in flow, fast, creative, iterative process. Build the functionality step-by-step.

Experiment, stay sharp, and strengthen thinking skills. Understand every line of code. Classical automation can assist. Generative models can be used selectively, if needed.

3. Validate: Ensure the code meets functional requirements.

4. Manual test: Run and interact with the code in the browser to see how it behaves.

5. Debug: Find and fix problems in the code. Loop back to stage 2 if required.

6. Iterate: Once a function, section, or module is completed, you can loop back to stage 2. Refactor, simplify and improve whilst the code is still fresh in your mind. Repeat until the code reaches the desired quality.

7. Write tests: Once the code has stabilised, write tests (unit, integration, end-to-end) for code that is updated, focussing on key functionality and risk areas.

8. Run tests: Run automated tests. A safety net before audit. If tests fail, loop back to stage 5 (Debug) or stage 2 (Think/type).

9. Audit: Browser, accessibility, and QA tests. We can audit after a module has finished and also after the full feature is complete. If issues are found, you can loop back to previous stages or move to stage 10.

Audit can be performed at multiple phases – development, staging, and post-launch – and can include a mixture of manual and automated tests.

10. Enhancement: Add new functionality or refine existing code. Go to stage 1 or 2.
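The loop-backs in the manual coding workflow can also be written down as a small lookup table. The sketch below is only an illustration: it assumes each ↩ arrow in the diagram belongs to the stage that follows it (which matches the stage explanations), and the names STAGES, LOOP_BACKS, and next_stages are hypothetical.

```python
# Stage numbers and names follow the manual coding workflow.
STAGES = {
    1: "Design input",
    2: "Think/type",
    3: "Validate",
    4: "Manual test",
    5: "Debug",
    6: "Iterate",
    7: "Write tests",
    8: "Run tests",
    9: "Audit",
    10: "Enhancement",
}

# Optional loop-backs per stage (assumption: read from the diagram arrows).
LOOP_BACKS = {
    5: [2],
    6: [2],
    8: [2, 5],
    9: [2, 3, 4, 5, 6, 10],
    10: [1, 2],
}


def next_stages(stage: int) -> list[int]:
    """Return the stages reachable from `stage`: the next one in order, then any loop-backs."""
    options = []
    if stage + 1 in STAGES:
        options.append(stage + 1)
    for target in LOOP_BACKS.get(stage, []):
        if target not in options:
            options.append(target)
    return options


for number, name in STAGES.items():
    targets = ", ".join(f"{t} {STAGES[t]}" for t in next_stages(number))
    print(f"{number} {name} -> {targets}")
```

Keeping the transitions in data rather than in prose makes it easy to check, for example, that a failed test run can only send you back to Debug or Think/type.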

Visual development

Functionality is built through a visual interface by clicking, dragging, and configuring elements – rather than writing code. Visual tools can improve consistency, reduce errors, and accelerate development for numerous tasks.

CSS

Layout and structure, visual effects, shapes and clipping, backgrounds, motion and interaction, and various visual techniques.

JavaScript

Interactions, behaviour, animations, and motion control.

Classical automation

Classical automation is deterministic, reliable, accurate, repeatable, consistent, transparent, low-cost, scalable, efficient, and customisable. Its output can be reviewed, tested, and approved, which provides control and trust.

Templates

We use templates – reusable blocks of code that help speed up development – for HTML, CSS, JavaScript, Python, SQL, Rust, C++, and other languages. We can browse a template library or get suggestions in the code editor or chat interface.

Code doesn't need to be repeatedly regenerated, as it would with generative language model approaches. We also receive all the classical benefits listed above: reliable, consistent, tested, approved...

Templates can be static or dynamic, and natural language (NL) can be used to adapt or transform them. We can also include versioning, comments, and notes for each template.
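As a rough illustration of a dynamic template with versioning and notes, here is a minimal Python sketch built on the standard library's string.Template. The CodeTemplate class, the button template, and its placeholder names are hypothetical and not taken from any particular template library. A static template would simply be one with no placeholders.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class CodeTemplate:
    """A reusable block of code with a version and notes (hypothetical structure)."""
    name: str
    version: str
    notes: str
    body: str

    def render(self, **params: str) -> str:
        # substitute() raises KeyError if a placeholder is left unfilled,
        # so an incomplete render fails loudly instead of silently.
        return Template(self.body).substitute(**params)


BUTTON_CSS = CodeTemplate(
    name="button",
    version="1.2.0",
    notes="Hypothetical example: primary button styles.",
    body=(
        ".$class_name {\n"
        "  background: $colour;\n"
        "  padding: $padding;\n"
        "}\n"
    ),
)

css = BUTTON_CSS.render(class_name="btn-primary", colour="#0057b8", padding="8px 16px")
print(css)
```

Rendering the same template with the same parameters always gives the same text, which is the deterministic, repeatable behaviour described above.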

Generative language models

Probabilistic models for code generation that produce plausible outputs, ranging in size and capability. Here's an 11-stage generative language model workflow for code generation. Explanations for each stage are provided below.

1. Design input

2. Think/prompt

3. Validate (↩ 2)

4. Integrate

5. Manual test

6. Debug (↩ 2)

7. Iterate (↩ 2, M = manual coding workflow)

8. Write tests

9. Run tests (↩ 2, 6)

10. Audit (↩ 2–7, 11)

11. Enhancement (↩ 1, 2)

Generative language model workflow

1. Design input: Sketch, wireframe, mockup, and/or prototype, plus optional functional requirements and/or specifications. Or no explicit design input.

2. Think/prompt: Think and prompt. Translate design input into a generation instruction. You can prompt at varying levels of granularity: full specification or individual components.

3. Validate: Ensure the code meets functional requirements. You can loop back to stage 2 multiple times until you get the desired output – you can re-prompt at different granularities: full or component-level. Refine. Iterate.

4. Integrate: Incorporate generated output into the codebase, either as a whole or incrementally. Copy/paste (external) or accept/merge (integrated). If needed, adapt output (manual, tool-assisted, or automatic) to the codebase (architecture, conventions, dependencies).

5. Manual test: Run and interact with the code in the browser to see how it behaves. Test as a whole or incrementally.

6. Debug: Find and fix problems in the code. Debug as a whole or incrementally. Loop back to stage 2 if required.

7. Iterate: Once a function, section, or module is completed, you can loop back to stage 2, or transition to a manual coding workflow for all or selected parts of the generated code.

Refactor, simplify and improve whilst the code is still fresh in your mind. Repeat until the code reaches the desired quality.

8. Write tests: Once the code has stabilised, write tests (unit, integration, end-to-end) for code that is updated, focussing on key functionality and risk areas.

9. Run tests: Run automated tests. A safety net before audit. If tests fail, loop back to stage 6 (Debug), stage 2 (Think/prompt), or stage 2 (Think/type) in the manual coding workflow.

10. Audit: Browser, accessibility, and QA tests. We can audit after a module has finished and also after the full feature is complete. If issues are found, you can loop back to previous stages or move to stage 11.

Audit can be performed at multiple phases – development, staging, and post-launch – and can include a mixture of manual and automated tests.

11. Enhancement: Add new functionality or refine existing code. Go to stage 1 or 2.

Language model costs

Language model use imposes four kinds of "tax": prompt, cognitive, system, and organisational. The broader, messier, and harder-to-verify the problem, the more you pay across all four dimensions.

Total tax rises with scope, ambiguity, coupling, verification difficulty, and output variability – the degree to which identical inputs can produce different results. Value can outweigh cost, but only with selective and appropriate use.

Prompt tax

The effort required to get a correct, usable output from a language model, spanning four activities: translation, validation, iteration, and integration.

Translation

Expressing intent as a precise natural-language specification for a language model, under conditions of complexity, ambiguity, multiple valid interpretations, and often verbose or imprecise language.

Validation

Checking that the code meets functional requirements and behaves as expected, which requires careful inspection, multi-level reasoning, and analytical judgement.

Iteration

The cost of repeated prompting and refinement required to get a correct output.

Integration

The cost incurred to incorporate generated output into the codebase.

Cognitive tax

The cognitive overhead in engaging with the language model workflow. This includes prompting, processing, understanding, debugging, refining, iterating, reasoning, making decisions, validating, switching contexts, managing unfamiliar code, ensuring code quality, multitasking, and integrating outputs.

There is also the potential for skill atrophy due to reduced manual engagement, as less manual practice can cause knowledge to fade and weaken cognitive abilities.

System tax

The broader costs imposed on the user by the language model workflow, such as subscription fees, privacy, security, and dependency.

Organisational tax

The costs an organisation incurs when using language model workflows, including impacts on staff attraction and retention, privacy management, culture management, training, monitoring, infrastructure, developer experience, and subscription fees.

Dimensions

Let's now discuss key dimensions of code production that influence our approach – thinking, typing effort, code quality, product quality, and development cost.

Thinking

Coding involves making many decisions, with the quantity and complexity varying across files. To get the precise functionality and UX we need, we break the problem down step-by-step, working through each decision explicitly.

Development speed

The quantity and complexity of decisions have a major influence on development speed, alongside factors such as tooling, developer skill, codebase familiarity, and cognitive aspects.

Decision reuse

Decisions made in one file can often be reused across other files, reducing the need to re-think solutions and speeding up development.

Thinking time

When coding, we think about the code and the user experience. I call this 'Design Thinking Time (DTT)' – the time spent evaluating and making user interface and code design decisions throughout a project.

DTT can be split into two categories – 'Code Thinking Time (CTT)' and 'UX Thinking Time (UXTT)'. The balance between them depends on the person, the task, and various cognitive factors.

Insights

New insights often emerge from thinking through each decision in depth, leading to changes in both the design and the code that needs to be written.

Typing effort

The physical cost of typing code, typically approximated by the number of characters or keystrokes.

Code typing speed

How fast someone physically enters code, measured in characters per minute (CPM). A person's CPM can differ across programming languages due to differences in syntax, verbosity, and symbol usage. You can test your CPM on code typing speed test websites.

Code assistance

Autocomplete, templates, and copy-paste can increase CPM by reducing the number of keystrokes needed to produce code.
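To make the relationship between keystrokes and CPM concrete, here is a small sketch. It assumes a constant keystroke rate and treats CPM as characters produced per minute. The function name and the numbers are made up for illustration.

```python
def effective_cpm(characters: int, keystrokes: int, keystrokes_per_minute: float) -> float:
    """Characters produced per minute, given how many keystrokes were needed to produce them."""
    if keystrokes <= 0 or keystrokes_per_minute <= 0:
        raise ValueError("keystrokes and keystrokes_per_minute must be positive")
    minutes = keystrokes / keystrokes_per_minute
    return characters / minutes


# Hypothetical snippet of 450 characters, typed at 150 keystrokes per minute.
unassisted = effective_cpm(characters=450, keystrokes=450, keystrokes_per_minute=150.0)

# Same snippet, but autocomplete and templates cut the keystrokes needed to 300.
assisted = effective_cpm(characters=450, keystrokes=300, keystrokes_per_minute=150.0)

print(unassisted, assisted)
```

The typing rate is the same in both cases. Only the number of keystrokes needed changes, and that alone raises the effective CPM.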

Time split

Typing typically accounts for a small fraction of time spent coding. Most time is spent thinking – problem solving, decision-making, designing, iterating, and debugging.

Think as you type

Coding interleaves typing with pausing and thinking, in proportions that vary from moment to moment.

Code quality

Code quality (CQ) is a measure of how "good" and effective the code is across multiple dimensions – usability, reliability, and performance – and of whether it provides the most appropriate solution.

Thinking time

Code quality is strongly correlated with the amount of deliberate thinking, design, and planning invested during development.

Typically, the more time a developer spends refining and designing a piece of code, the higher the code quality becomes, provided there is still meaningful room for improvement, up to a point of diminishing returns.

CQ score

The code editor can automatically produce a code quality (CQ) score (out of 100) for each file.

Curve shape

The shape of the curve of CQ against thinking time depends on a number of factors: initial quality of implementation, clarity of requirements, developer skill, complexity, codebase familiarity, domain knowledge, constraints and tradeoffs, cognitive factors, point of diminishing returns, and CQ aim.

Initial quality

The CQ score of a file after the first iteration. It determines the remaining room for improvement – the distance between the current CQ and the maximum achievable CQ (100).

CQ aim

The target CQ level a developer intends to reach for a file before stopping further refinement. This is determined by evaluating the return on investment (ROI) of additional thinking time in improving code quality.

The developer estimates the lifetime cost of the code and the user-facing cost, and balances these against the expected CQ gain from additional effort.

If a file is not being modified and already meets requirements, there is little value in spending time improving it. The aim is not to maximise CQ across the codebase, but to prioritise resources where improvements deliver the greatest impact.
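The ROI reasoning behind a CQ aim can be sketched in a few lines of Python. This is an illustration under loud assumptions: the text gives no formula for the curve, so the exponential shape below is invented for the example, as are the function names, the rate, the value per CQ point, and the hourly cost.

```python
import math

MAX_CQ = 100.0  # CQ scores are out of 100


def cq_after(hours: float, initial_cq: float, rate: float) -> float:
    """Assumed curve: each hour closes a fixed fraction of the remaining room."""
    room = MAX_CQ - initial_cq
    return MAX_CQ - room * math.exp(-rate * hours)


def worth_continuing(current_cq: float, rate: float,
                     value_per_point: float, cost_per_hour: float) -> bool:
    """True while the estimated value of the next hour's CQ gain exceeds its cost."""
    gain = cq_after(1.0, current_cq, rate) - current_cq
    return gain * value_per_point > cost_per_hour


cq = 60.0  # hypothetical initial quality after the first iteration
hours = 0
while worth_continuing(cq, rate=0.5, value_per_point=20.0, cost_per_hour=100.0):
    cq = cq_after(1.0, cq, rate=0.5)
    hours += 1

print(f"Stop after {hours} h at CQ {cq:.1f}")
```

With these made-up numbers the loop stops after three hours at a CQ of roughly 91. The point where it stops plays the role of the CQ aim: beyond it, another hour of refinement costs more than the improvement is estimated to be worth.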

Implementation quality

A measure of how well the code satisfies usability, reliability, and performance requirements. All code production techniques can provide high implementation quality. With language models, you might need multiple iterations to achieve that standard.

Solution quality

A measure of how well the code represents the simplest, most appropriate, and maintainable approach to the problem. Solution quality can require deeper thought and iteration to identify the right approach. Effective prompts for language models require a solid understanding of the problem.

Language model context

Context has a large effect on output quality in language models. Relevant context improves accuracy. Irrelevant, conflicting, or overly long context can reduce quality. For best results, provide only relevant context.

Product quality

Product quality (PQ) is how well a product works for users – functionality, ease of use, performance, and reliability. Our PQ goals influence our code production approach. The higher the PQ targets, the more thinking time and iteration are required.

The quality of a product reflects the quality of its decisions and the craftsmanship of its implementation.

Development cost

The total effort required to build, maintain, and evolve a system over its lifetime. Total effort can be converted into monetary cost by mapping work (time/resources) to money.

Code quality (CQ) strongly influences lifetime cost, as it affects both maintenance effort and the ease of future changes.

High initial CQ increases upfront cost but reduces long-term cost, while lower initial CQ reduces upfront cost but increases long-term cost.

Long-term goals

High CQ from the start makes sense when you expect the codebase to evolve, scale, and be maintained for a long time. CQ requirements are also influenced by developer experience (DX), number of contributors, and code criticality.

Short-term goals

Lower initial CQ can be favourable where there is minimal budget or in areas of high uncertainty, where code may not be used long term – proof of concepts, idea validation, prototypes, and experimental code.

Low initial CQ products can be refactored later, potentially funded by revenue generated by the product.

Hybrid

CQ can be adjusted at the file, module, or feature level. Critical components are assigned higher CQ requirements and greater development resources.

Maintenance cost

The ongoing effort required to keep a system working, reliable, and up to date after it has been built. Maintenance cost can far exceed initial development cost.

Technical debt

The future cost incurred by choosing a quicker or simpler solution now instead of a more robust one.

Tipping point

The moment in time where the cumulative cost of maintaining low-quality code exceeds the cumulative cost of investing in high-quality code.

The tipping point arrives later for simple, stable, lightly used apps with minimal updates. It arrives sooner for complex, heavily used, rapidly evolving, and/or poorly structured code.
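The tipping point can be illustrated with a simple cumulative cost comparison. The sketch below assumes a one-off upfront cost plus a constant monthly maintenance cost for each option. Real costs are rarely this linear, and all names and figures are hypothetical.

```python
from typing import Optional


def tipping_point(low_upfront: int, low_monthly: int,
                  high_upfront: int, high_monthly: int,
                  horizon_months: int = 120) -> Optional[int]:
    """First month in which cumulative low-CQ cost exceeds cumulative high-CQ cost.

    Returns None if that does not happen within the horizon.
    """
    for month in range(1, horizon_months + 1):
        low_total = low_upfront + low_monthly * month
        high_total = high_upfront + high_monthly * month
        if low_total > high_total:
            return month
    return None


# Rapidly evolving product: low CQ is cheap to build but expensive to maintain.
evolving = tipping_point(low_upfront=10_000, low_monthly=2_000,
                         high_upfront=25_000, high_monthly=500)

# Stable, rarely updated app: maintenance costs are almost the same either way.
stable = tipping_point(low_upfront=10_000, low_monthly=600,
                       high_upfront=25_000, high_monthly=500)

print(evolving, stable)
```

In the first case the low-CQ option becomes the more expensive one within the first year. In the second, the tipping point does not arrive within the ten-year horizon.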

Time to tipping point

How long it takes to reach the tipping point varies with factors such as complexity, user base, initial CQ, update frequency, team size and experience, DX, staff retention, onboarding costs, and external dependencies.

Our approach

We follow a progressive production coding strategy, defaulting to manual coding, introducing classical automation where it adds value, and using language models only when they provide value other methods cannot – such as knowledge or speed.

1. Manual

2. Classical automation

3. Language models

Production code decision stack

1. Manual: Build the functionality step-by-step by writing code by hand or using visual tools. Most of our code is developed this way. Stay in flow and experiment, test, and debug as you go.

2. Classical automation: Static or dynamic templates, and autocomplete (classical or machine learning-based).

3. Language models: Probabilistic plausible code generators. Used occasionally and selectively, mainly for well-defined tasks or self-contained code snippets. Peripheral tools.
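One way to read the decision stack is as an ordered set of checks. The sketch below is a simplification for illustration, not a rule we apply mechanically. The Method enum, the choose_method function, and its three yes/no inputs are hypothetical stand-ins for judgement calls a developer makes.

```python
from enum import Enum


class Method(Enum):
    MANUAL = "manual coding"
    CLASSICAL_AUTOMATION = "classical automation"
    LANGUAGE_MODEL = "language model"


def choose_method(automation_adds_value: bool,
                  model_adds_unique_value: bool,
                  task_is_well_defined: bool) -> Method:
    """Pick a production method, checking the stack in order of preference."""
    # Default: write the code by hand (or with visual tools).
    method = Method.MANUAL
    # Step up to templates or autocomplete only where they add value.
    if automation_adds_value:
        method = Method.CLASSICAL_AUTOMATION
    # Use a language model only when it offers something the other methods
    # cannot, and the task is well defined or self-contained.
    elif model_adds_unique_value and task_is_well_defined:
        method = Method.LANGUAGE_MODEL
    return method


print(choose_method(False, False, False).value)
print(choose_method(True, True, True).value)
print(choose_method(False, True, True).value)
```

Note that classical automation wins whenever it adds value, even if a language model could also do the job. The model is only chosen when nothing earlier in the stack covers the need.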

We optimise for

Code quality, developer experience (DX), skill development, staff retention and attraction, cost efficiency, insight generation, and user experience (UX).

We focus on

Internal knowledge sharing, classical automation, tooling, and reusing decisions and patterns.

Non-production code

We can use language models to produce non-production code, including prototypes, experimentation, and tooling, alongside our own prototyping software.

Language model use

The extent and manner in which an individual uses a language model for production code depend on various factors: core, operational, and environmental.

Core: task characteristics, knowledge, skill

Operational: cognitive, tooling, model performance

Environmental: codebase familiarity, codebase maturity, personal preference

Factors influencing language model usage

Task characteristics

The characteristics of a task, such as ambiguity, complexity, novelty, precision, and specificity, influence how an individual interacts with the language model.

Interaction types include directive, verification, explanatory, transformative, and exploratory. As understanding improves through iteration, task characteristics evolve, as do the interaction modes.

High specificity

Tasks such as writing CSS require high specificity and precision. Designers must explicitly define selectors, properties, and values, which is why most CSS is written by hand – assisted by autocomplete – or generated through visual tools. JavaScript UI update code and other areas also require explicit definitions.

Knowledge

What an individual understands – domain, language-specific, task-specific, codebase, and language model knowledge.

Skill

How well an individual can use what they understand. The ability to apply knowledge effectively.

Cognitive

Cognitive factors are the mental processes that determine how effectively an individual can perform a task. These include cognitive load, attention, focus, fatigue, time pressure, memory quality, and memory recency.

Tooling

The set of tools and features that assist developers in writing, editing, and optimising code. This includes code editor functionality such as local documentation, classical automation, knowledge sharing, code diagrams, and scripts.

Model performance

The degree to which a language model produces useful, correct, and reliable outputs. Data quality, fine-tuning, and hybrid approaches can improve quality and consistency.

Codebase familiarity

The degree to which an individual knows and can navigate the codebase.

Codebase maturity

The development stage of a codebase, reflected in its stability, scalability, maintainability, code quality, documentation, amount of functionality, and the range of problems solved.

As a codebase matures, development shifts towards reusing and adapting existing code rather than writing code from scratch. Make use of utility files, templates, APIs, existing patterns, and internal knowledge – documents, comments, notes...

Personal preference

Individuals differ in how they prefer to interact with language models.

Interaction time

The amount of time an individual spends interacting with a language model. As knowledge, skill, tooling, and codebase familiarity increase, language model usage will tend to decrease. I expect usage among team members to range from zero to peripheral.

Time saved

The amount of time saved per task when using language models is highly variable, ranging from time lost to meaningful gains. The overall workflow impact is likely marginal due to their peripheral usage.