Automation

This project uses a variety of techniques to automate tasks – improving task consistency, quality, and productivity. This post discusses task elimination, classical programming, machine learning, APIs, and augmentation.

Task elimination

Huge gains in cost efficiency and productivity can come from removing tasks entirely. For example, rather than spend lots of time and money building apps for different platforms, build one app which runs in the browser. Eliminate before you automate.

Classical programming

Explicit step-by-step instructions. Logic is predefined and fixed by the programmer. The program produces predictable outputs based on these rules. It does not learn or adapt.

Tasks we automate with classical programming include code generation, chatbots, APIs, language generation, and summarisation.

Classical systems offer many benefits. They are reliable, transparent, explainable, consistent, deterministic, predictable, verifiable, traceable, auditable, customisable, and cost-efficient, and they provide control, precision, and trust.

Machine learning

A process where a program learns from data – using one or more machine learning (ML) algorithms – to find patterns, make predictions, or make decisions, improving over time, without being explicitly programmed.

Find patterns

Anomaly detection, Natural Language Processing (NLP), modality conversions, semantic search, and vision.

Make predictions

Autocomplete, API routing, pricing, recommendations, availability, forecasting, classification, personalisation, NLP, modality conversions, semantic search, and vision.

Make decisions

Prescriptions, vision, decision support, and reinforcement learning.

Use ML as an answer system

The ML model just returns an output, eg image classification, translation, question answering, chatbot, OCR, and sentiment analysis. No system state changes. Input > ML model > Output.

Use ML to update/augment classical systems

Keep all the benefits of classical systems. Use machine learning to add capabilities that classical systems lack or struggle to implement, such as data-driven predictions, pattern recognition, and adaptive decision-making.

Used in areas where rules are hard to write, but patterns exist – for example, when data is messy, ambiguous, unstructured, contextual, or complex. Typically, an ML system is used to update databases.

Modality conversions

Transforming data across or within modalities – such as text, images, audio, video, and sign language. Intra-modal conversions, eg text-to-text, process or transform data within the same modality, for example through translation, summarisation, or style transfer.

Vector embeddings

Numerical representations of data created for text, images, audio, video, or sign language. Used for similarity search, classification, cross-modal alignment, clustering, recommendation, and other tasks.

Open source models

We use open source machine learning models. They run locally, keep data private, and are free to use, giving users full control, offline capability, and fast performance. Users can integrate third-party machine learning models through the plugin system.

APIs

An Application Programming Interface (API) is a set of rules, protocols, and tools that allows different software applications to communicate with each other. It defines the methods and data formats that applications can use to request and exchange information.

Web API

Allows applications to send or receive data over HTTP/HTTPS via a URL. It can be accessed over the internet, an intranet, or localhost. Example URL:

https://example.com/api/v1/users/123
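The parts of that URL can be pulled apart with Python's standard urllib.parse module. This is a small sketch; reading v1 as the API version and 123 as a resource ID is an assumed convention for this example URL, not something the URL format itself requires.

```python
from urllib.parse import urlparse

url = "https://example.com/api/v1/users/123"
parts = urlparse(url)

# Split the path into segments, dropping the empty string before the leading "/"
segments = [s for s in parts.path.split("/") if s]

scheme = parts.scheme   # transport: "https"
host = parts.netloc     # server: "example.com"
# Assumed convention for this example: /api/<version>/<resource>/<id>
_, version, resource, resource_id = segments

print(scheme, host, version, resource, resource_id)
```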

Internal API

An API used exclusively within an application, system, or organisation. It can be a web API or a programmatic API; a programmatic API is accessed directly in code through function or method calls rather than over a network.

Public API

An API that is openly available to external developers.

Private API

An API that is restricted to internal use within an organisation.

Hybrid API

An API which allows both private and public access. An API URL (endpoint) can be made available for private and/or public use, or neither. We use this approach. You decide access permissions for each endpoint.
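A minimal sketch of per-endpoint access control for a hybrid API, assuming a simple in-memory table keyed by HTTP method and path. The Access flags, ENDPOINT_ACCESS table, and is_allowed function are hypothetical names for illustration; a real system would also authenticate the caller before deciding whether they count as internal.

```python
from enum import Flag

class Access(Flag):
    NONE = 0      # exposed to neither private nor public callers
    PRIVATE = 1
    PUBLIC = 2

# Hypothetical per-endpoint permissions, keyed by (method, path)
ENDPOINT_ACCESS = {
    ("GET", "/api/v1/products"): Access.PRIVATE | Access.PUBLIC,
    ("POST", "/api/v1/orders"): Access.PRIVATE,
    ("DELETE", "/api/v1/users"): Access.NONE,
}

def is_allowed(method: str, path: str, caller_is_internal: bool) -> bool:
    """Return True if this kind of caller may use the endpoint."""
    access = ENDPOINT_ACCESS.get((method, path), Access.NONE)  # unknown = closed
    required = Access.PRIVATE if caller_is_internal else Access.PUBLIC
    return bool(access & required)
```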

REST API

An API that uses standard HTTP methods to access and manipulate resources in a stateless way.

API spec

A document that explains how an API works and how to use it. Can be linked to in the HTML <head> for automatic discovery. Uses the OpenAPI format.
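For illustration, here is a minimal OpenAPI 3.0 document for one hypothetical endpoint, built as a Python dict and serialised to JSON. The title, path, and descriptions are placeholders; the openapi, info, and paths fields, the response description, and the required flag on path parameters follow the OpenAPI 3.0 specification.

```python
import json

# A minimal OpenAPI 3.0 document describing one hypothetical endpoint
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example API", "version": "1.0.0"},
    "paths": {
        "/api/v1/users/{id}": {
            "get": {
                "summary": "Fetch a user by ID",
                "parameters": [
                    # Path parameters must be marked as required
                    {"name": "id", "in": "path", "required": True,
                     "schema": {"type": "string"}}
                ],
                "responses": {"200": {"description": "The requested user"}},
            }
        }
    },
}

# Serialise to the JSON file that clients or discovery tools would fetch
print(json.dumps(spec, indent=2))
```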

Communication scenarios

We classify API endpoints into four layers:

  • Native app
  • Operating system (OS)
  • Firmware
  • Website/web app

Each layer can use its own API internally, or communicate with APIs on the same or different device. Cross-device API connections typically use a cloud API mediator:

App > Mediator > Device > Mediator > App

This is how, for example, a mobile phone app connects to a home device such as a thermostat. The device registers with the central API server, a persistent connection (eg WebSockets or MQTT) is kept open, and all communication between the app and the device goes through this intermediary.
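The relay pattern can be sketched in memory, with a dict of registered device handlers standing in for the open WebSockets or MQTT connections. The Mediator class and the thermostat handler are hypothetical; a real mediator would also authenticate both sides and cope with devices dropping offline mid-request.

```python
from collections.abc import Callable

Handler = Callable[[dict], dict]

class Mediator:
    """In-memory stand-in for a cloud API server that relays messages
    between apps and registered devices."""

    def __init__(self) -> None:
        # device_id -> handler; stands in for each device's open connection
        self._devices: dict[str, Handler] = {}

    def register(self, device_id: str, handler: Handler) -> None:
        """Called when a device comes online and connects to the server."""
        self._devices[device_id] = handler

    def send(self, device_id: str, command: dict) -> dict:
        """Relay a command: App > Mediator > Device > Mediator > App."""
        handler = self._devices.get(device_id)
        if handler is None:
            return {"status": "error", "detail": "device not connected"}
        return handler(command)


# A hypothetical thermostat that registers with the mediator
thermostat_state = {"target_c": 19.0}

def thermostat(command: dict) -> dict:
    if command.get("action") == "set_target":
        thermostat_state["target_c"] = float(command["value"])
    return {"status": "ok", "target_c": thermostat_state["target_c"]}

mediator = Mediator()
mediator.register("thermostat-1", thermostat)
print(mediator.send("thermostat-1", {"action": "set_target", "value": 21}))
```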

Interaction methods

Users or systems can trigger API calls through many different interaction methods:

  • Text
  • Speech
  • Sign language
  • Touch
  • Touchless gestures
  • Haptic feedback
  • Eye movement
  • Motion
  • Facial expressions
  • Brain-computer interface
  • Keyboard/mouse
  • Physical devices
  • QR codes
  • NFC
  • Images/video
  • Bluetooth
  • RFID
  • GPS
  • Biometrics
  • CLI
  • Files
  • AR/VR
  • Automated scripts

Each interaction is converted to a text-based action before the system sends the request to an API endpoint in the background.
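One way to picture this conversion is a small table of normalisers, one per interaction type, each turning a raw event into a text action. The event shapes and field names here are assumptions for illustration, and the speech case assumes transcription has already happened upstream.

```python
# Hypothetical normalisers: each turns one kind of raw interaction event
# into a plain text action that can be sent to an API endpoint.
def from_speech(event: dict) -> str:
    # Assumes speech-to-text has already produced a transcript
    return event["transcript"].strip().lower()

def from_qr(event: dict) -> str:
    # Assumes the QR code's payload is itself an action phrase
    return event["payload"].strip().lower()

def from_button(event: dict) -> str:
    # Assumes each UI element carries its action as an attribute
    return event["data_action"]

NORMALISERS = {"speech": from_speech, "qr": from_qr, "button": from_button}

def to_text_action(event: dict) -> str:
    """Convert any supported interaction event into a text-based action."""
    normalise = NORMALISERS.get(event.get("type"))
    if normalise is None:
        raise ValueError(f"unsupported interaction type: {event.get('type')!r}")
    return normalise(event)
```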

Endpoint actions

When an API endpoint receives a request, it can perform a variety of actions depending on the type of request – eg GET, POST, PUT, DELETE – and the logic implemented on the server. Example actions include:

  • Database operations
  • File operations
  • Hardware operations
  • Communication operations
  • Data processing
  • Authentication
  • Authorisation
  • Logging
  • Send notifications
  • Error handling
  • External API calls
  • Internal API calls

The endpoint does whatever it needs to do – fetch data, process data, reason, plan, perform calculations, query a database... The endpoint then returns a response.
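A sketch of a single GET endpoint that performs several of the actions listed above in sequence: authentication, authorisation, a database lookup, error handling, and logging. The token store, permission names, and order data are hypothetical in-memory stand-ins.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("api")

# Hypothetical in-memory stand-ins for a token store, permissions, and a database
TOKENS = {"alice-token": "alice", "bob-token": "bob"}
PERMISSIONS = {"alice": {"orders:read"}, "bob": set()}
ORDERS = {"A1": {"id": "A1", "total": 42.0}}

def get_order(token: str, order_id: str) -> dict:
    """Sketch of a GET /orders/<id> endpoint that performs several actions."""
    user = TOKENS.get(token)                                  # authentication
    if user is None:
        return {"status": 401, "error": "unauthenticated"}
    if "orders:read" not in PERMISSIONS.get(user, set()):     # authorisation
        return {"status": 403, "error": "forbidden"}
    order = ORDERS.get(order_id)                              # database operation
    if order is None:                                         # error handling
        return {"status": 404, "error": f"order {order_id} not found"}
    log.info("user=%s read order=%s", user, order_id)         # logging
    return {"status": 200, "data": order}
```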

API response

Most modern APIs return responses in JSON format. The response can include status information, structured data, metadata, raw text, templates, and error details. A response could include links to URLs.
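An illustrative response body combining status information, structured data, metadata, raw text, a template, links, and an error list. The field names are assumptions, not a standard; the round trip through json.dumps and json.loads shows what travels over the wire.

```python
import json

# Illustrative response body; these field names are assumptions, not a standard
response_body = {
    "status": "ok",
    "data": {"city": "Paris", "temp_c": 18},
    "meta": {"api_version": "v1", "request_id": "abc123"},
    "text": "It is 18°C in Paris.",
    "template": "It is {temp_c}°C in {city}.",
    "links": {"forecast": "https://example.com/api/v1/forecast/paris"},
    "errors": [],
}

payload = json.dumps(response_body)   # what the server sends
received = json.loads(payload)        # what the client parses

# The client can use the raw text as-is, or fill the template from the data
print(received["template"].format(**received["data"]))
```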

Orchestration

The process of coordinating multiple APIs in a multi-step workflow so they work together to complete a larger task.

Parallel execution

You can call multiple APIs simultaneously. Optionally combine outputs.
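Parallel calls can be sketched with Python's asyncio.gather, which runs the awaitables concurrently and returns their results in argument order. Here call_api is a hypothetical stand-in that uses asyncio.sleep to simulate network latency rather than making real HTTP requests.

```python
import asyncio

async def call_api(name: str, delay: float) -> dict:
    """Hypothetical stand-in for an HTTP call; sleep simulates latency."""
    await asyncio.sleep(delay)
    return {"source": name, "value": len(name)}  # placeholder payload

async def main() -> dict:
    # Start all three calls at once and wait for every result
    results = await asyncio.gather(
        call_api("weather", 0.03),
        call_api("calendar", 0.02),
        call_api("traffic", 0.01),
    )
    # Optionally combine the outputs into a single result
    return {r["source"]: r["value"] for r in results}

combined = asyncio.run(main())
print(combined)
```

Because the calls overlap, the total wait is roughly that of the slowest call rather than the sum of all three.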

Chained/cascading API calls

Each API endpoint can call other API endpoints, which may belong to the same API or different APIs, creating a multi-level workflow. APIs could reside on multiple devices or servers.

Data dependency

The output from one API endpoint can be used as input for another API call.
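A minimal sketch of a chained workflow with a data dependency: one endpoint calls two others, feeding the first call's output into the second. The geocode and current_weather functions are hypothetical stand-ins with hard-coded data; in practice each could be a separate API on a different server.

```python
# Hypothetical endpoints, written as plain functions standing in for API calls
def geocode(address: str) -> dict:
    # Assumed lookup table in place of a real geocoding service
    known = {"1 Example Street": (51.5, -0.12)}
    lat, lon = known[address]
    return {"lat": lat, "lon": lon}

def current_weather(lat: float, lon: float) -> dict:
    # Fixed fake reading; a real endpoint would query a weather service
    return {"lat": lat, "lon": lon, "temp_c": 14}

def weather_for_address(address: str) -> dict:
    """An endpoint that chains two others: geocode's output is the weather call's input."""
    location = geocode(address)
    return current_weather(location["lat"], location["lon"])
```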

Side effect calls

Calls whose outputs aren't used by other steps. For example, messaging, logging, and external integrations.
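Side-effect calls can be scheduled without blocking the main step. This sketch uses asyncio.create_task for a hypothetical notify call and keeps references to pending tasks, since the event loop only holds weak references to them. Before shutdown it awaits them with return_exceptions=True, so a failed notification cannot break the workflow.

```python
import asyncio

pending: set[asyncio.Task] = set()   # strong references to running side-effect tasks
sent: list[str] = []                 # records side effects so they can be inspected

async def notify(message: str) -> None:
    """Side-effect call: nothing downstream uses its result."""
    await asyncio.sleep(0.01)        # simulated network latency
    sent.append(message)

def fire_and_forget(coro) -> None:
    task = asyncio.create_task(coro)
    pending.add(task)
    task.add_done_callback(pending.discard)

async def place_order(item: str) -> dict:
    order = {"item": item, "status": "placed"}          # main step
    fire_and_forget(notify(f"Order placed: {item}"))    # don't wait for it
    return order                                         # respond straight away

async def main() -> dict:
    order = await place_order("book")
    # Before shutting down, let side effects finish; exceptions are returned
    # rather than raised, so a failed notification can't break the workflow
    await asyncio.gather(*pending, return_exceptions=True)
    return order

result = asyncio.run(main())
```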

Human-in-the-loop

A workflow might pause and wait for human input, such as an approval, before continuing.

Safety

A chat interface should request user confirmation before proceeding when:

  • The action is high cost
  • The action has wide impact
  • The action is irreversible
  • The system detects ambiguous or incomplete intent
  • There is a risk of misinterpretation
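These conditions can be expressed as a simple check that returns the reasons for asking. The Action fields and thresholds below are hypothetical; in this sketch a single intent-confidence score covers both ambiguous intent and risk of misinterpretation, which a real system might measure separately.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """Hypothetical description of an action the assistant is about to take."""
    name: str
    cost: float               # monetary cost of the action
    affected_items: int       # rough measure of impact (records, people, devices)
    reversible: bool
    intent_confidence: float  # 0 to 1, from intent recognition

# Assumed thresholds; real values would be configured per system
MAX_COST = 50.0
MAX_AFFECTED = 10
MIN_CONFIDENCE = 0.8

def needs_confirmation(action: Action) -> list[str]:
    """Return the reasons to ask the user first; an empty list means proceed."""
    reasons = []
    if action.cost > MAX_COST:
        reasons.append("high cost")
    if action.affected_items > MAX_AFFECTED:
        reasons.append("wide impact")
    if not action.reversible:
        reasons.append("irreversible")
    if action.intent_confidence < MIN_CONFIDENCE:
        reasons.append("ambiguous or uncertain intent")
    return reasons
```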

Security

APIs can restrict access through authentication and authorisation.

Purchases

Purchases require explicit user confirmation, such as a 'confirm/buy' button, a voice command, or biometric/wallet approval. Legal and regulatory requirements also apply.

Reliability

Refers to the consistency and dependability of an API in performing its intended functions over time. Key aspects include: uptime, consistency of responses, error handling, performance stability, and resilience.

API availability (uptime) is typically high, eg 99.5% to 99.95%, and can reach 99.99% or higher on systems you fully control. An availability of 99.95% allows about 4.38 hours of downtime per year.
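The downtime figures follow from simple arithmetic: allowed downtime is the unavailable fraction multiplied by the hours in a year. This sketch assumes a 365-day year (8,760 hours).

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored

def downtime_hours_per_year(availability_pct: float) -> float:
    """Maximum downtime per year allowed by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.5, 99.95, 99.99):
    print(f"{pct}% uptime -> {downtime_hours_per_year(pct):.2f} hours of downtime per year")
```

For 99.95%, that is 0.0005 × 8,760 ≈ 4.38 hours; 99.5% allows 43.8 hours, and 99.99% allows about 0.88 hours, or roughly 53 minutes.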

We use a variety of techniques to maximise reliability: retries, timeouts, error handling, auto API creation, circuit breakers, backoff strategies, versioning, logging, CDN, rate limiting, monitoring, metrics, and alerts.
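Two of those techniques, retries and exponential backoff, can be sketched in a few lines. This is a generic illustration rather than the project's implementation: call_with_retries is a hypothetical helper, it retries only ConnectionError and TimeoutError by default, and its sleep function is injectable so it can be tested without waiting. Timeouts would normally be set on the HTTP client itself, and circuit breakers are not shown.

```python
import random
import time

def call_with_retries(fn, attempts=4, base_delay=0.1, max_delay=2.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Backoff doubles each time (0.1 s, 0.2 s, 0.4 s, ...) up to max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: random factor in [0.5, 1.0] so clients don't retry in lockstep
            sleep(delay * random.uniform(0.5, 1.0))
```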

Decision-making

You can define rulesets to control decision-making within an API orchestration workflow. These rules can be applied at three levels: global, per-API, or per-request via prompts. Decision types:

  • Conditional branching
  • Dynamic parameter adjustment
  • Fallback and error handling
  • Time/event-based decisions
  • User/context-aware decisions
  • System-aware decisions
  • Security/compliance decisions
  • Constrained randomness

Rule-based decisions are deterministic, predictable, transparent, explainable, easy to govern, low risk, easy to simulate and verify, fast, reliable, and resource-efficient, which makes them suitable for high-stakes tasks.

API management

We provide an interface for managing your API. Dashboards, logs, metrics, access control, error handling, localisation, rate limiting, usage fees, alerts, monitoring, versioning, and more.

Auto API creation

We automatically create API endpoints for every feature, providing effortless access to all functionality. You have full control over which API endpoints are exposed and who can access them.
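One common way to generate endpoints automatically is a registration decorator: each feature function registers itself under a path when it is defined. This is a hypothetical sketch, not the project's mechanism; the feature decorator and ENDPOINTS registry are illustrative names, and the expose flag models choosing which endpoints external callers can reach.

```python
from collections.abc import Callable

# Hypothetical registry: every decorated feature becomes an endpoint
ENDPOINTS: dict[str, dict] = {}

def feature(path: str, expose: bool = False):
    """Register a function as an endpoint; expose=False keeps it internal-only."""
    def register(fn: Callable) -> Callable:
        ENDPOINTS[path] = {"handler": fn, "exposed": expose}
        return fn
    return register

@feature("/api/v1/notes", expose=True)
def list_notes() -> list[str]:
    return ["first note"]

@feature("/api/v1/admin/reindex")        # created automatically, not exposed
def reindex() -> str:
    return "reindexed"

def dispatch(path: str, external: bool) -> tuple[int, object]:
    entry = ENDPOINTS.get(path)
    # External callers can't see unexposed endpoints: report them as missing
    if entry is None or (external and not entry["exposed"]):
        return 404, None
    return 200, entry["handler"]()
```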

Discoverability

APIs can be discovered through websites – either by manual exploration or via automatic API discovery – as well as through search engines, API registries, and API directories.

Why allow public API access?

Distribution, brand visibility, data acquisition and insights, outsourcing the UI layer, revenue opportunities, and an enhanced user experience (accessibility, convenience, no need to navigate the UI)...

Why block public API access?

Protect business interests, maintain brand experience, maximise revenue opportunities, control distribution, monetisation strategy, IP protection, quality control...

Use cases

Let's explore some API use case categories – internal use, assistants, device control, website, and web app.

Internal use

Internal use can mean use within an app (native or web), system, or organisation.

App

An API call can be triggered within an app by using any one of many interaction methods: keyboard/mouse (UI element, natural language prompt), voice, touch, automated scripts, and so on.

From a user experience perspective, the chosen input method depends on the context (device, task, environment), user goals, type of interaction, user preference, and skill.

System/organisation

An API can be used to communicate with APIs on the same or a different device. Same-device communication can use direct function calls or HTTP via localhost. Cross-device communication uses mediators. On an intranet, websites and web apps can communicate via API requests.

Assistants

A software assistant helps users complete tasks, answer questions, solve problems, or provide guidance. A digital helper that reduces manual effort and improves efficiency.

Many assistants include chat interfaces for easy interaction, while others work entirely through voice commands or other UI elements.

Example assistants include: personal, business, shopping, customer support, travel, code, and website/app.

Personal

A personal assistant helps manage and perform tasks, find and organise information, solve problems, and offer support. It can be a native or web app and can work offline. Example features:

  • Dashboards
  • Chat interface
  • Email management
  • Calendar scheduling
  • Travel planning
  • App management
  • Web search
  • Device management
  • Shopping assistance
  • Content/news aggregation
  • Website/app monitoring

Business

Same as personal, except for use in business contexts. A personal and business assistant could be integrated into a single app or kept separate.

Shopping

A shopping assistant helps someone find, compare, and purchase products more easily and efficiently. It can be built into an online store, or be part of an external website or app, such as a personal assistant. Example features:

  • Ask questions
  • Chat interface
  • Suggest products
  • Compare products
  • Semantic search
  • Alert to discounts
  • Update cart
  • Make purchases
  • Anticipate needs
  • Assistive UI features
  • Personalisation
  • Memory
  • Context awareness
  • Visual search
  • Purchase support
  • Price tracking
  • Stock alerts
  • Post-purchase support
  • Bookmarks

Customer support

A customer support assistant helps customers with questions, problems, or requests related to a product or service. Example features:

  • Ask questions
  • Returns management
  • Escalate to humans
  • Account management
  • Payment support
  • Order management
  • Knowledge base
  • Personalisation
  • 24/7 availability
  • Notifications

Travel

A travel assistant helps travellers before, during, and after a trip, by providing planning, booking, guidance, and on-the-go support to make travel easier, smoother, and more enjoyable.

The assistant can be part of a travel website/app or external website/app such as a personal assistant or comparison service. Example features:

  • Itinerary suggestions
  • Make travel bookings
  • Manage travel bookings
  • Recommendations
  • Activity suggestions
  • Ask questions
  • Route planning
  • Visa assistance
  • Safety guidance
  • Flight information
  • Destination information
  • Weather information
  • Budget assistance
  • Deal alerts

Code

A chat-based code assistant helps programmers write, understand, debug, and optimise code. Example features:

  • Ask questions
  • Generate code
  • Perform actions
  • Search documentation
  • Search knowledge base
  • Search codebase
  • Configure UI
  • Web search

Website/app

A website or app assistant helps users navigate and use the site or app, and can also perform tasks on their behalf. Example features:

  • Ask questions
  • Provide support
  • Navigation help
  • Search
  • Perform actions
  • Personalisation
  • Recommendations
  • Notifications

Intent recognition

To understand user intent in natural language, we use classical NLP techniques. If necessary, we fall back to text embedding-based semantic similarity, followed by an NLU model for deeper contextual understanding.
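A simplified sketch of that cascade: keyword rules first, then a similarity match against example utterances, then a hand-off when nothing is confident enough. To stay dependency-free, it uses bag-of-words cosine similarity as a stand-in for real text embeddings, and the NLU stage is only represented by a return value. The intents, example utterances, stopword list, and 0.3 threshold are hypothetical.

```python
import math
import re
from collections import Counter

# Stage 1: hypothetical keyword rules (a simple classical NLP technique)
RULES = [
    (re.compile(r"\b(weather|forecast)\b"), "get_weather"),
    (re.compile(r"\b(buy|order|purchase)\b"), "make_purchase"),
]

# Stage 2: example utterances per intent. Bag-of-words vectors are a stand-in
# for real text embeddings, to keep the sketch dependency-free.
EXAMPLES = {
    "get_weather": "will it rain tomorrow is it sunny outside",
    "book_travel": "book a flight reserve a hotel plan a trip",
}
STOPWORDS = {"a", "an", "the", "to", "is", "it", "i", "me", "my", "can", "you"}

def vectorise(text: str) -> Counter:
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recognise_intent(text: str, threshold: float = 0.3) -> str:
    lowered = text.lower()
    for pattern, intent in RULES:                  # 1. rules first
        if pattern.search(lowered):
            return intent
    query = vectorise(text)                        # 2. similarity fallback
    scores = {intent: cosine(query, vectorise(ex)) for intent, ex in EXAMPLES.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best
    return "needs_nlu"                             # 3. hand off to an NLU model (not shown)
```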

Language generation

Once an API endpoint has completed its tasks, such as fetching data, processing logic, planning, or interacting with a database, it typically returns the results as structured JSON data. The data can contain:

  • Raw text
  • Templates with slots
  • Structured data

The calling system decides how to assemble or generate the final language output: it can use the provided raw text or templates, or apply classical Natural Language Generation (NLG) techniques to generate diverse, fluent, and engaging language.
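A minimal sketch of slot filling plus two light classical NLG touches: naive pluralisation and a choice of opener. The template, data, and realise function are hypothetical, and the pluralisation rule only handles regular English nouns. A seeded random.Random keeps the variation reproducible, which preserves the determinism classical techniques are chosen for.

```python
import random

OPENERS = ["Good morning.", "Hi there.", "Hello."]

# Structured data as an API might return it (illustrative values)
response = {
    "template": "You have {count} {noun} today, starting with {first}.",
    "data": {"count": 3, "first": "a stand-up at 09:00"},
}

def pluralise(word: str, count: int) -> str:
    # Naive rule: correct for regular nouns such as "meeting" only
    return word if count == 1 else word + "s"

def realise(resp: dict, rng: random.Random) -> str:
    """Fill the template's slots, then add a lightly varied opener."""
    data = resp["data"]
    sentence = resp["template"].format(noun=pluralise("meeting", data["count"]), **data)
    opener = rng.choice(OPENERS)  # seeded RNG keeps the variation reproducible
    return f"{opener} {sentence}"

print(realise(response, random.Random(42)))
```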

The API response provides the building blocks – a solid foundation. Language is the presentation and communication layer.

Our classical techniques offer determinism, controllability, consistency, trust, auditability, transparency, accountability, accuracy, precision, reliability, cost-efficiency, and suitability for high-stakes tasks.

Device control

A device can expose an API for external systems to connect to, if it's designed or configured to do so. Examples include thermostats, security cameras, lights, and speakers. Communication between a web app and the device, for example, typically goes through a central API server acting as a mediator.

Website

A website can expose an API to allow external systems to access selected functionality. Examples:

  • Subscribe/unsubscribe
  • Account management
  • Content/news aggregation
  • Ask questions
  • Shopping
  • Research
  • Alerts
  • Monitoring
  • Contact

Web app

A web app can expose an API to allow external systems to access its functionality, such as signing in, sending and retrieving data, creating content, and enabling collaboration.

Augmentation

Augmentation refers to enhancing or expanding the capabilities of a system by adding new features, data, or functionality. Human plus machine working together. This contrasts with automation, where tasks are performed with little or no human involvement.

Continuum

The augmentation-automation continuum describes how tasks are distributed between humans and machines, from full human control to full machine autonomy. It can be divided into six levels:

Level 0 – None

No system involvement. Human performs the task entirely unaided. Examples include writing by hand, manual calculations, sketching, drawing, and data entry by hand. No assistance, suggestions, or automation.

Level 1 – Minimal augmentation

Very basic support, eg simple tools, information lookup. Human does almost all the work. Examples include using a calculator, spellcheck, syntax highlighting, basic search engines, and static dashboards. No decision-making help or automation of steps.

Level 2 – Low augmentation

Basic suggestions or guidance. Human remains fully in control. Examples include code autocomplete, predictive text, recommendation systems, and grammar/style suggestions. Suggest, predict, or recommend.

Level 3 – Moderate augmentation

System assists in meaningful parts of the task. Human still makes the final decisions. Examples include code generation, template generation, email drafting, photo enhancement, and customer support triage. Significant reduction in manual effort.

Level 4 – High augmentation

System performs a significant part of the task. Human supervises and can override. Examples include various API workflows, code build and deployment pipelines, and chatbots. Human involvement is reactive, not constant.

Level 5 – Near-full automation

System performs the task independently. Human involvement is minimal or only for oversight. Examples include autonomous monitoring and alerting systems and fully automated code deployment systems. Humans are out of the loop, except for setup or maintenance.

Productivity

Refers to the output (work or value) produced per unit of input (time, resources, effort). It measures how much is achieved with the resources available.

Augmentation boosts productivity by enabling humans to complete tasks faster or with greater quality. Automation increases productivity by performing repetitive or time-consuming tasks without needing human intervention.

Efficiency

Refers to how well resources (time, effort, or materials) are used to achieve a given output. It measures how well something is done, with minimal waste or effort.

Augmentation increases efficiency by reducing effort or errors. Automation increases efficiency by removing human inefficiencies, eg errors and fatigue, and by speeding up processes.

Up next

A blog post entitled 'Code', then on to showing the user interface – that'll take 4+ weeks.