For Applied AI, Data = Work

Foundational AI and Applied AI operate on entirely different planes. So, the approach you take to build an applied AI company will also be very different.

Sidu Ponnappa

May 15, 2024


Foundational AI is about building large language models trained on huge swathes of data — public and proprietary. The goal is to create a flexible, multi-purpose "brain" that can be adapted to various downstream tasks.

Applied AI, on the other hand, is about creating specialized agents that can perform specific tasks within a defined workflow with high reliability and quality. The goal is not general intelligence, but targeted, superhuman performance on economically valuable activities.

The nature of "data" is also completely different in these two domains.

For applied AI,

data = work

This is so, so important to understand!

For foundational AI, data primarily means vast amounts of text, images, and video to train on. But for applied AI, data means a step-by-step trace of how humans execute a specific task within a workflow.

For example, a human dev writing a Salesforce unit test for an Apex class would:

  • start by reviewing the requirements and specifications for the Apex class

  • analyze the structure and logic of the Apex class itself, identifying the key objects, inputs, outputs, and decision points

  • define a set of test cases that cover the major functionality and edge scenarios

Then, for each test case, they would:

  • set up the necessary test data and preconditions

  • invoke the Apex class with the test inputs

  • assert that the outputs and side effects match the expected behavior
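To make "data = work" concrete, here is a minimal sketch of how the process above might be captured as a step-by-step trace. The `Step` and `WorkTrace` schema is hypothetical (not the format of any real tool), and the two test-case names are placeholders. The `rationale` field is where a developer's heuristics and domain knowledge would be recorded.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One observed action in a human workflow (hypothetical schema)."""
    action: str          # what the developer did
    rationale: str = ""  # the heuristic or domain knowledge behind it


@dataclass
class WorkTrace:
    """A step-by-step record of how one task was actually executed (hypothetical)."""
    task: str
    steps: list[Step] = field(default_factory=list)

    def record(self, action: str, rationale: str = "") -> None:
        self.steps.append(Step(action, rationale))


trace = WorkTrace(task="Write a unit test for an Apex class")
trace.record("Review requirements and specifications for the Apex class")
trace.record("Analyze the class structure and logic",
             "identify key objects, inputs, outputs, and decision points")
trace.record("Define test cases", "cover major functionality and edge scenarios")

# Placeholder test-case names; a real trace would use the cases the developer chose.
for case in ["happy path", "empty input"]:
    trace.record(f"[{case}] Set up test data and preconditions")
    trace.record(f"[{case}] Invoke the Apex class with the test inputs")
    trace.record(f"[{case}] Assert outputs and side effects match expected behavior")

print(len(trace.steps))  # 9: three planning steps plus three steps per test case
```

The design choice worth noting is that each entry pairs an action with the reasoning behind it. The action alone is roughly what finished code already shows; the rationale is the part that stays in people's heads.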

Throughout this process, the developer is drawing on their domain knowledge of the Salesforce platform, the Apex language, and unit testing best practices.

They're applying heuristics and pattern-matching to navigate the codebase and generate meaningful test coverage.

This domain knowledge and these heuristics — the actual human process of writing a unit test — are the "data" that matter for applied AI. And this data doesn't exist anywhere in a structured form. It's not in any textbook or online tutorial. It's locked in the heads of experienced developers.

So, the way to build an agent is to extract this knowledge from human workers and codify it into your agents.

Naturally, the only way to get more of this data is to do more real-world work.

With every new project, sprint, and task, you capture new process knowledge and feed it back into your agent development cycle. It's a fundamentally different motion than training a foundational model.
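A minimal sketch of that feedback loop, assuming a hypothetical `ProcessLibrary` that stores traces keyed by task type. The task types and step lists below are made-up placeholders standing in for two sprints of real work. The library only grows through `ingest`, that is, by completing and recording more work.

```python
from collections import defaultdict


class ProcessLibrary:
    """Hypothetical store of captured process knowledge, keyed by task type."""

    def __init__(self) -> None:
        self._traces: dict[str, list[list[str]]] = defaultdict(list)

    def ingest(self, task_type: str, steps: list[str]) -> None:
        # Each completed piece of real work contributes exactly one trace.
        self._traces[task_type].append(steps)

    def coverage(self) -> dict[str, int]:
        # How many recorded executions back each task type.
        return {task: len(traces) for task, traces in self._traces.items()}


# Placeholder work from two sprints: (task type, steps the human actually took).
sprints = [
    [("apex_unit_test", ["review spec", "analyze class", "define cases"])],
    [("apex_unit_test", ["review spec", "define cases", "set up data"]),
     ("apex_trigger_test", ["review trigger", "define cases"])],
]

library = ProcessLibrary()
for sprint in sprints:
    for task_type, steps in sprint:
        library.ingest(task_type, steps)

print(library.coverage())  # {'apex_unit_test': 2, 'apex_trigger_test': 1}
```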

The applied AI flywheel

As you take on more projects and execute more work, you accumulate more proprietary process data. This allows you to train more capable agents, which in turn allows you to take on even more ambitious projects.

A competitive edge here will be a function of:

the depth and quality of the process knowledge you extract × the speed with which you can codify that knowledge into production-grade agent workflows
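One way to read the multiplication in that formula is that the two factors compound rather than add, so a team that captures excellent process knowledge but never turns it into agents gains little edge. The toy sketch below assumes, purely for illustration, that each factor can be scored on a 0 to 1 scale; the function name and the scale are hypothetical.

```python
def competitive_edge(knowledge_quality: float, codification_speed: float) -> float:
    """Toy model: edge as the product of the two factors (hypothetical 0-1 scores)."""
    for score in (knowledge_quality, codification_speed):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores are assumed to lie in [0, 1]")
    return knowledge_quality * codification_speed


print(competitive_edge(0.9, 0.0))  # 0.0: deep knowledge that is never codified
print(competitive_edge(0.5, 0.5))  # 0.25: balanced strength on both factors
```

With an additive score, a high value on one factor could mask a zero on the other; the product does not allow that.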