Most sales teams are sitting on the same unsolved problem. A list of leads exists. Someone is expected to research each company, understand what they actually do, and write a cold email that earns a response. Done well, that process takes twenty to thirty minutes per lead. Done at volume, it either gets rushed into something generic or it does not get done at all.
We built this system to solve that without compromising on what makes outreach work in the first place: genuine research, relevant context, and a message that respects the intelligence of the person receiving it.
The Problem We Were Solving
Cold outreach fails for one of two reasons. It is either too generic, a visible template with a first name dropped in, or too shallow, referencing surface-level facts that signal no real effort was made. Neither version earns a reply.
What earns a reply is specificity. A message that references something real about the company, connects it to a pressure that business is actually facing, and explains in plain language how a conversation might be worth fifteen minutes. The challenge is that building that kind of message by hand does not scale. Fifty well-researched emails in a row is not humanly sustainable.
The goal was to build a system that reads a company website with enough depth to understand what the business actually does, identifies the operational context that makes outreach relevant, and then writes an email that reads like a senior business development professional wrote it after doing real homework.
What We Built
The workflow runs inside n8n and uses Azure OpenAI for two distinct AI tasks: analysing the prospect's website and generating the personalised email. Lead data is managed in Google Sheets. The finished email is delivered via SMTP and the lead record is updated automatically to reflect the contact as processed.
Before a single AI call is made, the system runs three layers of qualification. It checks every lead against previously contacted records, against a blacklist of excluded addresses, and against duplicates within the current batch. Only genuinely new, contactable leads move forward. This is not an afterthought. It is the first thing the system does.
How the Workflow Operates
Lead Intake and Qualification
The workflow is triggered manually and reads the source lead list from Google Sheets. A custom JavaScript node immediately applies the three-way de-duplication logic, excluding previously contacted leads, blacklisted emails, and any duplicate entries within the same batch. What passes through is a clean, verified list of prospects that have not been reached before.
A second filter retains only leads that carry both a company domain and a valid email address. Any record missing either field is dropped before any further processing begins.
Website Research and Content Extraction
For each qualified lead, the system makes an HTTP request to the company homepage and extracts every anchor tag from the returned HTML. Those links are split into individual records, filtered to retain only fully formed URLs, and passed through a de-duplication step to remove redundant internal paths.
The crawl is deliberately capped at three subpages per company. This reflects a considered trade-off: about pages tend to describe a company in broad strokes, while service pages, case study pages, and sector-specific content contain the language that makes outreach specific and credible. Three pages is enough to build genuine context without creating an unnecessarily heavy process. Each subpage is fetched in sequence and the raw HTML is converted to readable Markdown before being passed forward.
Website Analysis with Azure OpenAI GPT-4.1
The Markdown content from the crawled pages is passed to the first AI model, GPT-4.1, via the Azure OpenAI endpoint. The model is instructed to act as a B2B sales research assistant and return a concise two to three sentence abstract capturing what the company does, its service areas, and any operationally relevant context.
The response is constrained to a strict JSON format so that output is clean and parseable regardless of the complexity of the source content. The abstracts from all three pages are then aggregated into a single research record that travels into the email generation step.
Personalised Email Generation with Azure OpenAI GPT-5.2
This is the core of the system. The aggregated research, combined with the lead's name, professional headline, company name, and location, is passed to GPT-5.2 Chat with a carefully constructed system prompt.
The model is instructed to write as a senior Business Development Manager with twenty years of experience, on behalf of Smart Tech LLC. The opening paragraph must reference specific, verifiable facts from the last eighteen months about the company's business and sector. No flattery, no filler, no fabricated content. The second paragraph must identify the top two pain points the company is likely facing, drawing on industry publications and sector news, and connect them naturally to what Smart Tech can address.
The format is fixed. A two-sentence opening grounded in genuine research. One to two sentences that explain relevance without sounding like a sales pitch. A direct call to action with a calendar link. Total length under 155 words.
The prompt includes strict prohibitions that were each added because the output without them was worse. No bullet points, no hashes, no asterisks anywhere in the body. No phrases such as “I hope this finds you well,” “I came across,” “I wanted to reach out,” or “touching base.” No AI-sounding language. The subject line is generated separately from the email body.
The model returns a clean JSON object containing the subject line and the complete email body with proper line breaks, ready to send.
Saving, Sending, and Updating Records
The generated subject and email body are written to the outreach tracking sheet in Google Sheets alongside the lead's details. The email is sent from the business development address. The source lead record is then updated to mark the contact as processed, so it will be excluded from all future runs automatically.
What the Output Actually Looks Like
The email that arrives in a prospect's inbox does not read like automation. The opening sentence references something specific to that company: a recent market expansion, a product line, a regulatory development affecting their sector, or a piece of published coverage. The second paragraph connects one or two of the realities they are navigating to a capability that is genuinely relevant. It reads like someone did the homework because the system did.
The signature is consistent across every email: name and title on the first line, company and phone on the second, website on the third. No variation, no formatting inconsistency.
Four Things We Learned
1. The Depth of the Research Layer Is What Separates This From Standard Automation
Crawling three subpages rather than only the homepage produces noticeably better emails. Homepages are written for broad audiences. Service pages, industry pages, and case study pages contain the specific language and context that makes an opening sentence credible to someone who knows their own business well. The three-page cap is a deliberate design choice, not a limitation.
2. Two AI Calls Produce Better Results Than One
Separating website analysis from email generation was an architectural decision made after testing the alternative. Asking a single model to read raw website content and simultaneously produce a polished, researched email produces inconsistent output. Giving the generation model a clean, structured research summary as its input produces consistently better results. The analysis step adds a small amount of latency and cost. Both are justified by the quality improvement at the generation stage.
3. The System Prompt on the Email Node Is Load-Bearing Architecture
The quality of every email this system produces is almost entirely a function of the specificity of the generation prompt. Vague instructions produce vague output. The prompt used here specifies what to reference, what to avoid, how the email must be structured, how long it must be, what phrases are forbidden, and how the signature must appear. Every constraint exists because removing it made the output worse. The prompt is not configuration. It is the product.
4. De-duplication Before AI Is Both a Cost Decision and a Reputation Decision
Qualifying leads before any AI call is made ensures the system never generates an email for a contact who has already been reached, who is on the exclusion list, or who appears twice in the source data. This matters for cost efficiency. It matters more for sender reputation. Delivering two researched cold emails to the same person, especially when both reference genuine research about their company, causes more damage than no contact at all.
How to Build This
Step 1: Prepare three Google Sheets tabs: one for source leads with columns for first name, last name, email, company domain, company name, headline, city, and country; one for previously contacted leads; and one for blacklisted email addresses.
Step 2: In n8n, configure three Google Sheets nodes to read from each tab. Connect them into the Remove Already Contacted JavaScript node, which handles all three de-duplication checks in a single pass.
Step 3: Add a filter node to drop any lead missing a company domain or email address. Only records with both fields present move forward.
Step 4: Build the crawling pipeline. An HTTP Request node fetches the company homepage. An HTML extraction node pulls all anchor tags. A Split Out node separates links into individual records. A Filter node retains only fully formed URLs. A Code node cleans the link structure. A Limit node caps the crawl at three subpages. A second HTTP Request node fetches each subpage. A Markdown node converts the HTML to readable text.
Step 5: Add an Aggregate node to collect all three page summaries into one research record per lead.
Step 6: Configure the website analysis node as an HTTP POST request to your Azure OpenAI GPT-4.1 endpoint. Write the system prompt to return only the JSON abstract format. Test with several company types before proceeding.
Step 7: Configure the email generation node as an HTTP POST request to your Azure OpenAI GPT-5.2 Chat endpoint. Paste the full generation prompt including all structure, length, and prohibition constraints. Pass the lead context fields and the aggregated research as the user message content.
Step 8: Add the Save Lead node to write the generated email to your tracking sheet. Add the Send Outreach Email node to deliver the message. Add the Update Row node to mark the lead as contacted in the source file.
Total build time for someone with working knowledge of n8n is three to four hours. The majority of that time is spent writing and iterating on the email generation prompt, which is also where the highest value is created.
What Changed for the Outreach Process
Before this system existed, writing a genuinely personalised cold email for a single lead required twenty to thirty minutes of research and drafting. A batch of fifty leads was a full working day, and the quality of the fiftieth email was rarely as strong as the first. The bottleneck was not motivation. It was the cognitive load of doing real research at volume.
After the system was live, a batch of fifty leads completes in under thirty minutes. Each email is grounded in research specific to that company. Subject lines are personalised. Opening sentences reference verifiable facts. The tone is direct and professional rather than corporate and distant.
The most significant outcome is that outreach quality no longer declines as volume grows. The research depth applied to the fiftieth lead in a batch is identical to the research depth applied to the first.
Final Thought
This system was built on a straightforward principle: AI should handle what is repetitive and time-consuming so that human judgment can be applied where it actually matters. Reading a company website and extracting operational context is a task that scales well with automation. Deciding which companies to pursue, which relationships to build, and which conversations to have are tasks that do not.
The result is a cold outreach process that does not trade quality for volume. It produces emails that respect the time of the person receiving them, because they are grounded in real context about that person's business. That is the only standard worth building to.
We have built this for our own outreach. We are ready to build it for yours.
