Prerequisites
- You have a CloudFront distribution with Standard Logging v2 enabled or available
- You have access to the AWS Console with permissions to create Amazon Data Firehose (formerly Kinesis Data Firehose) streams
Step 1: Create an API Key
Create an API key in Promptwatch. Go to Settings → API Keys in your Promptwatch dashboard.
Step 2: Create Firehose Delivery Stream
In the AWS Console, navigate to Amazon Data Firehose and create a new delivery stream.
- Go to Firehose in the AWS Console
- Click Create Firehose stream
Step 3: Configure Delivery Stream
Configure the delivery stream to send logs to Promptwatch via HTTP endpoint.
- Set Source to Direct PUT
- Set Destination to HTTP endpoint delivery
- Optional: Name the stream something descriptive
- Set the access key: Enter your API key from Step 1
We recommend storing your API key in AWS Secrets Manager.
- Set Content Encoding to GZIP

Step 4: Create S3 Bucket for Failed Logs
- Create or select an S3 bucket to store failed delivery logs (required by AWS)
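For readers who script their AWS setup, the console choices in Steps 3 and 4 could be expressed roughly as the parameters below. This is a minimal sketch, not Promptwatch's official tooling: the function name `build_firehose_params`, the stream name, the ARNs, and above all the endpoint URL are placeholders (this guide does not state the Promptwatch endpoint URL, so take it from your dashboard). The dictionary keys follow the request shape of boto3's `create_delivery_stream` for an HTTP endpoint destination.

```python
from typing import Any, Dict


def build_firehose_params(
    stream_name: str,
    endpoint_url: str,
    api_key: str,
    role_arn: str,
    backup_bucket_arn: str,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's firehose.create_delivery_stream."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",  # Step 3: Source = Direct PUT
        "HttpEndpointDestinationConfiguration": {
            "EndpointConfiguration": {
                "Url": endpoint_url,   # placeholder, not given in this guide
                "Name": "Promptwatch",
                "AccessKey": api_key,  # Step 3: API key from Step 1
            },
            # Step 3: Content Encoding = GZIP
            "RequestConfiguration": {"ContentEncoding": "GZIP"},
            "RoleARN": role_arn,
            # Step 4: only failed deliveries are written to the S3 bucket
            "S3BackupMode": "FailedDataOnly",
            "S3Configuration": {
                "RoleARN": role_arn,
                "BucketARN": backup_bucket_arn,
            },
        },
    }


# All values below are hypothetical placeholders.
params = build_firehose_params(
    stream_name="cloudfront-to-promptwatch",
    endpoint_url="https://example.invalid/placeholder",
    api_key="REPLACE_WITH_API_KEY",
    role_arn="arn:aws:iam::123456789012:role/firehose-delivery-role",
    backup_bucket_arn="arn:aws:s3:::my-firehose-failed-logs",
)

# To create the stream (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("firehose").create_delivery_stream(**params)
```

The sketch passes the API key inline for brevity. If you follow the recommendation to keep the key in AWS Secrets Manager, load it from there rather than hard-coding it.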

Step 5: Attach Firehose to CloudFront
Now attach the Firehose delivery stream to your distribution.
- Navigate to your CloudFront distribution → Logging tab → Add → Amazon Kinesis Data Firehose

Step 6: Select Delivery Stream
Select the Firehose delivery stream you created in Step 2.
Step 7: Configure Log Format
Configure the log format and field selection in CloudFront.
- Select Additional settings
- For Field selection, select exactly these 10 fields:
  - timestamp
  - c-ip
  - sc-status
  - cs-method
  - cs-uri-stem
  - cs-uri-query
  - cs(Host)
  - cs(User-Agent)
  - cs(Referer)
  - sc-content-type
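Because the field selection has to match exactly, it can help to check a sample record before relying on the pipeline. The sketch below assumes each log record is a JSON object keyed by the CloudFront field name. The helper `check_fields` and the sample record are hypothetical and only illustrate the comparison.

```python
import json
from typing import List, Tuple

# The 10 fields this guide asks you to select.
REQUIRED_FIELDS = (
    "timestamp",
    "c-ip",
    "sc-status",
    "cs-method",
    "cs-uri-stem",
    "cs-uri-query",
    "cs(Host)",
    "cs(User-Agent)",
    "cs(Referer)",
    "sc-content-type",
)


def check_fields(record_json: str) -> Tuple[List[str], List[str]]:
    """Return (missing, extra) field names for one JSON log record."""
    present = set(json.loads(record_json))
    required = set(REQUIRED_FIELDS)
    return sorted(required - present), sorted(present - required)


# Hypothetical sample record with placeholder values.
sample = json.dumps({field: "-" for field in REQUIRED_FIELDS})
missing, extra = check_fields(sample)
print("missing:", missing, "extra:", extra)
```

An empty `missing` list and an empty `extra` list mean the record carries exactly the selected fields.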
Optional: Filter on known AI Crawler user agents
Promptwatch identifies AI crawlers automatically and only stores AI crawler visits; all other traffic is discarded. You can safely send all your server logs without worrying about non-crawler data being retained. If you prefer to forward only AI crawler traffic, you can use the user agents listed below to pre-filter on your side. Keep in mind that you’ll need to maintain this list yourself as new crawlers emerge.
OpenAI (3)
| Name | User Agent | Full User Agent | Description |
|---|---|---|---|
| GPT Bot | GPTBot | Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot) | Used to crawl content for training OpenAI’s generative AI foundation models. |
| SearchBot | OAI-SearchBot | Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot) | Used by ChatGPT search to surface websites in search results. |
| ChatGPT Citations | ChatGPT-User | Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot) | Used for user actions in ChatGPT when visiting web pages. |
Anthropic (5)
| Name | User Agent | Full User Agent | Description |
|---|---|---|---|
| Claude Bot | ClaudeBot | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com) | Used to crawl content for training Anthropic’s generative AI models. |
| Claude Citations | Claude-User | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-User/1.0; +claudebot@anthropic.com) | Used when individuals ask Claude questions or use Claude Code and Claude accesses websites on their behalf. |
| Claude Search Bot | Claude-SearchBot | Mozilla/5.0 (compatible; claude-search-bot/1.0; +http://www.anthropic.com/bot.html) | Navigates the web to improve search result quality for users. |
| Claude Web | claude-web | Mozilla/5.0 (compatible; claude-web/1.0; +http://www.anthropic.com/bot.html) | Targeted crawler for recent web content, feeding the Claude browser agent with updated site data. |
| Anthropic AI | anthropic-ai | Mozilla/5.0 (compatible; anthropic-ai/1.0; +http://www.anthropic.com/bot.html) | Primary Anthropic crawler that collects broad web data for Claude model development. |
Google (3)
| Name | User Agent | Full User Agent | Description |
|---|---|---|---|
| Gemini | Google-Extended | Mozilla/5.0 (compatible; Google-Extended/1.0; +http://www.google.com/bot.html) | Controls whether content can be used for training Gemini AI models. |
| Google Mobile Agent | Google-Agent | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Google-Agent; +https://developers.google.com/crawling/docs/crawlers-fetchers/google-agent) | Used by Google AI agents to autonomously browse the web and complete tasks on behalf of users (mobile). |
| Google Desktop Agent | Google-Agent | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Google-Agent; +https://developers.google.com/crawling/docs/crawlers-fetchers/google-agent) Chrome/W.X.Y.Z Safari/537.36 | Used by Google AI agents to autonomously browse the web and complete tasks on behalf of users (desktop). |
Perplexity (2)
| Name | User Agent | Full User Agent | Description |
|---|---|---|---|
| Perplexity Bot | PerplexityBot | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot) | Used to surface and link websites in Perplexity search results. |
| Perplexity Citations | Perplexity-User | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user) | Used for user actions in Perplexity when visiting web pages to answer questions. |
Cohere (1)
| Name | User Agent | Full User Agent | Description |
|---|---|---|---|
| Cohere AI | cohere-ai | Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html) | Collects textual data for Cohere’s language models, helping refine large-scale text generation. |
Mistral (1)
| Name | User Agent | Full User Agent | Description |
|---|---|---|---|
| Mistral AI Citations | MistralAI-User | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MistralAI-User/1.0; +https://docs.mistral.ai/robots) | Used for user actions in LeChat when visiting web pages to answer questions. |
DeepSeek (1)
| Name | User Agent | Full User Agent | Description |
|---|---|---|---|
| DeepSeek | DeepSeekBot | Mozilla/5.0 (compatible; DeepSeekBot/1.0; +http://www.deepseek.com/bot.html) | Used to crawl content for training DeepSeek’s generative AI models. |
xAI / Grok (3)
| Name | User Agent | Full User Agent | Description |
|---|---|---|---|
| Grok Bot | GrokBot | GrokBot/1.0 (+https://x.ai) | Used for training Grok AI. |
| Grok Search | xAI-Grok | xAI-Grok/1.0 (+https://grok.com) | Used for Grok’s search capabilities. |
| Grok Deep Search | Grok-DeepSearch | Grok-DeepSearch/1.0 (+https://x.ai) | Used for Grok’s advanced search capabilities. |
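If you do pre-filter on your side, the user agent tokens from the tables above can be matched as case-insensitive substrings of the `cs(User-Agent)` value. The following is a minimal sketch of that idea. The function name `is_ai_crawler` is hypothetical, the token list is only a snapshot of the tables in this guide, and where you run the filter is up to you.

```python
# Tokens copied from the "User Agent" column of the tables above.
# "claude-search-bot" is added because that spelling appears in the
# full user agent string listed for Claude Search Bot.
AI_CRAWLER_TOKENS = (
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot", "claude-search-bot",
    "claude-web", "anthropic-ai",
    "Google-Extended", "Google-Agent",
    "PerplexityBot", "Perplexity-User",
    "cohere-ai",
    "MistralAI-User",
    "DeepSeekBot",
    "GrokBot", "xAI-Grok", "Grok-DeepSearch",
)

# Lowercase once so each lookup is a plain substring test.
_TOKENS_LOWER = tuple(token.lower() for token in AI_CRAWLER_TOKENS)


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the user agent contains any known AI crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in _TOKENS_LOWER)


print(is_ai_crawler(
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
))
print(is_ai_crawler(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
))
```

Substring matching is deliberately loose so that version changes such as `GPTBot/1.1` still match. The trade-off is that the list needs manual updates as new crawlers appear.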
Step 8: Finish
Set Output format to JSON.
Setup is now complete. CloudFront will stream access logs to your Firehose delivery stream, which delivers them to Promptwatch. Crawler logs will usually appear within a few minutes.