Crawler Intelligence
Monitor which AI bots can crawl your website, analyze robots.txt and ai.txt policies, and detect changes in crawl access patterns.
Overview
Crawler Intelligence tracks how AI company bots interact with your website. It checks your robots.txt and ai.txt files to determine which AI crawlers are allowed or blocked, logs the results over time, and alerts you when access policies change.
If an AI company's crawler is blocked from your website, that company's models cannot learn from your content. Crawler Intelligence gives you clear visibility into your crawl access posture.
Supported AI Crawlers
The system monitors the following AI crawlers:
| Bot Name | Operator | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training data and ChatGPT responses |
| ClaudeBot | Anthropic | Claude AI model training |
| Google-Extended | Google | Gemini model training and grounding |
| PerplexityBot | Perplexity | Perplexity AI search |
| Amazonbot | Amazon | Alexa and Amazon AI services |
| CCBot | Common Crawl | Open training data corpus |
| Bytespider | ByteDance | TikTok and ByteDance AI |
| Meta-ExternalAgent | Meta | Meta AI features |
| Applebot-Extended | Apple | Apple Intelligence |
| Bingbot | Microsoft | Copilot and Bing Chat |
How It Works
- Policy Check — The system fetches your website's `robots.txt` and `ai.txt` files and parses the directives for each known AI crawler.
- Status Logging — Each bot's access status (allowed, blocked, or unknown) is recorded in the `crawler_logs` table with a timestamp.
- Pattern Detection — By comparing current results with historical logs, the system detects behavioral patterns such as newly blocked bots, newly discovered bots, or status changes.
- Alerting — Critical patterns (like a previously allowed bot being blocked) are flagged for review.
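The policy check step can be sketched with Python's standard `urllib.robotparser` module. This is a minimal illustration, not the product's implementation: it only evaluates `robots.txt` (there is no single standard `ai.txt` format, so that file is left out), it checks the site root path `/`, and it assumes that a file which could not be fetched is reported as "unknown". The names `AI_CRAWLERS` and `check_policy` are hypothetical.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the Supported AI Crawlers table.
AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "Amazonbot",
    "CCBot", "Bytespider", "Meta-ExternalAgent", "Applebot-Extended", "Bingbot",
]


def check_policy(robots_txt: str | None, path: str = "/") -> dict[str, str]:
    """Classify each monitored crawler as 'allowed', 'blocked', or 'unknown'."""
    if robots_txt is None:
        # Assumption: if robots.txt could not be fetched, access is unknown.
        return {bot: "unknown" for bot in AI_CRAWLERS}

    parser = RobotFileParser()
    # parse() takes the lines of a robots.txt we have already downloaded.
    parser.parse(robots_txt.splitlines())
    return {
        bot: "allowed" if parser.can_fetch(bot, path) else "blocked"
        for bot in AI_CRAWLERS
    }


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    for bot, status in check_policy(sample).items():
        print(f"{bot}: {status}")
```

With the sample file, GPTBot is reported as blocked and the other nine crawlers fall through to the `*` group and are reported as allowed. Note that `urllib.robotparser` applies the first matching rule in a group rather than the longest match, so results for path-specific rules can differ from stricter parsers.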
Dashboard Metrics
The Crawler Intelligence dashboard displays:
- Total Bots Monitored — Number of AI crawlers being tracked.
- Allowed / Blocked — Count of bots that can and cannot access your site.
- Bot Breakdown — Per-bot status with the source directive (robots.txt or ai.txt) and last check timestamp.
- Active Patterns — Unacknowledged pattern alerts requiring attention.
- Recent Logs — Timeline of the most recent crawl status checks.
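To show how the headline counts can be derived from the status log, here is a sketch using an in-memory SQLite database. The document only states that `crawler_logs` stores each bot's status with a timestamp; the column names below (`bot`, `status`, `source`, `checked_at`) and the helper functions are hypothetical. The counts are taken from each bot's most recent log row.

```python
from __future__ import annotations

import sqlite3

# Hypothetical crawler_logs schema; column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS crawler_logs (
    bot        TEXT NOT NULL,
    status     TEXT NOT NULL CHECK (status IN ('allowed', 'blocked', 'unknown')),
    source     TEXT NOT NULL,  -- 'robots.txt' or 'ai.txt'
    checked_at TEXT NOT NULL   -- ISO-8601 UTC timestamp, so it sorts lexically
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_status(conn: sqlite3.Connection, bot: str, status: str,
               source: str, checked_at: str) -> None:
    """Append one status check result to the log."""
    conn.execute(
        "INSERT INTO crawler_logs VALUES (?, ?, ?, ?)",
        (bot, status, source, checked_at),
    )
    conn.commit()


def dashboard_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Total / Allowed / Blocked, based on each bot's latest logged status."""
    rows = conn.execute(
        """
        SELECT status, COUNT(*) FROM crawler_logs AS l
        WHERE checked_at = (
            SELECT MAX(checked_at) FROM crawler_logs WHERE bot = l.bot
        )
        GROUP BY status
        """
    ).fetchall()
    counts = dict(rows)
    return {
        "total": sum(counts.values()),
        "allowed": counts.get("allowed", 0),
        "blocked": counts.get("blocked", 0),
    }
```

Bots whose latest status is "unknown" count toward the total but toward neither Allowed nor Blocked. The sketch assumes at most one row per bot per timestamp.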
Pattern Types
| Pattern | Severity | Description |
|---|---|---|
| New Bot | Info | A previously unseen AI crawler was detected in policy files. |
| Blocked | Critical | A bot that was previously allowed is now blocked. This can directly impact your AI visibility. |
| Frequency Change | Info | A bot's access status changed (e.g., unblocked after being blocked). |
| Stopped | Warning | A bot that was actively crawling has not been seen in recent checks. |
Critical patterns trigger immediate alerts. You can acknowledge patterns to dismiss them from the active view once addressed.
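The pattern types above can be illustrated by diffing two status snapshots. This is a hedged sketch: the mapping from status changes to pattern types follows the table, but treating a bot that is missing from the latest check as "Stopped" is an assumption, and `Pattern`, `detect_patterns`, and `active_patterns` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    bot: str
    kind: str      # "New Bot", "Blocked", "Frequency Change", or "Stopped"
    severity: str  # "Info", "Warning", or "Critical"


def detect_patterns(previous: dict[str, str], current: dict[str, str]) -> list[Pattern]:
    """Compare two {bot: status} snapshots and return the detected patterns."""
    patterns: list[Pattern] = []
    for bot, status in current.items():
        before = previous.get(bot)
        if before is None:
            patterns.append(Pattern(bot, "New Bot", "Info"))
        elif before == "allowed" and status == "blocked":
            patterns.append(Pattern(bot, "Blocked", "Critical"))
        elif before != status:
            patterns.append(Pattern(bot, "Frequency Change", "Info"))
    # Assumption: a bot absent from the latest check is reported as "Stopped".
    for bot in sorted(previous.keys() - current.keys()):
        patterns.append(Pattern(bot, "Stopped", "Warning"))
    return patterns


def active_patterns(patterns: list[Pattern], acknowledged: set[Pattern]) -> list[Pattern]:
    """Drop patterns the user has already acknowledged from the active view."""
    return [p for p in patterns if p not in acknowledged]
```

For example, if GPTBot goes from "allowed" to "blocked" between two checks, `detect_patterns` emits a Critical "Blocked" pattern for it; once acknowledged, `active_patterns` no longer returns it.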
How to Use
- Navigate to Crawler Intelligence in the sidebar.
- Review the bot breakdown to confirm the right crawlers have access to your content.
- Investigate any critical patterns — a blocked bot means an AI model may stop referencing your content.
- Update your `robots.txt` or `ai.txt` if you need to change crawler access policies.
- Acknowledge patterns once you have reviewed and addressed them.
Best Practices
- Allow GPTBot, ClaudeBot, and Google-Extended at minimum. These crawlers belong to OpenAI, Anthropic, and Google, the companies behind ChatGPT, Claude, and Gemini.
- Review ai.txt in addition to robots.txt. Some organizations maintain separate AI-specific crawler policies.
- Monitor weekly for unexpected changes, especially after website deployments that may overwrite robots.txt.
- Set up alerts so you are notified immediately if a critical crawler gets blocked.
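To catch a deployment that overwrites `robots.txt`, you could run a quick check before release. The sketch below is an assumption about how such a check might look, not a product feature: it uses Python's standard `urllib.robotparser` to report which of the three recommended crawlers a given file blocks from the site root. `REQUIRED_BOTS`, `blocked_required_bots`, and the two sample policies are hypothetical.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Crawlers the Best Practices list recommends allowing at minimum.
REQUIRED_BOTS = ("GPTBot", "ClaudeBot", "Google-Extended")


def blocked_required_bots(robots_txt: str, path: str = "/") -> list[str]:
    """Return the required crawlers that this robots.txt blocks from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in REQUIRED_BOTS if not parser.can_fetch(bot, path)]


# Allows the three crawlers while keeping /admin/ off-limits to everyone else.
GOOD_POLICY = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /admin/
"""

# A typical accidental overwrite: a staging file that blocks every crawler.
STAGING_POLICY = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    print(blocked_required_bots(GOOD_POLICY))     # []
    print(blocked_required_bots(STAGING_POLICY))  # ['GPTBot', 'ClaudeBot', 'Google-Extended']
```

Failing the build whenever the returned list is non-empty would surface the problem before the weekly review or a Critical alert does.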
Plan Requirements
Crawler Intelligence is available on Pro-SME and above.