
Comments

Allow Google bot/crawler #390

Merged

ashutoshamazix merged 1 commit into staging from fix/allow-crawler on Aug 21, 2025
Conversation

@ashutoshamazix
Collaborator

@ashutoshamazix ashutoshamazix commented Aug 21, 2025

Description

  • Modified Robots.txt to allow Google bots

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Summary by CodeRabbit

  • Chores
    • Updated crawler rules to allow Googlebot, AdsBot-Google, and Google-InspectionTool to access the site.
    • Added a default block for all other user agents to reduce unwanted crawling.
    • Removed the previous blanket block that could prevent desired indexing.
    • Cleaned up outdated comments for clarity.
    • Impact: Improves Google indexing and ad verification while limiting non-Google crawlers, potentially enhancing performance and crawl budget.

@coderabbitai
Contributor

coderabbitai bot commented Aug 21, 2025

Walkthrough

Rewrote robots.txt to explicitly allow Googlebot, AdsBot-Google, and Google-InspectionTool while disallowing all other user agents. Removed the previous global block and an admonition comment. Added a catch-all disallow rule at the end.

Changes

Cohort / File(s): Robots policy (public/robots.txt)
Summary: Replaced the top-level disallow-all with per-agent rules: Googlebot, AdsBot-Google, and Google-InspectionTool are each allowed (via an empty Disallow), and a final User-agent: * with Disallow: / was added. The prior comment about AdsBot naming was removed. Net effect: AdsBot-Google changes from blocked to allowed, and Google-InspectionTool is newly allowed.
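For reference, here is a reconstruction of what the updated public/robots.txt likely looks like, pieced together from the walkthrough and the review diffs below (not the verbatim file contents):

```text
# Allow Googlebot
User-agent: Googlebot
Disallow:

# Allow AdsBot (for Google Ads landing page checks)
User-agent: AdsBot-Google
Disallow:

# Allow Google Search Console's live testing tool
User-agent: Google-InspectionTool
Disallow:

# Block everyone
User-agent: *
Disallow: /
```

Note that an empty `Disallow:` means "allow everything" for that agent; the review comments below suggest `Allow: /` as a more explicit equivalent.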

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I twitch my whiskers at sitemap scrolls,
Granting Google paws to reach their goals.
AdsBot hops through, Inspection too—
The meadow’s paths now clear and true.
For all else: hedges, gently tall—
A courteous “No peeking” wall. 🐇✨


Collaborator

@IAmKio IAmKio left a comment


LGTM

@ashutoshamazix ashutoshamazix merged commit 1845e1e into staging Aug 21, 2025
5 of 6 checks passed
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
public/robots.txt (5)

3-6: Use Allow: / for clarity; empty Disallow is valid but less explicit.

An empty Disallow is interpreted as “allow everything,” but being explicit improves readability, avoids lint warnings, and is consistent with common examples.

Apply this small edit:

-User-agent: Googlebot
-Disallow:
+User-agent: Googlebot
+Allow: /

7-10: Cover AdsBot variants that Google actually uses.

Google Ads landing page checks hit with multiple user-agent tokens (e.g., AdsBot-Google, AdsBot-Google-Mobile, AdsBot-Google-Mobile-Apps). If you intend to fully allow AdsBot, include these variants. Consider keeping AdsBot rules separate from general Googlebot rules so you can change them independently later.

Proposed adjustment:

-# Allow AdsBot (for Google Ads landing page checks)
-User-agent: AdsBot-Google
-Disallow:
+# Allow AdsBot (for Google Ads landing page checks)
+User-agent: AdsBot-Google
+Allow: /
+
+# Also allow AdsBot mobile variants
+User-agent: AdsBot-Google-Mobile
+Allow: /
+
+User-agent: AdsBot-Google-Mobile-Apps
+Allow: /

Optional (only if you run AdSense): also allow the AdSense crawler:

+User-agent: Mediapartners-Google
+Allow: /

11-14: LGTM; consider grouping with Googlebot and making the rule explicit.

Allowing Google-InspectionTool is helpful for Search Console tests. You can either keep it separate or group multiple Google UAs that share the same rule to reduce duplication. Also prefer “Allow: /” for explicitness.

Minimal tweak:

-User-agent: Google-InspectionTool
-Disallow:
+User-agent: Google-InspectionTool
+Allow: /

Optional grouping (single group for identical rules):

-User-agent: Googlebot
-Allow: /
-
-User-agent: Google-InspectionTool
-Allow: /
+User-agent: Googlebot
+User-agent: Google-InspectionTool
+Allow: /

If you want to future-proof for other Google-operated crawlers, you may also allow “GoogleOther”:

+User-agent: GoogleOther
+Allow: /

15-17: Catch‑all Disallow will block non‑Google bots, including social/link preview crawlers—verify intent (especially on staging).

With this rule, Bing, DuckDuckBot, Applebot, and social preview fetchers (facebookexternalhit/Facebot, Twitterbot, LinkedInBot, Slackbot, etc.) won’t fetch your pages. That can:

  • Break/unfurl link previews in social apps and chat.
  • Prevent non-Google search engines from indexing/crawling.
  • Interfere with uptime monitors and analytics crawlers that respect robots.

If “Google-only” is truly intended, keep it. If not, allow the common preview bots while still blocking everyone else.

Alternative that preserves “Google-only” crawling for search but enables previews:

+# Allow common social/link preview bots
+User-agent: facebookexternalhit
+User-agent: Facebot
+User-agent: Twitterbot
+User-agent: LinkedInBot
+User-agent: Slackbot
+User-agent: TelegramBot
+User-agent: WhatsApp
+Allow: /
+
 # Block everyone
 User-agent: *
 Disallow: /

Environment note: This PR targets the “staging” branch. If staging is publicly reachable, are we OK with Google crawling it? Many teams block staging (to avoid duplicate content) and allow only production. If needed, I can help set up environment-specific robots.txt (e.g., a dynamic robots endpoint or separate artifacts per deploy target).
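The dynamic robots endpoint mentioned above could be sketched as follows. This is a hypothetical illustration, not code from this PR: the function name `robotsTxt`, the `"production"` environment value, and the Express wiring in the usage note are all assumptions.

```javascript
// Sketch of an environment-aware robots.txt generator.
// Non-production deploys (e.g. staging) block all crawlers to avoid
// duplicate-content indexing; production mirrors the rules from this PR.
function robotsTxt(env) {
  if (env !== "production") {
    // Block everything on staging/preview environments.
    return "User-agent: *\nDisallow: /\n";
  }
  // Production: allow the Google crawlers, block everyone else.
  return [
    "User-agent: Googlebot",
    "Allow: /",
    "",
    "User-agent: AdsBot-Google",
    "Allow: /",
    "",
    "User-agent: Google-InspectionTool",
    "Allow: /",
    "",
    "User-agent: *",
    "Disallow: /",
    "",
  ].join("\n");
}
```

In an Express app this might be wired up as `app.get("/robots.txt", (req, res) => res.type("text/plain").send(robotsTxt(process.env.APP_ENV)))`, where `APP_ENV` is a hypothetical environment variable set per deploy target.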


3-17: Add a Sitemap directive to help crawlers discover content.

Including a Sitemap line is harmless even when access is restricted and helps Google find URLs more reliably.

Add at the top (replace with the correct absolute URL):

+# Sitemap
+Sitemap: https://<your-domain>/sitemap.xml
+
 # Allow Googlebot
 User-agent: Googlebot
 Allow: /

If you don’t have a sitemap, I can generate one or wire this to your build pipeline.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f2cf96c and e90a7b5.

📒 Files selected for processing (1)
  • public/robots.txt (1 hunks)
🧰 Additional context used
