google search console couldn't fetch sitemap but index is ok #417

Open
MiroBabic opened this issue Nov 6, 2022 · 7 comments

@MiroBabic

I get "Couldn't fetch" from Google Search Console (GSC) for the individual sitemap files, but the sitemap index is read fine. When I run my sitemaps through any validator, they pass.

This is my sitemap.rb:

SitemapGenerator::Sitemap.default_host = "https://www.xxxxx.xxxxx"
SitemapGenerator::Sitemap.compress = false
SitemapGenerator::Sitemap.create_index = true

SitemapGenerator::Sitemap.create(:max_sitemap_links=>45000) do

  [:sk, :en].each do |locale|
    add root_path.to_s + locale.to_s
    add "/#{locale}" + list_cities_path
    add "/#{locale}" + invoices_path
    add "/#{locale}" + orders_path
    add "/#{locale}" + contracts_path
    add "/#{locale}" + contractors_path
   
    
    City.find_each do |city|
      add "/#{locale}/city/#{city.slug_url}", :lastmod => city.updated_at
    end

    Contractor.find_each do |contractor|
      add "/#{locale}/contractors/#{contractor.id}", :lastmod => contractor.updated_at
    end

    Invoice.find_each do |invoice|
      add "/#{locale}/invoices/#{invoice.id}", :lastmod => invoice.updated_at
    end

    Order.find_each do |order|
      add "/#{locale}/orders/#{order.id}", :lastmod => order.updated_at
    end

    Contract.find_each do |contract|
      add "/#{locale}/contracts/#{contract.id}", :lastmod => contract.updated_at
    end

  end

  # Put links creation logic here.
  #
  # The root path '/' and sitemap index file are added automatically for you.
  # Links are added to the Sitemap in the order they are specified.
  #
  # Usage: add(path, options={})
  #        (default options are used if you don't specify)
  #
  # Defaults: :priority => 0.5, :changefreq => 'weekly',
  #           :lastmod => Time.now, :host => default_host
  #
  # Examples:
  #
  # Add '/articles'
  #
  #   add articles_path, :priority => 0.7, :changefreq => 'daily'
  #
  # Add all articles:
  #
  #   Article.find_each do |article|
  #     add article_path(article), :lastmod => article.updated_at
  #   end
end

What can I do to make it readable for Google?

@kjvarga
Owner

kjvarga commented Nov 7, 2022 via email

@MiroBabic
Author

Yes, when I copy the link for a specific sitemap from the sitemap index and open it, it works fine.
robots.txt is in the root folder of the site, and so are the sitemaps.

You can check everything here; all the links work (robots.txt, sitemaps):
https://www.openstats.city/robots.txt
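
One way to double-check what a crawler is told is to read the `Sitemap:` lines out of robots.txt and compare them with the sitemap index URL submitted in Search Console. A minimal Ruby sketch; the helper name `sitemap_urls_from_robots` and the sample text are hypothetical and are not taken from the site above:

```ruby
# Hypothetical helper: return the URLs declared in "Sitemap:" lines
# of a robots.txt body (the directive name is matched case-insensitively).
def sitemap_urls_from_robots(robots_txt)
  robots_txt.each_line.filter_map do |line|
    match = line.match(/\A\s*sitemap:\s*(\S+)/i)
    match && match[1]
  end
end

# Made-up robots.txt content, for illustration only.
sample = <<~ROBOTS
  User-agent: *
  Allow: /
  Sitemap: https://www.example.com/sitemap.xml
ROBOTS

puts sitemap_urls_from_robots(sample)
```

To run it against a live site, the body could be fetched with `Net::HTTP.get(URI(url))` from `net/http` and passed to the helper.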

@kjvarga
Owner

kjvarga commented Nov 8, 2022

Yes I'm able to access all the links fine as well. So I'm not sure what the issue is. The sitemaps look fine.

Please be aware that all your Invoice, Contract and Order data is publicly accessible! E.g. https://www.openstats.city/en/invoices. You should secure that ASAP to prevent PII exposure, or worse. At the very least, someone could use that data to phish those users and get them to click malicious links, knowing details of their interactions with your site.

@MiroBabic
Author

@kjvarga thanks, but that's OK: it is public data (open data) and should be accessible to anybody who wants to check where the money flows. It all comes from an open data initiative to improve transparency about how public money is spent.

@kjvarga
Owner

kjvarga commented Nov 8, 2022

haha oops I was worried!

@elcuervo

I think I saw this behavior as well. I re-ran ping_search_engines with the sitemap index and eventually the sitemaps got marked correctly.
In our scenario I think it was related to the fact that our generated sitemap was huge, and our CDN probably throttled some of the bot IPs.
Also, if you are redirecting to S3/Google via the app, there might have been app errors.
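
To look for throttling or app errors of the kind described above, each sitemap listed in the index can be requested with a crawler-like User-Agent and the status codes inspected. The following is only a rough sketch: `sitemap_locs` and `status_for` are hypothetical helper names, the User-Agent string is an assumption, and a CDN that throttles by IP address will not necessarily treat this request the way it treats the real bot.

```ruby
require "net/http"
require "uri"

# Assumed crawler-like User-Agent string, for illustration only.
BOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Pull every <loc> value out of a sitemap index document.
# A regex keeps the sketch dependency-free; a real XML parser is more robust.
def sitemap_locs(index_xml)
  index_xml.scan(%r{<loc>\s*(.*?)\s*</loc>}m).flatten
end

# Send a HEAD request and return the HTTP status code as a string.
def status_for(url, user_agent: BOT_UA)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri, { "User-Agent" => user_agent }).code
  end
end

# Only touches the network when an index URL is passed on the command line.
if ARGV[0]
  index_xml = Net::HTTP.get(URI(ARGV[0]))
  sitemap_locs(index_xml).each do |loc|
    puts "#{status_for(loc)} #{loc}"
  end
end
```

Saved under any file name and run with the sitemap index URL as its only argument, it prints one status code per listed sitemap; anything other than 200 (for example 429 or a redirect into an error) would be worth a closer look.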

@MiroBabic
Author

I see something strange in my Rails app log.
It looks like Google is requesting the .gz version of the sitemap even though the index links to the non-gzipped version:

I, [2022-11-15T19:48:10.902295 #1576179]  INFO -- : [1671072e-6015-44ad-aa6c-914bc73f0917] Started GET "/sitemap12.xml.gz" for 66.249.75.241 at 2022-11-15 19:48:10 +0000
[1671072e-6015-44ad-aa6c-914bc73f0917] ActionController::RoutingError (No route matches [GET] "/sitemap12.xml.gz"):
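
To see how widespread this mismatch is, the sitemap paths requested in the log can be compared with the paths listed in the sitemap index. A small sketch, assuming the Rails log format shown above; the helper names and the `index_locs` value are hypothetical (the real list would come from the `<loc>` entries of the generated index):

```ruby
require "uri"

# Matches the "Started GET" lines Rails writes for sitemap requests.
REQUEST_PATTERN = %r{Started GET "(/sitemap[^"]*)" for (\S+)}

# Return the distinct sitemap paths requested in a chunk of log text.
def requested_sitemap_paths(log_text)
  log_text.scan(REQUEST_PATTERN).map(&:first).uniq
end

# Return the requested paths that the sitemap index does not list.
def unlisted_paths(requested, index_locs)
  listed = index_locs.map { |loc| URI(loc).path }
  requested - listed
end

# The log line is taken from the excerpt above.
log = 'Started GET "/sitemap12.xml.gz" for 66.249.75.241 at 2022-11-15 19:48:10 +0000'
# Hypothetical index entries; only the path part matters here.
index_locs = ["https://www.example.com/sitemap12.xml"]

p unlisted_paths(requested_sitemap_paths(log), index_locs)
```

Any path this prints is one the crawler asked for that the current index does not advertise, which helps separate stale or guessed URLs from problems with the files that are actually listed.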
