Author Topic: robot.txt logging into RCForb Server here  (Read 1458 times)

Offline kf1p

  • Just starting...
  • *
  • Posts: 9
robot.txt logging into RCForb Server here
« on: October 19, 2017, 11:51:36 am »
On occasion I see someone or something log in as robot.txt in the "Console" view.
Is this a spambot or other such garbage?
What is this and how do I block it?

Please be specific.   OS is WIN 10

Please keep the instructions simple, so those of us who are not computer gurus can implement them.

Thank you in advance for a solution.
Bob Partain
KF1P

Offline w8rj

  • Moderator
  • Remote Master
  • *****
  • Posts: 2402
Re: robot.txt logging into RCForb Server here
« Reply #1 on: October 19, 2017, 01:58:32 pm »
Not sure what would be going on. When you see it again, would you copy and paste the console output into a text file and attach it here?

Left-click and drag the mouse to highlight from a few lines before the robot.txt entry to a few lines after it. Then Ctrl-C will copy it. Open Notepad, press Ctrl-V, save the file, and attach it to your reply.
73
Roger
W8RJ

Offline m3ghe

  • Moderator
  • Remote Master
  • *****
  • Posts: 760
    • Barney's online radios
Re: robot.txt logging into RCForb Server here
« Reply #2 on: October 21, 2017, 07:05:21 am »
Either someone has robot.txt as a username (no one does) or, more likely, a connection is requesting a file called robots.txt. When search engines like Google automatically index a site, they first look for any exclusions in its robots.txt file.
If you want to test this theory, connect to your server through a browser using the format http://yourserverurl:4525/robots.txt/ where yourserverurl:4525 is the actual IP address and port of your server.
« Last Edit: October 21, 2017, 07:44:57 am by m3ghe »
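
For anyone who would rather script the browser test above, here is a minimal sketch in Python. The helper names robots_url and fetch_robots are made up for illustration, port 4525 is taken from the post above, and the path has no trailing slash because that is what a crawler requests. It is not known how an RCForb server answers this request, so the sketch simply returns whatever comes back or a short failure message.

```python
import http.client
from urllib.error import URLError
from urllib.request import urlopen


def robots_url(host: str, port: int = 4525) -> str:
    """Build the robots.txt URL for a server at host:port (hypothetical helper)."""
    return f"http://{host}:{port}/robots.txt"


def fetch_robots(host: str, port: int = 4525, timeout: float = 5.0) -> str:
    """Request robots.txt and return the body, or a short failure message."""
    try:
        with urlopen(robots_url(host, port), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (URLError, http.client.HTTPException, OSError) as exc:
        # Covers refused connections, timeouts, and replies that are not valid HTTP.
        return f"request failed: {exc}"
```

Calling fetch_robots("yourserverurl") with your own address should make a BrowserRequest line show up in the server console, the same way a crawler's request does.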

Offline kf1p

  • Just starting...
  • *
  • Posts: 9
Re: robot.txt logging into RCForb Server here
« Reply #3 on: October 23, 2017, 04:15:18 pm »
It looks like one of those web crawlers attempting to "log in" and see what is here.

I will copy and paste and send it if it happens again.

Thank You for the two replies....
Bob Partain
KF1P

Offline kf1p

  • Just starting...
  • *
  • Posts: 9
Re: robot.txt logging into RCForb Server here
« Reply #4 on: December 08, 2017, 08:57:13 am »
12/8/2017 11:17:31 AM | Synchronized with RemoteHams.com
12/8/2017 11:27:35 AM | Synchronized with RemoteHams.com
12/8/2017 11:31:15 AM | Info: New Connection->66.249.64.206:45913 (Guest-52)
12/8/2017 11:31:15 AM | Info: BrowserRequest: GET /robots.txt HTTP/1.1
12/8/2017 11:31:15 AM | Info: Client closed connection.
12/8/2017 11:31:15 AM | Info: Disconnected->66.249.64.206:45913 (Guest-52)
12/8/2017 11:31:15 AM | Info: New Connection->66.249.64.206:46091 (Guest-96)
12/8/2017 11:31:15 AM | Info: BrowserRequest: GET /json.js?callback=jsonpORBCallback&_=1507507200000 HTTP/1.1
12/8/2017 11:31:15 AM | Info: Client closed connection.
12/8/2017 11:31:15 AM | Info: Disconnected->66.249.64.206:46091 (Guest-96)
12/8/2017 11:37:14 AM | Info: uPnP Requesting Router to Forward Ports
12/8/2017 11:37:14 AM | Info: Port Forwarded: 4525
12/8/2017 11:37:14 AM | Info: Port Forwarded: 4524

Here is the latest attempt.
Bob Partain
KF1P
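
The console excerpt above can be scanned for these requests automatically. The following is a rough Python sketch, not part of RCForb: the function name browser_requests is made up, the line formats are inferred only from the excerpt, and it assumes each BrowserRequest line belongs to the most recent New Connection line, which holds in this sample but may not when several clients connect at once.

```python
import re

# Patterns inferred from the console excerpt in this thread (an assumption).
CONNECTION = re.compile(r"New Connection->(\d+\.\d+\.\d+\.\d+):\d+")
REQUEST = re.compile(r"BrowserRequest: (\w+) (\S+)")

SAMPLE = """\
12/8/2017 11:31:15 AM | Info: New Connection->66.249.64.206:45913 (Guest-52)
12/8/2017 11:31:15 AM | Info: BrowserRequest: GET /robots.txt HTTP/1.1
12/8/2017 11:31:15 AM | Info: Client closed connection.
12/8/2017 11:31:15 AM | Info: Disconnected->66.249.64.206:45913 (Guest-52)
12/8/2017 11:31:15 AM | Info: New Connection->66.249.64.206:46091 (Guest-96)
12/8/2017 11:31:15 AM | Info: BrowserRequest: GET /json.js?callback=jsonpORBCallback&_=1507507200000 HTTP/1.1
"""


def browser_requests(log_text: str) -> list:
    """Return (ip, method, path) for every BrowserRequest line in the log."""
    results = []
    last_ip = None
    for line in log_text.splitlines():
        connection = CONNECTION.search(line)
        if connection:
            last_ip = connection.group(1)  # remember who connected most recently
            continue
        request = REQUEST.search(line)
        if request:
            results.append((last_ip, request.group(1), request.group(2)))
    return results


for ip, method, path in browser_requests(SAMPLE):
    print(ip, method, path)
```

Lines with a BrowserRequest entry are plain web requests rather than radio client logins, which is why they show up as short-lived Guest connections.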

Offline w8rj

  • Moderator
  • Remote Master
  • *****
  • Posts: 2402
Re: robot.txt logging into RCForb Server here
« Reply #5 on: December 08, 2017, 09:31:39 am »
Somebody clicked on your remote in the online remote list on the website.
73
Roger
W8RJ

Offline ny4i

  • Remote Enthusiast
  • ***
  • Posts: 32
Re: robot.txt logging into RCForb Server here
« Reply #6 on: December 09, 2017, 08:31:06 am »
I did a traceroute 66.249.64.206 (tracert 66.249.64.206 on Windows)

The result confirms this is one of the Google crawlers: crawl-66-249-64-206.googlebot.com (66.249.64.206)  63.543 ms  65.759 ms  58.995 ms
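
The same check can be done without traceroute by looking up the reverse DNS name and then confirming that the name resolves back to the same address. Below is a minimal Python sketch of that idea. The function names are made up for illustration, and the list of accepted domain suffixes is an assumption based on the hostname shown above and Google's published guidance on verifying its crawlers.

```python
import socket

# Accepted hostname endings (an assumption; check Google's current guidance).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def looks_like_googlebot(hostname: str) -> bool:
    """True if the hostname ends in one of the accepted Google domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def verify_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the name, then forward-resolve it again."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        return False  # no reverse DNS record or lookup failure
    if not looks_like_googlebot(hostname):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    # The forward lookup must lead back to the address we started with.
    return ip in forward_ips
```

Calling verify_googlebot("66.249.64.206") needs a working DNS connection. The forward lookup matters because anyone can set a reverse DNS name on an address they control, so the name alone is not proof.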

The Google crawler will hit every link on the remotehams website, including every link to the online remote sites. One way to prevent this (unless there is a valid reason to have the online remotes searchable) would be for whoever manages remotehams.com to add Disallow: /online.html to the robots.txt file.

Since the server headers of remotehams.com report Server: Apache/2.2.14 (Ubuntu), an example robots.txt is below:

User-agent: *
Disallow: /online.html
Disallow: /orb.html

Please note there is currently a robots.txt file on remotehams.com, but it contains the following:

User-agent: *
Disallow:


If the intent was to stop all crawling, the rule needs to be Disallow: / instead. As it stands now, it is essentially a no-op, since Disallow: with no path blocks nothing.
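
The difference between these rule sets can be checked with the robots.txt parser in Python's standard library. This is only a sketch: the helper name allowed and the test paths are chosen for illustration, and Python's parser is one implementation of the rules, not Google's own, though for rules this simple the two should agree.

```python
from urllib.robotparser import RobotFileParser

CURRENT = "User-agent: *\nDisallow:\n"      # what the site serves now
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # stops all crawling
PROPOSED = (
    "User-agent: *\n"
    "Disallow: /online.html\n"
    "Disallow: /orb.html\n"
)


def allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets the agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


for name, text in (("current", CURRENT), ("block all", BLOCK_ALL), ("proposed", PROPOSED)):
    for path in ("/index.html", "/online.html", "/orb.html"):
        print(f"{name:10} {path:13} allowed={allowed(text, path)}")
```

With the current file every path comes back as allowed, which matches the no-op behaviour described above. The proposed file blocks only the two listed pages and leaves the rest of the site open to crawlers.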

If you want to see all the pages that are in Google, simply go to Google and enter remote site:remotehams.com and you will see many pages (including the online remote list).

Tom NY4I

Offline m3ghe

  • Moderator
  • Remote Master
  • *****
  • Posts: 760
    • Barney's online radios
Re: robot.txt logging into RCForb Server here
« Reply #7 on: December 09, 2017, 10:17:20 am »
I think I have said this before: it is an automated call from a search engine for the robots.txt file. If it finds the file and reads the content, the search robot should leave. If, however, there is no robots.txt file, you will see an error. Whatever it is, it is not a problem.
Without this activity the remote servers would not be indexed by the search engines, so no one new would find them.
« Last Edit: December 09, 2017, 10:41:51 am by m3ghe »

Offline ny4i

  • Remote Enthusiast
  • ***
  • Posts: 32
Re: robot.txt logging into RCForb Server here
« Reply #8 on: December 09, 2017, 12:32:58 pm »
I read that and understood it. I do understand it is an automated hit from the Google crawler.

My point was to drill a bit deeper into how robots.txt was interacting with the remotes from my observation. It is entirely possible to set the robots.txt file on the remotehams.com domain to allow finding the remote on the /online.html page if that is desired. But if you want to avoid having Google hit the remotes themselves, then a Disallow to /orb.html would do that.

User-agent: *
Disallow: /orb.html

That applies if the goal is to allow the online.html page to be indexed but prevent Google from accessing the Orb itself. At a minimum, adding this one rule to robots.txt will stop this question from coming up in the future, as remote servers would no longer see the hit from the crawler.

Tom NY4I