Monday, January 9, 2017

Update of the Great Bot Hunter for January 9, 2017

The .htaccess file has been changed to remove blocks on members who have reported in, and to add new blocks on a number of new sites from Eastern Europe and Russia.

I have not repeated the list of blocks here, but if anyone wants it, just ask. If you do not have access, please remember to send me an e-mail at:

cefmatrix@gmail.com

Two other major steps were completed today:

  1. The posts on the MAIN FORUM that deal with this issue have been moved to a MEMBERS ONLY section of the web site. That means that if you are not a member you do not even know that area of the forum exists.
  2. The CEFSG WIKI has been under increased attacks since the FORUM blocks were put in place, so that is the one that is now PASSWORD protected. You need to enter both the correct USER NAME and PASSWORD to get into the Wiki. It has now been accessed by a member so I can confirm it is working - yahoo! That is not a simple process. Details on how to gain access are provided in the new MEMBERS ONLY section, as we don't want to publish that here on an open forum.

The size of the log file was significantly reduced today, which I take as a good sign. The really BAD BOT "SEMrushBOT" is not going down without a fight, as it made 18,746 attempts to get into the site yesterday - all of which were successfully blocked. There were 36,582 entries in total, many of which were Google and Bing bots that are now rejected as well. Basically, if you are a BOT then "BUG OFF" for now and we will reconsider you later. GOOGLE and BING are "good bots" but for now we have to keep everyone away until we get this under control.

Lastly, we have had some very generous donations to the RESERVE FUND so I extend my thanks to all involved. As requested at the start of the DONATIONS program, the names of the donors are not identified publicly. Each donor "should" be receiving an automated confirmation e-mail once a donation is received.

Hopefully all of this will come to an end soon; it has been an annoyance, to say the least!

Richard

Sunday, January 8, 2017

Changes for January 8, 2017

The .htaccess file has been changed to include these items:

# from Register.ca for Semrushbot 46.229. and then RVL added many from log of 6 January 2017 using Excel spreadsheet to sort into large groups
# bingbot is 157.55.39.104 and then added just 157.55.39. as many others used; 207.46.13.10
# mj12bot found in large group from 163.172.68.136 - on Sunday 8th made the block on 163.172.68. to catch all the rest (drop the .68.136 at end); also 46.4.32.75
# Domain Re-Animator Bot 167.114.156.198
# BoogleBot is 52.90.230.103 - very large amount on 6 January 2017 from IP in Seattle USA
# profound.net/domainappender trying to read .htaccess file 54.147.153.234
# downloading wiki from London UK 62.210.148.247; France 91.121.97.49
# large downloads from this Windsor IP so temp block 70.51.99.113
# blocked any large groups from China, Russia, etc. also large group from Germany - Russia 5.255.250.66; Kiev 82.193.100.107; 91.200.
# Amsterdam 46.229.
# Sunday 8th added a large block of Eastern Europe all in the 178. range; China 180.
# Sunday a very large download in Edmonton from 50.71.222.81 - system files
# list Google sites here: 66.249.73.142 - that did stop earlier as 66. was blocked, block 66.249 in case need to open 66.
# list Yahoo sites here: 68.180.228.157
# xpymep.exe at 146.158.83.16; 90.154.70.102
# THESE ONES HAVE BEEN OPENED AFTER THE INITIAL BLOCK WAS PUT IN PLACE
# 67 (Ted Walshe, Montreal & Neil Burns); 69 (Diane Johnson Brantford - changed Sunday new IP); 124 (Mark, Australia)

<RequireAll>
Require all granted
Require not ip 10.8.163.19
Require not ip 10.8.174.151
Require not ip 104.140.
Require not ip 2.
Require not ip 5.196.167.230
Require not ip 5.9.94.207
Require not ip 5.255.250.66
Require not ip 10.
Require not ip 10.8.
Require not ip 11.
Require not ip 12.
Require not ip 13.
Require not ip 14.
Require not ip 15.
Require not ip 16.
Require not ip 17.
Require not ip 18.
Require not ip 19.
Require not ip 46.
Require not ip 46.229.
Require not ip 50.71.222.81
Require not ip 52.90.230.103
Require not ip 54.147.153.234
Require not ip 61.
Require not ip 62.
Require not ip 62.210.148.247
Require not ip 63.
Require not ip 64.
Require not ip 65.
Require not ip 66.
Require not ip 66.249.
Require not ip 66.249.73.142
Require not ip 68.
Require not ip 68.180.
Require not ip 70.51.99.113
Require not ip 77.248.252.113
Require not ip 82.193.
Require not ip 90.154.70.102
Require not ip 91.121.
Require not ip 91.200.
Require not ip 104.
Require not ip 104.140.
Require not ip 106.
Require not ip 107.
Require not ip 108.
Require not ip 109.
Require not ip 110.
Require not ip 111.
Require not ip 112.
Require not ip 113.
Require not ip 114.
Require not ip 115.
Require not ip 116.
Require not ip 117.
Require not ip 118.
Require not ip 119.
Require not ip 120.
Require not ip 121.
Require not ip 122.
Require not ip 123.
Require not ip 125.
Require not ip 126.
Require not ip 127.
Require not ip 128.
Require not ip 129.
Require not ip 133.130
Require not ip 144.
Require not ip 146.
Require not ip 151.237.
Require not ip 151.80.
Require not ip 154.20.7.7
Require not ip 157.55.39.104
Require not ip 157.55.39.
Require not ip 163.172.
Require not ip 165.231.
Require not ip 167.114.156.198
Require not ip 173.234.159.250
Require not ip 178.
Require not ip 180.
Require not ip 207.46.13.10
Require not ip 207.81.234.223
</RequireAll>
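
The notes at the top of this list mention sorting the log into large groups with an Excel spreadsheet. For anyone who would rather script that step, here is a minimal sketch of the same idea: count addresses by their leading numbers so the busiest ranges stand out. The function name and the sample list are made up for illustration (two of the addresses come from the notes above, the rest are invented), and a real log would first need its IP column extracted.

```python
from collections import Counter


def group_by_prefix(ips, octets=3):
    """Count addresses by their first `octets` numbers, e.g. '163.172.68.'."""
    counts = Counter()
    for ip in ips:
        parts = ip.strip().split(".")
        if len(parts) != 4:
            continue  # skip anything that is not a dotted IPv4 address
        counts[".".join(parts[:octets]) + "."] += 1
    return counts


# Illustrative sample only, not real log data.
sample_ips = [
    "163.172.68.136", "163.172.68.140", "163.172.68.201",
    "157.55.39.104", "157.55.39.12",
    "70.51.99.113",
]

groups = group_by_prefix(sample_ips)
for prefix, hits in groups.most_common():
    print(prefix, hits)
```

A range that shows up many times is a candidate for a short rule such as "163.172.68.", while a single address is better left as a full four-number entry.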


As well, all of these bots are blocked by name. The first entry is supposed to block all bots with the "*", but many seem to get around this process:

User-agent: *
Disallow: /
User-agent: mj12bot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: SemrushBot-SA
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: Slurp
Disallow: /
User-agent: ChinasoSpider
Disallow: /
User-agent: ToutiaoSpider
Disallow: /
User-agent: Yahoo Slurp
Disallow: /
User-agent: YahooCacheSystem
Disallow: /
User-agent: Sosospider
Disallow: /
User-agent: bingbot
Disallow: /
User-agent: EasouSpider
Disallow: /
User-agent: JikeSpider
Disallow: /
User-agent: YYSpider
Disallow: /
User-Agent: YoudaoBot
Disallow: /
User-Agent:360Spider
Disallow:/
User-Agent:360Spider-Image
Disallow:/
User-Agent: sogou spider
Disallow: /
User-Agent: Sogou web spider
Disallow: /
User-Agent: MJ12bot
Disallow:/
user-agent: AhrefsBot
disallow: /
user-agent: YRSpider
disallow: /
user-agent: 360spider-image
disallow: /
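
The User-agent / Disallow lines above use robots.txt syntax, which is a request and not a lock: a crawler has to read the rules and decide to obey them. As a rough sketch of what a well-behaved crawler does with the first two lines, here is the check done with Python's standard `urllib.robotparser`. The path "/index.php" is just an example, not a claim about the real site layout.

```python
from urllib.robotparser import RobotFileParser

# The first record from the list above.
rules = """\
User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A polite crawler asks this question before every request.
allowed = parser.can_fetch("SemrushBot", "/index.php")
print(allowed)  # False: the rules ask every bot to stay out
```

A bot that never asks the question is not affected at all, which fits with many of them getting around the "*" entry.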

Saturday, January 7, 2017

SYSTEM CHANGES: Saturday January 7, 2017 $$$$

Unfortunately the BOTS have not only continued to ATTACK but they are growing in number.

Changes made to date have significantly reduced our "compute cycles" but we are still way over the limit and the costs are increasing daily. You can see the effect of the changes made to date in this graph:

https://www.mediafire.com/convkey/87ef/i1r78skw21rv1kb6g.jpg



Our compute cycles have dropped from 2500 a day to less than 600 a day and our bandwidth has dropped from 4000 to less than 800. That is from blocking some of the worst offenders.

The PROBLEM is that we are only allowed 3000 compute cycles for the WHOLE MONTH period, which runs from the 22nd of Month A to 21st of Month B. Additional charges for the period 16 November to 15 December were $360.26 so there will be another large charge for December-January. We normally pay less than that for the whole year!
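
To put those numbers in perspective, here is a rough sketch of the arithmetic. It assumes usage stays flat at the daily figures quoted above and that the billing period is 30 days long (the period length is an assumption, and "less than 600 a day" is treated as 600).

```python
MONTHLY_ALLOWANCE = 3000  # compute cycles allowed per billing period (from the post)
DAYS_IN_PERIOD = 30       # assumption: roughly the 22nd of one month to the 21st of the next


def days_until_exhausted(allowance, per_day):
    """How many days the allowance lasts at a steady daily rate."""
    return allowance / per_day


def projected_usage(per_day, days=DAYS_IN_PERIOD):
    """Total cycles used over the period at a steady daily rate."""
    return per_day * days


before = days_until_exhausted(MONTHLY_ALLOWANCE, 2500)
after = days_until_exhausted(MONTHLY_ALLOWANCE, 600)
overage_factor = projected_usage(600) / MONTHLY_ALLOWANCE

print(f"At 2500 a day the allowance lasts {before:.1f} days")
print(f"At 600 a day the allowance lasts {after:.1f} days")
print(f"At 600 a day the period total is {overage_factor:.0f} times the allowance")
```

So even after the drop, the allowance would be gone in about five days, and the daily rate would have to fall to about 100 to stay within 3000 for the period.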

A few of you know that I was making changes to increase the BOT BLOCKAGE today, as I made a typo and the server crashed. That was quickly fixed, and from that point on I tested all the files in a blank directory before setting them to load on the site here. That work is all done and the files are going to be loaded shortly. I do check to see that there are not too many people on the system when I make that change, but unfortunately I cannot account for someone who is logging on at the exact same time the files are uploading.

The new .htaccess file that carries all the new blocking code will be blocking a large number of IP addresses and ranges of IP addresses associated with what appear to be the main offenders. We have tried other less severe approaches and the BOTS are bypassing them easily. Neil found a way to just say "NO BOTS PLEASE" and they continued the attack. It could be it takes a few days but at the rate the costs are increasing we have to take every step possible.

IF YOU FIND that you are blocked, then I hope that you remembered to bookmark the site where the announcements are posted here: 

http://cefresearch.blogspot.ca/

as I will post a copy of this message at that location. I will put in the e-mail address for the MATRIX in case you don't have that, then you can send me a message with your I.P. address so I can exempt that block.

Find your IP address: http://whoisip.ovh

I will also provide a list of blocked addresses to date at that location. Doing that here would just let the BOTS know who is being blocked.

I also need you to tell me if any of you are using these IP addresses, as they appear somewhat normal and originate in the USA or Canada, but are associated with what appear to be large data downloads:

Halifax 142.177.247.89
Winnipeg 142.161.238.42
Ottawa 131.137.88.71
Melbourne Australia 124.191.103.90 (might be the Diggers)
Saskatoon 128.233.6.93
Redmond Kansas 131.253.24.140
Calgary 137.186.55.136
Tatamagouche (N.S. or P.E.I?) 142.134.66.130
Brantford 50.100.61.223
Edmonton 50.64.149.31 or 50.68.152.77
Windsor 70.51.99.113 (that is now blocked as it does not appear to be HUMAN - if you are, call home!)

I will post this now and come back at 3:00 pm EST to upload the new blocks. You might want to finish any posts you are working on now prior to that time, JUST IN CASE! It should work cleanly but better safe than sorry.

Stay tuned to the other announcement site over the next week or until further notice, as if this keeps up and the blocks DON'T WORK then we will probably need to block the site, or we will go broke.

I have already sent in a submission to a VPS PROVIDER (http://vpsville.ca/) to determine what is involved (and costs) to switch to a VPS (Virtual Private Server) system. I know nothing about those, other than what I have read, and I do not know if I would have the necessary skill set or time to operate such a system. I have not heard back from this as of this date. Reading tells me there are MANAGED VPS and STAND ALONE VPS and so if we can get a MANAGED system, it might be okay. Best is if we can stay where we are now if we can beat the botters to death!

Fingers crossed, now also crossing toes,

Richard

ADDED TO THIS BLOG POST ONLY

Here is a "section" of the file that is now on the system to do the blocking. If you see one that is a short version such as "Require not ip 124." that means that it is blocking any IP that starts with that series of numbers. If your IP has any of the front numbers on the list you have to let me know by e-mail to cefmatrix@gmail.com.

# from Register.ca for Semrushbot 46.229. and then RVL added many from log of 6 January 2017 using Excel spreadsheet to sort into large groups
# bingbot is 157.55.39.104 and then added just 157.55.39. as many others used
# mj12bot found in large group from 163.172.68.136
# Domain Re-Animator Bot 167.114.156.198
# BoogleBot is 52.90.230.103 - very large amount on 6 January 2017 from IP in Seattle USA
# profound.net/domainappender trying to read .htaccess file 54.147.153.234
# downloading wiki from London UK 62.210.148.247
# large downloads from this Windsor IP so temp block 70.51.99.113
# blocked any large groups from China, Russia, etc. also large group from Germany

<RequireAll>
Require all granted
Require not ip 46.229.
Require not ip 10.8.163.19
Require not ip 10.8.174.151
Require not ip 10.8.
Require not ip 104.140.
Require not ip 107.
Require not ip 108.
Require not ip 109.
Require not ip 110.
Require not ip 111.
Require not ip 112.
Require not ip 113.
Require not ip 114.
Require not ip 115.
Require not ip 116.
Require not ip 117.
Require not ip 118.
Require not ip 119.
Require not ip 12.106.
Require not ip 120.
Require not ip 121.
Require not ip 122.
Require not ip 123.
Require not ip 124.
Require not ip 125.
Require not ip 126.
Require not ip 127.
Require not ip 128.
Require not ip 129.
Require not ip 144.
Require not ip 146.
Require not ip 151.237.
Require not ip 151.80
Require not ip 157.55.39.104
Require not ip 157.55.39.
Require not ip 163.172.68.136
Require not ip 165.231.
Require not ip 167.114.156.198
Require not ip 17.
Require not ip 18.
Require not ip 5.196.167.230
Require not ip 5.9.94.207
Require not ip 52.90.230.103
Require not ip 54.147.153.234
Require not ip 61.
Require not ip 62.
Require not ip 62.210.148.247
Require not ip 63.
Require not ip 64.
Require not ip 65.
Require not ip 66.
Require not ip 67.
Require not ip 68.
Require not ip 69.
Require not ip 70.51.99.113
Require not ip 77.248.252.113
</RequireAll>
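
If you want to check your own address against the list without reading every line, the "starts with that series of numbers" rule can be sketched in a few lines of Python. This is plain text matching that mirrors how the short entries read, not a copy of how the server itself does the comparison, and the function name is made up for illustration.

```python
def is_blocked(ip, rules):
    """Return True if `ip` falls under any rule.

    A rule with four numbers is one exact address; a shorter rule
    (for example "124." or "151.80") covers every address that begins with it.
    """
    for rule in rules:
        octets = rule.rstrip(".").split(".")
        if len(octets) == 4:
            if ip == ".".join(octets):
                return True
        elif ip.startswith(".".join(octets) + "."):
            return True
    return False


# A few entries taken from the list above.
rules = ["124.", "12.106.", "151.80", "70.51.99.113"]

print(is_blocked("124.191.103.90", rules))  # True: the Melbourne address falls under "124."
print(is_blocked("142.177.247.89", rules))  # False: none of these four entries cover it
```

Keeping the trailing dot in the comparison matters: without it, a rule like "12." would also catch every address starting with 120 through 129.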


This may be duplication, but is also added to the .htaccess code as some can ignore the IP address blocks. These are the NAMES of the ones that I have found so far that are the main offenders:

# RVL added December 31, 2016 as Semrushbot is still invading the site
# found here: http://stackoverflow.com/questions/23631872/ban-robots-from-website
# they say "Also you can do this little trick, deny ANY ip address that has "SemrushBot" in user agent string"
Options +FollowSymlinks
RewriteEngine On
RewriteBase /
SetEnvIfNoCase User-Agent "^SemrushBot" bad_user
SetEnvIfNoCase User-Agent "^Slurp" bad_user
SetEnvIfNoCase User-Agent "^dotbot" bad_user
SetEnvIfNoCase User-Agent "^Googlebot" bad_user
SetEnvIfNoCase User-Agent "^bingbot" bad_user
SetEnvIfNoCase User-Agent "^mj12bot" bad_user
SetEnvIfNoCase User-Agent "^ToutiaoSpider" bad_user
SetEnvIfNoCase User-Agent "^YandexBot" bad_user
SetEnvIfNoCase User-Agent "^domainappender" bad_user
SetEnvIfNoCase User-Agent "^bogglebot" bad_user
SetEnvIfNoCase User-Agent "^WhateverElseBadUserAgentHere" bad_user
Deny from env=bad_user
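
One possible reason some bots slip past name rules like these: the "^" at the front of each pattern means the name has to be the very first thing in the User-Agent string. Many crawlers send a string that starts with "Mozilla/5.0 (compatible; ..." and only give their name later. Here is a small sketch of the difference, using Python's `re` module as a stand-in for the server's matching (simple patterns like these behave the same way in both, but this is an illustration and not a test of the server). The Googlebot string is the one Google publishes; the helper name `matches` is made up.

```python
import re


def matches(pattern, user_agent):
    """Case-insensitive search, like SetEnvIfNoCase."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

anchored = matches(r"^Googlebot", googlebot_ua)   # name must be at the very start
unanchored = matches(r"Googlebot", googlebot_ua)  # name may appear anywhere

print(anchored)    # False
print(unanchored)  # True
```

If the log shows a bot's name in the middle of its User-Agent string, dropping the "^" from that bot's pattern is worth trying on a test directory first.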

In the event that you understand any of this, then I can tell you that the following code was also added to CACHE the files on the site so they are not downloaded from the server every time someone visits a post:

# ADDED from http://www.fastcomet.com/tutorials/phpbb3/performance-optimization 28-12-2016
## EXPIRES CACHING ##

ExpiresActive On
ExpiresByType image/jpg "access plus 1 year"
ExpiresByType image/jpeg "access plus 1 year"
ExpiresByType image/gif "access plus 1 year"
ExpiresByType image/png "access plus 1 year"
ExpiresByType text/css "access plus 1 month"
ExpiresByType application/pdf "access plus 1 month"
ExpiresByType text/x-javascript "access plus 1 month"
ExpiresByType application/x-shockwave-flash "access plus 1 month"
ExpiresByType image/x-icon "access plus 1 year"
ExpiresDefault "access plus 2 days"

## EXPIRES CACHING ##

If you think I actually know what I am doing, think again! I go read about it on the web, try it out, and then sit back to see if it works. If anyone really knows how to do all this stuff, then you could be a big asset to the team! If you see a line of code that starts with a #, that means that line is just a note and is not read as code. There you will see I put in the URL of the information that I found on the web that tells me what I should do to fix the problem. Very often I have no idea what they are talking about so I have to learn that first.

Added later:

The file was uploaded at 3:02 pm EST and the site is still alive! Whew!

The blocks are now in effect.