VIDEO TRANSCRIPTION

How to Remove Referrer spam and Fake Traffic from Google Analytics

So, you might have seen some sources in your referral reports lately that don’t really send traffic to your website and simply spam your reports, which makes them hard to read and sometimes unreadable. In this video, we’re going to learn why this actually happens and what you can do about it to make your Google Analytics usable again. All that and more, coming up, right after this.
Hi there, and welcome to another video of measureschool.com, where we teach you the data-driven way of digital marketing. My name is Julian, and on this channel, we do marketing tech reviews, how-to videos, and tutorials, just like this one. So if you haven’t yet, consider subscribing. Now recently, you might have seen a surge of new referral spam entering your Google Analytics account. Unfortunately, it is already a long-standing problem, which the Google Analytics team wasn’t really able to address with its recent changes. Why not? Who knows? If there’s anybody out there from Google, please, we are still looking for a built-in tool to block this referral spam from entering our accounts. But for now, we can still use some tricks to clean up our tracking and make it useful again. So let’s review some of the most effective methods for 2017. We’ve got lots to cover, so let’s dive in.
Today our journey starts in our Google Analytics account here on our Demoshop, and this Demoshop doesn’t really exist. It’s actually a website that’s running on my local machine. So if you went to this website right now, you wouldn’t be able to see it, and my Google Analytics code wouldn’t even be executed. So the question is, why do I have some traffic in here? Well, some traffic is actually generated through the tests that I do here in the videos, but there’s also a lot of other traffic that is coming in all the time. So how did this happen? Let’s look into our Acquisition reports under “All traffic,” and let’s look into “Source/medium.” Here we see that the bulk of the traffic is referral traffic, and this is actually referral spam, because I don’t have real users coming to this website. What the spammer wants is to make me check out his website, to scam me or get me to download a virus, so you’d better never go to these websites. Unfortunately, we also see some referrers in here that belong to legitimate sites, such as reddit.com, but again, I never received this traffic. These are not real people, because this website is not even online. So why is this in here? Well, there’s another technique that they use nowadays. If you go here to the “Audience Overview” and scroll down, you can see that they’re also spamming the language dimension within Google Analytics to try to get our attention. So the question is, how do they actually do this?
Well, some of these techniques involve building a little piece of software that will automatically go to your site and then execute your Google Analytics code. They do this with a spoofed referrer. So for example, here, I’m on the website of spam.com and I have changed the link down here, so when I click “Terms,” it will open up my Demoshop. This will generate a new page view. And if you go to Real-Time reporting, we see that our new page view came from spam.com. Now, this is something that you can do automatically with the help of a bot or a spider, just like the search engine Google does. Only here, it is done for the purpose of spamming our Google Analytics account. How to get around this, we’ll get to in a second.
The second method, and the more problematic one, is the use of the measurement protocol. Now, what is the measurement protocol? It’s basically the API of Google Analytics. It lets you send data to any account in an automated way. It is really intended to connect any kind of device to your Google Analytics account, so for example, a cash register could be tracked with it, or a TV, but it can also be used for referrer spam. So for example, here we have the Hit Builder by Google Analytics, and all we need to do is know the Google Analytics tracking ID and put in the right parameters. Here I have the referrer spam, right here. I can also put language spam in here, and then we can send it off to this URL, which will generate a hit within our account. So let’s send this. This was sent. Let’s go over to Google Analytics, and we see a new page view was generated, again with our source, spam.com. And this wasn’t even done by visiting our website. We have simply sent over a page view. I can do this again and again, a million times, without having to visit your website once or know anything about your website. So this is a completely automated way.
Unfortunately, we cannot block these measurement protocol hits that come in because of the way the measurement protocol was constructed.
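To make the Hit Builder step a bit more concrete, here is a minimal Python sketch of what such a hit looks like as a URL. It assumes the Universal Analytics version of the measurement protocol and its standard parameter names (v, tid, cid, t, dr, ul, dh, dp). The tracking ID, client ID, and function name are made-up placeholders, and the code only builds the URL without sending anything.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Collection endpoint of the Universal Analytics measurement protocol.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id, client_id, referrer, language, hostname, path):
    """Return the URL of a single pageview hit. Nothing is sent."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # tracking ID of the target property
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dr": referrer,      # document referrer, chosen freely by the sender
        "ul": language,      # user language, chosen freely by the sender
        "dh": hostname,      # document hostname, chosen freely by the sender
        "dp": path,          # document path
    }
    return COLLECT_URL + "?" + urlencode(params)


hit = build_pageview_hit(
    tracking_id="UA-00000000-1",  # placeholder, not a real property
    client_id="555",              # placeholder client ID
    referrer="http://spam.com/",
    language="en-us",
    hostname="demoshop.com",
    path="/",
)
print(hit)
```

Every value in the hit, including the referrer, the language, and the hostname, is supplied by whoever builds the request, which is why the receiving account has no reliable way to tell such a hit apart from a real visit.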
Anyways, now let’s start with how we can actually get rid of referral spam within our Google Analytics account. Before we begin, a best practice is to build several views into your account. So for example here, we have a Demoshop with a Master, Test and Unfiltered view, because we will be installing filters and something might go wrong doing that. You might filter out too much or too little. You don’t want to harm your overall data, so an unfiltered view is recommended.
So now let’s get into the techniques that you might want to use. First off, if you’re creating a new Google Analytics account, I would recommend opening up multiple properties. What does that mean? Well, as you may know, Google Analytics is organized into accounts, properties, and views. Now, these properties, of which you can build 50 into one account, each carry a tracking ID, and the last digit of this tracking ID is counted up with every new property. I would recommend using a property for your purposes that ends with a two or higher. Why is that? Well, as I mentioned, in the measurement protocol attacks, the spammer simply needs to know the tracking ID to send information. Previously, the spammers just guessed the middle number, which also counts up, then attached a number one to the back and spammed millions of accounts in an automated way. I would, therefore, recommend using a three, four, five, six, or seven as the property number. That way, you’re less likely to be targeted, just because most other tracking IDs end with the number one. Now, this would obviously not work for long, because it’s easy to change within the spammer’s configuration. But if you’re starting out new with an account, consider using a higher number, and create an extra property to obtain that number.
For older accounts, this wouldn’t really work, because you wouldn’t be able to copy over all your data from one property to the next, and this would impair your ability to analyze data retroactively. So, for newer accounts, I would recommend this. For older accounts, you might want to keep the data that you have in there.
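As a small illustration of the tracking ID structure described here, the following Python sketch splits an ID of the form UA-XXXXX-Y and checks whether it ends in the number one. The function names and the example IDs are made up for this sketch.

```python
import re

# A Universal Analytics tracking ID: "UA-", the account number, then the
# property number that counts up within the account.
TRACKING_ID = re.compile(r"^UA-(\d+)-(\d+)$")


def property_number(tracking_id):
    """Return the trailing property number of a tracking ID as an integer."""
    match = TRACKING_ID.match(tracking_id)
    if match is None:
        raise ValueError("not a UA-XXXXX-Y tracking ID: " + repr(tracking_id))
    return int(match.group(2))


def ends_with_one(tracking_id):
    """True if the ID ends in -1, the suffix the spammers guessed by default."""
    return property_number(tracking_id) == 1


# Both IDs below are placeholders, not real properties.
print(ends_with_one("UA-00000000-1"))
print(ends_with_one("UA-00000000-3"))
```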
Second technique, let’s take care of these spiders. Now, spiders, or bots, can be taken care of, first of all, by ticking the option in your View Settings to exclude all hits from known bots and spiders. This has been shown to decrease the referral spam a little bit, but not completely. This is obviously a feature that is built in and controlled by Google, so we don’t really know how they’re keeping up-to-date with all these known bots and spiders. But it’s a good option to keep ticked.
Next up, our Hostnames. So when a bot enters my website, it usually doesn’t include a Hostname. What does that mean? Well, if you go to our reporting here, under the Audience, Technology and Network tab, we can find the Hostname report right here, and this shows the Hostname. Now normally, the Hostname would be your own website, so in our case, Demoshop.com, because that’s what Google Analytics sends over by default. If you have any kind of sub-domains where you have the same Google Analytics running, you need to take these into account, too. But all of these others are most likely spam. Even the “(not set)” ones are spam, completely. So what we can do is build a filter. Let’s go over to our Admin section and build a filter for our view. Now, you should be doing this on your Test or Master view, and not on the Unfiltered view. So we can add a filter here, and we can say, include Demoshop.com. This is a custom filter, so as a field, we’ll choose Hostname, and we’re only going to include our Demoshop.com. Now, this is a regular expression field, so we need to escape the dot with a backslash here, and this will only include our Hostname. You can always verify this filter, and it will show what will be filtered out. This is just for the last seven days. So let’s save this, and you’re all set. You might want to include other domains if you have any kind of sub-domains or other domains running on your Google Analytics account. Okay.
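Here is a rough Python sketch of how such an include filter on the Hostname field behaves. It assumes the filter pattern is matched anywhere inside the hostname and ignores case. The list of hostnames and the function name are made up for illustration.

```python
import re

# The filter pattern from the video: the dot is escaped with a backslash so
# that it only matches a literal dot, not any character.
INCLUDE_HOSTNAME = re.compile(r"demoshop\.com", re.IGNORECASE)


def keep_hit(hostname):
    """Mimic an include filter: keep the hit only if the pattern is found."""
    return INCLUDE_HOSTNAME.search(hostname) is not None


# Hypothetical hostnames as they might appear in the Hostname report.
hostnames = ["demoshop.com", "www.demoshop.com", "(not set)", "spam.com", "demoshopxcom"]
kept = [name for name in hostnames if keep_hit(name)]
print(kept)
```

With an unescaped dot, the made-up hostname "demoshopxcom" would have been kept as well, which is why the backslash matters. Because the pattern is found anywhere in the hostname, sub-domains such as www.demoshop.com pass the filter too.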
This should reduce your spam count by quite a bit already, but not completely, because the Hostname can be spoofed really easily with the measurement protocol. So to get rid of these ghost referrers, we need to go the hard way of blocking each one individually in our filters. That involves going to your reports here and looking for the spam that you don’t want to have. So for example, under our Acquisition report, we’ll look into the referrals and see, “Okay, I don’t want to have this TrafficCash,” and I will just copy this, go over to our Admin section, and implement another filter. We’ll go over to Add Filters, and we’ll call this the Referral Spam Exclusion. It will be a custom filter again. This time I will choose Exclude, look for the source field, Campaign Source, here, and paste in what we have copied before. This will filter out this particular referral spam. Now if I had a second one, I would simply put in a pipe, which stands for “or” within regular expressions, and put in my next referrer here. So, for example, if there were a referrer “traffic three,” you would just line them up here. Unfortunately, the field is restricted to a certain character count, so you can only put in as many referrers as fit. But, again, now you can go ahead and exclude these referrers. Let’s save this, and you have your first filter set up to block out the spam. I’ll show you, in a minute, how you can do this a little bit more efficiently. But the gist of it is to go through your existing referrer spam list and simply exclude each and every one from your account.
Now, these are the most effective methods of blocking out referrer spam and preventing new spam from entering your account. But unfortunately, as we see here, this will not affect the existing data in your account, because these filters were not in place when it was collected. These filters only apply to new incoming data. So what if you want to filter data retroactively?
You can only do this via Custom Segments. These custom segments let you filter out data at reporting time. All you need to do is add the spam sources to a segment. So we’ll add a new segment here, and say “New Segment.” This is our Exclude Referrer Spam segment, which simply looks at the source and filters out our known spam referrers. You can use the same list that you have made for your view filters. Now we already see 188 users would be affected, so let’s exclude that data and save this. Go back to our referral reports, and we see our referrer is not present anymore. You could be doing this with all of the known spam referrers. I’ll show you, in a second, a faster way of accomplishing this.
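The pipe-separated referrer list that goes into both the view filter and the segment can be sketched in Python like this. The 255-character limit is an assumption for this sketch, so check the limit that the interface actually enforces. The referrer names and function names are also made up.

```python
import re

# Assumed maximum length of one filter pattern; verify against the interface.
MAX_PATTERN_LENGTH = 255


def build_exclusion_patterns(sources, max_length=MAX_PATTERN_LENGTH):
    """Join spam sources with pipes, starting a new pattern when one is full."""
    patterns = []
    current = ""
    for source in sources:
        escaped = re.escape(source)  # escapes the dots, as in the hostname filter
        if len(escaped) > max_length:
            raise ValueError("source too long for one pattern: " + repr(source))
        candidate = escaped if not current else current + "|" + escaped
        if len(candidate) > max_length:
            patterns.append(current)  # current pattern is full, start a new one
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


def is_spam(source, patterns):
    """True if any exclusion pattern is found in the given source."""
    return any(re.search(pattern, source) for pattern in patterns)


# Hypothetical spam referrers collected from the referral report.
spam_sources = ["spam.com", "trafficcash.example"]
patterns = build_exclusion_patterns(spam_sources)
print(patterns)

# The same patterns can be checked against sources that are already in a report.
report_sources = ["spam.com", "reddit.com", "trafficcash.example"]
print([source for source in report_sources if not is_spam(source, patterns)])
```

Each returned pattern would go into its own filter, or into its own condition in the segment, once the list no longer fits into a single field.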
So, to quickly recap what we have done so far. First of all, if you have a new account, try to use a property ID with a number two or higher at the end. If that’s not possible because you want to keep your old data around, then definitely turn on the Bot Filtering option within the Admin section, implement a Hostname filter, and build view filters for your referrer spam. If you want to analyze any data that is currently in your account retroactively, you need to use a custom segment that excludes your referrer spam. So these are the most effective methods.
Now I want to show you, quickly, how you can do all of this faster. There’s a great post by Mike Sullivan at Analytics Edge that is, really, the definitive guide to removing all Google Analytics spam. He has different techniques in here and explains the methods that we just discussed in depth, as well. The really useful bit I want to point out is the current Spam Crawler Filter Expressions. These are expressions that you can enter in your filters. Just copy them over, and you will most likely be spam-free afterwards. Obviously, these need to be updated from time to time, because the spammers change their referrers all the time. He also has a filter that lets you filter out the language spam from your account. Now, if you have multiple properties and multiple websites where you want to install these filters, then I would recommend a tool by Simo Ahava, who has built a spam filter tool that lets you upload these. It’s a little bit technical, because you need to install it onto your website. But then it will connect to your Google Analytics account and upload the filters that you have defined, right here. That way you can install these view filters much faster than by typing them into the interface. Now, the last thing I want to point out is the referrer segment from Mike Sullivan, which you can simply install into your account.
So if you click on this, it will take you to the segment in the Analytics Solutions Gallery. You can import this to your account, select your view, create this, and it will be installed into your account, so you’ll have it available as a segment. We now have All Users Spam Removed. Let’s get rid of ours here, apply this, and voila! We have got rid of all the known referrer spam in our account. So this is super helpful for doing things much faster. And by using these tools, you will hopefully stay free of referrer spam for the foreseeable future.
So there you have it. These are the methods that are currently working best for cleaning up referrer spam in your Google Analytics account. I know there are some other approaches out there that might be effective as well, and if you have any suggestions, please leave them in the comments below, so we can have a discussion about this and solve this problem. As always, if you liked this video, please share it with a friend or a colleague, and subscribe to this channel to get more of these videos every Wednesday. My name is Julian. Until next time.