Diagram Views

Eliminating Language Spam With Google Analytics

Matt Brady
#Industry Insights, #Tutorials
Published on December 19, 2016

Learn how to ensure that your Google Analytics data is correct by eliminating any spam traffic that uses the language setting inappropriately.

Anybody who regularly uses Google Analytics to track data about how people reach and interact with a website is likely familiar with the problem of referral spam. This occurs when certain less-than-scrupulous people send traffic to a site through bots or send data directly to Google Analytics, often with the intent of generating traffic for themselves, or even hoping to infect people’s computers with malware.

This false traffic can greatly skew a site’s Google Analytics data, and combatting it is an ongoing concern. We’ve found that the best way to do so is to maintain a series of referral spam filters and regularly update them as new sources of referral spam are discovered. If you want to know more about how to do so, you can read our previous blog post about eliminating referral spam.

The Latest Type of Google Analytics Spam

In the last couple of months, a new type of referral spam has arisen: language spam. It can be found in the Audience/Geo/Language section of Google Analytics, where the language for a certain number of site visits is listed as some variant of “Secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!” (note that the character ‘ɢ’ in “Secret.ɢoogle.com” differs from the lowercase ‘g’ that would be displayed for any actual Google domain).
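If you want to confirm which character is hiding in a suspicious hostname, a short script can list every non-ASCII character it contains. The following is a minimal sketch in Python; it assumes the look-alike letter is U+0262, and the helper name `non_ascii_chars` is our own invention for illustration.

```python
import unicodedata

# Assumption: the look-alike letter in the spam hostname is U+0262.
spam_host = "Secret.\u0262oogle.com"
real_host = "secret.google.com"


def non_ascii_chars(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


print(non_ascii_chars(spam_host))  # lists the look-alike letter
print(non_ascii_chars(real_host))  # empty list: plain ASCII only
```

A genuine Google domain contains only plain ASCII letters, so any result other than an empty list is a reason to be suspicious.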

Google Analytics Language Spam

This spam traffic seems to be related to other referral spam, but because it affects an area of Google Analytics that people may not check often, it might have gone unnoticed. It is definitely not legitimate traffic, however, since the Language report should only list the abbreviations for the language settings in a user’s web browser.

Eliminating Language Spam With a Filter

Similar to the filters that we use to eliminate referral spam from Google Analytics data, we can create a filter for this language spam. Since the Language report in Google Analytics lists the language setting in a user’s browser, it should only include values like “en” (for English) or “fr” (for French). These short abbreviations are usually no longer than 6 characters, so we can create a filter that eliminates any entries that are 15 characters or longer. We’ll also set our filter to eliminate any entries containing characters that should not appear in this field, such as periods, commas, and exclamation points.

To create your new filter, access the view for which you want to filter out language spam and select Filters. Click the “+ New Filter” button and enter the settings for the new filter. Set the Filter Name to “Exclude Language Spam”. Select “Custom” as the Filter Type, choose “Exclude”, and set the Filter Field to “Language Settings”. For the Filter Pattern, enter the following regular expression:

.{15,}|\.|,|\!
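To see what this pattern catches, you can try it against a few sample values outside of Google Analytics. The sketch below uses Python’s `re` module and assumes that the filter matches anywhere within the field (a partial match), which is why it calls `search` rather than `fullmatch`. The function name `is_language_spam` and the sample values are our own, and Python’s regular expression engine is only a stand-in for the one Google Analytics uses.

```python
import re

# The filter pattern from above: 15 or more characters, or a period,
# a comma, or an exclamation point anywhere in the value.
LANGUAGE_SPAM = re.compile(r".{15,}|\.|,|\!")


def is_language_spam(value):
    """Return True if the language value would be caught by the pattern."""
    return LANGUAGE_SPAM.search(value) is not None


samples = [
    "en",
    "fr",
    "en-us",
    "pt-br",
    "Secret.ɢoogle.com You are invited! Enter only with this ticket URL. "
    "Copy it. Vote for Trump!",
]

for value in samples:
    label = "exclude" if is_language_spam(value) else "keep"
    print(f"{label:8} {value}")
```

The short language codes pass through untouched, while the spam entry is caught by both the length rule and the punctuation rules.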

Language Spam Filter

Before clicking Save, it’s a good idea to click “Verify this filter” to see how it will change your Google Analytics data. Once filters are created in Google Analytics, any traffic that they block will be excluded from your data, and that data cannot be recovered, so you want to make sure you won’t be inadvertently filtering out any legitimate traffic.
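As a complement to the built-in verification, you could also estimate the filter’s effect on a Language report that you have copied out of Google Analytics. The sketch below is only an approximation of what the filter would do: the session counts are made-up illustration values, `split_report` is a hypothetical helper, and the pattern is assumed to match anywhere within the field.

```python
import re

LANGUAGE_SPAM = re.compile(r".{15,}|\.|,|\!")


def split_report(rows):
    """Split (language, sessions) rows into those kept and those excluded."""
    kept, excluded = [], []
    for language, sessions in rows:
        target = excluded if LANGUAGE_SPAM.search(language) else kept
        target.append((language, sessions))
    return kept, excluded


# Hypothetical report rows; replace these with your own data.
report = [
    ("en-us", 1200),
    ("fr", 85),
    ("Secret.ɢoogle.com You are invited! Enter only with this ticket URL. "
     "Copy it. Vote for Trump!", 40),
]

kept, excluded = split_report(report)
total = sum(sessions for _, sessions in report)
removed = sum(sessions for _, sessions in excluded)

print(f"Sessions that would be excluded: {removed} of {total}")
for language, sessions in excluded:
    print(f"  {sessions:>6}  {language}")
```

If anything that looks like a real language code shows up in the excluded list, adjust the pattern before saving the filter.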

Regularly monitoring this type of false traffic data in Google Analytics can be a cumbersome task, but since it affects the veracity of your data, filtering out any referral or language spam is essential if you want to get an accurate picture of how people are reaching and using your site. Do you have any questions about Google Analytics or how to find and resolve issues with faulty data? Please contact us, or feel free to share any tips of your own in the comments below.