PAD Spam

If you are maintaining a PAD Website, you know that there are a lot of spammers out there trying to take advantage of your site without contributing anything useful to its content. Although it can be nice to find your site has gained a few thousand listings overnight, it doesn't make your site any better if those new listings are spam.

I'm providing this page as a resource that addresses some of the techniques these spammers use and what can be done about them. Note that this page assumes you maintain or are developing a PAD site, that your intent is to provide visitors with a catalog of freeware and shareware software, and that you are not interested in posting or promoting any other type of product or service.

What the Spammers Do

Generate Links to Their Site

One reason why spammers submit PAD files is to establish links to their site. Not only will this lead to more users finding their way to such sites, but it can also improve the site's ranking on search engines due to the larger number of incoming links.

It is normal and reasonable for software authors to want links to their sites. But if there is no software behind those links, then it's just spam. Spammers may provide a description and a graphic for a screenshot, but both the application and company URLs point to their site, which is not related to any software. In some cases, they may put a little more effort into making it look like a real application. They may produce a more convincing description and an authentic-looking screenshot. They may even have software, but it could be some pointless software like an eBook or a bogus browser search toolbar that serves no purpose other than to direct traffic to their site.

Either way, content from such spammers will not benefit users of your site looking for quality freeware or shareware.

Third-Party Commissions

Less common are webmasters like yourself. (Well, hopefully not exactly like yourself.) They have a Website with their own catalog of shareware. In addition, they are taking advantage of various affiliate programs to get a cut of the profit when a visitor registers software originally discovered through their site. This is all perfectly valid. But some of these webmasters will submit "copy" PAD files to other PAD sites, trying to increase their profits.

Their copy PAD file can have links to their site instead of the shareware author's site. So when a visitor from your site clicks a link to find out more about a shareware listing, it takes them to a description on the spammer's site. Then, if the visitor ends up registering the software, the spammer's site will earn a commission. In some cases, they will even provide links that contain query arguments. The link may be to the real author's site or an affiliate site such as RegNow, but the query arguments will identify the spammer's affiliate ID. Again, if a visitor clicks this link and ends up registering the shareware, the spammer's site will earn a commission.

Either way, they are not providing original content. And because they have their own catalog, they have the ability to fully automate massive spamming campaigns that are bound to generate some income for them.

Duplicate Submissions

In some cases, spammers will develop perfectly valid software but then just spam your site with it by submitting multiple copies. Sometimes the submissions are exactly the same except for the URL of the PAD file. Other times they may change the version or even the name. Still other times they may actually create variations on the software and submit all of those variations. Ultimately, you as Webmaster will have to determine how unique a submission must be to be listed on your site. Some have suggested that these duplicates could result in search engines like Google penalizing your site for duplicate content, but I'm not sure that this is the case.

Are you familiar with other techniques used by those who spam PAD sites? Please let us know.

How You Can Fight Back

Allow Submissions from Humans Only

First and foremost, you should not allow fully automated submissions of PAD files to your site. Yes, you want to make it easy for authors to submit multiple PAD files. But no matter what measures you take to stop the spammers, they will just continue as long as they can spam you as part of an automated process. You may decide to allow fully automated submissions while you are building your catalog, but you will not have any control over PAD spam while you do. Ultimately, you need a CAPTCHA control of some sort, which requires the submitter to enter characters seen in a distorted graphic. This prevents the process from being fully automated and, I would expect, causes many spammers to not even bother with your site.

Maintain a Blacklist

Next, it is critical to create a "blacklist" of all the domains found to contain spam in the past. This list should contain the base domain from the PAD file URL (e.g. the base domain of "http://www.fileparade.com" is "fileparade.com"). Because any subdirectories or prefixes like "www" are optional, you should not include these in the comparison. Your submission code needs to scan the blacklist and halt the process then and there if the file being submitted is from a domain on this list. I recommend providing a terse error message like "You have been blocked from submitting any further submissions" rather than providing too much information to spammers about your banned list of domains. My blacklist has become fairly large and is growing all the time. You can view it at the following link.

File Parade's Banned Domains
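
As a rough illustration of the blacklist check described above, the following Python sketch reduces a PAD file URL to its base domain and compares it against a set of banned domains. The function names, the sample blacklist entries, and the small MULTI_PART_SUFFIXES set are assumptions made for this example; a production site would probably want a complete public suffix list rather than the handful of suffixes shown here.

```python
from urllib.parse import urlsplit

# Illustrative, incomplete set of two-label suffixes (an assumption);
# a real site would want a full public suffix list.
MULTI_PART_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def base_domain(url):
    """Return the base domain of a URL, ignoring prefixes like 'www' and any path."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    if len(labels) <= 2:
        return host
    # Keep three labels for suffixes such as "co.uk", otherwise two.
    keep = 3 if ".".join(labels[-2:]) in MULTI_PART_SUFFIXES else 2
    return ".".join(labels[-keep:])


def is_blacklisted(pad_url, blacklist):
    """True if the PAD file URL's base domain is on the blacklist."""
    return base_domain(pad_url) in blacklist


# Hypothetical blacklist entries, for illustration only.
BLACKLIST = {"spam-example.com", "bad-example.co.uk"}

if is_blacklisted("http://www.spam-example.com/pads/app.xml", BLACKLIST):
    # Terse message that reveals nothing about the banned list.
    print("You have been blocked from submitting any further submissions")
```

Storing only base domains keeps the comparison to a single set lookup, so the check stays cheap even as the list grows.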

Review Submissions

As developers, we all want to automate as much of the operation of our Websites as possible. Although I could write an algorithm that could make a pretty good guess at whether or not a submission is spam, I don't know of any way to do this reliably. Therefore, you really need to personally approve each submission. On my site, new submissions do not appear on the site until I've personally approved them. I'm not in a position to actually download and install each piece of software. But my tools make it easy to view the screenshot and application home page, and to review all the other fields of the PAD file. And I can do this in just seconds. Although I could still be fooled by some types of spam, this process works reasonably well. And I post the fact that submissions will not appear on the site until they have been approved. Again, this would cause most spammers to not even bother with your site.

The following list discusses some of the types of things you can check when reviewing a submission in order to determine whether or not it is spam.

  • Are all URLs to the same primary domain?
    In some cases, companies will have one domain for each product plus another domain for the company URL. This is perfectly valid. But some PAD file creators are not promoting software, and so they may include links to someone else's site to make it look like they are.
  • Do some URLs include scripting arguments?
    This can be valid but can sometimes indicate problems. Some PAD site owners will submit "duplicate" PAD files to other sites and set the link to credit them as the affiliate, earning them a commission if a user buys the software through your site.
  • Are URLs of the correct type?
    Most PAD site owners require that the download URL links to the actual download file, and not to a Web page that may or may not link to this file. Likewise, image URLs such as the screenshot URL should link to a graphics file and not to a Web page that may or may not contain the image. Beware, though: it is possible, for example, to have a URL that points to a GIF file (which is a common graphics file) but gets redirected to a Web page.
  • Read the Description
    Does it describe software and does it appear consistent with the title, company, screenshots and other links? I've seen cases where the description may talk about a screensaver of photos from some place like Italy but, when you follow the link to the application page, there is no mention of any screensaver. Rather, the page is about booking trips to Italy.
  • Look at the screenshot image. Here are some things you might look for:
    • Is it Software?
      Looking at the screenshot image is a very quick and easy way to determine if the PAD file is at least claiming to represent a software product. Sometimes you can tell that the PAD file promotes a Website and not desktop software. Note that some valid products, such as screensavers, may include a screensaver image, which won't look like an application. Also, some companies put an image of the product box instead of a screenshot. These may not be reasons to reject the submission but should cause you to carefully examine other elements of the submission.
    • Is it Useful Software?
      A lot of submissions include pointless software intended to promote a non-software product or service. For example, some people will create a screensaver (which can easily be done) of something related to their business. But it's just a way to get you to link to their site and increase traffic and their site's Google rating. Other people will create some kind of browser toolbar intended to help users find information on a particular subject (which, again, can be done easily). This is also just an attempt to build traffic to their own site. In both cases, the submission serves no useful purpose to a shareware site and should be rejected.
    • Does it Look Familiar?
      If you are manually approving hundreds of PAD submissions, you will soon begin to recognize duplicates. In some cases, the submitter might take a screenshot of another application and edit the image to show their name in the caption bar. Whether or not the submission is valid, and what to do about it, will require further consideration. But it's definitely a red flag if you've seen this screenshot image many times before.
  • Look at the Application Page
    Is it Related to the Product Title? The page at the application URL should describe the product represented by the PAD file. If this page does not mention the title in the PAD file, then the submission should probably be rejected. If the page mentions the title but is substantially about something else, then that should be a red flag. Note that some authors will set this link to their main page, possibly so visitors will be exposed to all their products, and this may be acceptable if the related product is easy to find. Still other authors may not have separate pages for each product, which is valid.
  • Look at the Company Page
    Is it valid? This is a little harder to judge. Normally, this page should be for the company named in the PAD file. Ideally, this page will list one or more software products, including the one in the PAD file. But the fact that the company page doesn't mention the product, or sells things other than software, is not by itself a reason to reject the submission. If everything else looks okay, then you may judge the submission to be valid.
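
Some of the URL checks in the list above can be pre-computed so that they are ready when you open a submission for review. The Python sketch below is one possible way to do this, not a spam detector: it only lists points to look at by hand. The dictionary keys, the example URLs, and the list of image extensions are assumptions made for illustration, and the base-domain helper is deliberately naive (it keeps the last two host labels, which is wrong for suffixes such as "co.uk").

```python
from urllib.parse import urlsplit

# Assumed set of common image file extensions.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".bmp")


def base_domain(url):
    # Naive: keeps the last two host labels; wrong for suffixes like "co.uk".
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def review_flags(urls):
    """Return points to check by hand; none of them proves spam."""
    flags = []
    # Are all URLs on the same primary domain?
    domains = sorted({base_domain(u) for u in urls.values()})
    if len(domains) > 1:
        flags.append("URLs span several domains: " + ", ".join(domains))
    # Do some URLs include scripting (query) arguments?
    for field, url in urls.items():
        if urlsplit(url).query:
            flags.append(field + " has query arguments")
    # Does the screenshot URL at least look like a graphics file?
    screenshot = urls.get("screenshot", "")
    if screenshot and not urlsplit(screenshot).path.lower().endswith(IMAGE_EXTENSIONS):
        flags.append("screenshot URL does not look like an image file")
    return flags


# Hypothetical submission, for illustration only.
submission = {
    "company": "http://www.example-author.com/",
    "application": "http://www.example-author.com/editor.html",
    "download": "http://downloads.example-reseller.com/get?item=123&affiliate=456",
    "screenshot": "http://www.example-author.com/shots/editor.gif",
}

for flag in review_flags(submission):
    print(flag)
```

A file extension check cannot catch a URL that ends in ".gif" but redirects to a Web page, so the screenshot still needs to be viewed.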

Find Existing Spam in Your Catalog

If you are reading this, there is a good possibility that you already have a catalog of PAD files. As mentioned previously, I know of no way to automatically detect which of those items might be spam. Based on my work, I can offer the following techniques to help find spam entries. IMPORTANT: None of these criteria confirm an entry as spam. In all cases, you will need to manually inspect each listing to determine whether or not it belongs on your site.

  • One approach is to build a list of all the domains for your PAD files along with the number of PAD files associated with each domain. If you sort by the number of PAD files from each domain in descending order, all the items at the top of your list will be suspect. Most authors will only have a handful of programs, and any shareware sites that are submitting copy PAD files of their own catalog will appear near the top of this list. However, there are some types of products where the author is able to produce many titles, and some spammers will only submit a couple of PAD files from the same domain.
  • A good test is to search the URLs, particularly the screenshot URL, for terms like "debt", "finance", "loan", "mortgage", "cash", and similar terms. I found that many of the sites spamming just to build links are associated with debt, loans or the like.
  • Another test is to check whether the company URL and application URL are the same. If they are, you should check to see if the page at this URL mentions the product name. If not, it's almost certainly spam. Keep in mind that some sites only have one application, so the application URL will be the same as the company URL, and some valid authors might decide to point the application URL to their home page (which isn't the intent of this field but may not be reason enough to reject the submission).
  • Often, the first test I'll do is check the screenshot URL. Spammers will sometimes put just a picture at this URL, or they may even redirect to another page. If the screenshot does not look like an application, you need to determine what it is. Note that some valid sites will link to a picture of their logo, or a simulated image of the product box. In addition, authors that are submitting duplicates of their software may vary fields such as the domain, company name, and title, etc. but use the same screenshot. So sometimes a screenshot can help identify software as coming from the same company.
  • If the download URL contains "affiliate=", "affiliateid=", "regnow", or the name of other affiliate sites, there is a high probability that the submission is intended to earn a commission off of referrals from your site and does not reflect original software. Sometimes the domain even belongs to an affiliate site such as RegNow. You could even search for "?" in this URL, but this will also return a lot of valid submissions.
  • Another potential red flag is when the company name varies between submissions from the same domain. It could be variations in the way they write their company name or it could be a company that just likes to put the product name in the company field, but it could also be spam.
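
Several of the tests above lend themselves to a quick script run against an existing catalog. The following Python sketch counts PAD files per domain and collects reasons to inspect an entry by hand. The field names (pad_url, application_url, and so on) and the sample catalog are hypothetical, the base-domain helper is naive (last two host labels only), and, as stressed above, nothing this code reports confirms that an entry is spam.

```python
from collections import Counter
from urllib.parse import urlsplit

SPAM_TERMS = ("debt", "finance", "loan", "mortgage", "cash")
AFFILIATE_MARKERS = ("affiliate=", "affiliateid=", "regnow")
URL_FIELDS = ("application_url", "company_url", "download_url", "screenshot_url")


def base_domain(url):
    # Naive: keeps the last two host labels; wrong for suffixes like "co.uk".
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def domain_counts(entries):
    """(domain, number of PAD files) pairs, largest count first."""
    return Counter(base_domain(e["pad_url"]) for e in entries).most_common()


def suspect_reasons(entry):
    """Reasons to inspect an entry by hand; none of them confirms spam."""
    reasons = []
    urls = [entry[field].lower() for field in URL_FIELDS]
    hits = [term for term in SPAM_TERMS if any(term in u for u in urls)]
    if hits:
        reasons.append("URL contains: " + ", ".join(hits))
    download = entry["download_url"].lower()
    if any(marker in download for marker in AFFILIATE_MARKERS):
        reasons.append("download URL looks like an affiliate link")
    if entry["company_url"].rstrip("/") == entry["application_url"].rstrip("/"):
        reasons.append("company and application URLs are the same")
    return reasons


# Hypothetical catalog entries, for illustration only.
catalog = [
    {"pad_url": "http://www.example-a.com/pad1.xml",
     "application_url": "http://www.example-a.com/app1.html",
     "company_url": "http://www.example-a.com/",
     "download_url": "http://www.example-a.com/app1.zip",
     "screenshot_url": "http://www.example-a.com/app1.gif"},
    {"pad_url": "http://pads.example-b.com/x.xml",
     "application_url": "http://www.example-b.com/",
     "company_url": "http://www.example-b.com",
     "download_url": "http://www.example-b.com/get?affiliateid=77",
     "screenshot_url": "http://www.example-b.com/cheap-loan.jpg"},
]

for domain, count in domain_counts(catalog):
    print(count, domain)
for entry in catalog:
    for reason in suspect_reasons(entry):
        print(entry["pad_url"], "->", reason)
```

Simple substring matching will also flag innocent URLs, which is acceptable here because every flagged listing is reviewed manually anyway.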

Are you familiar with other techniques that can be used to combat PAD spam? Please let us know.