If you are maintaining a PAD Website, you know that there are a lot of spammers
out there trying to take advantage of your site without contributing anything useful
to its content. Although it can be nice to find your site has gained
a few thousand listings overnight, it doesn't make your site any better
if those new listings are spam.
I'm providing this page as a resource that addresses some of the techniques
these spammers use and what can be done about them. Note that this page assumes you
maintain or are developing a PAD site, and that your intent is to provide visitors
with a catalog of freeware and shareware software, and that you are not interested
in posting or promoting any other type of product or service.
What the Spammers Do
Generate Links to Their Site
One reason why spammers submit PAD files is to establish links to their site.
Not only will this lead to more users finding their way to such sites, but it
can also improve the site's ranking on search engines due to the larger number
of incoming links.
It is normal and reasonable for software authors to want links to their sites.
But if there is no software behind a submission, then it's just spam. Such submitters may provide a description and
a graphic for a screenshot, but both the application and company URLs point to their
site, which is not related to any software. In some cases, they may put
a little more effort into making it look like a real application. They may produce
a more convincing description and authentic-looking screenshot. They may even
have software, but it could be some pointless software like an eBook or a bogus browser search toolbar that
serves no purpose other than to direct traffic to their site.
Either way, content from such spammers will not benefit users of your site looking for
quality freeware or shareware.
Third-Party Commissions
Less common are webmasters like yourself. (Well, hopefully not exactly like yourself.)
They have a Website with their own catalog of shareware. In addition, they are taking
advantage of various affiliate programs to get a cut of the profit when a visitor
registers software originally discovered through their site. This is all perfectly
valid. But some of these webmasters will submit "copy" PAD files to other
PAD sites, trying to increase their profits.
Their copy PAD file can have links to their site instead of the shareware author's
site. So when a visitor from your site clicks a link to find out more about a shareware
listing, it takes them to a description on the spammer's site. Then, if the
visitor ends up registering the software, the spammer's site will earn a commission.
In some cases, they will even provide links that contain query arguments. The link
may be to the real author's site or an affiliate site such as RegNow, but the
query arguments will identify the spammer's affiliate ID. Again, if a visitor
clicks this link and ends up registering the shareware, the spammer's site will
earn a commission.
Either way, they are not providing original content. And because they have their
own catalog, they have the ability to fully automate massive spamming campaigns
that are bound to generate some income for them.
Duplicate Submissions
In some cases, spammers will develop perfectly valid software but then just spam
your site with it by submitting multiple copies. Sometimes the submissions are
exactly the same except for the URL of the PAD file; other times they may change
the version or even the name; and still other times they may actually create
variations on the software and submit all of those variations. Ultimately, you
as Webmaster will have to determine how unique a submission must be to be listed
on your site. Some have suggested that these duplicates could result in search
engines like Google penalizing your site for duplicate content but I'm not sure
that this is the case.
Are you familiar with other techniques used by those who spam PAD sites? Please
let us know.
How You Can Fight Back
Allow Submissions from Humans Only
First and foremost, you should not allow fully automated submissions of PAD files
to your site. Yes, you want to make it easy for authors to submit multiple PAD files.
But no matter what measures you take to stop the spammers, they will just continue
as long as they can spam you as part of an automated process. You may decide to
allow fully automated submissions while you are building your catalog, but you will
not have any control over PAD spam while you do. Ultimately, you need a CAPTCHA
control of some sort, which requires the submitter to enter characters seen in a
distorted graphic. This prevents the process from being fully automated and, I would
expect, causes many spammers to not even bother with your site.
Maintain a Blacklist
Next, it is critical to create a "blacklist" of all the domains found
to contain spam in the past. This list should contain the base domain from the PAD
file URL (e.g. the base domain of "http://www.fileparade.com" is "fileparade.com").
Because any subdirectories or prefixes like "www" are optional, you should
not include these in the comparison. Your submission code needs to scan the blacklist
and halt the process then and there if the file being submitted is from a
domain on this list. I recommend providing a terse error message like "You have been blocked
from submitting any further submissions" rather than providing too much information
to spammers about your banned list of domains. My blacklist has become fairly large
and is growing all the time. You can view it at the following link.
File Parade's Banned Domains
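The blacklist check described above might be sketched as follows. This is only an illustration, not the code used on any particular site: the function names, the `MULTI_PART_SUFFIXES` set, and the sample domains are all assumptions. The base domain is approximated by keeping the last two labels of the host name (three for a few listed suffixes such as "co.uk"), which is a simplification.

```python
from urllib.parse import urlsplit

# Illustrative only: a real site would need a much fuller list of
# multi-part suffixes to handle every country-code domain correctly.
MULTI_PART_SUFFIXES = {"co.uk", "com.au", "co.jp"}

# The terse message recommended above; it reveals nothing about the list.
BLOCK_MESSAGE = "You have been blocked from submitting any further submissions"


def base_domain(url):
    """Return the base domain of a URL, ignoring prefixes such as 'www'
    and any path, e.g. 'http://www.fileparade.com' -> 'fileparade.com'."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    if len(labels) <= 2:
        return host
    # Keep three labels when the last two form a known multi-part suffix.
    keep = 3 if ".".join(labels[-2:]) in MULTI_PART_SUFFIXES else 2
    return ".".join(labels[-keep:])


def is_blacklisted(pad_url, blacklist):
    """True if the PAD file URL comes from a domain on the blacklist."""
    return base_domain(pad_url) in blacklist


if __name__ == "__main__":
    banned = {"spam-example.com"}  # hypothetical blacklist entry
    if is_blacklisted("http://www.spam-example.com/pads/app.xml", banned):
        print(BLOCK_MESSAGE)
```

Storing the blacklist as a set keeps each lookup fast even as the list grows.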
Review Submissions
As developers, we all want to automate as much of the operation of our Websites
as possible.
Although I could write an algorithm that could make a pretty good guess at whether
or not a submission is spam, I don't know of any way to do this reliably.
Therefore, you really need to personally approve each submission. On my site, new
submissions do not appear on the site until I've personally approved them. I'm
not in a position to actually download and install each piece of software. But my
tools make it easy to view the screenshot and application home page, and to review
all the other fields of the PAD file. And I can do this in just seconds. Although
I could still be fooled by some types of spam, this process works reasonably well.
And I post the fact that submissions will not appear on the site until they have
been approved. Again, this would cause most spammers to not even bother with your
site.
The following list discusses some of the types of things you can check when reviewing a
submission in order to determine whether or not it is spam.
- Are all URLs to the same primary domain?
In some cases, companies will have one domain for each product plus another domain
for the company URL. This is perfectly valid. But some PAD file creators are not
promoting software and so they may include links to someone else's site to make
it look like they are.
- Do some URLs include scripting arguments?
This can be valid but can sometimes indicate problems. Some PAD site owners will
submit "duplicate" PAD files to other sites and set the link to credit
them as the affiliate, earning them a commission if a user buys the software through
your site.
- Are URLs of the correct type?
Most PAD site owners require that the download URL links to the actual download
file, and not a Web page that may or may not link to this file. Also, image URLs
such as for the screenshot should link to a graphics file and not a Web page that
may or may not link to this file. Beware though, it is possible, for example, to
have a URL that points to a GIF file (which is a common graphics file) but gets
redirected to a Web page.
- Read the Description
Does it describe software and does it appear consistent with the title, company,
screenshots and other links? I've seen cases where the description may talk about
a screensaver of photos from some place like Italy but, when you follow the link to
the application page, there is no mention of any screensaver. Rather, the page
is about booking trips to Italy.
- Look at the screenshot image. Here are some things you might look for:
- Is it Software?
Looking at the screenshot image is a very quick and easy way to
determine if the PAD file is at least claiming to represent a software product.
Sometimes you can tell that the PAD file promotes a Website and not desktop software.
Note that some valid products, such as screensavers, may include a screensaver image,
which won't look like an application. Also, some companies put an image of the
product box instead of a screenshot. These may not be reasons to reject the
submission but should cause you to carefully examine other elements of the
submission.
- Is it Useful Software?
A lot of submissions include pointless software intended to promote a non-software
product or service. For example, some people will create a screensaver (which can
easily be done) of something related to their business. But it's just a way
to get you to link to their site and increase traffic and their site's Google
ranking. Other people will create some kind of browser toolbar intended to help users
find information on a particular subject (which, again, can be done easily). This
is also just an attempt to build traffic to their own site. In both cases, the submission
serves no useful purpose to a shareware site and should be rejected.
- Does it Look Familiar?
If you are manually approving hundreds of PAD submissions, you will soon begin to
recognize duplicates. In some cases, the submitter might take a screenshot of another
application and edit the image to show their name in the caption bar. Whether
the submission is valid, and what to do about it, will require further consideration.
But it's definitely a red flag if you've seen this screenshot image many
times before.
- Look at the Application Page
- Is it Related to the Product Title?
The application URL should point to a page that describes the product represented by the PAD file. If
this page does not mention the title in the PAD file, then the submission should
probably be rejected. If the page mentions the title but is substantially about
something else, then that should be a red flag. Note that some authors will set
this link to their main page, possibly so visitors will be exposed to all their
products, and this may be acceptable if the related product is easy to find. Also, still
other authors may not have separate pages for each product, which is valid.
- Look at the Company Page
- Is it Valid?
This is a little harder to judge. Normally, this page should be for the company
named in the PAD file. Ideally, this page will list one or more software products,
including the one in the PAD file. But the fact that the company page doesn't
mention the product, or sells things other than software, is not by itself a reason to reject
the submission. If everything else looks okay, then you may judge the submission
to be valid.
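Some of the URL checks in the list above can be pre-computed so that they are already highlighted when you open a submission for review. The following is a minimal sketch under several assumptions: the dictionary keys ("application", "company", "screenshot", "download") and the extension sets are hypothetical, not PAD field names. The flags are only hints for the human reviewer and do not decide anything on their own.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed lists of acceptable file types; adjust to your own site's rules.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".bmp"}
DOWNLOAD_EXTENSIONS = {".exe", ".zip", ".msi", ".dmg"}


def host_without_www(url):
    """Lower-cased host name with any leading 'www.' removed."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def extension(url):
    """File extension of the URL's path, ignoring any query arguments."""
    return posixpath.splitext(urlsplit(url).path)[1].lower()


def review_flags(urls):
    """Return a list of hints for the reviewer.

    `urls` maps the hypothetical keys 'application', 'company',
    'screenshot' and 'download' to the URLs found in the PAD file.
    """
    flags = []
    domains = {host_without_www(u) for u in urls.values() if u}
    if len(domains) > 1:
        flags.append("URLs span multiple domains: " + ", ".join(sorted(domains)))
    for name, url in urls.items():
        if urlsplit(url).query:
            flags.append(name + " URL has query arguments")
    if extension(urls.get("screenshot") or "") not in IMAGE_EXTENSIONS:
        flags.append("screenshot URL does not end in an image extension")
    if extension(urls.get("download") or "") not in DOWNLOAD_EXTENSIONS:
        flags.append("download URL does not end in a download extension")
    return flags
```

Note that checking the extension does not catch a URL that ends in ".gif" but redirects to a Web page. Detecting that would require actually requesting the URL and inspecting the response, which this sketch does not attempt.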
Find Existing Spam in Your Catalog
If you are reading this, there is a good possibility that you already have a catalog
of PAD files. As mentioned previously, I know of no way to automatically detect
which of those items might be spam. Based on my work, I can offer the following
techniques to help find spam entries. IMPORTANT: None of these criteria confirm
an entry as spam. In all cases, you will need to manually inspect each listing to
determine whether or not it belongs on your site.
- One approach is to build a list of all the domains for your PAD files along with
the number of PAD files associated with each domain. If you sort by the number of
PAD files from each domain in descending order, all the items at the top of your list will be suspect.
Most authors will only have a handful of programs, and any shareware sites that
are submitting copy PAD files of their own catalog will appear near the top of this
list. However, there are some types of products where the author is able to produce
many titles, and some spammers will only submit a couple of PAD files from the same
domain.
- A good test is to search the URLs, particularly the screenshot URL, for terms like
"debt", "finance", "loan", "mortgage", "cash",
and similar terms. I found that many of the sites spamming just to build links are
associated with debt, loans or the like.
- Another test is to check whether the company URL and application URL are the same. If they
are, you should check to see if the page at this URL mentions the product name. If not, it's
almost certainly spam. Some sites only have one application and so the application
URL will be the same as for the company, and some valid authors might decide to
point the application URL to their home page (which isn't the intent of this
field but may not be reason enough to reject the submission).
- Often, the first test I'll do is check the screenshot URL. Spammers will sometimes
put just a picture at this URL, or they may even redirect to another page. If the
screenshot does not look like an application, you need to determine what it is.
Note that some valid sites will link to a picture of their logo, or a simulated
image of the product box. In addition, authors that are submitting duplicates of
their software may vary fields such as the domain, company name, and title, etc.
but use the same screenshot. So sometimes a screenshot can help identify software
as coming from the same company.
- If the download URL contains "affiliate=", "affiliateid=", "regnow",
or the name of another affiliate site, there is a high probability that the submission
is intended to earn a commission off of referrals from your site and does not
reflect original software. Sometimes the URL
even points to an affiliate site such as RegNow. You could even search for "?"
in this URL, but this will also return a lot of valid submissions.
- Another potential red flag is when the company name varies between submissions from
the same domain. It could be variations in the way they write their company name
or it could be a company that just likes to put the product name in the company
field, but it could also be spam.
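Several of the tests above lend themselves to a simple scan of the catalog. The sketch below assumes each listing is available as a dictionary with the hypothetical keys "pad_url", "company_name", "application_url", "company_url", "screenshot_url" and "download_url". How your catalog actually stores these fields will differ. As stressed above, the output is only a list of candidates for manual inspection, not a verdict.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

SUSPICIOUS_TERMS = ("debt", "finance", "loan", "mortgage", "cash")
AFFILIATE_MARKERS = ("affiliate=", "affiliateid=", "regnow")
URL_FIELDS = ("application_url", "company_url", "screenshot_url", "download_url")


def domain_of(url):
    """Lower-cased host name with any leading 'www.' removed."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def domains_by_listing_count(entries):
    """(domain, number of PAD files) pairs, largest count first."""
    return Counter(domain_of(e["pad_url"]) for e in entries).most_common()


def suspects(entries):
    """Return (pad_url, reasons) pairs for listings worth a manual look."""
    # Collect the distinct company names used by each submitting domain.
    names_by_domain = defaultdict(set)
    for e in entries:
        names_by_domain[domain_of(e["pad_url"])].add(e["company_name"].strip().lower())

    results = []
    for e in entries:
        reasons = []
        urls = [e[field].lower() for field in URL_FIELDS]
        if any(term in url for url in urls for term in SUSPICIOUS_TERMS):
            reasons.append("suspicious term in URL")
        # Ignore a trailing slash when comparing the two URLs.
        if e["application_url"].rstrip("/").lower() == e["company_url"].rstrip("/").lower():
            reasons.append("application URL equals company URL")
        if any(marker in e["download_url"].lower() for marker in AFFILIATE_MARKERS):
            reasons.append("affiliate marker in download URL")
        if len(names_by_domain[domain_of(e["pad_url"])]) > 1:
            reasons.append("company name varies within domain")
        if reasons:
            results.append((e["pad_url"], reasons))
    return results
```

Sorting by listing count and reading the reasons side by side makes it easier to decide which domains to inspect first. Every flagged entry still needs the manual review described earlier.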
Are you familiar with other techniques that can be used to combat PAD spam? Please
let us know.