What Is Scraping and How Do You Stop It?

I’m at the Search Engine Strategies conference, and we just had lunch with a team from Google who showed off some of the new webmaster tools (I managed to get in a vote for a crawl error referral report with Vanessa Fox, but that’s another post). The topic of scraping came up, and Danny Sullivan mentioned that there will be a full session on it later in the week. My general rule is not to blog during business hours, but since we’ve been fighting this battle at work, it’s relevant (and remember that AccuRev has the Ultimate Source Control Tool).

In our Web 2.0 world, you can make money just by generating traffic and putting up Google AdSense ads. The Ronin Marketeer posts quality content, gets the traffic, and is regarded as a hero by all. Those with more flexible business ethics take another approach: copy someone else’s content and present it as their own. This is happening more and more in the blogosphere and is already an issue for corporate sites.

The practice of grabbing content from another website and posting it as your own is called scraping. I’ve never tried scripting it myself, but the process can be automated to varying degrees. Most people come across it when googling themselves or their company and finding results outside their own domains (often blogs using a default template) that copy their content verbatim. More recently, these pages often combine copy from multiple websites.

So, what to do about the thieves in our midst? Adam Lasnik of Google discussed this during the panel today, and here’s a summary of his answer as I heard it:

  1. Overall, “Don’t Panic.” It’s fairly easy for Google to tell which copy is the original: your site published it first, and your domain is established with Google. The scraper is not established; its URL is newer and was probably registered a year ago or less.
  2. You can file a DMCA takedown request with Google.
  3. A takedown request works, but Adam called it “swatting flies.” Your time is better spent staying the course: keep cranking out content so that you remain the recognized source for it.
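Adam’s first point boils down to a simple tie-breaking rule. The toy Python sketch that follows ranks copies of the same content by when each was first seen, then by how long its domain has been registered. It only illustrates the heuristic as summarized above and is not Google’s actual method; the `Copy` class, its fields, and the example URLs and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Copy:
    """One indexed copy of a piece of content (hypothetical model)."""
    url: str
    first_seen: date          # when a crawler first saw this copy
    domain_registered: date   # when the hosting domain was registered


def likely_original(copies: list[Copy]) -> Copy:
    """Pick the copy that looks like the source.

    Earliest first_seen wins; ties go to the longer-established domain.
    """
    return min(copies, key=lambda c: (c.first_seen, c.domain_registered))


if __name__ == "__main__":
    copies = [
        Copy("http://scraper.example/post-123", date(2020, 1, 10), date(2019, 6, 1)),
        Copy("http://www.example.com/blog/post", date(2020, 1, 8), date(2001, 5, 20)),
    ]
    print(likely_original(copies).url)  # http://www.example.com/blog/post
```

Real search engines weigh many more signals; the point is simply that a newer, unestablished copy tends to lose on both comparisons.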

Keep in mind that, in the grand scheme, most scraping is garbage and clutter, and anyone providing search results will keep screening it out. Then again, it’s yet another cat-and-mouse game for us to follow.

I’m learning some good stuff; more to follow.

Integrating with Google AdWords

Today brought a stumbling block on the path to the holy land: the code snippet we need to integrate with our Google AdWords campaign conflicts with some existing JavaScript on our custom web-to-lead forms. As I have no Perl skills to speak of beyond “cut and paste somebody else’s stuff and pray it works,” I’ve had to call in some bigger guns: Salesforce level-two support and our own Ronin Coder. Perhaps there will be more luck tomorrow…

On the plus side, Joel delivered the web traffic today…

Email is as dead as direct mail

That is, not dead at all. Today was a big email day for me: I sent out two blasts. I’m currently using ConstantContact, which is the best value for the price: free to start and inexpensive after that. I’ve also used ExactTarget, which is a great product (and perhaps in my future, due to its integration with …). In fact, Chris Baggott from over there is coming out with a book next year, and if some of my pieces make the editorial cut, I’ll be published in it.

Contrary to what you may hear, email is very much alive, and so is direct mail, judging by the 35 catalogs that have come through the mail slot at home in the past week. These tactics may no longer be silver bullets, but they still deliver.

ConstantContact publishes some interesting service-wide benchmark figures: global bounces are at 18.3% (probably understated, since I get some out-of-office replies directly), opens at 37%, and clicks at 8.9%. I do better on bounces, worse on opens, and much better on clicks. My personal mailing list (for The M Show, listen now!) has under 1,000 names but performs at a much higher level (9x cleaner, 4x the clicks). That is quite normal for smaller lists. I have more stats on that, but I’m not going to dig them up now; leave me a comment if you want more.
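The raw figures for the small list aren’t given, but the multipliers can be turned into rough numbers. This short Python sketch assumes “9x cleaner” means a bounce rate one-ninth of the 18.3% benchmark and “4x clicks” means four times the 8.9% click benchmark; both readings are assumptions, and the function name is hypothetical.

```python
# ConstantContact service-wide benchmarks quoted in the post (percent).
BENCHMARK_BOUNCE = 18.3
BENCHMARK_CLICK = 8.9


def implied_rates(cleaner_factor: float, click_factor: float) -> tuple[float, float]:
    """Return (bounce %, click %) implied by 'Nx cleaner' and 'Mx clicks'.

    Assumes 'Nx cleaner' means a bounce rate of benchmark / N and
    'Mx clicks' means a click rate of benchmark * M.
    """
    return BENCHMARK_BOUNCE / cleaner_factor, BENCHMARK_CLICK * click_factor


bounce, click = implied_rates(9, 4)
print(f"bounces ~{bounce:.1f}%, clicks ~{click:.1f}%")  # bounces ~2.0%, clicks ~35.6%
```

Read that way, the small list would bounce at roughly 2% and draw clicks from roughly a third of recipients.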

Browse Fonts, Preview Fonts, Manage Fonts, Ahhhh

I’ve spent the past 5 years looking for a font browser. Like any other Ronin Marketeer, I have a portable hard drive of digital tools that I’ve gathered, including an insane number of fonts, now over 10,000. Yesterday I again got so frustrated that I decided to take a timeout to see if there were any tools out there.

I found Suitcase by Extensis. This gift from the gods lets you point it at a folder and builds a library of all the fonts in its subfolders, complete with preview samples. I’m doing the 30-day trial, but unless it does something ridiculous, I’ll be a new customer in no time.

OK, this is not a best practice (it’s more of a security risk), but if you are in and out of …, this will save you a lot of hassle. Here’s the link to set up on your desktop or toolbar: [password]&un=[username with %40 instead of @]
For example, if your email was …, your username would be …
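The `%40` in the link is simply URL percent-encoding of the `@` sign. If you’d rather not hand-edit it, Python’s standard `urllib.parse.quote` does the conversion. The helper name and example values here are hypothetical, and the same encoding matters for a password containing characters such as `&`, which would otherwise cut the query parameter short.

```python
from urllib.parse import quote


def encode_for_query(value: str) -> str:
    """Percent-encode a string for use as a URL query value.

    safe="" leaves no reserved characters as-is, so '@' becomes
    '%40', '&' becomes '%26', and '/' becomes '%2F'.
    """
    return quote(value, safe="")


print(encode_for_query("jane.doe@example.com"))  # jane.doe%40example.com
print(encode_for_query("p&ss word"))             # p%26ss%20word
```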

Please promise me that if you use this, you will at least set a login password on your laptop, so that when you lose it at the airport you’re not giving away the keys to the kingdom.

In other news, we are testing the Salesforce and Google AdWords integration. It’s not working: something about two conflicting forms on the same page. A surprise project for this afternoon!