Categories
Brain Buster Productivity Booster SEO and Paid Search The Marketeer

What is Scraping and how to stop it?

I’m at the Search Engine Strategies conference and we just had lunch with a team from Google who showed off some of the new webmaster tools (and I managed to get in a vote for a crawl error referral report to Vanessa Fox, but that’s another post). The topic of scraping was raised and Danny Sullivan mentioned that there will be a full session on it later in the week. My general rule is not to blog during business hours but since we’ve been fighting this battle at work it’s relevant (and remember that AccuRev has the Ultimate Source Control Tool).

In our Web 2.0 world you can make money just by generating traffic and putting up Google AdSense ads. For the Ronin Marketeer, you post quality content, get the traffic and are regarded as a hero by all. Another approach, for those of more flexible business ethics, is to copy someone else’s content and show it as your own. This is happening more and more in the blogosphere, and it is already an issue for corporate sites.

The practice of grabbing content from another website and posting it as your own is called scraping. I’ve never tried scripting this myself, but the process can be automated to varying degrees. Most people come across it when they are googling themselves or their company and get results from outside their own domains (often blogs using a default template) that copy their content verbatim. More recently these pages often include copy from multiple websites.

So, what to do about the thieves in our midst? Adam Lasnik of Google discussed this during the panel today, and here’s a summary of the answer as I heard it:

  1. Overall, “Don’t Panic”. It’s fairly easy for Google to verify which copy is the original: your site published it first and your domain is already established with Google. The scraper is not established; their URL is newer and was probably registered a year ago or less.
  2. You can file a DMCA takedown request with Google.
  3. The takedown request is good, but Adam referred to it as “swatting flies”. Your time is better spent staying the course: make sure you remain the source for your content by continuing to crank it out.

Keep in mind that in the grand scheme the majority of scraping is garbage and clutter, and anyone providing search results will continue to screen it. But then again, it’s yet another cat and mouse game for us to follow.

I’m learning some good stuff, more to follow.

Categories
Daily Life The Marketeer

It’s Sunday – that means there’s an M Show


Nothing better than drinking Mulled Wine on public property at the German Christmas Market. Back to business tomorrow!

Check out your Monday M Show

Categories
Daily Life Podcasting

Why Radio Sucks

Not long ago I was travelling for business with a rental car and had no way to get my iPod output into the stereo. After 15 minutes of radio I was actually considering the risk of driving with headphones (I decided against it for my personal safety; I already felt at risk in my ultra-sub-mini-compact vehicle).

Reason #1 – Ads. There are tons of them. This is the golden age of podcasting, and podcasts have very few.

Reason #2 – Finite number of stations, finite choices. When’s the last time you heard a radio show about the best way to raise pigs or knit? On the radio, every town in America has the morning zoo, talk from the left and right, country, rap, classic rock, current rock, oldies, and that’s it. With podcasts, no matter what your hobby, you can find something you really want to learn about. And even though I am 100% podcast, there’s lots of Hollywood-quality stuff through Audible (John Federico, thank me later).

Reason #3 – 60 GB, 9000 songs, my choice.

Reason #4 – If you are stuck waiting somewhere you can watch video on your iPod, kind of cool coming through 6 car speakers.

Bonus Reason: You may say “But John, in your simple-minded rant you forgot one thing – variety”, and I say “See #3 and add Smart Playlist – Random”. Also check out my channels on gigadial. It’s great: subscribe to the feed and I’ll throw you random new podcasts to check out. The CAPOW channel covers marketing and business stuff, New England Podcasters is everything else for the general public, and the John Wall channel has everything that’s too nerdy, radical, edgy, or adult for the (somewhat) family-friendly NE Casters channel.

My channel has some All-Star geek recordings right now that include Apple co-founder Steve Wozniak, Joel from Joel on Software, and business gurus Clayton Christensen (The Innovator’s Dilemma) and Malcolm Gladwell (The Tipping Point and Blink).
If you want just music then check out some folks doing some great stuff: Rock from Accident Hash, Hip-Hop from Julien, Chill with Anji Bee, and get your pirate Barry White style groove on with Suzy Chase.

By the way, anyone can add to those channels so if you have anything you’d like to share please add it.

Holy web bluntman, that’s a lot of links. Have a good weekend, The M Show will be out Sunday Night with some special guests…

Categories
Brain Buster Geek Stuff

Web 3.0

I mentioned yesterday that I had a chance to speak with Mike Kowalchik of Grazr and that started me thinking about the changing face of the web. This goes right along with a post of Steve Rubel’s regarding Yahoo no longer putting feeds on major pages. RSS is a way to get through content faster – it removes some of the friction in an already nearly frictionless environment.

The only problem is that we are now drowning in information; the web is being crushed under its own weight. A tool like Grazr lets readers skip visits to unchanged pages, and those are page views that would normally carry advertising messages. Once you are hooked on RSS feeds, your surfing time decreases. This is a disruptive force.

I’m beginning to think that the missing link is an RSS killer app. Once there is a program that folks on the far side of the chasm would adopt (something beyond a propeller-head newsreader), a program that makes RSS completely seamless, we will see something entirely new. While Grazr may look like a widget on the surface, I think it may be the first look at something completely different.

Categories
Daily Life The Marketeer

Joel Digs AccuRev

A great surprise for me this morning, forwarded by Chicago Mike: AccuRev has made the homepage of Joel on Software. This should be an interesting day as far as web traffic goes.

Categories
Brain Buster Geek Stuff

What a 21st Century Record Label Looks Like

I attended the WebInno event tonight over at the Royal Sonesta just across the river from Boston in Cambridge. Besides getting a chance to catch up with Andrew Bourland and Christopher Carleton, I got to see some interesting new web apps. I geeked out on RSS and OPML with Mike Kowalchik of Grazr, but that stretched my brain too far and now I have to process that for a day or two before talking about it.

The other main course was Calabash Music, which was demoed by Brad Powell. They focus on World Music, and the interesting thing is that they have a mini player that bands can host on their own sites, with both a playlist and an integrated purchasing mechanism. You can listen to the tunes and click to purchase a track. He mentioned that they already have a deal going with National Geographic (who hosts their podcasts with LibSyn). Very cool stuff, sort of a CD Baby without the CDs. Is this the record label of tomorrow?

Categories
Daily Life Geek Stuff

Where to check out new Web Stuff in Boston

I found out about the Web Innovators Group through Brian Owen of Masthead Venture Partners at the Nantucket Conference back in the spring. The latest meeting is tonight so I’ll fill you in on anything cool. If you are in the Boston area just go over to their Wiki and sign up.

I may even grab some audio for The M Show.

Categories
Daily Life

The quiet before the Turkeystorm

Things are a little bit slow today; everyone has that “Well, it’s going to be a short week and impossible to get anything done, so $%*& it” attitude. I posted the second AccuRev podcast today, and the customer newsletter will be rolling out soon. I’m still trying to think of the right title for this blog. The “Adventures in Marketing” domain is already gone, so I need to keep the thinking cap on…