Facebook Wants To Be Your Default Home Page

As I was logging into Facebook to check out the redesign, I noticed this little checkbox to “set Facebook as your home page”.

Set Facebook As Your Home Page?

I don’t recall seeing that before. I am impressed that it’s not selected by default.

Once they get all those default home pages, they just need to expand their partnership with Microsoft to offer Live Search as part of the Facebook home page experience. Does Microsoft need Yahoo’s users if they can grab Facebook’s instead? How many people do you think will make the default home page switch?

Search-Friendly Flash?

A couple of weeks ago, Adobe announced that it was working with Google and Yahoo! on making Flash content easier to index in search engines. Google said it was using the search-engine-specific Flash player that Adobe had made available (Yahoo!’s integration is still in the works). I think it’s great, and absolutely vital, that search engines continue to evolve beyond strictly text to ensure they are providing the best possible experience for their users. But I don’t think this announcement means that all the Flash content on the web will suddenly start ranking in search results, and I don’t think that Flash developers can stop thinking about search engine optimization.

How search engines work
It all goes back to how search engines work. At least for now (even with all of the advancements in the last year around universal search), the foundations of the major search engines are based on text. The web began with primarily text-only pages and the search engine algorithms were built on that idea. When people started searching for information, they searched with words. We’re used to asking for things in words, after all, and since words were what the web was made up of, the questions and answers matched up quite well. Search engines are a bit of a middleman (middlemachine?) between a searcher’s textual questions and a web site’s textual answers.

Searching continues to be text based
Sure, you might imagine other types of exchanges. I might want to upload a picture of a person and ask for all the other pictures on the web of that person. Or I might want to search through the audio of a song for a particular lyric. All of those types of searches and more are coming (and some have been tried, with varying degrees of success), but at least for now, those applications are not how the three major search engines work and not how most people search.

Over time, search engines have experimented with different elements on pages beyond simply the text itself to better understand what those pages are about. But since these experiments are built on a text-based foundation, they have still mostly focused on text. For instance, search engines found that the text in the title may be a strong indicator of the focus of the page. The textual caption under an image is likely describing that image.

How Flash fits in with text-based search engines
Now, consider Flash. Most Flash pages contain little text. Those that do could often just as easily display that text outside of the Flash components (which would make it easier for those on screen readers and mobile phones, for instance, to view the content).

With this latest innovation in crawling Flash, Google can more easily access the text in Flash, but it still can’t process that text quite as well as it can HTML text because it isn’t extracting any metadata about it. As I mentioned earlier, search engines store all kinds of metadata based on the structure of the text in HTML, such as whether it’s in a title tag or an H1. Flash-based text has that disadvantage.
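To make that disadvantage concrete, here is a minimal sketch in Python of how a crawler might record which structural element each piece of HTML text came from. The list of tags, the class name, and the sample page are my own assumptions for illustration; this is not how any particular search engine actually does it.

```python
from html.parser import HTMLParser

# Tags whose text might be treated as a stronger signal. This set is an
# assumption for illustration, not a documented list from any search engine.
STRUCTURAL_TAGS = {"title", "h1", "h2"}


class StructuredTextParser(HTMLParser):
    """Collects each run of text along with the structural tag enclosing it."""

    def __init__(self):
        super().__init__()
        self._stack = []    # structural tags that are currently open
        self.records = []   # (context, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            context = self._stack[-1] if self._stack else "body"
            self.records.append((context, text))


def extract_structured_text(html):
    parser = StructuredTextParser()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical page used only for this example.
page = """<html><head><title>Tour de France Videos</title></head>
<body><h1>Stage Highlights</h1><p>Watch every stage.</p></body></html>"""

html_records = extract_structured_text(page)

# The same words pulled out of a Flash file would arrive with no such context.
flash_records = [("unknown", text) for _, text in html_records]
```

With the HTML version, the crawler knows that "Tour de France Videos" was the title and "Stage Highlights" was a heading. With the Flash version, all three strings look equally important.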

Provide a separate URL for each piece of Flash content
Another consideration is how the Flash application itself is constructed. This new Flash player that Adobe is making available to Google and Yahoo! helps the search engines in that it enables them to access content they never could before. The crawlers can interact with the Flash application as a user would and crawl deeper into the application to get to text that may be four or five levels deep. At first glance, this may seem similar to search engine crawlers following links within HTML sites, but it can actually be quite different.

HTML pages (generally) have unique URLs for each page. Flash applications can be constructed that way, but can also be constructed so that as you go deeper into the application, the URL doesn’t change. This can be problematic for lots of usability reasons that have nothing to do with search. For instance, the back button in the browser doesn’t work. Users can’t easily email, Digg, or otherwise share a particular section of the Flash application. Bookmarking only works for the beginning of the Flash app.

As you might imagine, it also causes problems in search. Sure, the search engine crawlers may now be able to get to some of that content several levels in, but they have to index all of the text under a single URL. (Also note that they likely won’t index all of the application in this case; they will execute only a certain number of interactions.)

Say information about your latest product line is available once you choose “products” from the home page, then “new” from the products page, then “coming soon” from the new page. If the URL of the application doesn’t change for each interaction, then search engines will have to index the content from the home page, products page, new page, and coming soon page all under a single URL. When a searcher looks for your latest product line, that URL may appear in the results. But once the searcher clicks over, they aren’t brought to your coming soon page, they see your home page, and may have no idea where to go from there. If you ensure your Flash app uses a different URL for each page, then the searcher can be brought directly to the page that has the right content, which should greatly improve conversion rates and lower bounce rates.
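The difference is easy to see in a toy model. The sketch below, in Python, builds a tiny index keyed by URL and runs the same query against a single-URL application and a one-URL-per-page application. The example.com URLs, the page text, and the crude substring matching are all placeholders I made up; a real search engine is far more sophisticated.

```python
def build_index(pages):
    """Map each URL to the combined text a crawler found under it."""
    index = {}
    for url, text in pages:
        index.setdefault(url, []).append(text)
    return {url: " ".join(parts) for url, parts in index.items()}


def search(index, query):
    """Return URLs whose indexed text contains every query word.

    Plain substring matching is a simplification for illustration only.
    """
    words = query.lower().split()
    return sorted(
        url for url, text in index.items()
        if all(word in text.lower() for word in words)
    )


# Hypothetical application where the URL never changes.
single_url_app = [
    ("http://example.com/", "Welcome to our store"),
    ("http://example.com/", "Products"),
    ("http://example.com/", "New products"),
    ("http://example.com/", "Coming soon: the spring product line"),
]

# The same content with a distinct URL for each page.
multi_url_app = [
    ("http://example.com/", "Welcome to our store"),
    ("http://example.com/products", "Products"),
    ("http://example.com/products/new", "New products"),
    ("http://example.com/products/new/coming-soon",
     "Coming soon: the spring product line"),
]

single_hits = search(build_index(single_url_app), "spring product line")
multi_hits = search(build_index(multi_url_app), "spring product line")
```

In the single-URL case the only thing the index can return is the home page, which is exactly where the searcher gets stranded. In the multi-URL case the result points straight at the coming soon page.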

But if you take the announcement that Google can now index Flash at face value, without looking deeper, you may not realize this, and think that your single-URL Flash application is now perfectly positioned for search.

Taking back the tour
Want an example of how the statement “Google can now index Flash” isn’t the whole story?

I’ve been watching the Tour de France. It’s playing on the Versus network for the first time this year. I’d never heard of the Versus network before (since it seems to mostly show ultimate fighting cage matches, this may be because I’m not its target audience; not to mention that I wasn’t the target audience for the network under its previous name, OLN, as I think it mostly played shows about people fishing then), and the network is looking to capitalize on this potential new audience.

Versus is spending a lot of money on its Tour de France campaign “Take Back the Tour”. It has put together flashy commercials and an equally flashy website.

[Image: firstpage]

Versus probably would like to be found when people search for [tour de france]. The Tour de France page on the main versus.com domain shows up in the search results, but the Take Back The Tour site that they spent so much money on? Nowhere to be found.

Well, they’re spending all the money on commercials and print ads, so maybe people have been searching for [take back the tour] as well. The site does rank #1 for that query on both Google and Live (although it’s down at #8 on Yahoo!). For all three engines, even those who do the search because they saw an ad might not be sure if the takebackthetour.com listing is really the official site based on how the listing looks in the search results.

[Image: results]

You can see that at this point, Google doesn’t see any content on the site and in fact, notes on the cached page that [take back the tour] appears only in links pointing to the page. Since it can’t extract any text, it has no way of knowing that the site is about the Tour de France.

Google still doesn’t index Flash executed via JavaScript
So. What’s the problem? Google crawls Flash now and all should be well. I see at least two problems. The first is fundamental. The Flash executes via JavaScript. Google noted in their blog post that:

“Googlebot does not execute some types of JavaScript. So if your web page loads a Flash file via JavaScript, Google may not be aware of that Flash file, in which case it will not be indexed.”
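As a rough sketch of why that matters, the Python below looks only at a page's static markup, the way a crawler that doesn't run scripts would, and lists any Flash files it can see. The class name, the two sample pages, and the choice of which attributes to inspect are my own assumptions; it illustrates the general idea, not Googlebot's actual behavior.

```python
from html.parser import HTMLParser


class StaticFlashFinder(HTMLParser):
    """Finds .swf references that are visible without running JavaScript."""

    def __init__(self):
        super().__init__()
        self.swf_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        candidates = []
        if tag == "object":
            candidates.append(attrs.get("data"))
        elif tag == "embed":
            candidates.append(attrs.get("src"))
        elif tag == "param" and (attrs.get("name") or "").lower() == "movie":
            candidates.append(attrs.get("value"))
        for url in candidates:
            if url and url.lower().endswith(".swf"):
                self.swf_urls.append(url)


def find_static_flash(html):
    finder = StaticFlashFinder()
    finder.feed(html)
    finder.close()
    return finder.swf_urls


# Hypothetical page that embeds Flash directly in the markup.
static_page = (
    '<object data="tour.swf" type="application/x-shockwave-flash"></object>'
)

# Hypothetical page that only loads Flash from a script. The parser treats
# script contents as opaque text, so it never sees the file.
script_page = (
    '<div id="flash"></div>'
    '<script>swfobject.embedSWF("tour.swf", "flash", "800", "600", "9.0.0");'
    '</script>'
)

static_result = find_static_flash(static_page)
script_result = find_static_flash(script_page)
```

The first page yields the Flash file; the second yields nothing, because the only mention of the file is inside a script that was never executed.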

They did update the post later to say that:

“For our July 1st launch, we didn’t enable Flash indexing for Flash files embedded via SWFObject. We’re now rolling out an update that enables support for common JavaScript techniques for embedding Flash, including SWFObject and SWFObject2.”

Will this update help the Take Back the Tour site? Maybe not.

Can Google find any words to index?
Another big obstacle to the crawl of this site is that even if Google could get to the Flash, it would find few words to index. Nearly all of the text on the site is contained in images. The first thing you see when you go to the site is lots of words, but the only ones that seem to be text, rather than part of the image, are in the link “join the movement”.

So, once Google can access the Flash, it will be able to crawl and index those words. This design is a theme throughout the site. Links like “back” are text. Nearly everything else is in images.

Let’s pretend for a moment that they changed the Flash file so that the text wasn’t contained in images (and that the JavaScript problem didn’t exist). Would this help indexing? Yes and no.

No separate URLs can lead to a poor experience for searchers
Each time you click a link in the Flash file, you are taken to another page, but the URL doesn’t change. It stays at takebackthetour.com no matter how you navigate. That means that any text Google does pick up will be indexed under that one URL.

By clicking about three levels deep, I can find TV spots about the tour. If the site designers added some text about those TV spots, using the language of their customers, then searchers looking for [tour de france video] or something similar might see the takebackthetour.com site come up in their search results. But when they clicked through to the site, they wouldn’t see the TV spots. They would see the Flash splash page. And they would have to figure out how to navigate through the site to find the video section. Chances are that many searchers would scan the initial page that came up, not see what they were looking for and go back to the search results to find another site.

Little chance for viral success
This makes for a poor user experience from search, but consider also that the creators of this campaign obviously are hoping it goes viral. If you want a site to go viral, you have to make it easily shareable. Sure, people may love the rant section or the video section or the contest, but no URL of any of these sections exists for those people to email, Digg, Twitter, Stumble, or otherwise share. A viral campaign that requires every person who shares the content to say, “go to this URL, then click ‘join the movement’, then click ‘how will you take back the tour’” is over before it even begins.

And what about accessibility? And those on the go? I watched the first night of the tour at a friend’s house. What if I had seen the commercial, wanted to check it out, and pulled up the site on my Windows Mobile Smartphone? I would have had this awesome experience:

[Image: nojavascript]

It’s not even an accurate error message, since the first problem is that I don’t have JavaScript support.

Be smart about Flash
Clearly, a few problems still exist with Flash websites. My view is this:

  • It’s important for web technology providers to think about things like accessibility and search engine optimization or those who implement those technologies will turn to other solutions. To this end, Adobe should be commended for continuing to evolve their offerings to better serve the needs of their users.
  • Search engines have to continue to evolve beyond HTML as their primary goal is to provide the best possible results for searchers. They can’t rely on site owners across the web understanding what technologies are better for search. Google is clearly working on “organizing all the world’s information”, not just all the information well optimized for search engines, and this latest Flash development is an important part of that evolution.
  • If you operate a business online, search is an important acquisition channel. Don’t leave such an important avenue for gaining new customers in the hands of others. Ensure that you are making it as easy as possible for search engines to find your content.
  • Flash may very well be a great technology for your site, but implement it wisely.

Irony

An official Google blog post about scraped content, on a scraper site. [Site is here (www.arsgeek.org/2008/06/11/duplicate-content-due-to-scrapers/), but link removed as the site may now contain malware.]

But the original does rank first.

Although not in blog search.

(But that appears to be because the post isn’t indexed in blogsearch at all. Because rivva.de is listed in its place?)

It Only Seems Like I’m Quiet

It’s been a busy month. I spoke at some conferences, organized some local meetups, put together and moderated a day for developers about search, and wrote some stuff. I have a bunch of stuff coming up for this blog, but in the meantime, I thought I’d post a quick recap here of everything else.

emetrics
I was on a panel with Avinash Kaushik, which was great fun. I’m sure you already are subscribed to his RSS feed, and if not, what are you waiting for? Mel over at Microsoft adCenter quoted me a bit loosely from the session. I think I was replying to someone who was thinking they could perhaps buy a site, replace all the content with completely different stuff, and keep the credit for all the older site’s PageRank and incoming links. But it doesn’t really make sense that if msnbc.com had a bunch of incoming links for being a great news site and was sold to someone who turned it into a site about cute cats, all those old news links would help the new cat site.

advance08
I’m working on a big write-up of Bill Gates’s talk at Microsoft’s advertising summit and my tour of their house of the future (just like they’d show during the Tom and Jerry cartoons!), and hopefully will have it out in the next few days.

Convergence Vancouver
I gave a talk on universal search and discussed how to use universal search as a new opportunity to connect with customers. What is your audience most interested in? You don’t want to create a bunch of videos and images just to try to blanket the results page — think about what will provide real value to your audience and focus on building universal results that will bring you more qualified traffic and return visitors. I also talked about how the changes to the search results page mean that you may need to look at new metrics. Ranking position and page views alone can’t tell the whole story.

SMX Developer Day
I had a great time yesterday moderating the developer track at SMX. The speakers were great and I particularly enjoyed hearing the case studies and seeing code samples. Look for some of that soon on Jane and Robot. Thanks to everyone who participated (by speaking or attending).

Jane and Robot Web Development and Search Meetups
Speaking of Jane and Robot, we held two meetups in Seattle in May. They were great fun so we’re going to keep doing them around once a month. We’re going to focus on a particular topic each time and leave lots of time for questions, site reviews, and chatting. If you’re a developer in the Seattle area and there’s a particular topic you’d like to hear about, let me know! Looks like the next one will likely be June 25th on the east side, so stay tuned for details.

Writing
I’ve managed to find some time to write a few things lately. Earlier tonight, I gave my thoughts on the Yahoo! SearchMonkey searcher experience. Late last week, I wrote about how the search experience is changing and how marketers can use that to their advantage. I also wrote about implementing images on Jane and Robot. We’ve got another article that will likely go up sometime tomorrow.

What Cool Stuff Is LinkedIn Launching?

[Image: linkedin]

Apparently, it’s magical.

Ranking As The Original Source For Content You Syndicate

When you write content on your site, whether it’s a blog post, product description, or an article, you likely want to rank well for it. I’m often asked how best to ensure this when you’re also syndicating that content.

Why Syndicate?
There are good reasons for syndicating content. Syndication can bring traffic, exposure, and sales.

If you’re a blogger, you might syndicate your posts to get wider distribution. If your posts are seen by a bigger audience, you might gain some of those readers for yourself. If your site provides authoritative resources, you might have a partnership with other sites that want to include that content. And if you sell products, you might provide affiliates with content feeds, which in turn brings in additional revenue.

But What Should Rank?
But from a search engine perspective, syndication can cause a bit of a conundrum. If what you wrote is a relevant result for a search, the search engine wants to show it to the searcher. But not show it twice (or three times, or maybe even a thousand times in the case of an affiliate feed). And that makes sense. If you’re searching for something, you don’t want multiple results that all lead to the same content even if that content is on different sites.

So what’s a search engine to do?

Search engines generally identify duplicate results and filter out all but one. They have lots of ways to decide which version to show. They try to figure out which one is the “original” by looking at things like which version was published first and which has the most links pointing to it.

Your content may appear on other sites at times other than when you syndicated it (such as when your RSS feed has been scraped), and search engines try to account for that too by looking at things like which site is more authoritative.
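Nobody outside the search engines knows exactly how this works, but the general shape can be sketched. The Python below treats two versions as duplicates when they share most of their three-word sequences, then picks the earliest-published version, using inbound links as a tie-breaker. The similarity measure, the 0.8 threshold, the tie-breaking order, and the example sites are all assumptions of mine, chosen only to illustrate the signals mentioned above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Version:
    url: str
    text: str
    published: date
    inbound_links: int


def shingles(text, size=3):
    """Return the set of overlapping word sequences in the text."""
    words = text.lower().split()
    count = max(len(words) - size + 1, 1)
    return {tuple(words[i:i + size]) for i in range(count)}


def similarity(a, b):
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def pick_original(versions, threshold=0.8):
    """Among near-duplicates of the first version, choose one to show.

    Earliest publication date wins; more inbound links breaks ties.
    This ordering is an illustrative assumption, not a known algorithm.
    """
    reference = versions[0]
    duplicates = [
        v for v in versions
        if similarity(reference.text, v.text) >= threshold
    ]
    return min(duplicates, key=lambda v: (v.published, -v.inbound_links))


# Hypothetical article and sites.
article = ("Syndicating your posts can bring traffic exposure "
           "and sales to your site")
versions = [
    Version("http://partner.example/post", article, date(2008, 6, 12), 40),
    Version("http://blog.example/post", article, date(2008, 6, 10), 5),
    Version("http://other.example/unrelated",
            "A completely different article about email marketing schedules",
            date(2008, 6, 1), 100),
]

chosen = pick_original(versions)
```

Here the blog version is chosen because it was published first, even though the partner site has more links. Weight the signals differently (say, links first) and the partner would win instead, which is the kind of outcome the next section is about.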

What If Search Engines Get It Wrong?
Generally, search engine algorithms work pretty well and your original version shows up. However, the system isn’t perfect. Michael Gray recently noted that sometimes Google gets it wrong and shows the version from a more authoritative site, even when that is not the original version. He suggested some ways for making sure that the original version shows up first. And he linked to the Search Illustrated column on Search Engine Land that shows a great illustration of how search engines determine the version to show.

How Can You Make Sure Your Site Ranks First?
So what do I suggest you do if you’re syndicating content but want your original version to rank above the syndicated ones?

  • Create a different version of the content to syndicate than what you write for your own site. This method works best for things like product affiliate feeds. I don’t think it works as well for things like blog posts or other types of articles. Instead, you could do something like write a high level summary article for syndication and a blog post with details about that topic for your own site.
  • Always include absolute links back to your own site in the body of the article. This is particularly helpful when your content is scraped.
  • Ask your syndication partners to block their version of your article (via robots.txt or a robots meta tag). Whenever I suggest this, people laugh and tell me that the sites they are syndicating to would never agree to this as they want the content so they can rank for it. I can completely understand this. But as someone who’s providing your content for syndication, you should then just realize you’re in a competition with your syndication partners for ranking and it’s quite possible they can outrank you. If you are able to, put together a syndication agreement that states they get your content as a benefit for their readers, not as a way to acquire search traffic for that content, then you can keep control of ranking for what you’ve written and they can provide a benefit to their audience.
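If a partner does agree to block their copy, it's worth checking that the block actually works. Here is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the `/syndicated/` path, and the partner.example URLs are hypothetical; substitute whatever your partner actually publishes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a syndication partner might publish. The
# /syndicated/ path is an assumption for illustration.
partner_robots_txt = """\
User-agent: *
Disallow: /syndicated/
"""

parser = RobotFileParser()
parser.parse(partner_robots_txt.splitlines())

# The partner's copy of your article should be blocked from crawling...
syndicated_allowed = parser.can_fetch(
    "*", "http://partner.example/syndicated/your-article"
)

# ...while the partner's own content stays crawlable.
own_content_allowed = parser.can_fetch(
    "*", "http://partner.example/news/their-own-article"
)

# The other option mentioned above is a robots meta tag in the page itself,
# for example: <meta name="robots" content="noindex">
```

With these rules, `syndicated_allowed` comes back `False` and `own_content_allowed` comes back `True`, which is the split you'd want from such an agreement.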

But Make Sure Duplication Is the Issue
In Michael’s case, he explained that he has an agreement with Web Pro News that enables them to syndicate any blog post of his that they’d like for their own site. In the case he describes, the article on the Web Pro News site is ranking above the version on his blog. He speculates that’s because Web Pro News is a more authoritative site. I am sure that what he describes can happen, particularly since in this case the Web Pro News version of the article doesn’t have a link back to his original article. At the very least, he should negotiate an introductory paragraph at the beginning of his syndicated posts that explains where the original is located, with a link to it, not only for search engine ranking purposes but to give readers better content. But in his particular case, I’m not so sure that’s the cause.

I can’t find his original post indexed at all. Obviously, if a page isn’t indexed, it has no chance of ranking. I’m not sure why that particular page isn’t indexed. It’s not blocked with robots.txt or a robots meta tag. It sounds like he can see it indexed, so maybe I’m hitting a different data center. If that’s the case, I don’t know whether the one I’m hitting was refreshed more recently than the one he’s hitting, or the other way around.

Don’t Give Away Your Control
His point that syndicating content can be tricky if you want to rank for that content remains, even if the root cause of his particular case is a bit hazy. If search is not yet a large acquisition channel for your site, then you may not mind if another site ranks for your material as you may get more traffic from the syndicated site (so make sure you at least have a link back to your site!). But as your site starts to stand on its own and search traffic starts growing, you will want to have more control. So think of your longer term strategy when you negotiate syndication partnerships and don’t give up all of the control of the content you work so hard to create to others.

Powerset’s New “Factz” From Wikipedia

Powerset, the natural language search engine that’s been under wraps for a while, has just launched a test version of their product that searches Wikipedia articles. Danny Sullivan describes how Powerset’s search differs from a standard search over at Search Engine Land.

Key to the difference is Powerset’s ability to glean meaning from the sentences. While other search engines primarily look for instances of words on pages, Powerset understands those words. Or something like that. The Search Engine Land article illustrates the concept with a search for Henry VIII. The Powerset results include “factz” based on verbs, such as he “granted” land and “married” a bunch of times.

I was suspicious of the “z”.

But I figured I’d try it out myself using the tried and true ego search method. If there’s one search result for which we should be able to judge accuracy, it should be the one about ourselves. (Keeping in mind that the current version of my Wikipedia entry is woefully out of date and has been flagged for depressing grammar issues.) So what does Powerset think that Wikipedia has said about me?

Powerset Factz

That I have declared bankruptcy and received email.

Sigh.

Free Networking Events in Seattle For Developers

Lately, I’ve noticed that Seattle doesn’t seem to have regular networking events about search. And I’ve also noticed that not a lot of information exists about SEO for developers. And Seattle has lots of developers who are building web applications and could benefit from those apps being found through search.

I figured hey, why not start organizing some events for developers about search! So, I did.

Ideally, I’d like to hold these once a month, and bring together experts to review sites from the audience. And have lots of food and drinks. In our inaugural month, we’re holding two events!

Tuesday, May 13th at 6pm
Solo Bar, 200 Roy Street, Seattle

This event is sponsored by Microsoft, and they’ll be providing lots of swag in addition to food and drinks. We’ll chat a bit about search, look at a few sites, then hang out and chat. You can sign up at Upcoming.

Thursday, May 29th at 6pm
Google Seattle office, 651 N 34th St. Seattle

This event is sponsored by Google, and we’ll look at some diagnostic issues sites may encounter while we snack and drink. You can sign up for this event at Upcoming as well.

Wednesday, June 4th
Bell Harbor Convention Center, Seattle

Of course, if you’re looking for more in-depth information about how to build crawlable sites, you can check out Developer Day at SMX Advanced on June 4th. We’ve got speakers from the major search engines to talk about the infrastructure details of web applications from a search perspective, Duane Nickull from Adobe to talk about making Adobe technologies search friendly, and web developers to give real-life examples and case studies. We’ll be ending the day with an expert panel to review your site!

Brought To You By Jane and Robot
The free networking events are the first activities organized by a new project I’m working on with Nathan Buggia called Jane and Robot. The idea behind Jane and Robot is to provide definitive content to developers about building web applications for both users and searchers. We’re focusing on the developer audience, rather than search marketers, so we’ll talk more about implementing 301 redirects in PHP than we will about optimizing content for particular keywords. The site is in “soft launch” mode now, but watch as we evolve it.

So far, we’ve got slides up from the SEO for Developers workshop we did at Web 2.0 Expo a few weeks ago (along with diagnostic checklists), as well as an events page where you can watch for more events like the ones we’re putting together in May.

And check out our first article, on domain canonicalization.

The Trouble in Targeting “The” Customer Rather Than “Your” Customer

Email marketers know that people tend not to open marketing mail that gets sent on the weekend. We spend Saturdays and Sundays maximizing our time in the sun and the breeze by watching TV and bad movies on cable, erm, I mean rollerblading and picnicking in the park. People also don’t open mail on Monday because they are trying to catch up from that weekend of TNT marathons and they don’t open anything on Fridays because they are too busy trying to decide whether the coming weekend should feature disaster movies or quality films starring California’s governor.

That leaves Tuesday, Wednesday, and Thursday for any serious email marketing effort. Some say Tuesday early afternoon is the best time for optimal open rates. People are ready to tackle the drudgery that is the inbox, and your mail is the first thing they see. Others say Wednesday, as perhaps people have conquered the worst of it and feel they deserve a reward such as idle email shopping. Choose either day, but make sure you send early afternoon.

Finally, testing and research have given us definitive answers for something and we never have to worry about it again. We now know not only the ideal days but the ideal time. Hooray!

Except I’ve found a potentially fatal flaw in this plan.

And that is that everyone is now sending marketing mail on Tuesday and Wednesday, early afternoon.

I don’t get much email marketing because I unsubscribe from just about everything as soon as the first piece of mail hits my inbox. Someone who declares email bankruptcy must become ruthless with incoming mail.

And yet there I was last Wednesday at around 1pm, and on came the mail. REI wanted me to know about their May events calendar. Alaska Airlines wanted to make sure I knew I could buy people flowers and earn miles at the same time. Microsoft Office Live Small Business thought I might want to know how to get my business online for free! Choice Hotels has my room ready! The mail just kept coming.

And I realized, all that research was going to have to start over with the addition of a new variable. Not only do marketers have to avoid sending mail when people are off for the weekend, they have to avoid sending mail at the same moment everyone else is sending mail. And so Thursday at 10am will become the new Tuesday at 1pm. At least until everyone adjusts their email schedule. And then it all will start over again.

Of course, rather than look at averages for “the” customer, you could look at the particulars of your customer. I was thinking about this last Wednesday at 1pm when my mail started filling up, but apparently I’m not the only one.

Last night on the plane, I was reading Fast Company and happened upon this article about how Barneys is personalizing mail based on individual behavior on the web site. Targeting mail seems like a much better approach than the old fashioned blast, although I’m not sure about their assertion that this rosy new relationship with the customer means that people embrace getting up to five emails a week. They do say they’ve had a ten-fold rise in response rates, which totally makes sense. If you send a promo for hip new purses to your entire email list, your conversion percentage is going to be lower than if you send the purse promo to teenage girls and the power tie promo to older men.

Although Barneys is getting better at segmentation, they seem to be hesitant to go the next step: stop sending mail to people who don’t respond. I have never shopped online at Barneys and haven’t been in a store in at least eight months. But that doesn’t stop them from sending me mail day in and day out. Mail, by the way, that I never open. (I finally opened one last week solely to click the unsubscribe button.) The incessant mail (10 messages during a 12 day period last month) actually made me less likely to shop at Barneys because I was so irritated that they continued to clog up my inbox.

Ryan Warren of Exact Target brought this up today at the eMetrics Industry Insights Day. He said that sometimes the best thing you can do is stop sending mail to people who don’t open it. Spend your energy on those who like getting your mail and take action on it.

His data supported Barneys’ direction. He said that only 11% of companies send targeted mail and only 7% leverage click stream data, but doing so can raise conversion rates from 1.1% to 3.9% (and can raise click through rates from 9.5% to 14%).
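To put those percentages in the same terms as the "ten-fold" figure from Barneys, here is the arithmetic as a short Python sketch. The numbers are the ones quoted from the talk; the function name is just mine.

```python
def relative_lift(before, after):
    """How many times larger the 'after' rate is than the 'before' rate."""
    return after / before


# Rates as quoted in the talk, in percent.
conversion_lift = relative_lift(1.1, 3.9)   # conversion rate: 1.1% to 3.9%
click_lift = relative_lift(9.5, 14.0)       # click-through rate: 9.5% to 14%

print(f"Conversion rate: about {conversion_lift:.1f}x")
print(f"Click-through rate: about {click_lift:.1f}x")
```

So targeting roughly triples-and-a-half the conversion rate while lifting click-throughs by about half, which suggests the gain comes less from more people clicking and more from the right people clicking.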

He talked about sending mail not on Tuesday afternoon at 1pm but based on when the customer was interacting with the site. For instance, if you have a travel site and someone puts a trip on hold, send them an email to remind them the hold is about to expire. Or if they were checking out a vacation package, let them know when the price drops. Or better yet, if you know they’re in Seattle and they were browsing trips to Mexico, email them when you see that the Seattle weather forecast calls for rain. (Although now that I think about it, you might need to tweak that last one, or you may end up with the Barneys mail sent every day dilemma.)

If It’s Tuesday, I Must Be In San Francisco

As always, I’ve been doing a lot of traveling, and next week is no different. I’m heading down to San Francisco to do four talks about search engine optimization and web development. If you’ll be around, stop by and say hi!

Domain Roundtable
I’ll be speaking on the SEO experts panel on Saturday about the key things to look at when thinking of developing a portfolio of domains into content sites. Building web sites with content aimed at users can be quite a bit different than managing domains for their potential inherent name value, and my advice will be focused on building long-term value. Even from a purely domain perspective, a site that’s built for long-term value should be easier and more lucrative to sell. (Of course, there are a myriad of other benefits from approaching site building this way as well.)

Web 2.0 Expo
I’ll be speaking at two sessions on Tuesday.

In the morning, I’ll be doing a session with Nathan Buggia in the development track about search-friendly design for web developers. We’ll be talking all about how to build solid infrastructure that takes into account both usability and search engine crawlability. The cool thing is that you can code the site in such a way that you accomplish both goals at once.

In the afternoon, I’ll be speaking with Dave McClure and Hiten Shah on startup metrics. At this session, I’ll be talking about the marketing side of search (rather than the development side that I’ll be talking about in the earlier session), particularly about the search metrics that matter most and how you can make them actionable.

Ignite San Francisco
On Tuesday night, I’ll try the whirlwind that is Ignite. 20 slides in 5 minutes! If you don’t have time for the three hour session Tuesday morning, you can check out the 5 minute version: 5 things developers should know about search. First thing! That you need more than 5 minutes.
