Search for Rich Internet Applications

Thoughts on search for RIAs, along with pointers to my latest slides as well as MIX session on ensuring indexability (esp. for Silverlight and Ajax apps)...

If you're developing a web site, especially a public-facing site, search and indexability are probably high on your list of requirements and priorities. Yesterday, searchability of rich internet applications got extra attention with Adobe's announcement that it is providing technology to search engines to improve indexability of swf/flash-based applications.

This is particularly interesting to me as I think getting the right SEO behavior for RIAs is based on looking at end-to-end solutions that involve complementary server-side techniques (guess that is not totally unexpected from someone like me who has been working on asp.net). I've also been presenting on Search and RIA and enabling indexability for Silverlight and Ajax apps for a couple of years at various conferences... links to those below.

There has always been a question mark around indexability of RIAs, whether they're built in Flash, or Silverlight, or even Ajax. The fundamental problem is that static indexing of a RIA is likely to turn up only the user interface of the application, and not the interesting and meaningful data fetched by application logic and presented dynamically to the user. Indexing an application binary or script is akin to having desktop search index winword.exe instead of your documents... not very useful. Most folks are now seeing indexing something like a raw swf binary as less and less useful, as applications become more and more dynamic.

The two key things around improving SEO (besides various general techniques like URL canonicalization, friendly URLs and search-engine friendly URL rewriting) are ensuring indexability and facilitating relevance. Indexability is achieved by addressing what content is visible to the crawler, as well as where the crawler should look. Relevance is primarily addressed by creating deep-linkable content and interesting content (so folks actually link to it).
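To make the URL canonicalization technique mentioned above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `canonicalize` function, the `TRACKING_PARAMS` list and the specific normalization rules (lowercase host, drop default ports, strip trailing slashes, sort query parameters) are my own choices for the example, not a standard that any particular search engine prescribes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters that do not change the page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

# Default ports that can be dropped from the canonical form.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Return one normalized form for URLs that address the same content."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"

    # Treat "/products/" and "/products" as the same resource (an assumption).
    path = parts.path.rstrip("/") or "/"

    # Drop tracking parameters and sort the rest for a stable ordering.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(pairs))

    # The fragment is dropped: it is not sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))
```

With a function like this, variants such as `HTTP://Example.COM:80/Products/?utm_source=x&b=2&a=1#top` and `http://example.com/Products?a=1&b=2` collapse to one address, so links to either variant count toward the same page.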

The Adobe/Google announcement takes indexing a step beyond static binary content by attempting to simulate human behavior, interacting with the application to extract textual content and links. I can see how automated clicks and the like might allow the crawler to cause an application to execute some partial logic, but a lot of application interaction is driven by actual meaningful text input (e.g. keywords in a search input box), where "meaningful" often depends on the specific application in question. The announcement does not go into any details... somewhat strange, I think, so there is naturally some guessing going on. The comment stream also contains a good mix of folks questioning whether the approach will even work (for example here and here).

It is interesting to see the buzz - it is good to see search engines at least begin to think beyond indexing static HTML. Technically speaking, this sort of approach to indexability lends itself pretty easily to Silverlight apps as well. First, a Silverlight application packaged in a xap file is easily cracked open without a special SDK - it is simply a zip file after all. Any static textual xaml content is easily parsed by virtue of being XML. Second, it is easy to embed and extract metadata via an additional file within the zip archive. Third, the Silverlight DOM itself can be easily walked and inspected programmatically to detect all text, links and images that are being visualized by the control. Finally, it is possible to automate the application thanks to the extensible API that Silverlight offers for enabling accessibility and screen reader capabilities. Additionally, Silverlight apps can support deep linking, which is also important for facilitating relevance. Essentially, Silverlight provides simple APIs to allow the app to easily consume the URL it was loaded from, and use information on the URL query string to load and display appropriate data.
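As a rough illustration of the "it is simply a zip file" point, the sketch below uses Python's standard zipfile module to list the contents of a package and read loose XML files from it. The package is a tiny stand-in built in memory: the simplified manifest, the `SampleApp` names and the `search-metadata.xml` file are all hypothetical and only meant to show the shape of the idea. Note that this reads loose files only; as discussed in the comments below, the current tools embed the UI xaml inside the assembly, which plain zip parsing cannot reach.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Hypothetical name for an extra metadata file added to the package.
METADATA_FILE = "search-metadata.xml"


def build_sample_xap() -> bytes:
    """Build a tiny stand-in .xap in memory (simplified, illustrative contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "AppManifest.xaml",
            '<Deployment EntryPointAssembly="SampleApp" '
            'EntryPointType="SampleApp.App" />',
        )
        archive.writestr("SampleApp.dll", b"not a real assembly")
        archive.writestr(
            METADATA_FILE,
            "<metadata><title>Sample catalog</title></metadata>",
        )
    return buffer.getvalue()


def inspect_xap(data: bytes) -> dict:
    """List the package entries and pull out loose XML content."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
        manifest = ET.fromstring(archive.read("AppManifest.xaml"))
        title = None
        if METADATA_FILE in names:
            metadata = ET.fromstring(archive.read(METADATA_FILE))
            title = metadata.findtext("title")
    return {
        "entries": names,
        "entry_point": manifest.get("EntryPointAssembly"),
        "title": title,
    }


if __name__ == "__main__":
    print(inspect_xap(build_sample_xap()))
```

Nothing here needs a Silverlight SDK, which is the point being made: a crawler could do the same with any zip and XML library.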

All that said, it will be interesting to see how well this approach pans out, as there are a number of challenges in simulating a user realistically, especially without any hints provided by the application developer. In the meantime, this is as good a time as any to share my slide deck on building indexable Silverlight applications that I used in my presentation at SMX, a search conference that took place last month here in Seattle.

The deck above illustrates the pattern for supplying alternate content with sample markup. At a high-level, the approach can be summarized as combining client-side logic with server-side rendering and sitemaps to address the "what" and "where" of indexability. The specific implementation of the pattern is interesting in how it achieves alternate content without requiring the developer to implement two applications and do double work.
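The deck has the actual sample markup; what follows is only a small sketch of the general shape of the pattern in Python, not the implementation from the slides. It assumes a hypothetical product catalog (`PRODUCTS`), a hypothetical `BASE_URL`, and an `id` query string parameter used for deep linking. The same data lookup backs both the alternate HTML content (the "what") and the sitemap entries (the "where"), which is how the double work is avoided.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlsplit
from xml.sax.saxutils import escape as xml_escape

# Hypothetical data source shared by the rich client and the HTML rendering.
PRODUCTS = {
    "1": {"name": "Red Kite", "blurb": "A light kite for beginners & kids."},
    "2": {"name": "Blue Kite", "blurb": "A stunt kite for windy days."},
}

# Hypothetical address of the page hosting the application.
BASE_URL = "http://example.com/catalog"


def product_from_url(url: str):
    """Deep linking: use the query string to decide which data to show."""
    query = parse_qs(urlsplit(url).query)
    product_id = query.get("id", [None])[0]
    return PRODUCTS.get(product_id)


def render_alternate_html(url: str) -> str:
    """Server-side rendering of the content a crawler should see."""
    product = product_from_url(url)
    if product is None:
        return "<p>Product not found.</p>"
    return f"<h1>{escape(product['name'])}</h1><p>{escape(product['blurb'])}</p>"


def render_sitemap() -> str:
    """Tell the crawler where to look: one deep link per product."""
    urls = [f"{BASE_URL}?{urlencode({'id': pid})}" for pid in sorted(PRODUCTS)]
    entries = "".join(f"<url><loc>{xml_escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        f"{entries}</urlset>"
    )


if __name__ == "__main__":
    print(render_alternate_html(f"{BASE_URL}?id=1"))
    print(render_sitemap())
```

In a real page the rendered HTML would sit inside the element that hosts the plugin or script-driven UI, so script-less clients and crawlers see it while the rich client replaces it.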

I especially like the pattern because it works today, across search engines, applies to Silverlight/Flash apps as well as Ajax apps, and has a number of side-benefits around networking optimization and graceful degradation in script-less environments as listed in the deck above. It's always nice to pick a single pattern that can help solve multiple problems that Web developers encounter regularly. I first blogged about this approach back in 2007 right after MIX07. I had a chance to present it once more at MIX08, and you can actually check out the presentation and demo in the session video (skip ahead about 42 minutes into the talk for the part on indexability).

Any questions on the approach? Feel free to ask below. Also, I am curious what are your thoughts on the alternate content approach, or on the overall subject of search for RIAs?


Posted on Wednesday, 7/2/2008 @ 6:34 AM | #Silverlight


Comments

16 comments have been posted.

ThoseBug

Posted on 7/2/2008 @ 8:49 AM
Please my friend, give me a break, why you are not using Silverlight to do the simple animation in this page ? why you guys still using Flash (flex) ? If we will talk about RIA let's focus on Silverlight... what kind of influence do you think this simple animation could give to people who are searching for a good technology for RIA?


It's only a thought.

ThoseBug

Nikhil Kothari

Posted on 7/2/2008 @ 11:00 AM
Hmmm... I just happened to use slideshare - hence the use of Flash (i.e. I just used an existing solution).

However, I realize that might send a mixed message, or at the least shift focus from the main topic at hand which is about RIAs and Search, which is what I want to focus on. To avoid the tangent, I have changed the post to use static images presented as a slide show. It works better than the Flash-based slideshow anyway...

BPerreault

Posted on 7/2/2008 @ 4:12 PM
Alternate Content - this is how I was thinking of solving the indexability problem. Thank-you for your hard work, and for giving the slides. It's not easy to propose this solution without some big guns. (you!) Because of the extra work involved and development time. Even if the presentation of the alternate content is done using Xslt transformation against the Xaml, there is still some duplication in my mind. But is there?

I hope it's ok to use your ideas (with credit?) at our Minneapolis Silverlight User Group Meeting.

Nikhil Kothari

Posted on 7/2/2008 @ 5:25 PM
@BPerreault - there is some duplication in terms of having to produce the alternate content. The smart thing to do is think about your alternate content as an HTML view/rendering on top of the same data you might surface as a service to your client application, and that way reuse the back-end code and data models you have, and avoid duplication at that layer.

Feel free to share the idea and reuse content... this is in fact one of the intents behind the blog post and my presentations.

Steve

Posted on 7/2/2008 @ 5:46 PM
For me, and I know the limitations - RIA's suffer from something Web developers do easily every day: make calls to a DAL with no need for web services in the middle.

ie. I can use NHibernate fairly easy with the web.

I think we are still missing the boat here. I want to write my Domain model, use it with SL and not have these network limitations. ie. I want Silverlight to be my 'View' in my MVC. I don't want to rewrite my Domain model to use WCF or web services, I don't want to hassle with the serialization limitations.

For example, I want to create an internet application that calls a DAL layer for my CRUD without services or sockets.

Perhaps some of this frustration comes with Silverlight and .NET. ie. I used Json.NET and serialized and deserialized my json with a Controller/Action in MVC. But I can't use Json.NET in Silverlight. And the serializer in SL throws errors - but not in Json.NET.

I still think SL has a ways to go in better networking support. Another example: I built a SL application that used Linq to SQL - but I had to code up all this 'middleware' code to do it. After getting it all to work, I uploaded the code to my ISP host - none of the networking part works. I've tried 100 different things from forums, still doesn't work.

Basically lost time - still feels very raw and unfriendly.

Lastly, the whole Blend UI with Visual Studio is just horrible. Toggling back and forth between the two, each feeling half-baked, is a horrible user experience. For a company that invests so much in user experiences, this is not a good one. And please, oh please, don't sell me the 'designers design and coders code' - I shouldn't need a special designer tool to make a rich internet form, should I?

Sorry for the rant, but there aren't many places in MS world where someone really listens - the forums just say 'go add crossdomain.xml and it will work', but it doesn't, and none of the questions are answered.

Keep at it, but please offer a solution to networking. To me, the networking will be done when I can write the same code in Silverlight that I do in any application - not required to use WCF or webservices or webclients, etc...

Nikhil Kothari

Posted on 7/2/2008 @ 7:18 PM
@Steve - I totally hear you!

I am actually working on what I hope will be the solution to at least some of the things you bring up. Defining a domain model and using it naturally in SL, without thinking about or manually writing the plumbing (i.e. services), is one aspect of this work that is happening to simplify development of RIAs. I can't wait to start sharing more details, but I have to wait until this work is a bit more public.

On the tools front, I know folks on the Blend and VS team are working very much together to smoothen out the workflow. I just don't know the exact time line for some of the things happening so far based on conversations I've had with folks on those teams. As a developer, who periodically needs to use Blend functionality, I too wish the integration was better.

Steve

Posted on 7/2/2008 @ 7:46 PM
You give me much hope - I appreciate your response.

I do want RIA with Silverlight, I like that it uses xaml, etc... I look forward to hearing from you when it's ready to be shared! :)

ThoseBug

Posted on 7/2/2008 @ 8:02 PM
Thank you so much Nikhil, I don't want to see Flash, I used to work with Flex (Flash) and I can say S I L V E R L I G H T !!!! R O C K S !!!!
I can't wait for the SL Futures.

Alan Cobb

Posted on 7/3/2008 @ 1:14 PM
Hi Nikhil,

Thanks for your work on Silverlight SEO!

Now for the complaining: Maybe I'm missing something, but are we really expected to view the slides in that little fixed-size Flash window (less than 1/6 the size of my monitor)? Is there some way to just download your PPTX file for the slides? It's humorous and odd that it's Flash and not SL, but it's worse that it's Flash with a poor user experience (for me anyway).

I'm amazed how RIAs (SL as well as Flash) are supposed to be about creating "richer" user interfaces, and yet paradoxically if they aren't done "right" (which is frequently the case) they give a worse user experience than HTML.

Thanks,
Alan Cobb
www.alancobb.com/blog (Silverlight blog)

Nikhil Kothari

Posted on 7/3/2008 @ 7:01 PM
The slides are currently presented as just images + slideshow implemented in HTML and script - I changed that as noted in my comment above about using Flash. When I replaced the Flash, I removed the link to slideshare, which also effectively removed the link to the page containing a downloadable version... so here it is:

http://www.slideshare.net/nikhilk/search-friendly-web-apps/

Alan Cobb

Posted on 7/3/2008 @ 10:26 PM
Hi Nikhil,

Ok, thanks for posting the slideshare.net link. My first choice would have been a simple, old-fashioned link to a PPTX file. Slideshare.net wants me to create an account with them before I can download the slides! OTOH: The Flash UI at slideshare.net is pretty good and it does let me view the slides full screen.

Sorry for all the ranting. Now to actually read your slides... :).

Thanks,
Alan Cobb

John Mandia

Posted on 7/8/2008 @ 9:37 AM
Hi Nikhil,

I like the idea and have been playing around with it myself:

http://weblogs.asp.net/jmandia/archive/2008/01/04/silverlight-seo-search-engine-optimisation-optimization.aspx

http://www.silverlightseo.net/

I've unfortunately put further ideas on hold but will be focusing on it again shortly.

Wouldn't mind running some of it by you when it is done.

Cheers,

John

Nikhil Kothari

Posted on 7/10/2008 @ 4:07 PM
John, as always, I am all ears - would love to hear your ideas when you get back to them. I think SEO is super interesting, and some creative thinking is going to be required on the part of the platforms, developers and search engines.

Ian Griffiths

Posted on 7/15/2008 @ 5:09 AM
Nikhil, you say that a .xap is:

"simply a zip file after all. Any static textual xaml content is easily parsed by virtue of being XML"

But this seems slightly misleading when you look at any .xap (or at least at the way they are produced today by the Silverlight 2 beta 2 tools). Static textual Xaml content does *not* show up directly as a resource in a ZIP. The only Xaml resource packaged in a normal way into the .xap is the AppManifest. If you want to get at the Xaml for any of the UI itself, you need to dig the AppName.g.resources stream out of the DLL, which requires more than just ZIP parsing - you need to have the ability to extract an assembly manifest resource stream from a .NET binary.

And after you've done that you still need to extract the Xaml resources, which means you also need to be able to handle the binary format used by ResourceManager.

While the ability to dig into ZIPs is more or less universally available, the ability to extract manifest resource streams from .NET assemblies, and to dig out resources from ResourceManager style streams is not. This is non-trivial if you don't have the .NET framework to hand.

Nikhil Kothari

Posted on 7/15/2008 @ 1:01 PM
Hi Ian,
Yes, you're right, the way things are currently, it helps a lot to have the .NET framework around. It's been such a long time since some of the early application model design... but if I remember, there was a time when the xaml would not be embedded as resources.

I believe this is still possible technically, and it's only a tooling issue - specifically the tool could have chosen to package the xaml files as loose files within the zip, and change the generated InitializeComponent implementation to use a xap-relative URI for loading the xaml, as opposed to loading it out of resources. I guess I'll be pinging the VS tools folks on what prompted the current approach.

That said, I'll re-iterate that the most valuable content to index is the data presented to the user, and not the user interface of the app itself.

mirc

Posted on 9/26/2008 @ 6:24 AM
thanks.