Facebook Went Down – Did You?

September 24th, 2010 No comments

Yesterday, Facebook went down for about 2.5 hours. Thousands of sites across the web, seemingly unconnected to Facebook, went down with it.

Facebook hosts thousands of “apps,” including games such as Farmville, Celtics 3 Point Play, and the Bruins Face Off. For 2.5 hours, all of those apps were unavailable, which meant a lot of lost revenue (through lost ad views and lost transactions) for their owners. Facebook also hosts “pages” for everything from the BBC World Service to Barack Obama and Radiohead – so for 2.5 hours, all of these pages, which provide information on everything from political rallies to news discussion and concert planning, were out of service. Even beyond Facebook’s walled garden, there are sites elsewhere on the Internet that use Facebook’s login mechanism to authenticate their users – for 2.5 hours, every site that did so was effectively broken. And beyond even that, many sites host “Like” buttons and other Facebook social widgets; for those 2.5 hours, the lucky sites were simply missing those widgets, while the not-so-lucky ones showed their users JavaScript errors, and some stopped working entirely.

Facebook is relied upon by many thousands of sites across the Internet, making it a single point of failure for a truly astounding portion of the web.

Is that really a good idea?

The Internet was created to be a reliable network that routes around failures. That philosophy was baked into the Internet Protocol, into how the backbone is designed, into how companies set up servers in redundant configurations, and into how the fundamental protocols work. For example, consider email. If the gmail.com server goes down, only its users are affected; if I’m emailing my friend @isobar.com from my @integralblue.com address, there is absolutely no impact to me.

However, with the rise of Facebook, Twitter, and Google, a few enormously important points of failure are appearing in the network, and when they fail, they wreak havoc. Perhaps it’s time to start thinking about how we’re gradually eliminating the reliability and redundancy that have served the Internet so well for so long, and to start moving back towards those founding Internet principles.

Cross posted to the Isobar blog – please comment there.

Categories: Uncategorized Tags:

Microblogging inside the Firewall

March 30th, 2010 No comments

Cross posted to Molecular Voices. Please comment there.

Little strings of text are big business – both publicly and inside the corporate firewall. As we all know, Twitter is pretty big – TV and radio ads for major companies mention their Twitter accounts, and even business cards reference Twitter URLs nowadays. But Twitter is a public service, so it can’t be used for internal, confidential information – which means there’s a lot of collaborative power waiting to be unleashed by microblogging inside the corporate firewall. Consider how much more productive everyday workers could be if they shared a few quick bits of knowledge.

For example, consider this timeline:

Alice: Client loved the sales pitch – we won! #sales
Brion: Vending machine has been re-stocked
Charles: #CSS reminds me of aspect oriented programming #aop
Darleen: Project is progressing according to schedule #project3
Evan: Fellow #project3 members: Is this front end policy useful for us? http://ur1.ca/shyu
Fred: @evan Possibly – let’s discuss this with @brion over lunch
Zach: @fred @evan we used those guidelines on #project5 and it worked out well
ITBot: Email server test failed. IT has been contacted.

These examples show that:

  • The barrier to entry is incredibly low (Alice posted immediately after a sales pitch, probably from a plane)
  • Useful business information is exchanged, as well as team-building (Brion provided non-business information about the vending machine that others will likely appreciate)
  • Because discussion is open to a broader audience than email, others participate in unexpected and beneficial ways (see how Zach, who isn’t even on project 3, helped the project 3 team)
  • Bots can publicize information gathered automatically. For example, IT could set up a bot to monitor servers and automatically publish status updates. Bots can also subscribe to RSS feeds, bridging wikis and blogs with the microblogging world.

There are many other benefits once metadata is considered.

  • People choose who to follow. If Alice isn’t interested in the state of IT systems, she doesn’t subscribe to the ITBot.
  • Users can mark a message as a favorite. Messages that are favorited many times show up in a “favorites” list, which is a great source of useful information.
  • By clicking on the #project3 tag, Brion can find all posts about his project, providing a powerful search option.
  • Messages may optionally have location data attached. Users can tell if the person they’re talking to is in the same office as they are, on vacation, working from home, at a client office, or at another branch of their company. This data allows users to make fast decisions about how to further communicate (phone, email, or walk).

At Molecular, we wanted to take advantage of what “firewalled” microblogging has to offer, so we evaluated a few private microblogging tools, looking for software that provides a familiar interface, allows customization of the look and feel, and has clients for different devices (like Twitter has). In the end, we chose StatusNet. (In the interest of full disclosure, I’m a contributing developer to the StatusNet project.)

The StatusNet software (which also runs the ~200k user identi.ca site) is Free and Open Source, so anyone can feel free to install, evaluate, and use it without worrying about contracts or licensing fees. However, StatusNet, Inc (the company that supports the StatusNet software) offers professional services if you choose to run the software on-site, or hosting if you prefer it to be hosted elsewhere. If you go the do-it-yourself route, installation is pretty straightforward, as the software runs on the popular LAMP stack and has a vibrant community willing to answer questions.

StatusNet can integrate with LDAP/Active Directory and even some Single Sign On solutions, so there’s no need to worry about managing accounts as employees come and go, and private information stays private.

The software also supports a variety of clients on a number of platforms, from Windows, Mac, and Linux to iPhones and Androids.

After developing a custom skin, selecting which plugins to enable, and testing with a small group, we officially launched “IsoBuzz” to the entire organization last week. We’re already seeing some interesting conversations. Over time, we hope to see IsoBuzz become a powerful tool for knowledge sharing and collaboration, especially among distant offices and between departments.

Categories: Uncategorized Tags:

Running Ubuntu in VMWare

October 28th, 2009 2 comments

VMWare is a leading (if not the leading) virtualization solution. Unfortunately, it is also proprietary software, which means that distributions tend not to care too much about it (and in my opinion, rightfully so!).

My employer uses VMWare, and it recently instituted a policy that all VMs must have VMWare Tools installed on them, which causes a number of problems for Linux sysadmins such as myself.

  1. VMWare Tools is not Free software
  2. VMWare Tools is a pain to acquire: it’s not packaged in any distribution (due to its non-Free nature), finding it on VMWare’s site is a serious pain, and the version that VMWare Server includes seems to be perpetually out of date.
  3. Installing VMWare Tools is not a fun experience. The installer requires you to figure out how to get the kernel sources, then compiles and installs some kernel modules, and throws a bunch of proprietary binaries all over your file system. Also, depending on what kernel you’re using, the modules may not compile at all, in which case you have to hunt down patches.
  4. Installing VMWare Tools on a bunch of servers is an even bigger annoyance, because there’s no real automated way to do it.

The solution to all of these problems is the open-vm-tools project. It’s packaged in Debian and Ubuntu, and by all means, should Just Work.

Here’s where things get really interesting. Open-vm-tools really does Just Work – if the packaging is done correctly. As it stands right now, the packaging just copies the kernel module sources, and you are expected to figure out how to compile and install them yourself, and to do so again each time you change kernels. Thanks to DKMS, this could all be done automatically.

In Ubuntu bug #277556, that’s exactly how it’s done. I’ve been using the PPA referenced in that bug on 5 servers for about 4 months now, and it works great. Installation? As simple as apt-get install open-vm-tools! Upgrade your kernel? Open-vm-tools recompiles automatically.

So for all you Debian/Ubuntu users who run VMs on VMWare, take a look at this bug – it should save you some serious time and effort.

Categories: Uncategorized Tags:

oEmbed

August 7th, 2009 1 comment

oEmbed is a relatively simple concept, which can be basically thought of as hyperlinking to the next level. According to oembed.com: “oEmbed is a format for allowing an embedded representation of a URL on third party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource, without having to parse the resource directly.”

Today, if I want to embed this Youtube video into a WordPress blog (such as this one), I need to complete these steps:

  1. Start typing my new blog post
  2. Switch browser windows, and go to the Youtube video’s page
  3. Copy the “embed” code, which is kind of crazy looking:
    <object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/Pube5Aynsls&hl=en&fs=1&"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/Pube5Aynsls&hl=en&fs=1&" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object>
  4. Switch back to the WordPress window, and paste the embed code (as HTML) into my WordPress post

Clearly, that’s not ideal. Figuring out where the embed code is, and how to copy and paste it as HTML into WordPress, is neither easy nor intuitive. Now consider a future where WordPress is an oEmbed consumer and Youtube is an oEmbed provider. To do the same thing, these are the steps:

  1. Start typing my new blog post
  2. Click the “embed” button in WordPress
  3. Enter the regular web browser link to the Youtube video in the box
  4. Click “OK.” WordPress will automagically figure out how to embed the video, and do it for you.

No copy and paste, no tabbing between pages, and best of all, no code. The user doesn’t need to know what oEmbed is, or how it works.
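
Under the hood, the exchange is simple: the consumer makes an HTTP request to the provider’s oEmbed endpoint, passing along the link the user entered, and gets back a small JSON (or XML) document with fields like “type”, “html”, “width”, and “height”; the consumer then drops the “html” value into the page it’s rendering. Here’s a minimal sketch in Java of what a consumer’s request might look like – the endpoint URL below is a made-up placeholder, not any particular provider’s real endpoint:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

// A minimal (hypothetical) oEmbed consumer: ask an endpoint to describe a URL.
// The endpoint below is a placeholder; a real consumer would discover or
// configure the provider's actual oEmbed endpoint.
public class OEmbedConsumerSketch {
    public static void main(String[] args) throws Exception {
        String videoUrl = "http://www.youtube.com/watch?v=Pube5Aynsls";
        String endpoint = "http://example.com/services/oembed"; // hypothetical
        URL request = new URL(endpoint + "?format=json&url="
                + URLEncoder.encode(videoUrl, "UTF-8"));

        // Print the JSON response; a real consumer would parse it and embed
        // the "html" field into the page being rendered.
        BufferedReader in = new BufferedReader(new InputStreamReader(request.openStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}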

oEmbed can be used in more creative ways, too. For example, if you link to a Youtube video on the microblogging site identi.ca, the link will get a little paper clip next to it, and when that paper clip is clicked, the video player opens in a lightbox. To see it in action, take a look at this notice.

At this early stage of oEmbed’s lifetime, there are not many providers or consumers. To jumpstart the process, Deepak Sarda created oohembed, a service that acts as a provider for many sites that don’t yet support oEmbed themselves (since Youtube isn’t an oEmbed provider, identi.ca uses oohembed, and that’s how the video embedding notice example works). oohembed supports a number of popular sites, such as Youtube, Vimeo, Hulu, Wikipedia, and WordPress.com.

Hopefully, we’ll see more and more sites and pieces of software support oEmbed as both providers and consumers to improve their user experience. WordPress 2.9 will likely be an oEmbed consumer (so the theoretical process I gave above may soon become a reality), and I’ve created a plugin that makes WordPress an oEmbed provider. Here’s to an easier (to embed, at least) future!

Categories: Uncategorized Tags:

Install JBoss 4.2 on Centos/RHEL 5

July 1st, 2009 3 comments

I was recently tasked with installing JBoss 4.2 on Centos/RHEL 5. I found the experience remarkably difficult, so I figured I should share it for my own future reference, and hopefully to also save the sanity of whatever other poor souls are tasked with the same project.

  1. Start off with RHEL 5 or Centos 5
  2. Install jpackage50.repo into /etc/yum.repos.d/ (instructions for how to make this file can be found at jpackage.org)
  3. run “yum update”
  4. run “yum install jbossas”
  5. If you see this message: “ –> Missing Dependency: /usr/bin/rebuild-security-providers” then download jpackage-utils-compat-el5-0.0.1-1.noarch and install it by using rpm -i jpackage-utils-compat-el5-0.0.1-1.noarch.rpm, then run “yum install jbossas” again. See this bug at Red Hat for details, and http://www.zarb.org/pipermail/jpackage-discuss/2008-July/012751.html for how the rpm was built.
  6. run “/sbin/chkconfig jbossas on” to start JBoss automatically at startup
  7. Until this bug is resolved, run this command: “ln -s /usr/share/java/eclipse-ecj.jar /usr/share/java/ecj.jar”
  8. If you want JBoss to listen to requests from systems other than localhost, edit /etc/jbossas/jbossas.conf. Create a new line that reads “JBOSS_IP=0.0.0.0”.
  9. put your .ear’s, .war’s, EJB .jar’s, and *-ds.xml’s into /var/lib/jbossas/server/default/deploy
  10. Start JBoss by running “/etc/init.d/jbossas start”

JVM args can be found in /etc/jbossas/run.conf.

Note that if your web application (war, ear, whatever) depends on JNDI, you need to edit /var/lib/jbossas/server/default/deploy/jboss-web.deployer/META-INF/jboss-service.xml and add a line for each JNDI data source like this: “<depends>jboss.jca:service=DataSourceBinding,name=jdbc/whatever</depends>”. This little detail cost me quite a few hours to figure out… an explanation as to why this is necessary can be found at http://confluence.atlassian.com/display/DOC/Known+Issues+for+JBoss. Basically, JBoss will start applications before JNDI data sources unless told otherwise, so your application will error out on startup with an exception like this: “Caused by: javax.naming.NamingException: Could not dereference object [Root exception is javax.naming.NameNotFoundException: jdbc not bound]”.

Some may argue that I should have simply downloaded the tar from jboss.org and manually installed JBoss without a package manager. However, the package manager offers a lot of advantages, such as dependency resolution/management, automatic updates for security and/or new features, clean and easy uninstall, and a lot more. When given the choice, I always choose to use a package manager, will even create packages if they are not available, and report packaging bugs so that others, and my future self, will have a better experience.

A lot of the pain in installing JBoss is due to bugs in the packaging. I hope that jpackage.org / Red Hat solves these problems soon – I wouldn’t really want anyone to have to live through the trouble I went through to figure all this out again.

Categories: Uncategorized Tags:

Compression (deflate) and HTML, CSS, JS Minification in ASP.NET

May 22nd, 2009 7 comments

As I’ve already demonstrated, I like performance. So I cache and compress a lot. When I was put onto an ASP.NET project at work, I obviously wanted to optimize the site, so here’s what I did.

Taking some hints from Y! Slow, I decided I wanted to:

  • Get rid of all the MS AJAX/toolkit javascript, as we used jQuery instead
  • Combine all the javascript into one request
  • Combine all the CSS into one request
  • Minify the CSS
  • Minify the javascript
  • Minify the HTML
  • Deflate everything (gzip is slightly larger, and all modern browsers support deflate, so I just ignored gzip)

I followed the directions outlined at this site to override the ScriptManager and prevent it from including the Microsoft AJAX javascript. Removing unused code is always a good thing.

Combining the javascript was easy: starting in ASP.NET 3.5 SP1, ASP.NET’s ScriptManager supports the CombineScript tag inside of it.

Combining the CSS was not so easy, as there’s no such thing in ASP.NET as a “ScriptManager” for CSS. I had two options: make a CSS manager (and use it everywhere), or figure out another way. Never taking the easy route when there’s a more interesting (and more transparent to front end developers) way, I decided to make a filter (an implementer of IHttpModule) to find all the “<link>” tags in the page header and replace them with one “<link>” to a combined CSS handler (which I called “CssResource.axd” to parallel ScriptManager’s “ScriptResource.axd”). Then, in my IHttpHandler implementation that handles CssResource.axd, I read the querystring, grab the requested CSS files from the file system, combine them into one string, and return the result. CSS combining done.

For minifying the CSS and Javascript, I used the C# version of YUI Compressor. I used the original (Java) YUI Compressor before, and had a great experience, so picking this version was a no-brainer. In my aforementioned filter, I intercept requests for “ScriptResource.axd” and “CssResource.axd,” apply YUI Compressor to the response content, cache the result (so I don’t need to minify every single request), then return.

I also minify inline (as in, mixed with HTML) CSS and Javascript. Also in my filter, if the return type is HTML, I scan for “<script src” and “<link rel=’stylesheet’ src=” and minify their contents. This minification does have to happen for every request to that page, unless that whole page is cached.

Finally, the last thing the filter does is check if the browser accepts deflate compression. If it does, the filter compresses the stream. In the case of “ScriptResource.axd” and “CssResource.axd” requests, the deflating is done before the response is cached, so requests for those resources don’t need to be re-deflated for every request (their content is static, unlike regular html requests, so caching the whole response is okay).

The initial (cache empty) page load was 780k before I started. When I had finished, the page load was only 234k – a 70% decrease.

You can download the code from this site. To use it, you need to modify your web.config.

<system.web>
  <httpModules>
    <!-- This must be the last entry in the httpModules list -->
    <add type="CompressionModule" name="CompressionModule" />
  </httpModules>
  <httpHandlers>
    <add verb="GET,HEAD" path="CssResource.axd" validate="false" type="CssResourceHandler"/>
  </httpHandlers>
</system.web>

I cannot claim 100% credit for all of this work. I got many ideas from browsing web search results, trying things out, and combining ideas from various sources. If I have not credited you, and I should have – I apologize, and will be happy to do so. But I can say that I did not just “copy and paste” this from anywhere – I’m confident that this work cannot be classified as a derived work of anything else. With that in mind, I release it into the public domain.

Categories: Uncategorized Tags:

Hibernate Deep Deproxy

March 16th, 2009 2 comments

A common problem with ORMs that use lazy loading is that the objects they return contain (obviously) lazy-loading references, so you need an ORM session to access the referenced objects. For example, if you have a “Person” class that contains a “mother” property, the ORM will fetch the mother from the database when “person.getMother()” is called – not when the person is first loaded.

Lazy loading is great, because it means you don’t load a huge amount of data when you really just want one object (say you just want the person’s name – with lazy loading, the person’s mother is never retrieved). However, when you want to do caching, lazy loading can be a serious problem.

For example, let’s say I have a method I call a lot – “personDao.findAll()”. I’d like to cache this entire method, so I don’t need to hit the database or the ORM at all, so I use something like an aspect to do declarative caching on that method. On the second and subsequent calls, the returned list of persons won’t have sessions attached (their sessions belonged to the first call, which is long gone), so they can’t load their lazy references, and you end up with the famous LazyInitializationException. If you know the list of people isn’t too big, and that it doesn’t refer to too many other objects, you can remove the lazy proxies and load everything at once – then cache that result. But be careful – with deep deproxying, every object that is referred to will be loaded, so if you’re not careful, you can load the entire database, which results in either a loss of performance (due to using all the memory) or an outright error.

Here’s how I do deep deproxying with Hibernate. I’ve read about many techniques for doing this, but this approach works far better than anything else I’ve been able to find so far.

import java.beans.PropertyDescriptor;
import java.lang.reflect.Array;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.commons.beanutils.PropertyUtils;
import org.hibernate.Hibernate;
import org.hibernate.proxy.HibernateProxy;
import org.hibernate.proxy.LazyInitializer;

// These methods are assumed to live in a class parameterized by T
// (for example, a generic DAO), which is where T comes from.

public T deepDeproxy(final Object maybeProxy) throws ClassCastException {
    if (maybeProxy == null) return null;
    return deepDeproxy(maybeProxy, new HashSet<Object>());
}

private T deepDeproxy(final Object maybeProxy, final HashSet<Object> visited) throws ClassCastException {
    if (maybeProxy == null) return null;
    Class clazz;
    // Force Hibernate to load the proxy before we inspect it
    Hibernate.initialize(maybeProxy);
    if (maybeProxy instanceof HibernateProxy) {
        HibernateProxy proxy = (HibernateProxy) maybeProxy;
        LazyInitializer li = proxy.getHibernateLazyInitializer();
        clazz = li.getImplementation().getClass();
    } else {
        clazz = maybeProxy.getClass();
    }
    T ret = (T) deepDeproxy(maybeProxy, clazz);
    // Guard against cycles (e.g. parent <-> child references)
    if (visited.contains(ret)) return ret;
    visited.add(ret);
    for (PropertyDescriptor property : PropertyUtils.getPropertyDescriptors(ret)) {
        try {
            String name = property.getName();
            if (!"owner".equals(name) && property.getWriteMethod() != null) {
                Object value = PropertyUtils.getProperty(ret, name);
                boolean needToSetProperty = false;
                if (value instanceof HibernateProxy) {
                    value = deepDeproxy(value, visited);
                    needToSetProperty = true;
                }
                if (value instanceof Object[]) {
                    Object[] valueArray = (Object[]) value;
                    // Use the component type (not the array type) when creating the new array
                    Object[] result = (Object[]) Array.newInstance(value.getClass().getComponentType(), valueArray.length);
                    for (int i = 0; i < valueArray.length; i++) {
                        result[i] = deepDeproxy(valueArray[i], visited);
                    }
                    value = result;
                    needToSetProperty = true;
                }
                if (value instanceof Set) {
                    Set valueSet = (Set) value;
                    Set result = new HashSet();
                    for (Object o : valueSet) {
                        result.add(deepDeproxy(o, visited));
                    }
                    value = result;
                    needToSetProperty = true;
                }
                if (value instanceof Map) {
                    Map valueMap = (Map) value;
                    Map result = new HashMap();
                    for (Object o : valueMap.keySet()) {
                        result.put(deepDeproxy(o, visited), deepDeproxy(valueMap.get(o), visited));
                    }
                    value = result;
                    needToSetProperty = true;
                }
                if (value instanceof List) {
                    List valueList = (List) value;
                    List result = new ArrayList(valueList.size());
                    for (Object o : valueList) {
                        result.add(deepDeproxy(o, visited));
                    }
                    value = result;
                    needToSetProperty = true;
                }
                if (needToSetProperty) PropertyUtils.setProperty(ret, name, value);
            }
        } catch (IllegalAccessException e) {
            e.printStackTrace();
        } catch (InvocationTargetException e) {
            e.printStackTrace();
        } catch (NoSuchMethodException e) {
            e.printStackTrace();
        }
    }
    return ret;
}

private <T> T deepDeproxy(Object maybeProxy, Class<T> baseClass) throws ClassCastException {
    if (maybeProxy == null) return null;
    if (maybeProxy instanceof HibernateProxy) {
        return baseClass.cast(((HibernateProxy) maybeProxy).getHibernateLazyInitializer().getImplementation());
    } else {
        return baseClass.cast(maybeProxy);
    }
}
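
As a usage sketch (the PersonDao and the surrounding cache are hypothetical stand-ins for whatever DAO and caching layer you actually use, and this assumes deepDeproxy is exposed by the DAO), the findAll() example from above might be prepared for caching like this:

// Hypothetical usage: fully load the result before handing it to a cache,
// so later callers never touch a closed Hibernate session.
public List<Person> findAllForCaching(PersonDao personDao) {
    List<Person> people = new ArrayList<Person>();
    for (Person person : personDao.findAll()) {
        people.add(personDao.deepDeproxy(person));
    }
    return people; // safe to cache: no live Hibernate proxies remain
}
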
Categories: Uncategorized Tags:

EhCache implementation of OpenJPA caching

March 12th, 2009 2 comments

I usually use Hibernate, which supports a number of caching implementations (such as EhCache, OSCache, JBoss Cache, etc). My most recent project had a dependency on a product which has a dependency on OpenJPA, and OpenJPA only has its own built-in implementations of a query cache and a data cache. I like to have one caching implementation in my project, so having two (OpenJPA’s for itself, and EhCache for everything else) annoyed me. So I had to fix it.

I started with Pinaki Poddar’s implementation of a Coherence-backed OpenJPA data cache. I changed it to use EhCache, adjusted the unit tests, and then added a query cache implementation. To use it, add a dependency on the openjpa-ehcache module, then set OpenJPA’s “openjpa.QueryCache” property to “ehcache” and “openjpa.DataCacheManager” to “ehcache”. That’s it!
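
Since those two property names are the whole configuration, wiring them up programmatically looks roughly like the sketch below (the persistence unit name is a placeholder; the same two properties can just as well go into persistence.xml):

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

// Sketch: point OpenJPA's query cache and data cache manager at the
// EhCache-backed implementations. "myPersistenceUnit" is a placeholder.
public class EhCacheOpenJpaSetup {
    public static EntityManagerFactory createFactory() {
        Map<String, Object> props = new HashMap<String, Object>();
        props.put("openjpa.QueryCache", "ehcache");
        props.put("openjpa.DataCacheManager", "ehcache");
        return Persistence.createEntityManagerFactory("myPersistenceUnit", props);
    }
}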

The code can be compiled with Maven. Simply run “mvn install”.

My code, like EhCache and OpenJPA, is licensed under the Apache License 2.0. Get it here.

Categories: Uncategorized Tags:

One HTTPS site per IP address… or maybe not?

February 26th, 2009 1 comment

I randomly ran across SNI (aka RFC 4366) tonight. It’s a technology, in development since before 2000, that allows the client to tell the server which domain it’s visiting before the server sends its certificate. The history is fascinating!

The situation today is that SNI is not here yet. OpenSSL will support it starting in 0.9.9, but only has it as a compile-time option (disabled by default) as of 0.9.8f. Apache may support it in its next minor release (2.2.12), or maybe not… at least it’s in their trunk, so it will be released someday. I just installed the SNI patch on my Apache 2.2.11 server, and I’m going to try it out. IIS has no stated plan to support it or not. The other popular servers, like Cherokee, lighttpd, and nginx, support it today.

But, as usual, browser support is the limiting factor – and Internet Explorer in particular. You need IE on *Vista* or later to use SNI, so given that IE6 still has a decent market share and is 8 years old… it’s going to be at least 2017 before we can reliably host multiple HTTPS sites on the same IP address – and who knows about embedded browsers (like those in cell phones and PDAs). Perhaps using one IPv6 address per HTTPS site will be more practical before SNI is widely available… who knows.

Categories: Uncategorized Tags:

Why would a cache include cookies?

February 25th, 2009 No comments

Ehcache’s SimplePageCachingFilter caches cookies. And that baffles me… why would a cache include cookies in it?

I ran into an interesting situation where servlets, interceptors, and all those other Java goodies were writing cookies for things like the current user’s identifier, so the site could track that user and keep track of his shopping cart. The problem, which is obvious in retrospect but was incredibly puzzling at first, was that the cookies containing the user id were being cached, so when a subsequent user hit that page, he got the original requester’s user id – and everything that implied (like the original requester’s cart).

Since each page is cached separately and at separate times, and there is more than one user on the site, visitors would see their carts changing, items seemingly appearing and disappearing at random, and other such fun. For example, if Alice happened to hit the home page when its cache was expired, her user id cookie ended up in the home page cache. Then Bob comes along and hits the accessories page when its cache has expired, so his user id cookie ends up in that page’s cache. Finally, Charles visits the home page and sees Alice’s cart; then he goes to the accessories page and sees Bob’s cart. It’s just an incredibly weird and confusing situation!

I’ve been wracking my brain on the topic of caching cookies – when would it ever be useful? Cookies, as far as I can imagine (and have experienced), contain only user-specific information – so why would you cache them?

To solve this problem, I extended SimplePageCachingFilter and overrode the setCookies method, having it be a no-op. And I filed a bug report with Ehcache.
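
The override itself is tiny – roughly the sketch below (the exact setCookies signature may differ between Ehcache versions):

import javax.servlet.http.HttpServletResponse;

import net.sf.ehcache.constructs.web.PageInfo;
import net.sf.ehcache.constructs.web.filter.SimplePageCachingFilter;

// A page caching filter that never replays cached cookies. The setCookies
// signature shown here is assumed and may vary between Ehcache versions.
public class NoCookiePageCachingFilter extends SimplePageCachingFilter {
    @Override
    protected void setCookies(final PageInfo pageInfo, final HttpServletResponse response) {
        // Deliberately a no-op: cached pages must never hand one user's
        // cookies to another user.
    }
}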

Apache’s mod_cache will include cookies in its cache too. But in their documentation, they specifically point out cookies in their example of how to exclude items from the cache. It seems Apache knows including cookies is a bad idea… perhaps they should exclude them by default?

Categories: Uncategorized Tags: