Wednesday, August 11, 2010

Tap in Tuesday

So I attended the first Montreal "Tap in Tuesday" (http://tapintuesday.wordpress.com/) meeting. The idea is to provide an opportunity for budding entrepreneurs to meet with current entrepreneurs and allow the mixing of secret sauce, so that perhaps we can all be successful.

The first meeting was mentored by Sebastien Provencher (http://blogs.praized.com/seb/). He provided an insightful account of his involvement in Praized Media (http://www.praizedmedia.com), from its inception through three product cycles to the current Needium (http://needium.com/), a local social marketing tool.

During the talk Sebastien discussed the notion of taking and owning a space in the technology field. In the case of Praized Media they have adopted the "local social" space, becoming world leaders, and then exploited this to provide marketable products based around the ideas they have put forward.

Carving out your own perceived space and building a large personal social network helps to promote your product once it comes to market, which in many cases is the difference between success and failure.

Wednesday, August 4, 2010

A different model for building applications

I have been working on a prototype application for a large client:

The Problem
  1. A large number of users would like to access their specific data to make queries.
  2. Each user's data is held in a separate SQL database.
  3. The current app only has minimal indexes, and adding new indexes is limited by the engineers managing the SQL cluster.
  4. The new app would like to make much more complex queries than those currently supported by indexing.
  5. Future apps would like to update the schema and the set of supported queries.
  6. Most data sets are on the order of megabytes.
  7. Users are only active part time.

A Crazy Solution

Stream the data sets into an in-memory, schema-less database on demand, support a rich, brute-force, SQL-like query language, and add the notion that a query result could be incomplete if the streaming hasn't finished.

Cross one's fingers, and hope that as the data sets grow, so do the RAM and CPU speed of the machines processing the queries.
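
A minimal sketch of the idea, written in Python rather than the C# of the actual prototype. The names DataSet, QueryResult, append and finish are made up for illustration and are not the prototype's API. Records are plain dictionaries, so nothing forces them to share a schema, and every result carries a flag saying whether the stream had finished when the query ran.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]  # schema-less: each record carries its own fields


@dataclass
class QueryResult:
    rows: List[Record]
    complete: bool  # False if the data set was still streaming in


class DataSet:
    """Hypothetical in-memory copy of one user's data."""

    def __init__(self) -> None:
        self.records: List[Record] = []
        self.complete = False

    def append(self, record: Record) -> None:
        # Called for each record as it arrives from the source database.
        self.records.append(record)

    def finish(self) -> None:
        # Called once the stream is exhausted.
        self.complete = True

    def query(self, predicate: Callable[[Record], bool]) -> QueryResult:
        # Brute force: scan every record loaded so far, no indexes.
        rows = [r for r in self.records if predicate(r)]
        return QueryResult(rows, self.complete)


def is_invoice(record: Record) -> bool:
    return record.get("kind") == "invoice"


ds = DataSet()
ds.append({"id": 1, "kind": "invoice", "total": 40})
ds.append({"id": 2, "kind": "note", "text": "call back"})
partial = ds.query(is_invoice)  # runs while data is still streaming

ds.append({"id": 3, "kind": "invoice", "total": 75})
ds.finish()
full = ds.query(is_invoice)  # runs after the stream has finished
```

In this sketch the first query sees one invoice and is marked incomplete, while the second sees both and is marked complete. A caller could use the flag to show partial results right away and re-run the query later.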

A Working Prototype

So it's working, and I have been building the start of a test application on top of the stack. I call it a stack because I'm currently running three different servers (I will add a couple more) to host my simple application.
  1. Emulation layer: emulates the simplest feature set of the current SQL cluster. Access to the real cluster is restricted for security reasons, and my connection bandwidth to the cluster is also limited.
  2. Query engine: reads the data from the cluster, processes SQL-like queries over it, and returns a result set. Only SELECT is supported, but output control, WHERE filters, GROUP, ORDER, LIMIT and OFFSET are all supported. There is currently no caching of query sets or compiled queries.
  3. Application server: provides an abstract interface between the application and the query engine, which will allow me to write a normal browser application and a custom iPhone application.
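
To give a feel for what the query engine's brute-force SELECT involves, here is a rough Python sketch (the real engine is C# and parses an SQL-like language; this one takes plain function arguments instead). The function name select, its parameters, and the choice of a simple row count as the only GROUP aggregate are all assumptions made for the example. It also assumes the ORDER field is present and comparable in every row.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional

Record = Dict[str, Any]


def select(
    records: Iterable[Record],
    columns: Optional[List[str]] = None,
    where: Optional[Callable[[Record], bool]] = None,
    group_by: Optional[str] = None,
    order_by: Optional[str] = None,
    descending: bool = False,
    offset: int = 0,
    limit: Optional[int] = None,
) -> List[Record]:
    # WHERE: full scan over every record, no indexes involved.
    rows = [r for r in records if where is None or where(r)]

    # GROUP: one output row per distinct key, with a row count.
    if group_by is not None:
        counts: Dict[Any, int] = defaultdict(int)
        for r in rows:
            counts[r.get(group_by)] += 1
        rows = [{group_by: key, "count": n} for key, n in counts.items()]

    # ORDER: sort on a single field.
    if order_by is not None:
        rows.sort(key=lambda r: r[order_by], reverse=descending)

    # OFFSET and LIMIT: slice the sorted rows.
    end = None if limit is None else offset + limit
    rows = rows[offset:end]

    # Output control: keep only the requested columns.
    if columns is not None:
        rows = [{c: r.get(c) for c in columns} for r in rows]
    return rows


data = [
    {"id": 1, "city": "Montreal", "total": 40},
    {"id": 2, "city": "Quebec", "total": 15},
    {"id": 3, "city": "Montreal", "total": 75},
    {"id": 4, "city": "Laval", "total": 60},
]

# Roughly: SELECT id, total WHERE total >= 40 ORDER BY total DESC LIMIT 2 OFFSET 1
top = select(data, columns=["id", "total"], where=lambda r: r["total"] >= 40,
             order_by="total", descending=True, offset=1, limit=2)

# Roughly: SELECT city, count GROUP BY city ORDER BY count DESC
by_city = select(data, group_by="city", order_by="count", descending=True)
```

Every clause is a pass over an in-memory list, which is why the approach only holds up while data sets stay small enough to scan quickly.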

Common Questions:

Why not index the existing data? 
Tests show the current application on the SQL cluster is already stretching its performance.

Why not cache the data in another SQL cluster, with more indexes?
It was found that most database systems don't have the write performance to efficiently stream the data on demand from the main SQL cluster. Database load times were on the order of minutes.

The prototype solution can move most data sets in seconds, while using existing databases was taking minutes. For an on-demand system it was felt important to have the user data ready as quickly as possible.

Why not use an off-the-shelf database running in memory?
The data currently lives in one table, but as the applications expand, the schema for each object in the table is going to change, so going with a normal table approach would lead to an ever-expanding main table, or make writing queries much harder.

The Query Engine is a test for building a very large scale data assimilation platform with query capabilities. At its core there is an expression compiler that generates MSIL (for the CLR).
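
The point of compiling an expression is to pay the cost of walking the expression tree once, then run the result against every record. The Python sketch below illustrates that shape only: it compiles to nested closures rather than MSIL, and the tuple-based tree format, the compile_expr name and the operator table are all invented for the example.

```python
import operator
from typing import Any, Callable, Dict

Record = Dict[str, Any]
Compiled = Callable[[Record], Any]

# Hypothetical operator table for the toy expression language.
# Note that "and"/"or" here evaluate both sides (no short-circuit).
BINARY_OPS: Dict[str, Callable[[Any, Any], Any]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "and": lambda a, b: bool(a and b),
    "or": lambda a, b: bool(a or b),
}


def compile_expr(node: tuple) -> Compiled:
    """Turn an expression tree into a function of one record."""
    kind = node[0]
    if kind == "const":
        value = node[1]
        return lambda record: value
    if kind == "field":
        name = node[1]
        return lambda record: record.get(name)
    if kind in BINARY_OPS:
        op = BINARY_OPS[kind]
        left = compile_expr(node[1])   # compiled once, up front
        right = compile_expr(node[2])
        return lambda record: op(left(record), right(record))
    raise ValueError(f"unknown expression node: {kind!r}")


# Roughly: total > 50 and city = 'Montreal'
tree = ("and",
        (">", ("field", "total"), ("const", 50)),
        ("=", ("field", "city"), ("const", "Montreal")))
predicate = compile_expr(tree)

rows = [
    {"id": 1, "city": "Montreal", "total": 40},
    {"id": 3, "city": "Montreal", "total": 75},
    {"id": 4, "city": "Laval", "total": 60},
]
matches = [r["id"] for r in rows if predicate(r)]
```

Generating MSIL goes a step further than closures, since the runtime can JIT the emitted code, but the compile-once, evaluate-many structure is the same.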

Tuesday, August 3, 2010

C#, C#, No really C#

So I have started a new project, and yep, it's written in C#.

So why C# ... Well, I have worked with Ruby/Rails and I wasn't overly happy with the performance. I have toyed around with Python/Django; I don't know why, but I don't feel comfortable working with Python. It feels very lispy, but just misses the mark. Working with either Scheme or Lisp is very nice, but no one else can touch the code, and you end up having to write a lot of extra stuff, which is just a pain.

I have played with C# in the past: the performance isn't so bad, the language isn't so bad, and you can find almost anything you want in terms of libraries. You can run it on a Linux platform using Mono ... and well, the IDE (Visual Studio 2010) is really just amazing.

For the web framework, I have decided to use my own. I'm sure that is going to cost me more in the long run, but there are lots of tricks I don't see myself being able to do with the likes of MVC.