This post is brought to you by cousine
So it's this simple: we at mash, Ltd. are currently working on two social platforms that are planned to scale and be large from day one, not to mention that we've previously worked on two other platforms, meetphool.net and naqeshny.com. All four were activity-centric and required a feed, a social graph and, of course, a notification engine.
Having to build this for each and every project gets really annoying, and it doesn't fit the learn-and-improve philosophy we brag about!
So, to save ourselves the headache, we decided to build a generic solution that is reliable, self-contained and flexible.
Now wait: building a generic social engine isn't what you'd call a walk in the park; add to that the reliable, self-contained and flexible constraints and you've got yourself a hell of a thinker (we originally estimated two weeks of work for the whole engine). Mid-project I found that I needed to document the experience just to organize my thought process before continuing to write code.
We started by defining a social engine as a component (in our case a Ruby gem) containing the building blocks of a social graph to represent relations between different entities, an activity feed (bound to entities), and a notification engine (also bound to entities).
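To make that definition concrete, here's a minimal in-memory sketch of those three building blocks; the class and method names are illustrative, not our gem's actual API:

```ruby
# Minimal sketch of the three building blocks: a social graph,
# an activity feed, and a notification engine, all keyed by entity.
class SocialEngine
  def initialize
    @edges         = Hash.new { |h, k| h[k] = [] } # entity => related entities
    @feeds         = Hash.new { |h, k| h[k] = [] } # entity => activities
    @notifications = Hash.new { |h, k| h[k] = [] } # entity => pending notices
  end

  # Social graph: record a relation between any two entities.
  def relate(from, to)
    @edges[from] << to
  end

  def relations(entity)
    @edges[entity]
  end

  # Activity feed: record an activity on an entity's own feed.
  def publish(entity, activity)
    @feeds[entity] << activity
  end

  def feed(entity)
    @feeds[entity]
  end

  # Notification engine: queue a notice for an entity.
  def notify(entity, notice)
    @notifications[entity] << notice
  end

  def notifications(entity)
    @notifications[entity]
  end
end
```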
As always, we started by consulting Google about the best practices available for solving the problem, especially for the activity feed and notifications, since they are the most-hit components in a social platform.
Neo4j seemed great and fit exactly what we wanted, but soon enough we were disappointed by its sheer reliance on the JVM, especially its Ruby driver, which required that we use JRuby; that seemed limiting to us. Our other option was using its REST interface, but we feared performance issues once we needed to scale; it just wasn't worth it.
FlockDB, on the other hand, seemed perfect: distributed design, simple, with an easy-to-use Ruby driver. But as soon as we started experimenting we hit some roadblocks. To mention a few: it lacked typed fields, which meant we couldn't store anything but relations between entities of the same type (no user-group relations, for example); it had no weighted edges; and it had a few, but very annoying, crashes and bugs.
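For contrast, here's a sketch of the kind of edge record we were missing: typed endpoints (so a user can relate to a group) plus a weight. The field names are illustrative, not FlockDB's or our actual schema:

```ruby
# An edge with typed endpoints and a weight -- the two things FlockDB
# wouldn't let us store: it assumes both ends are the same entity type,
# and its edges carry no weight.
Edge = Struct.new(:source_type, :source_id, :target_type, :target_id, :weight)

# A user following a group, with a weight we could later use for ranking.
follow = Edge.new("user", 42, "group", 7, 0.8)
```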
After some discussion with my colleague Khaled Gomaa, we decided to implement our very own graph DB. Crazy as it sounds, we started gathering resources about the topic. We selected Scala for the code (for the obvious reasons of speed and syntax), Thrift for the interface and MySQL for the actual storage, and we called it RebatDB.
It took us about a week to get the basic functionality laid out and working, and the result was awesome: just what we wanted. Of course, it's yet to be tested in the wild and to undergo some extensive stress testing, but nevertheless we were satisfied with the outcome and went on to write the Ruby driver.
Implementing the social graph logic from here on went smoothly; for the sake of the post's subject I won't cover that here, but you can check our progress and implementation here.
By now (the time of writing this post), we are implementing the feed. During our initial discussions regarding the social engine we settled on Redis for feed caching while leaving the actual storage for the developer to choose (at mash we use either SQL-based stores or MongoDB).
Redis is a much more advanced flavor of memcached: a key-value memory store with data structures. This meant speed and consistency; one problem, of course, is persistence, hence using another DB for storage.
Our research and previous experience classified feed implementations into two main categories: Fan Out and Fan In.
Fan Out simply means that whenever a user performs an action, the system goes and writes the activity to his/her followers' feeds, whereas Fan In means that whenever a user requests his/her feed, the system fetches the activities of the people he/she is following, organizes them and returns them.
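The difference can be sketched in a few lines of Ruby, with in-memory hashes standing in for the real stores (all names are illustrative):

```ruby
FOLLOWERS = { "alice" => ["bob", "carol"] }   # alice's followers
FOLLOWS   = { "bob"   => ["alice"] }          # who bob follows
FEEDS     = Hash.new { |h, k| h[k] = [] }     # per-user delivered feeds
ACTIVITY  = Hash.new { |h, k| h[k] = [] }     # per-user own activity

# Fan Out: one write per follower at publish time; reads are trivial.
def fan_out(user, activity)
  ACTIVITY[user] << activity
  FOLLOWERS.fetch(user, []).each { |follower| FEEDS[follower] << activity }
end

# Fan In: no work at publish time; gather from every follow at read time.
def fan_in(user)
  FOLLOWS.fetch(user, []).flat_map { |followee| ACTIVITY[followee] }
end

fan_out("alice", "alice posted a photo")
fan_in("bob")  # => ["alice posted a photo"]
```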
Both have their pros and cons, so at the beginning we went with an in-between solution to get the best out of the Fan In implementation: whenever a user performs an action, it's stored in his/her own feed only rather than fanned out to his/her followers. This saves work when the user has a few thousand followers, as well as the processing wasted when some of his/her followers are inactive.
This had one problem: whenever a user refreshed his/her feed, we would have to fetch the activity of everyone he/she follows, which is simply inefficient and would eventually cripple our servers. So instead, inspired by HTTP caching, we chose to implement something similar to a conditional request: when a user requests his/her feed, the system asks each of his/her follows whether they've done anything recently (within the cache window); if they have, it writes the new activity to the cache and saves it in the user's feed, otherwise nothing changes.
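That conditional check can be sketched with a per-user "last active" timestamp playing the role of the HTTP cache validator (a sketch only; all names are illustrative):

```ruby
LAST_ACTIVE = {}                              # user => Time of last action
ACTIVITIES  = Hash.new { |h, k| h[k] = [] }   # user => [[time, activity], ...]
CACHED_FEED = Hash.new { |h, k| h[k] = { fetched_at: Time.at(0), items: [] } }

def act(user, activity, at: Time.now)
  LAST_ACTIVE[user] = at
  ACTIVITIES[user] << [at, activity]
end

# On a feed request, ask each follow "anything new since I last fetched?"
# and only pull activity from those whose last action is newer than the cache.
def refresh_feed(user, follows)
  cache = CACHED_FEED[user]
  follows.each do |followee|
    next unless LAST_ACTIVE[followee] && LAST_ACTIVE[followee] > cache[:fetched_at]
    ACTIVITIES[followee].each do |time, activity|
      cache[:items] << activity if time > cache[:fetched_at]
    end
  end
  cache[:fetched_at] = Time.now
  cache[:items]
end
```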
As nice as this solution sounds, it has one big problem: a user with a few thousand follows will still cripple the system by asking each of them whether they have something new :S!! This meant we had to go back to the drawing board, and Fan Out turned out to be the best solution, since it means we traverse the social graph once, when a user performs an action, instead of every time a user requests their feed.
So what about our design? It remains unchanged: we'd still use Redis for caching recent activity feeds (each user has their own feed) and Mongo for persistently storing the feed (just in case a user decides to travel back in time). Mainly this is done to minimize the hit on the main database serving the platform (after all, it's not all about the activity feed).
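The resulting write and read paths could look something like the sketch below. In production we'd use Redis LPUSH/LTRIM for the capped recent feed and a Mongo collection for the archive; to keep the sketch self-contained, plain Ruby structures stand in for both stores, and all names are illustrative:

```ruby
RECENT_LIMIT = 3   # in production this would be in the hundreds

REDIS_FEEDS = Hash.new { |h, k| h[k] = [] }   # stands in for Redis lists
MONGO_FEEDS = Hash.new { |h, k| h[k] = [] }   # stands in for a Mongo collection

# Fan Out write path: push to every follower's capped recent feed
# ("Redis") and append to their persistent archive ("Mongo").
def deliver(activity, followers)
  followers.each do |follower|
    recent = REDIS_FEEDS[follower]
    recent.unshift(activity)                      # like LPUSH feed:<user>
    recent.pop while recent.size > RECENT_LIMIT   # like LTRIM feed:<user> 0 N-1
    MONGO_FEEDS[follower] << activity             # archive insert
  end
end

# Read path: serve from the cache; fall back to the archive only
# when the user scrolls back past what the cache holds.
def feed_for(user, count)
  recent = REDIS_FEEDS[user]
  count <= recent.size ? recent.first(count) : MONGO_FEEDS[user].reverse.first(count)
end
```

The cap keeps Redis memory bounded no matter how active a user's follows are, while the archive absorbs the rare deep-history reads so they never touch the main platform database.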
Finally, you can read up about feeds in the following two articles, “Redis Powered Activity with Aggregation” and “Instagram Architecture: 14 Million Users, Terabytes Of Photos, 100s Of Instances, Dozens Of Technologies”.
P.S. Since it's already 2 AM, I'm too lazy to read my post again for grammar mistakes, so I'd like to apologize in advance for my poor writing skills at this hour of the night :P and would humbly ask for your assistance in correcting any mistakes :P