Roy van Kaathoven
Full Stack Developer

22 Oct

Isolating memory leaks in a concurrent system

While working on Advanced-Forms I have to deal with various memory issues that arise when running the server application.

The server component of Advanced-Forms is a continuously running background service which can run many different components in parallel at any given moment. Some of these components are third party, maintained by other companies or open source projects: FTP, Mail, Data Conversions, etc.

Sometimes a component leaks memory and the leak has to be tracked down. When it turns out to be in an open source project I can patch it and send the patch back upstream. The change gets merged and I don't have to maintain a separate repository with my own fixes.

In other cases the component comes from a company that does not accept patches, the leak is much harder to track down, or the code changes are too big to maintain in a separate repository of patches.

In the second case the component can be isolated in a separate AppDomain, and any leaked memory will be cleaned up when the AppDomain is unloaded. What is an AppDomain, you ask? It provides a layer of isolation, unloading, and security boundaries for executing managed code within a process. You would think that things like static variables are "per program", but they are actually per AppDomain.

AppDomains are costly to create, so creating more than necessary wastes resources and hurts performance. Reusing an AppDomain drastically speeds things up.

Let's define the interface of the object pool. The pool must be able to hand out an object, and we must be able to return the object so it can be reused.

public interface IObjectPool<T> where T : IDisposable
{
    T GetObject();
    void GetObject(Action<T> func);
    void ReleaseObject(T obj);
}

GetObject(Action<T> func) is a convenience method to get and automatically release an object; it can be used as follows:

pool.GetObject(obj => obj.DoSomething());

The object pool must return an object, in this case an AppDomain wrapper, when the GetObject method is called. When every object is already in use by another thread, a new object is created and returned. This way a thread never has to wait for other threads to complete their work, and objects still get reused.

When the object pool is heavily used there will typically be about as many objects in the pool as there are available threads (usually 2 for each available core).
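The pooling policy itself is language-agnostic. Here is a minimal sketch of the same grow-on-demand idea in Scala (all names here are my own, separate from the C# implementation below):

```scala
// Grow-on-demand pool sketch: hand out an idle object when one exists,
// otherwise create a fresh one so callers never block waiting.
class GrowingPool[T](create: () => T) {
  private var available = List.empty[T]
  private var inUse = List.empty[T]

  def acquire(): T = synchronized {
    val obj = available match {
      case head :: tail =>
        available = tail
        head
      case Nil =>
        create()
    }
    inUse ::= obj
    obj
  }

  def release(obj: T): Unit = synchronized {
    inUse = inUse.filterNot(_ == obj)
    available ::= obj
  }

  def size: Int = synchronized(available.size + inUse.size)
}
```

Under sustained load the pool stabilises at roughly one object per active thread, since a new object is only created when every existing one is busy.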

public abstract class ConcurrentObjectPool<T> : IObjectPool<T>, IDisposable where T : IDisposable
{
    private readonly List<T> _available = new List<T>();

    private readonly List<T> _inUse = new List<T>();

    protected abstract T CreateObject();

    protected abstract void CleanUp(T obj);

    protected abstract bool ShouldCleanup(T obj);

    public T GetObject()
    {
        lock (_available)
        {
            if (_available.Count != 0)
            {
                var po = _available[0];
                _inUse.Add(po);
                _available.RemoveAt(0);
                return po;
            }
            else
            {
                var po = CreateObject();
                _inUse.Add(po);
                return po;
            }
        }
    }

    public void GetObject(Action<T> func)
    {
        var obj = GetObject();

        try
        {
            func(obj);
        }
        finally
        {
            ReleaseObject(obj);
        }
    }

    public void ReleaseObject(T po)
    {
        lock (_available)
        {
            _inUse.Remove(po);

            if (ShouldCleanup(po))
            {
                CleanUp(po);
            }
            else
            {
                _available.Add(po);
            }
        }
    }

    public void Dispose()
    {
        _inUse.ForEach(x => x.Dispose());
        _available.ForEach(x => x.Dispose());
    }
}

Next up is the AppDomainFactory, which builds the AppDomains and returns an instance of the object located in the isolated AppDomain:

public class AppDomainFactory : IDisposable
{
    protected AppDomain Domain;

    public int NumberOrRuns { get; private set; }

    public AppDomainFactory()
    {
        NumberOrRuns = 0;
    }

    protected AppDomain GetDomain()
    {
        if (Domain == null)
        {
            var appDomainSetup = new AppDomainSetup
            {
                ShadowCopyFiles = "false",
                ApplicationBase = AppDomain.CurrentDomain.SetupInformation.ApplicationBase
            };

            Domain = AppDomain.CreateDomain(
                "AppDomain-" + Guid.NewGuid(),
                AppDomain.CurrentDomain.Evidence,
                appDomainSetup);
        }

        return Domain;
    }

    public T CreateInstance<T>() where T : MarshalByRefObject
    {
        NumberOrRuns++;
        return (T)GetDomain().CreateInstanceAndUnwrap(typeof(T).Assembly.FullName, typeof(T).FullName);
    }

    public void Dispose()
    {
        AppDomain.Unload(Domain);
        Domain = null;
    }
}

Notice that T must derive from MarshalByRefObject. If the object is not a MarshalByRefObject then CreateInstanceAndUnwrap() will create the object in the new AppDomain, serialize it, transfer the serialized instance to the original AppDomain, deserialize it, and call the methods on the deserialized copy, which does nothing to contain the memory leaks.

One important setting is ShadowCopyFiles (why the hell is that a string?), which must be set to "false" to make sure the DLL/EXE files are not being locked.

The implementation is really easy now: just return a new factory when CreateObject is called, and clean up every x runs.

public class ObjectPoolImpl : ConcurrentObjectPool<AppDomainFactory>
{
    protected override AppDomainFactory CreateObject()
    {
        return new AppDomainFactory();
    }

    protected override void CleanUp(AppDomainFactory obj)
    {
        obj.Dispose();
    }

    protected override bool ShouldCleanup(AppDomainFactory obj)
    {
        return obj.NumberOrRuns > 500;
    }
}

The pool can now be used:

// Default method (SomeComponent stands for any MarshalByRefObject component)
var pool = new ObjectPoolImpl();
var factory = pool.GetObject();
factory.CreateInstance<SomeComponent>().DoSomething();
pool.ReleaseObject(factory);

// Alternative
pool.GetObject(factory => {
    factory.CreateInstance<SomeComponent>().DoSomething();
});

Make sure that objects passed to any functions or methods across the boundary are marked [Serializable], because they have to be serialized between AppDomains.


05 May

Implementing GCM server with Scala and Akka

I started working on an Android app for the Roboflow project. I have an LG Watch R and I thought it would be handy to be able to send notifications from a flow within Roboflow, so users can easily configure notifications for themselves for the things that matter.

To get started I had to implement a client. Following the GCM Client Documentation was pretty straightforward, and after some reading and following the examples I quickly had a working client. Maybe I will talk more about this in a follow-up post.

The next thing was implementing a GCM server which can be done in 2 ways:

  • HTTP server, by simply sending requests to a Google endpoint which then sends the messages to the clients
  • XMPP server, which is a bidirectional XML protocol which allows the client to talk back to the server

I started out with the HTTP server to quickly try and see if my client was working properly. The request looks as follows:

Content-Type:application/json
Authorization:key=AIzaSyB-1uEai2WiUapxCs2Q0GZYzPu7Udno5aA

{
  "registration_ids" : ["APA91bHun4MxP5egoKMwt2KZFBaFUH-1RYqx..."],
  "data" : {
    "message": "This is a message",
    "title": "Test"
  }
}

The data object may contain arbitrary data; in my case my client will send a notification to my smartwatch with the given title and message.

Doing web requests with Play Framework is simple with the included WS library, which provides a simple API to make asynchronous HTTP calls.

import play.api.Logger
import play.api.libs.json.Json
import play.api.libs.ws.WS

class GcmRestServer(val key: String) {

  def send(ids: List[String], data: Map[String, String]) = {

    import play.api.Play.current
    import scala.concurrent.ExecutionContext.Implicits.global

    val body = Json.obj(
      "registration_ids" -> ids,
      "data" -> data
    )

    WS.url("https://android.googleapis.com/gcm/send")
      .withHeaders(
        "Authorization" -> s"key=$key",
        "Content-type" -> "application/json"
      )
      .post(body)
      .map { response =>
        Logger.debug("Result: " + response.body)
      }
  }

}

I ran the code and the notification showed up instantly on my smartwatch:

val server = new GcmRestServer("key")
server.send(List("clientid"), Map(
  "message" -> "Test Message",
  "title" -> "Test Title"
))

The fun started when I tried to implement the XMPP server by following the documentation (https://developer.android.com/google/gcm/ccs.html).

The examples were out of date and used old libraries without mentioning any version numbers. After some tinkering and trying to get rid of errors I decided to throw everything away and write a server against the latest version of Smack.

I started from scratch using the latest Smack 4.1 and loosely followed the examples.

Add the dependencies to build.sbt

libraryDependencies ++= Seq(
  "org.igniterealtime.smack" % "smack-java7" % "4.1.0",
  "org.igniterealtime.smack" % "smack-tcp" % "4.1.0",
  "org.igniterealtime.smack" % "smack-im" % "4.1.0",
  "org.igniterealtime.smack" % "smack-extensions" % "4.1.0"
)

Then define some settings

object GcmSettings {
  val server = "gcm.googleapis.com"
  val port = 5235
  val elementName = "gcm"
  val namespace = "google:mobile:data"
}

server and port point to the Google servers. elementName and namespace refer to the XML tags within the XMPP messages: XMPP is an XML-based protocol, and these values identify the tags in the XML messages that are sent and received.
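Concretely, every GCM payload travels as JSON wrapped in a single <gcm> element in that namespace. A small sketch of the wire format (the escape helper is my stand-in for Smack's StringUtils.escapeForXML used later):

```scala
// Sketch of the XML element GCM expects: the JSON payload wrapped in a
// <gcm> tag carrying the google:mobile:data namespace.
object GcmWire {
  val elementName = "gcm"
  val namespace = "google:mobile:data"

  // Minimal XML escaping; stands in for Smack's StringUtils.escapeForXML
  private def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

  def wrap(json: String): String =
    s"""<$elementName xmlns="$namespace">${escape(json)}</$elementName>"""
}
```

For example, GcmWire.wrap("""{"message_id":"m-1"}""") yields the element <gcm xmlns="google:mobile:data">{"message_id":"m-1"}</gcm>.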

The connection needs some configuration.

val configBuilder = XMPPTCPConnectionConfiguration.builder
  .setSecurityMode(SecurityMode.disabled)
  .setSocketFactory(SSLSocketFactory.getDefault)
  .setServiceName(GcmSettings.server)
  .setHost(GcmSettings.server)
  .setDebuggerEnabled(true) // Only enable during development
  .setPort(GcmSettings.port)
  .build

Time to make a connection; make sure you have your project id and API key handy.

connection = new XMPPTCPConnection(configBuilder)

// The user's contact list is called a roster in XMPP. GCM does not support a contact list and will throw errors when the roster is not disabled
Roster.getInstanceFor(connection).setRosterLoadedAtLogin(false)

// Enable the automatic reconnection
ReconnectionManager.getInstanceFor(connection).enableAutomaticReconnection()

// Make connection
connection.connect()

val projectId = 798234798234987L
val apiKey = "1234567890qwerty"

// Username and password are based on the projectId and apiKey
connection.login(s"$projectId@gcm.googleapis.com", apiKey)

Now the server should connect and when the debugger is enabled you should see some messages arrive. Try sending some messages from your Android device.

To start handling messages some extensions must be added. First create the GcmPacketExtension:

import org.jivesoftware.smack.packet._
import org.jivesoftware.smack.util._

class GcmPacketExtension(val json: String) extends DefaultExtensionElement(GcmSettings.elementName, GcmSettings.namespace) with PlainStreamElement {

  import GcmSettings._

  override def toXML = {
    String.format("<%s xmlns=\"%s\">%s</%s>", elementName, namespace, StringUtils.escapeForXML(json), elementName)
  }

  def toJson = Json.parse(json)

  def toPacket = {
    val message = new Message()
    message.addExtension(this)
    message
  }
}

Now add the extension provider to the ProviderManager; this makes sure that Google Cloud Messages are serialized and deserialized properly:

ProviderManager.addExtensionProvider(elementName, namespace, new ExtensionElementProvider[GcmPacketExtension]() {
  override def parse(xmlPullParser: XmlPullParser, i: Int) = {
    val json = xmlPullParser.nextText()
    new GcmPacketExtension(json)
  }
})

Add a listener to the connection which handles the messages; the getExtension method returns the GcmPacketExtension as configured by the extension provider:

case class GcmMessage(
  message_id: String, 
  message_type: Option[String] = None,
  from: String,
  data: Option[JsObject] = None,
  to: Option[String] = None)

connection.addSyncStanzaListener(new StanzaListener() {
  override def processPacket(packet: Stanza): Unit = {
    val ext = packet.getExtension(elementName, namespace).asInstanceOf[GcmPacketExtension]
    handleMessage(ext.toJson.as[GcmMessage])
  }
}, MessageTypeFilter.NORMAL)

And create receive and send handlers to process the messages:

def handleMessage(message: GcmMessage) = {
  message.message_type match {
    case Some("ack") =>
      // Handle ack
      val from = message.from
      val messageId = message.message_id
    case Some("nack") =>
      val from = message.from
      val messageId = message.message_id
    case x =>
      sendMessage(GcmMessage(
        from = "server",
        to = Some(message.from),
        message_id = nextId,
        data = Some(Json.obj(
          "message" -> "echo",
          "title" -> "Server Response"
        ))))

      sendMessage(GcmMessage(
        from = "server",
        to = message.to,
        message_type = message.message_type,
        message_id = message.message_id))
  }
}

def sendMessage(message: GcmMessage) = {
  val request = new GcmPacketExtension(Json.toJson(message).toString())
  connection.sendStanza(request.toPacket)
}

Full code is as follows:

package roboflow.module.google.service

import javax.net.ssl.SSLSocketFactory

import akka.actor.{Actor, Cancellable}
import org.jivesoftware.smack.ConnectionConfiguration.SecurityMode
import org.jivesoftware.smack.filter.MessageTypeFilter
import org.jivesoftware.smack.packet._
import org.jivesoftware.smack.provider.{ExtensionElementProvider, ProviderManager}
import org.jivesoftware.smack.roster.Roster
import org.jivesoftware.smack.tcp.{XMPPTCPConnection, XMPPTCPConnectionConfiguration}
import org.jivesoftware.smack.util.StringUtils
import org.jivesoftware.smack.{ReconnectionManager, StanzaListener}
import org.xmlpull.v1.XmlPullParser
import play.api.libs.json.{JsObject, Json}

import scala.concurrent.duration._

object GcmSettings {
  val server = "gcm.googleapis.com"
  val port = 5235
  val elementName = "gcm"
  val namespace = "google:mobile:data"
}

object GcmMessage {
  implicit val format = Json.format[GcmMessage]
}

case class GcmMessage(message_id: String, message_type: Option[String] = None, from: String, data: Option[JsObject] = None, to: Option[String] = None)

class GcmPacketExtension(val json: String) extends DefaultExtensionElement(GcmSettings.elementName, GcmSettings.namespace) with PlainStreamElement {

  import GcmSettings._

  override def toXML = {
    String.format("<%s xmlns=\"%s\">%s</%s>", elementName, namespace, StringUtils.escapeForXML(json), elementName)
  }

  def toJson = Json.parse(json)

  def toPacket = {
    val message = new Message()
    message.addExtension(this)
    message
  }
}

class GcmServer(val projectId: Long, val apiKey: String) extends Actor {

  import GcmSettings._

  var id = 0
  var cancellable: Cancellable = null
  var connection: XMPPTCPConnection = null

  def nextId = {
    id += 1
    s"m-$id"
  }

  /**
   * Connect with the XMPP server
   */
  def connect() = {

    ProviderManager.addExtensionProvider(elementName, namespace, new ExtensionElementProvider[GcmPacketExtension]() {
      override def parse(xmlPullParser: XmlPullParser, i: Int) = {
        val json = xmlPullParser.nextText()
        new GcmPacketExtension(json)
      }
    })

    val configBuilder = XMPPTCPConnectionConfiguration.builder
      .setSecurityMode(SecurityMode.disabled)
      .setUsernameAndPassword(s"$projectId@gcm.googleapis.com", apiKey)
      .setSocketFactory(SSLSocketFactory.getDefault)
      .setServiceName(server)
      .setHost(server)
      .setDebuggerEnabled(true)
      .setPort(port)
      .build

    connection = new XMPPTCPConnection(configBuilder)

    connection.addSyncStanzaListener(new StanzaListener() {
      override def processPacket(packet: Stanza): Unit = {
        val ext = packet.getExtension(elementName, namespace).asInstanceOf[GcmPacketExtension]
        handleMessage(ext.toJson.as[GcmMessage])
      }
    }, MessageTypeFilter.NORMAL)

    Roster.getInstanceFor(connection).setRosterLoadedAtLogin(false)
    ReconnectionManager.getInstanceFor(connection).enableAutomaticReconnection()

    connection.connect()
    connection.login(s"$projectId@gcm.googleapis.com", apiKey)

    sendMessage(GcmMessage(
      from = "server",
      message_id = nextId.toString,
      to = Some("clientId"),
      data = Some(Json.obj(
        "message" -> "Server connected",
        "title" -> "Server connected"))
    ))
  }

  def handleMessage(message: GcmMessage) = {
    message.message_type match {
      case Some("ack") =>
        // Handle ack
        val from = message.from
        val messageId = message.message_id
      case Some("nack") =>
        val from = message.from
        val messageId = message.message_id
      case x =>
        sendMessage(GcmMessage(
          from = "server",
          to = Some(message.from),
          message_id = nextId,
          data = Some(Json.obj(
            "message" -> "echo",
            "title" -> "Server Response"
          ))))

        // Acknowledge received messages
        sendMessage(GcmMessage(
          from = "server",
          to = message.to,
          message_type = message.message_type,
          message_id = message.message_id))
    }
  }

  def sendMessage(message: GcmMessage) = {
    val request = new GcmPacketExtension(Json.toJson(message).toString())
    connection.sendStanza(request.toPacket)
  }

  override def preStart() = {
    import context.dispatcher
    cancellable = context.system.scheduler.schedule(1.minute, 1.minute, self, "ping")
    connect()
  }

  override def postStop() = {
    if (cancellable != null && !cancellable.isCancelled) {
      cancellable.cancel()
    }
    if (connection != null && connection.isConnected) {
      connection.disconnect()
    }
  }

  def receive = {
    case "ping" =>
      sendMessage(GcmMessage(
        from = "server",
        message_id = nextId.toString,
        data = Some(Json.obj(
          "message" -> "ping"))
      ))
  }
}

30 Dec

Solving Paper Mazes with Python

Every year just after Christmas there is a quiz game called "De Rooise Kwis" in which a lot of people in Sint Oedenrode participate. This year there were 151 teams.

The quiz consists of questions in a wide range of categories, which can be found here

As soon as we got the questions we formed pairs, each solving a different category; I was assigned the puzzles. While scanning through the questions I quickly saw the large maze which had to be solved. There are letters in the maze which form a word if you follow the correct path to the exit.

Instead of solving the puzzle by hand I decided to try to solve it with some Python. First I scanned the image of the maze and opened Gimp to prepare it, replacing all colors with pure black and white.

This approach uses the Breadth-First Search algorithm, which is not very efficient but is guaranteed to find a solution if one exists. A visualization of the algorithm can be found at Pathfinding.js.

import sys

from Queue import Queue
from PIL import Image

# Point where the algorithm should start
start = (4,4)

# Point where the maze is solved
end = (1363,1368)

# Check if the pixel is white/visitable
def iswhite(value):
    return value == (255,255,255)

def getadjacent(n):
    x,y = n
    return [(x-1,y),(x,y-1),(x+1,y),(x,y+1)]

# Breadth first search
def BFS(start, end, pixels):

    queue = Queue()
    queue.put([start]) # Wrapping the start tuple in a list

    while not queue.empty():

        path = queue.get()
        pixel = path[-1]

        if pixel == end:
            return path

        for adjacent in getadjacent(pixel):
            x,y = adjacent

            if x > 0 and y > 0 and iswhite(pixels[x,y]):

                pixels[x,y] = (127,127,127) # mark the visited area grey
                new_path = list(path)
                new_path.append(adjacent)
                queue.put(new_path)

    print "No answer was found."

if __name__ == '__main__':

    maze_image = "maze.png"

    base_img = Image.open(maze_image)
    base_pixels = base_img.load()

    path = BFS(start, end, base_pixels)

    path_img = Image.open(maze_image)
    path_pixels = path_img.load()

    for position in path:
        x,y = position
        path_pixels[x,y] = (255,0,0) # red

    path_img.save("maze-solved.jpg")

The code runs the algorithm and, when it finds a solution, draws a 1-pixel-wide red line (hard to see) to the exit of the maze.

The whole solution took me around 20 minutes, which saved me a lot of time compared to the people who tried to solve it by hand, some of whom did not even finish it.

Paper Maze 1

Paper Maze 2


06 Nov

Translating urls in Zend Framework 2

Recently I needed to build a multi-language website using Zend Framework 2. The framework has a lot of options to translate the website, but one thing lacking in the documentation was a method to translate URLs.

After some research I found a Pull Request which adds the ability to translate routes. The current version only supports the Segment route, which is fine for basic routing.

In order to start translating routes the router needs to be aware of the translator. Do this by injecting the translator into the router during the module bootstrap:

namespace Application;

class Module
{
    public function onBootstrap(MvcEvent $e)
    {
        $app = $e->getApplication();
        $sm = $app->getServiceManager();
        $translator = $sm->get('translator');

        // Attach the translator to the router
        $e->getRouter()->setTranslator($translator);
    }
}

Now the routes can be translated. It requires some modifications to the URLs configured under the 'router.routes' key in module.config.php.

By default the router object does not support translation, so it has to be replaced by a router which is translator aware. Replace the router with the TranslatorAwareTreeRouteStack by setting the router_class configuration.

The segments can be translated by wrapping the parts which need to be translated in curly braces. If the route is called 'home' then it must be replaced with '{home}':

return [
    'router' => [
        // Replace the default router with the Translator aware TreeRouteStack
        'router_class' => 'Zend\Mvc\Router\Http\TranslatorAwareTreeRouteStack',
        'routes' => [
            'home' => [
                'type' => 'Segment',
                'options' => [
                    'route' => '/{home}', // The parts within curly braces will be translated
                    'defaults' => [
                        'controller' => '...',
                        'action' => 'index',
                    ],
                ],
                'may_terminate' => true,
                'child_routes' => [
                    'profile' => [
                        'type' => 'Segment',
                        'options' => [
                            'route' => '/{profile}',
                            'defaults' => [
                                'controller' => '...',
                                'action' => 'profile',
                            ]
                        ]
                    ]
                ]
            ]
        ]
    ]
];

10 Oct

Akka Event Bus

I've been working on a new project for a while which is based on Scala, Play Framework and Akka. The goal of the project is to create a library which can communicate with the Steam servers, and, using that library, write multiple bots which perform various actions, like starting trades with other Steam users and exchanging items.

There can be any number of bots online and their events can be handled by a varying number of services. The services should not have a hard dependency on each other or on the bots; they all have to communicate using an EventBus.

What is an Event Bus?

The Event Bus pattern is a publish-subscribe-style communication pattern where services or objects subscribe to the event bus and listen for specific types of events. Other services publish events to the Event Bus, which are then propagated to the subscribers interested in that specific event type.

This pattern is a flexible, loosely coupled design which allows for decoupled components that are not aware of each other.
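Before looking at Akka's version, the pattern itself fits in a few lines of plain Scala. A toy sketch (not Akka's API): subscribers register a handler per topic, and publishing fans an event out to every handler on that topic.

```scala
// Toy event bus: a map from topic to handlers. subscribe registers a
// handler, publish delivers the event to every handler on the topic.
class SimpleEventBus[E] {
  private var subscribers = Map.empty[String, List[E => Unit]]

  def subscribe(topic: String)(handler: E => Unit): Unit = synchronized {
    subscribers = subscribers.updated(topic, handler :: subscribers.getOrElse(topic, Nil))
  }

  def publish(topic: String, event: E): Unit =
    synchronized(subscribers.getOrElse(topic, Nil)).foreach(_(event))
}
```

Neither the publisher nor the subscriber holds a reference to the other; the bus is the only shared dependency, which is the whole point of the pattern.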

When to use

There are a lot of use cases for an event bus; a few common ones are:

  • GUI's: In a traditional application, multiple sources of events can exist across the application. A menu or toolbar button, a mouse click, window resizing, or some other external data source could feasibly produce the same desired result in the application. Selecting a certain row in a JTable, for example, might result in a toolbar button being deactivated, a database record updated, and a new window spawned. A double click of a JTree item might invoke another long chain of events. The event bus helps solve this problem by simply letting each component subscribe to the bus; when an interesting action occurs (such as the row selection in a table), an event is published to the bus. Each subscriber to that event type is notified and can react accordingly.
  • Service Oriented Architecture: In an SOA environment there can be many services which perform various actions. The OrderService handles an order and announces that the order has successfully been completed. The newsletter service listens and subscribes the customer to its newsletter, the reporting service adds the information to its database for later BI, and the inventory service registers that the items are subtracted from the warehouse. This business logic can be designed in a loosely coupled way using the EventBus.

Implementation

In order to construct a valid EventBus you need to inherit the EventBus class and mix in a Classification trait. There are 3 classifications distributed with Akka which you can use in your application:

  • Lookup
  • SubChannel
  • Scanning

The default event bus provides the following methods which are used to interact with actors:

// subscribes the given subscriber to events with the given classifier
def subscribe(subscriber: Subscriber, classifier: Classifier): Boolean
// undoes a specific subscription
def unsubscribe(subscriber: Subscriber, classifier: Classifier): Boolean
// undoes all subscriptions for the given subscriber
def unsubscribe(subscriber: Subscriber): Unit
// publishes an event, which is first classified according to the specific bus (see Classifiers) and then published to all subscribers for the obtained classifier
def publish(event: Event): Unit

In order to build a valid event bus you will need to implement the following types:

  • Event specifies the type of messages the EventBus will accept
  • Classifier specifies the type of the classifier
  • Subscriber specifies the type of the subscribers

Examples

The 3 classifications that are already available suit most cases, so let's start by explaining how they can and should be used, and which one fits your use case.

Lookup classification

Lookup classification is the most basic and provides a simple string matcher. If the published classifier is exactly the same as the one a subscriber subscribed to, then the subscriber will receive the message.

case class MessageEvent(val channel: String, val message: Any)
case class Message(text: String)

class LookupEventBus extends ActorEventBus with LookupClassification {
  type Event = MessageEvent
  type Classifier = String

  protected def mapSize(): Int = 10

  protected def classify(event: Event): Classifier = {
    event.channel
  }

  protected def publish(event: Event, subscriber: Subscriber): Unit = {
    subscriber ! event.message
  }
}

The example below shows a subscriber which listens to events sent to the "/events" channel; it then publishes a new message on the "/events" channel and the message is received and printed.

The second publish/subscribe does exactly the same but on a more specific channel, "/events/10". The message is only received once because the Lookup classification does not support subchannels.

object LookupClassificationApp extends App {

  val system = ActorSystem()
  val eventBus = new LookupEventBus

  val subscriber = system.actorOf(Props(new Actor {
    def receive = {
      case Message(text) => println(text)
    }
  }))

  eventBus.subscribe(subscriber, "/events")
  eventBus.publish(MessageEvent("/events", Message(text = "hello world!")))
  // Output: hello world!

  eventBus.subscribe(subscriber, "/events/10")
  eventBus.publish(MessageEvent("/events/10", Message(text = "hello world!")))
  // Output: hello world!
}

Subchannel classification

The subchannel classification is an EventBus which supports a hierarchy of channels or classifiers. This first example focuses on a simple string based approach which allows listening to a top level channel and its leaves.

case class MessageEvent(channel: String, message: Any)
case class Message(text: String)

class SubChannelClassificationEventBus extends EventBus with SubchannelClassification {
  type Event = MessageEvent
  type Classifier = String
  type Subscriber = ActorRef

  protected def classify(event: Event): Classifier = event.channel

  protected def subclassification = new Subclassification[Classifier] {
    def isEqual(x: Classifier, y: Classifier) = x == y
    def isSubclass(x: Classifier, y: Classifier) = x.startsWith(y)
  }

  protected def publish(event: Event, subscriber: Subscriber): Unit = {
    subscriber ! event.message
  }
}

The example below shows subscribers and publishers which become increasingly more specific, and that subscribers higher up the hierarchy also catch the more specific publishes that come after them.

object ChannelClassification extends App {

  val system = ActorSystem()
  val eventBus = new SubChannelClassificationEventBus

  val subscriber = system.actorOf(Props(new Actor {
    def receive = {
      case Message(text) => println(text)
    }
  }))

  // Top level
  eventBus.subscribe(subscriber, "/")
  eventBus.publish(MessageEvent("/", Message(text = "hello world!")))
  // Output: hello world!

  // Only events
  eventBus.subscribe(subscriber, "/events")
  eventBus.publish(MessageEvent("/events", Message(text = "hello world!")))
  // Output: hello world!
  // Output: hello world!

  // Specific events
  eventBus.subscribe(subscriber, "/events/10")
  eventBus.publish(MessageEvent("/events/10", Message(text = "hello world!")))
  // Output: hello world!
  // Output: hello world!
  // Output: hello world!
}

Drawbacks

The sender is not preserved when receiving a message which passed through an EventBus; to solve this you need to pass the ActorRef along manually.
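A common workaround is to carry the sender inside the event itself, so subscribers can reply directly. A sketch (ReplyTo here is a stand-in for an akka.actor.ActorRef):

```scala
// Workaround sketch: embed a reply handle in the event so the original
// sender survives the trip through the bus. In real code replyTo
// would be an akka.actor.ActorRef and handle would send to it.
case class ReplyTo(name: String)
case class MessageEvent(channel: String, replyTo: ReplyTo, message: Any)

def handle(event: MessageEvent): String =
  s"replying to ${event.replyTo.name}: ${event.message}"
```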


24 Sep

Scraping with Scala

A few months ago I was asked if I could scrape data from various websites, store it in a database and output some nice reports. Back then I wrote a simple scraper with Ruby and Nokogiri, saved everything to a Microsoft SQL database and created reports using Excel. Because of the tight budget the script became a quick and dirty solution, but it did the job and the client was happy.

Now that scrape requests are becoming more common I wanted to design a framework in Scala which should make scrape jobs a lot easier, faster and more fun to write. But why develop a new web scraping framework when there are already plenty of alternatives in various languages?

Firstly because I like learning new things in Scala, and webscraping is something I haven't tried out in Scala yet. Secondly because, after some research, the available frameworks all lack something; these are the things I was not happy with:

  • Most frameworks are single-threaded and can't download pages in parallel
  • They do not scale horizontally
  • The amount of code required to do something simple is too much

Akka solves point 1 with Round Robin Routing, and point 2 with Remote Actors. Point 3 can be solved with Scala alone.

So far I have come up with the following DSL to search for PHP elephants on Google and read the results:

import org.rovak.scraper.ScrapeManager._

object Google {
  val results = "#res li.g h3.r a"
  def search(term: String) = {
    "http://www.google.com/search?q=" + term.replace(" ", "+")
  }
}

// Open the search results page for the query "php elephant"
scrape from Google.search("php elephant") open { implicit page =>

  // Iterate through every result link
  Google.results each { x: Element =>

    val link = x.select("a[href]").attr("abs:href").substring(28)
    if (link.isValidURL) {
      // every found link in the found page
      scrape from link each (x => println("found: " + x))
    }
  }
}

To make common tasks easier there are spiders available which crawl through a website, following every link within the allowed domains and opening each page for reading. A simple spider reads the entire domain and stops when it has nothing left to do.

new Spider with EmailSpider {
  startUrls ::= "http://events.stanford.edu/"
  allowedDomains ::= "events.stanford.edu"

  onEmailFound ::= { email: String =>
    // Email found
  }

  onReceivedPage ::= { page: WebPage =>
    // Page received
  }

  onLinkFound ::= { link: Href =>
    println(s"Found link ${link.url} with name ${link.name}")
  }
}.start()

The project is far from finished, but hopefully it gives you an idea of the upcoming features. The project can be found on Github


19 Aug

Scalext: Ext JS module for Play Framework

I've been working on a project called Scalext for a while and want to share the current version and upcoming ideas. Scalext is a module for Play Framework 2.1 which makes working with Ext JS a lot easier.

I've already implemented an Ext JS module for Zend Framework 2 in the past, which made writing a similar module in Scala a good way to compare the language with PHP. The current version of the module has most features from KJSencha implemented and took a lot less code than the PHP version.

The most notable improvements are:

  • Parallel execution of Direct API actions
  • Compiler checked annotations
  • DSL for javascript objects

Direct API

The API generator scans a project for classes which have a @Remotable annotation (in later versions a trait) configured, and generates the Ext JS Direct configuration which can be called from an Ext JS project.

When a Direct API method is called from javascript, the request is routed to the Scalext module, which dispatches it to the correct Direct action. The incoming requests are JSON objects which are deserialized by Google GSON into Scala values; this makes working with javascript requests a lot easier. Any return value from a Direct action is automatically serialized by Google GSON again and returned as JSON.
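For a concrete picture of what travels over the wire, here is a sketch of one Direct call and its result envelope. The action and method names are made up for illustration; the field layout (action, method, data, type, tid) follows the Ext Direct protocol.

```javascript
// A hypothetical Ext Direct round trip: the client posts a JSON-encoded
// call, the server answers with a matching result envelope.
var request = {
    action: 'UserService',   // hypothetical server-side class
    method: 'findById',      // hypothetical method on that class
    data: [42],              // arguments, deserialized by GSON on the server
    type: 'rpc',
    tid: 1                   // transaction id, echoed back in the response
};

var response = {
    type: 'rpc',
    tid: 1,
    action: 'UserService',
    method: 'findById',
    result: { id: 42, name: 'John' } // return value, serialized by GSON
};

// The tid lets the client match a response to its pending request
function matches(req, res) {
    return req.tid === res.tid && req.action === res.action;
}

console.log(matches(request, response)); // true
```

Because the envelope carries a transaction id, the client can batch several calls in one HTTP request and still pair every result with its caller.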

The combination of generating the API plus handling serialization and deserialization makes working with the Ext JS Direct API a lot more enjoyable.

Component DSL

A common issue for newcomers to Ext JS is configuring Ext JS classes. Writing classes is not a problem, but the massive amount of available options overwhelms most users. Only a few IDE's have plugins available to support Ext JS syntax, and most do not support the Ext.define('Classname', { key: value }) syntax, which prevents the IDE from building an autocompletion list of available classes and properties. Of course there is the Ext JS documentation, which is a pleasure to work with, but it may take some time to find what you need, and you have to leave the IDE. A second issue which newcomers often encounter is case-sensitive properties.

To tackle these issues, future versions of Scalext need autocompletion and syntax checking. Both can be achieved by implementing the Ext JS classes in Scala and outputting them as javascript files. Although this is perfect for configuring classes and their properties, there is still the question of how inline functions can be defined. Something like the following snippet will be hard to implement:

Ext.create('Ext.Window', {
    title: 'Popup',
    html:'This is a test popup',
    buttons: [
        {
            text: 'OK',
            // How to handle inline functions?
            handler: function() {
                alert('Confirm!');
                this.up('window').close();
            }
        }
    ]
});

Hopefully Scala.js solves this problem; it is still an early version, but the Reversi example shows that a simple game can already be built.

Apart from the syntax checking and autocompletion it also tackles the following problems:

  • Building highly dynamic components
  • Components which are restricted by ACL rules, unauthorized parts will never reach the client-side
  • i18n

The current version only supports Play Framework 2.1; upcoming versions will depend on other modules which can easily be used in different frameworks.

More info can be found by checking out Scalext on Github or running the Example Application


21 Jan

Managing assets in Zend Framework 2

When you are developing a module for Zend Framework 2 that ships with assets, you will most likely run into the question of how to serve those files from your public folder.

There are quite a few methods to do this; I will explain the three most commonly used ones.

The examples assume that you are using the default module structure as demonstrated by the Zend Skeleton application, and that you are familiar with writing simple ZF 2 applications.

Default module structure

ModuleName/
    config/
        module.config.php
    public/
        css/
        images/
        js/
    src/
        ModuleName/
            <code>/
    view/
    Module.php

Method 1: Symlink

Creating symlinks is the easiest way to make sure that your files are available in the public folder. The downside is that you have to create these symlinks yourself and manage them while deploying your application. Some IDE's also have trouble with the duplicate files that show up when you symlink them into your project.

Method 2: Rewriting

If you are using Apache you can use AliasMatch to rewrite requests to your module folders. With the following rule, which you can add to your Virtual Host configuration (this does not work in a .htaccess file), you can rewrite http://server/folder/module/file locally to /modulepath/modulename/public/folder/file.

AliasMatch /(css|js|images)/([^/]+)/(.*) /path/to/module/$2/public/$1/$3

While this method is very flexible and does not require any configuration when adding new modules, it does have a few downsides. First, it assumes that all modules are placed in the same folder, which is not the case when you are using Composer. Second, only css, js and images are rewritten in the example; when you need more folders you have to add them manually. It also requires access to the Virtual Host configuration, which may not be available in every dev environment.

Method 3: Asset Manager

My preferred way to serve files is using the AssetManager Module. It has a lot of functionality to filter, minify and optimize your files. It is based on Assetic which contains a lot more options.

What this module does is listen to the requests sent to the server; when a request is not handled by a valid route or controller, it looks through the modules for a public file that matches the request. If it finds one, the file is served through PHP.

To get started you can install the module using Composer:

./composer.phar require rwoverdijk/assetmanager

After that you can configure your module to serve public files through AssetManager. First you need to tell it where the files can be found; to do this, add the following configuration to your module.config.php:

<?php
return array(
    'asset_manager' => array(
        'resolver_configs' => array(
            'paths' => array(
                'module_name' => __DIR__ . '/../public/',
            ),
        ),
    ),
);

Now AssetManager will look in the /module/public/ folder when it needs to find a file. You can verify if it works by requesting a file through the public folder of your application.

While this module makes it really easy to handle your assets, it has the downside that files are served through a PHP call. This creates a lot of overhead, especially in a production environment, and slows down the loading times of your website.

The module provides a few options to counter this problem by caching the files, or writing the files to your public folder. After a request, the caching layer writes the result to the same location in your public folder, so the second time the file does not have to pass through PHP.

In order to do this you can add the following configuration which enables caching:

<?php
return array(
    'asset_manager' => array(
        'caching' => array(
            'default' => array(
                'cache'   => 'FilePath',
                'options' => array(
                    'dir' => 'public',
                ),
            ),
        ),
    ),
);

In this example I use default, which means every requested file is cached; if you want to specify which files are cached, replace default with the path to your asset.

To take this even further you can minify every javascript and css file that is being requested.

<?php
return array(
    'asset_manager' => array(
        'filters' => array(
            // Filter by MIME type
            'application/javascript' => array(
                array(
                    'filter' => 'JSMin',
                ),
            ),
            // Filter by extension
            'js' => array(
                array(
                    'filter' => 'JSMin',
                ),
            ),
        ),
    ),
);

You have to provide the filter service yourself, which is really easy to implement. In this example I am using JSMin:

<?php
namespace Application\Service;

use Assetic\Asset\AssetInterface;
use Assetic\Filter\FilterInterface;
use JSMin;

class JSMinFilter implements FilterInterface
{
    public function filterLoad(AssetInterface $asset)
    {
    }

    public function filterDump(AssetInterface $asset)
    {
        $asset->setContent(JSMin::minify($asset->getContent()));
    }
}

And then add the filter to the AssetManager configuration:

<?php
return array(
    'asset_manager' => array(
        'filters' => array(
            'application/javascript' => array(
                array(
                    'filter' => 'Application\Service\JSMinFilter', // FQCN
                ),
            ),
        ),
    ),
);

This is only a small demonstration of all the possibilities that the module has to offer; Assetic provides a lot more filters which optimize your images, compress your files, parse SASS and LESS, and much more.

Take a look at the available filters and read the wiki to get started with AssetManager in your project!


16 Dec

Inline Attachment 1.0

Github recently added a feature which lets you add images inside an ordinary textarea by simply dragging an image into it, or (only in Chrome) pasting one. While I was working on the markdown editor for Socialog I needed a similar way to add images to a blog post, so I wrote a javascript plugin which does the exact same thing. The result is a plugin which can be used with jQuery, CodeMirror or standalone, so you can easily use it in your project.

During this project I gained experience with Grunt.js, structuring a jQuery plugin and the HTML5 File API, which I will briefly describe.

While writing out some quick requirements for the plugin I quickly noticed that there had to be a way to build my javascript files into three different versions. The standalone version should be compiled into the jQuery and CodeMirror versions without having to copy the code manually, and ideally there should also be a minified production version.

Meet Grunt.js, a JS command line build tool

Grunt.js has proven itself in the jQuery project and became my weapon of choice to combine and minify the required files.

Like any other build tool it needs a script which tells it what to do: create a grunt.js in your project directory and set up some tasks. Here are the tasks which I use in my plugin; for simplicity I removed the CodeMirror version.

grunt.initConfig({

    // Grunt can read json files, here we load the package.json values
    // inside the pkg property so they can be used inside the banner
    pkg: grunt.file.readJSON('package.json'),

    meta: {
        banner: '/*! <%= pkg.name %> - v<%= pkg.version %> - ' +
            '<%= grunt.template.today("yyyy-mm-dd") %> */'
    },

    // Concat the source files, prepending the banner;
    // it automatically runs every child object
    concat: {
        normal: {
            src: ['<banner:meta.banner>', 'src/inline-attach.js'],
            dest: 'dist/inline-attach.js'
        },
        jquery: {
            src: ['<banner:meta.banner>', 'src/inline-attach.js', 'src/jquery.inline-attach.js'],
            dest: 'dist/jquery.inline-attach.js'
        }
    },

    // Minify action provided by UglifyJS, automatically runs every child object
    min: {
        normal: {
            src: ['<banner:meta.banner>', 'dist/inline-attach.js'],
            dest: 'dist/inline-attach.min.js',
            separator: ';'
        },
        jquery: {
            src: ['<banner:meta.banner>', 'dist/jquery.inline-attach.js'],
            dest: 'dist/jquery.inline-attach.min.js',
            separator: ';'
        }
    }
});

// Load the plugin that provides the "concat" and "min" tasks
grunt.loadNpmTasks('grunt-contrib-uglify');

// Register the default task
grunt.registerTask('default', ['concat', 'min']);

// Load the plugin that provides the "concat" and "min" tasks
grunt.loadNpmTasks('grunt-contrib-uglify');

// Register the default task
grunt.registerTask('default', ['concat', 'min']);

The config is defined first; it contains data like the pkg variable, which loads package.json so its values can be used in, for example, the banner. The meta.banner property contains the string which is placed above every generated file; it is referenced with the `<banner:meta.banner>` tag.

Secondly the tasks are loaded using grunt.loadNpmTasks('grunt-contrib-uglify')

And last the default task is defined, which runs when grunt is called without arguments; for Apache Ant users this is comparable to the default target of a build file.

When everything is done you can run grunt and it will generate everything in the dist/ folder, which saves a lot of time!

jQuery plugin

Writing your code as a jQuery plugin saves you a lot of time when reusing the plugin and makes it easy for other developers to integrate your code into their own projects.

Setting up the structure is really easy and a tutorial can be found at the jQuery plugin tutorial
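As a rough sketch of that structure (the plugin name, options and defaults below are hypothetical), the common pattern is a function on $.fn which merges user options into defaults and returns this, so calls can be chained:

```javascript
// Plain merge helper so the pattern is visible without jQuery's $.extend
function mergeOptions(defaults, options) {
    var merged = {}, key;
    for (key in defaults) { merged[key] = defaults[key]; }
    for (key in (options || {})) { merged[key] = options[key]; }
    return merged;
}

// Minimal jQuery plugin skeleton; the plugin name and its options are
// made up, only the structure matters.
(function ($) {
    if (!$) { return; } // allows loading this file without jQuery present

    $.fn.inlineAttach = function (options) {
        var settings = mergeOptions({
            uploadUrl: '/upload',                       // hypothetical default
            allowedTypes: ['image/png', 'image/jpeg']   // hypothetical default
        }, options);

        // Return `this` so calls chain: $('textarea').inlineAttach().focus()
        return this.each(function () {
            // attach the paste/drop handlers to each matched element here,
            // using `settings`
        });
    };
})(typeof jQuery !== 'undefined' ? jQuery : null);

console.log(mergeOptions({ a: 1 }, { b: 2 })); // { a: 1, b: 2 }
```

Returning this.each(...) is what makes the plugin play nicely with jQuery's chaining, since every matched element is processed and the original selection is handed back to the caller.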

HTML 5 File API

The HTML5 File API provides a lot of new options for handling files: handling files dragged in from the desktop, listening to a paste event inside a textbox and showing a preview of the just-pasted image, or resizing a picture before uploading it to save time and data.
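As a small sketch of the paste case (the selector and event wiring are illustrative and assume a browser), clipboard entries arrive as a list of items which can be filtered by MIME type before being read with a FileReader:

```javascript
// Pick the image entries out of a clipboardData.items-like list.
// Each item is expected to have a `type` (MIME) and a getAsFile() method.
function findImageItems(items) {
    var images = [], i;
    for (i = 0; i < items.length; i++) {
        if (items[i].type && items[i].type.indexOf('image/') === 0) {
            images.push(items[i]);
        }
    }
    return images;
}

// Browser-only wiring (skipped outside the DOM): listen for paste events
// on a textarea and hand each pasted image to a FileReader.
if (typeof document !== 'undefined') {
    document.querySelector('textarea').addEventListener('paste', function (e) {
        findImageItems(e.clipboardData.items).forEach(function (item) {
            var reader = new FileReader();
            reader.onload = function (ev) {
                // ev.target.result is a data URL usable as an <img> preview
                console.log('pasted image:', ev.target.result);
            };
            reader.readAsDataURL(item.getAsFile());
        });
    });
}
```

The same findImageItems filtering works for the drop case as well, since a drop event exposes a comparable list through e.dataTransfer.items.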

The end result is an easy image attachment system in which you can paste or drop images inside a textbox and include an image without a WYSIWYG editor.

The project can be found at Github


09 Nov

Razko.nl

Welcome to my new blog, where I will post my thoughts on web development, personal projects and code that I'm working on.

There isn't much to see right now as I am working on some upcoming features like the code sandbox, portfolio and Github integration.
