If you caught our recent tech post, The Bleeding Edge, you saw Arnon Rotem-Gal-Oz describe how we process events with Spark and store them on Amazon S3 in the Parquet format (if you haven’t read it yet, now might be a good time to do so before reading on).
As you might imagine, although the Parquet format is optimized for machine access, which makes processing the data much easier, it is completely unreadable to humans. Moreover, manually searching terabytes of data for a specific kind of event within a given timeframe is infeasible. So what do we do when we need to investigate a support case by looking at up to three months of historical data?
The Solution
SQL is a widely adopted language, so we thought it would be great to provide an interface where support engineers write a query that selects events with all the required fields and filters them by date and/or customer-related information.
We named the tool Mojito, and this is the simplest interface we came up with:

The right side lists all the available topics that can be queried. On the left side there’s a SQL editor, where the user writes a query and clicks the “Run” button. Once the query finishes running (or rather flying, as the little blinking MiG-23 icon in the query status suggests), a download link is generated that points to a TSV (Tab-Separated Values) file containing the result.
Behind the Scenes
On the backend we have Livy, which provides a REST API for submitting jobs to a Spark cluster and monitoring their status. The only downside is that, at the time of writing, Livy only supported jobs written in Scala, Java or Python. So, as a workaround, we simply wrap each SQL query in Scala code before submitting it.
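A minimal sketch of that workaround follows, assuming an interactive Livy session of kind `spark` on Spark 2.x, where a `SparkSession` is already bound to `spark`. The object and method names (`LivySqlWrapper`, `wrap`, `statementBody`, `isFinished`) are hypothetical, not Mojito’s actual code. The sketch quotes the query into a Scala string literal, generates a single line of Scala that writes a tab-separated result with a header row, and builds the JSON body for Livy’s `POST /sessions/{sessionId}/statements` endpoint:

```scala
// Hypothetical sketch, not Mojito's actual code: wrap a user's SQL query in
// Scala so it can be submitted as a Livy statement. Assumes an interactive
// Livy session of kind "spark" on Spark 2.x, where `spark` is a SparkSession.
object LivySqlWrapper {

  // Quote a string, escaping backslashes, double quotes, newlines, carriage
  // returns and tabs. For these characters the result is valid both as a
  // Scala string literal and as a JSON string; other control characters are
  // not handled in this sketch.
  private def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '\\' => sb.append("\\\\")
      case '"'  => sb.append("\\\"")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c    => sb.append(c)
    }
    sb.append('"').toString
  }

  // One line of Scala for Livy to run: execute the query and write the result
  // as tab-separated text with a header row. coalesce(1) makes Spark write a
  // single part file inside the outputPath directory. The middle piece is a
  // plain triple-quoted string, so "\t" stays a two-character escape in the
  // generated code.
  def wrap(sql: String, outputPath: String): String =
    s"spark.sql(${quote(sql)}).coalesce(1).write" +
      """.option("sep", "\t").option("header", "true")""" +
      s".csv(${quote(outputPath)})"

  // JSON body for POST /sessions/{sessionId}/statements.
  def statementBody(code: String): String =
    "{\"code\": " + quote(code) + "}"

  // Terminal states of a statement, as reported by
  // GET /sessions/{sessionId}/statements/{statementId}.
  def isFinished(state: String): Boolean =
    Set("available", "error", "cancelled").contains(state)
}
```

A caller would POST `statementBody(wrap(query, outputPath))` to the session, then poll the statement until `isFinished` returns true before generating the download link. The generated code is kept on one line on purpose, since a REPL-style interpreter may evaluate a complete first line before seeing a continuation that starts with a dot. A production version would also use a JSON library instead of hand-rolled escaping.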
The service itself is written in Clojure, the endearing programming language used here at AppsFlyer.
On the frontend:
- Bootstrap & jQuery (the best combination for simple applications like this)
- Moment.js
- CodeMirror (for SQL syntax highlighting and code completion)
Access to Mojito goes through the Bouncer, which performs all the authentication and authorization tasks.
Future Developments
A SQL interface for raw data stored on S3 as Parquet files is a good start. In the future, we’d like to make this tool a standard interface to data stored in other locations, such as Redshift. As long as the tool stays easily extendable, this is not a problem at all.
The diagram below shows how Mojito works at AppsFlyer:
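One way to keep the tool open to new sources is a small backend interface plus a registry that dispatches queries by backend name. The sketch below is hypothetical: `QueryBackend`, `Registry` and the `Either` result convention are illustrative names, not taken from Mojito. Each source implements `run`, and the rest of the service only talks to the registry:

```scala
// Hypothetical sketch of an extension point for additional data sources.
// The names are illustrative; the post does not describe Mojito's design.
object BackendSketch {

  // A data source that can run a SQL query and report where the TSV
  // result was written (Right) or why it failed (Left).
  trait QueryBackend {
    def name: String
    def run(sql: String): Either[String, String]
  }

  // Routes each query to the backend registered under the requested name.
  final class Registry private (byName: Map[String, QueryBackend]) {

    // Supporting a new source (for example Redshift) means registering
    // one more QueryBackend; callers of `run` stay unchanged.
    def register(backend: QueryBackend): Registry =
      new Registry(byName.updated(backend.name, backend))

    def names: Set[String] = byName.keySet

    def run(backendName: String, sql: String): Either[String, String] =
      byName.get(backendName) match {
        case Some(backend) => backend.run(sql)
        case None          => Left(s"unknown backend: $backendName")
      }
  }

  object Registry {
    val empty: Registry = new Registry(Map.empty)
  }
}
```

A Redshift backend could, for instance, use Redshift’s `UNLOAD` command to write query results to S3, so the download flow would stay the same; that is an assumption about a possible implementation, not a description of the open-source tool.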

Summary
To summarize, Mojito plays a very important role at AppsFlyer. Not only does it provide a unified interface for accessing event history, but, together with the Bouncer, it also guards against unauthorized access to the data.
By the way, did I mention that this tool is now open source? Please try it and send us your feedback.
The post Meet Mojito: Our New (Not Just) Support Tool appeared first on AppsFlyer.