When NoSQL isn't enough, but SQL is too much
Sometimes we need not only the performance of NoSQL transactions, but also the analytical power of SQL engines. How can we get the best of both worlds?
Charles works as a technology guru at YottaDB, a free/open-source database startup with a rich heritage of unique database designs; that is to say, he spends his days writing code, working with new technology, and performing minor feats of software black magic. Prior to YottaDB, Charles spent a great deal of time studying the complexity of software engineering at Rensselaer Polytechnic Institute, with a particular interest in gaining insight into the impact software complexity has on education and contribution diversity.
Recent years have seen a massive rise in the prevalence of NoSQL technologies, despite the yelling and screaming of academics. Much of the popularity of these systems stems from the performance gains they have over SQL due to their ability to manipulate data directly, without needing to go through a SQL engine. This is particularly important for ACID transaction processing: where a SQL engine would need complex logic to ensure that no touched rows are updated during the transaction, a simpler NoSQL engine can perform a transaction with more ease. However, academics warn that in the end, we will need SQL to get the complex-query performance and flexibility required for any meaningful analytics on the data we store in these databases. But since when do academics get things right?
As it turns out, they got lucky this time. Users of these NoSQL engines have discovered that, although not needed for performance-critical operations such as transaction processing, SQL is needed to perform meaningful analytics using much of the tooling available. In response to this new demand, many NoSQL engines have started adding support for SQL queries. However, implementing these SQL engines can be quite a task, especially as one attempts to generate code to fetch information from these data stores that is not only correct, but also performant. Many implementations hit difficulties with the more interesting SQL features, such as outer joins, subqueries, and set operations. Of course, writing a query optimizer is a research area in and of itself; decades of research have gone into the topic, and it is still alive and well. How can these new systems and developers use this expansive body of research to make things run better?
YottaDB (https://yottadb.com/) is a free/open-source NoSQL data store with full support for transaction processing, whose codebase has long been used for mission-critical applications in banking and healthcare. It stores data in a hierarchical fashion, delivering blazing performance for both simple operations (such as setting a value, or verifying that a key does or doesn't exist) and complex ones (such as ACID transactions across many tables) by providing primitives to iterate over the hierarchy and features to enable transaction processing.
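To make the idea of a hierarchical store with iteration primitives concrete, here is a minimal in-memory sketch in Python. It is not YottaDB's API: the class `HierarchicalStore` and its methods `set`, `get`, `exists`, and `next_subscript` are hypothetical names chosen for illustration, and the sketch assumes that all subscripts at a given level are mutually comparable (for example, all strings).

```python
from bisect import bisect_right


class HierarchicalStore:
    """Toy in-memory hierarchy keyed by tuples of subscripts.

    Hypothetical illustration only; this is not YottaDB's actual API.
    """

    def __init__(self):
        self._values = {}    # full key tuple -> stored value
        self._children = {}  # parent key tuple -> sorted list of child subscripts

    def set(self, key, value):
        key = tuple(key)
        # Record each ancestor -> child link so the tree can be walked later.
        for depth in range(len(key)):
            parent, child = key[:depth], key[depth]
            siblings = self._children.setdefault(parent, [])
            i = bisect_right(siblings, child)
            if i == 0 or siblings[i - 1] != child:
                siblings.insert(i, child)
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(tuple(key), default)

    def exists(self, key):
        # A node exists if it holds a value or has descendants.
        key = tuple(key)
        return key in self._values or key in self._children

    def next_subscript(self, parent, after=None):
        """Return the first child subscript of `parent` sorted after `after`."""
        siblings = self._children.get(tuple(parent), [])
        i = 0 if after is None else bisect_right(siblings, after)
        return siblings[i] if i < len(siblings) else None


db = HierarchicalStore()
db.set(("accounts", "bob", "balance"), 50)
db.set(("accounts", "alice", "balance"), 100)

# Walk one level of the hierarchy in sorted order, one step at a time.
names = []
sub = db.next_subscript(("accounts",))
while sub is not None:
    names.append(sub)
    sub = db.next_subscript(("accounts",), sub)
```

Keeping each level's subscripts sorted is what makes a "give me the next key" primitive cheap, and a table scan in a SQL layer can be built from repeated calls to such a primitive.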
This presentation discusses the process of implementing the SQL engine for YottaDB, an engine which provides a complete SQL-92 SELECT implementation, along with numerous optimizations to give exceptional analytical performance, in addition to the performance benefits we see from the YottaDB NoSQL engine. We discuss the pipeline that a SQL query goes through, from being parsed to being rendered as machine-executable code. Time permitting, a deep dive into some of the optimizations the engine performs will provide insights not only into YottaDB, but also into the performance constraints of all existing SQL implementations.
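As a rough illustration of a parse-then-generate pipeline, the sketch below turns a tiny SELECT statement into Python source and compiles it. Everything here is an assumption made for illustration: the abstract does not say what intermediate form or target language the real engine uses, the functions `parse`, `generate`, and `compile_query` are hypothetical, the supported grammar is a small fraction of SQL-92, and the "database" is a plain dictionary of rows.

```python
import re

# Accepts only: SELECT col[, col...] FROM table [WHERE col (=|<|>) integer]
QUERY_RE = re.compile(
    r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*(?P<op>=|<|>)\s*(?P<val>\d+))?\s*$",
    re.IGNORECASE,
)


def parse(sql):
    """Stage 1: turn SQL text into a small plan dictionary."""
    m = QUERY_RE.match(sql.strip())
    if m is None:
        raise ValueError(f"unsupported query: {sql!r}")
    where = None
    if m.group("col"):
        where = (m.group("col"), m.group("op"), int(m.group("val")))
    cols = [c.strip() for c in m.group("cols").split(",")]
    return {"columns": cols, "table": m.group("table"), "where": where}


def generate(plan):
    """Stage 2: render the plan as Python source that scans the table."""
    lines = [
        "def run(db):",
        f"    for key, row in sorted(db[{plan['table']!r}].items()):",
    ]
    indent = " " * 8
    if plan["where"]:
        col, op, val = plan["where"]
        py_op = "==" if op == "=" else op
        lines.append(f"{indent}if not (row[{col!r}] {py_op} {val}):")
        lines.append(f"{indent}    continue")
    fields = ", ".join(f"row[{c!r}]" for c in plan["columns"])
    lines.append(f"{indent}yield ({fields},)")
    return "\n".join(lines)


def compile_query(sql):
    """Stage 3: compile the generated source into a callable."""
    namespace = {}
    exec(generate(parse(sql)), namespace)
    return namespace["run"]


db = {
    "accounts": {
        "alice": {"name": "alice", "balance": 100},
        "bob": {"name": "bob", "balance": 50},
    }
}
run = compile_query("SELECT name, balance FROM accounts WHERE balance > 50")
rows = list(run(db))
```

The filter is emitted inside the scan loop rather than applied to a fully materialized result; a real optimizer makes many decisions of this kind, such as join order and key selection, before generating code.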
- 45 min
- LinuxFest Northwest 2019