What is Sclera?
Sclera is a stand-alone SQL processor with native support for machine learning, data virtualization and streaming data. Sclera can be deployed as an independent application, or embedded within your applications to enable advanced real-time analytics capabilities.
I am a BI professional. Why do I need Sclera?
As a BI professional, you are conversant with SQL, and understand the need to move to advanced analytics. But so far, exploring advanced analytics has meant spending valuable time learning myriad APIs, and getting down to coding in Java, R, Python, or whatever it takes.
For you, Sclera is the most efficient way to build analytics applications. You only need to learn a handful of SQL extensions to start exploiting the power of sophisticated libraries such as Weka, incorporating external data from web-services, performing complex event processing and stream analytics, and more -- all using familiar SQL. Moreover, it works on your existing database systems, with no need for any hardware setup and no need to move your data.
I am an analytics consultant. Why do I need Sclera?
As an analytics consultant, you develop analytics solutions for your clients.
You understand that a project is never fully specified. So, it is crucial for your solutions to be agile, and to incorporate incremental changes as quickly as possible.
Sclera gives you a modular architecture out of the box, providing exceptional agility to your solutions.
Specifically, Sclera separates the analytics logic from the processing and data access. The analytics logic is specified declaratively as SQL queries with Sclera's analytics extensions. These queries are just a few lines of code, which can be changed easily. The analytics libraries, database systems and external data sources form their own modules, separate from the analytics logic. Sclera compiles the analytics queries into optimized workflows that dynamically tie everything together.
Sclera thus provides a highly modular and customizable end-to-end stack for analytics, and enables you to experiment and iterate at the speed of thought.
Even after deployment of the solution, Sclera's modular architecture significantly reduces the maintenance complexity and simplifies upgrades. For instance, new technologies (database systems, analytics libraries) can be incorporated by just adding appropriate drivers, with minimal change to the application.
I am a data scientist. Why do I need Sclera?
As a data scientist, you are an expert in machine learning. You know the underlying mathematics, and your task is to develop algorithms to gain insights.
However, to access and experiment on real-life data, you need to get into implementation. You need to combine data from multiple sources. Occasionally, you find that you need to understand nontrivial APIs to figure out how to perform your computations -- and your bookshelf is full of "In Action" and "Definitive Guide" books that you need time to read. You spend more time poring over system logs and complex, undocumented open-source code than working on your algorithms. This makes experimentation hard, distracts you, and slows you down.
Sclera gives you a standardized interface to run your algorithms within SQL, and a clean API (in the Sclera Extensions SDK) that you can use to integrate your algorithms in just a few lines of code. You can now quickly experiment and iterate, and thus focus on your analytics tasks, while Sclera takes care of the legwork.
What is the productivity impact of using Sclera?
Sclera simplifies the path from idea to insights. Using Sclera removes the need for deep systems skills, months of effort, and hundreds of lines of code in building analytics applications, along with the accompanying testing and maintenance.
This simplicity enables you to quickly experiment and iterate over alternatives, and focus on your analytics tasks without worrying about complex implementation details.
What is the performance impact of using Sclera?
Sclera's SQL engine has three components: the query compiler, the embedded streaming SQL processor, and the embedded analytics evaluator.
The query compiler compiles the input query into a plan -- this happens once per query, before the evaluation, and the compilation time is negligible compared to the evaluation time. If the entire query gets pushed down to an underlying database system, Sclera's overhead is thus effectively zero.
The embedded streaming SQL processor is used to evaluate SQL (relational) operators on streaming data, or on intermediate results to avoid materialization. This evaluation proceeds in a pipeline, in a single pass, with minimal memory overheads, and in the same JVM as your application.
The embedded analytics evaluator, likewise, proceeds in a pipeline, in a single pass, and in the same JVM as your application. A handcoded Java program would have identical overheads when it uses the same library.
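Pipelined, single-pass evaluation of this kind is commonly built from pull-based (iterator) operators. Each operator requests one row at a time from its input and passes one row at a time to its consumer, so intermediate results never need to be materialized. The Java sketch below illustrates this general technique by evaluating the equivalent of SELECT SUM(price * qty) FROM orders WHERE qty > 1 over an in-memory list. The operator names, the two-column int[] row layout and the sample data are hypothetical. This is not Sclera's implementation.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

// Illustrative pull-based pipeline; operator names are hypothetical, not Sclera's API.
public class PipelineSketch {

    // Filter operator: yields only the input rows that satisfy the predicate.
    static <T> Iterator<T> filter(Iterator<T> input, Predicate<T> pred) {
        return new Iterator<T>() {
            private T pending;          // next matching row, once found
            private boolean hasPending; // whether 'pending' holds a row not yet returned

            @Override
            public boolean hasNext() {
                // Pull rows from the input until one matches or the input is exhausted.
                while (!hasPending && input.hasNext()) {
                    T candidate = input.next();
                    if (pred.test(candidate)) {
                        pending = candidate;
                        hasPending = true;
                    }
                }
                return hasPending;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                hasPending = false;
                T row = pending;
                pending = null;
                return row;
            }
        };
    }

    // Project operator: transforms each row as it flows through.
    static <T, R> Iterator<R> project(Iterator<T> input, Function<T, R> f) {
        return new Iterator<R>() {
            @Override public boolean hasNext() { return input.hasNext(); }
            @Override public R next() { return f.apply(input.next()); }
        };
    }

    // SUM aggregate: consumes its input in a single pass, keeping only a running total.
    static long sum(Iterator<Integer> input) {
        long total = 0;
        while (input.hasNext()) total += input.next();
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical rows: {price, qty}.
        List<int[]> orders = List.of(new int[]{10, 1}, new int[]{5, 3}, new int[]{2, 4});
        Iterator<int[]> scan = orders.iterator();
        Iterator<int[]> filtered = filter(scan, row -> row[1] > 1);
        Iterator<Integer> amounts = project(filtered, row -> row[0] * row[1]);
        System.out.println(sum(amounts)); // prints 23 (5*3 + 2*4)
    }
}
```

Memory use stays constant regardless of input size. The filter holds at most one pending row, and the sum keeps a single running total.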
Sclera also includes a query optimizer that optimizes the workflow before each run. These optimizations can potentially speed up the evaluation in ways that a handcoded Java application cannot.
We are working on performance benchmarks, and will share the results soon.
How fast is Sclera?
Sclera aggressively pushes down query computations to the underlying database systems, and uses external analytics libraries for analytics evaluation where needed. The performance of Sclera is thus determined by the performance of the underlying database systems and analytics libraries.
This means that you can start with your existing infrastructure, identify the bottlenecks, and add more resources intelligently as and when needed -- all without modifying your applications.
How is Sclera different from a database system?
To the user, Sclera is just like a relational database system -- with SQL as the interface language, and JDBC as the access mechanism from application programs. However:
- Sclera does not store data. It works on data from the connected database systems and/or external data sources (on-disk files, web-services, etc.) specified in your query. Sclera queries can work on data across multiple database systems and external data sources.
- Sclera natively supports analytics. Analytics operations (such as classification) are provided as SQL language extensions, and analytics objects (such as classifiers) are first-class objects, on par with SQL tables.
- Sclera includes an embedded SQL processor, but also pushes SQL computation down to the underlying database systems wherever possible. Sclera's optimizer understands the capabilities of the underlying database systems and the data stored therein, and intelligently decides where to locate the computations.
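The placement decision in the last point can be illustrated with a deliberately simplified model. In the Java sketch below, a query plan is a linear pipeline of operators, and a database is described only by the set of operators it can evaluate. The planner pushes down the longest prefix of the pipeline that the database supports and evaluates the rest locally. The Op, Placement and place names are hypothetical. The text says Sclera's optimizer also considers the data stored in each system, and this sketch ignores that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of operator push-down. All names here (Op, Placement, place) are
// hypothetical and do not correspond to Sclera's internal optimizer.
public class PushDownSketch {

    // Operators in a linear plan, applied in order to rows scanned from a base table.
    enum Op { FILTER, JOIN, AGGREGATE, CLASSIFY }

    // Where each operator of the plan will be evaluated.
    static final class Placement {
        final List<Op> pushed; // evaluated inside the source database
        final List<Op> local;  // evaluated by the embedded processor

        Placement(List<Op> pushed, List<Op> local) {
            this.pushed = pushed;
            this.local = local;
        }
    }

    // Push down the longest prefix of the plan that the source can evaluate.
    // Once an unsupported operator is reached, it and everything after it run
    // locally, because they consume that operator's output.
    static Placement place(List<Op> plan, Set<Op> sourceCapabilities) {
        int i = 0;
        while (i < plan.size() && sourceCapabilities.contains(plan.get(i))) {
            i++;
        }
        return new Placement(new ArrayList<>(plan.subList(0, i)),
                             new ArrayList<>(plan.subList(i, plan.size())));
    }

    public static void main(String[] args) {
        // A relational database that can filter, join and aggregate, but not classify.
        Set<Op> database = Set.of(Op.FILTER, Op.JOIN, Op.AGGREGATE);
        List<Op> plan = List.of(Op.FILTER, Op.JOIN, Op.CLASSIFY, Op.AGGREGATE);

        Placement p = place(plan, database);
        System.out.println("in database: " + p.pushed); // [FILTER, JOIN]
        System.out.println("local:       " + p.local);  // [CLASSIFY, AGGREGATE]
    }
}
```

Because the plan is linear, AGGREGATE stays local even though the database supports it. It consumes the output of CLASSIFY, which only the local evaluator can produce. A cost-based optimizer could instead weigh the cost of moving that intermediate result back to the database. This sketch omits that choice for brevity.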
Does Sclera replace my database system?
No, Sclera complements your database systems. Sclera works with your database systems, and extends their capability to perform advanced analytics.
How is Sclera different from Apache Drill?
Apache Drill is a data virtualization solution, enabling standard SQL on Hadoop distributions, NoSQL datastores, cloud storage and local files in a variety of data formats. For data virtualization, it is thus more versatile than Sclera.
Unlike Sclera, however, Apache Drill does not provide the ability to plug in your own data processing extensions. Since it supports standard SQL, it also does not provide stream pattern matching, machine learning, text analytics, data cleaning, visualization, and the other capabilities that are baked into Sclera.
In the near term, we plan to provide a plugin based on Apache Drill that brings its extensive virtualization capabilities to Sclera.
How does Sclera compare with R?
R is a powerful programming language. The main advantage R brings over other languages is its extensive set of modules, developed over the years by statisticians, and its excellent charting capabilities.
At the same time, since R is a lower-level programming language than SQL, working with R requires programming skills, especially when you need to move beyond the provided modules. You also need to be conversant with data frames, factors and such -- which are unique to the statistician's world view. Also, R's execution engine was not designed to handle large volumes of data.
Further, it is hard to efficiently integrate R with the rest of your ecosystem. Working with R thus needs an independent setup -- this adds to the number of processing and data silos.
Sclera does not claim to provide the functionality in the hundreds of R modules -- but its own set of analytics extensions should provide most of the capabilities you need, and are well-integrated with your existing ecosystem.
How does Sclera Visualization compare with ggplot2 and D3?
ggplot2 is a graphics library for R.
ggplot2 is inspired by the Grammar of Graphics, which makes it very expressive and powerful. However, ggplot2 can only be used to generate static graphs -- this means no interactivity and no support for streaming data. Also, since it is a part of the R ecosystem, you need to be a proficient R programmer, and will need to jump through some hoops to get it working with the rest of your ecosystem.
ScleraViz brings the expressiveness of ggplot2 and the power of D3 to SQL users. Unlike ggplot2, ScleraViz can clean, analyze and plot streaming data. Also, unlike visualization implemented in D3, Sclera pushes expensive computations to the backend database servers, keeping the rendering lean and efficient.
Can Sclera work with my database in the cloud?
Yes. Sclera works with any database system that can be accessed with an API. You just need a connector to interface with the system.
Google Cloud SQL is compatible with MySQL, and Amazon RDS provides MySQL, PostgreSQL and Oracle instances in the cloud. These database systems can be accessed using the relevant included database connector (sclera-mysql, sclera-postgresql or sclera-oracle), simply by putting the appropriate JDBC URL in the ADD LOCATION statements.
Can Sclera work with web-services?
The Sclera Extensions SDK enables ingestion of data from any data source into Sclera. A specific connector for the Google Finance web-service is included as an illustrative example. Its source code is available on GitHub.
Can Sclera work with my legacy data store?
The Sclera Extensions SDK for external data access or the Sclera Extensions SDK for database systems can be used to build custom connectors to your legacy data store. The former is used when you just want to source data from the data store, and the latter when you want to push computation (such as filters and joins) down to the data store.
Can Sclera work with my reporting software?
Sclera understands a large subset of PostgreSQL's dialect of SQL. Therefore, Sclera should work with any reporting tool that works with PostgreSQL.
In such tools, Sclera's JDBC driver, downloaded as a part of the installation, can be used as a drop-in replacement for the PostgreSQL JDBC driver. Details on the usage of the driver appear in the Sclera JDBC reference document.
Alternatively, Sclera can work in a server mode that implements the PostgreSQL backend protocol 3.0, which is compatible with PostgreSQL 7.4+.
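To show what speaking "PostgreSQL backend protocol 3.0" involves on the wire: a client opens the conversation with a StartupMessage. According to the PostgreSQL frontend/backend protocol documentation, this message consists of an Int32 length that counts itself, the Int32 protocol version 196608 (major version 3 in the high 16 bits, minor version 0 in the low 16 bits), a sequence of null-terminated parameter name/value strings such as user and database, and a final zero byte. The Java sketch below encodes such a message. It illustrates the protocol only and is not taken from Sclera. The user and database names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Encodes a PostgreSQL protocol 3.0 StartupMessage (illustrative sketch, not Sclera code).
public class StartupMessageSketch {

    // Protocol version 3.0: major version 3 in the high 16 bits, minor version 0 in the low 16 bits.
    static final int PROTOCOL_3_0 = 3 << 16; // 196608

    static byte[] startupMessage(Map<String, String> params) {
        // Body: name/value pairs as null-terminated strings, then a terminating zero byte.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<String, String> e : params.entrySet()) {
            writeCString(body, e.getKey());
            writeCString(body, e.getValue());
        }
        body.write(0);

        // The length field includes its own 4 bytes; StartupMessage has no type byte.
        int length = 4 + 4 + body.size();
        ByteBuffer buf = ByteBuffer.allocate(length); // ByteBuffer is big-endian by default
        buf.putInt(length);
        buf.putInt(PROTOCOL_3_0);
        buf.put(body.toByteArray());
        return buf.array();
    }

    // Writes a string as UTF-8 bytes followed by a zero byte.
    private static void writeCString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        out.write(0);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("user", "analyst");   // hypothetical user name
        params.put("database", "sales"); // hypothetical database name
        byte[] msg = startupMessage(params);
        System.out.println(msg.length + " bytes"); // prints "37 bytes"
    }
}
```

PostgreSQL client libraries, such as the PostgreSQL JDBC driver, send this message automatically when they connect. A server that implements the same protocol can therefore accept connections from those clients.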
How do I use Sclera's analytics operators in my reporting software?
Connected to Sclera, your reporting software can be used to query data across multiple underlying data sources. However, since these tools do not understand Sclera's extensions, they cannot generate queries with embedded analytics.
A simple workaround is to create views in Sclera. A view definition can contain arbitrary Sclera extensions, but to the external tool, the view is equivalent to a relational table. Any query on such a view generated by the tool will evaluate the analytics operators included in the view definition.
Can Sclera work with an analytics library of my choice?
Yes, but the support is currently limited to classification, clustering and association rule mining. You will need to map the classification, clustering and/or association rule mining API in the Sclera Extensions SDK to your library's API.
How do I ingest data streams into Sclera?
Sclera provides a very simple API through the Sclera Extensions SDK. A connector built using this API can be used to ingest data streams -- these data streams can then be used in the FROM clause of SQL queries.
What do I need to know before using Sclera?
If you know SQL, you can start firing cross-system queries right away. Then, incorporate analytics using the analytics extensions we have baked into SQL.
Sclera enables the use of advanced analytics constructs such as classification and clustering within SQL, but assumes you know what they are useful for. For background, we recommend reading a good book on data science.