Sclera 3.0

An online preview of Sclera 3.0 is available at http://www.scleradb.com/shell. Examples showcasing data access, cleaning, transformation and visualization appear at http://www.scleradb.com/scleraviz.

Sclera Visual Shell

The Sclera Visual Shell is an alternative to the regular command-line shell. It can display query results as tables as well as plots.

The visual shell can be previewed at http://www.scleradb.com/shell. For further usage details, please see the documentation.

Sclera Visualization

Programmable Visualization - ScleraViz

Sclera’s visualization component, ScleraViz, enables quick and easy visualization of your query results. ScleraViz is integrated with ScleraSQL; this means a few lines of ScleraSQL can fetch, clean, analyze and visualize your data in a single sweep.

ScleraViz is inspired by the Grammar of Graphics, specifically R’s ggplot2, but is implemented as an extension to ScleraSQL and uses D3 as the rendering engine.

ScleraViz brings the expressiveness of ggplot2 and the power of D3 to SQL users. Unlike ggplot2, ScleraViz can clean, analyze and plot streaming data. Also, unlike visualizations implemented directly in D3, Sclera pushes expensive computations to the backend database servers, keeping the rendering lean and efficient.

An online preview of ScleraViz is available at http://www.scleradb.com/scleraviz. For further details, see the extensive documentation.

Innovative Data Cleaning and Analytics Operators

Sclera 3.0 introduces a number of innovative operators for data preparation/cleaning and analytics.

Type Inference Operator

Data sourced from web services typically does not include the data types: everything from numbers to dates is available as strings. These string values need to be cast to appropriate data types before they can be used in further computations.

When the data types are known, the type-casting can be hardcoded; but this is dangerous when the data format from an external service changes without notice.

When the data types are not known, as in the case of ad-hoc data access, the data types need to be inferred manually. This is not only error-prone, it also does not scale to data sets with a large number of columns.

Sclera’s TYPEINFER operator automates the type inference and casting. It intelligently infers the type of the specified columns from the data (you can optionally provide hints on how many rows are enough), and casts the string values accordingly.

For further details, see the documentation. You can also see it in action here.
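To make the idea concrete, here is a minimal Python sketch of sample-based type inference and casting. It is not Sclera's implementation and does not use TYPEINFER syntax; the function names `infer_type` and `typeinfer`, the `sample_size` parameter, and the candidate type order (int, float, ISO date, string) are all assumptions made for illustration.

```python
from datetime import date

def infer_type(values):
    """Return (name, cast) for the first candidate type that parses every value."""
    def parses(cast):
        for v in values:
            try:
                cast(v)
            except ValueError:
                return False
        return True

    # Candidate order is an assumption: most specific first, plain string last.
    for name, cast in (("int", int), ("float", float), ("date", date.fromisoformat)):
        if parses(cast):
            return name, cast
    return "str", str

def typeinfer(rows, columns, sample_size=100):
    """Infer each listed column's type from the first sample_size rows, then cast all rows.

    A value beyond the sample that does not fit the inferred type raises ValueError.
    """
    sample = rows[:sample_size]
    casts = {c: infer_type([r[c] for r in sample])[1] for c in columns}
    return [{k: casts[k](v) if k in casts else v for k, v in r.items()} for r in rows]

rows = [
    {"id": "1", "price": "9.5", "day": "2014-06-01", "name": "a"},
    {"id": "2", "price": "10", "day": "2014-06-02", "name": "b"},
]
typed = typeinfer(rows, ["id", "price", "day"])
```

The `sample_size` argument plays the role of the hint on how many rows are enough: a smaller sample is cheaper to inspect, but it raises the risk that a later row does not fit the inferred type.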

Text Parsing Operator

The PARSE operator allows you to extract information from arbitrary strings (CHAR/VARCHAR) in an input column. This is done by specifying a regular expression pattern that is compiled and matched against the input strings to extract substrings.

This is very useful when parsing arbitrary date strings on the fly – see it in action here.

For further details, see the documentation.
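The extraction step can be sketched in Python as follows. This only illustrates regex-based parsing of a column and says nothing about PARSE's actual syntax; the helper `parse_column`, the use of named groups as output column names, and the sample pattern are assumptions.

```python
import re

def parse_column(rows, column, pattern):
    """Match pattern against rows[i][column]; add one output column per named group."""
    regex = re.compile(pattern)  # compiled once, reused for every row
    out = []
    for row in rows:
        m = regex.search(row[column])
        # Rows that do not match get None for every extracted column (an assumption).
        extracted = m.groupdict() if m else {name: None for name in regex.groupindex}
        out.append({**row, **extracted})
    return out

rows = [{"raw": "Shipped on 03/Jun/2014"}, {"raw": "n/a"}]
pattern = r"(?P<day>\d{2})/(?P<mon>[A-Za-z]{3})/(?P<year>\d{4})"
parsed = parse_column(rows, "raw", pattern)
```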

Data Imputation Operator

Sclera supports a new clause, IMPUTED WITH, that enables automatic data cleaning using classifiers. The idea is to train a classifier on clean data, and then apply this trained classifier to the incoming data to fill in missing values on the fly.

For further details, see the documentation.
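The train-then-fill idea can be sketched in Python with a deliberately simple classifier. The 1-nearest-neighbor classifier here is a stand-in chosen to keep the example self-contained, not the classifier Sclera uses, and the names `train_nearest_neighbor` and `impute` are hypothetical.

```python
def train_nearest_neighbor(clean_rows, features, target):
    """Train on clean rows; return a function that predicts target for a new row."""
    examples = [([r[f] for f in features], r[target]) for r in clean_rows]

    def predict(row):
        point = [row[f] for f in features]
        # Squared Euclidean distance from the row to a training example.
        def dist(example):
            return sum((a - b) ** 2 for a, b in zip(example[0], point))
        return min(examples, key=dist)[1]

    return predict

def impute(rows, target, predict):
    """Fill in target only where it is missing (None); other rows pass through unchanged."""
    return [{**r, target: predict(r)} if r.get(target) is None else r for r in rows]

clean = [{"h": 150, "w": 50, "size": "S"}, {"h": 180, "w": 85, "size": "L"}]
incoming = [{"h": 178, "w": 80, "size": None}, {"h": 152, "w": 52, "size": "S"}]
predict = train_nearest_neighbor(clean, ["h", "w"], "size")
filled = impute(incoming, "size", predict)
```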

Sequence Alignment Operator

The ALIGN operator aligns two sequences using dynamic time warping. Think of this as a JOIN where the rows of one input match the rows of the other input. The matching is not based on a column value, however; instead, the rows are matched so as to minimize the total “distance” between the matched rows.

For further details, see the documentation.
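For readers unfamiliar with dynamic time warping, the following Python sketch shows the textbook algorithm on two numeric sequences. It illustrates the technique only; the function name `dtw_align` and the default absolute-difference distance are assumptions, and ALIGN's own syntax and options are covered in the documentation.

```python
def dtw_align(xs, ys, dist=lambda a, b: abs(a - b)):
    """Return (total_distance, pairs), where pairs lists matched (i, j) index pairs."""
    n, m = len(xs), len(ys)
    inf = float("inf")
    # cost[i][j] = minimal total distance aligning xs[:i] with ys[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(xs[i - 1], ys[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    # Walk back from the end, always moving to the cheapest predecessor cell.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        pairs.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    pairs.reverse()
    return cost[n][m], pairs

total, pairs = dtw_align([1, 2, 3], [1, 2, 2, 3])
```

In this example the second sequence repeats the value 2, so the single 2 in the first sequence is matched to both copies. This is the sense in which the result resembles a JOIN whose matching minimizes total distance rather than testing column equality.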

Inbuilt Connection Pooling

Sclera 3.0 natively enables connection pooling on connections to the underlying relational data sources. The connection pools use the high-performance HikariCP library.

Connector to Heroku PostgreSQL

The Heroku database connector helps you access the data stored in a Heroku Postgres database.

Details on how to link your Heroku Postgres source with Sclera can be found in the Sclera Database System Connection Reference document.

Previous release: Sclera 2.2

JDBC/ODBC Server

Sclera 2.2 can work in a server mode. You can now start a server in the background, and through this server, Sclera can interface with the latest PostgreSQL ODBC and JDBC drivers, PostgreSQL’s shell (psql), and anything else that uses PostgreSQL’s native protocol (libpq).

This means that you can now connect to Sclera using Microsoft Excel, Tableau, QlikView, or any other tool of your choice. Configure the tools as you would to connect to PostgreSQL, but point the connection to the Sclera server instead.

Please see the documentation for additional details.

User and Password Management

Starting with Sclera 2.2, the shell can be used to create and remove users and manage passwords.

Please see the documentation for additional details.

Previous release: Sclera 2.1

Analytics on Ordered Data Streams

Sclera now provides highly scalable operators that enable complex pattern matching and analytics across rows – all in real-time, with zero I/O and minimal memory requirements.

Embedded Streaming Data SQL Processor

Sclera 2.1 includes an embedded SQL processing engine. This engine can evaluate most standard SQL operators in a single pass. New operators include a single-pass operator to select rows with the optimal (maximum or minimum) value of an expression, and a single-pass PIVOT.
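As an illustration of what single-pass evaluation means for the optimal-value operator, here is a small Python sketch that selects the rows with the maximum value of an expression while reading the input exactly once. It is not Sclera's engine or syntax; `rows_with_max` and its `key` parameter are hypothetical names.

```python
def rows_with_max(rows, key):
    """Return all rows attaining the maximum of key(row), reading rows only once."""
    best, best_rows = None, []
    for row in rows:
        value = key(row)
        if best is None or value > best:
            # Strictly better value: discard the earlier candidates.
            best, best_rows = value, [row]
        elif value == best:
            # Tie: keep every row that attains the current maximum.
            best_rows.append(row)
    return best_rows

# Works on any iterable, including a stream that cannot be rewound.
stream = iter([
    {"sym": "A", "price": 10},
    {"sym": "B", "price": 12},
    {"sym": "C", "price": 12},
    {"sym": "D", "price": 7},
])
top = rows_with_max(stream, key=lambda r: r["price"])
```

Because only the current best rows are retained, memory use depends on the number of ties rather than on the size of the input. Negating the key gives the minimum for numeric expressions.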

Enhanced Sclera Extensions SDK

The Sclera Extensions SDK now includes support for external data sources (including streaming data), database systems, and machine learning and text analytics libraries. A number of nontrivial sample extensions have been added to the Sclera Extensions repository on GitHub.

Friendlier SQL

SQL has been criticized for its tedious syntax, and Sclera is working to make SQL modern and easier to use. Sclera 2.1 makes Sclera’s SQL even friendlier by making the boilerplate “SELECT *” optional in all cases, and by baking in novel constructs for common tasks such as selecting rows with the optimal (maximum or minimum) value of an expression.

Support for Weka 3.7.11

Sclera now supports Weka 3.7.11, which includes the HoeffdingTree classifier. This classifier requires a single pass to train a decision tree, and can be applied to large volumes of data, including when the training data is arriving in a stream.

Scala Library Upgrade to Scala 2.11.1

Sclera now works on Scala 2.11.1, and takes advantage of many of the associated performance improvements.