Pragmatic Works Nerd News

Overview of HDInsight HBase

Written by Chris Seferlis | Jun 27, 2018

In continuation of my series on HDInsight and the different clusters within it, today I'll cover HBase. HBase is a NoSQL database that provides random access and strong consistency for structured, unstructured and semi-structured data.

It’s a schema-less database, organized by families of columns. It is modeled after Google’s Bigtable: data is stored in the rows of a table and then grouped by column family. As it’s schema-less, neither the columns themselves nor the data types inside the columns need to be defined before using the data.
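To make the row and column-family idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the HBase client API; the class name `ToyTable`, the families `info` and `metrics`, and the sample rows are all made up for illustration. It assumes, as HBase does, that only the column families are fixed when the table is created, while individual columns can appear per row and values are stored as raw bytes.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory stand-in for a column-family table (not the HBase API)."""

    def __init__(self, column_families):
        # Only the column families are fixed when the table is created.
        self.column_families = set(column_families)
        # row key -> {"family:qualifier": raw bytes}
        self.rows = defaultdict(dict)

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # No data type is declared per column; the value is just bytes.
        self.rows[row_key][f"{family}:{qualifier}"] = value

    def get(self, row_key):
        # Return a copy of the row, or an empty dict if the row is absent.
        return dict(self.rows.get(row_key, {}))


table = ToyTable(column_families=["info", "metrics"])
table.put("user1", "info", "name", b"Ada")
table.put("user1", "metrics", "logins", b"42")
# user2 gets a column that user1 never had; nothing was declared up front.
table.put("user2", "info", "email", b"bob@example.com")
```

Each row ends up with its own set of columns, which is the sense in which the table is schema-less.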

Some other key things to be aware of with HBase:

  • As with all the HDInsight components, this is implemented as a managed cluster and a Platform as a Service offering in which we can separate compute nodes from storage.
  • It has a scale-out architecture that provides automatic sharding, or horizontal partitioning, of tables. Essentially, the rows of a table are split up and held on separate nodes, rather than the columns being split into separate tables as we would in typical table normalization.
  • Strong consistency for reads and writes, as it’s part of the architecture of HBase.
  • Automatic failover is built in, so the cluster has multiple nodes that it can fail over to.
  • In-memory caching for reads and writes, which helps with performance, as well as with moving your data in and out more quickly.
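The sharding point above can be sketched in a few lines of Python. This is only an illustration of range-based horizontal partitioning: the split points and server names are hypothetical, and in a real cluster the splits and assignments are managed for you rather than hard-coded.

```python
import bisect

# Hypothetical split points: each shard owns a contiguous range of row keys.
# Ranges: [start, "g"), ["g", "n"), ["n", "t"), ["t", end)
SPLIT_KEYS = ["g", "n", "t"]
# Hypothetical node names, one per key range.
SERVERS = ["server-0", "server-1", "server-2", "server-3"]


def shard_for(row_key):
    """Return the index of the key range that contains row_key."""
    return bisect.bisect_right(SPLIT_KEYS, row_key)


def server_for(row_key):
    """Return the (made-up) node that holds the whole row for row_key."""
    return SERVERS[shard_for(row_key)]
```

For example, `server_for("apple")` lands on `server-0` and `server_for("zebra")` on `server-3`. Whole rows stay together on one node; only the set of rows is divided.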

Some of the most common workloads:

    • A search engine, as I mentioned with Google’s Bigtable, which builds indexes that map terms to the webpages that contain them.
    • A key value store. Facebook uses HBase for their messaging system because it’s ideal for storing and managing internet communications.
    • A repository for collecting sensor data, where large amounts of data are pulled into a NoSQL table that can then be used to build dashboards for reporting.
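As a rough illustration of the search engine workload, the sketch below builds a tiny index that maps terms to the pages containing them. It is plain Python, not HBase code. The layout (term as row key, one `pages:<url>` column per page) is my own assumption about how such an index could be arranged in a column-family table, and the URLs are placeholders.

```python
from collections import defaultdict

# term (row key) -> {"pages:<url>": number of occurrences on that page}
index = defaultdict(dict)


def index_page(url, text):
    """Add every whitespace-separated term of a page to the index."""
    for term in text.lower().split():
        column = f"pages:{url}"
        index[term][column] = index[term].get(column, 0) + 1


def lookup(term):
    """Return the sorted URLs of the pages that contain the term."""
    row = index.get(term.lower(), {})
    # Strip the "pages:" family prefix; URLs themselves contain ":".
    return sorted(column.split(":", 1)[1] for column in row)


index_page("https://a.example", "HBase stores rows")
index_page("https://b.example", "rows and columns and rows")
```

Here `lookup("rows")` returns both placeholder URLs, while `lookup("hbase")` returns only the first. Because new pages simply become new columns on the term's row, nothing about the table has to be redefined as the index grows.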

I still have a few HDInsight technologies to cover in this series. Many of these are interrelated and work together to complete and update a data architecture. At Pragmatic Works, we are doing a lot of big data work in many different scenarios with customers, and this is another option we can offer them.

If you have questions about any of the HDInsight clusters I’ve talked about, HBase, Hadoop, Spark or about anything Azure related, we’re here to help. Click the link below or contact us—we are your best resource.