Skip to main content

Row Count : HBase Aggregation example

With the coprocessors HBase 0.92 introduces a new way to process data directly on a region server. As a user this is definitively a very exciting feature : now you can easily define your own distributed data services.

This post is not intended to help you how to define them (i highly recommend you to watch this presentation if you want to do so) but to quickly presents the new aggregation service shipped with HBase 0.92 that is built upon the endpoint coprocessor framework.

1. Enable AggregationClient coprocessor

You have two choices :

You can enable aggregation coprocessor on all your tables by adding the following lines to hbase-site.xml :
 <property>
   <name>hbase.coprocessor.user.region.classes</name>
   <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
 </property>
or ...you can enable coprocessor only on a table throught the HBase shell :

1. disable the table
hbase> disable 'mytable'

2. add the coprocessor
hbase> alter 'mytable', METHOD => 'table_att','coprocessor'=>'|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||'

3. re-enable the table

hbase> enable 'mytable'


2. Increase RPC timeout (opt)
In some cases the following exception could be raised :
java.net.SocketTimeoutException: Call to node2/55.37.68.154:60020 failed on socket timeout exception
On large table or if you are expecting slow I/O this might be necessary:
 <property>
   <name>hbase.rpc.timeout</name>
   <value>300000</value>
 </property>
n.b. : you can also specify this property directly in the client source code, see next section
3. ''RowCount' Snippet
The following snippet uses the "row count" feature of the aggregation service.

public class MyAggregationClient {

    private static final byte[] TABLE_NAME = Bytes.toBytes("mytable");
    private static final byte[] CF = Bytes.toBytes("d");

    public static void main(String[] args) throws Throwable {

        Configuration customConf = new Configuration();
        customConf.setStrings("hbase.zookeeper.quorum",
                "node0,node1,node2");

        // Increase RPC timeout, in case of a slow computation
        customConf.setLong("hbase.rpc.timeout", 600000);
        // Default is 1, set to a higher value for faster scanner.next(..)
        customConf.setLong("hbase.client.scanner.caching", 1000);
        Configuration configuration = HBaseConfiguration.create(customConf);
        AggregationClient aggregationClient = new AggregationClient(
                configuration);
        Scan scan = new Scan();
        scan.addFamily(CF);
        long rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
        System.out.println("row count is " + rowCount);

    }
}

 
4. What else ?

The full list of functions provided by the aggregation client can be found here : http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.html

See also https://issues.apache.org/jira/browse/HBASE-5123 for discussion about forthcoming functions.

Note that this version is designed to work on data whose type is Long. If you plan to use other data types you should provide your own column interpreter. See https://issues.apache.org/jira/browse/HBASE-5122

Credits : thanks to Gary Helmling for his contribution on the HBase mailing list

Comments

Popular posts from this blog

Orientée colonnes ?

Les bases NoSQL sont arrivées avec leur cortège de nouveautés et pour certaines d'entre elles une notion héritée de BigTable : celle de base de donnée orientée colonne. Cependant faire le lien entre l'article de Wikipedia et comprendre ce que permet réellement un base de donnée comme HBase n'est pas une chose évidente. En effet le simple fait de définir cette notion ne suffit pas toujours a bien comprendre quels sont les principes de conception du monde SQL qui peuvent être oubliés et ceux qui doivent être appris. Colonne or not colonne ? Prenons un modèle très simple de donnée et essayons de le transposer dans un modèle "orienté colonne": Comme on peut le voir on est passé d'un modèle à 2 dimensions (ligne x colonne) vers un modèle où une valeur est accédée au travers de 2  coordonnées qui sont ici (ligne, colonne) Cette notion de coordonnées est  importante  (c'est pour ça que je la met en gras 2 fois de suite) si l'on veut c

Zookeeper, Netflix Curator and ACLs

If you have one or more Zookeeper "multi-tenant" clusters you may want to protect znodes against unwanted modifications. Here is a very simple and short introduction to the ACL and custom authentication features. This post is not intended to give you best practices about security and Zookeeper, the only goal is to give you a complete example of a custom authentication handler. Complete source code with JUnit test is available here : https://github.com/barkbay/zookeeper-acl-sample/ Use case Let say that your Zookeeper cluster is used by several users. In order to restrict user actions you have decided that each user must prefix all paths with the first letter of his name. User foo is only allowed to create, read, delete and update znodes under the /f znode. User bar is only allowed to create, read, delete and update znodes under the /b znode. Get client authentication data on the server side Zookeeper client authentication can be easily customized , a