What I was supposed to build was a Hadoop-based platform for preparing, processing and dynamically warehousing terabyte-sized text files, and for giving them a relational shape at the end.
I knew Hadoop 2.2.0 rocks, so I decided to use it for its big improvements: the flexible YARN computation model and its advanced recovery techniques.
After setting up Hadoop, I deployed HBase as the columnar database of the big data world, planning to run Hive on top of it. I built that setup, but soon found that Hive runs much faster on its own tables directly over HDFS than on HBase-backed tables, so I dropped HBase for this use case.
The UI developer was keen to talk to the service over JSON/REST, and I like REST too. So I developed a small REST service that wraps a HiveServer2 client, which in turn talks to HiveServer2 running on one of the cluster nodes. It works like a charm. I think Spring Boot makes it easier than ever to build services in a microservice architecture.
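A minimal sketch of the HiveServer2 side of such a service, using plain JDBC through the `jdbc:hive2://` URL scheme of the hive-jdbc driver. The original service is not shown in this post, so the class name, host and database here are hypothetical. In a Spring Boot controller, Jackson would normally turn the rows into JSON; the hand-written `toJson` only keeps the sketch dependency-free. Older hive-jdbc releases may also need `Class.forName("org.apache.hive.jdbc.HiveDriver")` before connecting.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical service behind the REST endpoint: runs HiveQL on HiveServer2 and renders rows as JSON. */
class HiveQueryService {
    private final String jdbcUrl;

    // Host and database are assumptions; 10000 is HiveServer2's default port.
    HiveQueryService(String host, int port, String database) {
        this.jdbcUrl = "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    String jdbcUrl() {
        return jdbcUrl;
    }

    /** Runs one query and returns each row as an ordered column-to-value map (needs hive-jdbc at runtime). */
    List<Map<String, Object>> query(String hiveQl, String user) throws SQLException {
        List<Map<String, Object>> rows = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(hiveQl)) {
            ResultSetMetaData meta = rs.getMetaData();
            int columns = meta.getColumnCount();
            while (rs.next()) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (int i = 1; i <= columns; i++) {
                    row.put(meta.getColumnLabel(i), rs.getObject(i));
                }
                rows.add(row);
            }
        }
        return rows;
    }

    /** Hand-rolled JSON so the sketch needs only the JDK; Spring Boot would normally use Jackson. */
    static String toJson(List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder("[");
        for (int r = 0; r < rows.size(); r++) {
            if (r > 0) sb.append(',');
            sb.append('{');
            int c = 0;
            for (Map.Entry<String, Object> e : rows.get(r).entrySet()) {
                if (c++ > 0) sb.append(',');
                sb.append(quote(e.getKey())).append(':').append(value(e.getValue()));
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }

    // Numbers and booleans are emitted bare (NaN/Infinity not handled), null as JSON null, the rest as strings.
    private static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    // Escapes quotes, backslashes and control characters as JSON requires.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (ch < 0x20) sb.append(String.format("\\u%04x", (int) ch));
                    else sb.append(ch);
            }
        }
        return sb.append('"').toString();
    }
}
```

A controller method would then call something like `HiveQueryService.toJson(service.query("SELECT * FROM sales LIMIT 10", "webapp"))`, where the table and user are hypothetical, and return the string with a JSON content type.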
This small middle-tier server is also a good place to implement business and security policies.
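For example, a policy check could reject anything that is not an allowed statement type before it ever reaches HiveServer2. The sketch below assumes two hypothetical roles (`analyst`, `etl`) and classifies a query by its first keyword only. That is a deliberately naive check (it also rejects a `;` inside a string literal), not a HiveQL parser, and would sit alongside authorization on the Hive side rather than replace it.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical policy gate that the middle tier applies before forwarding a query to HiveServer2. */
class QueryPolicy {
    // Illustrative roles and the HiveQL statement types each may run.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "analyst", Set.of("SELECT", "SHOW", "DESCRIBE"),
            "etl", Set.of("SELECT", "SHOW", "DESCRIBE", "INSERT", "LOAD"));

    /** Returns a rejection reason, or an empty Optional if the query may run. */
    static Optional<String> check(String role, String hiveQl) {
        Set<String> verbs = ALLOWED.get(role);
        if (verbs == null) {
            return Optional.of("unknown role: " + role);
        }
        String q = hiveQl.strip();
        // Naive guard against statement chaining: only one optional trailing semicolon is allowed.
        String body = q.endsWith(";") ? q.substring(0, q.length() - 1) : q;
        if (body.contains(";")) {
            return Optional.of("multiple statements are not allowed");
        }
        // The first token decides the statement type; a keyword check, not a real parser.
        String verb = body.split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        if (!verbs.contains(verb)) {
            return Optional.of("statement type '" + verb + "' is not allowed for role " + role);
        }
        return Optional.empty();
    }
}
```

The REST handler would call `QueryPolicy.check(role, query)` first and answer with an error status when a reason comes back, so only approved statements are sent on to HiveServer2.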
Apache Sqoop is also part of the platform. Fortunately, the Sqoop server exposes a handy REST API, so I simply wrapped it in my Spring app.
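Wrapping the Sqoop REST API mostly means forwarding HTTP calls. Here is a minimal pass-through sketch using the JDK's `java.net.http.HttpClient` (Java 11+), assuming a Sqoop 2 server at a hypothetical `http://sqoop-node:12000/sqoop` (12000 is the usual Sqoop 2 server port). Only the `version` resource is used as an example; job and submission paths differ between Sqoop 2 releases, so check the REST API documentation of the installed version.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical pass-through from the Spring app to the Sqoop 2 server's REST API. */
class SqoopProxy {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();
    private final String baseUrl;

    // Assumed base URL, e.g. "http://sqoop-node:12000/sqoop"; a trailing slash is tolerated.
    SqoopProxy(String baseUrl) {
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    /** Joins the base URL and a resource path such as "version". */
    URI resolve(String path) {
        String p = path.startsWith("/") ? path.substring(1) : path;
        return URI.create(baseUrl + "/" + p);
    }

    /** Forwards a GET and returns Sqoop's JSON body unchanged; throws if the status is not 2xx. */
    String get(String path) throws IOException, InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(resolve(path))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() / 100 != 2) {
            throw new IOException("Sqoop server returned HTTP " + resp.statusCode() + " for " + req.uri());
        }
        return resp.body();
    }
}
```

A Spring controller could then return `new SqoopProxy("http://sqoop-node:12000/sqoop").get("version")` directly, which keeps the UI talking to a single REST endpoint.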
I’ve also written a number of Linux bash scripts for preparing data, and the Spring app exposes them through its REST API as well.
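One way a Spring app can expose such scripts safely is a whitelist: the REST caller picks a script by name and the app runs a fixed command with `ProcessBuilder`, never a shell string built from user input. This is a sketch under that assumption; the script names and paths are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper that lets the REST layer trigger whitelisted data-preparation scripts. */
class ScriptRunner {
    /** Exit code and combined stdout/stderr of one run. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    // Public job name -> fixed command prefix; callers can only pick a name, never a command line.
    private final Map<String, List<String>> scripts;

    ScriptRunner(Map<String, List<String>> scripts) {
        this.scripts = Map.copyOf(scripts);
    }

    Result run(String name, List<String> args, long timeoutSeconds) throws IOException, InterruptedException {
        List<String> base = scripts.get(name);
        if (base == null) {
            throw new IllegalArgumentException("unknown script: " + name);
        }
        List<String> cmd = new ArrayList<>(base);
        cmd.addAll(args); // each argument is its own argv entry, never interpolated into a shell string
        // Output goes to a temp file so a chatty script cannot block on a full pipe while we wait.
        Path log = Files.createTempFile("script-", ".log");
        try {
            Process proc = new ProcessBuilder(cmd)
                    .redirectErrorStream(true)
                    .redirectOutput(log.toFile())
                    .start();
            if (!proc.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                proc.destroyForcibly();
                throw new IOException("script " + name + " timed out after " + timeoutSeconds + " s");
            }
            return new Result(proc.exitValue(), Files.readString(log));
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

For instance, `new ScriptRunner(Map.of("clean", List.of("/bin/bash", "/opt/etl/clean.sh")))` would map a hypothetical `clean` job to a hypothetical script, and a REST handler would call `run("clean", List.of(inputPath), 3600)` and return the exit code and log to the caller.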
What we have now is a highly scalable, automated, dynamic data warehouse platform based on Hadoop 2.2.0 that front-end tools can use as simply as an RDBMS.
Being able to run the app as a service with a single command from the terminal is pretty cool: no standalone Tomcat, no JSP container to deploy into, and no configuration to speak of. All you have to do is tell it to ‘run’.
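Spring Boot gets this effect by packaging an embedded servlet container (Tomcat by default) inside an executable jar, so `java -jar app.jar` is enough to start the service. The sketch below is not Spring Boot; it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` to show the same idea in isolation: an HTTP listener started from the application's own `main`, with nothing deployed into an external container. The `/health` endpoint is hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** JDK-only illustration of an embedded server: the HTTP listener lives inside the app process. */
class EmbeddedService {
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // Hypothetical health endpoint; a real service would register its REST handlers here.
        server.createContext("/health", exchange -> {
            byte[] body = "{\"status\":\"UP\"}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8080;
        start(port);
        System.out.println("listening on port " + port);
    }
}
```

Started from a terminal with `java EmbeddedService 8080`, it answers `GET /health` until the process is stopped, which is the same "just run it" experience, minus everything Spring Boot adds on top.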