Software is Capturing Us

It seems that things in the software development business are getting easier than ever. Sometimes minor modifications make major improvements. Today software development offers a bunch of pretty interesting languages, frameworks and tools. Humans have never had a toy set like this. We are surrounded by many mature, popular concepts. Everyone recognizes good software even when she can't explain why. People use apps many times a day. With a little ignorance we can say that software development is the most accessible and joyful engineering activity in the world. I remember the time I had to learn what an XML file is. It was something new and was going to change things. Now people get used to it without any effort. They just know by heart what markup languages are. It seems children learn software concepts alongside their mother tongue. Last year, when I was leaving my country, I brought a touch-screen notebook for my mom. I wanted to set things up for Skype talks. Now she has a number of accounts for email, Facebook, chat and Twitter! It seems software is becoming a part of the human race.

It was the open-source philosophy that opened the gates of software knowledge. I think open-sourcing is the most humane and modern kind of sharing in the world. It is not about sharing news or stories, as Wikipedia or YouTube do. It is about opening minds. Something like a cloud of developers' brains. I think that on the path of improvement, Machine Learning is a very important milestone we are passing right now. Now software applications are able to pretend that they are alive. They are becoming parts of us. Wearable technology is just a starting point. Later, maybe drinkable technology!

Posted in Big Data, Linux, Mac OS X, Machine Learning, Open Source, Software Engineering, Software Market Demands, Web

A Simple Spring Boot REST Wrapper over Hadoop and Hive

Spring Boot has made running REST services in the Java world as easy as running Linux commands. I am building a data warehouse based on Hadoop, Sqoop, Hive, HBase and a number of other big-data giants. It is supposed to be used by a data analyst who doesn't know this kind of stuff at all. So I wrapped them all in a Java application based on Spring Boot. She will be able to call the underlying methods just by calling its simple REST API or through a JavaScript front-end.

While making the app I had to resolve a number of JAR/version conflicts. The following is a well-working pom.xml:

Look at those exceptions.

The following is a simple controller which loads a table from HDFS into Hive:

Load into HDFS...
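Since the original controller code is not reproduced here, the following is only a minimal sketch of the core of such an endpoint: building the HiveQL `LOAD DATA INPATH` statement that would be sent to HiveServer2. The class name `HiveLoadStatement` and its `build` method are hypothetical and are not taken from the app; the Spring controller and JDBC wiring around it are left out on purpose.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not from the original app): builds the HiveQL for loading an HDFS file into a table. */
final class HiveLoadStatement {

    // Accepts "table" or "db.table" made of letters, digits and underscores only.
    private static final Pattern TABLE_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?");

    private HiveLoadStatement() {
    }

    static String build(String hdfsPath, String table, boolean overwrite) {
        // Reject quotes and backslashes so the path cannot break out of the string literal.
        if (hdfsPath == null || hdfsPath.isEmpty()
                || hdfsPath.indexOf('\'') >= 0 || hdfsPath.indexOf('\\') >= 0) {
            throw new IllegalArgumentException("invalid HDFS path");
        }
        if (table == null || !TABLE_NAME.matcher(table).matches()) {
            throw new IllegalArgumentException("invalid table name");
        }
        return "LOAD DATA INPATH '" + hdfsPath + "' "
                + (overwrite ? "OVERWRITE " : "")
                + "INTO TABLE " + table;
    }
}
```

A REST controller method could take the path and table name as request parameters, call `HiveLoadStatement.build(...)`, and execute the result through a JDBC `Statement` connected to HiveServer2. Validating the inputs first matters here because the values arrive from an HTTP client.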

So far I have developed the following controllers and services:

overview

It is pretty easy to wrap more methods in REST controllers whenever we need to.
To run something like this, it is enough to give the terminal a command like this:

java -jar mint-0.1.0.jar

Providing services in this kind of microservice architecture is highly scalable, manageable and maintainable. Spring Boot really makes a Java developer's life comfortable.

Posted in Big Data, Java, Linux, Mac OS X, Open Source, Software Engineering, Web

Wrapping a Big Data warehouse platform by Spring/REST

Three months ago, my client asked me to provide them with a Hadoop-based data warehouse platform. They had experience with OLAP and RDBMSs. Meanwhile they were working on a JavaScript-based data visualizer. They had noticed that they had to migrate to the big-data world.

So what I was supposed to make was a Hadoop-based platform for preparing, processing and dynamically warehousing terabyte-size text files, and giving them a relational shape at the end.

I knew Hadoop 2.2.0 rocks. So I decided to use Hadoop 2.2.0 because of its big improvements in the flexible YARN computation model and its advanced recovery techniques.

After preparing Hadoop I deployed HBase as the columnar database of the big-data world. I was going to put Hive on top of HBase. I did it, but soon found that Hive works much faster when it runs without HBase. So I dropped HBase in this case.

The UI developer was very interested in talking to the service in JSON/REST. I have respect for REST too. So I developed a REST client that connects to a HiveServer2 client, which talks to HiveServer2 on a node of the cluster. It works like a charm. I think Spring Boot has made it much easier than ever to build services based on a microservice architecture.

This little middle server would be a good place for implementing business and security policies.

Apache Sqoop is also a part of the platform. Fortunately the Sqoop client provides a cool REST API. So I wrapped it in my Spring app.

I’ve developed a number of Linux bash scripts for preparing data. The Spring app wraps them too, within its REST API.

What we have now is a very scalable, automated, dynamic data warehouse platform based on Hadoop 2.2.0, which front-end tools can use as simply as an RDBMS.

Running the app as a service just by calling it from the terminal is pretty cool. No Tomcat, no JSP container and no configuration. All you need is to tell it ‘run’.

Posted in Big Data, Java, Linux, Open Source, Software Engineering, Software Market Demands, Web

My Data Life

Less than a month ago I came back to Tehran. Before that I was working in Toronto for a couple of months. It was a terrible, perfect winter. For a man from the Middle East, the ever-changing weather of Toronto is something weird. Moreover I was getting far from what I was looking for. Happily, I have started again to work on the things that are more interesting to me. Maybe this fantastic spring and the good weather are boosting my good feelings about everything.

I am currently working on three different projects, all in the big-data area.

One of them started 20 days ago. It is about capturing a huge amount of incoming log records and processing them with Apache Hadoop, HBase and simple pattern recognition techniques. The company needs to provide very fast responses on demand. Two years ago I suggested that they use a Hadoop-based solution. I remember giving them a demonstration. The technical lead thought they could still use an RDBMS for those purposes, and they made it work perfectly. I was lucky: their business is booming and they reached the limits very soon.

The second project is mine. I am working on a very interesting sentiment analysis project which is supposed to gather simple data about users’ daily lives and make recommendations to them based on its knowledge base. I need to learn Android programming to build its interface. It is cool.

The third one is an interesting project as well. It is about replacing the underlying storage of a document management system with HDFS.

It seems my professional life is dedicated to data matters.

Posted in Big Data, Java, Linux, Machine Learning, Open Source

Android Studio

I have had a busy time during the past months. Immigrating from Tehran to Toronto, which involved multiple relocations to different places, was just a part of the story. Meanwhile I took a three-month contract to join a small startup which was in the last phases of its project and was facing requirements changing on the fly. Happily that is all finished now and I’ve got an opportunity to work on an idea of mine that is a part of my Master of Software Engineering thesis. To prepare a mobile application for feeding a sentiment analysis service (Hadoop/Mahout), I have started to learn the Android platform and its development best practices.

Getting familiar with the concepts and technologies defined by Google is just exciting. You see there is a mind behind them all. I don’t know if Micro Architecture was affected by Android’s App Component idea or whether it just happened under market pressure. I just see that Micro Architecture can be implemented by Android developers transparently.

It is very cool to start development with a solid practical approach of making separate reusable things into a whole unified single app. A wise demonstration of reusing software components.

It is normal for a mechanical engineer to fit a Toyota dynamo into a Chevrolet engine. Maybe a little machining or welding is needed. This is nothing new. In the software world, however, reusability is still not as easy as we software engineers claim. Incompatible protocols or complex signatures make the issue dramatically worse. I am just looking under the hood of a certain car (app) and assuming REST/SOAP are some kind of connections to outside the app (not interconnections). Android provides a wise solution to reduce this issue. We’ve seen the same idea in reusing underlying services in APIs by registering and reusing them (RPC, CORBA), but Google has made it much easier to use.

Providing an abstraction through well-defined naming of inside/outside methods is the key. You won’t need to search using reflection or something to find the proper existing component whose address you don’t know in detail. Very high-level naming lets you find those things.

I also had the chance to develop in Objective-C a long time ago. I confess that my mind is already biased by Java. Anyway, as an experienced Java developer, it is easier for me to develop an Android app than an iOS one. While Apple’s monopoly makes things easier by defining a very limited number of possible hardware/platforms, it still looks more interesting to develop for target hardware/platforms which you’ve never tried before. While I am an Apple fan, I believe I would have a faster-growing market in the open-source Android territory. I remember Xcode was a little messy and confusing to me. I was looking for something like IntelliJ IDEA. AppCode came along nicely. I don’t know how many iOS developers have discovered it. There are two different viewpoints on application development. So I am not sure developers accustomed to Xcode would be motivated enough to give AppCode a try.

It could be a big deal to migrate from a convenient, intelligent IDE such as IntelliJ IDEA to something as classic/mature and fanatic as Xcode. I think most Java developers would have a good experience when they start using Android Studio.

I am checking whether Android development has become as easy as playing with LEGO. I will write more about it.

Posted in Java, Open Source, Software Engineering, Software Market Demands

Verbose Log Files

During the past decades programmers mostly tried to simplify complex situations by translating phenomena into very abstract numeric or logical models. For example, in an HRMS, to show that an employee is not at her desk, we would set an “isWorkingRightNow” variable to FALSE. In the real world, however, it could be explained with different meanings, such as the following:

  • “She hasn’t come yet.”
  • “She will be late.”
  • “She is arriving.”
  • “She is having lunch right now.”

It is amazing that machines are now able to use verbose statements much better than TRUE/FALSE flags. I think this is the time for us to generate more verbose log files too.

It could be better to make log files as rich as possible now. Boolean flags and restricted, truncated binary or numerical variables were a big trick for modeling the real world’s boundaries in small computers with tiny CPUs… This is the time to use a huge arsenal of clusters, so just let machines talk as we do.
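As a rough illustration of the idea, the employee example above could be logged with a descriptive status instead of a bare boolean. This is only a sketch: the `Presence` enum, the `PresenceLog` class, the `AT_DESK` value and the log line layout are all made up for this example.

```java
import java.time.LocalTime;

/** Hypothetical replacement for a bare isWorkingRightNow flag. */
enum Presence {
    NOT_ARRIVED_YET("She hasn't come yet."),
    RUNNING_LATE("She will be late."),
    ARRIVING("She is arriving."),
    AT_LUNCH("She is having lunch right now."),
    AT_DESK("She is at her desk.");

    private final String description;

    Presence(String description) {
        this.description = description;
    }

    String description() {
        return description;
    }

    // The old boolean can still be derived, but the reason is no longer lost.
    boolean isWorkingRightNow() {
        return this == AT_DESK;
    }
}

/** Formats a verbose log line that keeps both the flag and the human-readable reason. */
final class PresenceLog {

    private PresenceLog() {
    }

    static String line(LocalTime time, String employeeId, Presence presence) {
        return time + " employee=" + employeeId
                + " status=" + presence.name()
                + " isWorkingRightNow=" + presence.isWorkingRightNow()
                + " note=\"" + presence.description() + "\"";
    }
}
```

Calling `PresenceLog.line(LocalTime.of(12, 30), "e-17", Presence.AT_LUNCH)` produces a line that a later text-mining job can read as a sentence, while older tools can still pick out the boolean.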

Posted in Big Data, Java, Machine Learning

Erlang Philosophy

Mike Williams, one of the three inventors of Erlang, stated the following as the Erlang philosophy:

  • Find the right methods/Design by Prototyping.
  • It is not good enough to have ideas, you must also be able to implement them and know [how|that] they work.
  • Make mistakes on a small scale, not in a production project.

Read more here.

I really enjoyed it. I think what he said is all about a rapid creation process.

If it takes Long, something is Wrong

What he mentioned is exactly one of the best practices that experienced people follow with other technologies too. This is not just an Erlang rule. We usually start modeling by developing a real instance. The model plays the role of a seed which we improve little by little.

Experts also implement their ideas right when they are thinking about them. Having a look at what famous geeks made is interesting. Linus Torvalds built the first versions of Linux and Git in a matter of months and weeks respectively, while at the same time other professional teams were working for years to make something similar. Mark Zuckerberg and Facebook are the same story.

Implementing ideas trims them by confronting imagination with reality. If the idea is so complex that you can’t implement it easily, there could be something wrong with the idea itself or the way you implement it. A problem that has been understood is easy for problem solvers.

Keep it Simple, Make it Working

Good ideas come into existence quickly. If your idea can’t reach a real instance very soon, it could be better to change your mind. I accept there are complex things, but the way you implement them should be the simplest one.

For me the period is two months per project. I break bigger projects into packages of two months or less. Each package should be a representative/standalone product. Don’t follow complex business scenarios if you are starting up something new. Just do the simplest thing.

Posted in Software Engineering, Software Market Demands

Accelerating JEE apps applying ANSI C and C++

Java Enterprise is very easy to learn and develop with, and ultimately scalable. But it is slow and is not a good option for developing a high-throughput platform (because of unpredictable garbage collection pauses and the JVM overhead). ANSI C, on the other hand, takes more time for development/testing and can’t be scaled up as easily as JEE. But ANSI C is just blazing fast.

It has been a while since I started mixing the two together to achieve both scalability and performance within a high-throughput solution. Creating and releasing objects in Java consumes significantly more CPU/memory than in C. Managing hundreds of thousands of Java objects in a high-throughput concurrent solution would be a nightmare for every developer, while this is just a piece of cake for a wise C solution such as Redis.

A graph with more than twenty-five thousand nodes and one hundred thousand edges is a serious thing in Java, while it is actually nothing in ANSI C.

Anyway, I am a Java developer, so I would still use Java as the main platform. Moreover I do like it. Meanwhile, accelerating Java apps with Redis or MongoDB would be an answer to Java’s laziness.

 

Posted in Big Data, C++, Cloud Computing, Java, Linux, Networking, Open Source, Software Engineering

An Item-Based Recommender Engine and Business Challenges

For four weeks a recommender engine I’ve developed based on Apache Mahout has been under pressure from real users and testing scenarios. During this time I’ve collected a number of business requirements and noticed that there are some business demands that the standard technique has no answer for.

It seems it is not enough to just make recommendations to users, since business owners are looking for a tangible advantage beyond increasing user satisfaction. For example, the following parameters might be applied to support both the user and the business (I am not sure if these are just Iranian business requirements or worldwide demands).

  • Expiration Date: A CEO believes that items which are about to expire should be recommended more than items that have no expiration date or won’t expire soon.
  • Classic Items: The passing of time changes the popularity of some items. For example, sports or political news should be recommended within a short period of time, while there are many classic items which are time-independent, such as scientific reports, historical stories or poems. These classic items stay popular for a longer period of time.
  • Retired Items: Every product has a life cycle. The iPhone 5 has softly retired the iPhone 4 series. Business owners prefer not to keep risky products in stock. So they want the recommender to offer them more.
  • Events: Calendar events could change people’s tastes. Assume a ceremony (we have a lot of holidays/ceremonies in Iran) shifts people’s tastes, and everything changes right after midnight, while the recommender doesn’t have the chance to make correct recommendations because it relies on the past weeks’ ingestions. There is always a lag.
  • Loyalty: Budget is an important factor when users buy products. A high rate of inflation pushes people to choose cheap, low-quality products. Although many users prefer cheap items, recommending these items damages the credit of the recommender (the business) and decreases user loyalty.

I am currently working on these issues to fit the engine to a number of business requirements, while I have no idea how Amazon.com, Netflix or other giants have faced these issues.
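One possible way to approach the expiration-date demand is a re-ranking step applied after the recommender has produced its scores. The sketch below is only an assumption about how that could look, not the solution I ended up with and not part of Mahout: the classes `ScoredItem` and `BusinessReranker`, the 30-day window and the 50% maximum boost are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical holder for one recommended item and its business metadata. */
final class ScoredItem {
    final String itemId;
    final double score;       // score produced by the recommender
    final long daysToExpiry;  // negative means the item never expires

    ScoredItem(String itemId, double score, long daysToExpiry) {
        this.itemId = itemId;
        this.score = score;
        this.daysToExpiry = daysToExpiry;
    }
}

/** Re-ranks recommendations so that items close to expiry move up the list. */
final class BusinessReranker {
    static final long WINDOW_DAYS = 30;   // assumed: only the last 30 days get a boost
    static final double MAX_BOOST = 0.5;  // assumed: at most +50% on the day of expiry

    private BusinessReranker() {
    }

    static double adjustedScore(ScoredItem item) {
        if (item.daysToExpiry < 0 || item.daysToExpiry >= WINDOW_DAYS) {
            return item.score; // no expiry pressure, keep the recommender's score
        }
        double urgency = 1.0 - (double) item.daysToExpiry / WINDOW_DAYS;
        return item.score * (1.0 + MAX_BOOST * urgency);
    }

    static List<String> rerank(List<ScoredItem> items) {
        List<ScoredItem> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.<ScoredItem>comparingDouble(BusinessReranker::adjustedScore).reversed());
        List<String> ids = new ArrayList<>();
        for (ScoredItem item : sorted) {
            ids.add(item.itemId);
        }
        return ids;
    }
}
```

Keeping the business rule outside the learning step means the collaborative filtering model stays untouched, and each business can tune the window and boost on its own. The same shape could carry the retired-item or loyalty rules as extra multipliers.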

Posted in Big Data, Java, Machine Learning, Open Source, Software Engineering, Software Market Demands

What Are Your Visitors Looking For?

I am implementing a number of my ideas about providing big data processing and machine learning services, mainly over HTTP. I have just finished another one.
This is an introduction to a service which helps business owners find out the tastes of their website’s visitors or customers. Moreover, the service makes recommendations to visitors based on a machine learning technique called Collaborative Filtering. I’ve called it “Next”.

Terminology
“Item” is whatever the website provides for visitors, such as web content, a product on a company’s products page, an advertisement or anything else the business owner provides for the visitors.
“Visitor” is an anonymous or authenticated user who surfs an online shop, a news website, a technical blog, a real-estate website or whatever else. A visitor is always a customer, or potentially could be. So we think she should be given recommendations that are interesting for her.

A “Recommender Engine” is what I developed based on a number of well-known techniques and open-source products; it learns what the visitors do with the items. Then it recommends to visitors other items they might be interested in visiting (reading, seeing or buying).
“Recommendation” is a message containing a list of items provided for the current visitor based on all users’ interactions.

Who Already Uses The Same Technology?

Amazon.com

  • Recommend additional books
  • Frequently bought together books

Pandora Radio

  • Plays music with similar characteristics
  • Content based filtering based on properties of song/artist
  • Based also on user’s feedback

Last.fm

  • Recommends songs by observing the tracks played by user and comparing to behavior of other users
  • Suggests songs played by users with similar interests
  • Collaborative filtering based on user’s previous ratings and watching behaviors (compared to other users)

Netflix

  • Predictions of movies

Stackoverflow.com

  • Makes recommendations to users while they search through questions and answers.

Many other businesses use this technique as well.

Scenarios
If you need what the mentioned companies do to make recommendations to their users, and you don’t have a team to develop such a system for your website, then my solution would be interesting for you.
The recommender service provider doesn’t need to know anything about the visitors who surf the website. Instead it just needs to mark each visitor with a unique id. It doesn’t need to know anything about the item either. A book, a mobile phone or a posted article are all the same in the way it learns. A unique id (the URL) is enough to identify each item.
By using this service, rating and comparing items (from the visitors’ point of view) becomes available.
The following questions can be answered on demand:

  • Which items would be interesting for a certain visitor?
  • Which users might like an item more than others?
  • How close are two certain items to each other (from the visitors’ point of view)?
  • Who would like to know about a certain new item?
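To give a feeling for how such questions can be answered from nothing but visitor ids and item URLs, here is a minimal sketch of the "how close are two items" question. It is not the implementation behind the service; it swaps in plain Jaccard similarity over the sets of visitors who touched each item, and the class name `ItemSimilarity` and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory sketch: item closeness from anonymous (visitorId, itemUrl) events. */
final class ItemSimilarity {

    // For every item URL, the set of visitor ids that interacted with it.
    private final Map<String, Set<String>> visitorsByItem = new HashMap<>();

    void record(String visitorId, String itemUrl) {
        visitorsByItem.computeIfAbsent(itemUrl, key -> new HashSet<>()).add(visitorId);
    }

    /** Jaccard similarity: shared visitors divided by all visitors of either item. */
    double similarity(String itemUrlA, String itemUrlB) {
        Set<String> a = visitorsByItem.getOrDefault(itemUrlA, Collections.<String>emptySet());
        Set<String> b = visitorsByItem.getOrDefault(itemUrlB, Collections.<String>emptySet());
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0; // an unseen item is close to nothing
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return (double) common.size() / union;
    }
}
```

Nothing in this sketch looks inside the item or the visitor, which is the point made above: ids alone are enough. A real deployment would compute these similarities in batch on the cluster rather than in one JVM's memory.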

How To Use It?
I’ve made it as an HTTP service. You won’t need any extra server, software, admin, knowledge or anything else. You just need to ask it. Adding a line of JavaScript to the item page is enough. The service responds in a number of formats such as HTML table, HTML list or JSON. That’s it.

Who Can Be Served?
The solution is pretty scalable and blazing fast. It means growth in the number of visitors, items and incoming messages can all be handled by adding more machines to the cluster.

It is Growing Up
I am adding more administrative features such as Big Data visualization and content-based recommendation algorithms, while it already serves a candidate client.

Posted in Big Data, Cloud Computing, Java, Linux, Mac OS X, Machine Learning, Networking, Software Engineering, Software Market Demands, Web