
Nilesh Jethwa


Trends in Big Data, Hadoop, Business Intelligence, Analytics and Dashboards

Tue, 2014-12-02 10:34

How has the interest in Big Data, Hadoop, Business Intelligence, Analytics and Dashboards changed over the years?

One easy way to gauge the interest is to measure how much news is generated for each related term, and Google Trends lets you do that very easily.

Plugging all of the above terms into Google Trends and doing some further analysis leads to the following visualizations.

Aggregating the results by year

Image

 

It is quite amazing to see that the stream representing Dashboards has remained constant throughout the years.

The streams for Analytics and Business Intelligence exhibit a similar trend in general.

Analytics is kind of widening its mouth as we move forward, helped by terms such as Hadoop + Big Data + Analytics being used almost together.
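As a rough illustration of the "aggregating by year" step, here is a minimal sketch. It assumes the weekly export has already been read into (week start date, term, interest value) rows and that the yearly figure is a plain average of the weekly values; the function name `aggregate_by_year` and the sample numbers are made up for illustration and are not the data behind the charts.

```python
from collections import defaultdict
from datetime import date


def aggregate_by_year(weekly):
    """Average weekly interest values per (term, year).

    `weekly` is an iterable of (week_start, term, value) tuples.
    Averaging is an assumption; a sum would work the same way.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (term, year) -> [sum, count]
    for week_start, term, value in weekly:
        bucket = totals[(term, week_start.year)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up sample rows, only to show the shape of the input.
sample = [
    (date(2013, 12, 22), "Hadoop", 40),
    (date(2013, 12, 29), "Hadoop", 60),
    (date(2014, 1, 5), "Hadoop", 80),
]
yearly = aggregate_by_year(sample)
```

Each (term, year) pair in `yearly` then becomes one point of a stream or line in a yearly chart.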

Now check the line chart below

Image

 

It looks like the trend for Dashboards defines the lower bound and the trend for Business Intelligence defines the upper bound. The trend for Hadoop started around the first quarter of 2007. The trend for Big Data started around the third quarter of 2008, and both have been rapidly increasing ever since. It remains to be seen whether they will cross “Business Intelligence” in popularity or merge and find a stable position somewhere in the middle.

Before Big Data and Hadoop came into the picture, the term “Analytics” held stable ground close to Dashboards, but now the trend for Analytics seems to be following Big Data and Hadoop.

Let us take a deeper look at each week since 2004

Image

 

Look at the downward spikes occurring around Christmas time. Nobody wants to hear about Big Data or Dashboards during the holidays.

And finally, here is a quarterly cyclical view

Image

Click here to view the full interactive Visualizations

Auto Sales Data Visualization by Manufacturer

Mon, 2014-12-01 14:42

Data: Edmunds

Image

 

Top Manufacturer

Image

 

Quarterly breakup of units sold by manufacturer

Image

 

View the interactive visualizations

Holiday Sales by category

Tue, 2014-11-25 13:26

Image

Fresh and Frozen Fruit consumption – U.S. Bureau of Labor Statistics

Thu, 2014-11-20 14:47

The South and the West consume the highest amount of fruit.

Image

Here is a more detailed breakdown by quarterly expenditure on fruit (figures in 100 million)

Image

Image

Image

Visualization on How the undergraduate tuition has increased over the years

Mon, 2014-11-17 12:54

Average undergraduate tuition and fees and room and board rates

Source: http://nces.ed.gov/

Image

These figures are inflation adjusted. Look at how much the tuition fees alone have increased compared to the dorm and board rates.

Now comparing the rate increase for the 2-year program

Image

So for the 2-year program, the board rates have remained at the same level compared to the dorm rates.

Now check out the interesting graph for the 4-year program below

Image

 

Comparing the slope of the 2-year board rates to the 4-year board rates, the 4-year rates show a significant increase

Image

If the price of meals is the same for both programs, then the 4-year and 2-year programs should have the same slope. So why is the 4-year slope different from the 2-year slope?

Now, let us look at the dorm rates
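One way to put a number on "slope" here is an ordinary least-squares fit of rate against year. The sketch below assumes that definition; the helper `least_squares_slope` and the figures in it are hypothetical placeholders, not the NCES values.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Hypothetical board rates in dollars, for illustration only.
years = [2000, 2005, 2010]
board_2yr = [3000, 3050, 3100]
board_4yr = [3000, 3500, 4000]

slope_2yr = least_squares_slope(years, board_2yr)  # dollars per year
slope_4yr = least_squares_slope(years, board_4yr)
```

If meals cost the same in both programs, the two slopes would be expected to come out close to each other; a large gap is what raises the question.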

Image

 

And finally the 4 year vs 2 year Tuition rates

Image

Here is the data table for the above visualization

50 years of killing Deer – Data visualization and analysis

Mon, 2014-11-03 10:04

Virginia maintains a summary of deer kills going back to 1947

The stacked bar chart gives a total view of the kills and how they have grown over the years

Image

 

Comparing the kills on a line chart, we see that female deer kills show an uptick from 2008 onwards

Image

USA War Casualties

Fri, 2014-10-31 10:35

iCasualties.org maintains a documented list of all fatalities for the Iraq and Afghanistan wars.

Analysing the dataset for Afghanistan, we summarize the results by year

NOTE: This contains only Afghanistan metrics. We will later update the visuals to reflect the Iraq war.

 

Image

USA war fatalities by year

We are approaching the levels of 2002, and we hope we don’t have to suffer another war.

Here is another view by year and month

 

InfoCaptor : Analytics & dashboards

 

The dataset contains the age of each person who died in the war, so here is a summary by age

Image

War Deaths by Age

Checking it against the year

Image

Why so many young deaths between age 20 and 30 for the year 2014?

Image

Where did most of the deaths occur?

Image

 

Where were the soldiers from?

Image

Deaths by Rank

 

InfoCaptor : Analytics & dashboards

 

Cause of Death

Attack Types

Image

Image

Image

Helicopter crashes are one of the top causes of death in non-hostile situations

GitHub DMCA takedown notices on the rise!

Tue, 2014-10-28 04:54

GitHub maintains a list of all DMCA takedown notices along with counteractions and retractions if any.

Analysing all the notices from 2011, it seems that the takedown notices are on the rise.

Year View : Notice the sharp increase in 2014

Image

 

Quarterly view : Looking at the quarterly breakup, it seems the takedowns are cooling off in the later quarters.

Image

 

So who is issuing these DMCA takedowns?

Here is the complete list of all companies who issued DMCA takedowns

Image

NOTE: The names were extracted from the description text
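The extraction of names from the description text could look something like the sketch below. The description wording ("... from <name> regarding ...") is an assumed, hypothetical format, not GitHub's actual notice layout, and `SENDER_PATTERN` and `extract_sender` are illustrative names; a real run would need the pattern adjusted to the actual text.

```python
import re
from collections import Counter

# Assumed phrasing: the sender follows the word "from" and ends at
# "regarding", a comma, a period, or the end of the description.
SENDER_PATTERN = re.compile(
    r"\bfrom\s+(.+?)(?:\s+regarding\b|[.,]|$)", re.IGNORECASE
)


def extract_sender(description):
    """Return the sender name found in a description, or None."""
    match = SENDER_PATTERN.search(description)
    return match.group(1).strip() if match else None


# Made-up descriptions, only to exercise the pattern.
descriptions = [
    "Takedown notice from Example Corp regarding repo foo",
    "Takedown notice from Example Corp regarding repo bar",
    "Counter notice from Jane Doe.",
    "No sender mentioned",
]
senders = Counter(
    name for name in map(extract_sender, descriptions) if name is not None
)
```

Counting the extracted names, as `senders` does, gives the per-company totals needed for a ranked list.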

And here are the counteractions and retractions

Image

See the full list of companies with notice type

So the important question is “Why have the DMCA takedown notices increased?”

One important thing to note is that sites like Stack Overflow encourage replicating the content of the web page from which the original idea, algorithm or source code is copied. To be honest, it is a good thing, because a lot of the time these referenced sites become zombies and you don’t want to lose this knowledge. But could it be that such non-referenceable source code ends up on GitHub, causing the increase in takedown notices as companies start discovering it?

Fastest growing and rapidly declining job industry

Mon, 2014-10-27 09:17

Data source : http://www.bls.gov/emp/tables.htm#occtables

 

Fastest growing job industry

Image

Original Visualization

Most rapidly declining job industry

Image

Rapidly declining jobs link

Top 100 analytics companies ranked and scored by Mattermark

Wed, 2014-10-22 09:12

Let us move on from Grass Eating Sauropods and talk about who’s who in the analytic space.

Analytic companies are a dime a dozen. Everybody who provides a freaking dashboard is an analytic company. Anybody that merely mentions Google, Facebook, Hadoop etc. in the same sentence is somehow into BigData. Haven’t you stumbled across company pages where they claim to be experts in analytics and big data but want you to schedule a call with them? They don’t have any products or solutions to showcase, yet they are Big Data/analytics folks.

So to make things easy, Mattermark released this highly curated list of 100 analytic companies. No offense to BigData, but small datasets like these are always juicy.

Image

 

Mattermark ranks each company using its own algorithm and calls the result the “Mattermark Score”. After loading it up, we came up with these visualizations

 

InfoCaptor : Analytics & dashboards

 

 

For each funding stage, it shows the listing of companies by Mattermark score.

Some interesting questions

1. How many companies by funding stage?

Image

2. What is the funding by location and stage?

 

InfoCaptor : Analytics & dashboards

 

Another interesting visual comes from plotting the score against the total funding.

Image

 

We thought the above visual would tell us what kind of logic Mattermark used to rank the companies. As suspected, we cannot reverse engineer it without some additional information about the companies.

Y Combinator companies have more funding than the sum total of all remaining accelerators

Sun, 2014-10-19 13:05

After finishing our call with Bed Bugs, we decided to check out what the startup scene looks like. We used the data from seed-db to get our analytical juices flowing.

First we asked which is the top program (duh!!), but also by how much, who is next in the list, and so on.

Like most data scientists who believe in the power of simple bar graphs, we used our first “chart weapon” of choice, and here is what it rendered.

Image

Y Combinator is freaking huge like a dinosaur; in fact, it very much resembles the grass-eating Sauropods. We had to create a chart that was 3000 pixels wide just to accommodate them all.

Image

See the resemblance between the chart and the Sauropod?

To get better perspective we rendered it in a Treemap as shown

Image

Looking at the treemap, Y Combinator occupies more than the sum total of all the remaining accelerators. That is super amazing, but the problem is that our charts were not coming out beautiful. YC is clearly the outlier and was making it difficult to understand the rest of the startup ecosystem.

We said, let’s cut off the head to dig deeper.

The moment we filtered out YC from our analysis, all of the regions became colorful and that was certainly a visual treat.
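The "cut off the head" step boils down to measuring one program's share of the total and then dropping it before charting the rest. A minimal sketch, assuming funding totals keyed by program name; the helper names, the placeholder accelerator names, and every number below are made up for illustration and are not the seed-db figures.

```python
def share_of_total(funding, name):
    """Fraction of all funding that belongs to one program."""
    return funding[name] / sum(funding.values())


def without(funding, name):
    """Copy of the funding table with one program filtered out."""
    return {program: amount for program, amount in funding.items() if program != name}


# Made-up funding totals, only to show the mechanics.
funding = {
    "Y Combinator": 600,
    "Accelerator A": 150,
    "Accelerator B": 100,
    "Accelerator C": 50,
}

yc_share = share_of_total(funding, "Y Combinator")
remaining = without(funding, "Y Combinator")
```

A share above 0.5 means the outlier is larger than everything else combined, and the filtered table is what lets the smaller programs fill the treemap.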

Image

Now we could clearly see which other accelerators/programs are roughly the same size.

For example,

TechStars Boulder and AngelPad are roughly the same

TechStars NYC, TechStars Boston and 500Startups are in the same club

Similarly DreamIT, fbFund and Mucker Lab share the same color.

Now let us try to see from the location angle

Image

So we re-established that YC is freaking huge and having them on a chart with other accelerators does not create beautiful visualizations.

Bed bugs in Boston – Analysis of Boston 311 public dataset

Fri, 2014-10-17 11:22

Digging into the Boston public Dataset can reveal interesting and juicy facts.

There is nothing juicy about bed bugs, but the data on Boston’s open bed bug cases is quite interesting and worth looking at.

We uploaded the entire 50 MB data dump, which is around 500K rows, into the Data Visualizer and filtered the category for Bed Bugs. After splitting the date into its date hierarchy components, we plotted the month on the Y axis.

InfoCaptor : Analytics & dashboards

It seems that the City of Boston started collecting this data around 2011 and has only partial data for that year.

Interestingly, the number of bed bug cases seems to rise during the summer months.

Now if we break the lines into Quarters (we just add the quarter hierarchy to the mix)

InfoCaptor : Analytics & dashboards
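The filter-then-split steps above can be sketched outside the Data Visualizer as well. This example assumes a CSV with hypothetical column names `category` and `opened` (an ISO date); the real Boston 311 export uses its own column names and date format, so treat these as placeholders.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def count_by_period(rows, category):
    """Count cases of one category per (year, month) and per (year, quarter)."""
    by_month = Counter()
    by_quarter = Counter()
    for row in rows:
        if row["category"] != category:
            continue
        opened = datetime.strptime(row["opened"], "%Y-%m-%d")
        quarter = (opened.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        by_month[(opened.year, opened.month)] += 1
        by_quarter[(opened.year, quarter)] += 1
    return by_month, by_quarter


# Made-up rows standing in for the data dump.
SAMPLE_CSV = """category,opened
Bed Bugs,2012-07-04
Bed Bugs,2012-08-15
Pothole,2012-07-09
Bed Bugs,2013-01-20
"""

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
by_month, by_quarter = count_by_period(rows, "Bed Bugs")
```

Because `csv.DictReader` streams rows one at a time, the same function would work on a file handle for the full dump without loading everything into memory first.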

Color Analysis of Flags – Patterns and symbols – Visualizations and Dashboards

Mon, 2014-10-13 13:54

Recently, here at InfoCaptor, we started a small research project on the subject of flags. We wanted to answer questions such as which colors are used most frequently across all country flags, what the different patterns are, etc.

Read more Color and Pattern analysis on Flags of Countries – Simple visualization but interesting data

Data Visualization and Self Service Big Data Analytics

Wed, 2014-10-08 15:59

The innovation engine in the field of Business Intelligence and data visualization tools is certainly cranked up. Qlikview, Tableau and Tibco Spotfire introduced a new Data Visualization category in the field of Business Intelligence.

Now every vendor offers some form of Data Discovery. Oracle is also working on something similar, adding to the confusing mix of its OBIEE stack.

With the launch of the new InfoCaptor, you can perform ad-hoc data visualizations and build dashboards all within the browser. Now that is refreshing. The browser is the key here. Once you deploy on the server, users can simply log in, upload their datasets or point to an existing database connection. Before you know it, users are already slicing and dicing their datasets and swimming in the world of beautiful visualizations. Yes, the visualizations are absolutely stunning, and why shouldn’t they be? They are based on the excellent d3js.org library.

The key here is that the browser is your canvas and it is pretty huge; e.g. the default size for the visuals takes up my entire browser screen real estate. I like big visuals, and if I am producing a Trellis chart then I can simply drag the corners and resize it. The visualization library is very comprehensive and offers around 30 visuals. It provides the bullet graph as well for KPI tracking.

Here are some screenshots from the website

d3 visualizations

 

InfoCaptor is also available in the cloud as a service, and based on that there are a few live analyses to try out without logging in or installing anything.

I would say that with this release, small business owners have truly found their Tableau or Qlikview alternative.

Go check out the new InfoCaptor Data Visualizer