Comments for Big Data Partnership

Comment on Partners by BIG DATA PARTNERSHIP SECURES SERIES A FUNDING from BERINGEA » Big Data Partnership

Thu, 05 Jun 2014 11:20:38 +0000

[…] Big Data Partnership is vendor-agnostic and partners with many of the world’s leading big data technology providers including Hortonworks, MapR Technologies, DataStax, Elasticsearch, IBM and Microsoft’s HDInsight. For a full list of Big Data Partnership’s partners please visit: /partners/ […]

Comment on Big Data Partnership Expands Hadoop Offerings for EMEA Enterprises by Big data SI teams with Intel to speed time to business value

Mon, 17 Feb 2014 13:36:14 +0000

[…] Europe, one of the emerging leaders is London-based Big Data Partnership, which today unveils a partnership to deliver Intel’s Apache Hadoop distribution, adding to a line-up that already includes close alliances with Hortonworks, MapR and […]

Comment on Partners by Big Data Partnership Expands Hadoop Offerings for EMEA Enterprises - Big Data Partnership

Mon, 17 Feb 2014 08:01:47 +0000

[…] Big Data Partnership is vendor agnostic and partners with many of the world’s leading big data technology providers. For a full list of Big Data Partnership’s partners please visit: /partners/ […]

Comment on Techspace London expands Old Street coworking hub to meet start-up demand by Small Players In a Big Data World Big Data Partnership

Thu, 06 Feb 2014 23:31:30 +0000

[…] Clawson | Forbes | Published: 10:03, 06 February […]

Comment on Big Data Consulting Services by Big Data Partnership Announces Partnership with Hortonworks - Big Data Partnership

Sat, 23 Nov 2013 17:50:36 +0000

[…] Professional Services […]

Comment on Clustering with Mahout by Praveenesh

Wed, 02 May 2012 02:58:45 +0000

@mbalija
Thanks for the explanation. I understand that tuning and choosing the right algorithm is essential, and also tricky.
My question is more about the iterative nature of the algorithms.
Mahout jobs on Hadoop run as map-reduce jobs, so the more iterations there are, the longer they take. No matter how much tuning we do, the cost of M/R job execution is always there.
So I was wondering how people are tackling this. How can we use Mahout clustering algorithms, or any iterative algorithms, in real-time situations?
For example, Mahout’s K-means clustering or LDA generally takes 2-5 minutes depending on the number of iterations (even on good hardware), so my question is really about using iterative algorithms in real time.
I understand that Hadoop/Mahout may not be built for real-time scenarios, but as we look at building enterprise applications on Hadoop, I am wondering whether there are workarounds for doing this effectively, or whether it is still a long way off.
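
To make the per-iteration cost concrete, here is a rough back-of-the-envelope sketch in Python. The function name and the figures (20 seconds of fixed job overhead, 5 seconds of compute per pass) are assumptions chosen for illustration, not measurements from Mahout or Hadoop.

```python
def estimated_runtime_seconds(iterations, job_overhead_s, compute_per_iteration_s):
    """Rough model: each iteration is one M/R job, so the fixed job
    overhead is paid once per iteration, whatever tuning is applied."""
    return iterations * (job_overhead_s + compute_per_iteration_s)


# Hypothetical numbers, used only to show the shape of the cost.
total = estimated_runtime_seconds(iterations=10, job_overhead_s=20, compute_per_iteration_s=5)

# Even if tuning drove the compute time to zero, the overhead floor remains.
floor = estimated_runtime_seconds(iterations=10, job_overhead_s=20, compute_per_iteration_s=0)

print(total, floor)  # 250 200
```

Under this simple model, tuning only shrinks the compute term; the overhead term grows linearly with the number of iterations.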

Comment on Clustering with Mahout by mbalija

Wed, 02 May 2012 01:56:42 +0000

@Praveenesh Kumar
1) Mahout is a collection of many different algorithms. We should thoroughly understand the data and the problem before choosing a learning algorithm. Mahout has many built-in features for converting raw data to the required format, and we can also develop our own infrastructure to do that.
2) Best practices differ from one kind of data or problem to another, so it is always a good approach to tune the different parameters, evaluate the generated results, and pick the best one. For general problems we can rely on some well-defined approaches.

Comment on Clustering with Mahout by Praveenesh

Mon, 30 Apr 2012 09:32:31 +0000

Very nice introduction to clustering algorithms.
One question: how can we leverage Mahout algorithms for real-time data? As far as I understand, clustering is a three-step process:
1. Vector generation
2. Creating inputs
3. Running the clustering algorithm
Are there any best practices we can follow to make clustering effective and fast?
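
The three steps can be sketched in plain Python as below. This is not Mahout code: it is a small in-memory k-means, and every function name here is hypothetical. Reading step 2 as "prepare the initial centroids" is an assumption.

```python
import random


def vectorize(docs, vocabulary):
    """Step 1: turn each raw document into a term-count vector."""
    return [[doc.split().count(term) for term in vocabulary] for doc in docs]


def initial_centroids(vectors, k, seed=0):
    """Step 2 (as assumed here): pick k vectors as starting centroids."""
    return random.Random(seed).sample(vectors, k)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, centroids, max_iterations=10):
    """Step 3: repeat assignment and centroid update until assignments stop changing."""
    centroids = [list(c) for c in centroids]  # work on a copy
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [
            min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


docs = ["hadoop pig hadoop", "pig hadoop", "mahout kmeans", "kmeans mahout mahout"]
vocabulary = ["hadoop", "pig", "mahout", "kmeans"]
vectors = vectorize(docs, vocabulary)
labels = kmeans(vectors, initial_centroids(vectors, k=2))
```

Each pass of the loop in `kmeans` touches every vector once, which is the part that becomes a separate job per iteration when run as map-reduce.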

Comment on Map Side and Reduce Side Joins by Praveenesh

Mon, 30 Apr 2012 09:21:09 +0000

A very high-level overview of how join selection works in Pig 0.8 onwards:

if (one input can fit into memory) and (no outer join on the small input):
    Replicated Join
else if (both inputs are sorted on the join key) and (no outer join):
    Merge Join
else if (no skewed join key):
    Default Join
else:
    Skewed Join

PS: Default, Skewed, Merge and Replicated are the join types Pig supports.
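
The same decision order can be written as a small Python function. This is only a sketch: it reads the overview as a flat cascade of checks, which is an interpretation, and the function name and boolean parameters are hypothetical rather than anything Pig exposes. In Pig Latin itself the choice is made by the script author through the `USING 'replicated'`, `USING 'merge'` or `USING 'skewed'` clause of `JOIN`.

```python
def choose_pig_join(fits_in_memory, outer_join_on_small_input,
                    both_sorted_on_key, outer_join, skewed_key):
    """Return the join type suggested by the high-level rules above."""
    if fits_in_memory and not outer_join_on_small_input:
        return "replicated"
    if both_sorted_on_key and not outer_join:
        return "merge"
    if not skewed_key:
        return "default"
    return "skewed"


# Example: large, unsorted inputs with a heavily skewed join key.
suggestion = choose_pig_join(
    fits_in_memory=False,
    outer_join_on_small_input=False,
    both_sorted_on_key=False,
    outer_join=False,
    skewed_key=True,
)
print(suggestion)  # skewed
```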

Comment on “Introducing YARN” – Hadoop No More a Baby Elephant by Praveenesh

Mon, 30 Apr 2012 08:57:52 +0000

I think that in the current version, Hadoop 0.23, only the Capacity Scheduler is supported so far. There is no support for the Fair Scheduler yet.
