Comments for Big Data Partnership Unlock Value from Complex Data Thu, 05 Jun 2014 11:20:38 +0000 hourly 1 http://wordpress.org/?v=3.9.3

Comment on Partners by BIG DATA PARTNERSHIP SECURES SERIES A FUNDING from BERINGEA » Big Data Partnership /partners/#comment-24 Thu, 05 Jun 2014 11:20:38 +0000 http://bdp2.bigdatapartnership.com/?page_id=443#comment-24 […] Big Data Partnership is vendor-agnostic and partners with many of the world’s leading big data technology providers including Hortonworks, MapR Technologies, DataStax, Elasticsearch, IBM and Microsoft’s HDInsight. For a full list of Big Data Partnership’s partners please visit: /partners/ […]

Comment on Big Data Partnership Expands Hadoop Offerings for EMEA Enterprises by Big data SI teams with Intel to speed time to business value /intel-hadoop/#comment-17 Mon, 17 Feb 2014 13:36:14 +0000 /?p=2963#comment-17 […] Europe, one of the emerging leaders is London-based Big Data Partnership, which today unveils a partnership to deliver Intel’s Apache Hadoop distribution, adding to a line-up that already includes close alliances with Hortonworks, MapR and […]

Comment on Partners by Big Data Partnership Expands Hadoop Offerings for EMEA Enterprises - Big Data Partnership /partners/#comment-16 Mon, 17 Feb 2014 08:01:47 +0000 http://bdp2.bigdatapartnership.com/?page_id=443#comment-16 […] Big Data Partnership is vendor agnostic and partners with many of the world’s leading big data technology providers. For a full list of Big Data Partnership’s partners please visit: /partners/ […]

Comment on Techspace London expands Old Street coworking hub to meet start-up demand by Small Players In a Big Data World Big Data Partnership /techspace-london-expands-old-street-coworking-hub-meet-start-demand/#comment-15 Thu, 06 Feb 2014 23:31:30 +0000 /?p=2936#comment-15 […] Clawson | Forbes | Published: 10:03, 06 February […]

Comment on Big Data Consulting Services by Big Data Partnership Announces Partnership with Hortonworks - Big Data Partnership /services/#comment-12 Sat, 23 Nov 2013 17:50:36 +0000 http://demo.rocknrolladesigns.com/wp/jarvis/callouts/?page_id=48#comment-12 […] Professional Services […]

Comment on Clustering with Mahout by Praveenesh /clustering-with-mahout/#comment-7 Wed, 02 May 2012 02:58:45 +0000 http://bigdatapartnership.com/?p=1244#comment-7  @mbalija 
Thanks for the explanation. I understand that tuning and choosing the right algorithm is essential, and also tricky.
My question is more about the iterative nature of the algorithms.
Mahout jobs on Hadoop run as MapReduce jobs, so the more iterations an algorithm needs, the longer it takes. No matter how much tuning we do, every M/R job execution carries a cost that will always be there.
So I was wondering how people are tackling this. How can we use Mahout clustering algorithms, or any iterative algorithms, in real-time situations?
For example, Mahout's K-means clustering or LDA generally takes 2-5 minutes depending on the number of iterations (even on good hardware), so my question is really about using iterative algorithms in real time.
I understand that Hadoop/Mahout may not be built to support real-time scenarios, but as we are looking to build enterprise applications on Hadoop, I am wondering whether there are some cool ways or workarounds to do this effectively, or whether there is still a long way to go.
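To make the per-iteration cost concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function names and the example numbers (20 seconds of fixed job overhead, 5 seconds of actual compute) are assumptions for illustration, not measurements from Hadoop or Mahout.

```python
def total_runtime_seconds(iterations, job_overhead_s, compute_s_per_iteration):
    """Total time when every iteration is launched as a separate M/R job.

    job_overhead_s is the fixed cost of starting and finishing one job
    (an assumed value, not a measured one).
    """
    return iterations * (job_overhead_s + compute_s_per_iteration)


def overhead_fraction(job_overhead_s, compute_s_per_iteration):
    """Share of each iteration spent on fixed job overhead rather than compute."""
    return job_overhead_s / (job_overhead_s + compute_s_per_iteration)


if __name__ == "__main__":
    # Hypothetical numbers: 10 iterations, 20 s overhead, 5 s compute each.
    print(total_runtime_seconds(10, 20, 5))
    print(overhead_fraction(20, 5))
```

Under these assumed numbers, tuning the compute part can only ever touch a small share of the total, which is the point of the question above.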

Comment on Clustering with Mahout by mbalija /clustering-with-mahout/#comment-6 Wed, 02 May 2012 01:56:42 +0000 http://bigdatapartnership.com/?p=1244#comment-6  @Praveenesh Kumar
1) Mahout is a collection of many different algorithms. We should thoroughly understand the data and the problem before choosing a learning algorithm. Mahout has many built-in features for converting raw data to the required format, and we can also develop our own infrastructure to do that.
2) Best practices are not the same for all kinds of data or problems, so it is always a good approach to tune the different parameters, evaluate the generated results, and pick the best one. For general problems we can rely on some well-defined approaches.

Comment on Clustering with Mahout by Praveenesh /clustering-with-mahout/#comment-5 Mon, 30 Apr 2012 09:32:31 +0000 http://bigdatapartnership.com/?p=1244#comment-5 Very nice introduction to clustering algorithms.
One question: how can we leverage Mahout algorithms for real-time data? As far as I understand, clustering is a three-step process:
1. Vector generation
2. Creating inputs
3. Running the clustering algorithm
Are there any best practices we can follow to make clustering effective and fast?
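The three steps listed above can be sketched in plain Python on a toy data set. This is not Mahout's API: `vectorize` and `kmeans` are hypothetical helper names, the term-count vectors are a simplification, and picking two of the documents as starting centroids is an assumption made to keep the run deterministic.

```python
from collections import Counter


def vectorize(docs):
    """Step 1: turn each document into a term-count vector over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centroids, max_iterations=10):
    """Step 3: plain k-means; stops early once assignments no longer change."""
    centroids = [list(c) for c in initial_centroids]  # copy, do not mutate the input
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


if __name__ == "__main__":
    docs = ["hadoop map reduce", "map reduce job",
            "pig join merge", "pig join skewed"]
    vocab, vectors = vectorize(docs)                 # step 1: vector generation
    initial_centroids = [vectors[0], vectors[2]]     # step 2: creating inputs
    assignments, _ = kmeans(vectors, initial_centroids)  # step 3: clustering
    print(assignments)
```

Each pass of the loop in `kmeans` corresponds to one iteration; on Hadoop every such pass would be a separate job, which is where the latency concern for real-time use comes from.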

Comment on Map Side and Reduce Side Joins by Praveenesh /map-side-and-reduce-side-joins/#comment-8 Mon, 30 Apr 2012 09:21:09 +0000 http://bigdatapartnership.com/?p=1308#comment-8 Very high level overview of how joining happens in Pig 0.8 onwards:
 
if (input can fit into memory):
    if (no outer join on small input):
        Replicated Join
if (both inputs can be sorted on the join key):
    if (no outer join):
        Merge Join
    else if (no skewed join key):
        Default Join
    else:
        Skewed Join

PS: Default, Skewed, Merge and Replicated are the types of join Pig supports.
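The outline above can also be written as a small Python function, which makes the order of the checks explicit. This is only a sketch of the commenter's outline, not Pig's actual planner logic: `choose_join` and its boolean parameters are hypothetical names, and the final fallback to the default join is an assumption, since the outline does not say what happens when neither top-level condition holds.

```python
def choose_join(fits_in_memory, outer_on_small_input,
                sortable_on_key, outer_join, skewed_key):
    """Return the join type suggested by the outline, as a lowercase string."""
    # First check: a small input with no outer join on it.
    if fits_in_memory and not outer_on_small_input:
        return "replicated"
    # Second check: both inputs can be sorted on the join key.
    if sortable_on_key:
        if not outer_join:
            return "merge"
        if not skewed_key:
            return "default"
        return "skewed"
    # Not covered by the outline; falling back to the default join is an assumption.
    return "default"


if __name__ == "__main__":
    print(choose_join(fits_in_memory=True, outer_on_small_input=False,
                      sortable_on_key=False, outer_join=False, skewed_key=False))
    print(choose_join(fits_in_memory=False, outer_on_small_input=False,
                      sortable_on_key=True, outer_join=True, skewed_key=True))
```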

Comment on “Introducing YARN” – Hadoop No More a Baby Elephant by Praveenesh /introducing-yarn-hadoop-no-more-a-baby-elephant/#comment-9 Mon, 30 Apr 2012 08:57:52 +0000 http://bigdatapartnership.com/?p=1311#comment-9 I think that in the current version of Hadoop, 0.23, only the Capacity Scheduler is supported so far. There is no support for the Fair Scheduler yet.
 
