Big Data Partnership: Unlock Value from Complex Data

Posts Tagged: Technology

Map Side and Reduce Side Joins

Joins are one of the interesting features available in MapReduce. Joins performed by the Mapper are called map-side joins, and joins performed by the Reducer are called reduce-side joins. Frameworks such as Pig, Hive, and Cascading support performing joins. Before diving into the implementation, let us understand the problem thoroughly. If we…

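To make the difference between the two join styles concrete, here is a small in-memory Python simulation. It is a sketch under stated assumptions, not Hadoop code and not the implementation from the full post: the `users` and `orders` datasets and the function names `map_side_join` and `reduce_side_join` are hypothetical, and the map, shuffle, and reduce phases are imitated with ordinary loops and a dictionary. The map-side version assumes one input is small enough to hold in each mapper's memory.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy datasets: (user_id, name) and (user_id, order_id).
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "o-100"), (1, "o-101"), (3, "o-300"), (4, "o-400")]


def map_side_join(small, large):
    """Map-side join sketch: the small table is held in memory (as a
    mapper would after loading it from a side file) and each record of
    the large table is joined as it streams past. No shuffle, no reducer."""
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    for key, value in large:  # one "map" call per large-table record
        for small_value in lookup.get(key, []):
            yield key, small_value, value


def reduce_side_join(left, right):
    """Reduce-side join sketch: mappers tag each record with its source,
    the shuffle groups records by join key, and the reducer pairs up
    records from the two sources that share a key (an inner join)."""
    # Map phase: emit (key, (tag, value)).
    mapped = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]
    # Shuffle phase: group by key.
    groups = defaultdict(list)
    for key, tagged in mapped:
        groups[key].append(tagged)
    # Reduce phase: one call per key, visited in sorted key order.
    for key in sorted(groups):
        lefts = [v for tag, v in groups[key] if tag == "L"]
        rights = [v for tag, v in groups[key] if tag == "R"]
        for l, r in product(lefts, rights):
            yield key, l, r
```

Both functions yield the same inner-join rows here: user 2 has no orders and order key 4 has no user, so neither appears. The trade-off the sketch hints at is that this broadcast-style map-side join avoids the shuffle but needs one side to fit in memory, while the reduce-side join copes with two large inputs at the cost of sending every record through the shuffle.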

Bloom Filter Vs Feature Hashing

A Bloom filter is a space-efficient probabilistic data structure used to encode a set and to test set membership, that is, whether an element is a member of the set. False positives are possible, but false negatives are not. In other words, a query returns either “inside set (may be wrong)” or “definitely…

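A minimal Python sketch of the two structures in the title, under stated assumptions: the bit-array size `m`, the hash count `k`, the use of SHA-256 with double hashing to derive the `k` positions, and the names `BloomFilter`, `might_contain`, and `hash_features` are illustrative choices, not taken from the full post or from any particular library. The Bloom filter part shows why false negatives cannot occur: adding an item sets all of its bits, so a later lookup of that item always finds them set. The feature-hashing part, included for contrast, maps tokens straight to slots of a fixed-length vector (with a hash-derived sign) instead of answering membership queries.

```python
import hashlib


class BloomFilter:
    """Bloom filter sketch: an m-bit array plus k hash positions per item."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # all bits start at 0

    def _indices(self, item: str) -> list[int]:
        # Derive k positions from one SHA-256 digest via double hashing:
        # index_i = (h1 + i * h2) mod m  (an illustrative choice).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd: coprime with a power-of-two m
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "possibly added".
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indices(item))


def hash_features(tokens: list[str], n_features: int = 16) -> list[float]:
    """Feature hashing sketch: map each token to a vector slot by hash,
    with a hash-derived +1/-1 sign, instead of keeping a vocabulary."""
    vec = [0.0] * n_features
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:8], "big") % n_features
        sign = 1.0 if digest[8] & 1 else -1.0
        vec[idx] += sign
    return vec


bf = BloomFilter()
for word in ["apple", "cherry", "grape"]:
    bf.add(word)
```

After these calls, `bf.might_contain("apple")` is always `True`, whereas a word that was never added returns `False` unless all three of its bit positions happen to coincide with bits already set, which is exactly the false-positive case.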

Clustering with Mahout

Clustering is one of the most popular techniques in the machine learning field. It allows a system to group numerous entities into separate clusters (groups) based on certain characteristics (features) of those entities. Clustering is widely used in many grouping problems, such as grouping similar news articles, blogs, emails, malware, etc. based on their…

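Mahout provides several clustering algorithms, k-means among them. The following is a plain in-memory Python sketch of Lloyd's k-means, not Mahout's API: it only illustrates the assign-then-recompute loop that a distributed implementation parallelizes. The function name `kmeans`, its parameters, the random initialization from `seed`, and the toy points are assumptions for illustration.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means on in-memory points (tuples of equal length)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct input points to start
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

On these toy points, the three points near the origin end up in one cluster and the three near (10, 10) in the other. Because the result can depend on the initial centroids, practical runs usually try several seeds or use a smarter seeding scheme such as k-means++.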
