HADOOP-BIGDATA

These are the various programs that I used for my Hadoop projects.

Computer programming is the process of designing and building an executable computer program to accomplish a specific computing result or to perform a specific task. Programming involves tasks such as analysis, generating algorithms, profiling algorithms' accuracy and resource consumption, and implementing algorithms in a chosen programming language. The source code of a program is written in one or more languages that are intelligible to programmers, rather than in machine code, which is executed directly by the central processing unit.

In this regression task, we predict the percentage of marks a student is expected to score based on the number of hours they studied. This is a simple linear regression task, as it involves just two variables.
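
Below is a minimal sketch of this task using PySpark MLlib's linear regression; the sample data, column names, and the 9.25-hour query point are hypothetical stand-ins for the real dataset.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("StudentScores").getOrCreate()

# Hypothetical training data: (hours studied, percentage scored)
data = spark.createDataFrame(
    [(2.5, 21.0), (5.1, 47.0), (3.2, 27.0), (8.5, 75.0), (9.2, 88.0)],
    ["hours", "score"],
)

# MLlib expects all features packed into a single vector column
assembler = VectorAssembler(inputCols=["hours"], outputCol="features")
train = assembler.transform(data)

# Fit score = slope * hours + intercept
model = LinearRegression(featuresCol="features", labelCol="score").fit(train)
print("slope:", model.coefficients[0], "intercept:", model.intercept)

# Predict the score for a student who studies 9.25 hours
query = assembler.transform(spark.createDataFrame([(9.25,)], ["hours"]))
model.transform(query).select("hours", "prediction").show()
```
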

Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of commodity servers. I have done several projects on Hadoop using various platforms such as MapReduce, Hive, Pig, Spark, and Flume. HDFS is a distributed file system that handles large data sets running on commodity hardware; it is used to scale a single Apache Hadoop cluster to hundreds (and even thousands) of nodes. HDFS is one of the major components of Apache Hadoop, the others being MapReduce and YARN.
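
As a hedged illustration of the MapReduce model mentioned above, here is a classic word count written for Hadoop Streaming, which lets Python scripts act as the mapper and reducer; the HDFS paths in the launch command are hypothetical.

```python
import sys

def mapper():
    # Emit (word, 1) for every word read from stdin
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Streaming sorts mapper output by key, so all counts
    # for a given word arrive as consecutive lines
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()

# Launched with something like:
#   hadoop jar hadoop-streaming.jar \
#     -input /user/hadoop/input -output /user/hadoop/output \
#     -mapper "python3 wordcount.py map" \
#     -reducer "python3 wordcount.py reduce" \
#     -file wordcount.py
```
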

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize big data, and it makes querying and analysis easy. Hive was initially developed by Facebook; the Apache Software Foundation later took it up and developed it further as open source under the name Apache Hive.
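
Here is a minimal sketch of the kind of summarizing query Hive makes easy. This version submits HiveQL through PySpark's Hive support rather than the Hive CLI, and it assumes a configured Hive metastore; the database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("HiveQuery")
    .enableHiveSupport()  # reads table metadata from the Hive metastore
    .getOrCreate()
)

# HiveQL summarizing a hypothetical sales table in the warehouse
spark.sql("""
    SELECT region, SUM(amount) AS total_sales
    FROM sales.transactions
    GROUP BY region
    ORDER BY total_sales DESC
""").show()
```
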

Apache Pig is an abstraction over MapReduce. It is a tool/platform used to analyze large data sets by representing them as data flows. Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig.
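
As a hedged sketch of such a data flow, the Python script below writes a small Pig Latin script and drives it through the standard pig command-line tool; the input file, field layout, and output path are all hypothetical.

```python
import subprocess

script = """
-- Load a hypothetical tab-separated student file from HDFS
students = LOAD '/user/hadoop/students.tsv'
           AS (name:chararray, hours:float, score:float);
-- A simple data flow: filter rows, then project two columns
passed   = FILTER students BY score >= 40.0;
result   = FOREACH passed GENERATE name, score;
STORE result INTO '/user/hadoop/passed_students';
"""

with open("students.pig", "w") as f:
    f.write(script)

# `pig -f <script>` runs the script on the cluster;
# use `pig -x local -f <script>` to test against the local filesystem
subprocess.run(["pig", "-f", "students.pig"], check=True)
```
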

Apache Spark is an open-source engine used for data analytics. It is part of a greater set of tools, including Apache Hadoop and other open-source resources, for today's analytics community. It is commonly described as a cluster computing tool for large-scale data analytics.
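
Below is a minimal PySpark sketch of the kind of cluster analytics described above: the classic word count over a text file. The HDFS input path is hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("WordCount").getOrCreate()

# Read a hypothetical text file from HDFS; each row is one line of text
lines = spark.read.text("hdfs:///user/hadoop/input/books.txt")

# Split each line on whitespace and flatten into one word per row
words = lines.select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))

# Count occurrences and show the ten most frequent words
counts = words.groupBy("word").count().orderBy(F.desc("count"))
counts.show(10)
```
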
