HADOOP-BIGDATA

These are the various programs that I used for my Hadoop projects.

Computer programming is the process of designing and building an executable computer program to accomplish a specific computing result or to perform a specific task. Programming involves tasks such as analysis, generating algorithms, profiling algorithms' accuracy and resource consumption, and implementing algorithms in a chosen programming language. The source code of a program is written in one or more languages that are intelligible to programmers, rather than in machine code, which is executed directly by the central processing unit.

In this regression task, we predict the percentage of marks a student is expected to score based on the number of hours they studied. This is a simple linear regression task, as it involves just two variables.
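
Below is a minimal sketch of this task using PySpark MLlib's linear regression; the sample data, column names, and the 9.25-hour query point are hypothetical stand-ins for the real dataset.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("StudentScores").getOrCreate()

# Hypothetical training data: (hours studied, percentage scored)
data = spark.createDataFrame(
    [(2.5, 21.0), (5.1, 47.0), (3.2, 27.0), (8.5, 75.0), (9.2, 88.0)],
    ["hours", "score"],
)

# MLlib expects all features packed into a single vector column
assembler = VectorAssembler(inputCols=["hours"], outputCol="features")
train = assembler.transform(data)

# Fit score = slope * hours + intercept
model = LinearRegression(featuresCol="features", labelCol="score").fit(train)
print("slope:", model.coefficients[0], "intercept:", model.intercept)

# Predict the score for a student who studies 9.25 hours
query = assembler.transform(spark.createDataFrame([(9.25,)], ["hours"]))
model.transform(query).select("hours", "prediction").show()
```
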

Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of commodity servers. I have done several projects on Hadoop using various platforms such as MapReduce, Hive, Pig, Spark, and Flume. HDFS is a distributed file system that handles large data sets running on commodity hardware; it is used to scale a single Apache Hadoop cluster to hundreds (and even thousands) of nodes. HDFS is one of the major components of Apache Hadoop, the others being MapReduce and YARN.
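
As a hedged illustration of the MapReduce model mentioned above, here is a classic word count written for Hadoop Streaming, which lets Python scripts act as the mapper and reducer; the HDFS paths in the launch command are hypothetical.

```python
import sys

def mapper():
    # Emit (word, 1) for every word read from stdin
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Streaming sorts mapper output by key, so all counts
    # for a given word arrive as consecutive lines
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()

# Launched with something like:
#   hadoop jar hadoop-streaming.jar \
#     -input /user/hadoop/input -output /user/hadoop/output \
#     -mapper "python3 wordcount.py map" \
#     -reducer "python3 wordcount.py reduce" \
#     -file wordcount.py
```
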

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize big data, and it makes querying and analysis easy. Hive was initially developed by Facebook; the Apache Software Foundation later took it up and developed it further as open source under the name Apache Hive.
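
Here is a minimal sketch of the kind of summarizing query Hive makes easy. This version submits HiveQL through PySpark's Hive support rather than the Hive CLI, and it assumes a configured Hive metastore; the database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("HiveQuery")
    .enableHiveSupport()  # reads table metadata from the Hive metastore
    .getOrCreate()
)

# HiveQL summarizing a hypothetical sales table in the warehouse
spark.sql("""
    SELECT region, SUM(amount) AS total_sales
    FROM sales.transactions
    GROUP BY region
    ORDER BY total_sales DESC
""").show()
```
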

Apache Pig is an abstraction over MapReduce. It is a tool/platform used to analyze large data sets by representing them as data flows. Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig.
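
As a hedged sketch of such a data flow, the Python script below writes a small Pig Latin script and drives it through the standard pig command-line tool; the input file, field layout, and output path are all hypothetical.

```python
import subprocess

script = """
-- Load a hypothetical tab-separated student file from HDFS
students = LOAD '/user/hadoop/students.tsv'
           AS (name:chararray, hours:float, score:float);
-- A simple data flow: filter rows, then project two columns
passed   = FILTER students BY score >= 40.0;
result   = FOREACH passed GENERATE name, score;
STORE result INTO '/user/hadoop/passed_students';
"""

with open("students.pig", "w") as f:
    f.write(script)

# `pig -f <script>` runs the script on the cluster;
# use `pig -x local -f <script>` to test against the local filesystem
subprocess.run(["pig", "-f", "students.pig"], check=True)
```
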

Apache Spark is an open-source engine used for data analytics. It is part of a greater set of tools, including Apache Hadoop and other open-source resources, for today's analytics community. It is commonly described as a cluster computing tool for large-scale data analytics.
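
Below is a minimal PySpark sketch of the kind of cluster analytics described above: the classic word count over a text file. The HDFS input path is hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("WordCount").getOrCreate()

# Read a hypothetical text file from HDFS; each row is one line of text
lines = spark.read.text("hdfs:///user/hadoop/input/books.txt")

# Split each line on whitespace and flatten into one word per row
words = lines.select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))

# Count occurrences and show the ten most frequent words
counts = words.groupBy("word").count().orderBy(F.desc("count"))
counts.show(10)
```
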
