Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Updated Sep 3, 2024 - HTML
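Course project for "Big Data Mining Techniques" at Fudan University. It attempts to find frequent 2-itemsets of high-support keywords in search records from the Sogou Labs user query log data (2008). On the implementation side, I built a small Hadoop cluster of five servers and implemented the three MapReduce passes of the Parallel FP-Growth algorithm in Python.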
Using Hadoop to process data from an automobile tracking platform that records the history of important incidents after the initial sale of a new vehicle.
Add MapReduce code in any language to the designated folder to maintain a repository that helps students learn Big Data.
A simple project on the use of map and reduce in Hadoop.
Multiprocessing can be an effective way to speed up a time-consuming workflow via parallelization. This article illustrates how multiprocessing can be utilized in a concise way when implementing MapReduce-like workflows.
A MapReduce implementation in Python on a Docker-simulated distributed system
Short projects for the UT Dallas Big Data course C6350 using PySpark MapReduce and the GraphFrames library
An analysis of NYC Subway data using Hadoop MapReduce
PageRank algorithm using Hadoop Streaming
An AWS Lambda function that starts an EMR cluster and runs a MapReduce job
A distributed MapReduce implemented with Python 3 and gRPC
Hadoop applications. The repo covers Big Data tools such as Spark (pyspark), Hive (pyhive), Elasticsearch, and Oozie. All of these tools can be used through Python libraries once the configuration is set up.
Python implementations of MapReduce patterns that were not included in the final TFG (final degree project).
This repository contains code that extracts meaningful information from a news headline dataset.
A user-to-item recommender system built on item-based CF and XGBRegressor
A SQL engine built on Hadoop MapReduce
Running MapReduce to compute PageRank on the WDC data.
Using MapReduce with Hadoop and Python to score sentiment
Modified from big-data-europe/docker-hadoop
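One entry above describes using multiprocessing for MapReduce-like workflows. As a rough illustration of that idea, here is a minimal word-count sketch built on Python's standard `multiprocessing.Pool`. It is not taken from any of the listed repositories. The function names (`map_chunk`, `merge_counts`, `chunked`, `word_count`), the chunk size, and the sample sentences are all assumptions made for this example.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def map_chunk(lines):
    """Map step (hypothetical helper): count words in one chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step (hypothetical helper): fold one partial count into another."""
    left.update(right)  # Counter.update adds counts rather than replacing them
    return left


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def word_count(lines, processes=2, chunk_size=2):
    """Run the map step in worker processes, then reduce in the parent."""
    chunks = list(chunked(lines, chunk_size))
    with Pool(processes=processes) as pool:
        partials = pool.map(map_chunk, chunks)
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    # Illustrative input only; real workloads would read from files.
    sample = [
        "map reduce map",
        "reduce shuffle map",
        "hadoop spark hadoop",
    ]
    print(word_count(sample).most_common(3))
```

The map function is defined at module level so that worker processes can pickle it, and the reduce step runs in the parent process because merging small `Counter` objects is cheap compared with the per-chunk work.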