Ming Fang
Master of Science in Computer Science
Research Assistant, Carnegie Mellon Database Group
School of Computer Science, Carnegie Mellon University
Email: mingf AT cs DOT cmu DOT edu
GitHub: https://github.com/mindbergh
I am a master's student in the School of Computer Science at Carnegie Mellon University. My interest is in building systems for managing and processing Big Data.
Research Projects:
Peloton
Worked with a team to build an experimental distributed main-memory database designed to support both OLTP and OLAP workloads.
- Ported PostgreSQL to C++11 to provide a clean and modern code base.
- Implemented a number of components in Peloton, including the query plan mapper from PostgreSQL, the query execution engine, and multi-version concurrency control.
- Led team effort on profiling and optimizing the performance of the system.
- Maintained a continuous integration server for the project using Jenkins, including setting up the build and test framework and adding support for code coverage and documentation.
- Deployed Spark Streaming and evaluated its capability to perform incremental computation for a number of applications.
Weiss
Developed a system-initiative, plan-based dialogue system that can discuss subjective information with users.
- Built a daily processing pipeline in Python for scraping, transforming, and loading human discussions from multiple sources.
- Designed and implemented the dialogue system as a web-based service with mobile application APIs using Django.
- Conducted a user study and evaluated the efficacy of the system.
cuSpark
http://www.yaomuyang.com/cuspark/
Built a GPU-accelerated, functional parallel data processing framework, inspired by Spark.
- Designed and implemented a number of abstract functional primitives for data processing.
- Developed GPU auto-partitioning mechanisms to support larger-than-memory datasets.
- Implemented logistic regression and k-means using cuSpark and achieved performance comparable to Thrust implementations.
- Won 3rd place out of 74 teams in the 2015 Parallelism Competition at CMU.
Selected Coursework:
Summer 2014
- 15-513 Introduction to Computer Systems
Fall 2014
- 15-641 Computer Networks
- 10-601 Machine Learning
- 18-342 Fundamentals of Embedded Systems
- 15-619 Cloud Computing
Spring 2015
- 15-618 Parallel Computer Architecture and Programming
- 15-615 Database Applications
- 15-640 Distributed Systems
- 15-210 Parallel and Sequential Data Structures and Algorithms
Fall 2015
- 15-605 Operating System Design and Implementation
- 15-412 Information Security & Privacy
Course Projects:
Pebbles - A Linux-like Operating System Kernel
Implemented a multi-process, multi-threaded, preemptible kernel for 32-bit x86 processors with the following components:
- Virtual memory module using a fixed 4K page size
- A full set of system calls, including fork(), thread_fork(), exec(), vanish(), wait(), swexn(), deschedule(), yield(), new_pages(), remove_pages(), etc.
- Pluggable task scheduler with round-robin and randomized scheduling policies
- User-level thread library built on top of the system calls, with full support for synchronization primitives such as mutexes, semaphores, read-write locks, and condition variables
- User space hardware driver API
- Simple interprocess communication library
Twitter Analytics
Deployed a web service on Amazon AWS that responds to a number of analytical queries on Twitter data using the Undertow web framework. Processed 1TB of Twitter data using Hadoop MapReduce and HBase on Amazon AWS.
Liso - An HTTP/1.1 web server
Designed and implemented an HTTP/1.1 web server in C that supports:
- Concurrent connections based on select().
- Persistent HTTP/HTTPS connections via TLS.
- CGI interactions.
Congestion Control with BitTorrent
Implemented a P2P file transfer application on top of UDP with the following features:
- Congestion control similar to TCP (slow start and congestion avoidance)
- Simultaneous download/upload of different chunks of a file from/to different peers
- Discovery of available chunks via SHA hashes
- Timeouts for potentially crashed peers
Video CDN
Implemented a video content distribution network and ran simulations on different network topologies. The CDN includes:
- A proxy that adaptively selects the bit rate according to network throughput.
- A simple DNS server supporting load balancing. The load-balancing algorithms include Round Robin and OSPF.
- An accompanying DNS resolution library
Gravel - Real-Time Operating System Kernel for ARM processor
Implemented a multitasking RTOS kernel with the following features:
- basic system calls (read, write, exit, sleep, time, mutex_lock, mutex_unlock, etc.).
- context switching and rate monotonic task scheduling.
- timer and interrupt controller drivers.
- mutexes for concurrency control.
- priority inheritance using the Highest Locker Priority Protocol.
- a Snake game based on system calls of the implemented RTOS.