Ming Fang

Master of Science in Computer Science
Research Assistant
Carnegie Mellon Database Group
School of Computer Science
Carnegie Mellon University
Email: mingf AT cs DOT cmu DOT edu
Github: https://github.com/mindbergh

I am a master's student in the School of Computer Science at Carnegie Mellon University. My interest is in building systems for managing and processing big data.

Research Projects:



Worked with a team to build an experimental distributed main-memory database designed to support both OLTP and OLAP workloads.
  1. Ported PostgreSQL to C++11 to provide a clean and modern code base.
  2. Implemented a number of components in Peloton, including the query plan mapper from PostgreSQL, the query execution engine, and multi-version concurrency control.
  3. Led the team effort to profile and optimize the performance of the system.
  4. Maintained a continuous integration server for the project using Jenkins, including setting up the build and test framework and adding support for code coverage and documentation.
  5. Deployed Spark Streaming and evaluated its capability to perform incremental computation for a number of applications.



Developed a system-initiative, plan-based dialogue system that can discuss subjective information with users.
  1. Built a daily processing pipeline in Python for scraping, transforming and loading human discussions from multiple sources.
  2. Designed and implemented the dialogue system as a web-based service with mobile application APIs using Django.
  3. Conducted a user study and evaluated the efficacy of the system.



Built a GPU-accelerated, functional parallel data processing framework, inspired by Spark.
  1. Designed and implemented a number of abstract functional primitives for data processing.
  2. Developed GPU auto-partitioning mechanisms to support larger-than-memory datasets.
  3. Implemented logistic regression and k-means using cuSpark and achieved performance comparable to Thrust implementations.
  4. Won 3rd place out of 74 teams in the 2015 Parallelism Competition at CMU.

Selected Coursework:

Summer 2014

  1. 15-513 Introduction to Computer Systems

Fall 2014

  1. 15-641 Computer Networks
  2. 10-601 Machine Learning
  3. 18-342 Fundamentals of Embedded Systems
  4. 15-619 Cloud Computing

Spring 2015

  1. 15-618 Parallel Computer Architecture and Programming
  2. 15-615 Database Applications
  3. 15-640 Distributed Systems
  4. 15-210 Parallel and Sequential Data Structures and Algorithms

Fall 2015

  1. 15-605 Operating System Design and Implementation
  2. 15-412 Information Security & Privacy

Course Projects:

Pebbles - A Linux-like Operating System Kernel

Implemented a multi-process, multi-threaded, preemptible kernel for 32-bit x86 processors with the following components:
  1. Virtual memory module using a fixed 4 KB page size.
  2. Full set of system calls, including fork(), thread_fork(), exec(), vanish(), wait(), swexn(), deschedule(), yield(), new_pages(), remove_pages(), etc.
  3. Pluggable task scheduler with round-robin and randomized scheduling policies.
  4. User-level thread library built on top of the system calls, with a full set of synchronization primitives such as mutexes, semaphores, read-write locks, and condition variables.
  5. User-space hardware driver API.
  6. Simple interprocess communication library.

Twitter Analytics

Deployed a web service on Amazon AWS, built with the Undertow web framework, that responds to a number of analytical queries over Twitter data. Processed 1 TB of Twitter data using Hadoop MapReduce and HBase on AWS.

Liso - An HTTP/1.1 web server

Designed and implemented an HTTP/1.1 web server in C that supports:
  1. Concurrent connections based on select().
  2. Persistent HTTP connections, and HTTPS connections via TLS.
  3. CGI interactions.

Congestion Control with BitTorrent

Implemented a peer-to-peer file transfer application on top of UDP with the following features:
  1. Congestion control similar to TCP (slow start and congestion avoidance)
  2. Simultaneous download/upload of different chunks of a file from/to different peers
  3. Discovery of available chunks via SHA hashes
  4. Timeout for potentially crashed peers

Video CDN

Implemented a video content distribution network and ran simulations on different network topologies. The CDN includes:
  1. A proxy that adaptively selects the bitrate according to network throughput.
  2. A simple DNS server supporting load balancing. The load-balancing algorithms include round robin and OSPF.
  3. An accompanying DNS resolution library.

Gravel - A Real-Time Operating System Kernel for ARM Processors

Implemented a multitasking RTOS kernel with the following features:
  1. Basic system calls (read, write, exit, sleep, time, mutex_lock, mutex_unlock, etc.).
  2. Context switching and rate-monotonic task scheduling.
  3. Timer and interrupt controller drivers.
  4. Mutexes for concurrency control.
  5. Priority inheritance using the Highest Locker Priority Protocol.
  6. A Snake game based on the system calls of the implemented RTOS.