Master of Science in Computer Science
Carnegie Mellon Database Group
School of Computer Science
Carnegie Mellon University
Email: mingf AT cs DOT cmu DOT edu
I am a master's student in the School of Computer Science at Carnegie Mellon University. My interest is in building systems for managing and processing Big Data.
- Ported PostgreSQL to C++11 to provide a clean and modern code base.
- Implemented a number of components in Peloton, including a query plan mapper from PostgreSQL, a query execution engine, and multi-version concurrency control.
- Led the team effort to profile and optimize the performance of the system.
- Maintained a continuous integration server for the project using Jenkins, including setting up the build and test framework and adding support for code coverage and documentation.
- Deployed Spark Streaming and evaluated its capability to perform incremental computation for a number of applications.
- Built a daily processing pipeline in Python for scraping, transforming and loading human discussions from multiple sources.
- Designed and implemented the dialogue system as a web-based service with mobile application APIs using Django.
- Conducted a user study and evaluated the efficacy of the system.
- Designed and implemented a number of abstract functional primitives for data processing.
- Developed GPU auto-partitioning mechanisms to support larger-than-memory datasets.
- Implemented logistic regression and k-means using cuSpark and achieved performance comparable to Thrust implementations.
- Won 3rd place out of 74 teams in the 2015 Parallelism Competition at CMU.
- 15-513 Introduction to Computer Systems
- 15-641 Computer Networks
- 10-601 Machine Learning
- 18-342 Fundamentals of Embedded Systems
- 15-619 Cloud Computing
- 15-618 Parallel Computer Architecture and Programming
- 15-615 Database Applications
- 15-640 Distributed Systems
- 15-210 Parallel and Sequential Data Structures and Algorithms
- 15-605 Operating System Design and Implementation
- 15-412 Information Security & Privacy
Pebbles - A Linux-like Operating System Kernel
Implemented a multi-process, multi-threaded, preemptible kernel for 32-bit x86 processors that has the following components:
- Virtual memory module using a fixed 4 KB page size
- Full set of system calls, including fork(), thread_fork(), exec(), vanish(), wait(), swexn(), deschedule(), yield(), new_pages(), remove_pages(), etc.
- Pluggable task scheduler with round-robin and randomized scheduling policies
- User-level thread library built on top of the system calls, with full support for synchronization primitives such as mutexes, semaphores, read-write locks and condition variables
- User space hardware driver API
- Simple interprocess communication library
Twitter Analytics
Deployed a web service on Amazon AWS, built with the Undertow web framework, that answers a number of analytical queries over Twitter data. Processed 1 TB of Twitter data using Hadoop MapReduce and HBase on Amazon AWS.
Liso - An HTTP/1.1 web server
Designed and implemented an HTTP/1.1 web server in C that supports:
- Concurrent connections based on select().
- Persistent HTTP connections, and HTTPS connections via TLS.
- CGI interactions.
Congestion Control with BitTorrent
Implemented a P2P file transfer application on top of UDP with the following features:
- Congestion control similar to TCP (slow start and congestion avoidance)
- Simultaneous download and upload of different chunks of a file from and to different peers
- Discovery of available chunks via SHA hashes
- Timeout for potentially crashed peers
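As an illustration of the congestion-control feature listed above, TCP-style window growth can be sketched roughly as follows. This is a minimal sketch, not the project's code: the class name `CongestionWindow`, the default threshold of 64 packets, and the loss reaction (halve the threshold, reset the window to one packet) are assumptions chosen for the example.

```python
class CongestionWindow:
    """Toy TCP-style congestion window, counted in packets (hypothetical)."""

    def __init__(self, ssthresh=64):
        self.cwnd = 1.0                    # current window size
        self.ssthresh = float(ssthresh)    # slow-start threshold (assumed default)

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            # Slow start: grow by one packet per ACK (doubles every round trip).
            self.cwnd += 1.0
        else:
            # Congestion avoidance: grow by about one packet per round trip.
            self.cwnd += 1.0 / self.cwnd

    def on_loss(self):
        # Assumed reaction to a timeout: halve the threshold, restart slow start.
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = 1.0
```

A sender would call `on_ack()` for each acknowledged data packet and `on_loss()` when a retransmission timer fires, and keep at most `int(cwnd)` packets in flight.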
Video CDN
Implemented a video content distribution network and ran simulations on different network topologies. The CDN includes:
- A proxy that adaptively selects the bit rate according to network throughput.
- A simple DNS server supporting load balancing. The load-balancing algorithms include round robin and OSPF.
- An accompanying DNS resolution library
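One common way to implement the adaptive bit-rate selection mentioned above is to smooth throughput measurements and pick the highest rate that leaves some headroom. The sketch below assumes an exponentially weighted moving average and a 1.5x headroom factor; the function names, `alpha`, and `headroom` are hypothetical and not taken from the project.

```python
def update_throughput(current_kbps, sample_kbps, alpha=0.5):
    """Blend a new throughput sample into the running estimate (EWMA)."""
    return alpha * sample_kbps + (1.0 - alpha) * current_kbps


def select_bitrate(available_kbps, throughput_kbps, headroom=1.5):
    """Return the highest bit rate whose headroom-scaled cost fits the estimate.

    Falls back to the lowest available rate when nothing fits.
    """
    affordable = [b for b in available_kbps if b * headroom <= throughput_kbps]
    return max(affordable) if affordable else min(available_kbps)
```

In this sketch the proxy would call `update_throughput` after each downloaded video chunk and `select_bitrate` before requesting the next one.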
Gravel - Real-Time Operating System Kernel for ARM processors
Implemented a multitasking RTOS kernel with the following features:
- basic system calls (read, write, exit, sleep, time, mutex_lock, mutex_unlock, etc).
- context switching and rate-monotonic task scheduling.
- timer and interrupt controller drivers.
- mutexes for concurrency control.
- priority inheritance using the Highest Locker Priority Protocol.
- a Snake game based on system calls of the implemented RTOS.
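For readers unfamiliar with rate-monotonic scheduling, the idea can be sketched as follows: tasks with shorter periods get higher priority, and the classic Liu and Layland utilization bound gives a sufficient (not necessary) schedulability check. The task tuple layout `(name, cost, period)` and both function names are assumptions made for this example, not part of the kernel.

```python
def rm_priorities(tasks):
    """Order task names from highest to lowest priority (shortest period first).

    Each task is a hypothetical (name, cost, period) tuple.
    """
    return [name for name, _cost, _period in sorted(tasks, key=lambda t: t[2])]


def passes_utilization_bound(tasks):
    """Sufficient schedulability test: total utilization <= n * (2**(1/n) - 1).

    A task set that fails this test may still be schedulable.
    """
    if not tasks:
        return True
    n = len(tasks)
    utilization = sum(cost / period for _name, cost, period in tasks)
    return utilization <= n * (2 ** (1.0 / n) - 1)
```

An admission check at task-creation time could run `passes_utilization_bound` once and then assign static priorities in the order returned by `rm_priorities`.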