This thesis is devoted to the study of computing with unreliable resources, a paradigm emerging in a variety of technologies, such as circuit design, cloud computing, and crowdsourcing. In circuit design, as we approach the physical limits, semiconductor fabrication has been increasingly susceptible to fabrication flaws, resulting unreliable circuit components. In cloud computing, due to co-hosting, virtualization and other factors, the response time of computing nodes are variable. This calls for computation frameworks that take this unreliable quality-of-service into account. In crowdsourcing, we humans are the unreliable computing processors due to our inherent cognitive limitations.
We investigate these three topics in the three parts of this thesis. We demonstrate that it is often necessary to introduce redundancy to achieve reliable computing, and this needs to be carried out judiciously to attain an appealing balance between reliability and resource usage. In particular, it is crucial to take unreliability into account during system design, rather than to handle it as an afterthought.
In the first part, we investigate the topic of circuit design with unreliable circuit components. We first analyze the design of Flash Analog-to-Digital Converter (ADC) with imprecise comparators. Formulating this as a problem of scalar quantization with noisy partition points, we analyze fundamental limits on ADC accuracy and obtain designs that increase the yield of ADC (e.g., 5% to 10% for 6-bit Flash ADCs). Our results show that, given a fixed amount of silicon area, building more smaller and less precise comparators leads to better ADC accuracy. We then address the problem of digital circuit design with faulty components. To achieve reliability, we introduce redundant elements that can replace faulty elements via a reconfigurable interconnect. We show that the required number of redundant elements depends on the amount of interconnect available, and propose designs that achieve near-optimal trade-off between redundancy and interconnect overhead.
The second part of this thesis explores the problem of executing a collection of tasks in parallel on a group of computing nodes. This setting is often seen in cloud computing and crowd sourcing, and the response times of computing nodes are random due to their variability. In this case, the overall latency is determined by the response time of the slowest computing node, which is often much larger than the average response time. Task replication, which sends the same task to multiple computing nodes and obtains the earliest result, reduces latency, but in general incurs additional resource usage. We propose a theoretical framework to analyze the trade-off between latency and resource usage. We show that, while in general there is a tension between latency and resource usage, there exist scenarios where replicating tasks judiciously reduce both latency and resource usage simultaneously! Our investigation gives insights on when and how replication helps, and provides efficient scheduling policies for a variety of computing scenarios.
Lastly, we research the problem of crowd-based ranking via pairwise comparisons, with humans as unreliable comparators. We formulate this as the problem of approximate sorting with noisy comparisons. By developing a rate-distortion theory on permutation spaces, we obtain information-theoretic lower bounds for the query complexity of approximate sorting with both noiseless and noisy comparisons. Our lower bound shows the optimality of certain existing algorithms with respect to noiseless comparisons and provides a benchmark for approximate sorting with noisy comparisons.
Thesis Supervisor(s): Gregory Wornell
Thesis Committee: Yury Polyanskiy, Devavrat Shah